Local Dev Environments For Newbies Part 2: AMP on Windows 7

Previously, we discussed the benefits of installing a local AMP stack (Apache, MySQL & PHP) for the purposes of development and testing, and walked through installing a stack in the Mac environment.  In this post, we will turn our attention to Windows.  (If you have not read Local Dev Environments for Newbies Part 1, and you are new to the AMP stack, you might want to go read the Introduction and Tips sections before continuing with this tutorial.)

Much like with the Mac stack, there are Windows stack installers that will do all of this for you.  For example, if you are looking to develop for Drupal, there’s an install package called Acquia that comes with a stack installer.   There’s also WAMPserver and XAMPP.  If you opt to go this route, you should do some research and decide which option is the best for you.  This article contains reviews of many of the main players, though it is a year old.

However, we are going to walk through each component manually so that we can see how it all works together.

So, let’s get going with Recipe 2 – Install the AMP Stack on Windows 7.

Prerequisites:

Notepad and Wordpad come with most Windows systems, but you may want to install a more robust code editor to edit configuration files and eventually, your code.  I prefer Notepad++, which is open source and provides much of the basic functionality needed in a code editor.  The examples here will reference Notepad++ but feel free to use whichever code editor works for you.

For our purposes, we are not going to allow traffic from outside the machine to access our test server.  If you need this functionality, you will need to open a port in your firewall on port 80.  Be very careful with this option.

As a prerequisite to installing Apache, we need to install the Visual C++ 2010 SP1 Redistributable Package x86.  As a prerequisite to installing PHP, we need to install the Visual C++ 2008 SP1 Redistributable Package x86.

I create a directory called opt\local in my C drive to house all of the stack pieces.  I do this because it’s easier to find things on the command line when I need to and I like keeping development environment applications separate from Program Files.  I also create a directory called sites to house my web files.

The last two prerequisites are more like common gotchas.  The first is that while you are manipulating configuration and initialization files throughout this process, you may find the Windows default view settings are getting in your way.  If this is the case, you can change it by going to Organize > Folder and search options > View tab.

This will bring up a dialog which allows you to set preferences for the folder you are currently viewing.  You can select the option to “show hidden files” and uncheck the “hide file extensions” option, both of which make developing easier.

The other thing to know is that in our example, we will work with a Windows 7 installation – a 64-bit operating system.  However, when we get to PHP, you’ll notice that their website does not provide a 64-bit installer.  I have seen errors in the past when a 32-bit PHP installer and a 64-bit Apache version were both used, so we will install the 32-bit versions for both components.

Ok, I think we’re all set.  Let’s install Apache.

Apache

We want to download the .zip file for the latest version.  For Windows binaries, I use apachelounge, which provides Windows builds of Apache.  For this example we’ll download httpd-2.4.4-win32.zip to the Desktop of our Windows machine.

Next, we want to extract the files into the location we chose for the Apache directory, e.g. c:\opt\local\Apache24.  You can accomplish this in a variety of ways, but if you have WinZip, you can follow these steps:

  1. Copy the .zip folder to c:\opt\local
  2. Right-click and select “Extract all files”.
  3. Open the extracted folder, right-click on the Apache24 folder and select Cut.
  4. Go back up one directory and right-click to Paste the Apache24 folder, so that it now resides inside c:\opt\local.

No matter what unzip program you use, this is the configuration we are shooting for: the Apache24 folder sitting directly inside c:\opt\local.

This extraction “installs” Apache; there is no installer to run, but we will need to configure a few things.

We want to open httpd.conf: this file contains all of the configuration settings for our web server.  If you followed the directions above, you can find the file in C:\opt\local\Apache24\conf\httpd.conf – we want to open it with our code editor and make the following changes:

1.  Find this line (in my copy, it’s line 37):

ServerRoot "c:/Apache24"

Change it to match the directory where you installed Apache.  In my case, it reads:

ServerRoot "c:/opt/local/Apache24"

You might notice that our slashes slant in the opposite direction from the usual Windows syntax.  In Windows, backslash ( \ ) delineates different directories, but in Unix, it’s forward slash ( / ).  Apache reads the configuration file in the Unix manner, even though we are working in Windows.  If you get a “directory not found” error at any point, check your slashes.
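If you copy paths out of Windows Explorer, it is easy to paste backslashes into httpd.conf by accident.  The following is a minimal sketch of the conversion, written in Python purely for illustration (Python is not part of this stack, and the function name to_apache_path is made up for this example):

```python
def to_apache_path(windows_path: str) -> str:
    """Return the path with Windows backslashes replaced by forward slashes.

    Hypothetical helper: it only swaps the separator so that the path
    matches the style used in httpd.conf.
    """
    return windows_path.replace("\\", "/")


if __name__ == "__main__":
    # The install directory used in this tutorial.
    print(to_apache_path(r"c:\opt\local\Apache24"))
```

The point is simply that the drive letter and folder names stay the same; only the direction of the slashes changes.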

2.  At Line 58, we are going to change the listen command to just listen to our machine.  Change

Listen 80

to

Listen localhost:80

3.  There are 100 lines around 72-172 that all start with LoadModule.  Some of these are comments (they begin with a “#”).  Later on, you may need to uncomment some of these for a certain web program to work, like SSL.  For now, though, we’ll leave these as is.

4.  Next, we want to change our Document Root and the directory directive to the directory which has the web files.  These lines (beginning on line 237 in my copy) read:

DocumentRoot "c:/Apache24/htdocs"
<Directory "c:/Apache24/htdocs">

Later, we’ll want to change this to our “sites” folder we created earlier.  For now, we’re just going to point both lines at the htdocs folder in our Apache installation directory for testing.  So, they should read:

DocumentRoot "c:/opt/local/Apache24/htdocs"
<Directory "c:/opt/local/Apache24/htdocs">

Save the httpd.conf file.  (In two of our test cases, after saving the file, closing and re-opening, the file appeared unchanged.  If you are having issues, try doing Save As and save the file to your desktop, then drag it into c:\opt\local\Apache24\conf.)

Next, we want to test our Apache configuration.  To do this, we open the command line.  In Windows, you can do this by going to the Start Menu, and typing

cmd.exe

in the Search box.  Then, press Enter.  Once you’re in the command prompt, type in

cd \opt\local\Apache24\bin

(Note that the first part of this path is the install directory I used above.  If you chose a different directory to install Apache, use that instead.)  Next, we start the web server with a “-t” flag to test it.  Type in:

httpd -t

If you get a Syntax OK, you’re golden.

Otherwise, try to resolve any errors based on the error message. If the error message does not make any sense after checking your code for typos, go back and make sure that your changes to httpd.conf did actually save.

Once you get Syntax OK, type in:

httpd

This will start the web server.  You should not get a message regarding the firewall if you changed the listen command to localhost:80.  But, if you do, decide what traffic you want to allow to your machine.  I would click “Cancel” instead of “Allow Access”, because I don’t want to allow outside access.

Now the server is running.  You’ll notice that you no longer have a C:\> prompt in the Command window.  To test our server, we open a browser and type in http://localhost  – you should get a website with text that reads “It works!”

Instead of starting up the server this way every time, we want to install it as a Windows service.  So, let’s go back to our command prompt and press Ctrl+C to stop the web server.  You should now have a prompt again.

To install Apache as a service, type:

httpd.exe -k install

You will most likely get an error at this point, because installing a service requires administrator rights.

We need to run our command prompt as an administrator.  So, let’s close the cmd.exe window and go back to our Start menu.  Go to Start > All Programs > Accessories and right-click on Command Prompt.  Select “Run As Administrator”.

(Note: If for some reason you do not have the ability to right-click, there’s a “How-To Geek” post with a great tip.  Go to the Start menu and in the Run box, type in cmd.exe as we did before, but instead of hitting Enter, hit Ctrl+Shift+Enter.  This does the same thing as the right-click step above.)

Click on Yes at the prompt that comes up, allowing the program to make changes.  You’ll notice that instead of starting in our user directory, we are starting in Windows\system32.  So, let’s go back to our bin directory with:

cd \opt\local\Apache24\bin

Now, we can run our

httpd.exe -k install

command again, and it should succeed.  To start the service, we want to open our Services Dialog, located in the Control Panel (Start Menu > Control Panel) in the Administrative Tools section.  If you display your Control Panel by category (the default), you click on System & Security, then Administrative Tools.  If you display your control panel by small icon, Administrative Tools should be listed.

Double click on Services.

Find Apache2.4 in the list and select it.  Verify that the Startup Type is set to Automatic if you want the Service to start automatically (if you would prefer that the Service only start at certain times, change this to Manual, but remember that you have to come back in here to start it).  With Apache2.4 selected, click on Start Service in the left hand column.

Go back to the browser and hit Refresh to verify that everything is still working.  It should still say “It Works!”  And with that affirmation, let’s move to PHP.

PHP

(Before installing PHP, make sure you have installed the Visual C++ 2008 Redistributable Package from the prerequisite section.)

For our purposes, we want to use the Thread Safe .zip from the PHP Downloads page.    Because we are running PHP under Apache, but not as a CGI, we use the thread safe version.  (For more on thread safe vs. non-thread safe, see this Wikipedia entry or this stackoverflow post)

Once you’ve downloaded the .zip file, extract it to your \opt\local directory.  Then, rename the folder to simply “php”.  As with Apache24, extracting the files does the “install”; we just need to configure everything to run properly.  Go to the directory where you installed PHP (in my case, c:\opt\local\php) and find php.ini-development.

Make a copy of the file and rename the copy php.ini (this is one of those places where you may want to set the Folder and search options if you’re having problems).

Open the file in Notepad++ (or your code editor of choice).  Note that here, comments are preceded by a “;” (without quotes) and the directories are delineated using the standard Windows format, with a “\”.  Most of the document is commented out, and includes a large section on recommended settings for production and development, so if you’re not sure of the changes to make you can check in the file (in addition to the PHP documentation).  For this tutorial, we want to make the following changes:

1.  On line 708, uncomment (remove semi-colon) include_path under “Windows” and make sure it matches the directory where you installed PHP (if the line numbers have changed, just search for Paths and Directories).

2.  On line 730, uncomment the Windows directive for extension_dir and change extension_dir to match c:\opt\local\php\ext

3.  Beginning on Line 868, in the Windows Extensions section, uncomment (remove the semi-colon from) the following lines (they are not right next to each other, they’re in a longer list, but we want these three uncommented):

extension=php_mysql.dll
extension=php_mysqli.dll
extension=php_pdo_mysql.dll

Save the php.ini file.

You may want to double-check that the .dll files we enabled above are actually in the c:\opt\local\php\ext folder before trying to run php, because you will see an error if they are not there.
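One way to do that double-check without eyeballing the folder is a short script.  This is only a sketch under a few assumptions: it is written in Python (which this tutorial does not otherwise require), the helper name missing_extensions is invented for illustration, and the path is the install directory used in this post.

```python
from pathlib import Path

# The three extensions uncommented in php.ini above.
WANTED = ["php_mysql.dll", "php_mysqli.dll", "php_pdo_mysql.dll"]


def missing_extensions(ext_dir, names=WANTED):
    """Return the names from `names` that are not files inside `ext_dir`."""
    folder = Path(ext_dir)
    return [name for name in names if not (folder / name).is_file()]


if __name__ == "__main__":
    # Assumed location; change it if you extracted PHP somewhere else.
    missing = missing_extensions(r"c:\opt\local\php\ext")
    print("Missing:", ", ".join(missing) if missing else "none")
```

If the script reports anything missing, compare the file names in the ext folder with the extension= lines you uncommented before restarting Apache.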

Next, we want to add the php directory to our path environment variables.  This section is a little tricky; be *extremely* careful when you are making changes to system settings like this.

First, we navigate to the Environment variables by opening the Control Panel and going to System & Security > System > Advanced System Settings > Environment Variables.

In the bottom scroll box, scroll until you find “Path”, click on it, then click on Edit.

Append the following to the end of the Variable Value list (the semi-colon ends the previous item, then we add our installation path).

;c:\opt\local\php

Click OK and continue to do so until you are out of the dialog.
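To confirm that the change took, you can inspect the Path value from a newly opened command prompt (prompts that were already open keep the old value).  The sketch below shows the comparison logic in Python; the function name path_contains is hypothetical, and it treats Windows paths as case-insensitive and ignores a trailing backslash.

```python
import ntpath
import os


def path_contains(path_value, directory):
    """Return True if `directory` is one of the ';'-separated PATH entries.

    Entries are compared the way Windows treats them: case-insensitively
    and without regard to a trailing backslash.
    """
    def norm(entry):
        return ntpath.normcase(ntpath.normpath(entry.strip()))

    wanted = norm(directory)
    return any(
        norm(entry) == wanted
        for entry in path_value.split(";")
        if entry.strip()  # skip empty entries such as a doubled ';'
    )


if __name__ == "__main__":
    # Assumed install directory from this tutorial.
    print(path_contains(os.environ.get("PATH", ""), r"c:\opt\local\php"))
```

Seeing False right after editing the variable usually just means the check ran in a prompt that was opened before the change.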

Lastly, we need to add some lines to the httpd.conf so that Apache will play nice with PHP.  The httpd.conf file may still be open in your text editor.  If not, go back to c:\opt\local\Apache24\conf and open it.  At the bottom of this file, we need to add the following:

LoadModule php5_module "c:/opt/local/php/php5apache2_4.dll"
AddHandler application/x-httpd-php .php
PHPIniDir "c:/opt/local/php"

This tells Apache where to find php and loads the module needed to work with PHP.  (Note:  php5apache2_4.dll must be installed in the directory you specified above in the LoadModule statement.  It should have been extracted with the other files, but to download the file if it is not there, you can go to the apachelounge additional downloads page.)

While we’re in this file, we also want to tell Apache to look for an index.php file.  We’ll need this for testing, but also for some content management systems.  To do this, we change the DirectoryIndex directive on line 271.  It should look like

<IfModule dir_module>
  DirectoryIndex index.html
</IfModule>

We want to change the DirectoryIndex line so it reads

DirectoryIndex index.php index.html

Save httpd.conf.
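The order of the names matters here: when a browser asks for a directory, Apache serves the first file in the DirectoryIndex list that actually exists in it.  The htdocs folder already contains an index.html (the “It works!” page), so index.php has to come first if we want it to win.  The lookup can be sketched roughly like this in Python (an illustration of the idea with a made-up function name, not Apache’s actual code):

```python
def pick_index(files_in_directory, directory_index):
    """Return the first DirectoryIndex name present in the directory, else None."""
    available = set(files_in_directory)
    for candidate in directory_index.split():
        if candidate in available:
            return candidate
    return None


if __name__ == "__main__":
    # Both files exist, so the one listed first is served.
    print(pick_index(["index.html", "index.php"], "index.php index.html"))
```

With only index.html present, the same setting falls back to index.html, which is why existing static sites keep working after this change.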

Before we restart Apache to pick up these changes, we’re going to do one last thing.  To test our php, we want to create a file called index.php with the following text inside:

<?php phpinfo(); ?>

Save it to c:\opt\local\Apache24\htdocs

Restart Apache by going back to the Services dialog.  (If you closed it, it’s Control Panel > System & Security > Administrative Tools > Services).  Click on Apache2.4 and then click on Restart.

If you get an error, you can always go back to the command line, navigate to c:\opt\local\Apache24\bin and run httpd.exe -t again.  This will check your syntax, which is most likely to be the problem.  (This page is also helpful in troubleshooting PHP 5.4 and Apache if you are having issues.)

Open a browser window and type in http://localhost – instead of “It Works!” you should see a list of configuration settings for PHP.  (In one of our test cases, the tester needed to close Internet Explorer and re-open it for this part to work.)

Now, we move to the database.

MySQL

To install MySQL, we can follow the directions at the MySQL site.  For the purposes of this tutorial, we’re going to use the most recent version as of this writing, which is 5.6.11.  To download the files we need, we go to the Community Server download page.

Again, we can absolutely use the installer here, which is the first option.  The MySQL installers will prompt you through the setup, and this video does a great job of walking through the process.

But since the goal of this tutorial is to see all the parts, I’m going to run through the setup manually.  First, we download the .zip archive.  Choose the .zip file which matches your operating system; I will choose 64-bit (there’s no agreement issue here).  Extract the files to c:\opt\local\mysql.  We do this in the same way we did the Apache24 files above.

Since we’re installing to our opt\local directory, we need to tell MySQL to look there for the program files and the data.  We do this by setting up an option file.  We can modify a file provided for us called my-default.ini.  Change the name to my.ini and open it with your code editor.

In the MySQL config files, we use the Unix directory “/” again, and the comments are again preceded by a “#”.  So, to set our locations, we want to remove the # from the beginning of the basedir and datadir lines, and change them to point to our installation directory: for example, basedir = c:/opt/local/mysql and datadir = c:/opt/local/mysql/data (the data folder inside the extracted archive).

Then save my.ini.
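If MySQL later refuses to start, a common culprit is that one of these two lines is still commented out.  As a rough sketch, the file can be parsed with Python’s standard configparser module to see which values MySQL would pick up.  The function name read_mysql_dirs and the sample values are assumptions for illustration; substitute the contents of your own my.ini.

```python
import configparser


def read_mysql_dirs(ini_text):
    """Return (basedir, datadir) from the [mysqld] section, or None for each if unset."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # option files may contain bare flags
        strict=False,         # tolerate repeated options
        interpolation=None,   # treat '%' literally
    )
    parser.read_string(ini_text)
    return (
        parser.get("mysqld", "basedir", fallback=None),
        parser.get("mysqld", "datadir", fallback=None),
    )


# Illustrative values only, assuming the install directory used in this post.
SAMPLE = """[mysqld]
# Lines starting with '#' are comments and are ignored.
basedir = c:/opt/local/mysql
datadir = c:/opt/local/mysql/data
"""

if __name__ == "__main__":
    print(read_mysql_dirs(SAMPLE))
```

A value of None for either setting means the line is missing or still has its leading #.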

As with Apache, we’re going to start MySQL for the first time from the command line, to make sure everything is working ok.  If you still have it open, navigate back there.  If not, remember to select the Run As Administrator option.

From your command prompt, type in

cd \opt\local\mysql\bin
mysqld --console

You should see a bunch of statements scroll by as the first database is created.  You may also get a firewall popup.  I hit Cancel here, so as not to allow access from outside my computer to the MySQL databases.

Press Ctrl+C to stop the server.  Now, let’s install MySQL as a service.  To do that, we type the command:

mysqld --install

Next, we want to start the MySQL service, so we need to go back to Services.  You may have to Refresh the list in order to see the MySQL service.  You can do this by going to Action > Refresh in the menu.

Then, we start the service by clicking on MySQL and clicking Start Service on the left hand side.

 

One thing about installing MySQL in this manner is that the initial root user for the database will not have a password.  To see this, go back to your command line.  Type in

mysql -u root

This will open the command line MySQL client and allow you to run queries.  The -u flag sets the user, in this case, root.  Notice you are not prompted for a password.  Type in:

select user, host, password from mysql.user;

This command should show all the created user accounts, the hosts from which they can log in, and their passwords.  The semi-colon at the end is crucial – it signifies the end of a SQL command.

Notice in the output that the password column is blank.  MySQL provides documentation on how to fix this on the Securing the Initial Accounts documentation page, but we’ll also step through it here.  We want to use the SET PASSWORD command to set the password for all of the root accounts.

Substituting the password you want for newpwd (keep the single quotes in the command), type in

SET PASSWORD FOR 'root'@'localhost' = PASSWORD('newpwd');
SET PASSWORD FOR 'root'@'127.0.0.1' = PASSWORD('newpwd');
SET PASSWORD FOR 'root'@'::1' = PASSWORD('newpwd');

You should get a confirmation after every command.  Now, if you run the select user command from above, you’ll see that there are values in the password field, equivalent to encrypted versions of what you specified.
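Strictly speaking, what ends up in the password column is a hash rather than something that can be decrypted.  To the best of my understanding, with the default settings in MySQL 5.6 the PASSWORD() function applies SHA-1 twice and stores the result as an asterisk followed by 40 uppercase hex characters.  The sketch below reproduces that in Python for illustration only; the function name is made up, and it assumes the server has not been switched to a different password format.

```python
import hashlib


def mysql_native_hash(password):
    """Return the 41-character hash in the '*' + HEX(SHA1(SHA1(password))) format."""
    stage1 = hashlib.sha1(password.encode("utf-8")).digest()  # raw bytes, not hex
    stage2 = hashlib.sha1(stage1).hexdigest().upper()
    return "*" + stage2


if __name__ == "__main__":
    # Compare the result with the password column for your own root account.
    print(mysql_native_hash("newpwd"))
```

If the value printed for your password matches what the select statement shows, the SET PASSWORD commands did what we expected.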

A note about security: I am not a security expert and for a development stack we are usually less concerned with security.  But it is generally not a good idea to type in plain text passwords in the command line, because if the commands are being logged you’ve just saved your password in a plain text file that someone can access.  In this case, we have not turned on any logging, and the SET PASSWORD should not store the password in plain text.  But, this is something to keep in mind.

As before with Mac OS X, we could stop here.  But then you would have to administer the MySQL databases using the command line.  So we’ll install phpMyAdmin to make it a little easier and test to see how our web server works with our sites folder.

phpMyAdmin

Download the phpmyadmin.zip file from the phpmyadmin page to the sites folder we created all the way at the beginning.  Note that this does *not* go into the opt folder.

Extract the files to a folder called phpmyadmin using the same methods we’ve used previously.

Since we now want to use our sites folder instead of the default htdocs folder, we will need to change the DocumentRoot and Directory directives on lines 237 and 238 of our Apache config file.  So, open httpd.conf again.

We want to change the DocumentRoot to sites, and we’re going to set up the phpMyAdmin directory.

[Screenshot: httpd.conf with the DocumentRoot and Directory directives changed to the sites folder]

Save the httpd.conf file.  Go back to Services and Restart the Apache2.4 service.

We will complete the configuration through the browser.  First, open the browser and try to navigate to http://localhost again.  You should get a 403 error.

Instead, navigate to http://localhost/phpmyadmin/setup

Click on the New Server button to set up a connection to our MySQL databases.  Double check that under the Basic Settings tab, the Server Name is set to localhost, and then click on Authentication.  Verify that the type is “cookie”.

At the bottom of the page, click on Save.  Now, change the address in the browser to http://localhost/phpmyadmin and log in with the root user, using the password you set above.

And that’s it.  Your Windows AMP stack should be ready to go.

In the next post, we’ll talk about how to install a content management system like WordPress or Drupal on top of the base stack.  Questions, comments or other recipes you would like to see?  Let us know in the comments.

 


Local Dev Environments For Newbies Part 1: AMP on Mac OSX

There are many cases where having a local development environment is helpful and it is a relatively straightforward thing to do, even if you are new to development.  However, the blessing and the curse is that there are many, many tutorials out there attempting to show you how.  This series of posts will aim to walk through some basic steps with detail, as well as pass on some tips and tricks for setting up your own local dev box.

First, what do I mean by a local development environment?  This is a setup on your computer which allows you to code and tweak and test in a safe environment.  It’s a great way to hammer on a new application with relatively low stakes.  I am currently installing dev environments for two purposes: to test some data model changes I want to make on an existing Drupal site and to learn a new language so I can contribute to an application.  For the purposes of this series, we’re going to focus on the AMP stack – Apache, MySQL and PHP – and how to install and configure those systems for use in web application development.

Apache is the web server which will serve the pages of your website or application to a browser.  You may hear Apache in conjunction with lots of other things – Apache Tomcat, Apache Solr – but generally when someone references just Apache, it’s the web server.  The full name of the project is the Apache HTTP Server Project.

PHP is a scripting language widely used in web development.  MySQL is a database application also frequently used in web development.  “Stack” refers to the combination of several components needed to run a web application.  The AMP stack is the base for many web applications and content management systems, including Drupal and WordPress.

You may have also seen the AMP acronym preceded by an L, M or W.  This merely stands for the operating system of choice – Linux, Mac or Windows.  This can also refer to installer packages that purport to do the whole installation for you, like WAMP or MAMP.  Employing the installer packages can be useful, depending on your situation and operating system.  The XAMPP stack, distributed by Apache Friends, is another example of an installer package designed to set up the whole stack for you.  For this tutorial though, we’ll step through each element of the stack, instead of using a stack installer.

So, why do it yourself if there are installers?  To me, it takes out the mystery of how all the pieces play together and is a good way to learn about what’s going on behind the scenes.  When working on Windows, I will occasionally use a .msi installer for an individual component to make sure I don’t miss something.  But installing and configuring each component individually is actually helpful.

Tips

Before we begin, let’s look at some tips:

  • You will need administrative rights to the computer on which you’re installing.
  • Don’t be afraid of the command line.  There are lots of tutorials around the web on how to use the basic commands – for both Mac (based on UNIX) and Windows.  But, you don’t need to be an expert to set up a dev environment.  Most tutorials give the exact commands you need.
  • Try, if possible, to block off a chunk of time to do this.  Going through all the steps may take a while, from an hour to an afternoon, especially if you hit a snag.  Several times during my own process, I had to step away from it because of a crisis or because it was the end of the day.  When I was able to come back later, I had some trouble remembering where I left off or the configuration options I had chosen.  If you do have to walk away, write down the last thing you did.
  • When you’re looking for a tutorial, Google away.  Search for the elements of your stack plus your OS, like “Apache MySQL PHP Mac OSX”.  You’ll find lots, and probably end up referencing more than one.  Use your librarian skills: is the tutorial recent?  Does it appear to be from a reputable source?  If it’s a blog, are there comments on the accuracy of the tutorial?  Does it agree with the others you’ve seen?
  • Once you’ve selected one or two to follow, read through the whole tutorial one time without doing anything.  Full disclosure: I never do this and it always bites me.

Let’s get going with Recipe 1 – Install the AMP Stack on Mac OS X

Install the XCode Developer Tools

First, we install the developer tools for XCode.  If you have Mac 10.7 and above, you can download the XCode application from the App Store.  To enable the developer tools, open XCode, go to the XCode menu > Preferences > Downloads tab, and then click on “Install” next to the developer tools.  This tutorial on installing Ruby by Moncef Belyamani has good screenshots of the XCode process.

If you have Snow Leopard (10.6) or below, you’ll need to track down the tools on the Apple Developer Downloads Page.  You will need to register as a developer, but it’s free.  Note:  you can get pretty far in this process without using the XCode command line tools, but down the road as you build more complicated stacks, you’ll want to have them.

Configure Apache and PHP

Next we need to configure Apache and PHP.  Note that I said “configure”, not “install”.  Apache and PHP both come with OS X; we just need to configure them to work together.

Here’s where we open the Terminal to access the command line by going to Applications > Utilities > Terminal.

Once Terminal is open, a prompt appears where you can type in commands.  The “~” character indicates that you are at the “home” directory for your user.  This is where you’ll do a lot of your work.  The “$” character delineates the end of the prompt and the beginning of your command.

Type in the following command:

cd /etc/apache2

“cd” stands for “change directory”.  This is the equivalent of double-clicking on etc, then apache2, if you were in the Finder (but etc is a hidden folder in the Finder).  From here, we want to open the necessary file in an editor.  Enter the following command:

sudo nano httpd.conf

“sudo” elevates your permission to administrator, so that you can edit the config file for Apache, which is httpd.conf.  You will need to type in your administrator password.  The “nano” command opens a text editor in the Terminal window.  (If you’re familiar with vi or emacs, you can use those instead.)

The bottom of your window will show the available commands.  The “^” stands for the Control key.  So, to search for the part we want to change, we press Control + W.  Enter php and press Enter.  We are looking for this line:

#LoadModule php5_module        libexec/apache2/libphp5.so

The “#” at the beginning of this line is a comment, so Apache ignores the line.  We want Apache to see the line, and load the php module.  So, change the text by removing the #:

LoadModule php5_module        libexec/apache2/libphp5.so

Save the file by pressing Control + O (nano calls this “WriteOut”) and then pressing Enter to confirm the file name.  The number of lines written displays at the bottom of the window.  Press Control + X to exit nano.

Next, we need to start the Apache server.  Type in the following command:

sudo apachectl start

Now, go to your browser and type in http://localhost.  You should see “It Works!”

Apache, as mentioned before, serves web files from a location we designate.  By default, this is /Library/WebServer/Documents.  If you have Snow Leopard (10.6) or below, Apache also automatically looks to username/sites, which is a convenient place to store and work with files.  If you have OS 10.7 or above, creating the Sites folder takes a few steps.  On 10.7, go to System Preferences > Sharing and click on Web Sharing.  If there’s a button that says “Create Personal Web folder”, the folder has not been created yet, so go ahead and click that button.  If it says, “Open Personal Website folder”, you’re good to go.

On 10.8, the process is a little more involved.  First, go to the Finder, click on your user name and create your sites folder.

Next, we need to open the command line again and create a .conf file for that directory, so that Apache knows where to find it.  Type in these commands:

cd /etc/apache2/users
ls

The ls at the end will list the directory contents.  If you see a file that’s yourusername.conf (e.g., mfrazer.conf) in this directory, you’re good to go.  If you don’t, it’s easy to create one.  Type the following command:

sudo nano yourusername.conf

So, mine would be sudo nano mfrazer.conf.  This will create the file and take you into a text editor.  Copy and paste the following, making sure to change YOURUSERNAME to your user name.

<Directory "/Users/YOURUSERNAME/Sites/">
  Options Indexes MultiViews
  AllowOverride None
  Deny from all
  Allow from localhost
</Directory>

The first directive, Options, can have lots of different…well, options.  The ones we have here are Indexes and MultiViews.  Indexes means that if a browser requests a directory and there’s no index.html or index.php file, it will serve a directory listing.  MultiViews means that browsers can request the content in a different format if it exists in the directory (e.g., in a different language).  AllowOverride determines whether an .htaccess file elsewhere can override the configuration settings.  For now, None will indicate that no part can be overridden.  For Drupal or other content management systems, it’s possible we’ll want to change these directives, but we’ll cover that later.

The last two lines indicate that traffic can only reach this directory from the local machine, by typing http://localhost/~username in the browser.  For more on Apache security, see the Apache documentation.  If you would like to set it so that other computers on your network can also access this directory, change those last two lines to:

Order allow,deny
Allow from all

Either way, press Control + O to save the file and Control + X to exit.  Restart Apache for the changes to take effect using this command:

sudo apachectl restart

You may also be prompted at some point by OS X to accept incoming network connections for httpd (Apache); I would deny these as I only want access to my directory from my machine, but it’s up to you depending on your setup.

We’ll test this setup with php in the next step.

Test PHP

If you want to check php, you can create a new text document using your favorite text editor.  Type in:

<?php phpinfo(); ?>

Save the file as phpinfo.php in your username/sites directory (so for me, this is mfrazer > Sites)

Then, point your browser to http://localhost/~yourUserName/phpinfo.php  You should see a page of information regarding PHP and the web server, with a header that looks like this:

PHP Info Header

 

 

MySQL

Now, let’s install MySQL.  There are two ways to do this.  We could go to the MySQL downloads page and use the installers.  The fantastic tutorials at Coolest Guy on the Planet both recommend this, and it’s a fine way to go.

But we can also use Homebrew, mentioned previously on this blog, which is a really convenient way to do things as long as we’re already using the command line.

First, we need to install Homebrew.  Enter this at the command prompt:

ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"

Next, type in

brew doctor

If you receive the message: “Your system is raring to brew.” You’re ready to go.  If you get Warnings, don’t lose heart.  Most of them tell you exactly what you need to do to move forward.  Correct the errors and type in brew doctor again until you’re raring to go.  Then, type in the following command:

brew install mysql

That one’s pretty self-explanatory, no?  Homebrew will download and install MySQL, as of this writing version 5.6.10, but pay attention to the download to see the version – it’s in the URL.  After the installation succeeds, Homebrew will give some instructions on finishing the setup, including the commands we discuss below.

I’m going to pause for a second here and talk a little about permissions and directories.  If you get a “permission denied” error, try running the command again using “sudo” at the beginning.  Remember, this elevates your permission to the administrator level.  Also, if you get a “directory does not exist” error, you can easily create the directory using “mkdir”.  Before we move on, let’s check for a directory you’re going to need coming up.  Enter:

cd /usr/local/var

If you are successfully able to change to that directory, great. If not, type in

sudo mkdir /usr/local/var

to create it. Then, let’s go back to our home directory by typing in

cd ~

Now, let’s continue with our procedure. First, we want to set up the databases to run with our user account.  So, we type in the following two commands:

unset TMPDIR
mysql_install_db --verbose --user=`whoami` --basedir="$(brew --prefix mysql)" --datadir=/usr/local/var/mysql --tmpdir=/tmp

The second command here installs the system databases; `whoami` (in backticks) is automatically replaced with your user name, so the above command should work verbatim.  But it also works to type your user name directly, with no quotes (i.e., --user=mfrazer).

Next, we want to run the “secure installation” script. This helps you set root passwords without leaving the password in plain text in your editor. First we start the mysql server, then we run the installation scripts and follow the prompts to set your root password, etc:

mysql.server start
sudo /usr/local/Cellar/mysql/5.6.10/bin/mysql_secure_installation

After the script is complete, stop the mysql server.

mysql.server stop

Next, we want to set up MySQL so it starts at login. For that, we run the following two commands:

ln -sfv /usr/local/opt/mysql/*.plist ~/Library/LaunchAgents
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.mysql.plist

The ln command, in this case, places a symbolic link to any .plist files in the mysql directory into the LaunchAgents directory.  Then, we load the plist using launchctl to start the server.
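If symbolic links are new to you, the idea can be tried out safely away from the real LaunchAgents directory.  The following is a minimal Python sketch, not part of the MySQL setup itself: the folder names opt_mysql and LaunchAgents inside a temporary directory are stand-ins I made up for illustration, and the plist file is an empty placeholder.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing on the real system changes.
with tempfile.TemporaryDirectory() as tmp:
    source_dir = os.path.join(tmp, "opt_mysql")      # stand-in for /usr/local/opt/mysql
    agents_dir = os.path.join(tmp, "LaunchAgents")   # stand-in for ~/Library/LaunchAgents
    os.mkdir(source_dir)
    os.mkdir(agents_dir)

    # A placeholder file playing the role of the real plist.
    plist = os.path.join(source_dir, "homebrew.mxcl.mysql.plist")
    with open(plist, "w") as handle:
        handle.write("<plist/>")

    # Roughly what `ln -s` does: create a pointer, not a copy.
    link = os.path.join(agents_dir, os.path.basename(plist))
    os.symlink(plist, link)

    is_link = os.path.islink(link)   # True: the new entry is a link
    target = os.readlink(link)       # the path the link points back to
    print(is_link, target)
```

The link takes up almost no space and always reflects the current contents of the original file, which is why it is a convenient way to make the plist visible in a second location.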

One last thing – we need to create one more link to the mysql.sock file.

cd /var/mysql/
sudo ln -s /tmp/mysql.sock

This creates a link to the mysql.sock file, which MySQL uses to communicate, but which resides by default in a tmp directory.  The first command places us in the directory where we want the link (remember, if it doesn’t exist, you can use “sudo mkdir /var/mysql/” to create it) and the second creates the link.

MySQL is ready to go!  And, so is your AMP stack.

But wait, there’s more…

One optional tool to install is phpMyAdmin.  This tool allows you to interact with your database through your browser so you don’t have to continue to use the command line.  I also think it’s a good way to test if everything is working correctly.

First, let’s download the necessary files from the phpMyAdmin website.  These will have a .tar.gz extension.  Place the file in your Sites directory, and double-click to unzip the file.

Rename the folder to remove the version number and everything after it.  I’m going to place the next steps below, but the Coolest Guy on the Planet tutorial referenced earlier does a good job of this step for OS 10.8 (just scroll down to phpMyAdmin) if you need screenshots.

Go to the command line and navigate to your phpMyAdmin directory.  Make a directory called config and change the permissions so that the installer can access the file.  This should look something like:

cd ~/Sites/phpMyAdmin
mkdir config
chmod o+w config

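Let’s take a look at that last command: chmod changes the permissions on a file or directory.  The o+w adds write permission for “others,” meaning users who are neither the directory’s owner nor members of its group, so that the setup script can write its configuration file there.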
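To see what o+w changes at the level of permission bits, here is a small Python sketch.  It assumes a Unix-like system and works on a temporary directory rather than the real phpMyAdmin folder; the helper name add_other_write is my own invention for this example.

```python
import os
import stat
import tempfile


def add_other_write(path):
    """Mimic `chmod o+w path`: switch on the write bit for 'others' only."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_IWOTH)
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstrate on a temporary directory standing in for "config".
with tempfile.TemporaryDirectory() as parent:
    config_dir = os.path.join(parent, "config")
    os.mkdir(config_dir)
    os.chmod(config_dir, 0o755)             # a typical starting point: rwxr-xr-x
    new_mode = add_other_write(config_dir)  # becomes rwxr-xrwx
    print(oct(new_mode))
```

Only the last of the three permission groups changes; the owner and group permissions are left exactly as they were.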

Now, in your browser, go to http://localhost/~username/phpmyadmin/setup and follow these steps:

  1. Click on New Server (button on bottom)
  2. Click on Authentication tab, and enter the root password in the password field.
  3. Click on Save.
  4. Click on Save again on the main page.

Once the setup is finished, go to the Finder and move the config.inc.php file from the config directory into the main phpmyadmin directory and delete the config directory.  So in the end, it looks like this:

phpmyadminlast

Now, go to http://localhost/~username/phpmyadmin in your browser and login with the root account.

You are ready to go!  In future parts of this series, we’ll look at building the AMP stack on Windows and adding Drupal or WordPress on top of the stack.  We will also look at maintaining your environment, as the AMP stack components will need updating occasionally.  Any other recipes you’d like to see?  Do you have questions? Let us know in the comments.

The following tutorials and pages were incredibly useful in writing this post.  While none of these tutorials are exactly the same as what we did here, they all contain useful pieces and may be helpful if you want to skip the explanation and just get to the commands:


Batch Renaming the Easy Way

Everyone occasionally dives right into a problem without researching (gasp!) the best solution.  For me, this once meant manually renaming hundreds of files and moving them into individual folders in preparation for upload to a digital repository.  Then finally a colleague said to me, and rightly so, “Are you crazy?  There’s scripts to do that for you.”

In my last post, I discussed file naming conventions and the best methods to ensure future access and use for files.  However, as librarians and archivists, we don’t always create the files we manage.  Donors bring hard drives and students bring USB drives and files get migrated…etc, etc.  Renaming existing files to bring them in line with our standards is often a daunting prospect, but there are lots of methods available to save time and sanity.

In this post, I’ll review a few easy methods for batch renaming files:

  • Mac Automator
  • Windows – Notepad++ Column Editor
  • Windows – Batch File with Loop

The first two methods do not require any knowledge of coding; the last is slightly more advanced.  There are some caveats: if you are an experienced developer, it’s likely that you know a more efficient way.  I also tried to avoid any third-party tools specifically touted as renaming applications, as I have not used them and therefore cannot recommend which is best.  Lastly, while Photoshop and other photo editing software may help with this when working with image files, the options listed below should work with all file types.

In my example, I am using a set of 43 images waiting for upload to our digital library.  The files originated on a faculty member’s camera, so the names are in the following format:

DSCN2956.jpg
DSCN2957.jpg
DSCN2958.jpg
...

The images are of the Olympic Stadium in Beijing, China, and I would like the file names to reflect that, i.e. Beijing-OlympicStadium-01.jpg

Mac Automator

One of the features included in Mac OS X (10.4 and above) is Automator, the “personal automation assistant”, according to Apple Support.  The tool allows you to define a group of actions to take, automatically, on a given set of triggers.  For example, after discovering this tool I created a script which, when prompted, quickly grabs a screenshot and saves it as a jpeg in a folder I specified.

For this post, let’s step through using the tool to batch re-name files.  First, I found a tutorial online.  These are everywhere, but specifically, I looked at “10 Awesome Uses for Automator Explained” by Matt Reich.  Reich gives a good succinct tutorial, placed in the context of personal photos.  We’re going to make a few changes in our steps, place it in the context of a digital collection and walk a little more slowly through the process.  I’ll be using Mac OS 10.8 in the steps and screenshots.

1.  Go to Finder, Open Applications and double-click on Automator.

2.  We’re going to create an Application.  Reich uses a Folder Action, which means that you would copy the items into the folder which would trigger the rename.  That approach makes sense as you move personal photos from a camera into the same Photos folder over and over again (in fact, I plan to use it myself).  However, in working with existing digital files that we just want to rename, which may need to live in many different folders, the Application is a more direct approach.  This will allow us to act on the files in place.  So, click on the Application Icon, and click on Choose.

3.  Now we need to add some Actions.  In the Library along the far left-hand pane, select “Files & Folders”.  The middle pane will now show all of the options for acting on Files & Folders.

Automator User Interface

4.  Click on “Rename Finder Items” and drag it to the large empty pane on the right.

5.  The system will prompt you as to whether or not you want to “Copy the Finder items.”  For this example, I opted not to, but if you prefer to make a copy, click on Add.

Prompt to Add Copy Items

6.  The window you’ve dragged over will default to settings for “Add Date or Time”.  We want to do this eventually, but let’s start with changing the name and adding a sequence number.  In the drop-down menu at the top of the window, change “Add Date or Time” to “Make Sequential”

Select Make Sequential Option

7.  Select the radio button next to “new name”, but don’t enter a default item name.

Set naming parameters

8.  Set the rest of the parameters.  For my purposes, I placed the number after the name, used a dash to separate, and used a three digit number set.

9.  Click on “Options” at the bottom, and select “Show this action when the workflow runs.”  The application will then prompt you to fill in the item name at runtime.

A note about the date:  In cases where you’d like to append a system date (e.g. Created, Modified, Last Opened or Current), you would use “Add Date or Time”.  To match our file naming conventions we have already established, we’ll want to select non-space and non-special characters as our separators, use Year Month Day as the format, and click the checkbox to “Use Leading Zeros”.  I would use a dash to separate the name from the date and no separator for Year Month Day.  Look at the example provided at the bottom to make sure the date looks correct.

However, in my case, I’m working with a set of files where the system dates aren’t much use to me.  I want to know the date of the photo; this is especially likely if I were working with scanned files from a historical period. So, I’m going to use “Add Text” instead, and append my own date.

10.  Repeat step 4: drag “Rename Finder Items” to the right pane.  This time, select “Add Text” from the dropdown.

11. Leave the “Add Text” field blank, click on “Options” and select “Show this action when the workflow runs.”  Then, when you run the application you’ll be prompted to add text and you can append 1950, for example, to the file name.

12.  Click on File > Save As, and save your Application in a location where it is easy to drag and drop files, like the Desktop.  For my example, I called the application BatchFileRename.

13.  Navigate to the folder containing the files you want to rename, and select them all (you can use Cmd+A).  Drag the whole selection to the Automator file you just created, fill in the prompts and click “Continue”.

Automator-12

Prompt 1 for Automator

Text Prompt for Automator

You now have a set of renamed files.  Note that the script did not modify the “Date Modified” value for the file.  The script is now set up for future projects as well; any time you want to rename files, just repeat step 13.

One thing you might notice is that the date is appended after the index number.  If you wanted it before the index number, we would append it to the “item name” field in the Make Sequential box and skip the Add Text section altogether.
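For anyone curious how the Year Month Day format with leading zeros and no internal separator (described in the note about dates above) comes out in practice, here is a minimal Python sketch.  The function name name_with_date and the date used are assumptions made up for illustration; Automator does not expose anything by that name.

```python
from datetime import date


def name_with_date(root, when):
    """Join a root name and a date as root-YYYYMMDD, zero-padded."""
    return f"{root}-{when.strftime('%Y%m%d')}"


# A made-up date, chosen only to show the leading zeros.
example = name_with_date("Beijing-OlympicStadium", date(2008, 8, 8))
print(example)  # Beijing-OlympicStadium-20080808
```

Because the month and day are always two digits, every date in this format has the same length, which keeps the file names sorting in chronological order.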

automatorresult

A note from a paranoid librarian:  I copied this set of files from its original location to do this example, so that if something went horribly wrong, I’d still have the originals.  Until you get comfortable with batch renaming you might consider doing the same.
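The same “work on a copy” habit can be sketched in a few lines of Python for readers who are comfortable with a script.  This is an illustration under my own assumptions, not something Automator produces: the function name copy_with_new_names, the folder names and the three-digit default are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_with_new_names(src_dir, dest_dir, root, width=3):
    """Copy each file in src_dir to dest_dir as root-001.ext, root-002.ext, ...

    The originals are never touched, so a mistake only affects the copies.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    new_names = []
    files = sorted(p for p in src_dir.iterdir() if p.is_file())
    for index, old in enumerate(files, start=1):
        new_name = f"{root}-{index:0{width}d}{old.suffix.lower()}"
        shutil.copy2(old, dest_dir / new_name)  # copy2 also keeps timestamps
        new_names.append(new_name)
    return new_names


# Small demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    originals = Path(tmp) / "originals"
    originals.mkdir()
    for name in ("DSCN2956.jpg", "DSCN2957.jpg", "DSCN2958.jpg"):
        (originals / name).touch()
    renamed = copy_with_new_names(
        originals, Path(tmp) / "renamed", "Beijing-OlympicStadium"
    )
    print(renamed)
```

If the result looks right, the copies can replace the originals; if not, the renamed folder can simply be deleted.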

There are lots of other uses for the Automator tool; check out “10 Awesome Uses for Automator Explained” by Matt Reich for more ideas, or do a search for Automator tutorials in your favorite search engine.

Windows – Notepad++ Column Editor

I started out hoping to accomplish this task the same way I did in Mac OS X – with no outside tools.  However, the default renaming function in Windows lacks a few things for our purposes.  If you select a group of files, right-click and select “Rename”, you can rename all of the files at once.

Windows default process

However, the resulting file names do not conform to our earlier standards.  They contain spaces and special characters and the index number is not a consistent length, which can cause sorting headaches.

After some searching, I came across this stackoverflow page, which contained a very useful command:

dir /b *.jpg >file.bat

This command allows me to dump a directory’s files into a text file which I can edit into a series of rename commands to be run as a batch file.  The editing of the text file is the most time-consuming part, but using the Column Editor in Notepad++ speeds up the process considerably.  (This is where we break the “no third-party tool” convention.  Notepad++ is a free text editor I use frequently for writing code and highly recommend, though this process may work with other text editors.)

1.  Open a command prompt.

2.  Navigate to the directory which contains the files that need to be renamed.

3.  The command we found above is composed of several parts.  “dir” lists the directory contents, “/b” indicates to only list the filenames, “*.jpg” means to grab only the jpg files, and “>file.bat” directs the output to a file called file.bat.  We are going to keep everything the same except change the name of our output file.

dir /b *.jpg >rename.bat

Command Line

4.  In Windows Explorer, navigate to the directory and find the file you just created.  Right click on it and select Edit with Notepad++ (or Open With > Your Text Editor).

Windows Edit With Notepad++

5.  Put the cursor before the first letter in the first line, and open the Column Editor (Edit > Column Editor or Alt+C).

Open Column Editor

6.  This tool allows you to assign the same character to every line of text in the same space.  We want to insert the beginning of the Windows rename command for each line.  So, in the “text to insert” box, we type:

rename "

and click OK.

Column Editor

7.  Open the editor again to add the portion of the rename command which goes after the old filename.  Here is where we’ll designate our new name, again using the “text to insert” box.  I typed:

" "Beijing-OlympicStadium-

(Note, if you are using file names of varying length, move to the column after the longest file name, then use Find & Replace at the end of the process to remove the extra spaces.)

8.  Next, let’s append an index before the file extension.  Open the Column Editor again and this time, select the number option.  Start at 1, increment by 1, and check the leading zeros box. Click Ok.

Column Editor Insert Number

9.  Last, append the file extension and end the command for each line.  Using the Column Editor’s “text to insert” box one more time, add:

.JPG"

10. The Column Editor adds one extra line at the bottom.  Scroll down and delete it before saving the file.

Column Editor Extra Line

11. Save the file and go back to the command prompt. (If you closed it, re-open it and navigate back to the directory before proceeding.)

12.  Type in the full name of the batch file so it will execute, i.e.

rename.bat

You’ll see the rename commands go by, and the files will each have a new name.  Again, this doesn’t appear to affect the Date Modified on the file.
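If you would rather generate the lines of rename.bat with a script than with the Column Editor, the steps above can be sketched in Python.  This is only a sketch under my own assumptions: the function name build_rename_commands is hypothetical, and it produces the text of the commands without running them.

```python
def build_rename_commands(filenames, root, extension=".jpg", width=2):
    """Return one Windows rename command per file, with a zero-padded index."""
    commands = []
    for index, old_name in enumerate(sorted(filenames), start=1):
        new_name = f"{root}-{index:0{width}d}{extension}"
        # Quotes around both names mirror what we typed in the Column Editor.
        commands.append(f'rename "{old_name}" "{new_name}"')
    return commands


commands = build_rename_commands(
    ["DSCN2956.jpg", "DSCN2957.jpg", "DSCN2958.jpg"],
    "Beijing-OlympicStadium",
)
print("\n".join(commands))
```

The printed lines could be pasted into rename.bat and reviewed before running, just as with the hand-edited version.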

Windows – Batch File with Loop

It is possible to write your own batch file that will loop through the files in question and rename them.  I have never written my own batch file, so in the interest of researching this post, I decided to give it a shot.  There is lots of documentation available online to help in this effort.  I consulted Microsoft’s documentation, DOS help documentation, and batch file examples (such as this stackoverflow post and a page on OhioLINK’s Digital Resource Management Committee wiki, which focuses on preparing files for DSpace batch upload).

A batch file just groups a number of Windows commands together in one file and executes them when the batch file is run, as we saw in our previous example. But, instead of writing the specific rename commands one by one using a text editor, a batch file can also be used to generate the commands on the fly.  Save the following code to a file, place it in the same directory with the set of files and then double click to run it.  Caveat: test this with sample files before you use it!  I have tested on a few directories, but not extensively.

First, we use @echo off to stop the batch commands from printing to the command line window.

@echo off

Then, we set EnableDelayedExpansion so that our index counter will work (has to do with evaluating the variable at execution).  This is why when you see i in the loops, it is written !i! instead of %i% used for other variables.

@setlocal enabledelayedexpansion

Next, I set three prompts to ask the user for some information about the renaming. What’s the root name we want to use? What’s the file extension? How many files are there? (Note, this will only work for under 1000 files).  The “/p” flag assigns the response to the prompt to a variable (r, e and n, respectively).  When we reference these variables later, we’ll use the syntax %r% %e% and %n%.

set /p r=Enter name root: 
set /p e=Enter file extension (ie .jpg .tif): 
set /p n=More than one hundred files? (y/n):

Next, we set the index counter, which allows us to add an incrementing index to our filenames.

set /a "i = 1"

If there are fewer than 100 files, we only need one leading zero in the index for our first nine files, and none for the remaining. If there are more than 100, obviously we’ll want a three digit index. So, the following if statement allows us to fork to one of two loops – for two digits or three digits.

if /i "%n%"=="y" (GOTO three) else GOTO two

Our first segment handles three digit indexes for more than 100 files.  %%v is the temporary variable that holds each item as we iterate through the loop one time. *%e% represents a wildcard plus the extension given by the user. So, if the user enters .jpg, we want to select *.jpg, or all files with a .jpg extension. Everything that follows “do” is a command.

:three
for %%v in (*%e%) do (

First, we want to see if, based on the index counter i, we need leading zeros. If i is less than ten, we want two leading zeros. If it’s less than 100, we want one leading zero. This affects the renaming statement that gets applied. All of the rename statements will rename the file currently in %%v to the root name (represented by %r%), followed by a hyphen, the correct number of leading zeros, the index number (represented by !i!) and the file extension (represented by %e%).

if !i! lss 10 (
rename "%%v" "%r%-00!i!%e%"
) else (
if !i! lss 100 (
rename "%%v" "%r%-0!i!%e%"
) else (
rename "%%v" "%r%-!i!%e%"
)
)

Before we exit the loop, we want to increment the index to use with the next file. And, lastly, we need to add a “goto done” statement, so that we don’t execute the “two” segment.

set /a "i = i + 1"
)
goto done

The “two” section is basically the same, except that we only need two digit indexes since there are fewer than 100 files.

:two
for %%v in (*%e%) do (
if !i! lss 10 (
rename "%%v" "%r%-0!i!%e%"
) else (
rename "%%v" "%r%-!i!%e%"
)
set /a "i = i + 1"
)

We end with our “done” label, which marks the exit point.

:done

Here is the code as a whole:


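@echo off
rem Delayed expansion lets the counter i be re-evaluated on every pass through the loops.
@setlocal enabledelayedexpansion
rem Ask for the root name, the extension and whether a three digit index is needed.
set /p r=Enter name root: 
set /p e=Enter file extension (ie .jpg .tif): 
set /p n=More than one hundred files? (y/n): 
set /a "i = 1"
rem Quoting the answer keeps the test valid even if the user just presses Enter.
if /i "%n%"=="y" (GOTO three) else GOTO two
:three
rem Three digit index: 001, 002, and so on up to 999.
for %%v in (*%e%) do ( 
 if !i! lss 10 (
 rename "%%v" "%r%-00!i!%e%"
 ) else (
 if !i! lss 100 (
 rename "%%v" "%r%-0!i!%e%"
 ) else (
 rename "%%v" "%r%-!i!%e%"
 )
 )
 set /a "i = i + 1"
)
goto done
:two
rem Two digit index: 01, 02, and so on up to 99.
for %%v in (*%e%) do (
 if !i! lss 10 (
 rename "%%v" "%r%-0!i!%e%"
 ) else (
 rename "%%v" "%r%-!i!%e%"
 )
 set /a "i = i + 1"
 )
:done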

I saved the file as BatchRename.bat, and then copied it to my test directory. Double click on the .bat file to run it. Answer the prompts and the batch file takes care of the rest.

batch-1
batch-2

The files are renamed and again, the Date Modified field was not changed by this action.
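As an aside, the fork between two and three digit indexes exists because the batch file has to be told how many files there are.  In a language that can count the files first, the padding width can be worked out directly.  Here is a minimal Python sketch of that idea; the function name padded_names is made up for this example, and unlike the batch file it switches to three digits at exactly 100 files.

```python
def padded_names(count, root, extension):
    """Build count new names, padding the index to fit the largest number."""
    width = max(2, len(str(count)))  # at least two digits, as in the batch file
    return [f"{root}-{i:0{width}d}{extension}" for i in range(1, count + 1)]


print(padded_names(3, "libraryPhoto", ".jpg"))
# ['libraryPhoto-01.jpg', 'libraryPhoto-02.jpg', 'libraryPhoto-03.jpg']
print(padded_names(120, "libraryPhoto", ".tif")[0])
# libraryPhoto-001.tif
```

This removes the need for the y/n prompt, and it would also cope with more than 999 files by moving to four digits.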

 

Conclusion

Of the three methods, I slightly prefer the Automator method, because of its simplicity and ability to be re-used: once the application is created it can be used over and over again with different sets of files.  The batch file for Windows is similar in that it can be re-used once created, but does require some knowledge of coding concepts.  With the Notepad++ method, we have simplicity, but you’ll need to step through the file editing with each new set.  I love the Column Editor, however; the Insert Number function is incredibly useful for indexing files in file names without the pesky Windows parentheses.

All of the methods are quick and easy ways to rename a large set of files.  And from personal experience, I will attest that all are preferable to doing it manually.

I’m curious to hear our readers’ thoughts – feel free to leave questions and other recommendations in the Comments section below.

 

 

 


An Elevator Pitch for File Naming Conventions

As a curator and a coder, I know it is essential to use naming conventions.  It is important to employ a consistent approach when naming digital files or software components such as modules or variables. However, when a student assistant asked me recently why it was important not to use spaces in our image file names, I struggled to come up with an answer.  “Because I said so,” while tempting, is not really an acceptable response.  Why, in fact, is this important?  For this blog entry, I set out to answer this question and to see if, along the way, I could develop an “elevator pitch” – a short spiel on the reasoning behind file naming conventions.

The Conventions

As a habit, I implore my assistants and anyone I work with on digital collections to adhere to the following when naming files:

  • Do not use spaces or special characters (other than “-” and “_”)
  • Use descriptive file names.  Descriptive file names include date information and keywords regarding the content of the file, within a reasonable length.
  • Date information is in the following format: YYYY-MM-DD.

So, 2013-01-03-SmithSculptureOSU.jpg would be an appropriate file name, whereas Smith Jan 13.jpg would not.  But, are these modern practices?  Current versions of Windows, for example, will accept a wide variety of special characters and spaces in naming files, so why is it important to restrict the use of these characters in our work?
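For those who like to check things with a script, the conventions above can be expressed as a small Python test.  This is a sketch under my own assumptions: the function name check_file_name is hypothetical, and it assumes the date leads the file name, as it does in the example above, which the conventions themselves do not strictly require.

```python
import re
from datetime import datetime

# Letters, digits, hyphens and underscores, followed by a simple extension.
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")
LEADING_DATE = re.compile(r"^(\d{4}-\d{2}-\d{2})")


def check_file_name(name):
    """Return a list of problems; an empty list means the name passes."""
    problems = []
    if not ALLOWED.match(name):
        problems.append("contains spaces or special characters")
    match = LEADING_DATE.match(name)
    if not match:
        problems.append("does not start with a YYYY-MM-DD date")
    else:
        try:
            datetime.strptime(match.group(1), "%Y-%m-%d")
        except ValueError:
            problems.append("date is not a real calendar date")
    return problems


print(check_file_name("2013-01-03-SmithSculptureOSU.jpg"))  # []
print(check_file_name("Smith Jan 13.jpg"))
```

The first name passes with no problems reported, while the second is flagged both for its spaces and for its missing date.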

The Search Results

A quick Google search finds support for my assertions, though often for very specific cases of file management.  For example, the University of Oregon publishes recommendations on file naming for managing research data.  A similar guide is available from the University of Illinois library, but takes a much more technical, detailed stance on the format of file names for the purposes of the library’s digital content.

The Bentley Historical Library at University of Michigan, however, provides a general guide to digital file management very much in line with my practices: use descriptive directory and file names, avoid special characters and spaces.  In addition, this page discusses avoiding personal names in the directory structure and using consistent conventions to indicate the version of a file.

The Why – Dates

The Bentley page also provides links to a couple of sources which help answer the “why” question.  First, there is the ISO date standard (or, officially, “ISO 8601:2004: Data elements and interchange formats — Information interchange — Representation of dates and times”).  This standard dictates that dates be ordered from largest term to smallest term, so instead of the month-day-year we all wrote on our grade school papers, dates should take the form year-month-day.  Further, since we have passed into a new millennium, a four digit year is necessary.  This provides a consistent format to eliminate confusion, but also allows for file systems to sort the files appropriately.  For example, let’s look at the following three files:

1960-11-01_libraryPhoto.jpg
1977-01-05_libraryPhoto.jpg
2000-05-01_libraryPhoto.jpg

If we expressed those dates in another format, say, month-day-year, they would not be listed in chronological order in a file system sorting alphabetically.  Instead, we would see:

01-05-1977_libraryPhoto.jpg
05-01-2000_libraryPhoto.jpg
11-01-1960_libraryPhoto.jpg

This may not be a problem if you are visually searching through three files, but what if there were 100?  Now, if we only used a two digit year, we would see:

00-05-01_libraryPhoto.jpg
60-11-01_libraryPhoto.jpg
77-01-05_libraryPhoto.jpg

If we did not standardize the number of digits, we might see:

00-5-1_libraryPhoto.jpg
60-11-1_libraryPhoto.jpg
77-1-5_libraryPhoto.jpg

You can try this pretty easily on your own system.  Create three text files with the names above, sort the files by name and check the order.  Imagine the problems this might create for someone trying to quickly locate a file.
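If creating test files feels like too much work, the same comparison can be made with a few lines of Python, which sorts plain text the way a strictly alphabetical file listing would.  This is a minimal sketch; note that some file managers apply their own sorting rules, so results on screen may differ slightly.

```python
iso_names = [
    "2000-05-01_libraryPhoto.jpg",
    "1960-11-01_libraryPhoto.jpg",
    "1977-01-05_libraryPhoto.jpg",
]
mdy_names = [
    "05-01-2000_libraryPhoto.jpg",
    "11-01-1960_libraryPhoto.jpg",
    "01-05-1977_libraryPhoto.jpg",
]

# Year-month-day names fall into chronological order when sorted as text.
print(sorted(iso_names))

# Month-day-year names sort by month first, scrambling the years.
print(sorted(mdy_names))
```

The first list comes out as 1960, 1977, 2000, while the second comes out as 1977, 2000, 1960, matching the listings shown above.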

You might ask, why include the date at all, when dates are also maintained by the operating system?  There are many situations where the operating system dates are unreliable.  In cases where a file moves to a new drive or computer, for example, the Date Created may reflect the date the file moved to the new system, instead of the initial creation date.  Or, consider the case where a user opens a file to view it and the application changes the Date Modified, even though the file content was not modified.  Lastly, consider our earlier example of a photograph from 1960; the Date Created is likely to reflect the date of digitization.  In each of these examples, it would be helpful to include an additional date in the file name.
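One way to see how easily system dates drift is to copy a file two different ways.  The Python sketch below is only an illustration: it uses temporary placeholder files, pretends the original was last modified a year ago, and compares a plain copy (shutil.copy) with a copy that also carries over timestamps (shutil.copy2).

```python
import os
import shutil
import tempfile
import time

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.txt")
    with open(original, "w") as handle:
        handle.write("library photo notes")

    # Pretend the file was last modified roughly a year ago.
    year_ago = time.time() - 365 * 24 * 60 * 60
    os.utime(original, (year_ago, year_ago))

    plain_copy = shutil.copy(original, os.path.join(tmp, "plain.txt"))
    careful_copy = shutil.copy2(original, os.path.join(tmp, "careful.txt"))

    original_mtime = os.path.getmtime(original)
    plain_mtime = os.path.getmtime(plain_copy)
    careful_mtime = os.path.getmtime(careful_copy)

# The plain copy looks freshly modified; the careful copy keeps the old date.
print(plain_mtime - original_mtime > 1000)
print(abs(careful_mtime - original_mtime) < 2)
```

Since we rarely control how a file will be copied or migrated over its lifetime, a date in the file name is the only one guaranteed to travel with it.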

The Why – Descriptive File Names

So far we have digressed down a date-specific path.  What about our other conventions?  Why are those important?  Also linked from the Bentley Library and in the Google search results are YouTube videos created by the State Library of North Carolina which answer some of these questions.  The Inform U series on file naming has four parts, and is intended to help users manage personal files.  However, the rationale described in Part 1 for descriptive file names in personal file management also applies in our libraries.

First, we want to avoid the accidental overwriting of files.  Image files can provide a good example here: many cameras use the file naming convention of IMG_1234.jpg.  If this name is unique to the system, that works ok, but in a situation where multiple cameras or scanners are generating files for a digital collection, there is potential for problems.  It is better to batch re-name image files with a more descriptive name. (Tutorials on this can be found all over the web, such as the first item in this list on using Mac’s Automator program to re-name a batch of photos).

Second, we want to avoid the loss of files due to non-descriptive names.  While many operating systems will search the text content of files, naming files appropriately makes for more efficient access.  For example, consider mynotes.docx and 2012-01-05WebMeetingNotes.docx – which file’s contents are easier to ascertain?

I should note, however, that there are cases where non-descriptive file names are appropriate.  The use of a unique identifier as a filename is sometimes a necessary approach.  However, in those cases where you must use a non-descriptive filename, be sure that the file names are unique and in a descriptive directory structure.  Overall, it is important that others in the organization have the same ability to find and use the files we currently manage, long after we have moved on to another institution.

The Why – Special Characters & Spaces

We have now covered descriptive names and reasons for including dates, which leaves us with spaces and special characters to address.  Part 3 of the Inform U video series addresses this as well.  Special characters can carry special meaning in programming languages and operating systems, and might be misinterpreted when included in file names.  For instance, the $ character designates the beginning of variable names in the PHP programming language and the \ character designates file path locations in the Windows operating system.

Spaces may make things easier for humans to read, but systems generally do better without the spaces.  While operating systems attempt to handle spaces gracefully and generally do so, browsers and software programs are not consistent in how they handle spaces.  For example, consider a file stored in a digital repository system with a space in the file name.  The user downloads the file and their browser truncates the file name after the first space.  This equates to the loss of any descriptive information after the first space.  Plus, the file extension is also removed, which may make it harder for less tech savvy users to use a file.
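To make this concrete, here is a short Python sketch showing what happens to a space when a file name is placed in a web address, alongside one possible way of cleaning a name up front.  The function name make_safe and its rules (spaces become hyphens, anything other than letters, digits, period, hyphen and underscore is dropped) are my own assumptions for illustration, not an established standard.

```python
import re
from urllib.parse import quote


def make_safe(name):
    """Replace runs of whitespace with a hyphen and drop other special characters."""
    name = re.sub(r"\s+", "-", name.strip())
    return re.sub(r"[^A-Za-z0-9._-]", "", name)


risky = "Smith Jan 13.jpg"
print(quote(risky))      # Smith%20Jan%2013.jpg
print(make_safe(risky))  # Smith-Jan-13.jpg
```

The percent-encoded form is what a well-behaved system has to produce to keep the name intact in a URL; software that skips this step is where truncated names tend to come from.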

The Pitch

That example leads us to the heart of the issue: we never know where our files are going to end up, especially files disseminated to the public.  Our content is useless if our users cannot open it due to a poorly formatted file name.  And, in the case of non-public files or our personal archives, it is essential to facilitate the discovery of items in the piles and piles of digital material accumulated every day.

So, do I have my elevator pitch?  I think so.  When asked about file naming standards in the future, I think I can safely reply with the following:  “It is impossible to accurately predict all of the situations in which a file might be used.  Therefore, in the interest of preserving access to digital files, we choose file name components that are least likely to cause a problem in any environment.  File names should provide context and be easily understood by humans and computers, now and in the future.”

And, like a good file name, that is much more effective and descriptive than, “Because I said so.”


Visualizing DSpace Data with Google Fusion Tables & Viewshare

During my time as the Digital Resources Librarian at Kenyon College I had the opportunity to work with The Community Within collection, which explores black history in Knox County, Ohio.  At the beginning of the project, our goal for this collection was simple: to make a rich set of digitized materials publicly available through our DSpace repository, the Digital Resource Commons (DRC).  However, once the collection was published in the DRC, a new set of questions emerged. How do we drive people to the collection? Can we create more interesting interfaces or virtual exhibits for the collection? How do we tie it all together? To answer these questions, we started exploring the digital humanities landscape, looking for low cost tools we could integrate with our existing DSpace collections.  We started to think about the collection and associated metadata as a data set, which contained elements we could use to create a display different than the standard list of items.  We wanted to facilitate the discovery of individual items by displaying them to our users in different visual contexts, such as maps or timelines.

Two tools that emerged from this exploration were Google Fusion Tables, a Google product, and Viewshare, which is provided by the National Digital Information Infrastructure and Preservation Program (NDIIPP) at the Library of Congress.  Google Fusion Tables provides a platform for researchers to upload and share data sets, which can then be displayed in seven different visualization formats (including map, scatter plot and intensity map).  Various examples of the results can be seen in their gallery, which also illustrates the wide range of organizations using the tool, including academic research institutions, news organizations and government agencies.  Viewshare, according to its website, “is a free platform for generating and customizing views (interactive maps, timelines, facets, tag clouds) that allow users to experience your digital collections.”  While it does many of the same things as Google Fusion Tables in allowing users to create visualizations of data sets, it is more specifically geared towards cultural heritage collections.

Both tools are freely available and allow users to import data from a variety of sources.  Because the tools are easy to use, it is possible to get started quickly in manipulating and sharing your data.  Each tool provides a space for the uploaded data and accompanying views, but also allows you to embed this information in other web locations.  In the case of The Community Within, we created an exhibit which links to materials about churches in the collection using an embedded Google Fusion map display.

This blog entry will walk through how to successfully export and manipulate data from DSpace in order to take advantage of these tools, as well as how to embed the resulting interface components back into DSpace or other collection websites.

The How-To – DSpace and Google Fusion

1.  First, start with a DSpace collection.  Our example collection is a photo collection of art on the campus at Ohio State University.  In the screenshot below, we are already logged in as a collection administrator.

A DSpace Collection

2.  We need to export the metadata.  So, click on “Export Metadata” (under Context).  This will download a .csv file.

Save the csv file.

3.  When you open the .csv, you may notice that metadata added to the collection at different times in different ways may show up differently.  We want to fix this before we send this file anywhere.

CSV data, pre-edit

Edited CSV data

4.  Save the file as a .csv file.  If you are given a choice, be sure to select a comma as the separating punctuation.
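If you would rather script steps 3 and 4 than fix the file by hand, here is a minimal Python sketch.  It rests on an assumption about what the inconsistency looks like: the same field split across columns that differ only by a bracketed language suffix, such as dc.title and dc.title[en_US].  Your export may differ, so check the headers first.  The function name clean_export is made up for illustration.  It keeps the first non-empty value it finds for each field and always writes comma-separated output.

```python
import csv
import io
import re

# Matches a trailing bracketed suffix such as [en_US] or [] on a column header.
# Assumption: this is how the inconsistent columns differ in your export.
LANG_SUFFIX = re.compile(r"\[[^\]]*\]$")


def clean_export(text, delimiter=","):
    """Merge language-variant columns and return comma-separated CSV text."""
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    # Base column names in their original order, without duplicates.
    fieldnames = list(
        dict.fromkeys(LANG_SUFFIX.sub("", h) for h in (reader.fieldnames or []))
    )
    rows = []
    for row in reader:
        clean = {}
        for header, value in row.items():
            if header is None:  # cells beyond the header row; skip them
                continue
            base = LANG_SUFFIX.sub("", header)
            value = (value or "").strip()
            # Keep the first non-empty value found for each base column.
            if value and not clean.get(base):
                clean[base] = value
            else:
                clean.setdefault(base, "")
        rows.append(clean)
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

To use it, read the exported file into a string, pass it to clean_export (with delimiter=";" if your spreadsheet editor saved it that way), and write the result to a new .csv file.  Note that when two variant columns both hold a value for the same item, only the first is kept, so review the output before uploading.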

5.  Open Google Fusion Tables.  Go to drive.google.com.  If you do not already use Google Drive (formerly Docs), you will need to log in with a Google account or sign up for one.

6.  Once you are logged in, click on Create > More > Fusion Table (experimental).

Select Create, Other, Fusion Table

7.  On the next screen, we’re going to select “From this computer”, then click on Browse to get to the csv we created above.  Once the file is in the Browse text box, click on Next.

Browse for file

8.  Check that your data looks ok, then click on Next again.  A common problem occurs here when your spreadsheet editor has saved the file with a separator other than a comma.  The fix is easy: just click Back and indicate the correct separator character.

Check your data

9.  On the next screen, describe your table, then click on Finish.

Describe your table, and click Finish

10.  We have a Fusion table.  Now, let’s create our visualization.  Click on Visualize > Map.

Click on Visualize, then Map

Because our collection already contained geocodes in the dc.coverage.spatial column, the map is automatically created.  However, if you would like to use a different column, you can change it by selecting the Location field to the top left of the map.  Google Fusion Tables can also create the map using an address instead of a latitude/longitude pair.  If the map is zoomed far out, zoom in before you get the embed code to make sure the zoom is appropriate on your DSpace page.

We have a map
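A map built from a geocode column is only as good as the values in that column, so it can be worth checking them before you upload.  The Python sketch below assumes each dc.coverage.spatial value is a decimal latitude/longitude pair separated by a comma or a space; that format is an assumption, so adjust the parsing if your values look different.  The function name parse_lat_lng is made up for illustration.

```python
import re


def parse_lat_lng(value):
    """Return (lat, lng) as floats, or None if the value is not a valid pair."""
    # Split on commas and/or whitespace, e.g. "40.0011, -83.0164".
    parts = re.split(r"[,\s]+", value.strip())
    if len(parts) != 2:
        return None
    try:
        lat, lng = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    # Latitude runs from -90 to 90 and longitude from -180 to 180.
    if -90 <= lat <= 90 and -180 <= lng <= 180:
        return lat, lng
    return None
```

Any row for which this returns None is a candidate for a missing or misplaced marker.  Since Fusion Tables can also map addresses, a None here may simply mean the value is an address rather than an error.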

11.  Now, let’s embed our map back in DSpace.  In Google Fusion, click on “Get embeddable link” at the top of the map.  In the dialog which comes up, copy the text in the field “Paste HTML to embed in a website.”  (Note: your table must be shared for this to work.  Google should prompt you to share the table if you try to get an embeddable link for an unshared table.  If not, just click on Share in your Fusion window and make the table public.)

Copy the link text

12.  Now, back in DSpace, click on Edit Collection.  Choose one of the HTML fields (I usually use Introductory Text) and paste the text you copied.

Paste the embed code

13.  Here’s a huge gotcha.  I have pasted the embed code below, first exactly as copied and then with fallback text added.  If you paste the first version as-is and click on Save, the Collection page will disappear because there is nothing between the tags.  We need to add something between the opening and closing <iframe></iframe> tags.  Usually, I use “This browser does not support frames.”

<iframe width="500" height="300" scrolling="no" frameborder="no" src="https://www.google.com/fusiontables/embedviz?viz=MAP&amp;q=select+col4+from+1Fqwl_ugZxBx3vCXLVEfnujSpYJa9F0IICVqHLYw&amp;h=false&amp;lat=40.00118408791957&amp;lng=-83.016412&amp;z=10&amp;t=1&amp;l=col4"></iframe>

<iframe width="500" height="300" scrolling="no" frameborder="no" src="https://www.google.com/fusiontables/embedviz?viz=MAP&amp;q=select+col4+from+1Fqwl_ugZxBx3vCXLVEfnujSpYJa9F0IICVqHLYw&amp;h=false&amp;lat=40.00118408791957&amp;lng=-83.016412&amp;z=10&amp;t=1&amp;l=col4">This browser does not support frames.</iframe>
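If you embed visualizations often, the fallback text is easy to forget.  Here is a small Python sketch that adds it to any embed code whose iframe is empty.  The function name add_iframe_fallback is made up for illustration, and the regular expression is only meant for simple embed snippets like the one above, not for arbitrary HTML.

```python
import re

# An opening <iframe ...> tag followed (apart from whitespace) by its closing tag.
EMPTY_IFRAME = re.compile(r"(<iframe\b[^>]*>)\s*(</iframe>)", re.IGNORECASE)


def add_iframe_fallback(embed_code, fallback="This browser does not support frames."):
    """Insert fallback text into empty iframes; leave non-empty ones unchanged."""
    return EMPTY_IFRAME.sub(
        lambda m: m.group(1) + fallback + m.group(2), embed_code
    )
```

Run the copied embed code through this function before pasting it into the DSpace HTML field.  An iframe that already has text between its tags is left as it is.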

14.  Now, click on Save.  This will take you back to your collection homepage, which now has a map.

Embedded Map

15.  One last thing – that info window in the map is not really user-friendly.  Let’s go back to Google Fusion and fix it.  Just click on “Configure info window” above the Fusion map.  It will bring up a dialog which allows you to choose which fields you want to show, as well as modify the markup so that, for example, links display as links.

Modify the info window

16.  There is no need to re-embed; just head back to your DSpace page and click refresh.

Final embedded map

Done!  You can play with the settings at various points along the way to make the map smaller or larger.

The How-To – DSpace and Viewshare

We can complete the same process using Viewshare.  If you skipped to this section, go back and read steps 1-4 above.

Back?  Ok.  So we should have a .csv of exported metadata from our DSpace collection.

1.  Log in to Viewshare.  You will have to request an account if you don’t have one.

2.  From the homepage, click on Upload Data.

Click on Upload Data

3.  There are a multitude of source options, but we’re going to use the .csv we created above, so we select “From a file on your computer.”

Select "from a file"

4.  Browse for the file, then click on Upload.

5.  In the Preview Window, you can edit the field names to more user-friendly alternatives.  You can also click the check box under Enabled to include or exclude certain fields, and select field types so that data is formatted correctly (as in links) and can be used for visualizations (as in dates or locations).

Edit the data

6.  When you have finished editing, click on Save.  You will now see the dataset in your list of Data.  Click on Build to build a visualization.

Select Build

7.  You can pick any layout, but I usually pick One Column for simplicity’s sake.

Select a layout

8.  The view will default to List, but really, we already have a list.  Let’s click on the Add a View tab to create another visualization.  For this example, we’re going to select Timeline.

Select a Timeline View

9. There are a variety of settings for this visualization.  Select the field which contains the date (in our case, we just have one date, so we leave End Date blank), decide how you want to color the timeline and what unit you want to use.  Timeline lens lets you decide what is included in the pop-up.  Click on Save (top right) when you are finished selecting values.

Select options for View
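A timeline depends on the date field being consistent across items.  As a rough check before uploading, the Python sketch below tests whether a value has one of the ISO 8601 shapes that DSpace date fields typically use (YYYY, YYYY-MM or YYYY-MM-DD).  That these are the shapes in your export, and that they suit the timeline, are assumptions; the function name is_iso_date is made up for illustration.

```python
import re
from datetime import date

# YYYY, YYYY-MM or YYYY-MM-DD.
ISO_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def is_iso_date(value):
    """Return True if value is a real calendar date in one of the three shapes."""
    match = ISO_DATE.match(value.strip())
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day defaults to 1 just for the validity check.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True
```

Values that fail the check, such as “May 1995” or a February 30th, are worth fixing in the .csv before you build the view.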

10.  We have created a timeline.  Now we need to embed it back in DSpace. Click on Embed in the top menu.

Now we have a timeline

11.  Copy the embed code.

Copy the embed code

12.  Again, back in DSpace, we will click on Edit Collection and paste the embed code into one of the HTML fields.  And, again, it is essential that there is some text between the tags.

Paste the embed code

Now we have an embedded timeline!

An embedded timeline

Depending on the space available on your DSpace homepage, you may want to adjust the top and bottom time bands so that the timeline displays more cleanly.

Of course, there are a few caveats.  For example, this approach works best with collections that are complete.  If items are still being added to the collection, the collection manager will need to build in a workflow to refresh the visualization from time to time.  This is done by re-exporting, re-uploading, and re-embedding.  Also, Google Fusion Tables is officially an “experimental” product.  It is important to keep your data elsewhere as well, and to be aware that your Fusion visualizations may not be permanent.
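One way to decide when a refresh is due is to compare a fresh export against the one you last uploaded.  The Python sketch below assumes the exported .csv has an id column identifying each item (check your own export; the column name is an assumption) and reports which items were added or removed.  It does not detect edits to the metadata of existing items.  The function names are made up for illustration.

```python
import csv
import io


def item_ids(csv_text, id_column="id"):
    """Collect the non-empty values of the id column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return {row[id_column] for row in reader if row.get(id_column)}


def export_changes(old_csv, new_csv, id_column="id"):
    """Return (added, removed) item ids between two exports, each sorted."""
    old = item_ids(old_csv, id_column)
    new = item_ids(new_csv, id_column)
    return sorted(new - old), sorted(old - new)
```

If both lists come back empty, no items have been added or removed since the last upload, although existing items may still have been edited.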

However, this solution provides an easy, code-free way to improve the user interface to a collection.  Similar approaches may also work using platforms not described here. For example, here’s a piece on using Viewshare with Omeka, another open source collection management system.  The goal is to let each tool do what it does best, then make the results play nicely together.  This is a free and relatively painless way to achieve that goal.

About our Guest Author: Meghan Frazer is the Digital Resources Curator for the Knowlton School of Architecture at The Ohio State University.  She manages the school archives as well as the KSA Digital Library, and spends lots of time wrangling Drupal for the digital library site. Her professional interests include digital content preservation and data visualization.  Before attending library school, Meghan worked in software quality assurance and training and has a bachelor’s degree in Computer Science.  You can send tweets in her direction using @meghanfrazer.