Makefiles to Control Bioinformatic Pipelines

I have a big sprawling project that until recently consisted of a folder of jumbled scripts. Previous Me knew exactly how everything worked and what depended on what, Current Me had a tenuous idea of what was going on, and there was little hope that Future Me would be able to make sense of it. Code you wrote six months ago might as well have been written by a different person, which I think is a famous computing quote, but I fittingly can’t remember who said it.

I looked around for an easy way to turn my bag of interdependent scripts into a publishable pipeline, one which I could be proud to claim as my own work. After a few hours’ research, some early contenders were Bio-Linux and CloudBioLinux, but I was scared off by their size. I wanted something that most anybody could run with minimal effort either on an HPC cluster or a desktop. Plus I was feeling lazy and wanted to take the path of least resistance.

I finally settled on using a Makefile as my pipeline, an approach I first encountered applied to bioinformatic pipelines at this archived blog post, whose author explains the core benefit clearly:

“Whenever a script changes, all data files that it produces are redone. That is all very obvious for anyone with a little experience with makefiles, it simply didn’t occur to use the whole machinery for my pipelines.”

One major benefit is the ability to pick up where you left off just by running make again after, say, your computer shuts down during a hurricane. This post echoed that idea:

“Plus, […] if the pipeline needs to be re-run for any reason (whether it prematurely aborted or some of the input data or parameters were modified), Make will only run the commands it needs to.”

For a really thorough overview of the use of Makefiles in bioinformatics and some introductory examples, see this post at Bioinformatics Zen. It summarizes the improvements afforded by the use of Makefiles in areas of reproducibility, programming language independence, analysis step abstraction, and simple parallelization, and is definitely worth a read before you jump into the technique.
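
To make that concrete, here’s a minimal sketch of what a Make-driven pipeline might look like. All of the filenames, scripts, and steps below are hypothetical stand-ins for your own:

# Default target: the final output of the pipeline.
all: results/summary.txt

# Trim raw reads. Each output lists both its input data and the
# script that produces it, so editing trim.sh reruns this step.
# (Note: recipe lines must be indented with a tab, not spaces.)
data/trimmed.fastq: data/raw.fastq scripts/trim.sh
	bash scripts/trim.sh data/raw.fastq > data/trimmed.fastq

# Align the trimmed reads to a reference genome.
data/aligned.sam: data/trimmed.fastq reference/genome.fa scripts/align.sh
	bash scripts/align.sh reference/genome.fa data/trimmed.fastq > data/aligned.sam

# Summarize the alignment into the final result.
results/summary.txt: data/aligned.sam scripts/summarize.sh
	bash scripts/summarize.sh data/aligned.sam > results/summary.txt

Running make builds everything; rerunning it after a crash, or after editing one script, redoes only the out-of-date steps; and make -j4 runs independent steps in parallel.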

Finally, when you want to give Makefiles a go in your own bioinformatic pipeline, read through the excellent tutorial over at Software Carpentry.

Hopefully these links will serve to point programmers in a similar predicament in a fruitful direction. I’m just trying the tool out, but I’m already a convert. When the pipeline is finished, I’ll try to write up a more detailed summary of the use of Makefiles as bioinformatic pipelines.

Password Protecting a File on a Website

I just wrapped up a web coding side project for a friend that needed a password-protected page. Here’s a quick tutorial on easily password-protecting a folder or file on your website using .htaccess and .htpasswd, mainly for the next time I have to do this after I’ve forgotten how.

Step 1: Create an .htpasswd file

First you need to create a file named .htpasswd that contains a single line with a username and password on it, separated by a colon. The password has to be hashed, and luckily there are web tools to generate these lines for you. I used this one. It’ll generate a line that looks like this:

username:HHfJtS7V7Esf6

Copy and paste that into your .htpasswd file, and it’ll be good to go.
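
If you’d rather not send a password through a web form, you can also generate the same line locally. This is a sketch assuming you have Apache’s htpasswd utility or OpenSSL installed; username and yourpassword are placeholders:

# Print a username:hash line to stdout (no file is written)
htpasswd -nb username yourpassword

# Or produce an equivalent Apache-MD5 hash with OpenSSL
printf 'username:%s\n' "$(openssl passwd -apr1 yourpassword)"

Either command prints a line in the same username:hash format, ready to paste into .htpasswd.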

Step 2: Upload the .htpasswd file

Stick that .htpasswd file somewhere safe on your webserver, meaning it’s best kept out of publicly accessible folders like htdocs. I put mine at root.

Step 3: Create an .htaccess file

Next up, create another text file, called .htaccess. Into this, paste the following if you want to protect a folder:

AuthUserFile /path/to/.htpasswd
AuthType Basic
AuthName "Protected Folder"
Require valid-user

Or this if you want to protect a file:

AuthUserFile /path/to/.htpasswd
AuthType Basic
AuthName "Protected File"

<Files "secretpage.html"$gt;
  Require valid-user
$lt;/Files>

Now you have to change one or two bits:

  • Change /path/to/.htpasswd to the real full path to the .htpasswd file you uploaded earlier. See below if you don’t know the full path.
  • If you’re protecting a file, change secretpage.html to the name of the file you want to protect. (Not the full path; you just need the filename.)

Now there’s a tricky part to these changes, one that threw me for a loop for a little while and gave me a 500 Internal Server Error. The path to .htpasswd has to be the full path to the file, starting from the server’s real root. An easy way to find out the full path of your .htpasswd is to create a PHP test script that just calls phpinfo(). Putting this PHP file on your server and pointing your web browser at it will show a bunch of information, including a value called SCRIPT_FILENAME that gives the full path to the test script itself. From that you should be able to tell the full path to your files on the server, and then deduce the path to .htpasswd.
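
The whole test script is just the following (the filename, say info.php, is up to you; delete the file when you’re done, since it exposes details about your server):

<?php
// Temporary test script: prints PHP's configuration, including
// SCRIPT_FILENAME, the full filesystem path to this file.
phpinfo();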

Step 4: Upload your .htaccess file

You’re about done. Stick the .htaccess file you just created in the folder you want to protect, or in the same folder as the file you want to protect. If all is well, trying to access the protected file or folder will prompt you for a username and password. The browser should remember the credentials until it’s closed, which is important to keep in mind when testing.
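
If the browser’s password caching gets in the way of testing, curl can send the credentials directly; a sketch, with a placeholder URL and credentials:

curl -u username:yourpassword http://example.com/secret/secretpage.html

A wrong username or password should get you the server’s 401 Unauthorized error page instead of the protected page.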

New Lab Website Launched

We recently redesigned our lab’s website using Weebly, which I highly recommend for academics looking to create a group or personal home page.

I wrote before about how I exported our lists of publications to HTML for insertion on the site. Here’s how I redirected all the pages on the old site using an .htaccess 301 redirect. For background info, see this tutorial. On the old site, in the directory containing the pages I wanted redirected, I placed a file named .htaccess that contained the following:

RedirectMatch 301 ^/.*$ http://nyu-anthro-lab.weebly.com

It tells search engines and the browser that any URL matching ^/.*$ (the_current_directory/whatever) should redirect to http://nyu-anthro-lab.weebly.com. Super simple, and good for SEO!
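
If you’d rather send visitors to the corresponding page on the new site instead of its front page, a capture group carries the matched path across. A sketch, assuming the new site mirrors the old site’s paths:

RedirectMatch 301 ^/(.*)$ http://nyu-anthro-lab.weebly.com/$1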