Stuff I Routinely Forget How to Do

A surprisingly important hunk of my bioinformatic skill set consists of Googling error messages and implementing whatever fix the posters on, say, Stack Overflow suggest. I’ve collected the wisdom gleaned from such forums, man pages, and other programmers into this (newly relocated and expanded) document on GitHub: Stuff I Routinely Forget How to Do. I’m sure I’ll refer to it often, and hopefully someone else will find it useful too.

(Elephant photo by Flickr user guido da rozze.)

/bin/true

I like coming across the odd history of Unix commands that I use on a daily basis. Here’s one I didn’t know: the true utility, which does nothing but exit with status 0 and is often used in infinite loops, was originally just an empty shell script. With no code in it, it could not contain a bug, and since an empty script exits successfully, it quite simply performed its duty. I found out about it from this post:
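For anyone who hasn’t seen the idiom, here is a minimal sketch of true driving an "infinite" loop. The counter and the cutoff of 3 are made up for illustration; the only point is that the loop condition always succeeds, so the loop runs until something inside it calls break.

```shell
#!/bin/sh
# `true` always exits with status 0, so `while true` never stops on its own.
count=0   # illustrative counter, not from the original post
while true; do
    count=$((count + 1))
    # Leave the loop explicitly once some condition is met.
    if [ "$count" -ge 3 ]; then
        break
    fi
done
echo "looped $count times"

# The exit status is the whole job of the command.
true
status=$?
echo "exit status of true: $status"
```

In real scripts the body is usually a poll-and-sleep or a read loop rather than a counter.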

Ron Spencer’s answer to “What is the shortest and most effective code ever written?” on Quora

A longer discussion of its history (and the bizarre addition of copyright statements) can be found here.

Process Substitution

The Molecular Ecologist blog pointed me to Vince Buffalo’s post on named pipes and process substitution, which should be in the arsenal of any Unix hacker. Both allow you to avoid creating temporary files with advanced piping techniques. Here’s one quick example:

$ diff <(ls /scratch/secret_project/) <(ls /archive/secret_project/)
93a94
> tajima_d.txt

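This lists the files in the folders named “secret_project” in the scratch and archive directories and compares the outputs of the two ls commands with diff. The 93a94 line means that line 94 of the second listing would have to be added after line 93 of the first to make them match. In other words, the file tajima_d.txt is found only in /archive, suggesting it’s been deleted from /scratch.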
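The same comparison can be done with named pipes, the other technique from Vince’s post. This is only a sketch under a few assumptions: the directories are throwaway stand-ins created with mktemp rather than the real /scratch and /archive, and the file a.txt and the pipe names left and right are invented for the demo.

```shell
#!/bin/sh
# Build two throwaway directories standing in for /scratch and /archive.
workdir=$(mktemp -d)
mkdir "$workdir/scratch" "$workdir/archive"
touch "$workdir/scratch/a.txt" \
      "$workdir/archive/a.txt" "$workdir/archive/tajima_d.txt"

# Create two named pipes and feed each one from a background ls.
# Each writer blocks until diff opens its pipe for reading.
mkfifo "$workdir/left" "$workdir/right"
ls "$workdir/scratch" > "$workdir/left" &
ls "$workdir/archive" > "$workdir/right" &

# diff reads the pipes as if they were ordinary files.
# It exits with status 1 when the inputs differ, hence the `|| true`.
result=$(diff "$workdir/left" "$workdir/right" || true)
echo "$result"

# Reap the background writers and clean up.
wait
rm -r "$workdir"
```

Process substitution does roughly this bookkeeping for you, which is why the one-liner is so much nicer to type.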

Vince has another example, which I’m stealing because it’s so beautiful:

program --in1 <(makein raw1.txt) --in2 <(makein raw2.txt) \
   --out1 >(gzip > out.txt.gz) --out2 >(gzip > out2.txt.gz) \
   > stats.txt 2> diagnostics.stderr

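The two inputs are produced on the fly by the “makein” program through <(), the same way the ls listings were fed to diff above. The two output streams are piped into gzip by flipping the angle bracket from <() to >(), while ordinary redirection sends standard output to stats.txt and standard error to diagnostics.stderr.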

Like many advanced Unix techniques, it’s definitely powerful when wielded precisely, but the debugging headache and ever-present potential for ruination can be summed up by his chainsaw metaphor:

I’ve used a chainsaw, and you’re simultaneously amazed at (1) how easily it slices through a tree, and (2) that you’re dumb enough to use this thing three feet away from your vital organs. This is Unix.