This book assumes you’re familiar with basic topics such as what a terminal is, what the shell is, the Unix filesystem hierarchy, moving about directories, file permissions, executing commands, and working with a text editor. In this chapter, we’ll cover remedial concepts that underlie how we use the shell in bioinformatics: streams, redirection, pipes, working with running programs, and command substitution.
Why Do We Use Unix in Bioinformatics? Modularity and the Unix Philosophy
This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.
—Doug McIlroy
The Unix shell provides a way for these programs to talk to each other (pipes) and to write to and read from files (redirection). Unix’s core programs (which we’ll use to analyze data on the command line in Chapter 7) are modular and designed to work well with other programs.
Advantages in Bioinformatics
1. Modular workflows make it easier to spot errors and figure out where they occur.
2. Modular workflows let us experiment with alternative methods and approaches.
3. Modular components allow us to choose tools and languages that are appropriate for specific tasks.
4. Modular programs are reusable and applicable to many types of data.
Both general Unix tools and many bioinformatics programs are designed to take input through a stream and pass output through a different stream. It’s these text streams that allow us to both couple programs together into workflows and process data without storing huge amounts of data in our computers’ memory.
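As a minimal sketch of this idea, the example below connects two programs with a pipe so that data flows from one to the other as a text stream, with no intermediate file. The helper name count_records and the two made-up sequence records are assumptions for illustration only; they are not from any real dataset or tool.

```shell
# count_records is a hypothetical helper: it reads FASTA-formatted text from
# standard input and writes the number of header lines (lines starting
# with ">") to standard output.
count_records() {
    grep -c '^>'
}

# printf writes two made-up records to its standard output; the pipe (|)
# connects that stream to the standard input of count_records.
printf '>seq1\nACGT\n>seq2\nGGCC\n' | count_records
```

Because count_records only reads from standard input, the same function works unchanged whether the stream comes from printf, from a file via redirection, or from another program in a longer pipeline.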
The Many Unix Shells
Make sure you’re using the Bourne-again shell, or bash. Run echo $SHELL to verify you’re using bash as your shell (although it’s best to also check what echo $0 says, because $SHELL names your login shell while $0 names the shell that is actually running). The Bourne shell (sh) was the predecessor of the Bourne-again shell (bash); bash is newer and usually preferred. If you feel confident with general shell basics, you may want to try Z shell.
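One way to put these checks together is sketched below. It relies on the BASH_VERSION variable, which bash sets for itself and other shells normally leave unset; the function name current_shell_is_bash and the variable shell_kind are hypothetical names chosen for this example.

```shell
# Print the login shell recorded for your account and the name of the
# shell (or script) that is currently running.
echo "login shell: ${SHELL:-unknown}"
echo "running as:  $0"

# current_shell_is_bash is a hypothetical helper: it succeeds only when
# BASH_VERSION is set, which is the case inside bash.
current_shell_is_bash() {
    [ -n "${BASH_VERSION:-}" ]
}

if current_shell_is_bash; then
    shell_kind="bash"
else
    shell_kind="not bash"
fi
echo "this shell is: $shell_kind"
```

If the last line reports "not bash", typing bash at the prompt starts a bash session inside your current shell.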
Gary Bernhardt: Unix is like a chainsaw. Chainsaws are powerful tools, and make many difficult tasks like cutting through thick logs quite easy. Unfortunately, this power comes with danger: chainsaws can cut just as easily through your leg (well, technically more easily).
$ rm -rf tmp-data/aligned-reads* # deletes all old large files
$ # versus
$ rm -rf tmp-data/aligned-reads * # deletes your entire current directory
rm: tmp-data/aligned-reads: No such file or directory
In Unix, a single space could mean the difference between cleaning out some old files and finishing your project behind schedule because you’ve accidentally deleted everything.
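A common habit that guards against this kind of mistake is to preview what a pattern expands to before handing it to rm. The sketch below does this inside a throwaway directory created with mktemp, so nothing real is at risk; the file names (aligned-reads-1.sam, aligned-reads-2.sam, notes.txt) are made up for the example.

```shell
# Create a scratch directory with two "large" files and one file to keep.
workdir=$(mktemp -d)
mkdir "$workdir/tmp-data"
touch "$workdir/tmp-data/aligned-reads-1.sam" \
      "$workdir/tmp-data/aligned-reads-2.sam" \
      "$workdir/notes.txt"

# Step 1: put echo in front of the command. The shell still expands the
# wildcard, but echo only prints the resulting command; nothing is deleted.
echo rm -rf "$workdir"/tmp-data/aligned-reads*

# Step 2: once the printed command lists only what you expect, run it.
rm -rf "$workdir"/tmp-data/aligned-reads*

# The scratch directory still contains notes.txt and the now-empty tmp-data.
ls "$workdir"
```

Had a stray space slipped in before the asterisk, the echo step would have printed every file in the current directory, making the mistake visible before any damage was done.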
Unix was not designed to stop its users from doing stupid things, as that would also stop them from doing clev