Chapter 2 Setting Up and Managing a Bioinformatics Project
Organizing Data to Automate File Processing Tasks
1. Shell Expansion Tips
$ echo dog-{gone,bowl,bark}
dog-gone dog-bowl dog-bark
$ mkdir -p zmays-snps/{data/seqs,scripts,analysis}
# Creates several subdirectories under zmays-snps at once
$ cd zmays-snps/data
$ touch seqs/zmays{A,B,C}_R{1,2}.fastq
$ ls seqs/
zmaysA_R1.fastq zmaysB_R1.fastq zmaysC_R1.fastq
zmaysA_R2.fastq zmaysB_R2.fastq zmaysC_R2.fastq
$ ls seqs/zmaysB*
zmaysB_R1.fastq zmaysB_R2.fastq
OS X and Linux systems limit the number of arguments that can be supplied to a command (more precisely, the limit is on the total length of the arguments), so a wildcard that matches very many files can fail with an "Argument list too long" error.
See “Using find and xargs” on page 411 for the solution.
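A minimal sketch of the find and xargs approach, assuming a throwaway directory made with mktemp and the same zmays file names as above (the variable names tmpdir and n_files are hypothetical). find prints the matching paths itself, so the shell never expands a long wildcard, and xargs passes the paths to the command in batches that fit within the argument limit:

```shell
# Work in a temporary directory so the sketch does not touch real data.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/seqs"

# Create six empty FASTQ files (plain loops, so this also runs in POSIX sh).
for sample in A B C; do
  for pair in 1 2; do
    touch "$tmpdir/seqs/zmays${sample}_R${pair}.fastq"
  done
done

# find emits NUL-separated paths; xargs -0 reads them and runs ls in batches.
n_files=$(find "$tmpdir/seqs" -name "*.fastq" -print0 | xargs -0 ls | wc -l | tr -d ' ')
echo "$n_files"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

Here ls stands in for whatever command needs to process the files. The -print0 and -0 options are not part of POSIX but are available in both the GNU and BSD versions of these tools, and they keep file names containing spaces intact.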
$ ls zmays[AB]_R1.fastq
zmaysA_R1.fastq zmaysB_R1.fastq
$ ls zmays[A-B]_R1.fastq
zmaysA_R1.fastq zmaysB_R1.fastq
2. Leading Zeros and Sorting
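A small sketch of why leading zeros matter, using hypothetical file names (gene-1.txt and so on) that are only printed, never created. Plain sort compares names character by character, so without padding 10 sorts before 2; zero-padding the number makes the character order agree with the numeric order. LC_ALL=C is set here only to make the ordering independent of the local locale settings:

```shell
# Unpadded names: character-by-character sorting puts gene-10 before gene-2.
unpadded=$(printf '%s\n' gene-1.txt gene-2.txt gene-10.txt | LC_ALL=C sort | paste -s -d ' ' -)
echo "$unpadded"

# Zero-padded names: printf %03d pads each number to three digits,
# so the same sort now returns the files in numeric order.
padded=$(for i in 1 2 10; do printf 'gene-%03d.txt\n' "$i"; done | LC_ALL=C sort | paste -s -d ' ' -)
echo "$padded"
```

The first echo prints gene-1.txt gene-10.txt gene-2.txt and the second prints gene-001.txt gene-002.txt gene-010.txt. The padding width should be chosen with the largest expected number in mind.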
3. Markdown for Project Notebooks, Formatting Basics
e.g.
# *Zea mays* SNP Calling
We sequenced three lines of *Zea mays*, using paired-end
sequencing. This sequencing was done by our sequencing core and we
received the data on 2013-05-10. Each variety should have **two**
sequence files, with suffixes `_R1.fastq` and `_R2.fastq`, indicating
which member of the pair it is.
## Sequencing Files
All raw FASTQ sequences are in `data/seqs/`:

    $ find data/seqs -name "*.fastq"
    data/seqs/zmaysA_R1.fastq
    data/seqs/zmaysA_R2.fastq
    data/seqs/zmaysB_R1.fastq
    data/seqs/zmaysB_R2.fastq
    data/seqs/zmaysC_R1.fastq
    data/seqs/zmaysC_R2.fastq
## Quality Control Steps
After the sequencing data was received, our first stage of analysis
was to ensure the sequences were high quality. We ran each of the
three lines' two paired-end FASTQ files through a quality diagnostic
and control pipeline. Our planned pipeline is:
1. Create base quality diagnostic graphs.
2. Check reads for adapter sequences.
3. Trim adapter sequences.
4. Trim poor quality bases.
Recommended trimming programs:
- Trimmomatic
- Scythe
- Using Pandoc to Render Markdown to HTML
Using Pandoc is very simple: to convert from Markdown to HTML, use the --from markdown and --to html options and supply your input file as the last argument:
$ pandoc --from markdown --to html notebook.md > output.html