Bioinformatics Data Skills (O'Reilly) Study Notes, Part 2

weixin_42953727

Category: bioinformatics. Tags: Bioinformatics

Article link: https://blog.csdn.net/weixin_42953727/article/details/100045149

Chapter 2: Setting Up and Managing a Bioinformatics Project

Organizing Data to Automate File Processing Tasks

1. Shell Expansion Tips

$ echo dog-{gone,bowl,bark}
dog-gone dog-bowl dog-bark

$ mkdir -p zmays-snps/{data/seqs,scripts,analysis}
# Create several subdirectories under zmays-snps in a single command

$ cd zmays-snps/data
$ touch seqs/zmays{A,B,C}_R{1,2}.fastq
$ ls seqs/
zmaysA_R1.fastq zmaysB_R1.fastq zmaysC_R1.fastq
zmaysA_R2.fastq zmaysB_R2.fastq zmaysC_R2.fastq

$ ls seqs/zmaysB*
seqs/zmaysB_R1.fastq seqs/zmaysB_R2.fastq

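OS X and Linux limit the total length of the arguments that can be passed to a command (often described, less precisely, as a limit on the number of arguments). A wildcard that matches a very large number of files can exceed this limit and fail with an "Argument list too long" error.
See “Using find and xargs” on page 411 of the book for the solution.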
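
As a rough illustration of the find and xargs approach mentioned above (not the book's exact example), the sketch below lets find produce the file names instead of a shell wildcard, and lets xargs hand them to a command in batches that fit within the limit. The scratch directory and the `n_files` variable are made up for this illustration.

```shell
# Sketch only: a throwaway directory with the six example FASTQ files.
workdir=$(mktemp -d)
mkdir -p "$workdir/seqs"
for sample in A B C; do
    for read in 1 2; do
        touch "$workdir/seqs/zmays${sample}_R${read}.fastq"
    done
done

# The pattern is quoted, so find (not the shell) matches the files and the
# shell never builds one huge argument list. xargs splits the paths into
# batches that fit within the system limit. -print0 and -0 separate paths
# with NUL bytes so file names containing spaces stay intact.
n_files=$(find "$workdir/seqs" -name "*.fastq" -print0 | xargs -0 ls | wc -l | tr -d ' ')
echo "files processed: $n_files"
```

With only six files the limit is never reached, but the same pipeline keeps working when the wildcard would otherwise match many thousands of files.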

$ cd seqs
$ ls zmays[AB]_R1.fastq
zmaysA_R1.fastq zmaysB_R1.fastq
$ ls zmays[A-B]_R1.fastq
zmaysA_R1.fastq zmaysB_R1.fastq
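
Consistent file names are what make this kind of automation possible. The following is a minimal sketch, assuming the `_R1.fastq` / `_R2.fastq` naming used above, of looping over the R1 files with a wildcard and deriving each mate's name. The scratch directory and the `pairs` variable are hypothetical and exist only for this illustration.

```shell
# Sketch only: recreate the six example files in a throwaway directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/seqs"
for sample in A B C; do
    for read in 1 2; do
        touch "$workdir/seqs/zmays${sample}_R${read}.fastq"
    done
done

pairs=""
# The character range [A-C] matches the R1 file of each of the three samples.
for r1 in "$workdir"/seqs/zmays[A-C]_R1.fastq; do
    # Strip the _R1.fastq suffix and append _R2.fastq to get the mate's path.
    r2="${r1%_R1.fastq}_R2.fastq"
    sample=$(basename "$r1" _R1.fastq)
    if [ -e "$r2" ]; then
        pairs="$pairs$sample "
        echo "$sample: $(basename "$r1") + $(basename "$r2")"
    else
        echo "missing mate for $r1" >&2
    fi
done
```

In a real project the `echo` line would be replaced by whatever command processes the pair.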

2. Leading Zeros and Sorting
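
The notes give no example for this heading. The general idea is that file names are sorted character by character, so numbering files with leading zeros keeps them in numeric order. A small sketch of the effect, with file names invented for this illustration:

```shell
# Sketch only: create files numbered 1, 2, 10, with and without zero padding.
workdir=$(mktemp -d)
for i in 1 2 10; do
    touch "$workdir/plain-$i.txt"
    touch "$workdir/$(printf 'padded-%03d.txt' "$i")"
done

# LC_ALL=C forces plain byte-order sorting so the result does not depend on
# the local collation settings.
plain_order=$(cd "$workdir" && LC_ALL=C ls | grep '^plain-' | tr '\n' ' ')
padded_order=$(cd "$workdir" && LC_ALL=C ls | grep '^padded-' | tr '\n' ' ')

echo "without padding: $plain_order"   # plain-10.txt is listed before plain-2.txt
echo "with padding:    $padded_order"  # 001, 002, 010 in numeric order
```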
3. Markdown for Project Notebooks, Formatting Basics
An example project notebook written in Markdown:

# *Zea Mays* SNP Calling

We sequenced three lines of *zea mays*, using paired-end
sequencing. This sequencing was done by our sequencing core and we
received the data on 2013-05-10. Each variety should have **two**
sequences files, with suffixes `_R1.fastq` and `_R2.fastq`, indicating
which member of the pair it is.

## Sequencing Files

All raw FASTQ sequences are in `data/seqs/`:

    $ find data/seqs -name "*.fastq"
    data/seqs/zmaysA_R1.fastq
    data/seqs/zmaysA_R2.fastq
    data/seqs/zmaysB_R1.fastq
    data/seqs/zmaysB_R2.fastq
    data/seqs/zmaysC_R1.fastq
    data/seqs/zmaysC_R2.fastq

## Quality Control Steps

After the sequencing data was received, our first stage of analysis
was to ensure the sequences were high quality. We ran each of the
three lines' two paired-end FASTQ files through a quality diagnostic
and control pipeline. Our planned pipeline is:

1. Create base quality diagnostic graphs.
2. Check reads for adapter sequences.
3. Trim adapter sequences.
4. Trim poor quality bases.

Recommended trimming programs:

- Trimmomatic
- Scythe

Using Pandoc to Render Markdown to HTML
Using Pandoc is very simple: to convert from Markdown to HTML, use the --from markdown and --to html options and supply your input file as the last argument:

$ pandoc --from markdown --to html notebook.md > output.html
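
A slightly fuller variant of the command above, offered as a sketch rather than the book's method: it writes the result with pandoc's `--output` option instead of shell redirection and adds `--standalone` so the HTML is a complete page. The tiny notebook file and the scratch directory are made up for this illustration, and the render step is skipped when pandoc is not installed.

```shell
# Sketch only: a throwaway notebook so the example is self-contained.
workdir=$(mktemp -d)
printf '# Zea Mays SNP Calling\n\nAll raw FASTQ sequences are in `data/seqs/`.\n' > "$workdir/notebook.md"

if command -v pandoc >/dev/null 2>&1; then
    # --standalone produces a full HTML document rather than a fragment;
    # --output names the destination file.
    pandoc --from markdown --to html --standalone \
        --output "$workdir/notebook.html" "$workdir/notebook.md"
    echo "rendered $workdir/notebook.html"
else
    echo "pandoc not found; skipping render" >&2
fi
```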
