Bioinformatics Data Skills (O'Reilly) Study Notes 7-2


weixin_42953727


Category: bioinformatics. Tags: Bioinformatics

Article link: https://blog.csdn.net/weixin_42953727/article/details/100126158


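These are study notes on O'Reilly's "Bioinformatics Data Skills", focusing on how Unix tools such as grep, hexdump, sort, uniq, and join are used in bioinformatics data processing. grep searches text, hexdump shows the raw bytes of a file so you can check its encoding, sort orders data, uniq removes adjacent duplicate lines, and join merges files on a shared column. Together these tools cover a large share of everyday text-data processing.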


Continued from the previous post on Chapter 7.

The All-Powerful Grep

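grep "pattern" files
Add --color=auto to highlight the matched text in terminal output.
grep matches the pattern anywhere in a line, so a pattern also matches when it is only part of a longer word. Use **-w** for exact matching (constraining our matches to be words). By default grep prints every matching line; -v inverts this and prints the lines that do not match.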

$ cat example.txt
bio
bioinfo
bioinformatics
computational biology
$ grep -v bioinfo example.txt
bio
computational biology
$ grep -v -w bioinfo example.txt
bio
bioinformatics
computational biology
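
To see the effect of -w without the inversion from -v, here is a minimal sketch that recreates the four-line example.txt in a temporary directory (the temporary path and the variable names are assumptions made for illustration) and compares a plain match with a whole-word match:

```shell
# Recreate the four-line example file in a throwaway directory
tmpdir=$(mktemp -d)
printf 'bio\nbioinfo\nbioinformatics\ncomputational biology\n' > "$tmpdir/example.txt"

# Plain match: "bioinfo" is also found inside "bioinformatics"
substring_hits=$(grep bioinfo "$tmpdir/example.txt")

# Whole-word match: only the line where "bioinfo" stands alone as a word
word_hits=$(grep -w bioinfo "$tmpdir/example.txt")

echo "plain match:"
echo "$substring_hits"
echo "whole-word match:"
echo "$word_hits"
```

The plain match should return two lines and the whole-word match only one, which is why the -v -w output above still contains bioinformatics.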

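Because grep prints only the matching lines, we lose their surroundings. To get around this, grep can print context before (-B), context after (-A), and context both before and after (-C) each match. Each of these options takes the number of context lines to print: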

$ grep -B1 "AGATCGG" contam.fastq | head -n 6
@DJB775P1:248:D0MDGACXX:7:1202:12362:49613
TGCTTACTCTGCGTTGATACCACTGCTTAGATCGGAAGAGCACACGTCTGAA
--
@DJB775P1:248:D0MDGACXX:7:1202:12782:49716
CTCTGCGTTGATACCACTGCTTACTCTGCGTTGATACCACTGCTTAGATCGG
--
$ grep -A2 "AGATCGG" contam.fastq | head -n 6
TGCTTACTCTGCGTTGATACCACTGCTTAGATCGGAAGAGCACACGTCTGAA
+
JJJJJIIJJJJJJHIHHHGHFFFFFFCEEEEEDBD?DDDDDDBDDDABDDCA
--
CTCTGCGTTGATACCACTGCTTACTCTGCGTTGATACCACTGCTTAGATCGG
+
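
Combining -B1 and -A2 pulls out a complete four-line FASTQ record around a sequence match. The sketch below assumes a toy file with two made-up records (the read names, sequences, and quality strings are hypothetical, not taken from contam.fastq):

```shell
# Two made-up FASTQ records; only the first contains the adapter-like motif
tmpdir=$(mktemp -d)
printf '@read1\nACGTAGATCGGAAG\n+\nIIIIIIIIIIIIII\n@read2\nTTTTCCCCGGGGAA\n+\nJJJJJJJJJJJJJJ\n' > "$tmpdir/toy.fastq"

# -B1 adds the header line before the match,
# -A2 adds the "+" line and the quality line after it
record=$(grep -B1 -A2 "AGATCGG" "$tmpdir/toy.fastq")
echo "$record"
```

This only works when every record is exactly four lines long; when several matches are not adjacent, grep separates the groups with a -- line, as in the output above.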

$ grep "Olfr141[13]" Mus_musculus.GRCm38.75_chr1_genes.txt
ENSMUSG00000058904 Olfr1413
ENSMUSG00000062497 Olfr1411

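By default grep uses POSIX basic regular expressions (BRE), which is enough for the bracket expression [13] above. grep allows us to turn on extended regular expressions (ERE) with the -E option, which lets us write alternation with |: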

$ grep -E "(Olfr1413|Olfr1411)" Mus_musculus.GRCm38.75_chr1_genes.txt
ENSMUSG00000058904 Olfr1413
ENSMUSG00000062497 Olfr1411
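
A quick way to check that the bracket expression and the ERE alternation select the same lines is to run both on a small file. This sketch assumes a tab-separated toy table; the first two rows are copied from the output above and the third row is a made-up non-matching entry:

```shell
# Toy gene table: two rows from the output above plus one hypothetical row
tmpdir=$(mktemp -d)
printf 'ENSMUSG00000058904\tOlfr1413\nENSMUSG00000062497\tOlfr1411\nFAKE_ID_FOR_DEMO\tOlfr1412\n' > "$tmpdir/genes.txt"

# BRE bracket expression: "Olfr141" followed by either 1 or 3
bracket=$(grep "Olfr141[13]" "$tmpdir/genes.txt")

# ERE alternation: either full name
alternation=$(grep -E "(Olfr1413|Olfr1411)" "$tmpdir/genes.txt")

if [ "$bracket" = "$alternation" ]; then
    echo "both patterns select the same lines"
fi
```

The bracket form is shorter when the names differ by a single character; alternation is the more general tool when they do not.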

Counting matching lines: grep -c

$ grep -c "\tOlfr" Mus_musculus.GRCm38.75_chr1_genes.txt
27

Alternatively, we could pipe the matching lines to wc -l:

$ grep "\tOlfr" Mus_musculus.GRCm38.75_chr1_genes.txt | wc -l
27
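
Whether "\t" inside a grep pattern is read as a tab depends on the grep implementation; GNU grep without -P, for instance, does not treat it as one. A more portable sketch, assuming a small tab-separated toy file (the third row is a made-up, space-separated line that should not be counted), builds a literal tab with printf first:

```shell
# Toy file: two tab-separated rows and one hypothetical space-separated row
tmpdir=$(mktemp -d)
printf 'ENSMUSG00000058904\tOlfr1413\nENSMUSG00000062497\tOlfr1411\nID_WITHOUT_TAB Olfr9999\n' > "$tmpdir/genes.txt"

# Store a literal tab character so the pattern does not rely on "\t" support
tab=$(printf '\t')

# Count matching lines directly with -c ...
count=$(grep -c "${tab}Olfr" "$tmpdir/genes.txt")

# ... or by piping the matching lines to wc -l
piped=$(grep "${tab}Olfr" "$tmpdir/genes.txt" | wc -l)

echo "grep -c: $count"
echo "wc -l:   $piped"
```

Both counts should agree; note that -c counts matching lines, not individual matches, so a line with two hits is still counted once.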

Printing only the matching part of each line: grep -o
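
As a minimal sketch of -o, assume a toy file whose lines carry a gene ID and a gene name in a GTF-like attribute format (the file layout is an assumption for illustration). With -o, grep prints just the text the pattern matched, one match per output line, instead of the whole line:

```shell
# Toy attribute lines in a GTF-like style
tmpdir=$(mktemp -d)
printf 'gene_id "ENSMUSG00000058904"; gene_name "Olfr1413";\ngene_id "ENSMUSG00000062497"; gene_name "Olfr1411";\n' > "$tmpdir/toy.txt"

# -o prints only the matched text: "Olfr" followed by any run of digits
names=$(grep -o "Olfr[0-9]*" "$tmpdir/toy.txt")
echo "$names"
```

This makes -o handy for pulling a single field out of semi-structured text before passing it on to tools such as sort and uniq.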
