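For files with several million records, processing them in Python can be much slower. In that case, use command-line tools such as awk and grep instead.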
Requirement: count how many times each distinct value occurs in the 10th column of the file 2020001082.snp_indel.hg19_multianno.pro.txt
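The requirement as stated is what sort | uniq -c does in a single pass. sort puts identical values next to each other, and uniq -c replaces each run of identical lines with one line prefixed by its length. The sketch below assumes the file is tab-separated, which is the default delimiter of cut. count_col10 is a hypothetical helper name.

```shell
# Sketch: count every distinct value of column 10 in a tab-separated file.
# count_col10 is a hypothetical helper name; pass the file path as $1.
count_col10() {
    # cut -f 10   : 10th tab-separated field of every line
    # sort        : bring identical values together (uniq only merges adjacent lines)
    # uniq -c     : one line per distinct value, prefixed with its count
    # sort -k1,1nr: most frequent values first
    cut -f 10 "$1" | sort | uniq -c | sort -k1,1nr
}
```

Usage: count_col10 2020001082.snp_indel.hg19_multianno.pro.txt > count.result.txt. Each output line contains the count followed by the value.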
#!/bin/bash
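# First attempt, kept for reference. It always failed with "Syntax error: EOF in backquote substitution",
# reported on the line after the script's last line, because the opening backquote is never closed:
#for i in `cut -f 10 2020001082.snp_indel.hg19_multianno.pro.txt|sort |uniq -d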
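# sort -u lists every distinct value once; uniq -d would keep only values that occur more than once
cut -f 10 2020001082.snp_indel.hg19_multianno.pro.txt | sort -u | while IFS= read -r i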
do
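    # printf prints the same thing in every shell; echo -n "...\n" gives a literal \n in bash but a newline in dash
    # cut -f 10 splits on tabs like the line above (awk '{print $10}' splits on any whitespace);
    # grep -Fxc counts whole-line literal matches, so a value is not also counted inside longer values
    printf '%s number is ' "${i}" >> count.result.txt
    cut -f 10 2020001082.snp_indel.hg19_multianno.pro.txt | grep -Fxc -- "${i}" >> count.result.txt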
done
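The loop above rereads the whole file once for every distinct value, so its running time grows with the number of distinct values. An awk associative array can instead count every value in a single pass over the file. The sketch below again assumes tab-separated input. count_col10_awk is a hypothetical helper name.

```shell
# Sketch: single-pass count of column 10 using an awk associative array.
# count_col10_awk is a hypothetical helper name; pass the file path as $1.
count_col10_awk() {
    # -F'\t'   : split on tabs, so $10 is the same field that cut -f 10 returns
    # n[$10]++ : one counter per distinct value; the file is read only once
    # END      : print "value<TAB>count"; for-in order is unspecified, so sort afterwards
    # sort     : by count (descending), then by value to break ties
    awk -F'\t' '{ n[$10]++ } END { for (v in n) printf "%s\t%d\n", v, n[v] }' "$1" |
        sort -t "$(printf '\t')" -k2,2nr -k1,1
}
```

If the file starts with a header line, the header's column-10 name is counted as a value with count 1. To skip it, change the awk program to NR > 1 { n[$10]++ } END { ... }.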