GWAS
1. 使用bcftools对vcf文件创建索引index
bcftools index ${out}.vcf.gz
2. bcftools isec 计算不同vcf之间的突变的交集
bcftools isec -n~111 -c all -p merge bbj-a-107.vcf.gz bbj-a-76.vcf.gz ieu-b-4965.vcf.gz
会创建一个merge文件夹,并且该文件夹下有sites.txt文件
awk '{if (length($3) == 1 && length($4) == 1) print $1,$2,$2,$3,$4}' sites.txt >> annovar_input.txt
3. 提取chr1 - chr22位点信息
awk '{ if ($1 >= 1 && $1 <= 22) print $2 }' HapMap_3_r3_6.bim > snp_1_22.txt
4. 从vcf直接提取SNP,整合annovar注释
需要的格式(tab分割):1 751756 751756 T C
zcat bbj-a-107.vcf.gz |grep -v '^#'|grep -v "^$" |awk '{if (length($4) == 1 && length($5) == 1) print $1,$2,$2,$4,$5 >> "annovar_input.txt" }'
5. Annotate the variants with gene information用基因信息标注变异.
table_annovar.pl annovar_input.txt ~/software/annovar/humandb/ --buildver hg19 --out myanno --remove --protocol refGene --operation g --nastring . --polish
table_annovar.pl anno_input.txt ~/software/annovar/humandb/ --buildver hg19 --out myanno --remove --protocol refGene --operation g --nastring . --polish
6. 提取外显子区域
cat myanno.hg19_multianno.txt |awk 'length($4)==length($5) && $6=="exonic" {print $0 >> "exonic_out.txt"}'
7. 提取各种突变位点
### 同义突变位点
cat exonic_out.txt |awk '{if($9=="synonymous") print $0 >> "synonSNV.txt"}'
### stopgain位点
cat exonic_out.txt |awk '{if($9=="stopgain") print $0 >> "stopgainSNV.txt"}'
### stoploss位点
cat exonic_out.txt |awk '{if($9=="stoploss") print $0 >> "stoplossSNV.txt"}'
### 非同义突变位点
cat exonic_out.txt |awk '{if($9=="nonsynonymous") print $0 >> "nonsynonymousSNV.txt"}'