个人笔记GWAS系列

最新推荐文章于 2024-04-24 11:20:57 发布

May7799

最新推荐文章于 2024-04-24 11:20:57 发布

阅读量512

点赞数 1

文章标签： linux 运维服务器

本文链接：https://blog.csdn.net/may0707/article/details/129127395

版权

文章详细描述了一个GWAS（基因组关联分析）的工作流程，包括使用bcftools创建vcf文件索引，计算不同vcf文件的突变交集，提取特定染色体位点信息，从vcf中提取SNP并结合annovar进行基因注释，然后筛选出同义突变、非同义突变、stopgain和stoploss等不同类型的变异位点。

摘要由CSDN通过智能技术生成

GWAS

1. 使用bcftools对vcf文件创建索引index

bcftools index ${out}.vcf.gz

2. bcftools isec 计算不同vcf之间的突变的交集

bcftools isec -n~111 -c all -p merge bbj-a-107.vcf.gz bbj-a-76.vcf.gz ieu-b-4965.vcf.gz

会创建一个merge文件夹，并且该文件夹下有sites.txt文件

awk '{if (length($3) == 1 && length($4) == 1) print $1,$2,$2,$3,$4}' sites.txt >> annovar_input.txt

3. 提取chr1 - chr22位点信息

awk '{ if ($1 >= 1 && $1 <= 22) print $2 }' HapMap_3_r3_6.bim > snp_1_22.txt

4. 从vcf直接提取SNP，整合annovar注释

需要的格式(tab分割)：1 751756 751756 T C

zcat bbj-a-107.vcf.gz |grep -v '^#'|grep -v "^$"  |awk '{if (length($4) == 1 && length($5) == 1) print $1,$2,$2,$4,$5 >> "annovar_input.txt" }'

annovar_input.txt

5. Annotate the variants with gene information用基因信息标注变异.

table_annovar.pl annovar_input.txt ~/software/annovar/humandb/ --buildver  hg19 --out myanno --remove --protocol refGene --operation g --nastring . --polish

table_annovar.pl anno_input.txt ~/software/annovar/humandb/ --buildver  hg19 --out myanno --remove --protocol refGene --operation g --nastring . --polish

6. 提取外显子区域

cat myanno.hg19_multianno.txt |awk 'length($4)==length($5) && $6=="exonic" {print $0 >> "exonic_out.txt"}'

7. 提取各种突变位点

### 同义突变位点
 cat exonic_out.txt |awk '{if($9=="synonymous") print $0 >> "synonSNV.txt"}'
 
 ### stopgain位点
cat exonic_out.txt |awk '{if($9=="stopgain") print $0 >> "stopgainSNV.txt"}'

### stoploss位点
cat exonic_out.txt |awk '{if($9=="stoploss") print $0 >> "stoplossSNV.txt"}'

### 非同义突变位点
cat exonic_out.txt |awk '{if($9=="nonsynonymous") print $0 >> "nonsynonymousSNV.txt"}'