实验记录 | 6/21结个尾吧

最新推荐文章于 2023-02-06 10:55:50 发布

今天也是个妖精头子呀

最新推荐文章于 2023-02-06 10:55:50 发布

阅读量384

点赞数

分类专栏：谱系追踪

本文链接：https://blog.csdn.net/weixin_40640700/article/details/118080009

版权

谱系追踪专栏收录该内容

82 篇文章

订阅专栏

mutect的结果报错，对我而言始终是一件如鲠在喉的事情。今天上午，就入手解决这件事情。

首先，先理清楚汇报的主要的思路（也不用过于紧张，平常心就好）。所以，究竟是什么呢?
刚开始，比较纠结于到底是使用mutect1还是mutect2，后来去网上调查比较后发现，mutect1更加专长于对单核苷酸变异的检测，而mutect2则在这个基础上，与GATK结合，更加专长于indel的检测。所以，综上所述，我应该试着跑完mutect1的流程。

或者，现在开始配置cnv.pl的环境。

关于mutect的问题，我找到了几处解释：
参考链接：https://stackoverflow.com/questions/19934021/vcf4-2-file-not-recognised-by-gatk

I finally figured it out: It was something off the the VariantAnnotator vcf from GATK, I re-ran it and used the new file, I also deleted the old index file. The manual correction of the ESP VCF works! Its a bit boring, but it does the job. Hope this help anyone else getting this error! Replace all chromosome names manually using “find & replace” in Textedit. Its something about the formatting thats off.

参考链接：https://www.biostars.org/p/117241/

Ok, Luckily I figured it out.
There is nothing wrong with the VCF file, the recal and tranches files were empty, I was mislead by the error message!
Thanks everyone

这个问题，暂时无法解决，放在一边。

我们接下来，看一下cnv.pl的运行规则。
需要依赖性的软件，CNVkit。
参考链接：https://github.com/etal/cnvkit
https://cnvkit.readthedocs.io/en/stable/quickstart.html#install-cnvkit
https://cnvkit.readthedocs.io/en/stable/

尝试使用conda，来安装cnvkit。

./conda config --add channels bioconda
 ./conda install cnvkit

./conda config --set channel_priority flexible
./conda install cnvkit

报错：

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: /
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Output in format: Requested package -> Available versions

检查运行环境，cnvkit需要在3.5以上的环境中运行。
查看python的版本。
python --version

Python 3.6.13 :: Anaconda, Inc.

满足要求。

安装R的依赖包。
进入到R的路径当中，运行指令：
install.packages("DNAcopy")
得到报错：

Warning message:
package ‘DNAcopy’ is not available (for R version 3.6.3)

这个问题，稍后再谈。
我们先把hg38的环境配置好，配置好之后的话，再使用hg38的参考基因组跑一遍。看看结果会不会有所相似（不敢求同）。
看一下，配置与参考基因组相关的环境，都需要哪些文件：
（1）参考基因组
（2）索引文件

dict
（上次已经构建完成）
fai
（上次已经构建完成）
sa/amb/ann/pac/bwt
服务器上已有的索引文件是不完整的，所以自己重新下载。
/home/xxzhang/workplace/software/bwa/bwa index -a bwtsw hg38.fasta
（正在构建过程中）
exon.bed（这个也是在RNA的时候需要时候）
（暂时不用，我们先跑DNA）

（3）hg38.fasta_resource
已经存放在服务器中（可能今年的12月份已经有人用过mutation calling的这个过程）。我们使用cp指令，将其拷贝到目录文件夹内，然后对其进行重命名。