variant calling--SComatic

最新推荐文章于 2024-05-31 11:49:50 发布

李划水员

最新推荐文章于 2024-05-31 11:49:50 发布

阅读量248

点赞数 5

分类专栏：基因组文章标签： linux

本文链接：https://blog.csdn.net/daisy4/article/details/137038290

版权

基因组专栏收录该内容

1 篇文章 0 订阅

订阅专栏

我的任务是在几个scRNA数据中找到在突变基因不同细胞中特异性的表达。

方法来自论文：De novo detection of somatic mutations in high-throughput single-cell profiling data sets

github链接是：https://github.com/cortes-ciriano-lab/SComatic

声明：我要处理的是hg38的染色体。

1. 下载23条染色体的数据，网址Index of /goldenpath/hg38/chromosomes

找到需要的染色体的fa.gz文件，右击-->复制链接地址-->linux端

wget https://hgdownload.cse.ucsc.edu/goldenpath/hg38/chromosomes/chr10.fa.gz

2. 解压染色体文件，并得到染色体的.fai文件

gunzip chr10.fa.gz
samtools faidx chr10.fa

3. 按照步骤操作：

3.1 # step1: Splitting alignment file in cell type specific bams

sample=Example
output_dir1=$output_dir/Step1_BamCellTypes
mkdir -p $output_dir1

python $SCOMATIC/scripts/SplitBam/SplitBamCellTypes.py --bam $SCOMATIC/example_data/Example.scrnaseq.bam \
        --meta $SCOMATIC/example_data/Example.cell_barcode_annotations.tsv \
        --id ${sample} \
        --n_trim 5 \
        --max_nM 5 \
        --max_NH 1 \
        --outdir $output_dir1

3.2 #Step 2: Collecting base count information

REF=$SCOMATIC/example_data/chr10.fa

output_dir2=$output_dir/Step2_BaseCellCounts
mkdir -p $output_dir2

for bam in $(ls -d $output_dir1/*bam);do
  
  # Cell type
  cell_type=$(basename $bam | awk -F'.' '{print $(NF-1)}')

  # Temp folder
  temp=$output_dir2/temp_${cell_type}
  mkdir -p $temp

  # Command line to submit to cluster
  python $SCOMATIC/scripts/BaseCellCounter/BaseCellCounter.py --bam $bam \
    --ref $REF \
    --chrom all \
    --out_folder $output_dir2 \
    --min_bq 30 \
    --tmp_dir $temp \
    --nprocs 1

  rm -rf $temp
done

3.3 #Step 3: Merging base count matrices

sample=Example
output_dir3=$output_dir/Step3_BaseCellCountsMerged
mkdir -p $output_dir3

python $SCOMATIC/scripts/MergeCounts/MergeBaseCellCounts.py --tsv_folder ${output_dir2} \
  --outfile ${output_dir3}/${sample}.BaseCellCounts.AllCellTypes.tsv

3.4 # Step 4: Detection of somatic mutations

# Step 4.1
output_dir4=$output_dir/Step4_VariantCalling
mkdir -p $output_dir4

sample=Example
REF=$SCOMATIC/example_data/chr10.fa

python $SCOMATIC/scripts/BaseCellCalling/BaseCellCalling.step1.py \
          --infile ${output_dir3}/${sample}.BaseCellCounts.AllCellTypes.tsv \
          --outfile ${output_dir4}/${sample} \
          --ref $REF

# Step 4.2
editing=$SCOMATIC/RNAediting/AllEditingSites.hg38.txt
PON=$SCOMATIC/PoNs/PoN.scRNAseq.hg38.tsv

python $SCOMATIC/scripts/BaseCellCalling/BaseCellCalling.step2.py \
          --infile ${output_dir4}/${sample}.calling.step1.tsv \
          --outfile ${output_dir4}/${sample} \
          --editing $editing \
          --pon $PON

将以上代码段写在.sh脚本中，执行 ./ **.sh 运行即可

PS：如果有问题欢迎留言

李划水员

关注

5
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
variant calling--SComatic

方法来自论文：De novo detection of somatic mutations in high-throughput single-cell profiling data sets。找到需要的染色体的fa.gz文件，右击-->复制链接地址-->linux端。将以上代码段写在.sh脚本中，执行 ./ **.sh 运行即可。我的任务是在几个scRNA数据中找到在突变基因不同细胞中特异性的表达。2. 解压染色体文件，并得到染色体的.fai文件。声明：我要处理的是hg38的染色体。
复制链接

扫一扫