我的任务是在几个scRNA数据中找到在突变基因不同细胞中特异性的表达。
方法来自论文:De novo detection of somatic mutations in high-throughput single-cell profiling data sets
github链接是:https://github.com/cortes-ciriano-lab/SComatic
声明:我要处理的是hg38的染色体。
1. 下载23条染色体的数据,网址Index of /goldenpath/hg38/chromosomes
找到需要的染色体的fa.gz文件,右击-->复制链接地址-->linux端
wget https://hgdownload.cse.ucsc.edu/goldenpath/hg38/chromosomes/chr10.fa.gz
2. 解压染色体文件,并得到染色体的.fai文件
gunzip chr10.fa.gz
samtools faidx chr10.fa
3. 按照步骤操作:
3.1 # step1: Splitting alignment file in cell type specific bams
sample=Example
output_dir1=$output_dir/Step1_BamCellTypes
mkdir -p $output_dir1
python $SCOMATIC/scripts/SplitBam/SplitBamCellTypes.py --bam $SCOMATIC/example_data/Example.scrnaseq.bam \
--meta $SCOMATIC/example_data/Example.cell_barcode_annotations.tsv \
--id ${sample} \
--n_trim 5 \
--max_nM 5 \
--max_NH 1 \
--outdir $output_dir1
3.2 #Step 2: Collecting base count information
REF=$SCOMATIC/example_data/chr10.fa
output_dir2=$output_dir/Step2_BaseCellCounts
mkdir -p $output_dir2
for bam in $(ls -d $output_dir1/*bam);do
# Cell type
cell_type=$(basename $bam | awk -F'.' '{print $(NF-1)}')
# Temp folder
temp=$output_dir2/temp_${cell_type}
mkdir -p $temp
# Command line to submit to cluster
python $SCOMATIC/scripts/BaseCellCounter/BaseCellCounter.py --bam $bam \
--ref $REF \
--chrom all \
--out_folder $output_dir2 \
--min_bq 30 \
--tmp_dir $temp \
--nprocs 1
rm -rf $temp
done
3.3 #Step 3: Merging base count matrices
sample=Example
output_dir3=$output_dir/Step3_BaseCellCountsMerged
mkdir -p $output_dir3
python $SCOMATIC/scripts/MergeCounts/MergeBaseCellCounts.py --tsv_folder ${output_dir2} \
--outfile ${output_dir3}/${sample}.BaseCellCounts.AllCellTypes.tsv
3.4 # Step 4: Detection of somatic mutations
# Step 4.1
output_dir4=$output_dir/Step4_VariantCalling
mkdir -p $output_dir4
sample=Example
REF=$SCOMATIC/example_data/chr10.fa
python $SCOMATIC/scripts/BaseCellCalling/BaseCellCalling.step1.py \
--infile ${output_dir3}/${sample}.BaseCellCounts.AllCellTypes.tsv \
--outfile ${output_dir4}/${sample} \
--ref $REF
# Step 4.2
editing=$SCOMATIC/RNAediting/AllEditingSites.hg38.txt
PON=$SCOMATIC/PoNs/PoN.scRNAseq.hg38.tsv
python $SCOMATIC/scripts/BaseCellCalling/BaseCellCalling.step2.py \
--infile ${output_dir4}/${sample}.calling.step1.tsv \
--outfile ${output_dir4}/${sample} \
--editing $editing \
--pon $PON
将以上代码段写在.sh脚本中,执行 ./ **.sh 运行即可
PS:如果有问题欢迎留言