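The data are a case-control cohort for a sporadic chronic disease. I want to analyze variants with the GATK germline best-practice workflow, and this post mainly follows the GATK Germline Best Practices tutorial [1]. GATK 3.7 is used here; GATK 3.8 is already out, and 4.0 was released recently.
Some steps will be filled in later.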
Map to Reference
bwa mem -t 8 -M -R "@RG\tID:${name}\tLB:${name}\tPL:ILLUMINA\tPM:X10\tSM:${name}" ${INDEX} ${RAW_DATA}/${name}_1.fastq ${RAW_DATA}/${name}_2.fastq > ${WORKING_DIR}/2018rerun/processed_bam/${name}.sam
$java -Xmx20g -jar $PICARD SortSam SORT_ORDER=coordinate INPUT=${WORKING_DIR}/2018rerun/processed_bam/${name}.sam OUTPUT=${WORKING_DIR}/2018rerun/processed_bam/${name}.bam
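The read-group string passed to -R has to contain the real sample name, and inside single quotes the shell leaves ${name} unexpanded. As a minimal sketch, assuming a POSIX shell, the string could be built by a small helper. The function name make_read_group is hypothetical and not part of the original pipeline. The \t sequences are kept literal on purpose, because bwa mem converts them to tabs itself.

```shell
# Hypothetical helper: print an @RG header string for one sample.
# The field values (PL:ILLUMINA, PM:X10) are copied from the command above.
make_read_group() {
    sample=$1
    # '\\t' in the printf format prints a literal backslash followed by t,
    # which is the form bwa mem expects on the command line.
    printf '@RG\\tID:%s\\tLB:%s\\tPL:ILLUMINA\\tPM:X10\\tSM:%s\n' \
        "$sample" "$sample" "$sample"
}
```

It would then be used as -R "$(make_read_group "${name}")" in the bwa mem call.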
Basic Statistics
samtools flagstat ${WORKING_DIR}/2018rerun/processed_bam/${name}.bam > ${WORKING_DIR}/2018rerun/processed_bam/${name}.flagstat &
samtools stats ${WORKING_DIR}/2018rerun/processed_bam/${name}.bam > ${WORKING_DIR}/2018rerun/processed_bam/${name}.stats &
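Once the two reports exist, the numbers of most interest can be pulled out with standard text tools. The sketch below assumes the usual layouts: a "mapped (NN.NN% : N/A)" line in the flagstat output, and summary lines prefixed with SN in the stats output. The function names mapped_percent and stats_summary are hypothetical.

```shell
# Hypothetical helper: print the overall mapping rate from a flagstat report.
# Matches only the line whose 4th field is exactly "mapped", so the
# "primary mapped" line printed by newer samtools versions is skipped.
mapped_percent() {
    awk '$4 == "mapped" { gsub(/[(%]/, "", $5); print $5; exit }' "$1"
}

# Hypothetical helper: print the summary-numbers section of a samtools stats
# report (lines tagged SN), dropping the SN tag itself.
stats_summary() {
    grep '^SN' "$1" | cut -f 2-
}
```

For example, mapped_percent ${WORKING_DIR}/2018rerun/processed_bam/${name}.flagstat prints a single number that is easy to collect across samples. Both helpers should only be run after the background jobs have finished (for instance after a wait).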
Mark Duplicates
$java -Xmx20g -jar $PICARD MarkDuplicates INPUT=${WORKING_DIR}/2018rerun/processed_bam/${name}.bam OUTPUT=${WORKING_DIR}/2018rerun/processed_bam/${name}_marked.bam METRICS_FILE=${WORKING_DIR}/2018rerun/processed_bam/${name}.metrics
samtools index ${WORKING_DIR}/2018rerun/processed_bam/${name}_marked.bam
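The metrics file written by MarkDuplicates is a tab-separated table, so the duplication rate per library can be checked without opening it by hand. This is a minimal sketch, assuming the usual Picard layout: a header row starting with LIBRARY that includes a PERCENT_DUPLICATION column, followed by one row per library and then a blank line. The function name duplication_rate is hypothetical.

```shell
# Hypothetical helper: print "library<TAB>PERCENT_DUPLICATION" for each row
# of the metrics table in a Picard MarkDuplicates metrics file.
duplication_rate() {
    awk -F '\t' '
        $1 == "LIBRARY" {
            # Header row: find the column by name instead of by position.
            for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
            next
        }
        col && NF >= col { print $1 "\t" $col }   # data rows after the header
        col && NF == 0   { exit }                 # blank line ends the table
    ' "$1"
}
```

Calling duplication_rate ${WORKING_DIR}/2018rerun/processed_bam/${name}.metrics prints the value as a fraction between 0 and 1, not as a percentage.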
Base Recalibration
$java -Xmx10g -jar $gatk_jar -T BaseRecalibrator -R $INDEX -I ${WORKING_DIR}/2018r