- 摘要
- 最近在搭建WGBS流程,使用bs_seeker2对数据进行比对后需要有一个统计结果。将就之前的脚本稍微改了一下,直接拿来用。
- 环境配置
- python:2.7.18
- BS-Seeker2: v2.1.7
- 目标文件格式
- 该log文件生成行数是不确定的,但统计结果都是在最后几行,所以我们的脚本里使用负值代表倒数几行开始抓取统计结果。
- 使用代码
-
import re import os newfile_name = '02.align/map_stat.txt' newfile = open(newfile_name,'w') stat_title = "Sample_ID Total_Reads Mapped_Reads Uniq_Mapped_Reads mCG_Map_rates mCHG_Map_rates mCHH_Map_rates"+"\n" newfile.write(stat_title) for root,dirs,files in os.walk(r"logs"): for file in files: if ".bs_seeker2_log" in file: #print(os.path.join(root,file)) log_file = open(os.path.join(root,file),'r') stat_line = log_file.readlines() total_reads = stat_line[-21].split(" ") total_reads = int(total_reads[-2]) mapped_rates = stat_line[-12].split(" ") mapped_rates = float(mapped_rates[-1].replace("%",'')) mapped_reads = int(total_reads*mapped_rates*0.01) Uniq_map_reads = stat_line[-11].split(" ") Uniq_map_reads = int(Uniq_map_reads[-1]) Uniq_map_rates = float(Uniq_map_reads*100/total_reads) mCG_cov_rates = stat_line[-6].split(" ") mCG_cov_rates = float(mCG_cov_rates[-1].replace("%",'')) mCHG_cov_rates = stat_line[-5].split(" ") mCHG_cov_rates = float(mCHG_cov_rates[-1].replace("%",'')) mCHH_cov_rates = stat_line[-4].split(" ") mCHH_cov_rates = float(mCHH_cov_rates[-1].replace("%",'')) file_name = file.replace('_tmp.bam.bs_seeker2_log','') result_stat = file_name+"\t"+'%.0f'%total_reads+"\t"+'%.d'%mapped_reads+'('+'%.2f'%mapped_rates+'%)'+"\t"+'%.0f'%Uniq_map_reads+'('+'%.2f'%Uniq_map_rates+'%)'+"\t"+'%.2f'%mCG_cov_rates+'%'+"\t"+'%.2f'%mCHG_cov_rates+'%'+"\t"+'%.2f'%mCHH_cov_rates+'%'+"\n" newfile.write(result_stat) newfile.close() log_file.close()
-
- 结果展示
- 总结
- 这套脚本可以针对任何比对结果生成的log文件进行微调适配,代码修改时间大概0.5h,非常方便。
2021.03.26丨甲基化分析工具bs_seeker2比对统计脚本
最新推荐文章于 2023-06-12 12:04:48 发布