What exactly is going on in the ksw algorithm in bwa-mem?

小c轩令

Categories: CUDA Programming, BWA

Article link: https://blog.csdn.net/qq_39499621/article/details/85144258

A summary of the ksw algorithm

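I have been studying the bwa-mem algorithm for quite a while and have chewed on the tough bone that is ksw more than once, but I never wrote a proper summary. Today I will summarize the function's inputs and outputs.
The main object of study is the ksw_extend2 function, a single-threaded, inexact-matching sequence alignment routine. It is built on dynamic programming: each cell of the score matrix depends on its neighboring cells, so the computation is awkward to parallelize, while the serial implementation is very time-consuming. Our job is to pull this piece out of the original project and rewrite it as a GPU-accelerated version.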
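To make the data dependency concrete, here are the textbook affine-gap recurrences that this kind of routine is built on. This is a sketch of the general scheme, not a transcription of the bwa source: here i indexes the target, j indexes the query, S(i,j) is the score of pairing target[i] with query[j], and gapo / gape are the gap open and gap extension penalties. As far as I can tell, the real function adds details on top of this (scores clamped at zero, separate penalties for insertions and deletions).

```latex
\begin{aligned}
H(i,j)   &= \max\{\, H(i-1,j-1) + S(i,j),\; E(i,j),\; F(i,j) \,\} \\
E(i+1,j) &= \max\{\, H(i,j) - \mathrm{gapo},\; E(i,j) \,\} - \mathrm{gape} \\
F(i,j+1) &= \max\{\, H(i,j) - \mathrm{gapo},\; F(i,j) \,\} - \mathrm{gape}
\end{aligned}
```

Every cell (i,j) needs values from (i-1,j-1), (i-1,j), and (i,j-1), which is why rows and columns cannot simply be computed independently. Cells on the same anti-diagonal (equal i+j) do not depend on each other, and that is a common starting point for a parallel version.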

The function's inputs and outputs are analyzed below; each original English comment is followed by an explanatory note.

ksw_extend2

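 * @param qlen     query length                                    // length of the query segment to be aligned
 * @param query    query sequence with 0 <= query[i] < m           // pointer to the query data
 * @param tlen     target length                                   // length of the reference segment
 * @param target   target sequence with 0 <= target[i] < m         // pointer to the reference data
 * @param m        number of residue types                         // number of base types, = 5
 * @param mat      m*m scoring matrix in one-dimension array       // score for each pair of residue types (query base vs. target base)
 * @param gapo     gap open penalty; a gap of length l cost "-(gapo+l*gape)"  // gap open penalty, = 6
 * @param gape     gap extension penalty                           // gap extension penalty, = 1
 * @param w        band width                                      // early-pruning parameter, w = 100; limits how far an aligned position may lie from the diagonal
 * @param end_bonus                                                // end_bonus = 5
 * @param zdrop                                                    // zdrop = 100
 * @param h0       alignment score of upstream sequences           // initial score of the seed (the number of exactly matched query bases when the match score is 1)
 * @param _qle     (out) length of the query in the alignment      // position on the query of the base where the global maximum score is reached
 * @param _tle     (out) length of the target in the alignment     // position on the reference of the base where the global maximum score is reached
 * @param _gtle    (out) length of the target if query is fully aligned  // length of the target consumed when the whole query is aligned
 * @param _gscore  (out) score of the best end-to-end alignment    // end-to-end alignment score of the query
 * @param _max_off (out) max_off                                   // largest difference between the query and reference positions at the points where the maximum score was reached
 *
 * @return         best semi-local alignment score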
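To illustrate what m and mat look like in practice, here is a minimal sketch that fills a 5x5 matrix stored as a one-dimensional array. It rests on assumptions that are not stated above: bases are encoded as A=0, C=1, G=2, T=3, N=4, the match score is 1, the mismatch penalty is 4, and any pair involving N scores -1. The function name fill_scmat_sketch is made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper: fill a 5x5 scoring matrix in row-major order.
 * a = match score, b = mismatch penalty (both given as positive numbers).
 * Codes 0..3 are assumed to be A, C, G, T; code 4 is assumed to be N. */
void fill_scmat_sketch(int a, int b, int8_t mat[25])
{
    int i, j, k = 0;
    for (i = 0; i < 4; ++i) {
        for (j = 0; j < 4; ++j)
            mat[k++] = (int8_t)(i == j ? a : -b); /* match or mismatch */
        mat[k++] = -1;                            /* regular base vs. N */
    }
    for (j = 0; j < 5; ++j)
        mat[k++] = -1;                            /* N vs. anything */
}
```

With this layout, the score of pairing residue x with residue y is mat[x * m + y], so a single flat array is enough for the lookup.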

int ksw_extend2(int qlen, const uint8_t *query, int tlen, const uint8_t *target, int m, const int8_t *mat, int o_del, int e_del, int o_ins, int e_ins, int w, int end_bonus, int zdrop, int h0, int *qle, int *tle, int *gtle, int *gscore, int *max_off);

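The four parameters o_del, e_del, o_ins, and e_ins correspond to the gapo and gape parameters above: they are the gap open and gap extension penalties for deletions and for insertions, respectively.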
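To show how the outputs relate to the inputs, here is a simplified extension sketch. It is not the bwa implementation: it has no band (w), no zdrop, no end_bonus, and a single gapo / gape pair instead of separate deletion and insertion penalties. The names extend_sketch and ext_result_t are made up for this example, and a score of 0 is treated as a dead cell that cannot be extended.

```c
#include <stdint.h>
#include <stdlib.h>

/* Outputs mirroring the (out) parameters described above. */
typedef struct {
    int score;    /* best semi-local score (the return value) */
    int qle, tle; /* query / target lengths consumed at the best score */
    int gtle;     /* target length consumed when the whole query is aligned */
    int gscore;   /* best end-to-end score, -1 if none was found */
    int max_off;  /* largest |qle - tle| seen whenever the best score improved */
} ext_result_t;

static int max2(int a, int b) { return a > b ? a : b; }

/* Hypothetical simplified extension; see the assumptions in the text. */
ext_result_t extend_sketch(int qlen, const uint8_t *query,
                           int tlen, const uint8_t *target,
                           int m, const int8_t *mat,
                           int gapo, int gape, int h0)
{
    ext_result_t r = { h0, 0, 0, 0, -1, 0 };
    int i, j, oe = gapo + gape;
    int *prev = calloc((size_t)qlen + 1, sizeof(int)); /* H(i-1, .) */
    int *cur  = calloc((size_t)qlen + 1, sizeof(int)); /* H(i, .)   */
    int *E    = calloc((size_t)qlen + 1, sizeof(int)); /* E(i, .)   */
    if (!prev || !cur || !E) { free(prev); free(cur); free(E); return r; }

    /* Row 0: start from the seed score h0, then a gap skipping query bases. */
    prev[0] = h0;
    for (j = 1; j <= qlen; ++j) prev[j] = max2(h0 - gapo - j * gape, 0);

    for (i = 1; i <= tlen; ++i) {
        const int8_t *s = mat + target[i - 1] * m; /* scores for this target base */
        int f = 0, row_max = 0, row_j = 0, *tmp;
        cur[0] = max2(h0 - gapo - i * gape, 0);    /* gap skipping target bases */
        for (j = 1; j <= qlen; ++j) {
            /* Diagonal move, allowed only from a live (non-zero) cell. */
            int M = prev[j - 1] > 0 ? prev[j - 1] + s[query[j - 1]] : 0;
            E[j] = max2(max2(prev[j] - oe, E[j] - gape), 0);  /* target base unpaired */
            f    = max2(max2(cur[j - 1] - oe, f - gape), 0);  /* query base unpaired */
            cur[j] = max2(M, max2(E[j], f));
            if (cur[j] > row_max) { row_max = cur[j]; row_j = j; }
        }
        /* End-to-end candidate: the whole query has been consumed. */
        if (cur[qlen] > 0 && cur[qlen] > r.gscore) { r.gscore = cur[qlen]; r.gtle = i; }
        /* Semi-local candidate: best cell anywhere so far. */
        if (row_max > r.score) {
            r.score = row_max; r.tle = i; r.qle = row_j;
            r.max_off = max2(r.max_off, abs(row_j - i));
        }
        tmp = prev; prev = cur; cur = tmp;         /* roll the rows */
    }
    free(prev); free(cur); free(E);
    return r;
}
```

Only two rows of H and one row of E are kept, so memory grows with qlen rather than with qlen * tlen. The row-by-row loop is still strictly sequential, which is the part a GPU version has to restructure.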