动态规划-序列比对-Smith-Waterman

背景

  近期遇到一些关于字符串比对的业务需求,并通过编辑距离进行了实现,并达到了业务要求。基于类似的业务场景,作者思考有没有其他的方案:故了解Smith-Waterman 算法并进行本文的编写。

示例

  假设有如下两个字符串,对这两个字符串进行比对,并设置一系列的参数,可以得到一系列相似子串。

  输入两个字符串,在限制参数的情况下,输出三组相似的子串,其中-代表差异。

  1. input.sequence1: AACGTACTCAAGTCT
  2. input.sequence2: TCGTACTCTAACGAT
  3. output.list[0]: CGTACTCCAA->CGTACTC-AA
  4. output.list[1]: CGTACTCCAAAG->CGTACTC-AA-G
  5. output.list[2]: CGTACTCCAAAGGT->CGTACTC-AA-G-T

算法

package com.pap.base.util.string;

import java.util.ArrayList;
import java.util.List;

public class SmithWaterman {

    /**
     * List<Pair<String, String>> alignments = smithWaterman(sequence1, sequence2, matchScore, mismatchScore, gapPenalty);
     *
     * @param sequence1     AACGTACTCAAGTCT
     * @param sequence2     TCGTACTCTAACGAT
     * @param matchScore    2
     * @param mismatchScore -1
     * @param gapPenalty    -2
     * @return ["CGTACTCCAA : CGTACTC-AA",  "CGTACTCCAAAG : CGTACTC-AA-G",  "CGTACTCCAAAGGT : CGTACTC-AA-G-T"]
     */
    public static List<Pair<String, String>> smithWaterman(String sequence1, String sequence2, int matchScore, int mismatchScore, int gapPenalty) {
        int m = sequence1.length();
        int n = sequence2.length();

        int[][] matrix = new int[m + 1][n + 1];
        List<Pair<String, String>> alignments = new ArrayList<>();
        int maxScore = 0;

        for (int i = 0; i <= m; i++) {
            matrix[i][0] = 0;
        }

        for (int j = 0; j <= n; j++) {
            matrix[0][j] = 0;
        }

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int match = matrix[i - 1][j - 1] + (sequence1.charAt(i - 1) == sequence2.charAt(j - 1) ? matchScore : mismatchScore);
                int delete = matrix[i - 1][j] + gapPenalty;
                int insert = matrix[i][j - 1] + gapPenalty;

                matrix[i][j] = Math.max(0, Math.max(match, Math.max(delete, insert)));

                if (matrix[i][j] >= maxScore) {
                    if (matrix[i][j] > maxScore) {
                        maxScore = matrix[i][j];
                        alignments.clear();
                    }
                    Pair<String, String> alignment = backtrack(matrix, sequence1, sequence2, i, j, "", "", matchScore, mismatchScore);
                    alignments.add(alignment);
                }
            }
        }

        return alignments;
    }

    private static Pair<String, String> backtrack(int[][] matrix, String sequence1, String sequence2, int i, int j, String currentAlignment1, String currentAlignment2, int matchScore, int mismatchScore) {
        if (i == 0 || j == 0 || matrix[i][j] == 0) {
            return new Pair<>(currentAlignment1, currentAlignment2);
        }

        if (matrix[i][j] == matrix[i - 1][j - 1] + (sequence1.charAt(i - 1) == sequence2.charAt(j - 1) ? matchScore : mismatchScore)) {
            return backtrack(matrix, sequence1, sequence2, i - 1, j - 1, sequence1.charAt(i - 1) + currentAlignment1, sequence2.charAt(j - 1) + currentAlignment2, matchScore, mismatchScore);
        }

        if (matrix[i][j] == matrix[i - 1][j] - 2) {
            return backtrack(matrix, sequence1, sequence2, i - 1, j, "-" + currentAlignment1, sequence2.charAt(j - 1) + currentAlignment2, matchScore, mismatchScore);
        }

        if (matrix[i][j] == matrix[i][j - 1] - 2) {
            return backtrack(matrix, sequence1, sequence2, i, j - 1, sequence1.charAt(i - 1) + currentAlignment1, "-" + currentAlignment2, matchScore, mismatchScore);
        }

        return null;
    }

    static class Pair<K, V> {
        private K key;
        private V value;

        public Pair(K key, V value) {
            this.key = key;
            this.value = value;
        }

        public K getKey() {
            return key;
        }

        public V getValue() {
            return value;
        }

        public void setKey(K key) {
            this.key = key;
        }

        public void setValue(V value) {
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + key + ", " + value + ")";
        }

    }
}

参考

  1. http://pap-docs.pap.net.cn/
  2. https://gitee.com/alexgaoyh
  • 5
    点赞
  • 8
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值