Generic Text Comparison Tool with LCS Approach

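A common problem is detecting and showing the differences between two texts, especially texts that run to hundreds or thousands of lines. Using plain java.lang.String methods is one possible solution, but performance, the most important concern for this kind of operation, will not be satisfactory. We want an efficient solution, which may look like the following:
Figure: text difference tool example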

The problem has two parts:

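  • Detecting the differences between the two texts: this solution uses an efficient dynamic-programming LCS (Longest Common Subsequence) algorithm. It has O(text1WordCount * text2WordCount) complexity and is coded below as the "longestCommonSubsequence" method.
  • Visualizing the differences: the visualization uses an HTML-tag-based approach that marks the new words of text2 in green and the old words of text1 in red. It has O(changedWordsCount * (text1WordCount + text2WordCount)) complexity and is coded below as the "markTextDifferences" method.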
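To make the first part concrete, here is a minimal sketch of word-level LCS run on two short sentences. The class name LcsSketch and the method name lcsWords are hypothetical names chosen for this illustration; the table-filling and walk-back steps follow the same idea as the "longestCommonSubsequence" method in the listing, not a different algorithm.

```java
import java.util.ArrayList;
import java.util.List;

class LcsSketch {

    /** Returns the words of one longest common subsequence of a and b. */
    static List<String> lcsWords(String a, String b) {
        String[] x = a.split(" ");
        String[] y = b.split(" ");

        // len[i][j] = LCS length of x[i..] and y[j..]; the extra row and column stay 0.
        int[][] len = new int[x.length + 1][y.length + 1];
        for (int i = x.length - 1; i >= 0; i--) {
            for (int j = y.length - 1; j >= 0; j--) {
                len[i][j] = x[i].equals(y[j])
                        ? len[i + 1][j + 1] + 1
                        : Math.max(len[i + 1][j], len[i][j + 1]);
            }
        }

        // Walk the table from the top-left corner to recover the common words.
        List<String> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.length && j < y.length) {
            if (x[i].equals(y[j])) {
                result.add(x[i]);
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                i++; // skipping a word of the first text loses nothing
            } else {
                j++; // otherwise skip a word of the second text
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [the, brown]
        System.out.println(lcsWords("the quick brown fox", "the slow brown dog"));
    }
}
```

Every word outside the returned list is either a deletion from the first text or an insertion into the second, which is what the visualization step colors.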

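Note 1: For simplicity, the "normalizeText" method is used to remove \n, \t and multiple space characters.
Note 2: This class was created as a Vaadin component. However, "longestCommonSubsequence" is purely generic and "markTextDifferences" is generic for any HTML-based visual component, so both can also be used with other frameworks.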
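As an aside on Note 1, the same normalization can probably be done in one pass with a regular expression. The sketch below is an assumption-labeled alternative, not the article's method: NormalizeSketch and normalize are hypothetical names, and unlike the loop-based "normalizeText" in the listing it also collapses \r and other whitespace characters.

```java
class NormalizeSketch {

    /** Collapses every run of whitespace (spaces, tabs, line breaks) into one space. */
    static String normalize(String text) {
        // "\\s+" matches one or more whitespace characters in a row.
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // prints: one two three four
        System.out.println(normalize("  one\ttwo\n\nthree   four "));
    }
}
```

Because the result never contains two consecutive spaces, splitting it on a single space yields no empty words.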

import java.util.ArrayList;
import com.vaadin.ui.CustomComponent;
import com.vaadin.ui.Label;
import com.vaadin.ui.Layout;
import com.vaadin.ui.VerticalLayout;

/**
* Text comparison component which marks differences of two texts with colors.
* 
* @author cb
*/
public class TextCompareComponent extends CustomComponent {

    private Layout mainLayout = new VerticalLayout();
    private ArrayList<String> longestCommonSubsequenceList;
    private static final String INSERT_COLOR = "#99FFCC";
    private static final String DELETE_COLOR = "#CB6D6D";

    public TextCompareComponent(String text1, String text2) {

        text1 = normalizeText(text1);
        text2 = normalizeText(text2);

        this.longestCommonSubsequenceList = longestCommonSubsequence(text1, text2);
        String result = markTextDifferences(text1, text2, 
            longestCommonSubsequenceList, INSERT_COLOR, DELETE_COLOR);

        Label label = new Label(result, Label.CONTENT_XHTML);
        mainLayout.addComponent(label);
        setCompositionRoot(mainLayout);
    }

     /**
      * Finds the words of a longest common subsequence (lcs) of given two texts.
      * 
      * @param text1
      * @param text2
      * @return - longest common subsequence list
      */
     private ArrayList<String> longestCommonSubsequence(String text1,String text2){

         String[] text1Words = text1.split(" ");
         String[] text2Words = text2.split(" ");

         int text1WordCount = text1Words.length;
         int text2WordCount = text2Words.length;

         int[][] solutionMatrix = new int[text1WordCount + 1][text2WordCount + 1];

         for (int i = text1WordCount - 1; i >=0; i--) {
             for (int j = text2WordCount - 1; j >= 0; j--) {
                 if (text1Words[i].equals(text2Words[j])){
                     solutionMatrix[i][j] = solutionMatrix[i + 1][j + 1] + 1;
                 }
                 else {
                     solutionMatrix[i][j] = Math.max(solutionMatrix[i + 1][j],
                         solutionMatrix[i][j + 1]);
                 }
              }
          }

          int i = 0, j = 0;
          ArrayList<String> lcsResultList =new ArrayList<String>();

          while (i < text1WordCount && j < text2WordCount) {
              if (text1Words[i].equals(text2Words[j])) {
                  lcsResultList.add(text2Words[j]);
                  i++;
                  j++;
              } 
              else if (solutionMatrix[i + 1][j] >= solutionMatrix[i][j + 1]) {
                  i++;
              }
              else {
                  j++;
              }
          }
          return lcsResultList;
      }

      /**
       * Normalizes given string by deleting \n, \t and extra spaces.
       * 
       * @param text - initial string
       * @return - normalized string
       */
       private String normalizeText(String text) {

           text = text.trim();
           text = text.replace("\n", " ");
           text = text.replace("\t", " ");

           while (text.contains("  ")) {
               text = text.replace("  ", " ");
           }
           return text;
       }

      /**
       * Returns colored inserted/deleted text representation of given two texts.
       * Uses longestCommonSubsequenceList to determine colored sections.
       *
       * @param text1
       * @param text2
       * @param lcsList
       * @param insertColor
       * @param deleteColor
       * @return - colored result text
       */
      private String markTextDifferences(String text1, String text2,
        ArrayList<String> lcsList, String insertColor, String deleteColor) {

        StringBuffer stringBuffer = new StringBuffer();

        if (text1 != null && text2 != null && lcsList != null) {

            String[] text1Words = text1.split(" ");
            String[] text2Words = text2.split(" ");
            int i = 0, j = 0, word1LastIndex = 0, word2LastIndex = 0;

            for (int k = 0; k < lcsList.size(); k++) {
                for (i = word1LastIndex, j = word2LastIndex;
                    i < text1Words.length && j < text2Words.length;) {
                    if (text1Words[i].equals(lcsList.get(k)) &&
                        text2Words[j].equals(lcsList.get(k))) {
                        stringBuffer.append("<SPAN>" + lcsList.get(k) + " </SPAN>");
                        word1LastIndex = i + 1;
                        word2LastIndex = j + 1;
                        i = text1Words.length;
                        j = text2Words.length;
                    }
                    else if (!text1Words[i].equals(lcsList.get(k))) {
                        for (; i < text1Words.length &&
                            !text1Words[i].equals(lcsList.get(k)); i++) {
                            stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                                deleteColor + "'>" + text1Words[i] + " </SPAN>");
                        }
                    } else if (!text2Words[j].equals(lcsList.get(k))) {
                        for (; j < text2Words.length &&
                            !text2Words[j].equals(lcsList.get(k)); j++) {
                            stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                                insertColor + "'>" + text2Words[j] + " </SPAN>");
                        }
                    }
                }
            }
            for (; word1LastIndex < text1Words.length; word1LastIndex++) {
                stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                    deleteColor + "'>" + text1Words[word1LastIndex] + " </SPAN>");
            }
            for (; word2LastIndex < text2Words.length; word2LastIndex++) {
                stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                    insertColor + "'>" + text2Words[word2LastIndex] + " </SPAN>");
            }
        }
        return stringBuffer.toString();
    }
}
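Since the two core methods do not depend on Vaadin, the same idea can be tried from a plain main method. The following is a minimal sketch under stated assumptions, not the component above: DiffHtmlSketch, diffToHtml, span and escape are hypothetical names, the HTML is produced in a single walk over the LCS table instead of from a separate LCS list, and words are HTML-escaped, which the listing does not do.

```java
class DiffHtmlSketch {

    static final String INSERT_COLOR = "#99FFCC";
    static final String DELETE_COLOR = "#CB6D6D";

    /** Escapes the characters that are significant in HTML text. */
    static String escape(String word) {
        return word.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps a word in a span; a null color means the word is unchanged. */
    static String span(String word, String color) {
        String style = color == null ? "" : " style='background-color:" + color + "'";
        return "<span" + style + ">" + escape(word) + " </span>";
    }

    /** Marks words only in oldText as deleted and words only in newText as inserted. */
    static String diffToHtml(String oldText, String newText) {
        String[] a = oldText.split(" ");
        String[] b = newText.split(" ");

        // len[i][j] = LCS length of a[i..] and b[j..].
        int[][] len = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                len[i][j] = a[i].equals(b[j])
                        ? len[i + 1][j + 1] + 1
                        : Math.max(len[i + 1][j], len[i][j + 1]);
            }
        }

        // Walk the table once, emitting a span for every word of both texts.
        StringBuilder html = new StringBuilder();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i].equals(b[j])) {
                html.append(span(a[i], null));
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                html.append(span(a[i++], DELETE_COLOR));
            } else {
                html.append(span(b[j++], INSERT_COLOR));
            }
        }
        // Whatever remains in either text has no counterpart in the other.
        while (i < a.length) html.append(span(a[i++], DELETE_COLOR));
        while (j < b.length) html.append(span(b[j++], INSERT_COLOR));
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(diffToHtml("the quick fox", "the slow fox"));
    }
}
```

For the inputs in main, "quick" is wrapped in the delete color, "slow" in the insert color, and "the" and "fox" are left unstyled. The inputs are assumed to be already normalized to single spaces.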

Reference: Generic Text Comparison Tool with LCS Approach, from our JCG partner Cagdas Basaraner at the CodeBuild blog.


Translated from: https://www.javacodegeeks.com/2012/05/generic-text-comparison-tool-with-lcs.html
