这段时间很忙,呵呵,没时间写blog。
前两天看了一个文本比较的算法,算法的思路我就不多说了,主要说下我的实现。算法参考:
文本比较算法剖析(1)-如何确定最大匹配率
文本比较算法剖析(2)-如何确定最优匹配路径
我的实现步骤是:
1、计算出所有可行的路径
如下图中,N(l,r)所在的位置如果该位置匹配,则肯定要走A区。在扫描A区所有的可行路径,可行路径的标准是下一个可行并且匹配的点,在这一步先不考虑不普配的点。
关键代码 :
public
List
<
CharPoint
>
getNextPoints(
boolean
[][] comps, CharPoint p) {
if (comps == null || p == null )
throw new IllegalArgumentException( " 参数不能为空。 " );
int maxY = 0 ;
List < CharPoint > cps = new ArrayList < CharPoint > ();
for ( int i = p.getX() + 1 ; i < comps.length; i ++ ) {
if (maxY == 0 && p.getX() + 1 == i) {
maxY = comps[i].length - 1 ;
}
int maxJ = maxY;
boolean bo = false ;
for ( int j = p.getY() + 1 ; j <= maxJ; j ++ ) {
if (comps[i][j]) {
if ( ! bo) {
bo = true ;
maxY = j;
}
CharPoint cp = new CharPoint(i, j);
cp.setFromPoint(p);
cps.add(cp);
}
}
}
return cps;
}
if (comps == null || p == null )
throw new IllegalArgumentException( " 参数不能为空。 " );
int maxY = 0 ;
List < CharPoint > cps = new ArrayList < CharPoint > ();
for ( int i = p.getX() + 1 ; i < comps.length; i ++ ) {
if (maxY == 0 && p.getX() + 1 == i) {
maxY = comps[i].length - 1 ;
}
int maxJ = maxY;
boolean bo = false ;
for ( int j = p.getY() + 1 ; j <= maxJ; j ++ ) {
if (comps[i][j]) {
if ( ! bo) {
bo = true ;
maxY = j;
}
CharPoint cp = new CharPoint(i, j);
cp.setFromPoint(p);
cps.add(cp);
}
}
}
return cps;
}
2、计算出最大匹配的路径
获得所有可行的路径后再对所有路径的节点数进行判断,节点最多的则是最大匹配路径。但有可能有几个最大匹配路径。
关键代码:
for
(CharPoint cp : map.keySet()) {
// System.out.println(map.get(cp));
if (map.get(cp) == max) {
cps.add(cp);
}
}
for (CharPoint cp : cps) {
cp = TextComparerUtil.copyCharPoint(cp);
while (cp.getFromPoint() != null ) {
cp.getFromPoint().setNext(cp);
cp = cp.getFromPoint();
}
cps2.add(cp);
}
// System.out.println(map.get(cp));
if (map.get(cp) == max) {
cps.add(cp);
}
}
for (CharPoint cp : cps) {
cp = TextComparerUtil.copyCharPoint(cp);
while (cp.getFromPoint() != null ) {
cp.getFromPoint().setNext(cp);
cp = cp.getFromPoint();
}
cps2.add(cp);
}
在计算完最大匹配路径后在对路径进行补充。使路径完整
关键代码:
public
static
List
<
CharPoint
>
applySetpBySetpPath(
boolean
[][] comps,
List < CharPoint > cps) {
for (CharPoint cp : cps) {
CharPoint next = cp.getNext();
applySetpPath(comps, cp, next);
}
return cps;
}
private static void applySetpPath( boolean [][] comps, CharPoint cp,
CharPoint next) {
while (next != null ) {
if (cp.getX() >= 0 && cp.getY() >= 0
&& cp.getX() + 1 < comps.length
&& cp.getY() + 1 < comps[cp.getX()].length
&& comps[cp.getX()][cp.getY()]) {
cp.setNext( new CharPoint(cp.getX() + 1 , cp.getY() + 1 ));
cp = cp.getNext();
}
for ( int i = cp.getX() + 1 ; i <= next.getX(); i ++ ) {
cp.setNext( new CharPoint(i, cp.getY()));
cp = cp.getNext();
}
for ( int i = cp.getY() + 1 ; i <= next.getY(); i ++ ) {
cp.setNext( new CharPoint(cp.getX(), i));
cp = cp.getNext();
}
next = next.getNext();
}
}
public static CharPoint applyToEnd( boolean [][] comps, CharPoint cp) {
while (cp.getNext() != null ) {
cp = cp.getNext();
}
CharPoint next = new CharPoint(comps.length - 1 , comps[ 0 ].length - 1 );
applySetpPath(comps, cp, next);
return cp;
}
List < CharPoint > cps) {
for (CharPoint cp : cps) {
CharPoint next = cp.getNext();
applySetpPath(comps, cp, next);
}
return cps;
}
private static void applySetpPath( boolean [][] comps, CharPoint cp,
CharPoint next) {
while (next != null ) {
if (cp.getX() >= 0 && cp.getY() >= 0
&& cp.getX() + 1 < comps.length
&& cp.getY() + 1 < comps[cp.getX()].length
&& comps[cp.getX()][cp.getY()]) {
cp.setNext( new CharPoint(cp.getX() + 1 , cp.getY() + 1 ));
cp = cp.getNext();
}
for ( int i = cp.getX() + 1 ; i <= next.getX(); i ++ ) {
cp.setNext( new CharPoint(i, cp.getY()));
cp = cp.getNext();
}
for ( int i = cp.getY() + 1 ; i <= next.getY(); i ++ ) {
cp.setNext( new CharPoint(cp.getX(), i));
cp = cp.getNext();
}
next = next.getNext();
}
}
public static CharPoint applyToEnd( boolean [][] comps, CharPoint cp) {
while (cp.getNext() != null ) {
cp = cp.getNext();
}
CharPoint next = new CharPoint(comps.length - 1 , comps[ 0 ].length - 1 );
applySetpPath(comps, cp, next);
return cp;
}
3、计算最优路径
计算最优路径的方法很简单,按照作者的算法的介绍,计算最优路径的方法就是最后匹配的节点离root(0,0)点最近的那条路径
4、计算差异
计算差异也很简单,就不介绍了,直接上代码
TextDiff diffdel
=
null
;
TextDiff diffins = null ;
while (root != null ) {
if (root.getNext() != null ) {
CharPoint next = root.getNext();
if (root.getX() == next.getX() && root.getY() != next.getY()) {
if (diffdel == null ) {
int start = root.getY();
if (comps[root.getX()][root.getY()]) {
start = next.getY();
}
diffdel = new TextDiff(TextDiff.TYPE_DELETED, start, 0 );
}
diffdel.setDiffLength(diffdel.getDiffLength() + 1 );
}
if (root.getY() == next.getY() && root.getX() != next.getX()) {
if (diffins == null ) {
int start = root.getX();
if (comps[root.getX()][root.getY()]) {
start = next.getX();
}
diffins = new TextDiff(TextDiff.TYPE_INSERTED, start, 0 );
}
diffins.setDiffLength(diffins.getDiffLength() + 1 );
}
if (root.getX() < comps.length
&& root.getY() < comps[root.getX()].length
&& comps[next.getX()][next.getY()]) {
if (diffdel != null && diffdel.getDiffLength() != 0 ) {
diffs.add(diffdel);
}
if (diffins != null && diffins.getDiffLength() != 0 ) {
diffs.add(diffins);
}
diffdel = null ;
diffins = null ;
}
}
root = root.getNext();
if (root != null && root.getNext() == null ) {
if (diffdel != null && diffdel.getDiffLength() != 0 ) {
diffs.add(diffdel);
}
if (diffins != null && diffins.getDiffLength() != 0 ) {
diffs.add(diffins);
}
}
}
TextDiff diffins = null ;
while (root != null ) {
if (root.getNext() != null ) {
CharPoint next = root.getNext();
if (root.getX() == next.getX() && root.getY() != next.getY()) {
if (diffdel == null ) {
int start = root.getY();
if (comps[root.getX()][root.getY()]) {
start = next.getY();
}
diffdel = new TextDiff(TextDiff.TYPE_DELETED, start, 0 );
}
diffdel.setDiffLength(diffdel.getDiffLength() + 1 );
}
if (root.getY() == next.getY() && root.getX() != next.getX()) {
if (diffins == null ) {
int start = root.getX();
if (comps[root.getX()][root.getY()]) {
start = next.getX();
}
diffins = new TextDiff(TextDiff.TYPE_INSERTED, start, 0 );
}
diffins.setDiffLength(diffins.getDiffLength() + 1 );
}
if (root.getX() < comps.length
&& root.getY() < comps[root.getX()].length
&& comps[next.getX()][next.getY()]) {
if (diffdel != null && diffdel.getDiffLength() != 0 ) {
diffs.add(diffdel);
}
if (diffins != null && diffins.getDiffLength() != 0 ) {
diffs.add(diffins);
}
diffdel = null ;
diffins = null ;
}
}
root = root.getNext();
if (root != null && root.getNext() == null ) {
if (diffdel != null && diffdel.getDiffLength() != 0 ) {
diffs.add(diffdel);
}
if (diffins != null && diffins.getDiffLength() != 0 ) {
diffs.add(diffins);
}
}
}
(已检测到代码有错误,正在修改ing,随后发表修正文章。。。)
顺便提前祝各位新春快乐,新年吉祥,和家安康