java indexof效率_Java indexOf函数比Rabin-Karp更有效?搜索文本的效率

几周前,我向Stackoverflow提出了一个关于创建一个有效的算法来搜索大块文本中的模式的问题.现在我使用String函数indexOf进行搜索.有一个建议是使用Rabin-Karp作为替代.我写了一个测试程序如下,以测试Rabin-Karp的实现如下.

public static void main(String[] args) {

String test = "Mary had a little lamb whose fleece was white as snow";

String p = "was";

long start = Calendar.getInstance().getTimeInMillis();

for (int x = 0; x < 200000; x++)

test.indexOf(p);

long end = Calendar.getInstance().getTimeInMillis();

end = end -start;

System.out.println("Standard Java Time->"+end);

RabinKarp searcher = new RabinKarp("was");

start = Calendar.getInstance().getTimeInMillis();

for (int x = 0; x < 200000; x++)

searcher.search(test);

end = Calendar.getInstance().getTimeInMillis();

end = end -start;

System.out.println("Rabin Karp time->"+end);

}

这里是我使用的Rabin-Karp的实现:

import java.math.BigInteger;

import java.util.Random;

public class RabinKarp {

private String pat; // the pattern // needed only for Las Vegas

private long patHash; // pattern hash value

private int M; // pattern length

private long Q; // a large prime, small enough to avoid long overflow

private int R; // radix

private long RM; // R^(M-1) % Q

static private long dochash = -1L;

public RabinKarp(int R, char[] pattern) {

throw new RuntimeException("Operation not supported yet");

}

public RabinKarp(String pat) {

this.pat = pat; // save pattern (needed only for Las Vegas)

R = 256;

M = pat.length();

Q = longRandomPrime();

// precompute R^(M-1) % Q for use in removing leading digit

RM = 1;

for (int i = 1; i <= M - 1; i++)

RM = (R * RM) % Q;

patHash = hash(pat, M);

}

// Compute hash for key[0..M-1].

private long hash(String key, int M) {

long h = 0;

for (int j = 0; j < M; j++)

h = (R * h + key.charAt(j)) % Q;

return h;

}

// Las Vegas version: does pat[] match txt[i..i-M+1] ?

private boolean check(String txt, int i) {

for (int j = 0; j < M; j++)

if (pat.charAt(j) != txt.charAt(i + j))

return false;

return true;

}

// check for exact match

public int search(String txt) {

int N = txt.length();

if (N < M)

return -1;

long txtHash;

if (dochash == -1L) {

txtHash = hash(txt, M);

dochash = txtHash;

} else

txtHash = dochash;

// check for match at offset 0

if ((patHash == txtHash) && check(txt, 0))

return 0;

// check for hash match; if hash match, check for exact match

for (int i = M; i < N; i++) {

// Remove leading digit, add trailing digit, check for match.

txtHash = (txtHash + Q - RM * txt.charAt(i - M) % Q) % Q;

txtHash = (txtHash * R + txt.charAt(i)) % Q;

// match

int offset = i - M + 1;

if ((patHash == txtHash) && check(txt, offset))

return offset;

}

// no match

return -1; // was N

}

// a random 31-bit prime

private static long longRandomPrime() {

BigInteger prime = new BigInteger(31, new Random());

return prime.longValue();

}

// test client

}

Rabin-Karp的实现工作原理是它返回正在找的字符串的正确偏移量.令人惊讶的是,当我运行测试程序时发生的时间统计.他们来了:

Standard Java Time->39

Rabin Karp time->409

这真的很令人惊讶不仅Rabin-Karp(至少在这里实现)不比标准的java indexOf String函数更快,它比一个数量级慢.我不知道是什么(如果有的话).有人有想法吗?

谢谢,

埃利奥特

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值