MurmurHash3 in Java: different results from the Python and Java Murmur3 implementations

I have two programs, one in Python and one in Java, that need to hash the same string with Murmur3.

Python version 2.7.9:

mmh3.hash128('abc')

Gives 79267961763742113019008347020647561319L.

Java, using Guava 18.0:

HashCode hashCode = Hashing.murmur3_128().newHasher().putString("abc", StandardCharsets.UTF_8).hash();

This gives the string "6778ad3f3f3f96b4522dca264174a23b"; converting it to a BigInteger gives 137537073056680613988840834069010096699.

How can I get the same result from both?

Thanks

Solution

Here's how to get the same result from both:

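// Assumes Guava's Hashing, Bytes and Lists, java.math.BigInteger, and a static import of StandardCharsets.UTF_8.
byte[] mm3_le = Hashing.murmur3_128().hashString("abc", UTF_8).asBytes();

// Reverse the little-endian hash bytes so that BigInteger reads them as big-endian.
// If the top byte could be 0x80 or above, new BigInteger(1, mm3_be) keeps the value non-negative.
byte[] mm3_be = Bytes.toArray(Lists.reverse(Bytes.asList(mm3_le)));

assertEquals("79267961763742113019008347020647561319",
    new BigInteger(mm3_be).toString());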

The hash code's bytes need to be treated as little-endian, but BigInteger interprets bytes as big-endian. You were presumably using new BigInteger(hex, 16) to create the BigInteger, but the output of HashCode.toString() is actually a series of pairs of hexadecimal digits representing the hash bytes in the same order that asBytes() returns them (little-endian). (You can also reverse those pairs of hexadecimal digits to get a hex number that produces the same result when passed to new BigInteger(reversedHex, 16).)
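The same byte-order point can be checked from the Python side. This is only a rough sketch, assuming Python 3 (int.from_bytes and int.to_bytes do not exist in the Python 2.7 used in the question). The helper names hex_to_int and int_to_le_hex are made up for illustration and are not part of mmh3 or Guava.

```python
# Sketch for Python 3; hex_to_int and int_to_le_hex are hypothetical helper names.

def hex_to_int(hex_bytes: str, byteorder: str) -> int:
    """Interpret a string of hex byte pairs as an unsigned integer."""
    return int.from_bytes(bytes.fromhex(hex_bytes), byteorder)


def int_to_le_hex(value: int, length: int = 16) -> str:
    """Render an unsigned integer as little-endian hex byte pairs."""
    return value.to_bytes(length, "little").hex()


# HashCode.toString() output quoted in the question.
guava_hex = "6778ad3f3f3f96b4522dca264174a23b"

# Reading the pairs as one big-endian number is what new BigInteger(hex, 16) does.
as_big_endian = hex_to_int(guava_hex, "big")

# Reading the same pairs as little-endian bytes is the interpretation described above.
as_little_endian = hex_to_int(guava_hex, "little")

print(as_big_endian)
print(as_little_endian)
```

Going the other way, int_to_le_hex(n) turns an integer such as the one returned by mmh3.hash128 into hex pairs in the order that HashCode.toString() prints them, which may be the easier comparison when debugging.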

I think the documentation of toString() is somewhat confusing because of the way it refers to "big endian"; it doesn't actually mean that the output of the method is the hexadecimal number representing the bytes interpreted as big endian.

We have an open issue for adding asBigInteger() to HashCode.
