Java unique string hash: how do I generate an (almost) unique hash ID for objects?

How can I get an ID for my objects that makes it easy to distinguish them from one another?

class MyClass {

private String s;

private MySecondClass c;

private Collection coll;

// ..many more

public Result calculate() {

/* use all field values recursively to calculate the result */

/* takes considerable amount of time. Implemented */

return result;

}

public String hash() {

/* use all field values recursively to generate a unique identifier */

// ?????

}

}

calculate() usually takes ~40 seconds to complete. Thus, I do not want to call it multiple times.

MyClass objects are quite huge (~60 MB). The Result value of the calculation will only be ~100 KB.

Whenever I am about to run the calculation on an object, my program should check whether it has already been done earlier with the exact same values, recursively. If so, it should look up the result in (e.g.) a HashMap instead. Basically, the MyClass objects themselves could be used as keys, but the HashMap will hold 30-200 elements, and I obviously don't want to store all of them in full size. That's why I want to store 30-200 hash/result pairs instead.

So, I thought I'd generate an ID (hash) over all values inside my MyClass object. How do I do that? This way, I can use that very hash to look up the result. I am aware that a hash such as MD5 does not guarantee 100% uniqueness, because multiple objects might have the same hash. However, if I store at most 200 elements keyed by MD5, the chance of the same hash occurring twice will be negligible, I think. There are 16^32 ≈ 3.4e38 different hash codes possible. I'll be happy to hear anybody's comments about it, or see other approaches.
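As a rough check of that estimate: the birthday bound puts the probability of any collision among n uniformly distributed b-bit hashes at about n(n-1)/2^(b+1). The sketch below assumes the MD5 output behaves like a uniform 128-bit value; the class and method names are made up for illustration.

```java
// Hypothetical helper, not part of the code in the question.
class CollisionEstimate {
    // Birthday-bound approximation: p ≈ n(n-1) / 2^(bits+1), valid while p is small.
    static double collisionProbability(long n, int bits) {
        double pairs = n * (n - 1) / 2.0;
        return pairs / Math.pow(2.0, bits);
    }

    public static void main(String[] args) {
        // 200 stored hashes, 128-bit digest such as MD5
        System.out.println(collisionProbability(200, 128));
    }
}
```

For 200 entries and 128 bits this comes out at roughly 6e-35, which supports the "negligible" intuition for accidental collisions.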

Once the hash is generated, I don't need that object anymore, just its respective result value.

Two separate objects with the exact same values have to return the same hash code, much like the original hashCode(), except that I am also trying to maintain uniqueness. The probability of two different objects having the same hash code should be absolutely negligible.

I don't know how to describe the problem in other words anymore. If further clarification is needed, please ask.

So how can I generate my MyClass.hash()?

The problem isn't really about how or where to store the hashes, because I don't even know how I can generate an (almost) unique hash for an entire object, that will always be the same for same values.

Clarification:

When talking of size, I mean the serialized size on the hard drive.

I don't think putting the objects in a HashMap would decrease their size. That's why I want to store some hash String instead.

When you put an object in a HashMap (either as a key or as a value), you don't create a copy of it. So storing 200 large objects in a HashMap consumes little more memory than the 200 objects themselves.

I do not store 200 large objects themselves. I only keep 200 different results (as values) which are small, and 200 respective hashCodes of MyClass objects which are also very small. The point of "hashing" the objects is to be able to work with the hash instead of with the object values themselves.

Solution

If you want to create a hash of all of your data, you'll need to make sure that you can get all the values in byte format from them.

To do this, it's best if you have control of all the classes (except the Java built-in ones, perhaps), so that you can add a method to them to do this.

Given that your object is very large, it will probably not be a good idea to just collect it into one big byte array recursively and then calculate the digest. It's probably better to create the MessageDigest object, and add a method such as:

void updateDigest( MessageDigest md );

to each of them. You can declare an interface for this if you wish. Each such method will collect the class's own data that participates in the "big calculation" and update the md object with that data. After updating all its own data, it should recursively call the updateDigest method of any classes in it that have that method defined.
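A minimal sketch of what such an interface and one implementing class could look like. The names `Digestable`, `Point` and `DigestableDemo` are hypothetical, and SHA-256 is just one possible choice of digest.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical interface name; every class that takes part in the hash implements it.
interface Digestable {
    void updateDigest(MessageDigest md);
}

// Hypothetical leaf class with two primitive fields and no nested objects.
class Point implements Digestable {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void updateDigest(MessageDigest md) {
        // Feed the fields in a fixed order so equal objects produce equal input.
        ByteBuffer buff = ByteBuffer.allocate(2 * Integer.BYTES);
        buff.putInt(x);
        buff.putInt(y);
        buff.flip();
        md.update(buff);
    }
}

class DigestableDemo {
    // Creates a fresh digest, lets the object feed its data, and returns the hash bytes.
    static byte[] digestOf(Digestable d) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            d.updateDigest(md);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Two `Point` objects with the same coordinates then yield the same digest, regardless of object identity.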

For example, if you have a class with fields:

int myNumber;

String myString;

MyClass myObj; // MyClass has the updateDigest method

Set<MyClass> otherObjects;

Then its updateDigest method should be doing something like this:

// Update the "plain" values that are in the current object

byte[] myStringBytes = myString.getBytes(StandardCharsets.UTF_8);

ByteBuffer buff = ByteBuffer.allocate(

Integer.SIZE / 8 // For myNumber

+ Integer.SIZE / 8 // For myString's length

+ myStringBytes.length

);

buff.putInt( myNumber );

buff.putInt( myStringBytes.length );

buff.put( myStringBytes );

buff.flip();

md.update(buff);

// Recurse

myObj.updateDigest(md);

for ( MyClass obj : otherObjects ) {

obj.updateDigest(md);

}

The reason I added the string's length (actually, its byte representation's length) to the digest is to avoid situations where you have two String fields:

String field1 = "ABCD";

String field2 = "EF";

If you just put their bytes directly into the digest one after the other, it will have the same effect on the digest as:

String field1 = "ABC";

String field2 = "DEF";

And this may cause an identical digest to be generated for two different sets of data. So adding the length will disambiguate it.
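The effect can be checked directly. The sketch below digests both pairs of strings once without and once with a length prefix; the class and method names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo class.
class LengthPrefixDemo {
    static MessageDigest newDigest() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Feeds the raw bytes of both strings, with no separator or length.
    static byte[] digestPlain(String a, String b) {
        MessageDigest md = newDigest();
        md.update(a.getBytes(StandardCharsets.UTF_8));
        md.update(b.getBytes(StandardCharsets.UTF_8));
        return md.digest();
    }

    // Feeds each string as a 4-byte length followed by its bytes.
    static byte[] digestPrefixed(String a, String b) {
        MessageDigest md = newDigest();
        for (String s : new String[] { a, b }) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            md.update(ByteBuffer.allocate(Integer.BYTES).putInt(bytes.length).array());
            md.update(bytes);
        }
        return md.digest();
    }

    public static void main(String[] args) {
        // Same byte stream "ABCDEF" in both cases, so the digests match: true
        System.out.println(Arrays.equals(
                digestPlain("ABCD", "EF"), digestPlain("ABC", "DEF")));
        // Lengths 4,2 versus 3,3 change the byte stream, so the digests differ: false
        System.out.println(Arrays.equals(
                digestPrefixed("ABCD", "EF"), digestPrefixed("ABC", "DEF")));
    }
}
```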

I used a ByteBuffer because it's relatively convenient to add things to it like int and double.
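One thing to watch in the example is the loop over the Set: a hash-based set does not promise any particular iteration order, so two sets with equal contents could feed their elements to the digest in different orders. A possible way around this, shown here as a sketch and not as part of the snippet above, is to digest each element separately and sort the element digests before adding them. The class name `DigestNode` is hypothetical and SHA-256 is an arbitrary choice.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class with fields similar to the example above.
class DigestNode {
    int myNumber;
    String myString;
    Set<DigestNode> otherObjects = new HashSet<>();

    DigestNode(int myNumber, String myString) {
        this.myNumber = myNumber;
        this.myString = myString;
    }

    void updateDigest(MessageDigest md) {
        // Plain values first, with the string prefixed by its byte length.
        byte[] myStringBytes = myString.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buff = ByteBuffer.allocate(2 * Integer.BYTES + myStringBytes.length);
        buff.putInt(myNumber);
        buff.putInt(myStringBytes.length);
        buff.put(myStringBytes);
        buff.flip();
        md.update(buff);

        // Digest each set element on its own, then sort the fixed-width hex
        // digests so the result does not depend on the set's iteration order.
        List<String> childDigests = new ArrayList<>();
        for (DigestNode child : otherObjects) {
            childDigests.add(String.format("%064x", new BigInteger(1, child.digest())));
        }
        Collections.sort(childDigests);
        md.update(ByteBuffer.allocate(Integer.BYTES).putInt(childDigests.size()).array());
        for (String hex : childDigests) {
            md.update(hex.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Digest of this object and everything reachable through otherObjects.
    byte[] digest() {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            updateDigest(md);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The price is one extra digest computation per set element. For ordered collections such as a List, iterating directly as in the original snippet is fine.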

If you have classes that you don't control and cannot add a method to, you'll have to be creative. After all, you do get the values from every such class for the calculation, so you may call the same methods and digest their results. Or you could digest their serialized form if they are serializable.
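For the serialization route, the object does not have to be written into a byte array first. A sketch using `java.security.DigestOutputStream`, which updates the digest as bytes pass through it; the helper class `SerializedDigest` and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for classes you cannot modify but that are Serializable.
class SerializedDigest {
    // Sink that throws the bytes away; only the digest needs to see them.
    private static final class DiscardingStream extends OutputStream {
        @Override public void write(int b) { }
        @Override public void write(byte[] b, int off, int len) { }
    }

    // Streams the serialized form of obj into md without buffering it all in memory.
    static void updateDigest(MessageDigest md, Serializable obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new DigestOutputStream(new DiscardingStream(), md))) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    // Convenience wrapper: SHA-1 digest of a single serializable object.
    static byte[] digestOf(Serializable obj) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            updateDigest(md, obj);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This only helps if objects with equal values serialize to identical bytes. That may not hold for every class, for example when the serialized form depends on the iteration order of a hash-based collection, so it is worth testing with your actual types.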

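So in your top-level class you'll create the md object using MessageDigest.getInstance("SHA") or whatever digest you wish to use. In the standard JDK provider, "SHA" is an alias for SHA-1, which produces 20 bytes; that is why the key below is formatted as 40 hex digits.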

MessageDigest md = null;

try {

md = MessageDigest.getInstance("SHA");

} catch (NoSuchAlgorithmException e) {

// Handle properly

}

// Call md.update with class's own data and recurse using

// updateDigest methods of internal objects

// Compute the digest

byte [] result = md.digest();

// Convert to string to be able to use in a hash map

BigInteger mediator = new BigInteger(1,result);

String key = String.format("%040x", mediator);

(You could actually use the BigInteger itself as the key).
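With the key in hand, the lookup described in the question becomes a small map wrapper. A sketch, assuming the expensive calculation can be passed in as a `Supplier`; the class `ResultCache` is hypothetical and its type parameter `R` stands in for the question's Result type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical cache keyed by the hex digest string.
class ResultCache<R> {
    private final Map<String, R> results = new HashMap<>();

    // Returns the cached result for key, running the calculation only on a miss.
    R getOrCompute(String key, Supplier<R> calculation) {
        return results.computeIfAbsent(key, k -> calculation.get());
    }

    int size() {
        return results.size();
    }
}
```

Usage would look something like `Result r = cache.getOrCompute(obj.hash(), obj::calculate);`, after which the large object can be discarded and only the key and the result stay in memory.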
