I'm struggling a bit to generate an integer ID for a given string in Python.
I thought the built-in hash function would be perfect, but the IDs it produces are sometimes too long. That's a problem because I'm limited to 64 bits as the maximum length.
My code so far: hash(s) % 10000000000.
The input strings I can expect will be 12-512 characters long.
Requirements are:
integers only
generated from provided string
ideally up to 10-12 chars long (I'll have ~5 million items only)
low probability of collision..?
I would be glad if someone can provide any tips / solutions.
Solution
I would do something like this:
>>> import hashlib
>>> m = hashlib.md5()
>>> m.update("some string".encode("utf-8"))  # update() requires bytes in Python 3
>>> str(int(m.hexdigest(), 16))[0:12]
'120665287271'
The idea:
Calculate the hash of a string with MD5 (or SHA-1 or ...) in hexadecimal form (see module hashlib)
Convert the hexadecimal string to an integer, then convert that back to a base-10 string (the result contains only digits)
Use the first 12 characters of that string.
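The steps above can be wrapped in a small helper. This is only a sketch: the name string_to_id and the digits parameter are made up for illustration, and it assumes Python 3 with the input encoded as UTF-8.

```python
import hashlib


def string_to_id(s, digits=12):
    """Return an integer ID from the leading decimal digits of the MD5 hash of `s`."""
    # hashlib works on bytes, so encode the text first (UTF-8 is an assumption).
    hex_digest = hashlib.md5(s.encode("utf-8")).hexdigest()
    # Hex string -> integer -> decimal string, then keep the leading digits.
    decimal_digits = str(int(hex_digest, 16))[:digits]
    return int(decimal_digits)


print(string_to_id("some string"))  # 120665287271, same digits as in the session above
```

Any value with at most 18 decimal digits fits in a signed 64-bit integer, so 12 digits is well within the limit.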
If characters a-f are also okay, I would do m.hexdigest()[0:12].
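As for the collision question, a rough birthday-bound estimate helps in choosing the length. The sketch below assumes the IDs behave like uniform random draws from the ID space, which is only approximately true for truncated decimal digits. collision_probability is a hypothetical helper name.

```python
import math


def collision_probability(n_items, id_space):
    """Birthday approximation: P(at least one collision) ~ 1 - exp(-n(n-1) / (2N))."""
    expected_pairs = n_items * (n_items - 1) / (2 * id_space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-expected_pairs)


n = 5_000_000  # item count from the question
print(collision_probability(n, 10**10))  # 10 decimal digits
print(collision_probability(n, 10**12))  # 12 decimal digits
print(collision_probability(n, 2**63))   # full non-negative signed 64-bit range
```

Under that assumption, 5 million items in a 12-digit space give about 12.5 expected colliding pairs, so at least one collision is nearly certain. Using the full 63 bits brings the chance down to roughly one in a million.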