I need to generate an MD5 hash in Python 3 to compare with an MD5 hash that was generated on Python 2, but the output of json.dumps() differs because Python 2 puts the elements in a different order, so the MD5 results do not match.
How can I generate the same result?
The code:
import json
from hashlib import md5
content = {'name': 'Marcelo', 'age': 30, 'address': {'country': 'Brasil'}, 'interests': [{'id': 1, 'description': 'tecnology'}]}
print('CONTENT:', json.dumps(content))
# Note: the hash is computed over str(content), not over the JSON text
print('MD5:', md5(str(content).encode('UTF-8')).hexdigest())
The Python 2.7 result:
('CONTENT:', {'interests': [{'id': 1, 'description': 'tecnology'}], 'age': 30, 'name': 'Marcelo', 'address': {'country': 'Brasil'}})
('MD5:', 'a396f6997fb420992d96b37e8f37938d')
The Python 3.6 result:
CONTENT: {'name': 'Marcelo', 'age': 30, 'address': {'country': 'Brasil'}, 'interests': [{'id': 1, 'description': 'tecnology'}]}
MD5: 40c601152725654148811749d9fc8878
Edit:
I can't change the MD5 generated on Python 2. Is there any way to reproduce the default order from Python 2 on Python 3?
Solution
Before Python 3.6, dictionaries do not preserve key order. From Python 3.6 on, the keys keep their insertion order (or, for a dictionary literal, the order in which they appear in the literal); this was a CPython implementation detail in 3.6 and became a language guarantee in 3.7. The Python 2.7 dictionary is unordered, so its iteration order does not necessarily match insertion order.
If you were to reload the json dictionary in both cases, it would still be equal (dictionary equality does not depend on order).
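As a small illustration of that point, here is a sketch using two made-up dictionaries (`a` and `b` are hypothetical names, not from the question). It assumes Python 3.7 or later, where insertion order is guaranteed:

```python
import json

# The same key-value pairs, inserted in two different orders.
a = {'name': 'Marcelo', 'age': 30}
b = {'age': 30, 'name': 'Marcelo'}

# Dictionary equality ignores order.
print(a == b)

# The serialized text follows insertion order, so the two strings differ
# (and so would any hash computed over them).
print(json.dumps(a) == json.dumps(b))

# Parsing either string gives back equal dictionaries.
print(json.loads(json.dumps(a)) == json.loads(json.dumps(b)))
```

So the data is the same in both cases; only the text that gets hashed is different.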
So there is no error here. The difference is because of how dictionaries are ordered in different Python versions.
json.dump and json.dumps write the key-value pairs in the dictionary's iteration order. To get a consistent order on both versions, use the collections.OrderedDict type. If you are calling json.load or json.loads to get dictionaries, you also need to pass object_pairs_hook=OrderedDict, for example json.loads(text, object_pairs_hook=OrderedDict), which preserves the key order of the JSON text. Note that object_hook=OrderedDict is not enough: it receives an already-built dict, whose order is not reliable before Python 3.6.
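A minimal sketch of the OrderedDict approach, under some assumptions: the key order chosen here is arbitrary (it is not the Python 2 order from the question), and the hash is taken over the JSON text rather than over str(content), because the repr of an OrderedDict is not the same across Python versions. When reading JSON back, the parameter that receives the pairs in their original order is object_pairs_hook.

```python
import json
from collections import OrderedDict
from hashlib import md5

# Build the data with an explicit key order (this order is an arbitrary choice).
content = OrderedDict([
    ('name', 'Marcelo'),
    ('age', 30),
    ('address', OrderedDict([('country', 'Brasil')])),
    ('interests', [OrderedDict([('id', 1), ('description', 'tecnology')])]),
])

# json.dumps writes the keys in the OrderedDict's order.
text = json.dumps(content)
print('CONTENT:', text)

# Hash the JSON text, which is the part that is now order-stable.
digest = md5(text.encode('UTF-8')).hexdigest()
print('MD5:', digest)

# object_pairs_hook hands the decoder's (key, value) pairs to OrderedDict,
# so the order found in the JSON text survives a round trip.
reloaded = json.loads(text, object_pairs_hook=OrderedDict)
print(json.dumps(reloaded) == text)
```

With ASCII-only data like this, the same code should produce the same JSON text on Python 2.7 and Python 3, and therefore the same digest. It does not recover a hash that was already computed from an unordered Python 2 dict.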
There is no trivial way to make Python 3 dictionaries use the Python 2 ordering, so moving both the Python 2 and Python 3 code bases to OrderedDict is the more maintainable solution.