I am trying to push user account data from Active Directory to our MySQL server. The transfer itself works, but umlauts and other special characters end up in the database as escaped byte sequences instead of the actual characters.
Active Directory returns strings in this format, for example: M\xc3\xbcller
This is the UTF-8 encoding of Müller, but I want to write Müller to my database, not M\xc3\xbcller.
I tried converting the string with this line, but it results in the same string in the database:
tempEntry[1] = tempEntry[1].decode("utf-8")
If I run print "M\xc3\xbcller".decode("utf-8") in the Python console, the output is correct.
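One possible explanation for the difference, sketched below under an assumption: the value coming from Active Directory may contain literal backslash escape sequences (the four characters \, x, c, 3) rather than the two raw UTF-8 bytes that the console literal produces. In that case .decode("utf-8") has nothing to convert and returns the text unchanged. The variable names are made up for illustration, and the snippet uses Python 3 syntax with byte literals standing in for Python 2 str values.

```python
# Assumption: the directory value holds literal backslash escapes,
# while the console literal holds real UTF-8 bytes.
real_bytes = b"M\xc3\xbcller"       # 7 bytes: genuine UTF-8 for "M\u00fcller"
escaped_text = b"M\\xc3\\xbcller"   # 13 bytes: backslash, 'x', 'c', '3', ...

decoded_real = real_bytes.decode("utf-8")        # the two bytes become one character
decoded_escaped = escaped_text.decode("utf-8")   # pure ASCII, so nothing changes

print(len(real_bytes), repr(decoded_real))
print(len(escaped_text), repr(decoded_escaped))
```

Comparing the two lengths (or the repr of the value) is a quick way to check which of the two forms a given string actually has.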
Is there a way to insert this string correctly? The web developer who consumes the data needs exactly this format; I don't know why he cannot convert the string directly in PHP.
Additional info: I am using MySQLdb. The table and column collation is utf8_general_ci (character set utf8).
Solution
I found the solution to my problem. Decoding the string with .decode('unicode_escape').encode('iso8859-1').decode('utf8') finally worked, and everything is now inserted as it should be. The full solution from the other question can be found here: Working with unicode encoded Strings from Active Directory via python-ldap
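As a minimal sketch of what that chain does step by step, assuming the input is a byte string that contains literal \xNN escape sequences: fix_ad_string is a hypothetical helper name, not part of python-ldap or MySQLdb, and the snippet is written in Python 3 syntax (a b"..." literal corresponds to a plain str in Python 2).

```python
def fix_ad_string(raw):
    # Hypothetical helper; raw is a byte string with literal "\xNN" escapes.
    # Step 1: interpret the escapes, so "\xc3" becomes the code point U+00C3.
    text = raw.decode("unicode_escape")
    # Step 2: ISO 8859-1 maps code points 0-255 one-to-one onto bytes,
    # which recovers the original UTF-8 byte sequence.
    utf8_bytes = text.encode("iso8859-1")
    # Step 3: decode those bytes as UTF-8 to get the intended characters.
    return utf8_bytes.decode("utf8")


fixed = fix_ad_string(b"M\\xc3\\xbcller")
print(repr(fixed))
```

Plain ASCII values pass through unchanged. Step 2 raises UnicodeEncodeError if the input contains escapes above \xff (such as \u20ac), so this only fits data where every non-ASCII character arrives as \xNN byte escapes.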