I just hope that the text below will help someone who is struggling with charset encoding, specially when php-charset is different from the mysql-charset. Let me add that I really think that the php man-pages on the mysql-functions are lacking a lot of details on this important issues. Could someone add some useful text here?
Here is my situation. PHP5.2.4, MySql 4.1.15. A php web-application fully utf-8 encoded and a mysql database in latin1 charset.
To make this work I had to:
1. create and store all code files (php, html, inc, js, etc) in the utf-8 charset. Your editor should have an option for this, if not dump it.
2. check that your editor does not add a BOM (http://en.wikipedia.org/wiki/Byte-order_mark) at the beginning of the file. Use a hex-editor to detect them if needed.
3. Set your apache environment to utf-8 by adding 'AddDefaultCharset utf-8' to your .htaccess. If you do not use apache add 'default_charset utf-8' to your php.ini. You have to do either of them (not both), php will use the apache setting where needed.
4. Additionally add this meta-tag to your html-header: ''. This will help silly browsers (Oeps, IE again?) that ignore the utf-response-header send to them.
5. Check that the above line are listened to by check the 'page info' of your pages in firefox. It should show 2 (!!) utf-8 entries.
======== all of the above sofar has nothing to do with mysql ;-) ======
6. Do *NOT* (repeat NOT!) set the 'names' (set names *) or _ANY_ 'character set' (set character set *) (opposed to what they tell you on these pages).
7. Check the previous item by listing the results of the mysql query 'SHOW session VARIABLES'. All char_sets here should say 'latin1', except for the system one which is always 'utf8'. All collations should say 'latin1_*'. Furthermore the php function mysql_client_encoding() should also return latin1 (though I don't understand why; what does this value mean, I would think if php (being the client) is utf8 encoded this would be utf8?)
8 Finally test the above by storing this string in your db and output it in your webpage: 'Iñtërnâtiônàlizætiøn and €'.
Now what was interesting during testing and debugging of the above findings was:
1. If I would run 'mysql_set_charset('utf8')' _OR_ 'mysql_query("SET NAMES 'utf8'");' and then run a query in which I would have 'where char_column = 'abc''it would die with 'Illegal mix of collations'
2. If I would run 'mysql_query("SET character_set_client = 'utf8';"); mysql_query("SET character_set_result = 'utf8';")' the query would work BUT the non-ascii-characters would show scrambled in the browser.
3. BUT these 2 points above work just fine on my local dev-machine (php 5.2.3 & mysql 5.0.45)!!!!!!!!
This draws me to these 3 conclusions:
1. The Php-mysql-function library (5.2.+) does a fine job translating utf-8 queries & results to/from latin1! It's better to let php handle this for you then to have mysql do this.
2. Mysql (4.0.+) has 1 or more bugs (well, let's say unfinished features) that involve the charset-translations that are solved in 5.0.+.
3. It is not well enough documented! (Otherwise I would have to write this)
One last remark: clearly characters that exist in utf8 and not in latin1 (and vv.) will get lost during utf8-latin1-utf8 translation.
If any of the above is not correct or not complete feel free to correct this! (Or better yet, add a chapter to the php manual :-)
Cheers, Ronald