I'm stuck trying to fetch UTF-8 text in a MySQL database from R. I'm running R on OS X (tried both via the GUI and command line), where the default locale is en_US.UTF-8, and no matter what I try, the query result shows "?" for all non-ASCII characters.
I've tried setting options(encoding='UTF-8'), DBMSencoding='UTF-8' when connecting via ODBC, setting Encoding(res$str)
I'm totally stumped. Any ideas why it's not working, or other things I should try?
Here's a fairly minimal test case:
$ mysql -u root
mysql> CREATE DATABASE test;
mysql> USE test;
mysql> CREATE TABLE test (str VARCHAR(10)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)
mysql> INSERT INTO test (str) VALUES ('こんにちは');
Query OK, 1 row affected (0.00 sec)
mysql> select * from test;
+-----------------+
| str |
+-----------------+
| こんにちは |
+-----------------+
1 row in set (0.00 sec)
Querying the table in R using both RODBC and RMySQL shows "?????" for the str column:
> con
> sqlQuery(con, 'SELECT * FROM rtest.test')
str
1 ?????
> library(RMySQL)
Loading required package: DBI
> con
> dbGetQuery(con, 'SELECT * FROM rtest.test')
str
1 ?????
For completeness, here's my sessionInfo:
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RMySQL_0.9-3 DBI_0.2-5 RODBC_1.3-6
解决方案
Thanks to @chooban I found out the connection session was using latin1 instead of utf8. Here are two solutions I found:
For RMySQL, after connecting run the query SET NAMES utf8 to change the connection character set.
For RODBC, connect using CharSet=utf8 in the DSN string. I was not able to run SET NAMES via ODBC.
This question pointed me in the right direction.