RMySQL encoding issue on Windows - Spanish Character ñ

421 篇文章 15 订阅

https://stackoverflow.com/questions/43091376/rmysql-encoding-issue-on-windows-spanish-character-%C3%B1

While using RMySQL::dbWriteTable function in R to write a table to MySQL on Windows I get an error message concerning the character [ñ].

The simplified example is:

    table <- data.frame(a=seq(1:3), b=c("És", "España", "Compañía"))
    table

 a        b
1 1       És
2 2   España
3 3 Compañía


db <- dbConnect(MySQL(), user = "####", password = "####", dbname ="test", host= "localhost")

RMySQL::dbWriteTable(db, name="test1", table, overwrite=T, append=F )

Error in .local(conn, statement, ...) : 
  could not run statement: Invalid utf8 character string: 'Espa'

As you can see, there is no problem with the accents ("És") but there is with the ñ character ("España").

On the other hand, there is no problem with MySQL since this query works fine:

INSERT INTO test.test1 (a,b)
values (1, "España");

Things I have already tried previous to write the table:

  1. Encoding(x) <- "UTF-8" for all table.

  2. iconv(x, "UTF-8", "UTF-8") for all table.

  3. Sent pre-query: dbSendQuery(db, "SET NAMES UTF8;")

  4. Change MySQL table Collation to: "utf-8-general, latin-1, latin-1-spanish...)

*Tried "Latin-1" encoding and didn't work either.

I have been looking for an answer to this question for a while with no luck.
Please help!

Versions:

MySQL 5.7.17

R version 3.3.0

Sys.getlocale()

[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=C"

PS: Works fine in Linux environment but I am stuck with Windows in my current project :(

share improve this question
 
 
There's nothing wrong with Windows, an OS that uses UTF16 since its inception (OK, UCS2 up to ~2000). MySQL does have issues since it doesn't support Unicode natively (ie as nvarchar) but depends on varchar fields with a UTF8 collation. That does lead to problems, if an application saves ASCII data with a different codepage to the field. –  Panagiotis Kanavos  Mar 29 at 11:01
 
Most of the time, the problem is the application itself, especially in C and C++. Instead of using multi-byte or Unicode types (char16_t, utf16string), applications use ASCII buffers and types and to store UTF8-encoded strings. To be fair, these were added in C++11. R is notorious for this - R itself compiles with Unicode support on Windows, but third-party packages are often inconsistent. Some use Unicode, some allow you to specify a locale, and the worst case, some depend on the user's locale to specify the codepage, just like any non-Unicode application –  Panagiotis Kanavos  Mar 29 at 11:04 
 
Before saying "it's a Windows problem", imagine having to work with data from multiple locales. You can't hard-code just one locale in LC_COLLATE when you have to deal with multiple codepages –  Panagiotis Kanavos  Mar 29 at 11:08 
 
Check How to set charset for MySQL in RODBC? RODBC allows you to set the connection's character set to UTF8. –  Panagiotis Kanavos  Mar 29 at 12:01
 
I was able to isolate the problem to the GUI. RStudio 1.0.136 and Rgui silently convert Unicode literals. Whatever you type is treated as a non-Unicode string. On my machine (Greek Locale), the ñ was converted to nR Tools for Visual Studio on the other hand doesn't have this issue, the characters were preserved –  Panagiotis Kanavos  Mar 29 at 13:20 

1 Answer

This works for me in Windows:

write.csv(table, file = "tmp.csv", fileEncoding = "utf8", quote = FALSE, row.names = FALSE)

db <- dbConnect(MySQL(), user = "####", password = "####", dbname ="test", host= "localhost")

dbWriteTable( db, value = "tmp.csv", name = "test1", append = TRUE, row.names = FALSE, sep = ",", quote='\"', eol="\r\n")
share improve this answer

评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值