1、问题背景
1.1 问题描述
在使用保存特殊字符时,数据库已经设置了utf8mb4编码格式,但是在调用接口保存含有额数字符的字段时,还是会报格式转化问题:SQLException: Incorrect string value: '\xF0\x9F\x8C\x9D\xE8\xA1...'
Caused by: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x8C\x9D\xE8\xA1...' for column 'company_name' at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:964) ~[mysql-connector-java-5.1.43.jar:5.1.43]
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973) ~[mysql-connector-java-5.1.43.jar:5.1.43]
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909) ~[mysql-connector-java-5.1.43.jar:5.1.43]
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527) ~[mysql-connector-java-5.1.43.jar:5.1.43]
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680) ~[mysql-connector-java-5.1.43.jar:5.1.43]
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2494) ~[mysql-connector-java-5.1.43.jar:5.1.43]
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858) ~[mysql-connector-java-5.1.43.jar:5.1.43]
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1197) ~[mysql-connector-java-5.1.43.jar:5.1.43]
1.2、问题原因
mysql-connector-java-5.1.43.jar:5.1.43,版本过低导致,数据库驱动在提交sql之前参数码值校验不支持utf8mb4编码格式导致。
2、问题定位
MySQL mysql-connector-java-5.1.x驱动中,校验设置字符串过程涉及 一下几个关键步骤:
2.1 解析连接参数
当用户通过JDBC URL指定characterEncoding
参数时,驱动会解析该参数。
相关代码在ConnectionImpl
类的构造函数中:
configureClientCharacterSet(boolean dontCheckServerMatch) throws SQLException {
configureClientCharacterSet(boolean dontCheckServerMatch) 处理的客户端的编码格式,5.1.47之前,默认不支持utf8mb4;
boolean utf8mb4Supported = versionMeetsMinimum(5, 5, 2);
boolean useutf8mb4 = utf8mb4Supported && (CharsetMapping.UTF8MB4_INDEXES.contains(this.io.serverCharsetIndex));
// UTF8MB4_INDEXES 集合中的utf8mb4 的码值没有33 的, serverCharsetIndex 初始化值为33
初始化数据
// ConnectionImpl.java
private void coreConnect(Properties mergedProps) throws SQLException, IOException {
int newPort = 3306;
String newHost = "localhost";
String protocol = mergedProps.getProperty(NonRegisteringDriver.PROTOCOL_PROPERTY_KEY);
this.port = newPort;
this.host = newHost;
// reset max-rows to default value
this.sessionMaxRows = -1;
// preconfigure some server variables which are consulted before their initialization from server
this.serverVariables = new HashMap<String, String>();
this.serverVariables.put("character_set_server", "utf8");
this.io = new MysqlIO(newHost, newPort, mergedProps, getSocketFactoryClassName(), getProxy(), getSocketTimeout(),
this.largeRowSizeThreshold.getValueAsInt());
// 初始化 serverCharsetIndex 为33
this.io.doHandshake(this.user, this.password, this.database);
if (versionMeetsMinimum(5, 5, 0)) {
// error messages are returned according to character_set_results which, at this point, is set from the response packet
this.errorMessageEncoding = this.io.getEncodingForHandshake();
}
}
// MysqlIO.java
void doHandshake(String user, String password, String database) throws SQLException {
// Read the first packet
this.checkPacketSequence = false;
this.readPacketSequence = 0;
Buffer buf = readPacket();
// Get the protocol version
this.protocolVersion = buf.readByte();
if ((versionMeetsMinimum(4, 1, 1) || ((this.protocolVersion > 9) && (this.serverCapabilities & CLIENT_PROTOCOL_41) != 0))) {
/* New protocol with 16 bytes to describe server characteristics */
// read character set (1 byte) s设置serverCharsetIndex 为 33
this.serverCharsetIndex = buf.readByte() & 0xff;
// read status flags (2 bytes)
this.serverStatus = buf.readInt();
checkTransactionState(0);
// read capability flags (upper 2 bytes)
this.serverCapabilities |= buf.readInt() << 16;
if ((this.serverCapabilities & CLIENT_PLUGIN_AUTH) != 0) {
// read length of auth-plugin-data (1 byte)
this.authPluginDataLength = buf.readByte() & 0xff;
} else {
// read filler ([00])
buf.readByte();
}
// next 10 bytes are reserved (all [00])
buf.setPosition(buf.getPosition() + 10);
}
}
5.1.47之后版本的判断逻辑:改变了判断逻辑通过utf8mb4Supported 来判断是否为utf8mb4
boolean utf8mb4Supported = versionMeetsMinimum(5, 5, 2);
String utf8CharsetName = connectionCollationSuffix.length() > 0 ? connectionCollationCharset
: (utf8mb4Supported ? "utf8mb4" : "utf8");
// 是通过utf8mb4Supported 转化编码费事支持了utf6mb4
if (!getUseOldUTF8Behavior()) {
if (dontCheckServerMatch || !characterSetNamesMatches("utf8") || (utf8mb4Supported && !characterSetNamesMatches("utf8mb4"))
|| (connectionCollationSuffix.length() > 0
&& !getConnectionCollation().equalsIgnoreCase(this.serverVariables.get("collation_server"))))
{
execSQL(null, "SET NAMES " + utf8CharsetName + connectionCollationSuffix, -1, null, DEFAULT_RESULT_SET_TYPE,
DEFAULT_RESULT_SET_CONCURRENCY, false, this.database, null, false);
this.serverVariables.put("character_set_client", utf8CharsetName);
this.serverVariables.put("character_set_connection", utf8CharsetName);
}
} else {
execSQL(null, "SET NAMES latin1", -1, null, DEFAULT_RESULT_SET_TYPE, DEFAULT_RESULT_SET_CONCURRENCY, false, this.database, null,
false);
this.serverVariables.put("character_set_client", "latin1");
this.serverVariables.put("character_set_connection", "latin1");
}
2.2字符集映射
驱动使用CharsetMapping
类将Java字符编码名称映射到MySQL字符集名称。例如,将"UTF-8"映射为"utf8":通过configureClientCharacterSet(boolean dontCheckServerMatch) 方法
关键源代码:
// 获取对应的MySQL字符集
String mysqlCharset = CharsetMapping.getMysqlCharsetForJavaEncoding(characterEncoding, this.getServerVersion());
if (mysqlCharset == null) {
// 抛出异常,编码不支持
throw SQLError.createSQLException("Unsupported character encoding: " + characterEncoding, ...);
}
其中还会分组设置一些编码格式;tempUTF8MB4Indexes 会放入与utf8mb4相关数据
// 此处分组封装数据
Collation notUsedCollation = new Collation(0, COLLATION_NOT_DEFINED, 0, NOT_USED);
for (int i = 1; i < MAP_SIZE; i++) {
Collation coll = collation[i] != null ? collation[i] : notUsedCollation;
COLLATION_INDEX_TO_COLLATION_NAME[i] = coll.collationName;
COLLATION_INDEX_TO_CHARSET[i] = coll.mysqlCharset;
String charsetName = coll.mysqlCharset.charsetName;
if (!charsetNameToCollationIndexMap.containsKey(charsetName) || charsetNameToCollationPriorityMap.get(charsetName) < coll.priority) {
charsetNameToCollationIndexMap.put(charsetName, i);
charsetNameToCollationPriorityMap.put(charsetName, coll.priority);
}
// Filling indexes of utf8mb4 collations
if (charsetName.equals(MYSQL_CHARSET_NAME_utf8mb4)) {
tempUTF8MB4Indexes.add(i); // utf8mb4 编码格式
}
}
3、问题解决
3.1 方案一
升级版本5.1.47以上版本,可以解决上述问题
3.2 方案二
升级大版本8.以上版本,但是需要修改配置数据库的驱动。
注:记录工作中遇到的问题及解决方案,大家有更优的解决方案,欢迎补充!