异常信息:
Incorrect string value: ‘\xF0\xA5\xA2\x89’ for column ‘name’ at row 1
排查思路:
1、先排除目标表是否为utf8mb4的表,GBase8a只支持在utf8mb4上插入4字节字符
2、排除源表的情况,正常情况下,4字节的数据没办法直接插入utf8的表中
3、一般是使用load data方式入库,怀疑是否为load data导致
测试步骤:
1)导入数据
# 制造测试数据
[gbase@keep-gbase8a-node5 ~]$ echo '1,𧒽
> 2,𥢉
> 3,𥦉
> 4,𥞉
> 5,正
> 6,常' > t_mb4.txt
[gbase@keep-gbase8a-node5 ~]$ gccli
# 创建库表
gbase> drop database if exists test_shengpi;
Query OK, 2 rows affected (Elapsed: 00:00:00.06)
gbase> create database test_shengpi;
Query OK, 1 row affected (Elapsed: 00:00:00.01)
gbase> use test_shengpi;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)
gbase> create table t_utf8(id int, name varchar(100)) charset=utf8;
Query OK, 0 rows affected (Elapsed: 00:00:00.04)
gbase> create table t_utf8mb4(id int, name varchar(100)) charset=utf8mb4;
Query OK, 0 rows affected (Elapsed: 00:00:00.04)
# 加载数据
gbase> load data infile 'file://192.168.122.31/home/gbase/t_mb4.txt' into table test_shengpi.t_utf8 data_format 3 fields terminated by ',' lines terminated by '\n';
Query OK, 6 rows affected (Elapsed: 00:00:00.09)
Task 582219 finished, Loaded 6 records, Skipped 0 records
2)测试insert into
gbase> insert into t_utf8mb4 select * from t_utf8;
ERROR 1708 (HY000): [192.168.122.31:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: Incorrect string value: '\xF0\xA7\x92\xBD' for column 'name' at row 1
SQL: SELECT /*192.168.122.31_920_8_2024-07-03_15:03:45*/ /*+ TID('1968442') */ `vcname000001.test_shengpi.t_utf8`.`id` AS `id`, `vcname000001.test_shengpi.t_utf8`.`name` AS `name` FROM `test_shengpi`.`t_utf8_n2` `vcname000001.test_shengpi.t_utf8` target into server (HOST '192.168.122.31,192.168.122.47', PORT 5050, USER 'root', PASSWORD '', DATABASE 'test_shengpi', TABLE 't_utf8mb4_n2', COMMENT 'table_host 0 0 1, scn 582220, distribution 1' )
3)排查日志信息
[gbase@keep-gbase8a-node5 ~]$ vim /data/192.168.122.31/gnode/log/gbase/express.log
2024-07-03 15:03:45.590 Table test_shengpi.t_utf8mb4_n2, col 1, name, convert exception, value 𧒽, src type PA_VARCHAR(100), src charset utf8_general_ci, dst type PA_VARCHAR(100), dst charset 45: \xF0\xA7\x92\xBD, query SELECT /*192.168.122.31_920_8_2024-07-03_15:03:45*/ /*+ TID('1968442') */ `vcname000001.test_shengpi.t_utf8`.`id` AS `id`, `vcname000001.test_shengpi.t_utf8`.`name` AS `name` FROM `test_shengpi`.`t_utf8_n2` `vcname000001.test_shengpi.t_utf8` target into server (HOST '192.168.122.31,192.168.122.47', PORT 5050, USER 'root', PASSWORD '', DATABASE 'test_shengpi', TABLE 't_utf8mb4_n2', COMMENT 'table_host 0 0 1, scn 582220, distribution 1' )
2024-07-03 15:03:45.590 Incorrect string value: '\xF0\xA7\x92\xBD' for column 'name' at row 1
2024-07-03 15:03:45.591 (gns_host: 192.168.122.31) Incorrect string value: '\xF0\xA7\x92\xBD' for column 'name' at row 1 table:./test_shengpi/metadata/t_utf8mb4_n2
2024-07-03 15:03:45.598 (gns_host: 192.168.122.32) Incorrect string value: '\xF0\xA5\xA6\x89' for column 'name' at row 1 table:./test_shengpi/metadata/t_utf8mb4_n1
2024-07-03 15:03:45.636 Table ./test_shengpi/metadata/t_utf8mb4_n2, recv locks 1, flush rollback return 0.
2024-07-03 15:03:45.636 Table ./test_shengpi/metadata/t_utf8mb4_n1, recv locks 1, flush rollback return 0.
日志分析: 在insert的过程中,对name字段做了convert操作,由于字符集问题导致报错
4)解决方法
直接执行insert into 会报错,如下:
[gbase@keep-gbase8a-node5 ~]$ gccli
GBase client 9.5.3.28.149123692. Copyright (c) 2004-2024, GBase. All Rights Reserved.
gbase> use test_shengpi;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)
gbase> insert into t_utf8mb4 select * from t_utf8;
ERROR 1708 (HY000): [192.168.122.31:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: Incorrect string value: '\xF0\xA5\xA6\x89' for column 'name' at row 1
SQL: SELECT /*192.168.122.31_1456_2_2024-07-03_15:23:03*/ /*+ TID('1968483') */ `vcname000001.test_shengpi.t_utf8`.`id` AS `id`, `vcname000001.test_shengpi.t_utf8`.`name` AS `name` FROM `test_shengpi`.`t_utf8_n2` `vcname000001.test_shengpi.t_utf8` target into server (HOST '192.168.122.31,192.168.122.47', PORT 5050, USER 'root', PASSWORD '', DATABASE 'test_shengpi', TAB
gbase>
解决方法1:
(1)把t_utf8表的数据导出成数据文件,
select * from test_shengpi.t_utf8 into outfile '/home/gbase/t_oututf8.txt'
fields terminated by ',' LINES TERMINATED BY '\N' WRITEMODE BY OVERWRITES;
(2)把数据文件再导入t_utf8mb4表
load data infile 'file://192.168.122.31/home/gbase/t_oututf8.txt/t_oututf8.txt' into table test_shengpi.t_utf8
data_format 3 fields terminated by ',' lines terminated by '\n';
具体代码:
[gbase@keep-gbase8a-node5 ~]$ gccli
gbase> select * from test_shengpi.t_utf8 into outfile '/home/gbase/t_oututf8.txt'
-> fields terminated by ',' WRITEMODE BY OVERWRITES;
Query OK, 6 rows affected (Elapsed: 00:00:00.05)
gbase> load data infile 'file://192.168.122.31/home/gbase/t_oututf8.txt/t_oututf8.txt' into table test_shengpi.t_utf8mb4
-> data_format 3 fields terminated by ',' lines terminated by '\n';
Query OK, 6 rows affected (Elapsed: 00:00:00.08)
Task 582228 finished, Loaded 6 records, Skipped 0 records
gbase> set names utf8mb4;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)
gbase> select * from test_shengpi.t_utf8mb4;
+------+------+
| id | name |
+------+------+
| 1 | 𧒽 |
| 5 | 正 |
| 3 | 𥦉 |
| 4 | 𥞉 |
| 6 | 常 |
| 2 | 𥢉 |
+------+------+
6 rows in set (Elapsed: 00:00:00.02)
解决方法2:
具体方法: 在执行SQL时,对数据进行十六进制转换操作
具体代码:
# 使用hex对字段的数据进行十六进制转换
# 再用unhex对十六进制转为正常数据
gbase> insert into t_utf8mb4 select id,unhex(hex(name)) from t_utf8;
Query OK, 6 rows affected (Elapsed: 00:00:00.10)
Records: 6 Duplicates: 0 Warnings: 0
5)数据查询对比
(1)会话为utf8字符集
[gbase@keep-gbase8a-node5 ~]$ gccli
GBase client 9.5.3.28.149123692. Copyright (c) 2004-2024, GBase. All Rights Reserved.
gbase> set names utf8;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)
gbase> show variables like '%character%';
+--------------------------+------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_sort | binary |
| character_set_system | utf8mb4 |
| character_sets_dir | /data/192.168.122.31/gcluster/server/share/gbase/charsets/ |
+--------------------------+------------------------------------------------------------+
9 rows in set (Elapsed: 00:00:00.00)
gbase> select * from test_shengpi.t_utf8;
+------+------+
| id | name |
+------+------+
| 3 | 𥦉 |
| 6 | 常 |
| 1 | 𧒽 |
| 4 | 𥞉 |
| 2 | 𥢉 |
| 5 | 正 |
+------+------+
6 rows in set (Elapsed: 00:00:00.04)
gbase> select * from test_shengpi.t_utf8mb4;
+------+------+
| id | name |
+------+------+
| 1 | ? |
| 4 | ? |
| 2 | ? |
| 5 | 正 |
| 3 | ? |
| 6 | 常 |
+------+------+
6 rows in set (Elapsed: 00:00:00.03)
(2)会话为utf8mb4字符集
gbase> set names utf8mb4;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)
gbase> show variables like '%character%';
+--------------------------+------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8 |
| character_set_sort | binary |
| character_set_system | utf8mb4 |
| character_sets_dir | /data/192.168.122.31/gcluster/server/share/gbase/charsets/ |
+--------------------------+------------------------------------------------------------+
9 rows in set (Elapsed: 00:00:00.00)
gbase> select * from test_shengpi.t_utf8;
+------+------+
| id | name |
+------+------+
| 3 | ???? |
| 6 | 常 |
| 2 | ???? |
| 5 | 正 |
| 1 | ???? |
| 4 | ???? |
+------+------+
6 rows in set (Elapsed: 00:00:00.02)
gbase> select * from test_shengpi.t_utf8mb4;
+------+------+
| id | name |
+------+------+
| 3 | 𥦉 |
| 6 | 常 |
| 1 | 𧒽 |
| 4 | 𥞉 |
| 2 | 𥢉 |
| 5 | 正 |
+------+------+
6 rows in set (Elapsed: 00:00:00.02)