datastage mysql wire_DataStage 九、数据交换到MySQL以及乱码问题

DataStage序列文章

说明

默认情况下datastage 9.1版本中只支持Enterprise or Commercial版本的MySQL数据库,如果要使用MySQL Community版本数据库则可以通过安装和配置mysql-connector-odbc驱动来完成;以下的实验以datastage 9.1.0、MySQL Community 5.0.67 Edition和mysql驱动mysql-connector-odbc-5.3.4-1.el6.x86_64.rpm在Linux上实验为例。

1 安装MySQL ODBC驱动

在网上下载mysql-connector-odbc-5.3.4-1版本驱动,然后安装;

#rpm -ivh mysql-connector-odbc-5.3.4-1.el6.x86_64.rpm

Preparing... ########################################### [100%]

1:mysql-connector-odbc ########################################### [100%]

Success: Usage count is 1

Success: Usage count is 1

安装结束后得到如下目录和文件信息

/usr/share/doc/mysql-connector-odbc-5.3.4

/usr/share/mysql

/usr/lib64/mysql

/usr/lib64/mysql/libmysqlclient.so.16.0.0

/usr/lib64/mysql/libmysqlclient.so.16

/usr/lib64/mysql/libmysqlclient_r.so.16.0.0

/usr/lib64/mysql/libmysqlclient_r.so.16

/usr/lib64/libmyodbc5w.so

/usr/lib64/libmyodbc5a.so

/usr/bin/myodbc-installer

libmyodbc5w.so就是我们要用到的驱动文件。

2 配置ODBC

编辑$DSHOME/.odbc.ini文件,根据里面的示例创建mysql odbc;

[odbc_mysql]

Driver=/usr/lib64/libmyodbc5w.so

DriverUnicodeType=1

Description=DataDirect 7.0 MySQL Wire Protocol

Database=mysql

Server=150.18.44.99

User=root

Password=

Port=3306

QueryTimeout=0

ReportCodepageConversionErrors=0

TreatBinaryAsChar=0

TrustStore=

TrustStorePassword=

ValidateServerCertificate=1

Option=3

这里Driver指定新安装的驱动文件;

DriverUnicodeType指定字节长度,默认情况下datastage使用附带的DataDirect ODBC管理驱动和支持多种常用的数据库,但当使用第三方驱动时,第三方驱动与DataDirect ODBC可能不完全兼容,比如一般情况下ds DataDirect ODBC驱动使用4个字节长度的字符(比如UTF16),而第三方驱动使用2个字节长度的字符,这时如果不指定DriverUnicodeType则会发生Unicode converter buffer overflow错误;所以当在Driver下面设置了DriverUnicodeType属性时,DataDirect ODBC管理和维护统一的字节长度;

Server 指定服务器,不要弄错了,.odbc.ini文件中的示例是HostName;

Port指定端口;

接着将ODBC配置到项目目录下的uvodbc.config 文件中;

DBMSTYPE = ODBC

完成这些操作后再通过dssh测试odbc;

#dssh

DataStage Command Language 9.1 Licensed Materials - Property of IBM

(c) Copyright IBM Corp. 1997, 2012 All Rights Reserved.

dscluster logged on: Thursday, November 19, 2015 20:52

>logto dscluster

>ds_connect odbc_mysql

odbc_mysql> show tables;

Tables_in_mysql

---------------

char_test

columns_priv

db

func

help_category

help_keyword

help_relation

help_topic

host

3 乱码的产生

在一个字符集为AMERICAN_AMERICA.AL32UTF8的Oracle数据库中建立一张表并存入中文的数据;

create table char_test(

col1 varchar2(300),

col2 varchar2(50)

);

insert into char_test(col1,col2) values('曹操','三国演义中的武王');

insert into char_test(col1,col2) values('刘备','三国演义中的蜀王');

insert into char_test(col1,col2) values('诸葛亮','三国演义中的军师');

在MySQL数据库中建立相同的数据表;

create table char_test(

col1 varchar(300),

col2 varchar(50)

);

创建ds job完成oracle到mysql的数据交换;

d5503eb41fcbd74f712934cc2ac34ad9.png

数据交换完成后发现MySQL数据表中全是乱码;

mysql> select * from char_test;

+------+----------+

| col1 | col2 |

+------+----------+

| ?? | ???????? |

| ?? | ???????? |

| ??? | ???????? |

+------+----------+

4 乱码分析

根据MySQL文档得知,MySQL数据库可以在数据库层面、数据表层面、数据表列层面设置字符集,并且它们的设置都继承于上一级,所以如果在创建表时,没有为列指定字符集,则列继承于表的字符集;如果没有为表指定字符集,则表继承数据库的字符集。当客户端向服务端发送请求时字符将产生这样的转换过程(character_set_client(客户端来源数据使用的字符集)、character_set_connection(连接层字符集)、character_set_results(查询结果字符集)),so 根据这些来判断,首先想到的是MySQL数据库层面使用的字符集和字符序;

mysql> show variables like 'character%';

+--------------------------+--------------------------------------+

| Variable_name | Value |

+--------------------------+--------------------------------------+

| character_set_client | utf8 |

| character_set_connection | utf8 |

| character_set_database | latin1 |

| character_set_filesystem | binary |

| character_set_results | utf8 |

| character_set_server | latin1 |

| character_set_system | utf8 |

| character_sets_dir | D:\develope\MySQL5.0\share\charsets\ |

+--------------------------+--------------------------------------+

mysql> show variables like 'collation%';

+----------------------+-------------------+

| Variable_name | Value |

+----------------------+-------------------+

| collation_connection | utf8_general_ci |

| collation_database | latin1_swedish_ci |

| collation_server | latin1_swedish_ci |

+----------------------+-------------------+

此时数据库使用的字符集和字符序都是latin1,而在创建表时均未指定列和表使用的字符集,所以数据表和数据表列都继承了数据库的字符集;

mysql> show table status like 'char_test';

+-----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-----

----------------+-------------+------------+-------------------+----------+----------------+----------------------+

| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Crea

te_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |

+-----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-----

----------------+-------------+------------+-------------------+----------+----------------+----------------------+

| char_test | InnoDB | 10 | Compact | 0 | 0 | 16384 | 0 | 0 | 0 | NULL | 2015

-11-20 10:59:49 | NULL | NULL | latin1_swedish_ci | NULL | | InnoDB free: 9216 kB |

+-----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-----

----------------+-------------+------------+-------------------+----------+----------------+----------------------+

1 row in set (0.00 sec)

mysql> show full columns from char_test;

+-------+--------------+-------------------+------+-----+---------+-------+---------------------------------+---------+

| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |

+-------+--------------+-------------------+------+-----+---------+-------+---------------------------------+---------+

| col1 | varchar(300) | latin1_swedish_ci | YES | | NULL | | select,insert,update,references | |

| col2 | varchar(50) | latin1_swedish_ci | YES | | NULL | | select,insert,update,references | |

+-------+--------------+-------------------+------+-----+---------+-------+---------------------------------+---------+

所以当客户端插入中文字符时,MySQL数据库做了如下转换:

utf8=>utf8=>latin1

原始数据中含有的(\u0000 ~ \u00ff)范围以外的Unicode字符会因为无法在latin1字符集中表示而被转换为"?"(0x3f)符号,以后查询不管连接字符集如何设置都无法恢复其原始内容。

5 乱码的解决方法

在分析了数据库字符集和乱码原因后知道了乱码的产生的真正原因了,那如何处理呢?抱着这些问题我做了三种可行的设想和解决方法:

1. 将表的字符集从latin1转换为utf8;

2. 将数据库默认字符设置为utf8;

3. 创建数据库时明确指定字符集;

5.1 将表的字符集从latin1转换为utf8

这种实现方案主要是为了不影响系统运行和其它用户并且生产迫切急需的情况下进行的,相对来说它的影响最小,速度快。

alter table char_test default character set=utf8;

alter table char_test convert to character set utf8;

注意:命令1是把表的默认字符设置为utf8(不包括列哦),命令2是把表转换为utf8(包括列哦)

mysql> show table status like 'char_test';

+-----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-----

----------------+-------------+------------+-----------------+----------+----------------+----------------------+

| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Crea

te_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |

+-----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-----

----------------+-------------+------------+-----------------+----------+----------------+----------------------+

| char_test | InnoDB | 10 | Compact | 3 | 5461 | 16384 | 0 | 0 | 0 | NULL | 2015

-11-20 10:59:49 | NULL | NULL | utf8_general_ci | NULL | | InnoDB free: 9216 kB |

+-----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-----

mysql> show full columns from char_test;

+-------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+

| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |

+-------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+

| col1 | varchar(300) | utf8_general_ci | YES | | NULL | | select,insert,update,references | |

| col2 | varchar(50) | utf8_general_ci | YES | | NULL | | select,insert,update,references | |

+-------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+

mysql> truncate table char_test;

Query OK, 3 rows affected (0.05 sec)

此时再重新交换数据,数据就正确了。

mysql> set names gbk;

Query OK, 0 rows affected (0.00 sec)

mysql> select * from char_test;

+--------+------------------+

| col1 | col2 |

+--------+------------------+

| 刘备 | 三国演义中的蜀王 |

| 曹操 | 三国演义中的武王 |

| 诸葛亮 | 三国演义中的军师 |

+--------+------------------+

3 rows in set (0.01 sec)

注意:如果你使用一些客户端工具,比如MySQL-Front,连接数据库时要选择好字符集,如果没有选择默认是数据库设定的字符集,本文中数据库默认字符集是latin1,如果你没有手动选择为utf8,当查询如上实验表是又发生了utf8=>latin1,查询出来的结果肯定是乱码。

5.2 将数据库默认字符设置为utf8

这是一种长远考虑,比较安全,比较适合开发环境的方案,因为随着项目规模的扩大,人员的增退因素,不可能每建一个表都要去考虑字符集因素。但缺点就是它必须重启MySQL服务,并影响系统的运行。

编辑my.ini文件,修改如下的属性值为utf8;

[client]

port=3306

default-character-set=utf8

[mysql]

default-character-set=utf8

# The default character set that will be used when a new schema or table is

# created and no character set is defined

default-character-set=utf8

注意:每个MySQL版本的字符集设置都不一样,详细信息请查阅版本文档。

然后重启MySQL服务;服务启动完成后进一步检查数据库字符集信息;

mysql> show variables like 'character%';

+--------------------------+--------------------------------------+

| Variable_name | Value |

+--------------------------+--------------------------------------+

| character_set_client | utf8 |

| character_set_connection | utf8 |

| character_set_database | utf8 |

| character_set_filesystem | binary |

| character_set_results | utf8 |

| character_set_server | utf8 |

| character_set_system | utf8 |

| character_sets_dir | D:\develope\MySQL5.0\share\charsets\ |

+--------------------------+--------------------------------------+

mysql> show variables like 'coll%';

+----------------------+-----------------+

| Variable_name | Value |

+----------------------+-----------------+

| collation_connection | utf8_general_ci |

| collation_database | utf8_general_ci |

| collation_server | utf8_general_ci |

+----------------------+-----------------+

都为utf8了,根据上面的分析,如果此时创建表没有知道字符集,那么表和列的字符集都会继承数据库字符集;

drop table char_test;

create table char_test(

col1 varchar(300),

col2 varchar(50)

);

mysql> show table status like 'char_test';

+-----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-----

----------------+-------------+------------+-----------------+----------+----------------+----------------------+

| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Crea

te_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |

+-----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-----

----------------+-------------+------------+-----------------+----------+----------------+----------------------+

| char_test | InnoDB | 10 | Compact | 0 | 0 | 16384 | 0 | 0 | 0 | NULL | 2015

-11-20 10:59:49 | NULL | NULL | utf8_general_ci | NULL | | InnoDB free: 9216 kB |

+-----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-----

----------------+-------------+------------+-----------------+----------+----------------+----------------------+

1 row in set (0.00 sec)

mysql> show full columns from char_test;

+-------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+

| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |

+-------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+

| col1 | varchar(300) | utf8_general_ci | YES | | NULL | | select,insert,update,references | |

| col2 | varchar(50) | utf8_general_ci | YES | | NULL | | select,insert,update,references | |

+-------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+

2 rows in set (0.00 sec)

没错吧!此时再交换数据也不会乱码了。

mysql> select * from char_test;

+--------+------------------+

| col1 | col2 |

+--------+------------------+

| 刘备 | 三国演义中的蜀王 |

| 曹操 | 三国演义中的武王 |

| 诸葛亮 | 三国演义中的军师 |

+--------+------------------+

5.3 创建数据库时明确指定字符集

这种方案适用于早期数据库规划阶段和当数据库需要经常迁移时使用;

create database sydb default character set utf8 collate utf8_general_ci;

mysql> use sydb

Database changed

mysql> show variables like 'character%';

+--------------------------+--------------------------------------+

| Variable_name | Value |

+--------------------------+--------------------------------------+

| character_set_client | utf8 |

| character_set_connection | utf8 |

| character_set_database | utf8 |

| character_set_filesystem | binary |

| character_set_results | utf8 |

| character_set_server | utf8 |

| character_set_system | utf8 |

| character_sets_dir | D:\develope\MySQL5.0\share\charsets\ |

+--------------------------+--------------------------------------+

8 rows in set (0.00 sec)

6 总结

乱码是数据库和数据交换中常见的问题之一,导致的原因很多,有字符集不一致的、有字符集不兼容的、有转换出错的、有应用程序导致的等;Oracle在创建数据库时使用character set 参数定义数据库的字符集,客户端中使用NLS_LANG变量定义客户端环境的字符集,当客户端和服务端的字符集不一样时,服务端先执行字符集转换,然后在存储数据;MySQL中字符集设置相对比较灵活,可以在数据库层面、数据表层面和数据表列层面设置,大大提高了数据表字符集扩展和管理。虽然每个数据库对字符的处理方式也不尽相同,但总的思想是相同的,字符集也是遵循标准或一定规则的,所以阅读官方文档了解原理和基础知识是一切的开始。

--The end(2015-11-20)

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值