问题来源
最近统计DevOps所有开发人员提交的代码列表,还需要知道代码关联的业务,并且每天统计发送邮件。于是准备使用Python加定时任务解决这个问题,但遇到了编码上的问题:
- DevOps用的数据库是MySQL,字符集编码是utf8
- 业务系统用的数据库是Oracle11g,字符集编码是ZHS16GBK
- 统计结果,发送邮件用的是python3.7,主机的字符集编码是gbk
解决字符集编码问题的过程
以下过程仅供参考
参考资料:
Oracle 11G函数整理(转换函数) https://blog.csdn.net/ceclar123/article/details/12199179
Oracle字符集详细介绍 https://www.cnblogs.com/rootq/articles/2049324.html
1. 获取字符集编码方式:
- MySQL
--查看MySQL字符集编码
show variables like '%character%';
show variables like 'collation%';
--查看MySQL所支持的字符集
show charset;
- Oracle
--查看Oracle字符集编码
SELECT USERENV('LANGUAGE') FROM DUAL;
SELECT * FROM V$NLS_PARAMETERS WHERE PARAMETER LIKE '%CHARACTERSET%';
2. MySQL数据提取:
- utf8汉字转十六进制
-- 转换字符串编码
SELECT CONVERT('张三' USING GBK) FROM DUAL;
-- 字符串编码 转 十六进制字符串
SELECT HEX(CONVERT('张三' USING GBK)) FROM DUAL;
输出结果:
张三
D5C5C8FD
- 十六进制转utf8
-- 十六进制字符串 转成 二进制数据
SELECT UNHEX('D5C5C8FD') FROM DUAL;
-- 将字符串用gbk编码输出
SELECT CONVERT(UNHEX('D5C5C8FD') USING GBK) FROM DUAL;
-- 如需用utf8输出,还需要再转一次
SELECT CONVERT(CONVERT(UNHEX('D5C5C8FD') USING GBK) USING UTF8) FROM DUAL;
输出结果
����
张三
张三
3. Python将提取的数据存入Oracle:
import binascii
# 将十六进制字符串 转成 二进制数据
binascii.unhexlify(row[0])
# 将十六进制字符串 转成 gbk字符串
binascii.unhexlify(row[0]).decode('gbk')
-
注意
- 其中遇到了oracle客户端编码问题,导致的python报错
- 需要修改主机的环境变量 NLS_LANG 来指定客户端字符集。具体设置可以自行百度
4. Oracle统计出结果
- 十六进制转gbk(数据库默认编码)汉字
-- gbk汉字的十六进制 转换成 VARCHAR2编码格式的汉字
SELECT UTL_RAW.CAST_TO_VARCHAR2('D5C5C8FD') FROM DUAL;
-- utf16汉字的十六进制 转换成 NVARCHAR2编码格式的汉字
SELECT UTL_RAW.CAST_TO_NVARCHAR2('5F204E09') FROM DUAL;
输出结果:
张三
张三
- gbk(数据库默认编码)汉字转十六进制
-- 转换字符串编码为VARCHAR2(ZHS16GBK)编码格式的汉字(类似MySQL的CONVERT)以下三个语句等价
SELECT TRANSLATE('张三' USING CHAR_CS) FROM DUAL;
SELECT CONVERT('张三', 'ZHS16GBK', 'ZHS16GBK') FROM DUAL;
SELECT '张三' FROM DUAL;
-- 转换字符串编码为NVARCHAR2(AL16UTF16)编码格式的汉字(类似MySQL的CONVERT)以下二个语句等价
SELECT TRANSLATE('张三' USING NCHAR_CS) FROM DUAL; -- 展示成NVARCHAR2
SELECT CONVERT('张三', 'AL16UTF16', 'ZHS16GBK') FROM DUAL; -- 展示成乱码
-- VARCHAR2(ZHS16GBK)字符串编码转换成十六进制字符串
SELECT RAWTOHEX(TRANSLATE('张三' USING CHAR_CS)) FROM DUAL;
SELECT RAWTOHEX(CONVERT('张三', 'ZHS16GBK', 'ZHS16GBK')) FROM DUAL;
SELECT RAWTOHEX('张三') FROM DUAL;
-- NVARCHAR2(AL16UTF16)字符串编码转换成十六进制字符串
SELECT RAWTOHEX(TRANSLATE('张三' USING NCHAR_CS)) FROM DUAL;
SELECT RAWTOHEX(CONVERT('张三', 'AL16UTF16', 'ZHS16GBK')) FROM DUAL;
输出结果:
张三
张三
张三
张三
_ N
D5C5C8FD
D5C5C8FD
D5C5C8FD
5F204E09
5F204E09
5. Python获取统计结果,发送邮件
发送邮件的方法
import smtplib
from email.mime.text import MIMEText
from email.header import Header
# 发送邮件方法
def send_email(_mailserver: str, _sender: str, _senderpwd: str, _receiver: list, _mail_content: str, _subject: str, _mail_type: object = 'plain'):
try:
message = MIMEText(_mail_content, _subtype=_mail_type, _charset='utf-8')
message['From'] = Header('%s' % _sender, 'utf-8') # 发送者
message['To'] = Header('%s' % trans_receivers_to_str(_receiver[0]), 'utf-8') # 接收者
message['Cc'] = Header('%s' % trans_receivers_to_str(_receiver[1]), 'utf-8') # 抄送
message['Bcc'] = Header('%s' % trans_receivers_to_str(_receiver[2]), 'utf-8') # 密送
message['Subject'] = Header(_subject, 'utf-8')
smtpobj = smtplib.SMTP(_mailserver)
smtpobj.login(_sender, _senderpwd)
smtpobj.sendmail(_sender, trans_receivers_to_list(_receiver), message.as_string())
print('邮件发送成功')
except smtplib.SMTPException as Argument:
print('Error: 无法发送邮件' + Argument)
def trans_receivers_to_list(_receiver):
# receiver list: use join to trans to list
receivers = []
if type(_receiver) == list:
for rec in _receiver:
for recI in range(len(rec)):
receivers.append(rec[recI])
elif type(_receiver) == tuple:
for recI in range(len(_receiver)):
receivers.append(_receiver[recI])
return receivers
def trans_receivers_to_str(_receiver):
# receiver list: use join to trans to str
receivers = []
if type(_receiver) == list:
for rec in _receiver:
for recI in range(len(rec)):
receivers.append(rec[recI])
elif type(_receiver) == tuple:
for recI in range(len(_receiver)):
receivers.append(_receiver[recI])
return ';'.join(receivers)
# 调用发送邮件
defaultMailServer = 'xx.xxx.com'
defaultSender = 'xxx@xxx.com'
defaultSenderPwd = '123456'
defaultReceiver = [('To@xxx.com',), ('Cc@xxx.com',), ('Bcc@xxx.com',)]
send_email(defaultMailServer, defaultSender, defaultSenderPwd, defaultReceiver, "邮件正文", "邮件标题", _mail_type='html')
总结
- 从MySQL查出数据,汉字使用GBK编码的十六进制字符串
SELECT HEX(CONVERT('张三' USING GBK)) FROM DUAL;
- Python中将十六进制字符串转成中文,后续Oracle相关操作以及发送邮件均是gbk编码。完成任务!
name = binascii.unhexlify(row[0]).decode('gbk')