Python dictionary encoding: a Python dictionary contains encoded values

I have a pandas DataFrame, oParameterData, which I built by querying Hadoop through a Hive ODBC connection. I use it to populate a Python dictionary called oParameter:

import pyodbc
import pandas

oConnexionString = 'Driver={ClouderaHive};[...]'

oConnexion = pyodbc.connect(oConnexionString, autocommit=True)
oConnexion.setencoding(encoding='utf-8')

oQueryParameter = "select * from my_db.my_table;"
oParameterData = pandas.read_sql(oQueryParameter, oConnexion)

oCursor = oConnexion.cursor()

# Build a parameter dictionary from each row of the result set
for oRow in oParameterData.index:
    oParameter = {}
    oParameter['pTableName'] = oParameterData.loc[oRow, 'game']
    oParameter['pDataPartition'] = oParameterData.loc[oRow, 'partition']
    oParameter['pDataLocation'] = oParameterData.loc[oRow, 'data_path']
    oParameter['pAvroSchemaURL'] = oParameterData.loc[oRow, 'schema_path']

When I print the whole dictionary, I get the following:

>>> print(oParameter)
{'pDataLocation': '/\x00d\x00a\x00t\x00a\x00/\x00d\x00a\x00t\x00a\x00l\x00a\x00k\x00e\x00/\x00t\x00m\x00p\x00/\x00k\x00a\x00f\x00k\x00a\x00d\x00u\x00m\x00p\x00e\x00r\x00/\x00d\x00a\x00t\x00a\x00/\x00H\x00e\x00r\x00o\x00/\x00c\x00o\x00n\x00t\x00e\x00x\x00t\x00.\x00s\x00t\x00a\x00r\x00t\x00.\x00G\x00a\x00m\x00e\x00M\x00o\x00d\x00e\x00\x00/\x00v\x00=\x001\x00.\x00x\x00', 'pAvroSchemaURL': '/\x00d\x00a\x00t\x00a\x00/\x00d\x00a\x00t\x00a\x00l\x00a\x00k\x00e\x00/\x00t\x00m\x00p\x00/\x00k\x00a\x00f\x00k\x00a\x00d\x00u\x00m\x00p\x00e\x00r\x00/\x00d\x00a\x00t\x00a\x00/\x00H\x00e\x00r\x00o\x00/\x00c\x00o\x00n\x00t\x00e\x00x\x00t\x00.\x00s\x00t\x00a\x00r\x00t\x00.\x00G\x00a\x00m\x00e\x00M\x00o\x00d\x00e\x00\x00/\x00c\x00o\x00n\x00t\x00e\x00x\x00t\x00.\x00s\x00t\x00a\x00r\x00t\x00.\x00G\x00a\x00m\x00e\x00M\x00o\x00d\x00e\x00_\x001\x00.\x00x\x00.\x00a\x00v\x00s\x00c\x00', 'pTableName': 'h\x00e\x00r\x00o\x00_c\x00o\x00n\x00t\x00e\x00x\x00t\x00', 'pDataPartition': 'd\x00t\x00'}

But when I print the values one by one, they display properly:

>>> print(oParameter['pTableName'])
'hero_game_context_gamemode'

>>> print(oParameter['pDataPartition'])
'dt'
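The two printouts most likely differ only in how the same strings are displayed. Printing a dictionary formats each value with repr(), which escapes non-printable characters such as NUL as \x00, while printing a single string writes its characters directly, and terminals typically render NUL as nothing. The short sketch below illustrates this with the 'd\x00t\x00' value taken from the dictionary printout; the variable names are only for illustration.

```python
# A value as it appears in the dictionary printout (each character followed by NUL).
value = 'd\x00t\x00'
params = {'pDataPartition': value}

# print(value) writes the raw characters; the NULs are there but usually invisible.
print(value)

# print(params) formats each value with repr(), so the NULs are shown escaped.
print(params)
print(repr(value))

# The string is not the intended 'dt': it has four characters, two of them NUL.
print(len(value), value == 'dt')
```

So the individual values only look correct; they still contain the NUL characters, which would also end up in any query string built from them.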

Could you please explain why this happens, and how to get the dictionary properly encoded?

I use these parameters in subsequent queries, described here: Hive ParseException in Drop Table Statement, and I suspect those queries fail because of this encoding issue.

Solution

After investigating further, I found that the text encoding was not configured correctly when connecting to Hadoop with pyodbc.
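The pattern in the dictionary output, with every character followed by \x00, is what ASCII text looks like when it is stored as UTF-16LE (two bytes per character, the second one zero for ASCII) and those bytes are then decoded with a byte-oriented codec such as UTF-8. That this is exactly what happened between the Cloudera Hive driver and pyodbc is an assumption inferred from the symptom, not something confirmed by the driver documentation. The sketch below reproduces the mismatch using only built-in str/bytes methods.

```python
# Simulate a driver handing back UTF-16LE bytes that are then decoded as UTF-8.
original = 'dt'
raw = original.encode('utf-16-le')   # two bytes per character: b'd\x00t\x00'

# NUL is a valid UTF-8 character, so the wrong codec raises no error.
misdecoded = raw.decode('utf-8')
print(raw)
print(repr(misdecoded))

# Decoding the same bytes with the matching codec gives the intended text back.
correct = raw.decode('utf-16-le')
print(correct)
```

Because decoding with the wrong codec succeeds silently here, the problem only shows up later, for example as a parse error when the values are used in a query.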

I was connecting like this:

import pyodbc
import pandas

oConnexionString = 'Driver={ClouderaHive};[...]'

oConnexion = pyodbc.connect(oConnexionString, autocommit=True)
oConnexion.setencoding(encoding='utf-8')

I changed it to connect like this:

import pyodbc
import pandas

oConnexionString = 'Driver={ClouderaHive};[...]'

oConnexion = pyodbc.connect(oConnexionString, autocommit=True)
# Decode both narrow (SQL_CHAR) and wide (SQL_WCHAR) character data as UTF-8
oConnexion.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
oConnexion.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
# Encode text sent to the driver (SQL statements, string parameters) as UTF-8
oConnexion.setencoding(encoding='utf-8')

Now, when I build my dictionary from the DataFrame, it displays properly.
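If data has already been fetched with the wrong decoding and re-querying is inconvenient, one stopgap is to strip the NUL characters from the dictionary values. This is a workaround, not what the author did; the helper names strip_nuls and clean_parameters are hypothetical, and the approach assumes the values are plain ASCII (paths, table and partition names), where the only damage is a NUL after each character. Fixing the decoding on the connection, as shown in the answer, remains the proper solution.

```python
def strip_nuls(value):
    """Remove NUL characters from a string; leave other values unchanged.

    Hypothetical helper: only safe when the underlying text is plain ASCII,
    where the mis-decoding merely interleaves a NUL after each character.
    """
    if isinstance(value, str):
        return value.replace('\x00', '')
    return value


def clean_parameters(params):
    """Return a new dictionary with NULs stripped from every string value."""
    return {key: strip_nuls(val) for key, val in params.items()}


# Illustrative values in the same shape as the dictionary printout.
oParameter = {
    'pDataPartition': 'd\x00t\x00',
    'pDataLocation': '/\x00d\x00a\x00t\x00a\x00',
}
cleaned = clean_parameters(oParameter)
print(cleaned)  # {'pDataPartition': 'dt', 'pDataLocation': '/data'}

# Sanity check before reusing the values in Hive queries.
assert not any('\x00' in v for v in cleaned.values() if isinstance(v, str))
```

Non-ASCII text would not survive this kind of mis-decoding intact in the first place, so for such data re-fetching with the corrected setdecoding calls is the only reliable option.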
