Python Pandas: does read_csv(encoding='utf16') only work with engine='python'?

I am using Pandas 0.18.1 with Python 2.7.10 on OS X El Capitan 10.11.2, and I cannot read a UTF-16 file with read_csv() unless I set engine='python'.

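The documentation says the Python parser is more feature-complete, so presumably Pandas defaults to the C parser, and the C parser does not yet support UTF-16. Can anyone confirm whether that is the case, or whether something else is going on here?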

Here is a minimal reproduction:

alanwagner : ~ ∴ pip2.7 freeze | grep pandas
pandas==0.18.1
alanwagner : ~ ∴ cat test.csv
col1,col2
val1,val2
alanwagner : ~ ∴ python
Python 2.7.10 (default, Oct 23 2015, 18:05:06)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.read_csv('test.csv', encoding='utf8').to_csv('test-utf16.csv', encoding='utf16', index=False)
>>>
alanwagner : ~ ∴ cat test-utf16.csv
??col1,col2
val1,val2

alanwagner : ~ ∴ python
Python 2.7.10 (default, Oct 23 2015, 18:05:06)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.read_csv('test-utf16.csv', encoding='utf16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 315, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 645, in __init__
    self._make_engine(self.engine)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 799, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 1213, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 520, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5129)
  File "pandas/parser.pyx", line 701, in pandas.parser.TextReader._get_header (pandas/parser.c:7665)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x63 in position 2: truncated data
>>> pd.read_csv('test-utf16.csv', encoding='utf16', engine='python')
   col1  col2
0  val1  val2
>>>

I was able to work around the problem by converting my file from UTF-16 to UTF-8 and then loading it into a Pandas DataFrame.
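The post does not show how the conversion was done, so the following is only a minimal sketch of one way to do it with the standard library. The helper name utf16_to_utf8 and the file names are assumptions made for illustration. It relies on the 'utf-16' codec reading the byte order mark (BOM) to choose the byte order and stripping it from the decoded text.

```python
import io
import os
import tempfile


def utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8 (hypothetical helper)."""
    # The 'utf-16' codec uses the BOM to pick the byte order and drops it.
    # newline='' turns off newline translation so line endings are preserved.
    with io.open(src_path, 'r', encoding='utf-16', newline='') as src:
        text = src.read()
    with io.open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)


# Build a small UTF-16 file shaped like test-utf16.csv from the reproduction.
workdir = tempfile.mkdtemp()
utf16_path = os.path.join(workdir, 'test-utf16.csv')
utf8_path = os.path.join(workdir, 'test-utf8.csv')
with io.open(utf16_path, 'w', encoding='utf-16', newline='') as f:
    f.write(u'col1,col2\nval1,val2\n')

utf16_to_utf8(utf16_path, utf8_path)

# Read the result back as raw bytes to inspect the new encoding.
with io.open(utf8_path, 'rb') as f:
    converted = f.read()
```

The converted file can then be passed to pd.read_csv(utf8_path, encoding='utf8'), which is the same call that reads test.csv in the reproduction above.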
