python csv文件复制时的编码问题_使用python读取CSV文件时的编码问题

尝试使用python读取CSV文件时遇到障碍。

更新:如果只想跳过字符或错误,可以打开文件,如下所示:

with open(os.path.join(directory, file), 'r', encoding="utf-8", errors="ignore") as data_file:

到目前为止,我已经尝试过了。

for directory, subdirectories, files in os.walk(root_dir):

for file in files:

with open(os.path.join(directory, file), 'r') as data_file:

reader = csv.reader(data_file)

for row in reader:

print (row)

我得到的错误是:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to

我试过了

with open(os.path.join(directory, file), 'r', encoding="UTF-8") as data_file:

错误:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2026' in position 223: character maps to

现在,如果我只打印data_file,它说它们是cp1252编码的,但是如果我尝试

with open(os.path.join(directory, file), 'r', encoding="cp1252") as data_file:

我得到的错误是:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to

我也尝试了推荐的套餐。

我得到的错误是:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to

我要解析的行是:

2015-11-28 22:23:58,670805374291832832,479174464,"MarkCrawford15","RT @WhatTheFFacts: The tallest man in the world was Robert Pershing Wadlow of Alton, Illinois. He was slighty over 8 feet 11 inches tall.","None

任何想法或帮助表示赞赏。

解决方案

我将使用csvkit,它使用自动检测适当的编码和解码。例如

import csvkit

reader = csvkit.reader(data_file)

正如聊天解决方案所述,

for directory, subdirectories, files in os.walk(root_dir):

for file in files:

with open(os.path.join(directory, file), 'r', encoding="utf-8") as data_file:

reader = csv.reader(data_file)

for row in reader:

data = [i.encode('ascii', 'ignore').decode('ascii') for i in row]

print (data)

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值