Python 2.7 CSV file read/write with the \xef\xbb\xbf code

Latest recommended article published on 2022-06-05 15:26:18

抑郁研究所

I have a question about reading and writing a CSV file encoded as 'utf-8-sig' in Python 2.7. My CSV header is:

['\xef\xbb\xbfID;timestamp;CustomerID;Email']

The header I read from file A.csv starts with the bytes "\xef\xbb\xbf" (as in "\xef\xbb\xbfID"), and I want to write the same bytes and header to file B.csv.

My print log shows:

['\xef\xbb\xbfID;timestamp;CustomerID;Email']

But the header in the actual output file looks like:

ÔªøID;timestamp

Here is the code:

import csv
import shutil
import tempfile

# header_list, s3core, SPARKY_S3 and DESTINATION_PATH are not shown here
# and are assumed to be defined elsewhere in the module.

def remove_gdpr_info_from_csv(file_path, file_name, temp_folder, original_header):
    new_temp_folder = tempfile.mkdtemp()
    new_temp_file = new_temp_folder + "/" + file_name
    # Blanked new file
    with open(new_temp_file, 'wb') as outfile:
        writer = csv.writer(outfile, delimiter=";")
        print original_header
        writer.writerow(original_header)
        # File from SFTP
        with open(file_path, 'r') as infile:
            reader = csv.reader(infile, delimiter=";")
            first_row = next(reader)
            # Column positions of the fields that must be blanked
            email = first_row.index('Email')
            contract_detractor1 = first_row.index('Contact Detractor (Q21)')
            contract_detractor2 = first_row.index('Contact Detractor (Q20)')
            contract_detractor3 = first_row.index('Contact Detractor (Q43)')
            contract_detractor4 = first_row.index('Contact Detractor(Q26)')
            contract_detractor5 = first_row.index('Contact Detractor(Q27)')
            contract_detractor6 = first_row.index('Contact Detractor(Q44)')
            # Column positions of the fields to copy to the output
            indexes = []
            for column_name in header_list:
                ind = first_row.index(column_name)
                indexes.append(ind)
            for row in reader:
                output_row = []
                for ind in indexes:
                    data = row[ind]
                    if ind == email:
                        data = ''
                    elif ind == contract_detractor1:
                        data = ''
                    elif ind == contract_detractor2:
                        data = ''
                    elif ind == contract_detractor3:
                        data = ''
                    elif ind == contract_detractor4:
                        data = ''
                    elif ind == contract_detractor5:
                        data = ''
                    elif ind == contract_detractor6:
                        data = ''
                    output_row.append(data)
                writer.writerow(output_row)
    s3core.upload_files(SPARKY_S3, DESTINATION_PATH, new_temp_file)
    shutil.rmtree(temp_folder)
    shutil.rmtree(new_temp_folder)

Solution

'\xef\xbb\xbf' is the UTF-8 encoding of the Unicode character ZERO WIDTH NO-BREAK SPACE (U+FEFF). It is often used as a Byte Order Mark (BOM) at the beginning of Unicode text files:

when the file starts with the 3 bytes '\xef\xbb\xbf', it is UTF-8 encoded

when the file starts with the 2 bytes '\xff\xfe', it is UTF-16 little endian

when the file starts with the 2 bytes '\xfe\xff', it is UTF-16 big endian
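
The three signatures listed above are available as constants in the standard codecs module, so a check on the first bytes of a file can be sketched as follows. The helper name detect_bom is hypothetical and exists only for illustration. It inspects the leading bytes, does not validate the rest of the file, and ignores UTF-32, whose little-endian BOM begins with the same two bytes as the UTF-16 one.

```python
import codecs


# Hypothetical helper (not part of the original answer): map the leading
# bytes of a file to the encoding implied by its byte order mark.
def detect_bom(first_bytes):
    if first_bytes.startswith(codecs.BOM_UTF8):      # '\xef\xbb\xbf'
        return 'utf-8-sig'
    if first_bytes.startswith(codecs.BOM_UTF16_LE):  # '\xff\xfe'
        return 'utf-16-le'
    if first_bytes.startswith(codecs.BOM_UTF16_BE):  # '\xfe\xff'
        return 'utf-16-be'
    return None  # no BOM recognised by this sketch


header = b'\xef\xbb\xbfID;timestamp;CustomerID;Email'
print(detect_bom(header))
```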

The 'utf-8-sig' encoding explicitly asks for this BOM to be written at the beginning of the file.
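
As a small illustration of that statement, encoding text with 'utf-8-sig' prepends the three BOM bytes, and decoding with it removes them again. The last lines rest on an assumption about the symptom in the question: if a viewer interprets the file as Mac OS Roman instead of UTF-8, the BOM bytes show up as the three characters seen in the output header, which would mean the file itself is intact.

```python
import codecs

text = u'ID;timestamp;CustomerID;Email'

# Encoding with 'utf-8-sig' writes the BOM first, then plain UTF-8.
encoded = text.encode('utf-8-sig')
assert encoded.startswith(codecs.BOM_UTF8)
assert encoded[3:] == text.encode('utf-8')

# Decoding with 'utf-8-sig' drops a leading BOM if one is present.
assert encoded.decode('utf-8-sig') == text

# Assumption: a viewer reading the bytes as Mac OS Roman would render
# the BOM as U+00D4 U+00AA U+00F8, the garbled prefix in the question.
garbled = codecs.BOM_UTF8.decode('mac_roman')
print(garbled == u'\xd4\xaa\xf8')
```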

To strip it automatically when reading a CSV file in Python 2, you can use the codecs module:

    import codecs
    import csv

    with open(file_path, 'r') as infile:
        # EncodedFile(file, data_encoding, file_encoding): the file is decoded
        # as 'utf-8-sig' and the reader receives plain 'utf-8' bytes.
        reader = csv.reader(codecs.EncodedFile(infile, 'utf-8', 'utf-8-sig'), delimiter=";")

EncodedFile wraps the original file object: it decodes the file as utf-8-sig, which skips the BOM, and re-encodes the data as UTF-8 with no BOM.
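
The effect of the wrapper can be checked without touching the file system. The sketch below uses an in-memory io.BytesIO object as a stand-in for the opened file (an assumption for illustration; the sample data row is made up) and shows that the bytes coming out of EncodedFile no longer start with the BOM. In codecs.EncodedFile(file, data_encoding, file_encoding), the second argument is the encoding the caller receives and the third is the encoding of the underlying file.

```python
import codecs
import io

# Stand-in for the opened input file: BOM-prefixed, ';'-delimited bytes.
raw = (codecs.BOM_UTF8 +
       b'ID;timestamp;CustomerID;Email\n'
       b'1;2019-01-01;42;a@example.com\n')
infile = io.BytesIO(raw)

# Decode the file as 'utf-8-sig' (drops the BOM), hand out plain UTF-8.
wrapped = codecs.EncodedFile(infile, 'utf-8', 'utf-8-sig')

lines = list(wrapped)  # iterating yields byte strings, one per line
first_row = lines[0].rstrip(b'\n').split(b';')
print(first_row[0] == b'ID')  # the first column name carries no BOM
```

In Python 2 the wrapped object can be passed straight to csv.reader, as in the snippet above. In Python 3 the csv module expects text, so opening the file with encoding='utf-8-sig' is the usual route there.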
