python怎么输入多行表示csv文件_在使用Python解析的csv文件中处理额外的换行符（回车）？...

最新推荐文章于 2023-06-01 12:08:51 发布

weixin_39599317

最新推荐文章于 2023-06-01 12:08:51 发布

阅读量158

点赞数

文章标签： python怎么输入多行表示csv文件

I have a CSV file that has fields that contain newlines e.g.:

A, B, C, D, E, F

123, 456, tree

, very, bla, indigo

(In this case third field in the second row is "tree\n"

I tried the following:

import csv

catalog = csv.reader(open('test.csv', 'rU'), delimiter=",", dialect=csv.excel_tab)

for row in catalog:

print "Length: ", len(row), row

and the result I got was this:

Length: 6 ['A', ' B', ' C', ' D', ' E', ' F']

Length: 3 ['123', ' 456', ' tree']

Length: 4 [' ', ' very', ' bla', ' indigo']

Does anyone have any idea how I can quickly remove extraneous newlines?

Thanks!

解决方案

Suppose you have this Excel spreadsheet:

Note:

the multi-line cell in C2;

embedded comma in C1 and D3;

blank cells, and cell with a space in D4.

Saving that as CSV in Excel, you will get this csv file:

A1,B1,"C1,+comma",D1

,B2,"line 1

line 2",D2

,,C3,"D3,+comma"

,,,D4 space

Assumably, you will want to read that into Python with the blank cells still having meaning and the embedded comma treated correctly.

So, this:

with open("test.csv", 'rU') as csvIN:

outCSV=(line for line in csv.reader(csvIN, dialect='excel'))

for row in outCSV:

print("Length: ", len(row), row)

correctly produces the 4x4 List of List matrix represented in Excel:

Length: 4 ['A1', 'B1', 'C1,+comma', 'D1']

Length: 4 ['', 'B2', 'line 1\nline 2', 'D2']

Length: 4 ['', '', 'C3', 'D3,+comma']

Length: 4 ['', '', '', 'D4 space']

The example CSV file you posted lacks quotes around the field with an 'extra newline' rendering the meaning of that newline ambiguous. Is it a new row or a multi-line field?

Therefor, you can only interpret this csv file:

A, B, C, D, E, F

123, 456, tree

, very, bla, indigo

as a one dimension list like so:

with open("test.csv", 'rU') as csvIN:

outCSV=[field.strip() for row in csv.reader(csvIN, delimiter=',')

for field in row if field]

Which produces this one dimensional list:

['A', 'B', 'C', 'D', 'E', 'F', '123', '456', 'tree', 'very', 'bla', 'indigo']

This can then be interpreted and regrouped into any sub grouping as you wish.

The idiomatic regrouping method in python uses zip like so:

>>> zip(*[iter(outCSV)]*6)

[('A', 'B', 'C', 'D', 'E', 'F'), ('123', '456', 'tree', 'very', 'bla', 'indigo')]

Or, if you want a list of lists, this is also idiomatic:

>>> [outCSV[i:i+6] for i in range(0, len(outCSV),6)]

[['A', 'B', 'C', 'D', 'E', 'F'], ['123', '456', 'tree', 'very', 'bla', 'indigo']]

If you can change how your CSV file is created, it will be less ambiguous to interpret.

weixin_39599317

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
python怎么输入多行表示csv文件_在使用Python解析的csv文件中处理额外的换行符（回车）？...

I have a CSV file that has fields that contain newlines e.g.:A, B, C, D, E, F123, 456, tree, very, bla, indigo(In this case third field in the second row is "tree\n"I tried the following:import csvcat...
复制链接

扫一扫

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。