Python cannot read file: CSV reader object does not read the entire file [Python]


I am currently working on a project that uses the csv module in Python. I have created a separate class that opens a pre-existing csv file, modifies the data on each line, and then saves the data to a new csv file.

The original file has 1438 rows, and test code I placed in the class that handles the writing indicates that it writes 1438 rows to the new csv file. Inspecting the file itself confirms that the newly created file does contain 1438 rows. However, when I use the standard csv module like this:

```python
reader = csv.reader(open('naiveData.csv', 'rb'))
```

It only gets to row 1410, and not even that entire row: it stops one and a half fields before the end of the row. I am not sure what may be causing this.

This is how I am accessing the reader:

```python
for row in reader:
    print row
```

Here is the part of the output where it fails:

```
['UNPM', '16', '2.125', '910', 'athlete', 'enrolled']
['UNPM', '14', '2.357', '1020', 'non-athlete', 'enrolled']
['UNDC', '17', '2.071', '910', 'athlete', 'unenrolled']
['KINS', '15', '2.6', '910', 'athlete', 'enrolled']
['PHYS', '16', '1.5', '900', 'non-']
```

The last list should have ['PHYS', '16', '1.5', '900', 'non-athlete', 'enrolled'].

Any ideas as to what may be causing this? Thanks in advance!

Edit:

Here are the lines in the CSV file around where the error occurs:

```
KINS,15,2.6,910,athlete,enrolled
PHYS,16,1.5,900,non-athlete,enrolled
UNPL,15,3,960,non-athlete,enrolled
```

Solution

I'm willing to bet this is the problem, although it's hard to be sure when you've only shown us 3 lines of code instead of a reproducible example.

You're doing something like this:

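```python
old_reader = csv.reader(open('old.csv', 'rb'))
writer = csv.writer(open('new.csv', 'wb'))  # this file object is never explicitly closed
for row in old_reader:
    writer.writerow(transform(row))  # transform: your per-row modification

# new.csv is still open for writing here, so its last buffer may not be on disk yet
new_reader = csv.reader(open('new.csv', 'rb'))
for row in new_reader:
    print row
```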

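At the time you open `new.csv` for reading, you haven't yet closed it for writing, so the last buffer hasn't been flushed to disk and the reader can't see it. Buffers are written out in fixed-size blocks rather than at row boundaries, which is why the output stops partway through a row.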

But then, when your script finishes, the writer goes out of scope and the file object no longer has any references, so it gets flushed and closed. That is why the file is complete when you inspect it from outside the program after the script has finished. (Note that this behavior is explicitly not guaranteed; you're just getting lucky.)
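The effect is easy to reproduce. The following is a minimal Python 3 sketch (the question itself uses Python 2); the file name `demo.csv`, the sample rows, and the helper `count_rows` are illustrative only. It relies on CPython's default write buffering: how much data reaches the disk before `close()` is an implementation detail, but with only a few short rows nothing has typically been flushed yet.

```python
import csv
import os
import tempfile


def count_rows(path):
    """Open a fresh reader on `path` and count the rows it can see."""
    with open(path, newline='') as f:
        return sum(1 for _ in csv.reader(f))


rows = [
    ['KINS', '15', '2.6', '910', 'athlete', 'enrolled'],
    ['PHYS', '16', '1.5', '900', 'non-athlete', 'enrolled'],
    ['UNPL', '15', '3', '960', 'non-athlete', 'enrolled'],
]

path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

# Deliberately NOT using `with`: the file stays open while we read it back.
out = open(path, 'w', newline='')
writer = csv.writer(out)
for row in rows:
    writer.writerow(row)

# These few short rows are still in the write buffer (CPython default
# buffering), so a fresh reader finds little or nothing on disk.
visible_before = count_rows(path)

out.close()  # close() flushes whatever is left in the buffer
visible_after = count_rows(path)

print(f'rows visible before close: {visible_before}')
print(f'rows visible after close:  {visible_after}')
```

On CPython this typically reports 0 rows before `close()` and 3 after. With a larger file, like the 1438-row one in the question, data is flushed each time a buffer fills, so a premature reader sees everything up to the last flushed block and nothing after it.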

And this is why you should never leak files by just putting an open in the middle of an expression. Use a with statement instead. For example:

```python
with open('old.csv', 'rb') as oldf, open('new.csv', 'wb') as newf:
    old_reader = csv.reader(oldf)
    writer = csv.writer(newf)
    for row in old_reader:
        writer.writerow(transform(row))
# Both files are closed (and new.csv is flushed) once the with block exits

with open('new.csv', 'rb') as newf:
    new_reader = csv.reader(newf)
    for row in new_reader:
        print row
```
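In Python 3 the `csv` module expects text-mode files, and its documentation recommends opening them with `newline=''` rather than the `'rb'`/`'wb'` modes used above. A minimal sketch of the same fix under that assumption follows; `transform` is a hypothetical stand-in for whatever per-row change the asker's class makes, and the file names and sample rows are illustrative.

```python
import csv
import os
import tempfile


def transform(row):
    """Hypothetical per-row modification: upper-case the last column."""
    return row[:-1] + [row[-1].upper()]


workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, 'old.csv')
new_path = os.path.join(workdir, 'new.csv')

# Create some sample input so the sketch is self-contained.
sample = [
    ['KINS', '15', '2.6', '910', 'athlete', 'enrolled'],
    ['PHYS', '16', '1.5', '900', 'non-athlete', 'enrolled'],
    ['UNPL', '15', '3', '960', 'non-athlete', 'enrolled'],
]
with open(old_path, 'w', newline='') as f:
    csv.writer(f).writerows(sample)

# Read the old file and write the transformed rows; leaving the `with`
# block closes (and therefore flushes) both files.
with open(old_path, newline='') as oldf, open(new_path, 'w', newline='') as newf:
    old_reader = csv.reader(oldf)
    writer = csv.writer(newf)
    for row in old_reader:
        writer.writerow(transform(row))

# new.csv is now complete, so a fresh reader sees every row.
with open(new_path, newline='') as newf:
    new_rows = list(csv.reader(newf))

for row in new_rows:
    print(row)
```

Because the reading `with` block starts only after the writing one has exited, `new.csv` is closed, and therefore fully flushed, before anything tries to read it.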

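When reading a CSV file that contains Chinese characters in Python, you usually need to specify the file's encoding. CSV files are commonly UTF-8 encoded, but sometimes GBK or another encoding is used, especially for data from older or system-specific sources. To read Chinese characters correctly, make sure you specify the right encoding when opening the file. Here is example code that reads a CSV file containing Chinese characters with the `csv` module from Python's standard library:

```python
import csv

# Path to the file
csv_file_path = 'path/to/your/chinese.csv'

# Open the file with a `with` statement so it is closed properly;
# newline='' is what the csv module documentation recommends
with open(csv_file_path, 'r', newline='', encoding='utf-8') as csvfile:
    # Create a CSV reader object and specify the delimiter, e.g. a comma
    csv_reader = csv.reader(csvfile, delimiter=',')
    # Iterate over every row in the CSV file
    for row in csv_reader:
        # Process each row
        print(row)
```

In this code, `encoding='utf-8'` ensures the file is opened as UTF-8. If you know the file actually uses a different encoding, such as GBK, change the argument to `encoding='gbk'`.

If you are using the pandas library to process the data, the code is even more concise:

```python
import pandas as pd

# Path to the file
csv_file_path = 'path/to/your/chinese.csv'

# Read the CSV file directly with pandas' read_csv function
df = pd.read_csv(csv_file_path, encoding='utf-8')  # for a GBK-encoded file, use encoding='gbk'

# Print the DataFrame to inspect its contents
print(df)
```

When reading with pandas, you likewise need to specify the correct encoding so that the Chinese characters are parsed correctly.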
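When the encoding is not known in advance, one option is to try UTF-8 first and fall back to GBK. This is a minimal sketch assuming the file is in one of those two encodings; `read_csv_rows` is a hypothetical helper, not part of the `csv` module. A successful decode does not prove the guess was right: a file whose bytes happen to be valid in the first encoding is accepted even if it was written in another one.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encodings=('utf-8', 'gbk')):
    """Hypothetical helper: return all rows of `path`, trying each encoding in turn."""
    last_error = None
    for encoding in encodings:
        try:
            # Read the whole file inside the try, so a decode error anywhere
            # in the file (not just at the start) triggers the fallback.
            with open(path, newline='', encoding=encoding) as f:
                return list(csv.reader(f))
        except UnicodeDecodeError as exc:
            last_error = exc  # wrong guess; try the next encoding
    raise ValueError(f'could not decode {path!r} with any of {encodings}') from last_error


# Usage: write a small GBK-encoded file, then read it back without
# telling the helper which encoding was used.
path = os.path.join(tempfile.mkdtemp(), 'chinese.csv')
with open(path, 'w', newline='', encoding='gbk') as f:
    writer = csv.writer(f)
    writer.writerow(['姓名', '城市'])
    writer.writerow(['张三', '北京'])

rows = read_csv_rows(path)
print(rows)
```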