ParserError: Error tokenizing data. C error: EOF inside string starting at row 11020412

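This CSV file holds AIS data (similar to GPS position data, with many bytes per row) and has a little over 12 million rows. Reading it with pd.read_csv in Jupyter Notebook kept failing with: ParserError: Error tokenizing data. C error: EOF inside string starting at row 11020412
I tried adding
encoding='utf-8', header=None, delimiter="\t", error_bad_lines=False, sep='\t' and so on, but none of them helped: the result was garbled and the values simply became NaN.
I then changed the call to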

import pandas as pd

df = pd.read_csv(r'F:\AIS数据\oData201803\a20180402.csv',
                 names=['MessageID','MMSI','NaviStatus','ROT_SENSOR','SOG','PosAcc','Lon','Lat','COG',
                        'TrueHeading','UTCSec','IMO','CallSign','ShipName','ShipCargoType','A','B','C','D',
                        'FixingDevice','ETA','dm','Destination','AidsToNavType','Off_Position','SarAltitude',
                        'm_bIsOwnShip','MessageBlendID','LTM','Z1','Z2','Z3'],
                 # error_bad_lines was deprecated in pandas 1.3 and removed in 2.0;
                 # on newer versions use on_bad_lines='skip' instead.
                 header=None, skiprows=1, error_bad_lines=False, engine='python')

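Notes:
1: names=[list of column names]
2: The first row of the file is a header row, so skiprows=1 skips it.
The key is engine='python'.
This time the file loads, but messages like the following are printed: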

Skipping line 586281: field larger than field limit (131072)
Skipping line 612412: ',' expected after '"'
Skipping line 639534: field larger than field limit (131072)
Skipping line 693336: field larger than field limit (131072)
Skipping line 774302: ',' expected after '"'
Skipping line 801625: field larger than field limit (131072)
Skipping line 855010: field larger than field limit (131072)
Skipping line 991076: ',' expected after '"'
Skipping line 1018778: field larger than field limit (131072)
.....

I searched for a fix for this as well and added the following code before the read_csv call:

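import sys
import csv

# Raise the csv module's field size limit as far as the platform allows.
maxInt = sys.maxsize
decrement = True
while decrement:
    decrement = False
    try:
        csv.field_size_limit(maxInt)
    except OverflowError:
        # The value does not fit in a C long on this platform; try a smaller one.
        maxInt = maxInt // 10
        decrement = True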
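
The 131072 in the messages above matches the default field size limit of Python's built-in csv module. The following is a minimal sketch of what that limit does, using only the standard library; the 200,000-character field is a made-up stand-in for an oversized field, not data from the AIS file.

```python
import csv
import io

# Hypothetical oversized field: 200,000 characters, above the 131072 default.
big_row = "1," + "x" * 200_000 + ",2\n"

old_limit = csv.field_size_limit(131072)  # set the default limit explicitly
try:
    list(csv.reader(io.StringIO(big_row)))
    hit_limit = False
except csv.Error as exc:
    hit_limit = True
    print("csv.Error:", exc)

# With a higher limit the same row is parsed without an error.
csv.field_size_limit(1_000_000)
rows = list(csv.reader(io.StringIO(big_row)))
print(len(rows[0][1]))

csv.field_size_limit(old_limit)  # restore whatever limit was set before
```

A single field that large is unusual for position data and may mean that an unmatched quote has swallowed many following lines, in which case raising the limit only moves the problem elsewhere.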

After that, the following messages appeared instead:

Skipping line 256361: ',' expected after '"'
Skipping line 508779: ',' expected after '"'
Skipping line 523070: ',' expected after '"'
Skipping line 550192: ',' expected after '"'
Skipping line 563598: ',' expected after '"'
Skipping line 625841: ',' expected after '"'
Skipping line 653164: ',' expected after '"'
Skipping line 687986: ',' expected after '"'
....
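
The wording ',' expected after '"' is the same as the error Python's csv module raises in strict mode when a closing quote is followed by something other than the delimiter. Here is a small sketch that reproduces it with an invented row (not taken from the AIS file):

```python
import csv
import io

# Hypothetical malformed row: the quoted field "abc" is followed by 'd', not ','.
bad_row = '1,"abc"d,2\n'

try:
    list(csv.reader(io.StringIO(bad_row), strict=True))
    message = None
except csv.Error as exc:
    message = str(exc)

print(message)

# Without strict mode the same row is accepted and the stray text is
# appended to the field.
lenient = list(csv.reader(io.StringIO(bad_row)))
print(lenient)
```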

A lot of lines were skipped.
I don't know whether there is a better way to read data this large; any advice would be appreciated.
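
One way to inspect such a file before parsing is to look for lines with an odd number of double quotes, since an unmatched quote is a common cause of both "EOF inside string" and the skip messages above. This is a rough sketch that assumes records never legitimately span several lines; the helper name find_odd_quote_lines and the sample text are made up for illustration.

```python
import io
from typing import Iterable, List, Tuple


def find_odd_quote_lines(lines: Iterable[str], max_hits: int = 20) -> List[Tuple[int, str]]:
    """Return (1-based line number, line text) for lines with an odd count of '"'."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if line.count('"') % 2 == 1:
            hits.append((lineno, line.rstrip("\n")))
            if len(hits) >= max_hits:
                break  # stop early so a huge file does not flood the output
    return hits


# Invented sample: only the second line has an unmatched quote.
sample = io.StringIO('1,ALPHA,10\n2,"BRAVO,20\n3,"CHARLIE",30\n')
suspects = find_odd_quote_lines(sample)
print(suspects)
```

For a real file, an open file handle can be passed in place of the StringIO object; the file is then scanned line by line without loading it all into memory.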

Update 2021.3.25
Add import csv and pass quoting=csv.QUOTE_NONE when reading the file, i.e.

import csv

df = pd.read_csv(r'F:\AIS数据\oData201803\a20180402.csv',
                 names=['MessageID','MMSI','NaviStatus',...],
                 low_memory=False, header=None, skiprows=1, quoting=csv.QUOTE_NONE)

Problem solved!
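
The effect of QUOTE_NONE can be seen with the standard csv module, which defines the constant: with quoting disabled, a quote character is treated as ordinary text instead of opening a quoted field. This is a minimal sketch with invented rows; a stray quote inside a name field is an assumption about the data, not something confirmed above.

```python
import csv
import io

# Hypothetical data: the first row contains an unmatched double quote.
text = '1,"ALPHA,10\n2,BRAVO,20\n'

# Default quoting: the quote opens a field that runs on to the end of the input.
default_rows = list(csv.reader(io.StringIO(text)))

# QUOTE_NONE: the quote is kept as a literal character and rows split normally.
literal_rows = list(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONE))

print(len(default_rows))
print(literal_rows)
```

The trade-off is that quote characters stay in the values, and a delimiter inside a genuinely quoted field would split that field, so this setting suits data in which quotes carry no structural meaning.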
