1. The file is not valid UTF-8: first open it as UTF-8 with decoding errors ignored, then read it:
import pandas as pd
filename = open('E://source_data/insured_utf-8.csv', encoding='utf-8', errors='ignore')
# Pass the open file object (not the path) so that errors='ignore' actually takes effect
df_chunk = pd.read_csv(filename, chunksize=1000000, parse_dates=True, dtype='object', error_bad_lines=False, engine='c')
2. If a null bytes error occurs, use engine='c':
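# Note: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on newer versions use on_bad_lines='skip'
df_chunk = pd.read_csv('E://source_data/insured_utf-8.csv', chunksize=1000000, parse_dates=True, encoding="utf-8", dtype='object', error_bad_lines=False, engine='c')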
3. Once the read succeeds, save the data as UTF-8:
df_concat = pd.concat(df_chunk, ignore_index=True)  # combine the chunks into a single DataFrame
df_concat.to_csv('E://source_data/insured_utf-8.csv', header=True, index=False, encoding="utf-8")
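
The three steps above can also be approximated without pandas, which may help when the file is too large to concatenate in memory. The following is only a sketch using the standard library, not the method used above: `reencode_to_utf8` is a hypothetical helper, the file names are made up for the self-check, and it assumes that silently dropping undecodable bytes and NUL characters is acceptable for your data. It writes to a separate output file rather than overwriting the source.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path):
    """Copy src_path to dst_path as UTF-8, dropping undecodable bytes and NULs.

    Hypothetical helper for illustration; not part of pandas.
    Returns the number of lines written.
    """
    written = 0
    # errors="ignore" silently drops byte sequences that are not valid UTF-8.
    # newline="" leaves the original line endings untouched.
    with open(src_path, encoding="utf-8", errors="ignore", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # stream line by line so large files fit in memory
            dst.write(line.replace("\x00", ""))  # strip null bytes
            written += 1
    return written


# Small self-check on a throwaway file with one invalid byte and one NUL byte.
tmp_dir = tempfile.mkdtemp()
src_file = os.path.join(tmp_dir, "raw.csv")            # made-up file name
dst_file = os.path.join(tmp_dir, "clean_utf-8.csv")    # made-up file name
with open(src_file, "wb") as f:
    f.write(b"id,name\n1,ab\xffc\n2,d\x00ef\n")

line_count = reencode_to_utf8(src_file, dst_file)
with open(dst_file, encoding="utf-8") as f:
    cleaned = f.read()
print(line_count)
print(cleaned)
```

After this pass the output file should be plain UTF-8, so a later `pd.read_csv` on it would not need `errors='ignore'`. Keep in mind that ignoring errors loses data: if the source is really in another encoding (GBK, for example), decoding with that encoding is likely a better choice than discarding bytes.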