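In Python, I'm trying to read only the first 100 lines of a csv.gz file that has more than 4 million lines. I also need information about the columns, including each column's header. How can I do this?
I looked at "python: read lines from compressed text files" to see how to open the file, but I'm struggling to work out how to actually print the first 100 lines and get some metadata about what is in the columns.
I also found "Read first N lines of a file in python", but I don't know how to combine it with opening the csv.gz file and reading it without saving an uncompressed csv file.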
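A minimal sketch of combining the two linked approaches, assuming Python 3 (the code below uses the Python 2 print statement) and a UTF-8 file. `gzip.open` in text mode (`"rt"`) decompresses while reading, and `itertools.islice` stops after n lines, so only the beginning of the file is decompressed and no uncompressed copy is written to disk. `first_n_lines` is a hypothetical helper name, and the demo writes a small invented file in place of google-us-data.csv.gz.

```python
import gzip
import itertools
import os
import tempfile


def first_n_lines(path, n=100):
    """Return the first n lines of a gzip-compressed text file.

    gzip.open decompresses on the fly while reading, so no uncompressed
    copy is written to disk; islice stops after n lines instead of
    reading the whole file.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        return [line.rstrip("\r\n") for line in itertools.islice(f, n)]


# Build a small hypothetical demo file; point first_n_lines at
# 'google-us-data.csv.gz' instead when using real data.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv.gz")
with gzip.open(demo_path, "wt", encoding="utf-8", newline="") as f:
    f.write("id,name,value\n")
    for i in range(1000):
        f.write(f"{i},item{i},{i * 2}\n")

lines = first_n_lines(demo_path, 100)
print(len(lines))  # 100 (the header line counts as one of them)
print(lines[0])    # id,name,value
```

Note that the header is the first of the n lines, so n=100 gives the header plus 99 data lines.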
I wrote this code:

import gzip
import csv
import json
import pandas as pd

df = pd.read_csv('google-us-data.csv.gz', compression='gzip', header=0, sep=' ', quotechar='"', error_bad_lines=False)
for i in range (100):
    print df.next()
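A sketch of what the code above appears to be aiming for (header, first 100 rows, and some column metadata), using only the standard library and assuming Python 3 and a comma-separated UTF-8 file. `preview_csv_gz` and `guess_type` are hypothetical helpers, and the type guess looks only at the sampled rows, so it is a rough hint rather than a schema. For comparison, in pandas the usual route would be `pd.read_csv(path, nrows=100)` (gzip compression is inferred from the `.gz` extension by default), then `df.columns` and `df.dtypes`; a DataFrame is not an iterator, so `df.next()` will not step through rows, whereas `df.head(100)` returns the first 100.

```python
import csv
import gzip
import itertools
import os
import tempfile


def guess_type(values):
    """Rough type guess for one column from its sampled string values."""
    if not values:
        return "empty"
    for cast, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                cast(v)
        except ValueError:
            continue
        return name
    return "str"


def preview_csv_gz(path, n_rows=100, delimiter=","):
    """Return (header, rows, column_types) for the first n_rows of a .csv.gz."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)                          # first line = column names
        rows = list(itertools.islice(reader, n_rows))  # next n_rows data rows
    column_types = {}
    for i, name in enumerate(header):
        # Skip short rows and empty cells when guessing the type.
        values = [row[i] for row in rows if i < len(row) and row[i] != ""]
        column_types[name] = guess_type(values)
    return header, rows, column_types


# Hypothetical demo file standing in for 'google-us-data.csv.gz'.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv.gz")
with gzip.open(demo_path, "wt", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name", "price"])
    for i in range(1000):
        writer.writerow([i, f"item{i}", i / 4])

header, rows, column_types = preview_csv_gz(demo_path)
print(header)        # ['id', 'name', 'price']
print(len(rows))     # 100
print(column_types)  # {'id': 'int', 'name': 'str', 'price': 'float'}
```

Here `n_rows` counts data rows only, since the header is consumed separately by `next(reader)`.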
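I'm new to Python and don't understand the output. I'm sure my code is wrong; I've been trying to debug it, but I don't know which documentation to look at.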
I got the following output (it kept scrolling past in the console; this is an excerpt):

Skipping line 63: expected 3 fields, saw 7
Skipping line 64: expected 3 fields, saw 7
Skipping line 65: expected 3 fields, saw 7
Skipping line 66: expected 3 fields, saw 7
Skipping line 67: expected 3 fields, saw 7
Skipping line 68: expected 3 fields, saw 7
Skipping line 69: expected 3 fields, saw 7
Skipping line 70: expected 3 fields, saw 7
Skipping line 71: expected 3 fields, saw 7
Skipping line 72: expected 3 fields, saw 7
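The repeated "expected 3 fields, saw 7" messages suggest that, with `sep=' '`, pandas expected 3 fields per line but split these lines into 7, and then dropped them because of `error_bad_lines=False`. One likely cause, assuming the file is really comma-separated as the .csv extension suggests, is that splitting on spaces does not match the data: any value containing a space produces extra fields. A quick check is to count fields per line for a candidate delimiter; `field_count_histogram` is a hypothetical helper and the sample lines are invented. (In pandas 1.3 and later, `error_bad_lines` is deprecated in favor of `on_bad_lines='skip'`, and it was removed in pandas 2.0.)

```python
import collections
import csv


def field_count_histogram(lines, delimiter):
    """Map 'number of fields' -> 'number of lines' for a given delimiter."""
    return collections.Counter(len(row) for row in csv.reader(lines, delimiter=delimiter))


# Hypothetical sample: comma-separated, but some values contain spaces.
sample = [
    "date,search term,clicks",
    "2016-01-01,new york hotels,10",
    "2016-01-02,cheap flights to los angeles,7",
]

# Splitting on spaces gives a different field count on each line here,
# the kind of mismatch pandas reports as "expected N fields, saw M".
print(field_count_histogram(sample, " "))
# Splitting on commas gives 3 fields on every line.
print(field_count_histogram(sample, ","))
```

Running the same check on the output of a first-N-lines reader for the real file would show which delimiter gives a consistent field count.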