pandas.read_csv(): notes on the header parameter

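This article walks through the header parameter of the read_csv function in pandas. Worked examples show how to choose which row supplies the column names and how to build multi-level headers, to give a clearer picture of how the data is loaded.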


pandas.read_csv() official documentation

 

header : int, list of int, default 'infer'

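Row number(s) to use as the column names, which also mark where the data starts. With the default ('infer') the column names are inferred: if the names argument is not passed, the behavior is identical to header=0 and the names are taken from the first line of the file; if names is passed explicitly, the behavior is identical to header=None. Pass header=0 explicitly together with names to replace the column names already present in the file. header can also be a list of integers, for example [0,1,3], which uses those rows of the file as column headers (so every column gets several header levels); rows in between that are not listed are skipped (row 2 in this example). Counting from 1, lines 1, 2 and 4 become the multi-level header, line 3 is discarded, and the DataFrame data starts at line 5.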
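The interaction between header and names described above can be sketched as follows. This is a minimal illustration, not taken from the original article: the CSV text is inlined through io.StringIO instead of being read from train.csv, and the replacement names in new_names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical three-line CSV standing in for a real file.
csv_text = "Age,Gender\n37,Male\n54,Female\n"
new_names = ["age", "gender"]  # illustrative replacement names

# names passed, header left at 'infer': behaves like header=None,
# so the original header line is kept as the first data row.
kept = pd.read_csv(io.StringIO(csv_text), names=new_names)

# names passed together with header=0: line 0 is consumed as the header
# and its labels are replaced by new_names.
replaced = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)

print(kept)
print(replaced)
```

In the first call the frame has three rows, the first of which holds the strings "Age" and "Gender"; in the second call it has two rows and the Age values are parsed as integers.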

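Note: this parameter ignores commented lines (when comment is set) and empty lines (when skip_blank_lines=True, which is the default), so header=0 denotes the first line of data rather than the first line of the file.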
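A small sketch of this note, under the assumption of a hypothetical file that starts with two blank lines and one comment line. The text is inlined with io.StringIO, and comment="#" is passed so that the comment line is recognised as such.

```python
import io

import pandas as pd

# Hypothetical CSV text: two blank lines and a comment line precede the header.
csv_text = "\n\n# hypothetical comment line\nAge,Gender\n37,Male\n54,Female\n"

# Blank lines are skipped by default (skip_blank_lines=True) and the line
# starting with '#' is ignored because comment='#' is set, so header=0
# selects "Age,Gender" even though it is the fourth physical line.
df = pd.read_csv(io.StringIO(csv_text), header=0, comment="#")

print(df.columns.tolist())
print(len(df))
```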

Examples follow.

Import the pandas library:

import pandas as pd  

1 The data has column names

    Age  Gender  Education    EducationField  MaritalStatus  Income  OverTime
 0   37    Male          4     Life Sciences       Divorced    5993        No
 1   54  Female          4     Life Sciences       Divorced   10502        No
 2   34    Male          3     Life Sciences         Single    6074       Yes
 3   39  Female          1     Life Sciences        Married   12742        No
 4   28    Male          3           Medical       Divorced    2596        No
 5   24  Female          1           Medical        Married    4162       Yes
 6   29    Male          5             Other         Single    3983        No
 7   36    Male          2           Medical        Married    7596        No
 8   33  Female          4           Medical        Married    2622        No
 9   34  Female          4  Technical Degree         Single    6687        No
10   24    Male          1   Human Resources        Married    1555        No

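1.1 Default header ('infer'). Because names is not passed, this behaves like header=0: the first line of the file becomes the column names.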

data = pd.read_csv('./train.csv')
print(data.head(5))

Output:

   Age  Gender  Education EducationField MaritalStatus  Income OverTime
0   37    Male          4  Life Sciences      Divorced    5993       No
1   54  Female          4  Life Sciences      Divorced   10502       No
2   34    Male          3  Life Sciences        Single    6074      Yes
3   39  Female          1  Life Sciences       Married   12742       No
4   28    Male          3        Medical      Divorced    2596       No

1.2 header=0. With header=n, row n (counting from 0) is used as the column names and the DataFrame data starts at row n+1.

data = pd.read_csv('./train.csv', header=0)
print(data.head(5))

Output:

   Age  Gender  Education EducationField MaritalStatus  Income OverTime
0   37    Male          4  Life Sciences      Divorced    5993       No
1   54  Female          4  Life Sciences      Divorced   10502       No
2   34    Male          3  Life Sciences        Single    6074      Yes
3   39  Female          1  Life Sciences       Married   12742       No
4   28    Male          3        Medical      Divorced    2596       No

1.3 header=1. Row 1 (counting from 0) is used as the column names, the row before it is dropped, and the DataFrame data starts at row 2.

data = pd.read_csv('./train.csv', header=1)
print(data.head(5))

Output:

   37    Male  4  Life Sciences  Divorced   5993   No
0  54  Female  4  Life Sciences  Divorced  10502   No
1  34    Male  3  Life Sciences    Single   6074  Yes
2  39  Female  1  Life Sciences   Married  12742   No
3  28    Male  3        Medical  Divorced   2596   No
4  24  Female  1        Medical   Married   4162  Yes

1.4 header=[2] gives the same result as header=2

data = pd.read_csv('./train.csv', header=[2])
print(data.head(5))

Output:

   54  Female  4  Life Sciences  Divorced  10502   No
0  34    Male  3  Life Sciences    Single   6074  Yes
1  39  Female  1  Life Sciences   Married  12742   No
2  28    Male  3        Medical  Divorced   2596   No
3  24  Female  1        Medical   Married   4162  Yes
4  29    Male  5          Other    Single   3983   No
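The equivalence can be checked directly. The sketch below assumes a shortened three-column copy of the sample data, inlined with io.StringIO so that it runs without train.csv, and compares the two results with DataFrame.equals.

```python
import io

import pandas as pd

# Shortened copy of the sample data (three columns, four data lines).
csv_text = (
    "Age,Gender,Income\n"
    "37,Male,5993\n"
    "54,Female,10502\n"
    "34,Male,6074\n"
    "39,Female,12742\n"
)

as_int = pd.read_csv(io.StringIO(csv_text), header=2)
as_list = pd.read_csv(io.StringIO(csv_text), header=[2])

# A one-element list yields ordinary single-level columns, not a MultiIndex.
print(as_int.columns.tolist())
print(as_int.equals(as_list))
```

Note that the column labels taken from a data row are strings ("54", "Female", "10502"), even where the row held numbers.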

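1.5 header=[0, 2, 3] uses rows 0, 2 and 3 of the file as column headers, so every column has several header levels. Rows 0, 2 and 3 appear as the multi-level header, row 1 is discarded, and the DataFrame data starts at row 4.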

data = pd.read_csv('./train.csv', header=[0,2,3])
print(data.head(5))

Output:

  Age  Gender Education EducationField MaritalStatus Income OverTime
   54  Female         4  Life Sciences      Divorced  10502       No
   34    Male         3  Life Sciences        Single   6074      Yes
0  39  Female         1  Life Sciences       Married  12742       No
1  28    Male         3        Medical      Divorced   2596       No
2  24  Female         1        Medical       Married   4162      Yes
3  29    Male         5          Other        Single   3983       No
4  36    Male         2        Medical       Married   7596       No
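To see what the multi-level header looks like programmatically, the following sketch repeats the call on a shortened three-column copy of the sample data (an assumption made so the snippet is self-contained) and inspects the resulting columns.

```python
import io

import pandas as pd

# Shortened copy of the sample data, inlined so the sketch runs without train.csv.
csv_text = (
    "Age,Gender,Income\n"
    "37,Male,5993\n"
    "54,Female,10502\n"
    "34,Male,6074\n"
    "39,Female,12742\n"
    "28,Male,2596\n"
)

df = pd.read_csv(io.StringIO(csv_text), header=[0, 2, 3])

# The columns form a three-level MultiIndex; each label is a tuple of strings.
print(df.columns.nlevels)
print(df.columns[0])

# A single column is selected with the full tuple of labels.
ages = df[("Age", "54", "34")]
print(ages.tolist())
```

Row 1 of the text (37, Male, 5993) appears neither in the header nor in the data, which matches the behavior described for skipped rows.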

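1.6 header=None. No row is treated as a header: the columns are numbered from 0 and the original header line becomes the first data row.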

data = pd.read_csv('./train.csv', header=None)
print(data.head(5))

Output:

     0       1          2               3              4       5         6
0  Age  Gender  Education  EducationField  MaritalStatus  Income  OverTime
1   37    Male          4   Life Sciences       Divorced    5993        No
2   54  Female          4   Life Sciences       Divorced   10502        No
3   34    Male          3   Life Sciences         Single    6074       Yes
4   39  Female          1   Life Sciences        Married   12742        No
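One consequence of header=None on a file that does have a header line is that the header text ends up inside the data, so numeric columns are read as strings. A minimal sketch, assuming a shortened inline copy of the sample data:

```python
import io

import pandas as pd

csv_text = "Age,Gender,Income\n37,Male,5993\n54,Female,10502\n"

# header=None: columns are labelled 0, 1, 2 and the header line is row 0,
# so the first column mixes "Age" with the numbers and is read as text.
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(raw.columns.tolist())
print(raw[0].tolist())

# header=0: the header line is consumed and Age is parsed as integers.
parsed = pd.read_csv(io.StringIO(csv_text), header=0)
print(parsed["Age"].tolist())
```

header=None is therefore best reserved for files that really have no header line, optionally combined with names to label the columns.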

 
