Today we look at the reset_index() method in pandas: why we might need to reset the index of a DataFrame, and how to apply the method.
In this article we use a sample of the Animal Shelter Analytics dataset from Kaggle as our test data.
What is reset_index() in pandas?
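If we read a CSV file with the pandas read_csv() function without specifying an index, the resulting DataFrame has a default integer-based index that starts at 0 for the first row and increases by 1 for each subsequent row: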
import pandas as pd
import numpy as np
df = pd.read_csv('Austin_Animal_Center_Intakes.csv').head()
df
Output:
|   | Animal ID | Name | DateTime | MonthYear | Found Location | Intake Type | Intake Condition | Animal Type | Sex upon Intake | Age upon Intake | Breed | Color |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | A786884 | *Brock | 01/03/2019 04:19:00 PM | 01/03/2019 04:19:00 PM | 2501 Magin Meadow Dr in Austin (TX) | Stray | Normal | Dog | Neutered Male | 2 years | Beagle Mix | Tricolor |
| 1 | A706918 | Belle | 07/05/2015 12:59:00 PM | 07/05/2015 12:59:00 PM | 9409 Bluegrass Dr in Austin (TX) | Stray | Normal | Dog | Spayed Female | 8 years | English Springer Spaniel | White/Liver |
| 2 | A724273 | Runster | 04/14/2016 06:43:00 PM | 04/14/2016 06:43:00 PM | 2818 Palomino Trail in Austin (TX) | Stray | Normal | Dog | Intact Male | 11 months | Basenji Mix | Sable/White |
| 3 | A665644 | NaN | 10/21/2013 07:59:00 AM | 10/21/2013 07:59:00 AM | Austin (TX) | Stray | Sick | Cat | Intact Female | 4 weeks | Domestic Shorthair Mix | Calico |
| 4 | A682524 | Rio | 06/29/2014 10:38:00 AM | 06/29/2014 10:38:00 AM | 800 Grove Blvd in Austin (TX) | Stray | Normal | Dog | Neutered Male | 4 years | Doberman Pinsch/Australian Cattle Dog | Tan/Gray |
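For readers who do not have the Kaggle file at hand, the default index can also be checked on a tiny in-memory CSV. This is only a sketch: `csv_text` and `df_small` are hypothetical names, and the three rows are a cut-down stand-in for the real dataset, not part of it.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the shelter CSV (only a few columns).
csv_text = """Animal ID,Name,Animal Type
A786884,*Brock,Dog
A706918,Belle,Dog
A665644,,Cat
"""

# No index_col is given, so pandas builds the default integer index.
df_small = pd.read_csv(io.StringIO(csv_text))

print(type(df_small.index).__name__)  # RangeIndex
print(list(df_small.index))           # [0, 1, 2]
```

The default index is a RangeIndex, which stores only start, stop and step rather than one label per row.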
In some cases we may want more meaningful row labels, so we choose one of the DataFrame's columns to serve as its index. We can do this directly with the index_col parameter of read_csv():
df = pd.read_csv('Austin_Animal_Center_Intakes.csv', index_col='Animal ID').head()
df
Output:
| Animal ID | Name | DateTime | MonthYear | Found Location | Intake Type | Intake Condition | Animal Type | Sex upon Intake | Age upon Intake | Breed | Color |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A786884 | *Brock | 01/03/2019 04:19:00 PM | 01/03/2019 04:19:00 PM | 2501 Magin Meadow Dr in Austin (TX) | Stray | Normal | Dog | Neutered Male | 2 years | Beagle Mix | Tricolor |
| A706918 | Belle | 07/05/2015 12:59:00 PM | 07/05/2015 12:59:00 PM | 9409 Bluegrass Dr in Austin (TX) | Stray | Normal | Dog | Spayed Female | 8 years | English Springer Spaniel | White/Liver |
| A724273 | Runster | 04/14/2016 06:43:00 PM | 04/14/2016 06:43:00 PM | 2818 Palomino Trail in Austin (TX) | Stray | Normal | Dog | Intact Male | 11 months | Basenji Mix | Sable/White |
| A665644 | NaN | 10/21/2013 07:59:00 AM | 10/21/2013 07:59:00 AM | Austin (TX) | Stray | Sick | Cat | Intact Female | 4 weeks | Domestic Shorthair Mix | Calico |
| A682524 | Rio | 06/29/2014 10:38:00 AM | 06/29/2014 10:38:00 AM | 800 Grove Blvd in Austin (TX) | Stray | Normal | Dog | Neutered Male | 4 years | Doberman Pinsch/Australian Cattle Dog | Tan/Gray |
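A small sketch of what index_col changes, again on a hypothetical in-memory CSV (`csv_text` and `df_indexed` are illustrative names, and the rows are a shortened stand-in for the real file): the chosen column stops being a regular column and becomes the row labels, so rows can be looked up by ID with `.loc`.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the shelter CSV (only a few columns).
csv_text = """Animal ID,Name,Animal Type
A786884,*Brock,Dog
A706918,Belle,Dog
A665644,,Cat
"""

# Use the 'Animal ID' column as the index while reading.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col='Animal ID')

print(df_indexed.index.name)              # Animal ID
print('Animal ID' in df_indexed.columns)  # False
print(df_indexed.loc['A706918', 'Name'])  # Belle
```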
Alternatively, we can use the set_index() method to make any column of the DataFrame its index:
df = pd.read_csv('Austin_Animal_Center_Intakes.csv').head()
df.set_index('Animal ID', inplace=True)
df
Output:
| Animal ID | Name | DateTime | MonthYear | Found Location | Intake Type | Intake Condition | Animal Type | Sex upon Intake | Age upon Intake | Breed | Color |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A786884 | *Brock | 01/03/2019 04:19:00 PM | 01/03/2019 04:19:00 PM | 2501 Magin Meadow Dr in Austin (TX) | Stray | Normal | Dog | Neutered Male | 2 years | Beagle Mix | Tricolor |
| A706918 | Belle | 07/05/2015 12:59:00 PM | 07/05/2015 12:59:00 PM | 9409 Bluegrass Dr in Austin (TX) | Stray | Normal | Dog | Spayed Female | 8 years | English Springer Spaniel | White/Liver |
| A724273 | Runster | 04/14/2016 06:43:00 PM | 04/14/2016 06:43:00 PM | 2818 Palomino Trail in Austin (TX) | Stray | Normal | Dog | Intact Male | 11 months | Basenji Mix | Sable/White |
| A665644 | NaN | 10/21/2013 07:59:00 AM | 10/21/2013 07:59:00 AM | Austin (TX) | Stray | Sick | Cat | Intact Female | 4 weeks | Domestic Shorthair Mix | Calico |
| A682524 | Rio | 06/29/2014 10:38:00 AM | 06/29/2014 10:38:00 AM | 800 Grove Blvd in Austin (TX) | Stray | Normal | Dog | Neutered Male | 4 years | Doberman Pinsch/Australian Cattle Dog | Tan/Gray |
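The sketch below compares the two approaches on a hypothetical in-memory CSV (the names `csv_text`, `raw`, `via_set_index` and `via_index_col` are illustrative, not from the dataset). It also uses set_index() without inplace=True, in which case the method returns a new DataFrame and leaves the one it was called on untouched.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the shelter CSV (only a few columns).
csv_text = """Animal ID,Name,Animal Type
A786884,*Brock,Dog
A706918,Belle,Dog
A665644,,Cat
"""

raw = pd.read_csv(io.StringIO(csv_text))

# Without inplace=True, set_index() returns a new DataFrame.
via_set_index = raw.set_index('Animal ID')
via_index_col = pd.read_csv(io.StringIO(csv_text), index_col='Animal ID')

print('Animal ID' in raw.columns)  # True: raw still has the column
print(via_set_index.index.name)    # Animal ID

# Both routes end up with the same row labels and the same columns.
print(list(via_set_index.index) == list(via_index_col.index))      # True
print(list(via_set_index.columns) == list(via_index_col.columns))  # True
```

Assigning the result to a new name, rather than passing inplace=True, keeps the original frame available for later steps.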
What if at some point we need to restore the default numeric index? This is where the reset_index() method comes in:
df.reset_index()
Output:
|   | Animal ID | Name | DateTime | MonthYear | Found Location | Intake Type | Intake Condition | Animal Type | Sex upon Intake | Age upon Intake | Breed | Color |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | A786884 | *Brock | 01/03/2019 04:19:00 PM | 01/03/2019 04:19:00 PM | 2501 Magin Meadow Dr in Austin (TX) | Stray | Normal | Dog | Neutered Male | 2 years | Beagle Mix | Tricolor |
| 1 | A706918 | Belle | 07/05/2015 12:59:00 PM | 07/05/2015 12:59:00 PM | 9409 Bluegrass Dr in Austin (TX) | Stray | Normal | Dog | Spayed Female | 8 years | English Springer Spaniel | White/Liver |
| 2 | A724273 | Runster | 04/14/2016 06:43:00 PM | 04/14/2016 06:43:00 PM | 2818 Palomino Trail in Austin (TX) | Stray | Normal | Dog | Intact Male | 11 months | Basenji Mix | Sable/White |
| 3 | A665644 | NaN | 10/21/2013 07:59:00 AM | 10/21/2013 07:59:00 AM | Austin (TX) | Stray | Sick | Cat | Intact Female | 4 weeks | Domestic Shorthair Mix | Calico |
| 4 | A682524 | Rio | 06/29/2014 10:38:00 AM | 06/29/2014 10:38:00 AM | 800 Grove Blvd in Austin (TX) | Stray | Normal | Dog | Neutered Male | 4 years | Doberman Pinsch/Australian Cattle Dog | Tan/Gray |
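By default, this method replaces the existing DataFrame index with the default integer-based one and turns the old index into a new column with the same name as the old index (or a column named index if the old index had no name). In addition, by default reset_index() removes all levels of a MultiIndex, moving each one into a column of its own, and it does not modify the original DataFrame: it returns a new one, so the result must be assigned to a variable if we want to keep it.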
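The default behaviours described above can be checked on a small hand-built DataFrame. This is a sketch under stated assumptions: the two rows and the names `df_named`, `df_unnamed`, `df_multi`, `restored` and `flat` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy frame whose index is named 'Animal ID'.
df_named = pd.DataFrame(
    {'Name': ['*Brock', 'Belle'], 'Animal Type': ['Dog', 'Dog']},
    index=pd.Index(['A786884', 'A706918'], name='Animal ID'),
)

# 1. A named index becomes a column with the same name.
restored = df_named.reset_index()
print(list(restored.columns))  # ['Animal ID', 'Name', 'Animal Type']
print(list(restored.index))    # [0, 1]

# 2. The original DataFrame is left as it was.
print(df_named.index.name)     # Animal ID

# 3. An unnamed index becomes a column called 'index'.
df_unnamed = pd.DataFrame(
    {'Name': ['*Brock', 'Belle']},
    index=['A786884', 'A706918'],
)
print(list(df_unnamed.reset_index().columns))  # ['index', 'Name']

# 4. Every level of a MultiIndex is moved back into the columns.
df_multi = restored.set_index(['Animal Type', 'Animal ID'])
print(df_multi.index.nlevels)  # 2
flat = df_multi.reset_index()
print(list(flat.columns))      # ['Animal Type', 'Animal ID', 'Name']
```

Because the method returns a new object, the usual pattern is `df = df.reset_index()` when the restored index should replace the old frame.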