pandas Study Notes (Part 3)

Note: this tutorial is part of a series; this chapter continues from Part 1.

14 Selecting Subsets of Data

14.1 Selecting Rows from a Series

14.1.1 Getting one column of a DataFrame as a Series

city = college_data["CITY"]
print(city)
print("<"+"="*75+">")
print("Type:",type(city))
INSTNM
Alabama A & M University                                            Normal
University of Alabama at Birmingham                             Birmingham
Amridge University                                              Montgomery
University of Alabama in Huntsville                             Huntsville
                                                                ...       
Rasmussen College - Overland Park                            Overland Park
National Personal Training Institute of Cleveland         Highland Heights
Bay Area Medical Academy - San Jose Satellite Location            San Jose
Excel Learning Center-San Antonio South                        San Antonio
Name: CITY, Length: 7535, dtype: object
<===========================================================================>
Type: <class 'pandas.core.series.Series'>
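
Because `college_data` is loaded in an earlier part of the series, the following is only a minimal sketch on a small stand-in frame. The name `colleges` is hypothetical, and its four rows are copied from the output above. It shows that single brackets with one column label give a Series, while a one-element list of labels gives a DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for college_data: four rows copied from the output above.
colleges = pd.DataFrame(
    {
        "INSTNM": [
            "Alabama A & M University",
            "University of Alabama at Birmingham",
            "Amridge University",
            "University of Alabama in Huntsville",
        ],
        "CITY": ["Normal", "Birmingham", "Montgomery", "Huntsville"],
        "STABBR": ["AL", "AL", "AL", "AL"],
        "HBCU": [1.0, 0.0, 0.0, 0.0],
    }
).set_index("INSTNM")

city = colleges["CITY"]          # one label -> Series
city_frame = colleges[["CITY"]]  # list with one label -> DataFrame with one column

print(type(city).__name__, type(city_frame).__name__)
print(city.name, city.index.name)  # the Series keeps the column name and the row index
```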

14.1.2 Using iloc

14.1.2.1 Passing an integer position to select a single value
city.iloc[0]
'Normal'
14.1.2.2 Passing a list of integers to select a new Series
# When a list is passed, the result is still a Series
city.iloc[[0,1,2,3]]
INSTNM
Alabama A & M University                   Normal
University of Alabama at Birmingham    Birmingham
Amridge University                     Montgomery
University of Alabama in Huntsville    Huntsville
Name: CITY, dtype: object
14.1.2.3 Slicing
# Select integer positions [0, 10) with a step of 2; the result is still a Series
city.iloc[0:10:2]
INSTNM
Alabama A & M University                     Normal
Amridge University                       Montgomery
Alabama State University                 Montgomery
Central Alabama Community College    Alexander City
Auburn University at Montgomery          Montgomery
Name: CITY, dtype: object
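
The three iloc forms above differ mainly in what they return. A minimal sketch, assuming a hypothetical five-row stand-in for `city` whose values are copied from the outputs above:

```python
import pandas as pd

# Hypothetical stand-in for the `city` Series (first five rows of the real data).
city = pd.Series(
    ["Normal", "Birmingham", "Montgomery", "Huntsville", "Montgomery"],
    index=pd.Index(
        [
            "Alabama A & M University",
            "University of Alabama at Birmingham",
            "Amridge University",
            "University of Alabama in Huntsville",
            "Alabama State University",
        ],
        name="INSTNM",
    ),
    name="CITY",
)

single = city.iloc[0]       # one integer -> a scalar value
picked = city.iloc[[0, 2]]  # list of integers -> a Series
sliced = city.iloc[0:4:2]   # positions 0 and 2; the stop position 4 is excluded
last = city.iloc[-1]        # negative positions count from the end

print(single)
print(list(sliced))
print(last)
```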

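14.1.3 Using loc

Passing an index label to select a single value
city.loc["Alabama A & M University"]
'Normal'
14.1.3.1 Selecting multiple rows with a list of labels
# Passing a list of index labels selects multiple rows; the result is still a Series
city.loc[["Alabama A & M University","Amridge University"]]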
INSTNM
Alabama A & M University        Normal
Amridge University          Montgomery
Name: CITY, dtype: object
14.1.3.2 Slicing
# Select the rows with labels in [start_label, end_label] with a step of 1; the result is a Series. Note that both endpoints are included
city.loc["Alabama A & M University":"University of Alabama in Huntsville":1]
INSTNM
Alabama A & M University                   Normal
University of Alabama at Birmingham    Birmingham
Amridge University                     Montgomery
University of Alabama in Huntsville    Huntsville
Name: CITY, dtype: object
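
The endpoint rule is the main difference between the two slice forms: a loc slice includes its stop label, while an iloc slice excludes its stop position. A minimal sketch on a hypothetical four-row stand-in for `city`; the second Series, `numbered`, is also hypothetical and only illustrates why label and position are worth keeping apart when the labels are themselves integers.

```python
import pandas as pd

# Hypothetical stand-in for the `city` Series (first four rows of the real data).
city = pd.Series(
    ["Normal", "Birmingham", "Montgomery", "Huntsville"],
    index=[
        "Alabama A & M University",
        "University of Alabama at Birmingham",
        "Amridge University",
        "University of Alabama in Huntsville",
    ],
    name="CITY",
)

by_label = city.loc["Alabama A & M University":"Amridge University"]  # stop label included
by_position = city.iloc[0:2]                                          # stop position excluded
print(len(by_label), len(by_position))

# With integer labels, the same number means different things to loc and iloc.
numbered = pd.Series(["a", "b", "c"], index=[10, 20, 30])
print(numbered.loc[10], numbered.iloc[0])  # the same element, by label and by position
```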

14.2 Selecting Rows from a DataFrame

14.2.1 Using iloc

14.2.1.1 Passing a single integer position returns one row (as a Series)
college_data.iloc[0]
CITY                  Normal
STABBR                    AL
HBCU                       1
MENONLY                    0
                       ...  
PCTFLOAN              0.8284
UG25ABV               0.1049
MD_EARN_WNE_P10        30300
GRAD_DEBT_MDN_SUPP     33888
Name: Alabama A & M University, Length: 26, dtype: object
14.2.1.2 Passing a list of integer positions returns multiple rows (as a DataFrame)
college_data.iloc[[1,3,5,7,9]]
                                           CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
University of Alabama at Birmingham  Birmingham      AL   0.0      0.0  ...    0.5214   0.2422            39700             21941.5
University of Alabama in Huntsville  Huntsville      AL   0.0      0.0  ...    0.4596   0.2640            45500               24097
The University of Alabama            Tuscaloosa      AL   0.0      0.0  ...    0.4010   0.0853            41900               23750
Athens State University                  Athens      AL   0.0      0.0  ...    0.6296   0.6410            39000               18595
Auburn University                        Auburn      AL   0.0      0.0  ...    0.3494   0.0415            45700               21831

5 rows × 26 columns

14.2.1.3 Slicing
# Select the rows at positions [1, 10) with a step of 2; the result is a DataFrame
college_data.iloc[1:10:2]
                                           CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
University of Alabama at Birmingham  Birmingham      AL   0.0      0.0  ...    0.5214   0.2422            39700             21941.5
University of Alabama in Huntsville  Huntsville      AL   0.0      0.0  ...    0.4596   0.2640            45500               24097
The University of Alabama            Tuscaloosa      AL   0.0      0.0  ...    0.4010   0.0853            41900               23750
Athens State University                  Athens      AL   0.0      0.0  ...    0.6296   0.6410            39000               18595
Auburn University                        Auburn      AL   0.0      0.0  ...    0.3494   0.0415            45700               21831

5 rows × 26 columns

14.2.2 Using loc

14.2.2.1 Passing a single label to get one row
# Get the row for the given index label; the result is a Series
college_data.loc["University of Alabama at Birmingham"]
CITY                  Birmingham
STABBR                        AL
HBCU                           0
MENONLY                        0
                         ...    
PCTFLOAN                  0.5214
UG25ABV                   0.2422
MD_EARN_WNE_P10            39700
GRAD_DEBT_MDN_SUPP       21941.5
Name: University of Alabama at Birmingham, Length: 26, dtype: object
14.2.2.2 Passing a list of labels to get multiple rows
# Return the rows for the given list of labels; the result is a DataFrame
college_data.loc[["University of Alabama at Birmingham","The University of Alabama"]]
                                           CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
University of Alabama at Birmingham  Birmingham      AL   0.0      0.0  ...    0.5214   0.2422            39700             21941.5
The University of Alabama            Tuscaloosa      AL   0.0      0.0  ...    0.4010   0.0853            41900               23750

2 rows × 26 columns

14.2.2.3 Slicing
# Select the rows with labels in [start_label, end_label] with a step of 1; the result is a DataFrame
college_data.loc["University of Alabama at Birmingham":"University of Alabama in Huntsville":1]
                                           CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
University of Alabama at Birmingham  Birmingham      AL   0.0      0.0  ...    0.5214   0.2422            39700             21941.5
Amridge University                   Montgomery      AL   0.0      0.0  ...    0.7795   0.8540            40100               23370
University of Alabama in Huntsville  Huntsville      AL   0.0      0.0  ...    0.4596   0.2640            45500               24097

3 rows × 26 columns
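
The return types in this section follow one rule: a single position or label collapses the row into a Series whose index is the column labels, while a list keeps a DataFrame even when it holds just one entry. A minimal sketch, assuming a hypothetical three-column stand-in named `colleges` built from rows shown above:

```python
import pandas as pd

# Hypothetical stand-in for college_data: three rows, three columns.
colleges = pd.DataFrame(
    {
        "CITY": ["Normal", "Birmingham", "Montgomery"],
        "STABBR": ["AL", "AL", "AL"],
        "HBCU": [1.0, 0.0, 0.0],
    },
    index=pd.Index(
        [
            "Alabama A & M University",
            "University of Alabama at Birmingham",
            "Amridge University",
        ],
        name="INSTNM",
    ),
)

row = colleges.loc["Amridge University"]      # single label -> Series
frame = colleges.loc[["Amridge University"]]  # one-element list -> DataFrame with one row

print(row.name)          # the row label becomes the Series name
print(list(row.index))   # the column labels become the Series index
print(frame.shape)
```

The Series produced from a row mixes strings and numbers, which is why the outputs above report `dtype: object`.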

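14.3 Selecting DataFrame Rows and Columns Simultaneously

14.3.1 Getting the first n rows and m columns

14.3.1.1 Using iloc
# Get the first two rows and the first three columns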
college_data.iloc[:2,:3]
                                           CITY  STABBR  HBCU
INSTNM
Alabama A & M University                 Normal      AL   1.0
University of Alabama at Birmingham  Birmingham      AL   0.0
14.3.1.2 Using loc
# Select the rows with labels in [start_label, end_label] and the columns with labels in [start, end]; both endpoints are included
college_data.loc[:"University of Alabama at Birmingham",:"HBCU"]
                                           CITY  STABBR  HBCU
INSTNM
Alabama A & M University                 Normal      AL   1.0
University of Alabama at Birmingham  Birmingham      AL   0.0

14.3.2 Getting the first n columns of all rows

14.3.2.1 Using iloc
college_data.iloc[:,:2]
                                                                    CITY  STABBR
INSTNM
Alabama A & M University                                          Normal      AL
University of Alabama at Birmingham                           Birmingham      AL
Amridge University                                            Montgomery      AL
University of Alabama in Huntsville                           Huntsville      AL
...                                                                  ...     ...
Rasmussen College - Overland Park                          Overland Park      KS
National Personal Training Institute of Cleveland       Highland Heights      OH
Bay Area Medical Academy - San Jose Satellite Location          San Jose      CA
Excel Learning Center-San Antonio South                      San Antonio      TX

7535 rows × 2 columns

14.3.2.2 Using loc
college_data.loc[:,:"STABBR"]
                                                                    CITY  STABBR
INSTNM
Alabama A & M University                                          Normal      AL
University of Alabama at Birmingham                           Birmingham      AL
Amridge University                                            Montgomery      AL
University of Alabama in Huntsville                           Huntsville      AL
...                                                                  ...     ...
Rasmussen College - Overland Park                          Overland Park      KS
National Personal Training Institute of Cleveland       Highland Heights      OH
Bay Area Medical Academy - San Jose Satellite Location          San Jose      CA
Excel Learning Center-San Antonio South                      San Antonio      TX

7535 rows × 2 columns

14.3.3 Selecting non-contiguous rows and columns

14.3.3.1 Using iloc
college_data.iloc[[1,3,5,7],[2,4,6,8]]
                                     HBCU  WOMENONLY  SATVRMID  DISTANCEONLY
INSTNM
University of Alabama at Birmingham   0.0        0.0     570.0           0.0
University of Alabama in Huntsville   0.0        0.0     595.0           0.0
The University of Alabama             0.0        0.0     555.0           0.0
Athens State University               0.0        0.0       NaN           0.0
14.3.3.2 Using loc
# Same selection as above, expressed with labels
college_data.loc[["University of Alabama at Birmingham","University of Alabama in Huntsville","The University of Alabama","Athens State University"],
                ["HBCU","WOMENONLY","SATVRMID","DISTANCEONLY"]]
                                     HBCU  WOMENONLY  SATVRMID  DISTANCEONLY
INSTNM
University of Alabama at Birmingham   0.0        0.0     570.0           0.0
University of Alabama in Huntsville   0.0        0.0     595.0           0.0
The University of Alabama             0.0        0.0     555.0           0.0
Athens State University               0.0        0.0       NaN           0.0
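
Neither indexer accepts a mix of positions and labels in one call, so selecting "rows by position, columns by label" needs a small translation step. The sketch below shows two ways to do that on a hypothetical stand-in frame named `colleges`; `Index.get_indexer` and plain slicing of `colleges.index` are existing pandas features, while the frame itself is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for college_data.
colleges = pd.DataFrame(
    {
        "CITY": ["Normal", "Birmingham", "Montgomery", "Huntsville"],
        "STABBR": ["AL", "AL", "AL", "AL"],
        "HBCU": [1.0, 0.0, 0.0, 0.0],
    },
    index=pd.Index(
        [
            "Alabama A & M University",
            "University of Alabama at Birmingham",
            "Amridge University",
            "University of Alabama in Huntsville",
        ],
        name="INSTNM",
    ),
)

# Option 1: turn row positions into labels, then use loc.
via_loc = colleges.loc[colleges.index[:2], ["CITY", "HBCU"]]

# Option 2: turn column labels into positions, then use iloc.
col_positions = colleges.columns.get_indexer(["CITY", "HBCU"])
via_iloc = colleges.iloc[:2, col_positions]

print(via_loc.equals(via_iloc))
```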

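14.3.4 Selecting a single scalar value

14.3.4.1 Using iloc
# Select the value in the fourth row and fourth column
college_data.iloc[3,3]
0.0
14.3.4.2 Using loc
# Same selection as above: row position 3 is "University of Alabama in Huntsville" and column position 3 is "MENONLY"
college_data.loc["University of Alabama in Huntsville","MENONLY"]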
0.0
14.3.4.3 Using iat to get a scalar quickly
%timeit college_data.iloc[1000,3]
# iat takes roughly 40% less time than iloc in this run
%timeit college_data.iat[1000,3]
7.95 µs ± 691 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.74 µs ± 21.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
14.3.4.4 Using at to get a scalar quickly
%timeit college_data.loc["Rasmussen College - Overland Park","CITY"]
# Likewise, at is faster than loc (roughly 35% less time in this run)
%timeit college_data.at["Rasmussen College - Overland Park","CITY"]
6.41 µs ± 58.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.16 µs ± 20.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
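
`%timeit` is an IPython magic and only works in a notebook or IPython shell. As a rough equivalent for a plain Python script, the sketch below uses the standard-library `timeit` module on a hypothetical 4x3 frame `df`. The absolute numbers will differ from the ones above and from machine to machine, so no expected timings are given.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical small frame, used only so the snippet runs without college_data.
df = pd.DataFrame(
    np.arange(12.0).reshape(4, 3),
    index=list("abcd"),
    columns=["x", "y", "z"],
)

# All four accessors reach the same cell: row "c" is position 2, column "y" is position 1.
assert df.iloc[2, 1] == df.iat[2, 1] == df.loc["c", "y"] == df.at["c", "y"]

runs = 10_000
timings = {
    "iloc": timeit.timeit(lambda: df.iloc[2, 1], number=runs),
    "iat": timeit.timeit(lambda: df.iat[2, 1], number=runs),
    "loc": timeit.timeit(lambda: df.loc["c", "y"], number=runs),
    "at": timeit.timeit(lambda: df.at["c", "y"], number=runs),
}
for name, seconds in timings.items():
    print(f"{name:>4}: {seconds / runs * 1e6:.2f} µs per call")
```

Note that `at` and `iat` only accept a single row and a single column; for lists or slices, `loc` and `iloc` are still needed.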

14.4 Additional Notes

14.4.1 Lazy slicing (slicing directly with [])

# This also works on a Series
college_data[2:10:2]
                                             CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
Amridge University                     Montgomery      AL   0.0      0.0  ...    0.7795   0.8540            40100               23370
Alabama State University               Montgomery      AL   1.0      0.0  ...    0.7554   0.1270            26600             33118.5
Central Alabama Community College  Alexander City      AL   0.0      0.0  ...    0.3977   0.3153            27500               16127
Auburn University at Montgomery        Montgomery      AL   0.0      0.0  ...    0.5803   0.2930            35000               21335

4 rows × 26 columns

# Slicing by index label
college_data[:"Central Alabama Community College"]
                                               CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
Alabama A & M University                     Normal      AL   1.0      0.0  ...    0.8284   0.1049            30300               33888
University of Alabama at Birmingham      Birmingham      AL   0.0      0.0  ...    0.5214   0.2422            39700             21941.5
Amridge University                       Montgomery      AL   0.0      0.0  ...    0.7795   0.8540            40100               23370
University of Alabama in Huntsville      Huntsville      AL   0.0      0.0  ...    0.4596   0.2640            45500               24097
Alabama State University                 Montgomery      AL   1.0      0.0  ...    0.7554   0.1270            26600             33118.5
The University of Alabama                Tuscaloosa      AL   0.0      0.0  ...    0.4010   0.0853            41900               23750
Central Alabama Community College    Alexander City      AL   0.0      0.0  ...    0.3977   0.3153            27500               16127

7 rows × 26 columns
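
The shortcut is convenient but its meaning depends on what goes inside the brackets: a string is looked up among the columns, a slice is applied to the rows, and a bare integer is treated as a column label rather than a row position. A minimal sketch on a hypothetical stand-in frame named `colleges`:

```python
import pandas as pd

# Hypothetical stand-in for college_data.
colleges = pd.DataFrame(
    {
        "CITY": ["Normal", "Birmingham", "Montgomery", "Huntsville"],
        "STABBR": ["AL", "AL", "AL", "AL"],
        "HBCU": [1.0, 0.0, 0.0, 0.0],
    },
    index=pd.Index(
        [
            "Alabama A & M University",
            "University of Alabama at Birmingham",
            "Amridge University",
            "University of Alabama in Huntsville",
        ],
        name="INSTNM",
    ),
)

column = colleges["CITY"]  # a string selects a column
rows = colleges[1:3]       # a slice selects rows (positions 1 and 2)

try:
    colleges[0]            # a bare integer is looked up as a column label
    integer_worked = True
except KeyError:
    integer_worked = False

print(rows.shape, integer_worked)
```

For that reason `iloc` and `loc` remain the safer choice whenever a single row is wanted or the intent should be obvious to a reader.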

14.4.2 Slicing alphabetically

# The index labels must be sorted before slicing alphabetically
college_data.sort_index(ascending=True)["A":"E"]
                                                  CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV    MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
A & W Healthcare Educators                 New Orleans      LA   0.0      0.0  ...    0.8596   0.6667                NaN             19022.5
A T Still University of Health Sciences     Kirksville      MO   0.0      0.0  ...       NaN      NaN             219800   PrivacySuppressed
ABC Beauty Academy                             Garland      TX   0.0      0.0  ...    0.0000   0.8286                NaN   PrivacySuppressed
ABC Beauty College Inc                     Arkadelphia      AR   0.0      0.0  ...    1.0000   0.4688  PrivacySuppressed               16500
...                                                ...     ...   ...      ...  ...       ...      ...                ...                 ...
Durham Technical Community College              Durham      NC   0.0      0.0  ...    0.1796   0.5961              27200             11069.5
Dutchess BOCES-Practical Nursing Program  Poughkeepsie      NY   0.0      0.0  ...    0.6275   0.5430              36500                9500
Dutchess Community College                Poughkeepsie      NY   0.0      0.0  ...    0.1936   0.1806              32500               10250
Dyersburg State Community College            Dyersburg      TN   0.0      0.0  ...    0.2493   0.3097              26800                7475

1900 rows × 26 columns

# Any other range of letters works the same way, for example from "E" to "F"
college_data.sort_index()["E":"F"]
                                                 CITY  STABBR  HBCU  MENONLY  ...  PCTFLOAN  UG25ABV    MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
INSTNM
E Q School of Hair Design              Council Bluffs      IA   0.0      0.0  ...    0.6737   0.1471              18100                7830
ECPI University                        Virginia Beach      VA   0.0      0.0  ...    0.5001   0.6633              37000               20000
ECPI University-Charleston           North Charleston      SC   NaN      NaN  ...       NaN      NaN                NaN               20000
ECPI University-Charlotte                   Charlotte      NC   NaN      NaN  ...       NaN      NaN                NaN               20000
...                                               ...     ...   ...      ...  ...       ...      ...                ...                 ...
Excelsior College                              Albany      NY   0.0      0.0  ...    0.0800   0.9337  PrivacySuppressed               11010
Expertise Cosmetology Institute             Las Vegas      NV   0.0      0.0  ...    1.0000   0.4828  PrivacySuppressed                8450
Exposito School of Hair Design               Amarillo      TX   0.0      0.0  ...    0.6267   0.3966              15100   PrivacySuppressed
Expression College for Digital Arts        Emeryville      CA   0.0      0.0  ...    0.7736   0.3955  PrivacySuppressed               35662

381 rows × 26 columns
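
The need to sort first can be checked on a tiny example. The sketch below uses a hypothetical Series `s` whose index holds five institution names from the outputs above, deliberately out of order. On the unsorted index, slicing between labels that are not themselves present raises a KeyError in the pandas versions I am aware of; after `sort_index()` the same slice works. The slice "A":"E" stops at the label "E" itself, so names that merely start with "E" are left out, which matches the first output above ending at "Dyersburg State Community College".

```python
import pandas as pd

# Hypothetical Series with an unsorted index of institution names.
s = pd.Series(
    range(5),
    index=[
        "Dutchess Community College",
        "ABC Beauty Academy",
        "ECPI University",
        "Excelsior College",
        "Auburn University",
    ],
)

try:
    s.loc["A":"E"]   # neither "A" nor "E" is a label, and the index is not sorted
    raised = False
except KeyError:
    raised = True

sorted_s = s.sort_index()
a_to_e = sorted_s.loc["A":"E"]  # every label that sorts between "A" and "E"

print(raised)
print(list(a_to_e.index))
```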

14.4.3 Replacing the index

college_data.set_index("CITY")
                  STABBR  HBCU  MENONLY  WOMENONLY  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
CITY
Normal                AL   1.0      0.0        0.0  ...    0.8284   0.1049            30300               33888
Birmingham            AL   0.0      0.0        0.0  ...    0.5214   0.2422            39700             21941.5
Montgomery            AL   0.0      0.0        0.0  ...    0.7795   0.8540            40100               23370
Huntsville            AL   0.0      0.0        0.0  ...    0.4596   0.2640            45500               24097
...                  ...   ...      ...        ...  ...       ...      ...              ...                 ...
Overland Park         KS   NaN      NaN        NaN  ...       NaN      NaN              NaN               21163
Highland Heights      OH   NaN      NaN        NaN  ...       NaN      NaN              NaN                6333
San Jose              CA   NaN      NaN        NaN  ...       NaN      NaN              NaN   PrivacySuppressed
San Antonio           TX   NaN      NaN        NaN  ...       NaN      NaN              NaN               12125

7535 rows × 25 columns

14.4.4 Restoring the index

college_data.reset_index()
                                                 INSTNM              CITY  STABBR  HBCU  ...  PCTFLOAN  UG25ABV  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
0                              Alabama A & M University            Normal      AL   1.0  ...    0.8284   0.1049            30300               33888
1                   University of Alabama at Birmingham        Birmingham      AL   0.0  ...    0.5214   0.2422            39700             21941.5
2                                    Amridge University        Montgomery      AL   0.0  ...    0.7795   0.8540            40100               23370
3                   University of Alabama in Huntsville        Huntsville      AL   0.0  ...    0.4596   0.2640            45500               24097
...                                                 ...               ...     ...   ...  ...       ...      ...              ...                 ...
7531                  Rasmussen College - Overland Park     Overland Park      KS   NaN  ...       NaN      NaN              NaN               21163
7532  National Personal Training Institute of Cleveland  Highland Heights      OH   NaN  ...       NaN      NaN              NaN                6333
7533  Bay Area Medical Academy - San Jose Satellite ...          San Jose      CA   NaN  ...       NaN      NaN              NaN   PrivacySuppressed
7534            Excel Learning Center-San Antonio South       San Antonio      TX   NaN  ...       NaN      NaN              NaN               12125

7535 rows × 27 columns
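
One point the two outputs above imply but do not state: both methods return a new DataFrame and leave `college_data` itself untouched, which is why one call shows 25 columns and the other 27 for a frame that has 26. A minimal sketch, assuming a hypothetical three-column stand-in named `colleges`:

```python
import pandas as pd

# Hypothetical stand-in for college_data, indexed by INSTNM like the original.
colleges = pd.DataFrame(
    {
        "INSTNM": [
            "Alabama A & M University",
            "University of Alabama at Birmingham",
            "Amridge University",
        ],
        "CITY": ["Normal", "Birmingham", "Montgomery"],
        "STABBR": ["AL", "AL", "AL"],
        "HBCU": [1.0, 0.0, 0.0],
    }
).set_index("INSTNM")

by_city = colleges.set_index("CITY")  # new frame; `colleges` keeps its INSTNM index
print(colleges.index.name)
print(by_city.shape[1])               # one column fewer: CITY moved into the index
print("INSTNM" in by_city.columns)    # the old index is discarded, not turned into a column

flat = colleges.reset_index()         # INSTNM becomes an ordinary column again
print(flat.shape[1])                  # one column more

# To switch the index without losing the old one, reset first and then set.
kept = colleges.reset_index().set_index("CITY")
print("INSTNM" in kept.columns)
```

To keep a result, assign it to a name as done here.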
