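The Titanic training data is used as the example here. To keep things simple, only the first ten rows are used.
First import the modules and load the data: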
import pandas as pd
import numpy as np
df = pd.read_csv(r"C:\Users\admin\Desktop\train.csv")
df = df.head(10)
df.index = ['a','b','c','d','e','f','g','h','i','j']  # letter row labels, used later by loc
df
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
a  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
b  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
c  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
d  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
e  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S
f  6            0         3       Moran, Mr. James                                   male    NaN   0      0      330877            8.4583   NaN    Q
g  7            0         1       McCarthy, Mr. Timothy J                            male    54.0  0      0      17463             51.8625  E46    S
h  8            0         3       Palsson, Master. Gosta Leonard                     male    2.0   3      1      349909            21.0750  NaN    S
i  9            1         3       Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)  female  27.0  0      2      347742            11.1333  NaN    S
j  10           1         2       Nasser, Mrs. Nicholas (Adele Achem)                female  14.0  1      0      237736            30.0708  NaN    C
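The labels assigned through df.index are what loc works with later, so they should be unique. The sketch below uses a hypothetical three-row frame (Age values taken from rows a to c above, not read from train.csv) to show that a repeated label is accepted, but .loc then returns every matching row instead of a single value.

```python
import pandas as pd

# Hypothetical three-row frame; only the label handling matters here.
ages = pd.DataFrame({"Age": [22.0, 38.0, 26.0]})
ages.index = ["a", "b", "c"]          # unique letter labels, as in the example above
print(ages.index.is_unique)           # True
print(ages.loc["b", "Age"])           # 38.0: a unique label gives a single value

dup = ages.copy()
dup.index = ["a", "b", "b"]           # a repeated label is allowed...
print(dup.index.is_unique)            # False
print(dup.loc["b", "Age"].tolist())   # [38.0, 26.0]: ...but .loc returns every match
```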
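Selecting a single column
1. A single column can be selected with df[column_name], df[[column_name]] or df.column_name, with one small difference: df[column_name] and df.column_name return a Series, while df[[column_name]] returns a DataFrame.
# For example, select the Age column. In the df[...] forms the column name is a string, so it must be quoted (single or double quotes both work)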
df.Age
df['Age']
df[['Age']]
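A quick way to confirm the return types is to check them directly. This sketch uses a tiny hypothetical frame (two rows typed by hand from the table above) rather than the CSV. It also shows one limit of the df.column_name form: attribute access only reaches a column whose name is a valid Python identifier and does not clash with an existing DataFrame attribute, so a column named "count" is hidden behind the DataFrame.count method.

```python
import pandas as pd

# Tiny hypothetical frame with two Titanic columns (values from rows a and c).
df = pd.DataFrame(
    {"Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"], "Age": [22.0, 26.0]},
    index=["a", "c"],
)

s1 = df["Age"]     # Series
s2 = df.Age        # Series, via attribute access
d1 = df[["Age"]]   # one-column DataFrame

print(type(s1).__name__, type(s2).__name__, type(d1).__name__)  # Series Series DataFrame
print(s1.equals(s2))  # True: both forms return the same data

# Attribute access loses to real DataFrame attributes: here .count is the method.
other = pd.DataFrame({"count": [1, 2]})
print(callable(other.count))    # True: the method, not the column
print(other["count"].tolist())  # [1, 2]: bracket access still reaches the column
```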
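2. loc[row_indexer, column_indexer] works with labels: pass row labels and column names. When selecting whole columns, the row part can be written as ":".
# When selecting a single column, the column name may be given bare or inside []: bare returns a Series, inside [] returns a DataFrame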
df.loc[:,'Age']
df.loc[:,['Age']]
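The Series/DataFrame difference shows up in the shapes, and the row part of loc does not have to be ":". A minimal sketch, again on a hypothetical hand-typed frame using rows a, c and d from the table above:

```python
import pandas as pd

# Hypothetical frame built from rows a, c and d of the table above.
df = pd.DataFrame({"Sex": ["male", "female", "female"], "Age": [22.0, 26.0, 35.0]},
                  index=["a", "c", "d"])

age_series = df.loc[:, "Age"]    # bare label    -> Series, shape (3,)
age_frame = df.loc[:, ["Age"]]   # list of labels -> DataFrame, shape (3, 1)
print(age_series.shape, age_frame.shape)  # (3,) (3, 1)

# A list of row labels instead of ":" restricts the rows as well.
print(df.loc[["a", "d"], "Age"].tolist())  # [22.0, 35.0]
```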
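3. iloc[row_indexer, column_indexer] works with integer positions counted from 0: pass row and column numbers. When selecting whole columns, the row part can be written as ":".
df.iloc[:,1]    # position 1 is the Survived column (Age would be position 5)
df.iloc[:,[1]]  # the same column as a one-column DataFrame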
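When you know a column's name but need its position for iloc, Index.get_loc converts one into the other. The sketch assumes a hypothetical frame that keeps only the first three Titanic columns in their original order, so the positions match the full table.

```python
import pandas as pd

# Hypothetical frame: positions 0, 1, 2 follow the Titanic layout.
df = pd.DataFrame({"PassengerId": [1, 3], "Survived": [0, 1], "Pclass": [3, 3]},
                  index=["a", "c"])

col1 = df.iloc[:, 1]       # position 1 -> Survived, as a Series
col1_df = df.iloc[:, [1]]  # the same column as a one-column DataFrame
print(col1.name)           # Survived

# Convert a column name to its position, then select by position.
pos = df.columns.get_loc("Pclass")
print(pos)                                          # 2
print(df.iloc[:, pos].equals(df.loc[:, "Pclass"]))  # True
```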
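Selecting multiple columns
1. A list of names, df[[col1, col2, ..., coln]], selects several columns and always returns a DataFrame. For example, to select the Name and Age columns: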
df[['Name','Age']]
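Two details of list selection are easy to check on a small hypothetical frame (hand-typed, not from train.csv): the result's columns follow the order of the list rather than the frame's own order, and a name that does not exist raises a KeyError instead of being skipped.

```python
import pandas as pd

# Hypothetical frame with three Titanic columns in their original order.
df = pd.DataFrame(
    {"Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
     "Sex": ["male", "female"], "Age": [22.0, 26.0]},
    index=["a", "c"],
)

sub = df[["Name", "Age"]]
print(sub.columns.tolist())                  # ['Name', 'Age']
# Column order follows the list, not the original frame.
print(df[["Age", "Name"]].columns.tolist())  # ['Age', 'Name']

# A misspelled name ("age" instead of "Age") raises KeyError.
try:
    df[["Name", "age"]]
except KeyError as err:
    print("KeyError:", err)
```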
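2. loc[row_indexer, column_indexer] works with labels: pass row labels and column names, and use ":" as the row part to keep every row. Non-adjacent columns must be given as a list; adjacent columns can also be given as a label slice, which in loc includes both end columns (so 'Name':'Age' returns Name, Sex and Age).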
df.loc[:,['Name','Age']]
df.loc[:,'Name':'Age']
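The inclusive end of a label slice is worth verifying once. This sketch assumes a hypothetical frame holding the Titanic columns Pclass through SibSp in their original order, so Sex sits between Name and Age as it does in the full table.

```python
import pandas as pd

# Hypothetical frame: Titanic columns Pclass..SibSp in their original order.
df = pd.DataFrame(
    {"Pclass": [3, 3],
     "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
     "Sex": ["male", "female"], "Age": [22.0, 26.0], "SibSp": [1, 0]},
    index=["a", "c"],
)

# A label slice in .loc includes BOTH endpoints.
print(df.loc[:, "Name":"Age"].columns.tolist())     # ['Name', 'Sex', 'Age']
# A list returns only the named columns, adjacent or not.
print(df.loc[:, ["Name", "Age"]].columns.tolist())  # ['Name', 'Age']
```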
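3. iloc[row_indexer, column_indexer] works with integer positions: pass row and column numbers, and use ":" as the row part to keep every row. Non-adjacent columns must be given as a list of positions; adjacent columns can also be given as a position slice, but as in ordinary Python slicing the end position is excluded, so the list [3,4,5,6] corresponds to the slice 3:7.
df.iloc[:,[3,4,5,6]]  # Name, Sex, Age, SibSp
df.iloc[:,3:7]        # the same four columns; 3:6 would stop at Age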
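The difference between the inclusive loc slice and the exclusive iloc slice is the most common source of off-by-one mistakes here. A minimal check, assuming a hypothetical one-row frame that reproduces the first seven Titanic column positions (0 to 6):

```python
import pandas as pd

# Hypothetical one-row frame with the first seven Titanic columns (positions 0-6).
cols = ["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age", "SibSp"]
df = pd.DataFrame([[1, 0, 3, "Braund, Mr. Owen Harris", "male", 22.0, 1]],
                  columns=cols, index=["a"])

print(df.iloc[:, [3, 4, 5, 6]].columns.tolist())  # ['Name', 'Sex', 'Age', 'SibSp']
# A position slice stops BEFORE its end position, like Python's own slicing.
print(df.iloc[:, 3:6].columns.tolist())  # ['Name', 'Sex', 'Age']
print(df.iloc[:, 3:7].columns.tolist())  # ['Name', 'Sex', 'Age', 'SibSp']
```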