利用pandas对表格中的数据进行筛选,找出符合条件的数据
导入包和数据
import numpy as np
import pandas as pd
df=pd.read_csv('mytrain.csv')
df.head(3)
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Unnamed: 12 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | NaN |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | NaN |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | NaN |
1.简单的条件筛选
筛选出age<10的所有信息
df[df["Age"]<10].head(3)
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Unnamed: 12 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.075 | NaN | S | NaN |
10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.700 | G6 | S | NaN |
16 | 17 | 0 | 3 | Rice, Master. Eugene | male | 2.0 | 4 | 1 | 382652 | 29.125 | NaN | Q | NaN |
筛选Age在[10,50]的信息,并重命名数据
midage=df[(df["Age"]>10)&(df["Age"]<50)]
midage.head(3)
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Unnamed: 12 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | NaN |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | NaN |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | NaN |
2.筛选显示指定行的指定列
筛选midage数据中第100行的"Pclass"和"Sex"的数据
midage=midage.reset_index(drop=True)
midage.head(3)
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Unnamed: 12 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | NaN |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | NaN |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | NaN |
reset_index()的作用:重置索引,使数据获得新的index,drop=True是不保留原来的索引,默认为Flase
midage.loc[[100],['Pclass','Sex']]
Pclass | Sex | |
---|---|---|
100 | 2 | male |
midage.loc[[100,105,108],['Pclass','Name','Sex']]
Pclass | Name | Sex | |
---|---|---|---|
100 | 2 | Byles, Rev. Thomas Roussel Davids | male |
105 | 3 | Cribb, Mr. John Hatfield | male |
108 | 3 | Calic, Mr. Jovo | male |
用iloc方法筛选
将midage中100,105,108行的"Pclass",“Name”,"Sex"筛选出来
midage.iloc[[100,105,108],[2,3,4]]
Pclass | Name | Sex | |
---|---|---|---|
100 | 2 | Byles, Rev. Thomas Roussel Davids | male |
105 | 3 | Cribb, Mr. John Hatfield | male |
108 | 3 | Calic, Mr. Jovo | male |