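pandas is one of the most widely used libraries for data analysis. Data loaded through a pd reader function such as read_csv comes back as a DataFrame (two-dimensional) or a Series (one-dimensional). These notes record how to filter data in a DataFrame.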
import os
import pandas as pd
aqicsv = pd.read_csv("D:\\aqifit_numsum10\\newaqifit.csv")
2. At this point aqicsv is a DataFrame, so you can call describe on a column to get summary statistics.
aqicsv["predictaqi_norm1"].describe()
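To make the output of describe concrete, here is a minimal sketch on a small made-up DataFrame. The name `df` and the sample values are assumptions for illustration only; they do not come from newaqifit.csv.

```python
import pandas as pd

# Hypothetical sample values standing in for the predictaqi_norm1 column.
df = pd.DataFrame({"predictaqi_norm1": [50.0, 100.0, 150.0, 200.0]})

# describe() on a numeric column returns a Series indexed by statistic name:
# count, mean, std, min, 25%, 50%, 75%, max.
stats = df["predictaqi_norm1"].describe()

print(stats)
print(stats["mean"])  # individual statistics can be read by label
```

Because the result is itself a Series, a single statistic can be read by its label instead of being parsed from the printed output.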
3. You can select the rows where the predictaqi_norm1 column is greater than 100.
aqicsv[aqicsv["predictaqi_norm1"]>100]
aqicsv[aqicsv.predictaqi_norm1>100]
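The filter in step 3 works in two stages: the comparison builds a boolean Series, and indexing with that Series keeps the rows marked True. The sketch below shows both stages on made-up data; `df`, `mask`, `high` and `same` are illustrative names, not part of the original script.

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the original CSV.
df = pd.DataFrame({
    "FID": [1, 2, 3, 4],
    "predictaqi_norm1": [80.0, 120.0, 95.0, 160.0],
})

# Stage 1: the comparison yields one True/False value per row.
mask = df["predictaqi_norm1"] > 100

# Stage 2: indexing with the mask keeps only the True rows.
high = df[mask]

# Attribute access (df.predictaqi_norm1) selects the same column.
same = df[df.predictaqi_norm1 > 100]

print(mask.tolist())
print(high)
```

Attribute access is convenient, but it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute; bracket access always works.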
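4. Use & (and) and | (or) to filter on several conditions. Wrap each condition in parentheses, because & and | bind more tightly than the comparison operators.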
aqicsv[(aqicsv["FID"]>37898) & (aqicsv["FID"]<38766) ]
aqicsv[(aqicsv.predictaqi_norm1>150) |(aqicsv.predictaqi_norm1<100) ]
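The two combined filters above can be checked on a small example. This is a sketch with made-up values and smaller thresholds than the original; `df`, `in_range` and `extreme` are assumed names.

```python
import pandas as pd

# Hypothetical data; thresholds are scaled down to fit five rows.
df = pd.DataFrame({
    "FID": [10, 20, 30, 40, 50],
    "predictaqi_norm1": [90.0, 160.0, 120.0, 80.0, 155.0],
})

# "and": both conditions must hold, giving an open interval on FID.
in_range = df[(df["FID"] > 10) & (df["FID"] < 50)]

# "or": either condition may hold, giving values outside [100, 150].
extreme = df[(df.predictaqi_norm1 > 150) | (df.predictaqi_norm1 < 100)]

print(in_range["FID"].tolist())
print(extreme["FID"].tolist())
```

The Python keywords `and` and `or` cannot be used here, since they try to reduce a whole Series to a single truth value and raise a ValueError.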
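5. If you only need two of the columns but want to filter on two other columns, you can write it as shown below. If you only need certain columns, without any row filter, write aqicsv[['FID','x','y']].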
aqicsv[['x','y']][(aqicsv.FID >10000) | (aqicsv.predictaqi_norm1 >150)]
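The chained form in step 5 selects columns first and rows second. An alternative that is commonly recommended in pandas is a single `.loc` call taking the row condition and the column list together. The sketch below compares the two on made-up data; `df`, `cond`, `chained` and `via_loc` are illustrative names.

```python
import pandas as pd

# Hypothetical rows with the same column names as the original CSV.
df = pd.DataFrame({
    "FID": [5000, 15000, 8000],
    "x": [1.0, 2.0, 3.0],
    "y": [4.0, 5.0, 6.0],
    "predictaqi_norm1": [120.0, 90.0, 170.0],
})

# Row condition built from columns that are not part of the output.
cond = (df.FID > 10000) | (df.predictaqi_norm1 > 150)

# Chained form, as in the step above: columns first, then rows.
chained = df[["x", "y"]][cond]

# Single-step form: rows and columns in one .loc call.
via_loc = df.loc[cond, ["x", "y"]]

print(via_loc)
```

Both forms return the same rows when reading. The `.loc` form is the safer habit if the selection is later used for assignment, where chained indexing may end up modifying a temporary copy.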
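6. The isin method filters for a set of specific values, but it has to be called on a particular column. It returns a boolean Series that marks which rows match.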
testlist = aqicsv.predictaqi_norm1[:50]
aqicsv['predictaqi_norm1'].isin(testlist)
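Since isin only produces True/False values, getting the matching rows takes one more indexing step. The sketch below shows that step on made-up data; `df`, `mask` and `matched` are assumed names, and `testlist` takes the first two values rather than the first 50 to keep the example small.

```python
import pandas as pd

# Hypothetical rows; the value 120.0 appears twice on purpose.
df = pd.DataFrame({
    "FID": [1, 2, 3, 4, 5],
    "predictaqi_norm1": [80.0, 120.0, 95.0, 160.0, 120.0],
})

# Values to look for: here, the first two entries of the column.
testlist = df.predictaqi_norm1[:2]

# isin compares by value and returns one boolean per row.
mask = df["predictaqi_norm1"].isin(testlist)

# Index the DataFrame with the mask to get the matching rows.
matched = df[mask]

print(mask.tolist())
print(matched)
```

The last row matches even though it lies outside the slice used to build `testlist`, because isin compares values and ignores row positions.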