Assume df is a DataFrame. Then df['index'] is a Series, while df[['index']] is a DataFrame.
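A minimal sketch of the single-bracket versus double-bracket distinction. The toy frame and its values are hypothetical; only the column name 'index' comes from the note above.

```python
import pandas as pd

# Hypothetical toy data; the column name 'index' mirrors the note above.
df = pd.DataFrame({"index": [1, 2, 3], "other": [4, 5, 6]})

single = df["index"]     # single brackets return a Series
double = df[["index"]]   # double brackets return a one-column DataFrame

print(type(single).__name__)
print(type(double).__name__)
```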
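1. Checking whether values in a Series are null or non-null
pd.isnull(row)  # row is a Series; returns a boolean Series that is True where the value is null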
row.notnull()  # row is a Series; the inverse of the line above: True where the value is NOT null
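if pd.notnull(row["Date"]) and pd.notnull(row["Coupon_id"]):  # row must be a single row (a Series, e.g. inside df.apply(..., axis=1)) so that each test is a scalar; with a DataFrame this condition raises ValueError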
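The null checks above can be tried on a small example. This is only a sketch: the values are invented, and `row` is assumed to be a single record stored as a Series.

```python
import numpy as np
import pandas as pd

# Hypothetical single record; Coupon_id is deliberately missing.
row = pd.Series({"Date": 20160520.0, "Coupon_id": np.nan})

null_mask = pd.isnull(row)      # True where a value is missing
notnull_mask = row.notnull()    # True where a value is present

# On scalar fields the result is a plain boolean, so `and` works in an if-test.
both_present = pd.notnull(row["Date"]) and pd.notnull(row["Coupon_id"])

print(null_mask)
print(both_present)
```

The two masks are exact complements, so either one can be used with `~` to get the other.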
2. Selecting non-null values: keep the rows that satisfy a condition and pick certain columns; the result is a DataFrame
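u1 = fdf[pd.notnull(fdf['Date_received'])][['User_id']].copy()  # fdf is a DataFrame; u1 is a DataFrame with the single column User_id (.copy() avoids SettingWithCopyWarning on the next line)
u1['u_coupon_count'] = 1  # add a new column to u1 with every value set to 1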
u3=fdf[((pd.notnull(fdf['Date']))&(pd.notnull(fdf['Date_received'])))][['User_id']].copy()
um3 = fdf[fdf['Date_received'].notnull()][['User_id', 'Merchant_id']].copy()
df.loc[df.country == 'Italy']  # select the rows whose country column equals 'Italy'
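A small runnable sketch of the pattern in this section: filter rows with a non-null mask, then keep selected columns with double brackets so the result stays a DataFrame. The data values are made up; the column names follow the snippets above.

```python
import numpy as np
import pandas as pd

# Hypothetical data using the column names from the snippets above.
fdf = pd.DataFrame({
    "User_id": [1, 2, 3, 4],
    "Merchant_id": [10, 20, 30, 40],
    "Date_received": [20160501.0, np.nan, 20160503.0, 20160504.0],
    "Date": [20160510.0, np.nan, np.nan, 20160512.0],
})

# Rows with a non-null Date_received, keeping only User_id (still a DataFrame).
u1 = fdf[fdf["Date_received"].notnull()][["User_id"]].copy()
u1["u_coupon_count"] = 1  # safe to modify because u1 is an explicit copy

# Rows where both Date and Date_received are present.
u3 = fdf[fdf["Date"].notnull() & fdf["Date_received"].notnull()][["User_id"]].copy()

print(u1)
print(u3)
```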
3. Selecting every row that satisfies a condition, keeping all columns
feature = row[(row['Date'] < 20160516) | ((pd.isnull(row['Date'])) & (row['Date_received'] < 20160516))].copy()  # combined condition; feature is a DataFrame containing multiple columns
dataset = row[(row['Date_received'] >= 20160516) & (row['Date_received'] <= 20160615)]  # row is a DataFrame
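The combined condition above can be checked on a few invented rows. This sketch assumes the dates are stored as numbers in YYYYMMDD form, as the comparisons in the snippets suggest; `cutoff` is a name introduced here for readability.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; NaN in Date means no date was recorded.
row = pd.DataFrame({
    "Date_received": [20160501.0, 20160510.0, 20160520.0, 20160610.0],
    "Date": [20160510.0, np.nan, 20160525.0, np.nan],
})

cutoff = 20160516  # illustrative name for the split date used above

# Keep rows dated before the cutoff, or undated rows received before the cutoff.
# Comparisons with NaN are False, so the isnull branch handles missing dates.
feature = row[(row["Date"] < cutoff)
              | (row["Date"].isnull() & (row["Date_received"] < cutoff))].copy()

# Keep rows received inside the closed interval [20160516, 20160615].
dataset = row[(row["Date_received"] >= 20160516) & (row["Date_received"] <= 20160615)]

print(feature)
print(dataset)
```

Each comparison needs its own parentheses because `&` and `|` bind more tightly than `<` and `>=` in Python.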
4. Conditional selection on a single column
data_train.Age[data_train.Pclass == 1]  # the Age values of the rows where Pclass == 1
Survived_0 = data_train.Embarked[data_train.Survived == 0]
df.loc[df.Cabin.notnull(), 'Cabin'] = "Yes"  # replace every non-null value in the Cabin column with "Yes"
dataset.loc[(dataset['Age'] > 16) & (dataset['Age'] <= 32), 'Age'] = 1
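A short sketch of filtering one column by a condition on another, and of conditional assignment through `.loc[mask, column]`. The three rows are hypothetical; the column names follow the snippets above.

```python
import numpy as np
import pandas as pd

# Hypothetical passengers using the column names from the snippets above.
df = pd.DataFrame({
    "Pclass": [1, 3, 1],
    "Age": [38.0, 22.0, 10.0],
    "Cabin": ["C85", np.nan, "E46"],
})

# Age values of first-class passengers only (a Series).
first_class_age = df.Age[df.Pclass == 1]

# Overwrite non-null Cabin values in place; missing ones stay missing.
df.loc[df.Cabin.notnull(), "Cabin"] = "Yes"

# Recode ages in the half-open range (16, 32] to the bucket value 1.
df.loc[(df["Age"] > 16) & (df["Age"] <= 32), "Age"] = 1

print(first_class_age)
print(df)
```

Assigning through a single `.loc[mask, column]` call writes to the original frame, which chained indexing such as `df[mask]['Age'] = 1` does not reliably do.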
5. Selecting rows
means = row[['Distance', 'label']].groupby(['label']).mean()
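col1 = float(means.loc[1, 'Distance'])  # 1 is an index label (label == 1), not a position; naming the column returns a scalar rather than a one-element Series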
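df.loc[['a', 'b']]  # the two rows labeled 'a' and 'b'
df.loc[df['third'] > 3]  # the rows whose value in column third is greater than 3
train_test.loc[train_test["Fare"].isnull()]  # .loc looks rows up by index label; given a boolean mask, it keeps the rows where the mask is True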
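train_test.loc[train_test["Age"].isnull(), 'Survived'].mean()  # survival rate of passengers whose Age is missing (assuming Survived is 1 for survivors); the death rate is 1 minus this value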
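The row-selection forms in this section, sketched on invented data. The first frame reuses the name `row` and the columns Distance and label from the groupby snippet; the second is a hypothetical frame with string labels.

```python
import pandas as pd

# Hypothetical data for the groupby example above.
row = pd.DataFrame({"Distance": [1.0, 3.0, 2.0, 6.0], "label": [0, 0, 1, 1]})
means = row[["Distance", "label"]].groupby("label").mean()

# The group keys become the index, so 1 here is the label value, not a position.
col1 = float(means.loc[1, "Distance"])

# Hypothetical frame with string index labels.
df = pd.DataFrame({"third": [1, 5, 7]}, index=["a", "b", "c"])
ab = df.loc[["a", "b"]]        # a list of labels returns those rows
big = df.loc[df["third"] > 3]  # a boolean mask returns the rows where it is True

print(col1)
print(ab)
print(big)
```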
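6. Selecting specific rows and columns
df.loc['a':'c', 'fir']  # rows 'a' through 'c' (both ends included), returning the values of column fir
loc accepts index labels, which may be integers, and also column names directly. Note the order: rows first, then columns.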
df.loc[[0, 1, 10, 100], ['country', 'province', 'region_1', 'region_2']]
This selects the rows whose index labels are 0, 1, 10 and 100, and the columns named 'country', 'province', 'region_1' and 'region_2'.
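A minimal sketch of selecting rows and columns together with `.loc`. The frame is hypothetical; the column name fir comes from the snippet above and sec is invented for illustration.

```python
import pandas as pd

# Hypothetical frame; 'sec' is an invented second column.
df = pd.DataFrame({"fir": [1, 2, 3, 4], "sec": [5, 6, 7, 8]},
                  index=["a", "b", "c", "d"])

# Label slices include both endpoints, unlike positional slices.
sl = df.loc["a":"c", "fir"]               # Series with rows a, b, c

# Lists on both axes: rows first, then columns; the result is a DataFrame.
sub = df.loc[["a", "d"], ["fir", "sec"]]

print(sl)
print(sub)
```

Passing a single column name returns a Series, while passing a list of column names returns a DataFrame, matching the single-bracket and double-bracket behavior noted at the top.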