1. Modifying field values
import numpy as np
import pandas as pd
df = pd.DataFrame([['Snow','M',22],['Tyrion','M',32],['Sansa','F',18],['Arya','F',14]], columns=['name','gender','age'])
df.loc[df['name'] == 'Snow', 'name'] = 'shy'  # modify a field value; this changes a single cell (the name of the row whose name is 'Snow')
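You can also update many rows at once by combining a row condition with a column, e.g. replace every 'NA' in value1 with the value2 of the same row: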
df = pd.DataFrame({'ID':[1,2,3,4,5],'value1':[10,'NA',40,'NA','NA'],'value2':[2,4,6,8,10]})
df.loc[df['value1'] == 'NA', 'value1'] = df['value2']
# conditions can be combined with | (or) and & (and); wrap each condition in parentheses
df.loc[(df['value1'] == 'NA') | (df['value1'] != df['value2']), 'value1'] = df['value2']
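A minimal, self-contained sketch of the row-wise fill above, using the same example DataFrame. It shows that the right-hand Series is aligned on the index, so each selected row receives its own value2 rather than a single value.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'value1': [10, 'NA', 40, 'NA', 'NA'],
                   'value2': [2, 4, 6, 8, 10]})

# Boolean mask selecting the rows whose value1 is the string 'NA'
is_na = df['value1'] == 'NA'

# Assign only to those rows; df['value2'] is aligned on the index,
# so row 1 gets 4, row 3 gets 8 and row 4 gets 10
df.loc[is_na, 'value1'] = df['value2']

print(df['value1'].tolist())
```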
Similarly, df['value1'] = map(lambda x, y: x if x != 'NA' else y, df['value1'], df['value2']) works in Python 2.x, but in Python 3.x it needs a small change:
df['value1'] = list(map(lambda x,y: x if(x != 'NA') else y, df['value1'], df['value2']))
This is because map returns a list in Python 2.x but a lazy map object (an iterator) in Python 3.x, so wrap the result in list() before assigning it.
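A sketch comparing the Python 3 list(map(...)) pattern with a vectorized alternative based on numpy.where. The numpy.where variant is a swapped-in technique, not the method from these notes; it avoids a Python-level loop over the rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'value1': [10, 'NA', 40, 'NA', 'NA'],
                   'value2': [2, 4, 6, 8, 10]})

# Python 3: map() returns a lazy iterator, so materialize it with list()
via_map = list(map(lambda x, y: x if x != 'NA' else y, df['value1'], df['value2']))

# Vectorized alternative: keep value1 where it is not 'NA', otherwise take value2
via_where = np.where(df['value1'] != 'NA', df['value1'], df['value2'])

df['value1'] = via_where
```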
paData['quality'] = list(map(lambda x: 'good' if x > 40 else 'poor', paData['score']))
df['name'] = list(map(lambda x: 'aaa' if x == 'bbb' else x, df['name']))
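The two one-liners above assume existing DataFrames. Below is a runnable sketch with a small hypothetical paData (its scores are made up, and English labels are used) showing the Python 3 map form next to numpy.where, plus Series.replace for the simple "swap one value" case.

```python
import numpy as np
import pandas as pd

# Hypothetical data: paData is not defined in the notes, so build a small stand-in
paData = pd.DataFrame({'score': [35, 41, 80, 40]})

# Python 3 version of the map() pattern
paData['quality'] = list(map(lambda x: 'good' if x > 40 else 'poor', paData['score']))

# Vectorized equivalent: 'good' where score > 40, otherwise 'poor'
paData['quality_np'] = np.where(paData['score'] > 40, 'good', 'poor')

# Replacing one exact value in a column does not need map() at all
names = pd.Series(['bbb', 'ccc', 'bbb']).replace('bbb', 'aaa')
```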
2. Handling dicts and lists/tuples, and converting them to a DataFrame
tmpList=[[1515459600, 'V1.0.0', '0551', '1', 'XXXXXXXA', '2018-01-04 14:01:59', '414069', '2018-01-09 09:03:08', '1'], [1515459600, 'V1.0.0', '0551', '1', 'XXXXXXXB', '2018-01-04 14:01:59', '414069', '2018-01-09 09:03:08', '1']]
tmpList2=[[1515459600, 'V1.0.0', '0551', '1', 'XXXXXXXC', '2018-01-04 14:01:59', '414069', '2018-01-09 09:03:08', '1']]
dict_list = {}
dict_list['ZTEGC94EE06A'] = tmpList
dict_list['ZTEGC94EE06C'] = tmpList2
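[i for j in dict_list for i in dict_list[j]]  # flattens the dict into one list of rows (a list of lists)
tmplist = [dict_list[i][0] for i in dict_list]  # or take only the first row of each key as a list; then
tempdata = pd.DataFrame(tmplist, columns=names)  # convert it to a DataFrame (names is a list of 9 column names)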
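A runnable sketch of the conversion using the sample rows above. The column names in `names` are hypothetical (the notes do not define them), and adding the dict key as its own column is an optional extra, not part of the original snippet.

```python
import pandas as pd

tmpList = [[1515459600, 'V1.0.0', '0551', '1', 'XXXXXXXA', '2018-01-04 14:01:59', '414069', '2018-01-09 09:03:08', '1'],
           [1515459600, 'V1.0.0', '0551', '1', 'XXXXXXXB', '2018-01-04 14:01:59', '414069', '2018-01-09 09:03:08', '1']]
tmpList2 = [[1515459600, 'V1.0.0', '0551', '1', 'XXXXXXXC', '2018-01-04 14:01:59', '414069', '2018-01-09 09:03:08', '1']]

dict_list = {'ZTEGC94EE06A': tmpList, 'ZTEGC94EE06C': tmpList2}

# Hypothetical column names: the notes use `names` without defining it
names = ['ts', 'version', 'area', 'flag', 'sn', 'created', 'code', 'updated', 'status']

# All rows of all keys, flattened into one list of lists
all_rows = [row for key in dict_list for row in dict_list[key]]
# Only the first row of each key
first_rows = [dict_list[key][0] for key in dict_list]

all_df = pd.DataFrame(all_rows, columns=names)
first_df = pd.DataFrame(first_rows, columns=names)

# Optional: keep the dict key as the first column (one entry per row)
all_df.insert(0, 'key', [key for key in dict_list for _ in dict_list[key]])
```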
3. String processing
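TempInfo['Id'] = TempInfo['Id'].str.replace('.0', '')
In pandas versions before 2.0, str.replace treats the pattern as a regular expression by default, so '.' matches any character and '.0' also matches the '10' in '10.0'. Escaping the dot avoids this; in general replace can use a regular expression to match exactly the substring you want to replace.
TempInfo['Id'] = TempInfo['Id'].str.replace(r"\.0", "", regex=True)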
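A small sketch contrasting the three behaviours on made-up IDs: an unescaped regex, a literal match (regex=False), and an anchored regex that only strips a trailing '.0'. The anchored pattern is an assumption about what the cleanup usually intends, not something the notes specify.

```python
import pandas as pd

ids = pd.Series(['10.0', '20.0', '10.05', '7'])

# regex=True: '.' matches any character, so '.0' also removes the '10' in '10.0'
naive = ids.str.replace('.0', '', regex=True)

# regex=False: '.0' is matched literally (but still removed mid-string)
literal = ids.str.replace('.0', '', regex=False)

# Anchored regex: strip '.0' only when it ends the string
anchored = ids.str.replace(r'\.0$', '', regex=True)
```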
4. groupby usage
df.groupby('a').agg({'b': ['sum', 'min'], 'c': 'max'})  # per group of a: sum and minimum of column b, maximum of column c
df.groupby(['deviceid'], as_index=True).agg({'deviceid': ['count'], 'subdeviceCnt': ['sum']}).reset_index()  # per deviceid: number of rows and total subdeviceCnt
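Passing a list of functions in the agg dict produces two-level column names. A minimal sketch on made-up data showing how to flatten them into single strings (joining with '_' is an arbitrary naming choice):

```python
import pandas as pd

df = pd.DataFrame({'a': ['x', 'x', 'y'], 'b': [1, 2, 3], 'c': [4, 5, 6]})

# Dict of column -> function(s); a list of functions yields one column per function
res = df.groupby('a').agg({'b': ['sum', 'min'], 'c': 'max'})

# Columns are now ('b', 'sum'), ('b', 'min'), ('c', 'max'); join each pair
res.columns = ['_'.join(col) for col in res.columns]

# Turn the group key back into an ordinary column
res = res.reset_index()
```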
df = pd.DataFrame({'tip': [16.99, 10.34, 23.68, 23.68, 24.59],
                   'sex': ['Female', 'Male', 'Male', 'Male', 'Female']})
df
sex tip
0 Female 16.99
1 Male 10.34
2 Male 23.68
3 Male 23.68
4 Female 24.59
temp=df.groupby('sex').agg({'sex':['count'],'tip':['sum']}).reset_index()
temp
      sex    tip   sex
             sum count
0  Female  41.58     2
1    Male  57.70     3
temp.columns=['sex','tip','sexCnt']
temp
sex tip sexCnt
0 Female 41.58 2
1 Male 57.70 3
I want to get this result
temp=temp[['sex','sexCnt','tip']]
temp
sex sexCnt tip
0 Female 2 41.58
1 Male 3 57.70
This can be done as follows:
temp=df.groupby('sex',as_index=True).agg({'sex':['count'],'tip':['sum']}).reset_index()
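The group key (index) column comes first and the aggregated columns follow it (in recent pandas versions, in the order of the keys in the agg dict), so you can directly
temp.columns = ['sex', 'sexCnt', 'tip']  # rename
temp = temp[['sex', 'sexCnt', 'tip']]  # reorder the columns; that gives the desired result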
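An alternative sketch using named aggregation (available since pandas 0.25), which sets the output names and their order in one step and avoids renaming a MultiIndex. Counting 'tip' instead of the key column is a swapped-in choice: it equals the group size here only because 'tip' has no missing values.

```python
import pandas as pd

df = pd.DataFrame({'tip': [16.99, 10.34, 23.68, 23.68, 24.59],
                   'sex': ['Female', 'Male', 'Male', 'Male', 'Female']})

# Named aggregation: output_name=(input_column, function)
temp = (df.groupby('sex')
          .agg(sexCnt=('tip', 'count'), tip=('tip', 'sum'))
          .reset_index())
print(temp)
```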
Count the rows that meet a condition (here, per deviceid):
temp=df[((df['type'] == '2.4G') | (df['type'] =='5G')) & (df['power'] !='')].groupby(['deviceid'])['deviceid'].count()
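The same count on a small hypothetical table (the device IDs and values are made up), using Series.isin instead of chaining == comparisons with |. groupby(...).size() is used here; it matches .count() on the key column because deviceid has no missing values.

```python
import pandas as pd

# Hypothetical device data; the columns mirror the ones used above
df = pd.DataFrame({'deviceid': ['d1', 'd1', 'd2', 'd3'],
                   'type': ['2.4G', '5G', '5G', 'LTE'],
                   'power': ['10', '', '20', '30']})

# isin() replaces (type == '2.4G') | (type == '5G')
mask = df['type'].isin(['2.4G', '5G']) & (df['power'] != '')

# Number of matching rows per deviceid
counts = df[mask].groupby('deviceid').size()
```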
pd.concat([df1, df2], ignore_index=True)  # stack df1 and df2 vertically and renumber the index from 0
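A short sketch on made-up frames showing what ignore_index changes: without it, the original row labels are kept and can repeat.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'name': ['a', 'b']})
df2 = pd.DataFrame({'id': [3, 4], 'name': ['c', 'd']})

kept = pd.concat([df1, df2])                            # index labels 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)   # index labels 0, 1, 2, 3
```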
Drop rows by condition
df.drop(df[df['id'] > 100000].index, axis=0, inplace=True)  # drop() takes index labels, not a boolean mask
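A minimal sketch on made-up IDs: drop() needs the labels of the rows to remove, while a boolean filter that keeps the complement gives the same result without drop().

```python
import pandas as pd

df = pd.DataFrame({'id': [5, 150000, 42, 200000]})

# drop() expects index labels, so select the labels of the matching rows first
dropped = df.drop(df[df['id'] > 100000].index)

# Equivalent boolean filter that keeps the rows NOT matching the condition
filtered = df[~(df['id'] > 100000)]
```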
Select the rows where a column is null:
nullData = df[df['Name'].isnull()]
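A small sketch with made-up names showing that isnull() flags both None and NaN, and that notnull() selects the complement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Snow', None, 'Arya', np.nan], 'age': [22, 32, 14, 18]})

# isnull() is True for both None and NaN
nullData = df[df['Name'].isnull()]

# notnull() selects the remaining rows
validData = df[df['Name'].notnull()]
```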