melt函数
pd.melt(frame,id_vars,value_vars,var_name,value_name,col_level)
frame:需要转换的数据集
id_vars:不需要转换的列名
value_vars:需要转换的列名,如果除了不需要转换的剩下的全部都需要转化则可以省略不写
var_name:将列名转置行后新命名的列名
value_name:给值重新命名的列名
例子:
df=pd.dataframe({
'city':{0:'china',1:'china',2:'china'},
'code':{0:'a',1:'b',2:'c'},
'level':{0:'sku',1:'brand',2:'cat'},
'value201601':{0:1,1:2,2:3},
'value201602':{0:2,1:4,2:5}} )
City Code Level Value200601 Value200602 0 China a SKU 1 2 1 China b Brand 2 4 2 China c CAT 3 5
pd.melt(df,id_vars=['City','Code','Level'],var_name='trans_all',value_name='value')
City Code Level trans_all value 0 China a SKU Value200601 1 1 China b Brand Value200601 2 2 China c CAT Value200601 3 3 China a SKU Value200602 2 4 China b Brand Value200602 4 5 China c CAT Value200602 5
如果没有设置var_name,则默认为variable
data=pd.melt(df,id_vars=['City','Code','Level'],value_name='value')
City Code Level variable value 0 China a SKU Value200601 1 1 China b Brand Value200601 2 2 China c CAT Value200601 3 3 China a SKU Value200602 2 4 China b Brand Value200602 4 5 China c CAT Value200602 5
pivot_table函数
pivote_table(data,value,index,columns)
data:需要转换的数据
index:不需要转换的列名(固定不变的列)
columns:需要将某列的值转化为列名的列名
value:值所对应的列名
例子:
data.pivot_table(index=['City','Code','Level'],columns=['variable'],values='value').reset_index()
variable City Code Level Value200601 Value200602 0 China a SKU 1 2 1 China b Brand 2 4 2 China c CAT 3 5
df=pd.DataFrame({
'From_To':['LoNDon_paris','Madrid_miLAN','londON_StockhOlm','Budapest_PaRis','Brussels_londOn'],
'FlighNumber':[10045.0,np.nan,10065.0,np.nan,10085.0],
'RecenDelays':[[23,47],[],[24,43,87],[13],[67,32]],
'Ariline':['KLM(!)','<Air France>(12)','(British Airways.)','12. Air France','"Swiss Air"']
})
Frome_to FlighNumber RecenDelays Ariline 0 LoNDon_paris 10045.0 [23,47] KLM(!) 1 Madrid_miLAN NaN [] (12) 2 iondON_StockhOlm 10065.0 [24,43,87] (British Airways.) 3 Budapest_PaRis NaN [13] 12. Air France 4 Brussels_londOn 10085.0 [67,32] “Swiss Air”
将某列的值转化为大写
df['From_to']=df.From_to.str.upper()
将某列的值转化为小写
df.From_to.str.lower()
将form_to 分开为from to 两列
方法一:
df1=df.From_to.str.split('_',expand=True).rename(columns={0:'From',1:'to'})
df_new=df1.join(df.drop('From_to',axis=1))
df_new['Ariline']=df.Ariline.str.extract('([a-zA-Z\s)]+)')
From to FlighNumber RecenDelays Ariline
From to FlithNumber RecenDelays Ariline 0 LONDON PARIS 10045.0 [23,47] KLM 1 MADRID MILAN NaN [] Air France 2 IONDON STOCKHOLM 10065.0 [24,43,87] British Airways 3 BUDAPEST PARIS NaN [13] Air France 4 BRUSSELS LONDON 10085.0 [67,32] Swiss Air
###方法二:
df2=df.From_to.str.extract('([a-zA-Z\s]+)_([a-zA-Z\s]+)').rename(columns={0:'from',1:'to'})
df3=df.drop('From_to',axis=1)
df_new1=df2.join(df3)
df_copy=df_new1.copy()
df_copy.drop('RecenDelays',axis=1,inplace=True)
df_copy.fillna(0)
From to FlithNumber RecenDelays Ariline 0 LONDON PARIS 10045.0 [23,47] KLM 1 MADRID MILAN 0.0 [] Air France 2 IONDON STOCKHOLM 10065.0 [24,43,87] British Airways 3 BUDAPEST PARIS 0.0 [13] Air France 4 BRUSSELS LONDON 10085.0 [67,32] Swiss Air