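Working with a data table often means renaming variables or recoding the values of a variable. For recoding you can use numpy's where or pandas' replace, and for renaming you can use rename.
Passing a dictionary is the most convenient way to do both. Here is the code: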
import pandas as pd
import statsmodels.api as sm
import numpy as np
df = pd.read_excel("sn_short_for_reg.xlsx")
df.keys()
df['y_sat']
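# Keep only rows where y_sat is present; copy() so new columns can be added without a SettingWithCopyWarning
df_new = df[df['y_sat'].notnull()].copy()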
df_new.shape
y_sat = df_new['y_sat']
y_sat
# Binary indicator: 1 where y_sat is positive, 0 otherwise
y2 = np.where(y_sat > 0, 1, 0)
y2
pd.Series(y2).value_counts()
df_new['y2'] = y2
df
df_new
# Keep rows with sat_default_idx == 0; copy() so columns can be added to the subset later
df_new_short = df_new[df_new['sat_default_idx'] == 0].copy()
df_new_short.shape
df_new_short
x3 = df_new_short['x3_post_level']
x3.value_counts()
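# Map administrative-unit labels to a numeric level
# (1 = province level, 2 = prefecture level, 3 = county level)
d_x3 = {
    "市": 2,
    "省": 1,
    "区": 3,
    "天津市": 1,
    "北京市": 1,
    "自治州": 2,
    "县": 3,
    "盟": 2,
    "重庆市": 1,
    "上海市": 1,
}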
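# replace() returns a new Series; the original column is left unchanged
df_new_short['x3_post_level'].replace(d_x3)
df_new_short['x3_post_level']
# Store the recoded values in a new column
df_new_short['x3_2'] = df_new_short['x3_post_level'].replace(d_x3)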
df_new_short.head()
df_new_short.keys()
df_new_short['x1_psot_length']
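# Correct the misspelled column name
d_newname = {
    "x1_psot_length": "x1_post_length"
}
# Without inplace=True, rename() only returns a renamed copy
df_new_short.rename(columns=d_newname, inplace=True)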
Overview of the data table
Assigning values with where
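A minimal sketch of the where step on made-up data (the `toy` frame and its values are assumptions for illustration, not the contents of the Excel file). It also shows why missing values are filtered out first: a comparison against NaN is False, so without the filter a missing response would silently be coded as 0.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration; NaN marks a missing response
toy = pd.DataFrame({"y_sat": [2.0, 0.0, -1.0, np.nan, 5.0]})

# Drop missing rows first, otherwise NaN > 0 evaluates to False and is coded 0
toy_valid = toy[toy["y_sat"].notnull()].copy()

# 1 where y_sat is positive, 0 otherwise
toy_valid["y2"] = np.where(toy_valid["y_sat"] > 0, 1, 0)

# Frequency of each code, as in the value_counts() call above
counts = toy_valid["y2"].value_counts()
print(counts)
```

The `.copy()` call makes `toy_valid` an independent DataFrame, so adding the `y2` column does not trigger a SettingWithCopyWarning.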
Recoding a variable with a dictionary; the recoded values are stored in the new column x3_2
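A small sketch of dictionary-based recoding, using made-up data rather than the author's file (the `toy` frame, the shortened `levels` dictionary, and the unmapped label "旗" are assumptions for illustration). It contrasts `replace` with `map`, which also accepts a dictionary but treats unmapped values differently.

```python
import pandas as pd

# Toy column, made up for illustration; "旗" is deliberately absent from the mapping.
# Explicit object dtype so the column can hold a mix of ints and strings after recoding.
toy = pd.DataFrame(
    {"x3_post_level": pd.Series(["省", "市", "县", "旗"], dtype=object)}
)
levels = {"省": 1, "市": 2, "县": 3}

# replace() leaves values that are not dictionary keys untouched
replaced = toy["x3_post_level"].replace(levels)

# map() turns values that are not dictionary keys into NaN
mapped = toy["x3_post_level"].map(levels)

print(replaced.tolist())
print(mapped.isnull().tolist())
```

With `replace`, a label missing from the dictionary stays in the column as text, so it is worth checking `value_counts()` on the recoded column before using it as a numeric variable.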
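Renaming variables with a dictionary: pass inplace=True (or assign the result back to the variable), otherwise the data table is not changed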
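The difference can be sketched on a tiny made-up frame (`toy` and `new_names` are hypothetical names used only for this illustration):

```python
import pandas as pd

# Toy frame with the misspelled column name, made up for illustration
toy = pd.DataFrame({"x1_psot_length": [3, 5]})
new_names = {"x1_psot_length": "x1_post_length"}

# Without inplace=True, rename() returns a renamed copy and leaves toy unchanged
renamed = toy.rename(columns=new_names)
unchanged_cols = list(toy.columns)

# With inplace=True, the existing DataFrame itself is modified
toy.rename(columns=new_names, inplace=True)

print(unchanged_cols)
print(list(toy.columns))
```

Assigning the result back, as in `toy = toy.rename(columns=new_names)`, has the same effect as `inplace=True` and is often easier to chain with other operations.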