Assigning one pandas DataFrame to another: pandas basics

塔比星上琉球生

Article link: https://blog.csdn.net/weixin_33498603/article/details/112826050

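1 DataFrame and Series operations

1.1 Constructing a DataFrame

1) Empty DataFrame

Use the columns and index parameters to specify the DataFrame's column names and row index. Every cell of the resulting DataFrame is NaN.

import numpy as np
import pandas as pd

df1 = pd.DataFrame(columns=['c1', 'c2'], index=['ind1', 'ind2'])

2) Construct a DataFrame from a dictionary, then add a column to it. The index=[] parameter can be used to set the DataFrame's index.

dict_v = {'c1': ['a', 'b', 'c'],
          'c2': [1, 2, 3]}

df1 = pd.DataFrame(dict_v)

print(df1.shape[1])  # number of columns

df1['randn_value'] = np.random.normal(0, 1, len(df1))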
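As a minimal sketch of the dictionary route with an explicit index: the row labels 'r1' to 'r3' and the seeded generator below are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

dict_v = {'c1': ['a', 'b', 'c'], 'c2': [1, 2, 3]}

# index= sets the row labels; 'r1'..'r3' are made-up labels for illustration
df1 = pd.DataFrame(dict_v, index=['r1', 'r2', 'r3'])

# A new column needs exactly one value per existing row
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df1['randn_value'] = rng.normal(0, 1, len(df1))

print(df1.shape)  # (number of rows, number of columns)
```

The new column is aligned to the existing rows, which is why its length is taken from len(df1).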

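1.2 Reading data from a DataFrame

1) iterrows()

for index_, row in df1.iterrows():
    print(index_)
    print(row, row.values[0])

iterrows() iterates over a DataFrame row by row and yields two elements per row: the row index and the row's values. The row values come back as a Series, so an individual value can be taken by position (as with row.values[0]) or by column name.

2) itertuples()

itertuples() yields one namedtuple per row, with the row's index value as the first element of the tuple. By default the namedtuple type is named Pandas.

for nametuple in df1.itertuples():
    print(nametuple)
    print(nametuple[0], type(nametuple))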
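The two iteration styles can be compared side by side. The sketch below reuses the small c1/c2 frame from section 1.1; the names pairs_from_rows, pairs_from_tuples and first are introduced only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'c1': ['a', 'b', 'c'], 'c2': [1, 2, 3]})

# iterrows(): each row arrives as a Series, read by column label
pairs_from_rows = [(row['c1'], row['c2']) for _, row in df1.iterrows()]

# itertuples(): each row arrives as a namedtuple, read by attribute name
pairs_from_tuples = [(t.c1, t.c2) for t in df1.itertuples()]

# The index value is the first field of the namedtuple and is called Index
first = next(df1.itertuples())
print(first.Index, type(first).__name__)
```

Attribute access on the namedtuple tends to be the more convenient choice when the column names are valid Python identifiers.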

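3) Select DataFrame data by the value of a column

temp_df[temp_df.c1 == 'aa']['c2']

Returns the values of column c2 for the rows of temp_df where column c1 equals 'aa'.

4) Get row index labels by column value

temp_df[temp_df.c1 == 'aa'].index.to_list()

Returns the index labels of the rows of temp_df where column c1 equals 'aa'.

5) Modify a DataFrame value by row index and column name

final_result.loc[10, 'c1'] = 'abc'

Sets the value at row label 10, column c1 of final_result to 'abc'.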
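Selection, index lookup and assignment can all go through one boolean mask. This is a sketch on a made-up three-row temp_df, not the author's data; the names mask, c2_values and row_labels are illustrative.

```python
import pandas as pd

temp_df = pd.DataFrame({'c1': ['aa', 'bb', 'aa'], 'c2': [1, 2, 3]})

mask = temp_df['c1'] == 'aa'

c2_values = temp_df.loc[mask, 'c2']         # c2 where c1 == 'aa'
row_labels = temp_df.index[mask].to_list()  # index labels of those rows

# Assigning through a single .loc call writes to the original frame;
# a chained form like temp_df[mask]['c2'] = 0 may write to a temporary copy
temp_df.loc[mask, 'c2'] = 0
```

Using .loc with a row selector and a column name in one call avoids the ambiguity of chained indexing when the goal is to modify data.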

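6) Get the values of a column by column name

sorted(list(set(pre_result['c3'].to_list())))

to_list() returns a list, set() removes duplicates, list() converts the set back to a list, and sorted() sorts the result.

7) Select a subset of the DataFrame's columns

df_after[['c1', 'c3', 'c6']]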

8) dataframe.loc[0]: select a row by the value of its index label (here a numeric index)

print(retemp.loc[0])

retemp:

   asset_return  asset_vol
0             1          2
1            11         12

output:

asset_return    1
asset_vol       2

9) dataframe.iloc[1]: select a row by position; iloc[i] takes the i-th row, counting from 0

retemp:

asset_return  asset_vol
           a          b
          dd         ee

output:

asset_return    dd
asset_vol       ee
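The difference between label-based and position-based selection only shows up when the index is not 0, 1, 2, and so on. The sketch below assumes hypothetical index labels 10 and 20 to make that visible.

```python
import pandas as pd

# Hypothetical frame whose labels (10, 20) differ from the positions (0, 1)
retemp = pd.DataFrame(
    {'asset_return': [1, 11], 'asset_vol': [2, 12]},
    index=[10, 20],
)

by_label = retemp.loc[10]     # the row whose index label is 10
by_position = retemp.iloc[1]  # the second row, whatever its label is

print(by_label)
print(by_position)
```

With this index, retemp.loc[0] would raise a KeyError, while retemp.iloc[0] would still return the first row.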

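10) Filter data on multiple conditions

tmp = holding.loc[(holding['ACCOUNT'] == '高') & ((holding['ASSET'] == '企业债') | (holding['ASSET'] == '金融债'))]

Selects all rows of the DataFrame named holding where the ACCOUNT column is '高' (high) and the ASSET column is either '企业债' (corporate bond) or '金融债' (financial bond). Each comparison needs its own parentheses because & and | bind more tightly than ==.

holding:

   ASSET   ACCOUNT   DV10
0  企业债   高        5.000
1  金融债   高        5.000
2  国债     高        2.109
3  企业债   资本      5.000
4  金融债   资本      5.000
5  国债     资本      2.568
6  企业债   低        5.000
7  金融债   低        5.000
8  国债     低        1.745

output:

   ASSET   ACCOUNT   DV10
0  企业债   高        5.0
1  金融债   高        5.0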
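When one column is tested against several allowed values, isin() is a possible alternative to chaining == comparisons with |. This sketch rebuilds the holding table shown above and should select the same two rows.

```python
import pandas as pd

holding = pd.DataFrame({
    'ASSET': ['企业债', '金融债', '国债'] * 3,
    'ACCOUNT': ['高'] * 3 + ['资本'] * 3 + ['低'] * 3,
    'DV10': [5.0, 5.0, 2.109, 5.0, 5.0, 2.568, 5.0, 5.0, 1.745],
})

# isin() is True where ASSET matches any value in the list
wanted_assets = ['企业债', '金融债']
tmp = holding.loc[(holding['ACCOUNT'] == '高') & holding['ASSET'].isin(wanted_assets)]

print(tmp)
```

The list passed to isin() can grow without adding more parenthesized comparisons to the condition.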

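1.3 Index

1) Rename columns or index labels

df_after = df_after.rename(columns={base_type: 'values'})

Renames the column base_type (a variable holding the old column name) to 'values'.

2) dataframe.set_index('return'): use the values of the return column as the new row index

print(retemp, '\n', retemp.set_index('return'))

output:

  return asset_vol
0      a         b
1     dd        ee

       asset_vol
return
a              b
dd            ee

3) dataframe.reset_index()

Resets the row index to a default integer index. With drop=True the original index is discarded instead of being kept as a column. With inplace=True the original DataFrame is modified; with inplace=False a new DataFrame is returned.

retemp:

  return vol
0      a   b
1     dd  ee

output (drop=False):

   index return vol
0      0      a   b
1      1     dd  ee

output (drop=True):

  return vol
0      a   b
1     dd  ee

print(retemp.reset_index(inplace=False))
print(retemp)

output: the result returned with inplace=False

   index return vol
0      0      a   b
1      1     dd  ee

The original retemp is unchanged:

  return vol
0      a   b
1     dd  ee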
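set_index() and reset_index() are roughly inverse operations. The sketch below uses the two-row return/vol frame from the outputs above; the names indexed, restored and dropped are illustrative.

```python
import pandas as pd

retemp = pd.DataFrame({'return': ['a', 'dd'], 'vol': ['b', 'ee']})

indexed = retemp.set_index('return')      # 'return' becomes the row index
restored = indexed.reset_index()          # the index moves back into a 'return' column
dropped = indexed.reset_index(drop=True)  # the index is discarded instead

print(indexed)
print(restored)
print(dropped)
```

None of these calls modifies retemp, because inplace defaults to False in both methods.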

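1.4 df.drop_duplicates(keep='first', inplace=False)

Removes duplicate rows from a DataFrame (or duplicate values from a Series). keep='first' retains the first occurrence of each duplicate, and inplace=False returns a new object rather than modifying the original.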
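A short sketch of drop_duplicates() on a made-up frame, showing whole-row comparison and the subset parameter, which restricts the comparison to chosen columns:

```python
import pandas as pd

df = pd.DataFrame({'c1': ['a', 'a', 'b', 'a'], 'c2': [1, 1, 2, 3]})

# Whole rows are compared: row 1 repeats row 0 and is dropped
unique_rows = df.drop_duplicates(keep='first')

# Only c1 is compared, and the last occurrence of each value is kept
unique_c1 = df.drop_duplicates(subset=['c1'], keep='last')

print(unique_rows)
print(unique_c1)
```

The surviving rows keep their original index labels, so a reset_index(drop=True) may be wanted afterwards.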

1.5 The DataFrame apply method

1) Use apply to run a function over a DataFrame column

def apply_age(x, bias):
    return x + bias

# Extra arguments are passed as a tuple
data["new_age"] = data["age"].apply(apply_age, args=(-3,))

data.head()

Before:

   stature  weight  smoker  gender  age  color
0      169      82   False  女       64      2
1      189      49   False  男       85      2
2      182      40   False  女       36      2
3      162      51   False  女       77      0
4      156      75    True  女       20      0

After: a new column 'new_age' is added

   stature  weight  smoker  gender  age  color  new_age
0      169      82   False  女       64      2       61
1      189      49   False  男       85      2       82
2      182      40   False  女       36      2       33
3      162      51   False  女       77      0       74
4      156      75    True  女       20      0       17

2) Apply a function to several columns and specify the axis

axis=0 applies the function to each column; axis=1 applies it to each row.

# Sum along axis 0
print(data)
res = data[["stature", "weight", "age"]].apply(np.sum, axis=0)
print(res)

data:

   stature  weight  smoker  gender  age  color  new_age
0      156      43   False  女       20      0       15
1      167      89    True  女       31      1       26
2      161      74    True  男       24      2       19
3      185      63    True  女       68      0       63
4      156      51   False  男       28      1       23
5      166      64   False  男       43      1       38
6      158      53    True  男       37      2       32
7      189      87    True  女       33      1       28
8      182      79    True  男       75      1       70
9      152      84    True  男       35      2       30

output:

stature    1672
weight      687
age         394
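The same ideas can be sketched on a two-row frame: apply() with extra arguments on one column, a row-wise apply() with axis=1, and the built-in sum() as an equivalent of the column-wise reduction. The data and the names row_totals and col_totals are assumptions for illustration.

```python
import pandas as pd

data = pd.DataFrame({'stature': [156, 167], 'weight': [43, 89], 'age': [20, 31]})

def apply_age(x, bias):
    return x + bias

# Element-wise on one column; args supplies the extra positional argument
data['new_age'] = data['age'].apply(apply_age, args=(-3,))

# axis=1 hands the function one row at a time, as a Series
row_totals = data[['stature', 'weight']].apply(
    lambda row: row['stature'] + row['weight'], axis=1
)

# Column-wise reduction; sum() is the built-in counterpart of apply(np.sum, axis=0)
col_totals = data[['stature', 'weight', 'age']].sum()
```

For simple arithmetic like this, vectorized expressions such as data['age'] - 3 are usually faster than apply().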

1.6 np.asarray(a, dtype=None, order=None)

Converts the input to an array.
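A brief sketch of np.asarray() on a list and on an existing array. The variable names are illustrative.

```python
import numpy as np

values = [1, 2, 3]
arr = np.asarray(values, dtype=float)  # a list is converted to a new array

existing = np.array([1.0, 2.0])
same = np.asarray(existing)            # already an ndarray: returned as-is, no copy

print(arr, arr.dtype)
print(same is existing)
```

This no-copy behaviour is the main difference from np.array(), which copies its input by default.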

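1.7 Arithmetic between the same-named column of two DataFrames

res_df['q'] = res_df['q'] / df['q']

Divides the q column of res_df by the q column of df, element by element. The two columns are matched on their row index labels, so a label present in only one of them yields NaN.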
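Column arithmetic between two frames is aligned on index labels rather than on row position. The sketch below uses made-up values and deliberately different indexes to show the effect.

```python
import pandas as pd

res_df = pd.DataFrame({'q': [10.0, 20.0, 30.0]}, index=[0, 1, 2])
df = pd.DataFrame({'q': [2.0, 5.0]}, index=[1, 2])

# Labels 1 and 2 exist in both frames; label 0 has no partner and becomes NaN
aligned = res_df['q'] / df['q']

print(aligned)
```

If position-based division is intended instead, both sides need the same index first, for example via reset_index(drop=True).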

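1.8 Check whether DataFrame data is NaN

temp_df[temp_df.c1 == cc_]['c2'].isna()

Returns a boolean Series that is True wherever c2 is NaN. An identity test with is against NaN compares the Series object itself with NaN and is therefore always False.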
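A sketch contrasting an identity test with the element-wise checks pandas provides, on a made-up two-row temp_df. The names identity_check, has_nan and scalar_is_nan are illustrative.

```python
import numpy as np
import pandas as pd

temp_df = pd.DataFrame({'c1': ['aa', 'bb'], 'c2': [np.nan, 2.0]})

# A Series object is never the nan object itself, so this is always False
identity_check = temp_df['c2'] is np.nan

# Element-wise test on the filtered column, reduced to a single answer
has_nan = temp_df.loc[temp_df['c1'] == 'aa', 'c2'].isna().any()

# Test on a single cell
scalar_is_nan = pd.isna(temp_df.loc[0, 'c2'])

print(identity_check, has_nan, scalar_is_nan)
```

Equality does not work either, because NaN compares unequal to everything, including itself.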

1.9 Concatenating DataFrames with pd.concat()

The DataFrames being concatenated may have different columns. The result keeps each original DataFrame's row index, and where one DataFrame lacks a column the other has, the missing cells are filled with NaN.

df = pd.concat([df1, df2])

input:

      Q1    Q2
0  asset  path
1  asset  path
2  asset  path
3  asset  path

       Q1         Q2           Q3
0  quater  0.6641355  0.664235635
1  quater  0.6641355  0.664235635
2  quater  0.6641355  0.664235635
3  quater  0.6641355  0.664235635
4  quater  0.6641355  0.664235635

output:

       Q1         Q2           Q3
0   asset       path          NaN
1   asset       path          NaN
2   asset       path          NaN
3   asset       path          NaN
0  quater  0.6641355  0.664235635
1  quater  0.6641355  0.664235635
2  quater  0.6641355  0.664235635
3  quater  0.6641355  0.664235635
4  quater  0.6641355  0.664235635
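Because pd.concat() keeps the original row labels, the result can contain repeated index values. A sketch with shortened versions of the two inputs shows the default behaviour next to ignore_index=True, which renumbers the rows.

```python
import pandas as pd

df1 = pd.DataFrame({'Q1': ['asset'] * 2, 'Q2': ['path'] * 2})
df2 = pd.DataFrame({
    'Q1': ['quater'] * 2,
    'Q2': [0.6641355] * 2,
    'Q3': [0.664235635] * 2,
})

kept = pd.concat([df1, df2])                           # original labels are kept
renumbered = pd.concat([df1, df2], ignore_index=True)  # fresh 0..n-1 index

print(kept)
print(renumbered)
```

With repeated labels, kept.loc[0] returns two rows, which is often a surprise further down a pipeline.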

1.10 Set the number of decimal places shown when printing

pd.set_option('display.precision', 10)

1.11 Convert a dictionary to a DataFrame and set the data type

data1 = {'Q1': ['0.1', '0.2', 0.3],
         'Q2': [1, 2, '3']}

df1 = pd.DataFrame(data1, dtype=float)
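When a dictionary mixes numeric strings and numbers, one alternative is to build the frame first and convert each column explicitly with pd.to_numeric. This is a sketch of that alternative route, not the approach used in the original line.

```python
import pandas as pd

data1 = {'Q1': ['0.1', '0.2', 0.3],
         'Q2': [1, 2, '3']}

raw = pd.DataFrame(data1)  # both columns start out with object dtype

# apply() passes each column to pd.to_numeric, which parses the strings
df1 = raw.apply(pd.to_numeric).astype(float)

print(df1.dtypes)
```

pd.to_numeric also accepts errors='coerce', which turns unparseable entries into NaN instead of raising.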

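1.12 df.shift(n)

Shifts the DataFrame's rows by n positions: a positive n shifts the data down, a negative n shifts it up. The index stays in place and the vacated rows are filled with NaN.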
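A typical use of shift() is comparing each row with its neighbour. The price column and the derived column names below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'price': [10, 11, 13]})

df['prev'] = df['price'].shift(1)    # value from the row above; first row is NaN
df['next'] = df['price'].shift(-1)   # value from the row below; last row is NaN
df['change'] = df['price'] - df['prev']

print(df)
```

The shifted columns become float because NaN cannot be stored in an integer column.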

1.13 Working with files

(1) to_excel() lets you choose the sheet to save to

with pd.ExcelWriter('df2.xlsx') as writer:
    df1.to_excel(writer, index=False, sheet_name='aaa')

(2) to_csv() with mode='a' sets the write mode to append

Write everything at once:

df1.to_csv(path_or_buf=file_name, index=False, mode='a')
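In append mode, to_csv() writes the header row on every call unless told otherwise. The sketch below, which assumes a temporary file path chosen only for illustration, writes the header just once and then reads the file back.

```python
import os
import tempfile

import pandas as pd

df1 = pd.DataFrame({'Q1': [1, 2], 'Q2': [3, 4]})

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, 'out.csv')  # hypothetical output path
    for _ in range(2):
        # Emit the header only if the file does not exist yet
        write_header = not os.path.exists(file_name)
        df1.to_csv(file_name, index=False, mode='a', header=write_header)
    result = pd.read_csv(file_name)

print(result)
```

Without the header check, the second call would insert the column names as a data row in the middle of the file.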

Write row by row; newline='' avoids blank lines between rows

import csv

with open("test0.csv", "a+", newline='') as csvfile:
    writer = csv.writer(csvfile, dialect='excel')
    # Write the column names first
    writer.writerow(['Q1', "Q2", "Q3", "Q3"])
    for i in range(len(column1)):
        # Use writerows to write several rows at once
        writer.writerow([column1[i], colum
