pandas4: Data Operations in pandas

bigdata7

Column: Pandas; tags: python, numpy, pandas

by 顾辞嘤嘤怪

Article link: https://blog.csdn.net/qq_43636709/article/details/115819823

Contents

- 4. pandas data operations
  - Arithmetic operations
  - Function application and mapping
  - Sorting
  - Summary statistics

4. pandas data operations

Arithmetic operations

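Arithmetic between pandas objects is aligned by index label: values whose labels appear in both objects are combined, while labels that appear in only one object still show up in the result, but with a missing value (NaN). That is also why the integer inputs below produce a float64 result. For a DataFrame, this alignment happens on both the rows and the columns.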

import pandas as pd
import numpy as np

## Adding two Series
obj1 = pd.Series([1,4,-1,9,0,-8], index=['a','b','d','e','f','g'])
obj2 = pd.Series([4,9,0,-4,-1,10], index=['a','c','d','e','f','h'])
print("obj1:\n",obj1)
print("obj2:\n",obj2)

print(obj1+obj2)
obj1:
 a    1
b    4
d   -1
e    9
f    0
g   -8
dtype: int64
obj2:
 a     4
c     9
d     0
e    -4
f    -1
h    10
dtype: int64
a    5.0
b    NaN
c    NaN
d   -1.0
e    5.0
f   -1.0
g    NaN
h    NaN
dtype: float64

## DataFrame: rows and columns are both aligned; positions missing from either side become NaN
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.arange(12).reshape(3,4), columns=['a','b','c','d'], index=['A','B','C'])
df2 = pd.DataFrame(np.arange(9).reshape(3,3), columns=['a','c','d'], index=['A','B','D'])
print("df1:\n",df1)
print("df2:\n",df2)
print(df1+df2)
df1:
    a  b   c   d
A  0  1   2   3
B  4  5   6   7
C  8  9  10  11
df2:
    a  c  d
A  0  1  2
B  3  4  5
D  6  7  8
     a   b     c     d
A  0.0 NaN   3.0   5.0
B  7.0 NaN  10.0  12.0
C  NaN NaN   NaN   NaN
D  NaN NaN   NaN   NaN
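If the NaN values introduced by alignment are unwanted, the method form of the operator (Series.add, DataFrame.add) accepts a fill_value argument that stands in for a label missing from one side before the addition. A minimal sketch, re-creating obj1 and obj2 from above so it runs on its own:

```python
import pandas as pd

# Same two Series as above: b and g exist only in obj1, c and h only in obj2
obj1 = pd.Series([1, 4, -1, 9, 0, -8], index=['a', 'b', 'd', 'e', 'f', 'g'])
obj2 = pd.Series([4, 9, 0, -4, -1, 10], index=['a', 'c', 'd', 'e', 'f', 'h'])

# obj1 + obj2 gives NaN for one-sided labels;
# add(..., fill_value=0) treats the missing side as 0 instead
total = obj1.add(obj2, fill_value=0)
print(total)
```

A label that is missing from both objects (such as column b in row D of the DataFrame example) still comes out as NaN, because there is no value on either side to fill against.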

Function application and mapping

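1. When the processing is too complex for a single built-in operation, define a function and apply it with one of the following:

(1) map: applies the function to every element of a Series.

(2) apply: applies the function to each row or column of a DataFrame; the axis parameter chooses which (axis=0, the default, passes each column; axis=1 passes each row).

(3) applymap: applies the function to every element of a DataFrame. (Since pandas 2.1, applymap is deprecated and the same operation is available as DataFrame.map.)

Anonymous functions: lambda parameters: expression. For example, lambda x, y: x + y takes x and y as input and returns the value of x + y.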
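The examples below use random data, so here is a small sketch on fixed numbers showing the difference between map (one Series element at a time), apply with axis=0 (one column at a time) and apply with axis=1 (one row at a time), each with a lambda in place of a named function. The DataFrame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical fixed data so the results are reproducible
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]}, index=['x', 'y', 'z'])

# map: the lambda receives one element of the Series at a time
doubled = df['a'].map(lambda v: v * 2)                       # x→2, y→4, z→6

# apply with axis=0 (default): the lambda receives each column as a Series
col_range = df.apply(lambda col: col.max() - col.min())      # a→2, b→20

# apply with axis=1: the lambda receives each row as a Series
row_sum = df.apply(lambda row: row['a'] + row['b'], axis=1)  # x→11, y→22, z→33

print(doubled, col_range, row_sum, sep='\n\n')
```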

# Remove the '元' (yuan) character from the fruit prices
data = {'fruit':['apple','grape','banana'],'price':['30元','40元','50元']}
df = pd.DataFrame(data)
print(df)
def f1(x):
    return x.split('元')[0]  # split on '元' and keep the part before it
df['price'] = df['price'].map(f1)  # map calls f1 once for every element of the Series
print(df)
    fruit price
0   apple   30元
1   grape   40元
2  banana   50元
    fruit price
0   apple    30
1   grape    40
2  banana    50
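After map(f1) the price column still holds strings such as '30', not numbers. A sketch of one alternative, assuming the same data: the .str accessor's replace method strips the character from every element, and astype(int) converts the result so that arithmetic works.

```python
import pandas as pd

data = {'fruit': ['apple', 'grape', 'banana'], 'price': ['30元', '40元', '50元']}
df = pd.DataFrame(data)

# .str.replace processes every string in the column without a user-defined function;
# regex=False treats '元' as a literal character
df['price'] = df['price'].str.replace('元', '', regex=False).astype(int)

print(df.dtypes)
print(df['price'].sum())  # numeric now, so this totals to 120
```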

## apply: applies a function along the rows or columns of a DataFrame (chosen with axis); axis=1 computes one result per row
df = pd.DataFrame(np.random.randn(3,3), columns=['a','b','c'], index=['app','win','mic'])
print(df)
df.apply(np.mean,axis=1)
            a         b         c
app -0.336255 -0.446342 -0.888068
win  2.742748  2.432790 -1.444682
mic  0.567298 -0.268666  0.039183

app   -0.556888
win    1.243619
mic    0.112605
dtype: float64

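## applymap: applies a function to every element, processing the whole DataFrame in one call
# lambda parameters: expression (a single line); the input is the arguments passed in, the output is the value of the expression
print(df)
df.applymap(lambda x:'%.3f'%x)  # in pandas >= 2.1 applymap is deprecated; df.map(...) does the same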
            a         b         c
app -0.336255 -0.446342 -0.888068
win  2.742748  2.432790 -1.444682
mic  0.567298 -0.268666  0.039183

          a       b       c
app  -0.336  -0.446  -0.888
win   2.743   2.433  -1.445
mic   0.567  -0.269   0.039
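Because applymap was deprecated in pandas 2.1 in favour of DataFrame.map, code that has to run on both older and newer versions can pick whichever method exists. A minimal sketch on a few fixed illustrative values:

```python
import pandas as pd

df = pd.DataFrame({'a': [-0.336255, 2.742748], 'b': [0.039183, -1.444682]},
                  index=['app', 'win'])

# DataFrame.map exists from pandas 2.1; older versions only have applymap
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

formatted = elementwise(lambda x: f'{x:.3f}')  # each cell becomes a 3-decimal string
print(formatted)
```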

Sorting

For a Series, sort_index sorts by the index labels and sort_values sorts by the values. Both sort in ascending order by default; pass ascending=False for descending order.

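## Sorting: sort_index([ascending=False]) sorts by labels (ascending by default, False for descending); for a DataFrame, sort_values(by='column_name') sorts by a column's values
obj = pd.Series([-1,0,-9,9,5],index=['a','c','b','e','d'])
print('Sorted by value:\n',obj.sort_values())
print('Index, descending:\n',obj.sort_index(ascending=False))

Sorted by value:
 b   -9
a   -1
c    0
d    5
e    9
dtype: int64
Index, descending: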
 e    9
d    5
c    0
b   -9
a   -1
dtype: int64

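For a DataFrame, sort_index sorts by the row labels (axis=0, the default) or by the column labels (axis=1). To reorder the rows by the values in a column, use sort_values(by='column_name').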
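A small sketch of both cases on made-up data with unsorted labels. It also shows that sort_values accepts a list of columns together with a matching list for ascending, which is useful for breaking ties.

```python
import pandas as pd

# Hypothetical data with unsorted row and column labels
df = pd.DataFrame({'c': [3, 1, 2], 'a': [9, 7, 7], 'b': [5, 6, 4]},
                  index=['r2', 'r0', 'r1'])

by_row_label = df.sort_index()          # rows ordered r0, r1, r2
by_col_label = df.sort_index(axis=1)    # columns ordered a, b, c

# rows sorted by column 'a' descending; ties broken by column 'b' ascending
by_values = df.sort_values(by=['a', 'b'], ascending=[False, True])

print(by_row_label, by_col_label, by_values, sep='\n\n')
```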

#DataFrame
print(df)
print(df.sort_values(by='a'))
            a         b         c
app -0.336255 -0.446342 -0.888068
win  2.742748  2.432790 -1.444682
mic  0.567298 -0.268666  0.039183
            a         b         c
app -0.336255 -0.446342 -0.888068
mic  0.567298 -0.268666  0.039183
win  2.742748  2.432790 -1.444682

Summary statistics

1. Aggregation: sum() totals each column by default; pass axis=1 to total each row instead.

## Aggregation: column-wise by default, axis=1 for row-wise
print(df)
print('Column totals:\n',df.sum())
print('Row totals:\n',df.sum(axis=1))
            a         b         c
app -0.336255 -0.446342 -0.888068
win  2.742748  2.432790 -1.444682
mic  0.567298 -0.268666  0.039183
Column totals:
 a    2.973791
b    1.717783
c   -2.293567
dtype: float64
Row totals:
 app   -1.670665
win    3.730856
mic    0.337815
dtype: float64
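By default sum (like mean, min and the other reductions) skips missing values, controlled by skipna=True. A sketch with one made-up NaN shows both directions of aggregation and what changes with skipna=False.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({'a': [1.0, 2.0, np.nan], 'b': [4.0, 5.0, 6.0]},
                  index=['x', 'y', 'z'])

col_totals = df.sum()            # NaN is skipped: a→3.0, b→15.0
row_totals = df.sum(axis=1)      # x→5.0, y→7.0, z→6.0
strict = df.sum(skipna=False)    # a→NaN because one of its values is missing

print(col_totals, row_totals, strict, sep='\n\n')
```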

2. Descriptive statistics

Descriptive statistics methods:

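Method	Description	Method	Description
min	Minimum	max	Maximum
mean	Mean	ptp	Range (max minus min); not available in current pandas, use max() - min()
std	Standard deviation	var	Variance
cov	Covariance	sem	Standard error of the mean
median	Median	mode	Mode
skew	Sample skewness	kurt	Sample kurtosis
quantile	Quantile (q=0.25 and q=0.75 give the quartiles)	count	Number of non-null values
describe	Summary statistics	mad	Mean absolute deviation; removed in pandas 2.0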
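A short sketch exercising several of these methods on a made-up Series, including the manual replacements for ptp and mad suggested in the table (the names value_range and mad below are just local variables, not pandas methods):

```python
import pandas as pd

# Hypothetical numeric data
s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

print(s.describe())                   # count, mean, std, min, 25%, 50%, 75%, max
print(s.median(), s.mode().tolist())  # 4.5 [4]

q1, q3 = s.quantile([0.25, 0.75])     # lower and upper quartiles (linear interpolation)
value_range = s.max() - s.min()       # what ptp used to compute
mad = (s - s.mean()).abs().mean()     # mean absolute deviation, formerly .mad()

print(q1, q3, value_range, mad)
```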

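For categorical features, descriptive statistics take the form of a frequency table: unique returns the distinct values (in order of first appearance), and value_counts counts how often each value occurs.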

# Descriptive statistics
obj = pd.Series([1,2,3,0,5,6,0,0,3])
print('Unique values:\n',obj.unique())
print('Value counts:\n',obj.value_counts())
Unique values:
 [1 2 3 0 5 6]
Value counts:
 0    3
3    2
1    1
2    1
5    1
6    1
dtype: int64
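Two related calls are often useful alongside these: nunique counts the distinct values directly, and value_counts(normalize=True) returns relative frequencies instead of raw counts. A minimal sketch on the same Series:

```python
import pandas as pd

obj = pd.Series([1, 2, 3, 0, 5, 6, 0, 0, 3])

n_distinct = obj.nunique()                 # 6 distinct values
shares = obj.value_counts(normalize=True)  # proportions that sum to 1

print(n_distinct)
print(shares)
```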
