Hands-On Data Analysis, Section 2: Data Restructuring


mumuok


Article link: https://blog.csdn.net/mumuok/article/details/118861744


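#Data restructuring

2.4 Merging data

2.4.2 Task 2: Use the concat method

*Note: horizontal merging requires axis=1; vertical merging does not, because axis=0 is the default.
The data columns are spread across separate CSV files and need to be merged back together.
Use concat to merge train-left-up.csv and train-right-up.csv horizontally into one table: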


list_up = [text_left_up,text_right_up]
result_up = pd.concat(list_up,axis=1)
result_up.head()

Using the same approach, merge train-left-down and train-right-down horizontally and save the result as result_down.
Then merge result_up and result_down from above vertically into result.

list_down=[text_left_down,text_right_down]
result_down = pd.concat(list_down,axis=1)
result = pd.concat([result_up,result_down])
result.head()
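
To make the two directions concrete, here is a minimal sketch on toy data. The four small frames below are hypothetical stand-ins for the four CSV pieces, not the real Titanic files, and ignore_index=True is an optional addition of mine that renumbers the rows after stacking.

```python
import pandas as pd

# Hypothetical stand-ins for the four CSV pieces (not the real Titanic data)
left_up = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})
right_up = pd.DataFrame({"Sex": ["male", "female"], "Fare": [7.25, 71.28]})
left_down = pd.DataFrame({"PassengerId": [3, 4], "Survived": [1, 1]})
right_down = pd.DataFrame({"Sex": ["female", "female"], "Fare": [7.92, 53.1]})

# axis=1 puts the frames side by side, aligning rows on the index
result_up = pd.concat([left_up, right_up], axis=1)
result_down = pd.concat([left_down, right_down], axis=1)

# The default axis=0 stacks the frames vertically, aligning on column names;
# ignore_index=True (my addition) rebuilds a clean 0..n-1 index
result = pd.concat([result_up, result_down], ignore_index=True)

print(result)
```

Without ignore_index=True the two halves would both keep their own 0 and 1 labels here, so the combined index would contain duplicates.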

2.4.4 Task 4: Use the DataFrame's own join and append methods to accomplish Tasks 2 and 3

Note: join works horizontally; append works vertically.

result_up = text_left_up.join(text_right_up)
result_down = text_left_down.join(text_right_down)
# DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat([result_up, result_down])
result = result_up.append(result_down)
result.head()
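
Since DataFrame.append no longer exists in pandas 2.0 and later, the same join-then-stack pattern can be sketched with pd.concat for the vertical step. The frames here are toy data that I made up for illustration.

```python
import pandas as pd

# Toy frames (hypothetical), sharing the default integer index
left = pd.DataFrame({"a": [1, 2]})
right = pd.DataFrame({"b": [3, 4]})
lower = pd.DataFrame({"a": [5], "b": [6]}, index=[2])

# join is horizontal: it adds the columns of `right`, matching rows by index label
upper = left.join(right)

# Vertical step: pd.concat stacks rows the same way the old append call did
combined = pd.concat([upper, lower])

print(combined)
```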

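2.4.5 Task 5: Use the pandas merge method and the DataFrame append method to accomplish Tasks 2 and 3

Note: merge needs the key that relates the two tables to be specified, unless the two tables share a column name.
Using the index as the join key: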

result_up = pd.merge(text_left_up,text_right_up,left_index=True,right_index=True)
result_down = pd.merge(text_left_down,text_right_down,left_index=True,right_index=True)
result = result_up.append(result_down)
result.head()
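
The difference between joining on a key column and joining on the index can be seen in a small sketch. The data is hypothetical, and the right-hand table is deliberately stored in a different row order.

```python
import pandas as pd

# Hypothetical tables; `right` lists the passengers in a different order
left = pd.DataFrame({"PassengerId": [1, 2, 3], "Sex": ["male", "female", "female"]})
right = pd.DataFrame({"PassengerId": [3, 1, 2], "Fare": [7.92, 7.25, 71.28]})

# Key-based merge: rows are matched by the value of PassengerId
by_key = pd.merge(left, right, on="PassengerId")

# Index-based merge: rows are matched by index label (0, 1, 2), ignoring PassengerId
by_index = pd.merge(left, right[["Fare"]], left_index=True, right_index=True)

print(by_key)
print(by_index)
```

In this toy case the index-based merge pairs each passenger with the wrong fare, because the row order differs. Merging on the index is only safe when both pieces were split from the same table and kept their original index, as in the task above.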

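2.5 Looking at the data from another angle

2.5.1 Task 1: Convert our data to Series type

Note: stack() turns the DataFrame into a Series (when the columns have a single level), squeezing all columns into one.
stack() means "stacking": it rotates columns into rows.
unstack() is the inverse of stack(): it rotates rows into columns.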
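
A minimal sketch of the round trip, using a small frame that I made up (the labels p1 and p2 are hypothetical):

```python
import pandas as pd

# Small hypothetical frame with two columns of the same dtype
df = pd.DataFrame({"Age": [22, 38], "SibSp": [1, 0]}, index=["p1", "p2"])

# stack(): the column labels become an inner index level, giving one long Series
stacked = df.stack()

# unstack(): the inner index level goes back to being the columns
unstacked = stacked.unstack()

print(stacked)
print(unstacked)
```

Each value in the stacked Series is addressed by a (row label, column label) pair, for example stacked[("p2", "Age")].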

Part 1: Data aggregation and computation

group by

2.4.2 Task 2: Compute the average fare for men and women on the Titanic

df = text['Fare'].groupby(text['Sex'])
means = df.mean()
means

I usually write it on one line, which works just as well:

means = text.groupby(text['Sex'])['Fare'].mean()
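
As a quick check that the two spellings agree, here is a sketch on toy data. The frame below is hypothetical and only borrows the Sex and Fare column names.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic table
text = pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "male"],
    "Fare": [10.0, 5.0, 30.0, 15.0, 10.0],
})

# Two-step form: group the Fare column by the Sex column, then aggregate
grouped = text["Fare"].groupby(text["Sex"])
means_two_step = grouped.mean()

# One-line form: group the frame by the column name, select Fare, aggregate
means = text.groupby("Sex")["Fare"].mean()

print(means)
```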

2.4.3 Task 3: Count the male and female survivors on the Titanic

survived_sex = text['Survived'].groupby(text['Sex']).sum()
survived_sex.head()
Sex
female 233
male 109
Name: Survived, dtype: int64

2.4.4 Task 4: Count the survivors in each cabin class

survived_pclass = text['Survived'].groupby(text['Pclass'])
survived_pclass.sum()

Conclusion: women paid a higher average fare and had a higher survival rate.
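
A survivor count is not the same as a survival rate, because the groups differ in size. Since Survived is a 0/1 column, its group mean is the rate. The sketch below uses hypothetical toy data, not the real passenger list.

```python
import pandas as pd

# Hypothetical toy data: Survived is 1 for a survivor, 0 otherwise
text = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male", "male"],
    "Survived": [1, 1, 0, 1, 0, 0],
})

by_sex = text.groupby("Sex")["Survived"]

survivor_count = by_sex.sum()   # number of survivors per group
survival_rate = by_sex.mean()   # share of each group that survived

print(survivor_count)
print(survival_rate)
```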

Note: get familiar with the agg method.

In pandas, agg is usually called after groupby() to aggregate the grouped data with sum, min, max, or other aggregation functions.
For example:

>>> df = pd.read_excel(r"D:/myExcel/1.xlsx")
>>> df
        A   B   C
0     bob  12  45
1  millor  15  23
2     bob  34  88
3     bob  98  23

(1) Get the maximum of column B after grouping by A
>>> df.groupby(by='A').agg({'B':'max'})
         B
A         
bob     98
millor  15
(2) Get the maximum and minimum of column B after grouping by A
>>> df.groupby(by='A').agg({'B':['max','min']})
         B    
       max min
A             
bob     98  12
millor  15  15
(3) Get the maximum and minimum of column B and the minimum of column C after grouping by A
>>> df.groupby(by='A').agg({'B':['max','min'], 'C':'min'})
         B       C
       max min min
A                 
bob     98  12  23
millor  15  15  23
(4) By default the result columns are named after the functions; the names can be changed
>>> df.groupby(by='A').agg(
b_min=pd.NamedAgg(column='B', aggfunc='min'),
b_max=pd.NamedAgg(column='B', aggfunc='max'))
        b_min  b_max
A                   
bob        12     98
millor     15     15

text.groupby('Sex').agg({'Fare': 'mean', 'Pclass': 'count'}).rename(columns=
                            {'Fare': 'mean_fare', 'Pclass': 'count_pclass'})
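
The agg-then-rename call above can also be written with named aggregation, which sets the output column names in one step. This is a sketch on hypothetical toy data; the keyword form name=(column, function) has been available since pandas 0.25.

```python
import pandas as pd

# Hypothetical toy data with the column names used above
text = pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "male"],
    "Fare": [10.0, 5.0, 30.0, 15.0, 10.0],
    "Pclass": [1, 3, 1, 2, 3],
})

# Each keyword is an output column name; the tuple is (input column, aggregation)
summary = text.groupby("Sex").agg(
    mean_fare=("Fare", "mean"),
    count_pclass=("Pclass", "count"),
)

print(summary)
```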
