数据分析第三篇——Pandas之交叉表与透视表

3.9 高级处理——交叉表与透视表

  • 3.9.1 交叉表与透视表有什么作用
  • 3.9.2 使用crosstab(交叉表)实现
  • 3.9.3 使用pivot_table(透视表)实现

3.9.1 交叉表与透视表的作用

  • 探究两个变量之间的关系

3.9.2 使用crosstab(交叉表)实现

  • 交叉表:交叉表用于计算一列数据对于另一列数据的分组个数(寻找两个列之间的关系)
    • pd.crosstab(value1, value2)

1.准备数据

  • 星期数据以及涨跌幅是好是坏数据
    • pd.crosstab(星期数据列, 涨跌幅数据列)
import pandas as pd
stock = pd.read_excel('stocks.xlsx', index_col=0)
stock
closeopenhighlowpre_closechangepct_chgvolamount
trade_date
2020-03-132887.42652804.23222910.88122799.98412923.4856-36.0591-1.2334366450436393019665.2
2020-03-142923.48562936.01632944.46512906.28382968.5174-45.0318-1.5170307778457328209202.4
2020-03-152968.51743001.76163010.02862968.51742996.7618-28.2444-0.9425352470970378766619.0
2020-03-162996.76182918.93473000.29632904.79892943.290753.47111.8167393296648425017184.8
2020-03-172943.29072987.18052989.20512940.71383034.5113-91.2206-3.0061414560736438143854.6
2020-03-183034.51133039.93953052.44393029.46323071.6771-37.1658-1.2100362061533377388542.7
2020-03-193071.67713036.15453074.25713022.92623011.665760.01141.9926445425806482770471.4
2020-03-203011.66572981.80603012.00352974.35832992.896818.76890.6271353338278389893917.5
2020-03-212992.89683006.88883026.84202976.62302970.931221.96560.7394410108047447053681.5
2020-03-222970.93122899.31002982.50682899.31002880.303890.62743.1465367333369397244201.2
2020-03-232880.30382924.64072948.12612878.54432991.3288-111.0250-3.7116401216914432657775.0
2020-03-242991.32882992.49193009.45752980.47742987.92873.40010.1138350523658395955641.5
2020-03-252987.92872978.41953028.77882974.94233013.0501-25.1214-0.8338469049552495341447.3
2020-03-263013.05012982.06963016.94582943.71683031.2333-18.1832-0.5999441622762513128644.6
2020-03-273031.23333027.89253042.18213007.35573039.6692-8.4359-0.2775370430044451601363.1
2020-03-283039.66923022.24553058.89803020.14103030.15429.51500.3140364557276445062076.7
2020-03-293030.15422981.88023031.37062968.44512975.401954.75231.8402345732881413761364.1
2020-03-302975.40192979.52232998.27182971.82192984.9716-9.5697-0.3206315141151381331160.4
2020-03-312984.97162981.40972990.60032960.77512983.62241.34920.0452311665913374998562.6
2020-04-012983.62242924.99132983.63712924.99132917.007766.61472.2837313198007367014340.1
stock.index
DatetimeIndex(['2020-03-13', '2020-03-14', '2020-03-15', '2020-03-16',
               '2020-03-17', '2020-03-18', '2020-03-19', '2020-03-20',
               '2020-03-21', '2020-03-22', '2020-03-23', '2020-03-24',
               '2020-03-25', '2020-03-26', '2020-03-27', '2020-03-28',
               '2020-03-29', '2020-03-30', '2020-03-31', '2020-04-01'],
              dtype='datetime64[ns]', name='trade_date', freq=None)

pandas日期类型

pd.to_datetime()

# 将标准格式的输入参数转化为pandas的日期类型
date = pd.to_datetime(stock.index)
date # 0表示星期一
DatetimeIndex(['2020-03-13', '2020-03-14', '2020-03-15', '2020-03-16',
               '2020-03-17', '2020-03-18', '2020-03-19', '2020-03-20',
               '2020-03-21', '2020-03-22', '2020-03-23', '2020-03-24',
               '2020-03-25', '2020-03-26', '2020-03-27', '2020-03-28',
               '2020-03-29', '2020-03-30', '2020-03-31', '2020-04-01'],
              dtype='datetime64[ns]', name='trade_date', freq=None)
date.year
Int64Index([2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,
            2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020],
           dtype='int64', name='trade_date')
date.weekday
Int64Index([4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2], dtype='int64', name='trade_date')
# 准备星期数据列
stock['weekday'] = date.weekday
stock
closeopenhighlowpre_closechangepct_chgvolamountweekday
trade_date
2020-03-132887.42652804.23222910.88122799.98412923.4856-36.0591-1.2334366450436393019665.24
2020-03-142923.48562936.01632944.46512906.28382968.5174-45.0318-1.5170307778457328209202.45
2020-03-152968.51743001.76163010.02862968.51742996.7618-28.2444-0.9425352470970378766619.06
2020-03-162996.76182918.93473000.29632904.79892943.290753.47111.8167393296648425017184.80
2020-03-172943.29072987.18052989.20512940.71383034.5113-91.2206-3.0061414560736438143854.61
2020-03-183034.51133039.93953052.44393029.46323071.6771-37.1658-1.2100362061533377388542.72
2020-03-193071.67713036.15453074.25713022.92623011.665760.01141.9926445425806482770471.43
2020-03-203011.66572981.80603012.00352974.35832992.896818.76890.6271353338278389893917.54
2020-03-212992.89683006.88883026.84202976.62302970.931221.96560.7394410108047447053681.55
2020-03-222970.93122899.31002982.50682899.31002880.303890.62743.1465367333369397244201.26
2020-03-232880.30382924.64072948.12612878.54432991.3288-111.0250-3.7116401216914432657775.00
2020-03-242991.32882992.49193009.45752980.47742987.92873.40010.1138350523658395955641.51
2020-03-252987.92872978.41953028.77882974.94233013.0501-25.1214-0.8338469049552495341447.32
2020-03-263013.05012982.06963016.94582943.71683031.2333-18.1832-0.5999441622762513128644.63
2020-03-273031.23333027.89253042.18213007.35573039.6692-8.4359-0.2775370430044451601363.14
2020-03-283039.66923022.24553058.89803020.14103030.15429.51500.3140364557276445062076.75
2020-03-293030.15422981.88023031.37062968.44512975.401954.75231.8402345732881413761364.16
2020-03-302975.40192979.52232998.27182971.82192984.9716-9.5697-0.3206315141151381331160.40
2020-03-312984.97162981.40972990.60032960.77512983.62241.34920.0452311665913374998562.61
2020-04-012983.62242924.99132983.63712924.99132917.007766.61472.2837313198007367014340.12
# 准备涨跌幅数据列
import numpy as np
stock['pon'] = np.where(stock['change'] > 0, 1, 0)
stock
closeopenhighlowpre_closechangepct_chgvolamountweekdaypon
trade_date
2020-03-132887.42652804.23222910.88122799.98412923.4856-36.0591-1.2334366450436393019665.240
2020-03-142923.48562936.01632944.46512906.28382968.5174-45.0318-1.5170307778457328209202.450
2020-03-152968.51743001.76163010.02862968.51742996.7618-28.2444-0.9425352470970378766619.060
2020-03-162996.76182918.93473000.29632904.79892943.290753.47111.8167393296648425017184.801
2020-03-172943.29072987.18052989.20512940.71383034.5113-91.2206-3.0061414560736438143854.610
2020-03-183034.51133039.93953052.44393029.46323071.6771-37.1658-1.2100362061533377388542.720
2020-03-193071.67713036.15453074.25713022.92623011.665760.01141.9926445425806482770471.431
2020-03-203011.66572981.80603012.00352974.35832992.896818.76890.6271353338278389893917.541
2020-03-212992.89683006.88883026.84202976.62302970.931221.96560.7394410108047447053681.551
2020-03-222970.93122899.31002982.50682899.31002880.303890.62743.1465367333369397244201.261
2020-03-232880.30382924.64072948.12612878.54432991.3288-111.0250-3.7116401216914432657775.000
2020-03-242991.32882992.49193009.45752980.47742987.92873.40010.1138350523658395955641.511
2020-03-252987.92872978.41953028.77882974.94233013.0501-25.1214-0.8338469049552495341447.320
2020-03-263013.05012982.06963016.94582943.71683031.2333-18.1832-0.5999441622762513128644.630
2020-03-273031.23333027.89253042.18213007.35573039.6692-8.4359-0.2775370430044451601363.140
2020-03-283039.66923022.24553058.89803020.14103030.15429.51500.3140364557276445062076.751
2020-03-293030.15422981.88023031.37062968.44512975.401954.75231.8402345732881413761364.161
2020-03-302975.40192979.52232998.27182971.82192984.9716-9.5697-0.3206315141151381331160.400
2020-03-312984.97162981.40972990.60032960.77512983.62241.34920.0452311665913374998562.611
2020-04-012983.62242924.99132983.63712924.99132917.007766.61472.2837313198007367014340.121
# 产生交叉表
data = pd.crosstab(stock['weekday'], stock['pon'])
# 反映了星期几涨的有多少,跌的有多少
data
pon01
weekday
021
112
221
311
421
512
612
# 计算得到百分比
data.sum(axis=1)
weekday
0    3
1    3
2    3
3    2
4    3
5    3
6    3
dtype: int64
# 除数按列方向逐个除,对应不上除数的被除数为NaN被除,得到NaN
data.div(data.sum(axis=1)) # 用data除以data.sum(axis=1)
0123456
weekday
00.6666670.333333NaNNaNNaNNaNNaN
10.3333330.666667NaNNaNNaNNaNNaN
20.6666670.333333NaNNaNNaNNaNNaN
30.3333330.333333NaNNaNNaNNaNNaN
40.6666670.333333NaNNaNNaNNaNNaN
50.3333330.666667NaNNaNNaNNaNNaN
60.3333330.666667NaNNaNNaNNaNNaN
# 除数按列方向逐个除,对应不上除数的被除数为NaN被除,得到NaN
data / data.sum(axis=1)
0123456
weekday
00.6666670.333333NaNNaNNaNNaNNaN
10.3333330.666667NaNNaNNaNNaNNaN
20.6666670.333333NaNNaNNaNNaNNaN
30.3333330.333333NaNNaNNaNNaNNaN
40.6666670.333333NaNNaNNaNNaNNaN
50.3333330.666667NaNNaNNaNNaNNaN
60.3333330.666667NaNNaNNaNNaNNaN
# 按行方向一行一行地除
data.div(data.sum(axis=1), axis=0) # 唯一正确解
pon01
weekday
00.6666670.333333
10.3333330.666667
20.6666670.333333
30.5000000.500000
40.6666670.333333
50.3333330.666667
60.3333330.666667
# 除数按列方向逐个除,对应不上除数的被除数为NaN被除,得到NaN
data.div(data.sum(axis=1), axis=1)
0123456
weekday
00.6666670.333333NaNNaNNaNNaNNaN
10.3333330.666667NaNNaNNaNNaNNaN
20.6666670.333333NaNNaNNaNNaNNaN
30.3333330.333333NaNNaNNaNNaNNaN
40.6666670.333333NaNNaNNaNNaNNaN
50.3333330.666667NaNNaNNaNNaNNaN
60.3333330.666667NaNNaNNaNNaNNaN
# 画图显示
data.div(data.sum(axis=1), axis=0).plot(kind='bar')
data.div(data.sum(axis=1), axis=0).plot(kind='bar', stacked=True)

在这里插入图片描述

3.9.3 使用pivot_table(透视表)实现

  • DataFrame.pivot_table([], index=[])

使用透视表,刚才的过程更加简单

# 使用透视表,将整个过程变得更简单一些
stock.pivot_table(['pon'], index=['weekday']) # 得到的是['pon']列的‘1’对应的百分比
pon
weekday
00.333333
10.666667
20.333333
30.500000
40.333333
50.666667
60.666667
  • 0
    点赞
  • 1
    收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

汤圆不是油条

你的鼓励将是我创作的最大动力

¥2 ¥4 ¥6 ¥10 ¥20
输入1-500的整数
余额支付 (余额:-- )
扫码支付
扫码支付:¥2
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、C币套餐、付费专栏及课程。

余额充值