# 数据分析第三篇——Pandas之交叉表与透视表

23 篇文章 0 订阅

## 3.9 高级处理——交叉表与透视表

• 3.9.1 交叉表与透视表有什么作用
• 3.9.2 使用crosstab(交叉表)实现
• 3.9.3 使用pivot_table(透视表)实现

• 探究两个变量之间的关系

### 3.9.2 使用crosstab(交叉表)实现

• 交叉表：交叉表用于计算一列数据对于另一列数据的分组个数（寻找两个列之间的关系）
• pd.crosstab(value1, value2)

#### 1.准备数据

• 星期数据以及涨跌幅是好是坏数据
• pd.crosstab(星期数据列, 涨跌幅数据列)
import pandas as pd
stock

closeopenhighlowpre_closechangepct_chgvolamount
2020-03-132887.42652804.23222910.88122799.98412923.4856-36.0591-1.2334366450436393019665.2
2020-03-142923.48562936.01632944.46512906.28382968.5174-45.0318-1.5170307778457328209202.4
2020-03-152968.51743001.76163010.02862968.51742996.7618-28.2444-0.9425352470970378766619.0
2020-03-162996.76182918.93473000.29632904.79892943.290753.47111.8167393296648425017184.8
2020-03-172943.29072987.18052989.20512940.71383034.5113-91.2206-3.0061414560736438143854.6
2020-03-183034.51133039.93953052.44393029.46323071.6771-37.1658-1.2100362061533377388542.7
2020-03-193071.67713036.15453074.25713022.92623011.665760.01141.9926445425806482770471.4
2020-03-203011.66572981.80603012.00352974.35832992.896818.76890.6271353338278389893917.5
2020-03-212992.89683006.88883026.84202976.62302970.931221.96560.7394410108047447053681.5
2020-03-222970.93122899.31002982.50682899.31002880.303890.62743.1465367333369397244201.2
2020-03-232880.30382924.64072948.12612878.54432991.3288-111.0250-3.7116401216914432657775.0
2020-03-242991.32882992.49193009.45752980.47742987.92873.40010.1138350523658395955641.5
2020-03-252987.92872978.41953028.77882974.94233013.0501-25.1214-0.8338469049552495341447.3
2020-03-263013.05012982.06963016.94582943.71683031.2333-18.1832-0.5999441622762513128644.6
2020-03-273031.23333027.89253042.18213007.35573039.6692-8.4359-0.2775370430044451601363.1
2020-03-283039.66923022.24553058.89803020.14103030.15429.51500.3140364557276445062076.7
2020-03-293030.15422981.88023031.37062968.44512975.401954.75231.8402345732881413761364.1
2020-03-302975.40192979.52232998.27182971.82192984.9716-9.5697-0.3206315141151381331160.4
2020-03-312984.97162981.40972990.60032960.77512983.62241.34920.0452311665913374998562.6
2020-04-012983.62242924.99132983.63712924.99132917.007766.61472.2837313198007367014340.1
stock.index

DatetimeIndex(['2020-03-13', '2020-03-14', '2020-03-15', '2020-03-16',
'2020-03-17', '2020-03-18', '2020-03-19', '2020-03-20',
'2020-03-21', '2020-03-22', '2020-03-23', '2020-03-24',
'2020-03-25', '2020-03-26', '2020-03-27', '2020-03-28',
'2020-03-29', '2020-03-30', '2020-03-31', '2020-04-01'],


#### pandas日期类型

pd.to_datetime()

# 将标准格式的输入参数转化为pandas的日期类型
date = pd.to_datetime(stock.index)
date # 0表示星期一

DatetimeIndex(['2020-03-13', '2020-03-14', '2020-03-15', '2020-03-16',
'2020-03-17', '2020-03-18', '2020-03-19', '2020-03-20',
'2020-03-21', '2020-03-22', '2020-03-23', '2020-03-24',
'2020-03-25', '2020-03-26', '2020-03-27', '2020-03-28',
'2020-03-29', '2020-03-30', '2020-03-31', '2020-04-01'],

date.year

Int64Index([2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020],

date.weekday

Int64Index([4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2], dtype='int64', name='trade_date')

# 准备星期数据列
stock['weekday'] = date.weekday
stock

closeopenhighlowpre_closechangepct_chgvolamountweekday
2020-03-132887.42652804.23222910.88122799.98412923.4856-36.0591-1.2334366450436393019665.24
2020-03-142923.48562936.01632944.46512906.28382968.5174-45.0318-1.5170307778457328209202.45
2020-03-152968.51743001.76163010.02862968.51742996.7618-28.2444-0.9425352470970378766619.06
2020-03-162996.76182918.93473000.29632904.79892943.290753.47111.8167393296648425017184.80
2020-03-172943.29072987.18052989.20512940.71383034.5113-91.2206-3.0061414560736438143854.61
2020-03-183034.51133039.93953052.44393029.46323071.6771-37.1658-1.2100362061533377388542.72
2020-03-193071.67713036.15453074.25713022.92623011.665760.01141.9926445425806482770471.43
2020-03-203011.66572981.80603012.00352974.35832992.896818.76890.6271353338278389893917.54
2020-03-212992.89683006.88883026.84202976.62302970.931221.96560.7394410108047447053681.55
2020-03-222970.93122899.31002982.50682899.31002880.303890.62743.1465367333369397244201.26
2020-03-232880.30382924.64072948.12612878.54432991.3288-111.0250-3.7116401216914432657775.00
2020-03-242991.32882992.49193009.45752980.47742987.92873.40010.1138350523658395955641.51
2020-03-252987.92872978.41953028.77882974.94233013.0501-25.1214-0.8338469049552495341447.32
2020-03-263013.05012982.06963016.94582943.71683031.2333-18.1832-0.5999441622762513128644.63
2020-03-273031.23333027.89253042.18213007.35573039.6692-8.4359-0.2775370430044451601363.14
2020-03-283039.66923022.24553058.89803020.14103030.15429.51500.3140364557276445062076.75
2020-03-293030.15422981.88023031.37062968.44512975.401954.75231.8402345732881413761364.16
2020-03-302975.40192979.52232998.27182971.82192984.9716-9.5697-0.3206315141151381331160.40
2020-03-312984.97162981.40972990.60032960.77512983.62241.34920.0452311665913374998562.61
2020-04-012983.62242924.99132983.63712924.99132917.007766.61472.2837313198007367014340.12
# 准备涨跌幅数据列
import numpy as np
stock['pon'] = np.where(stock['change'] > 0, 1, 0)
stock

closeopenhighlowpre_closechangepct_chgvolamountweekdaypon
2020-03-132887.42652804.23222910.88122799.98412923.4856-36.0591-1.2334366450436393019665.240
2020-03-142923.48562936.01632944.46512906.28382968.5174-45.0318-1.5170307778457328209202.450
2020-03-152968.51743001.76163010.02862968.51742996.7618-28.2444-0.9425352470970378766619.060
2020-03-162996.76182918.93473000.29632904.79892943.290753.47111.8167393296648425017184.801
2020-03-172943.29072987.18052989.20512940.71383034.5113-91.2206-3.0061414560736438143854.610
2020-03-183034.51133039.93953052.44393029.46323071.6771-37.1658-1.2100362061533377388542.720
2020-03-193071.67713036.15453074.25713022.92623011.665760.01141.9926445425806482770471.431
2020-03-203011.66572981.80603012.00352974.35832992.896818.76890.6271353338278389893917.541
2020-03-212992.89683006.88883026.84202976.62302970.931221.96560.7394410108047447053681.551
2020-03-222970.93122899.31002982.50682899.31002880.303890.62743.1465367333369397244201.261
2020-03-232880.30382924.64072948.12612878.54432991.3288-111.0250-3.7116401216914432657775.000
2020-03-242991.32882992.49193009.45752980.47742987.92873.40010.1138350523658395955641.511
2020-03-252987.92872978.41953028.77882974.94233013.0501-25.1214-0.8338469049552495341447.320
2020-03-263013.05012982.06963016.94582943.71683031.2333-18.1832-0.5999441622762513128644.630
2020-03-273031.23333027.89253042.18213007.35573039.6692-8.4359-0.2775370430044451601363.140
2020-03-283039.66923022.24553058.89803020.14103030.15429.51500.3140364557276445062076.751
2020-03-293030.15422981.88023031.37062968.44512975.401954.75231.8402345732881413761364.161
2020-03-302975.40192979.52232998.27182971.82192984.9716-9.5697-0.3206315141151381331160.400
2020-03-312984.97162981.40972990.60032960.77512983.62241.34920.0452311665913374998562.611
2020-04-012983.62242924.99132983.63712924.99132917.007766.61472.2837313198007367014340.121
# 产生交叉表
data = pd.crosstab(stock['weekday'], stock['pon'])
# 反映了星期几涨的有多少，跌的有多少
data

pon01
weekday
021
112
221
311
421
512
612
# 计算得到百分比
data.sum(axis=1)

weekday
0    3
1    3
2    3
3    2
4    3
5    3
6    3
dtype: int64

# 除数按列方向逐个除，对应不上除数的被除数为NaN被除，得到NaN
data.div(data.sum(axis=1)) # 用data除以data.sum(axis=1)

0123456
weekday
00.6666670.333333NaNNaNNaNNaNNaN
10.3333330.666667NaNNaNNaNNaNNaN
20.6666670.333333NaNNaNNaNNaNNaN
30.3333330.333333NaNNaNNaNNaNNaN
40.6666670.333333NaNNaNNaNNaNNaN
50.3333330.666667NaNNaNNaNNaNNaN
60.3333330.666667NaNNaNNaNNaNNaN
# 除数按列方向逐个除，对应不上除数的被除数为NaN被除，得到NaN
data / data.sum(axis=1)

0123456
weekday
00.6666670.333333NaNNaNNaNNaNNaN
10.3333330.666667NaNNaNNaNNaNNaN
20.6666670.333333NaNNaNNaNNaNNaN
30.3333330.333333NaNNaNNaNNaNNaN
40.6666670.333333NaNNaNNaNNaNNaN
50.3333330.666667NaNNaNNaNNaNNaN
60.3333330.666667NaNNaNNaNNaNNaN
# 按行方向一行一行地除
data.div(data.sum(axis=1), axis=0) # 唯一正确解

pon01
weekday
00.6666670.333333
10.3333330.666667
20.6666670.333333
30.5000000.500000
40.6666670.333333
50.3333330.666667
60.3333330.666667
# 除数按列方向逐个除，对应不上除数的被除数为NaN被除，得到NaN
data.div(data.sum(axis=1), axis=1)

0123456
weekday
00.6666670.333333NaNNaNNaNNaNNaN
10.3333330.666667NaNNaNNaNNaNNaN
20.6666670.333333NaNNaNNaNNaNNaN
30.3333330.333333NaNNaNNaNNaNNaN
40.6666670.333333NaNNaNNaNNaNNaN
50.3333330.666667NaNNaNNaNNaNNaN
60.3333330.666667NaNNaNNaNNaNNaN
# 画图显示
data.div(data.sum(axis=1), axis=0).plot(kind='bar')
data.div(data.sum(axis=1), axis=0).plot(kind='bar', stacked=True)


### 3.9.3 使用pivot_table(透视表)实现

• DataFrame.pivot_table([], index=[])

# 使用透视表，将整个过程变得更简单一些
stock.pivot_table(['pon'], index=['weekday']) # 得到的是['pon']列的‘1’对应的百分比

pon
weekday
00.333333
10.666667
20.333333
30.500000
40.333333
50.666667
60.666667
• 0
点赞
• 1
收藏
• 打赏
• 0
评论
05-21 3502
09-25 3万+
01-15 300
05-19 6117
04-23 229
09-09 90
07-29 3010
08-06 8663
12-21 349

### “相关推荐”对你有帮助么？

• 非常没帮助
• 没帮助
• 一般
• 有帮助
• 非常有帮助

¥2 ¥4 ¥6 ¥10 ¥20

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、C币套餐、付费专栏及课程。