Data Wrangling with Pandas

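A large share of the programming work in data analysis and modeling goes into data preparation, because data stored in files or databases is often not in the form that a data-processing application requires.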

Pandas provides a set of high-level, flexible, and efficient core functions and algorithms that make it easy to wrangle data into the form you need.


Merging

Concatenation

Pandas provides many methods for easily combining Series and DataFrame objects (the Panel type has been removed from current pandas releases).

Concatenating pandas objects: pd.concat()

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 4))
df.head()
          0         1         2         3
0  0.231308  1.193636 -0.033288  0.826399
1 -0.421474 -0.618510 -1.266325 -0.439435
2 -0.279457  0.578144  1.131353 -0.639720
3 -1.197750 -0.446579  0.495728  0.900704
4 -0.638926 -0.233019 -1.106248 -0.762133
pieces = [df[:2], df[3:5], df[7:]]  # slices are half-open: the start is included, the stop is excluded
pieces
[          0         1         2         3
 0  0.231308  1.193636 -0.033288  0.826399
 1 -0.421474 -0.618510 -1.266325 -0.439435,
           0         1         2         3
 3 -1.197750 -0.446579  0.495728  0.900704
 4 -0.638926 -0.233019 -1.106248 -0.762133,
           0         1         2         3
 7 -0.265515 -0.705797  0.695531 -0.257374
 8  0.552615 -0.137180  0.859215 -0.853752
 9 -1.014105  0.392409 -1.832748  0.612679]
df2 = pd.concat(pieces)
df2
          0         1         2         3
0  0.231308  1.193636 -0.033288  0.826399
1 -0.421474 -0.618510 -1.266325 -0.439435
3 -1.197750 -0.446579  0.495728  0.900704
4 -0.638926 -0.233019 -1.106248 -0.762133
7 -0.265515 -0.705797  0.695531 -0.257374
8  0.552615 -0.137180  0.859215 -0.853752
9 -1.014105  0.392409 -1.832748  0.612679
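
The concatenated result above keeps the original row labels, which leaves gaps (0, 1, 3, 4, 7, 8, 9). As a minimal sketch of two common ways to handle this, the example below uses the `ignore_index` and `keys` parameters of `pd.concat`. The integer data and the key names 'head', 'middle', and 'tail' are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data used above (assumption).
df = pd.DataFrame(np.arange(40).reshape(10, 4))
pieces = [df[:2], df[3:5], df[7:]]

# Default: original row labels are kept, so the result has gaps.
kept = pd.concat(pieces)

# ignore_index=True discards the old labels and renumbers rows from 0.
renumbered = pd.concat(pieces, ignore_index=True)

# keys= adds an outer index level recording which piece each row came from.
labelled = pd.concat(pieces, keys=['head', 'middle', 'tail'])

print(kept.index.tolist())
print(renumbered.index.tolist())
print(labelled.loc['middle'])
```

Which option fits depends on whether the original row labels carry meaning: keep them if they identify records, renumber them if they are just positions.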

Appending: .append()

df = pd.DataFrame(np.random.randn(4, 4), columns=['A','B','C','D'])
df
          A         B         C         D
0  1.295901 -0.742636  0.873728 -0.810075
1  1.073456  0.344627  0.156597  1.460616
2  1.696282 -1.272457  1.226460 -1.944458
3 -0.473047  0.147528 -0.538231  0.125467
s = df.iloc[2]
s
A    1.696282
B   -1.272457
C    1.226460
D   -1.944458
Name: 2, dtype: float64
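# Note: DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0, so this call only runs on older versions
df.append(s, ignore_index=True)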
          A         B         C         D
0  1.295901 -0.742636  0.873728 -0.810075
1  1.073456  0.344627  0.156597  1.460616
2  1.696282 -1.272457  1.226460 -1.944458
3 -0.473047  0.147528 -0.538231  0.125467
4  1.696282 -1.272457  1.226460 -1.944458
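
On pandas 2.0 and later, where `DataFrame.append()` no longer exists, the same row-append can be expressed with `pd.concat`. The following is one possible equivalent, not the original author's code: the Series is first turned into a one-row DataFrame with `to_frame().T` so that its labels line up with the columns of `df`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 4), columns=['A', 'B', 'C', 'D'])
s = df.iloc[2]

# s.to_frame() is a single-column DataFrame indexed by A..D; .T transposes it
# into a single row with columns A..D, so concat stacks it under df.
# ignore_index=True renumbers the rows 0..4, like append(..., ignore_index=True).
appended = pd.concat([df, s.to_frame().T], ignore_index=True)
print(appended)
```

Because each `concat` call copies the data, collecting rows in a list and concatenating once is usually preferable to appending row by row in a loop.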

Grouping

groupby(): generally refers to a process involving one or more of the following steps

  • Splitting: divide the data into groups
  • Applying: apply a function to each group independently
  • Combining: combine the results into a data structure
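
The three steps listed above can be sketched by hand and compared with the built-in one-liner. This is only an illustration of the idea, not how pandas implements it internally; the column names `key` and `val` and their values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'key': ['a', 'b', 'a', 'b'],   # hypothetical grouping column
    'val': [1, 2, 3, 4],           # hypothetical values to aggregate
})

# Splitting: iterating over a GroupBy yields (group label, sub-DataFrame) pairs.
groups = {label: part for label, part in df.groupby('key')}

# Applying: compute one value per group, here the sum of 'val'.
sums = {label: part['val'].sum() for label, part in groups.items()}

# Combining: gather the per-group results into a single Series.
manual = pd.Series(sums, name='val')
manual.index.name = 'key'

# The built-in call performs the same three steps in one expression.
builtin = df.groupby('key')['val'].sum()

print(manual)
print(builtin)
```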
df = pd.DataFrame({
    'A' : ['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'],
    'B' : ['one', 'one', 'two', 'three','two', 'two', 'one', 'three'],
    'C' : np.random.randn(8),
    'D' : np.random.randn(8)
    })
df
     A      B         C         D
0  foo    one  0.556699  1.543716
1  bar    one -0.905349 -0.054870
2  foo    two  1.220397 -0.589706
3  bar  three  0.637305 -0.046351
4  foo    two -0.150553 -0.889157
5  bar    two -0.771132  0.196547
6  foo    one  0.008275 -0.571672
7  foo  three  0.228275 -1.164593
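# Sum after grouping by A (on pandas 2.0+, pass numeric_only=True to sum() to exclude the string column B, as in the output shown):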
a = df.groupby('A').sum()
a
            C         D
A
bar -1.039176  0.095325
foo  1.863094 -1.671411
a = df.groupby('A',as_index=False).sum()
a
     A         C         D
0  bar -1.039176  0.095325
1  foo  1.863094 -1.671411
# Sum after grouping by multiple columns:
b = df.groupby(['A','B']).sum()
b
                  C         D
A   B
bar one   -0.905349 -0.054870
    three  0.637305 -0.046351
    two   -0.771132  0.196547
foo one    0.564975  0.972044
    three  0.228275 -1.164593
    two    1.069844 -1.478862
b = df.groupby(['A','B'],as_index=False).sum()
b
     A      B         C         D
0  bar    one -0.905349 -0.054870
1  bar  three  0.637305 -0.046351
2  bar    two -0.771132  0.196547
3  foo    one  0.564975  0.972044
4  foo  three  0.228275 -1.164593
5  foo    two  1.069844 -1.478862
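
The difference between the two forms of output above comes down to where the group keys end up: in the index by default, or in ordinary columns with `as_index=False`. As a small sketch on made-up data (the values below are assumptions chosen to be deterministic), calling `reset_index()` on the indexed result should give the same flat layout.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': ['one', 'one', 'two', 'two'],
    'C': np.arange(4.0),   # illustrative values, not the random data above
})

indexed = df.groupby(['A', 'B']).sum()               # keys form a MultiIndex
flat = df.groupby(['A', 'B'], as_index=False).sum()  # keys stay as columns

# reset_index() moves the index levels of `indexed` back into columns.
restored = indexed.reset_index()

print(indexed)
print(flat)
print(restored)
```

The indexed form is convenient for label-based lookups such as `indexed.loc[('foo', 'one')]`, while the flat form is often easier to merge or export.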