pandas python groupby_Pandas的数据分组-groupby函数

29. Pandas的数据分组-groupby函数

在SQL语言里有group by功能,在Pandas里有groupby函数与之功能相对应。DataFrame数据对象经groupby()之后有ngroups和groups等属性,本质是DataFrame类的子类DataFrameGroupBy的实例对象。ngroups反应的是分组的个数,而groups类似dict结构,key是分组的index或label,value则为index或label所对应的分组数据。size函数则是可以返回所有分组的字节大小。count函数可以统计分组后各列数据项个数。get_group函数可以返回指定组的数据信息。而discribe函数可以返回分组后的数据的统计数据。

基于单列的分组。

import pandas as pd

idx = [101,101,101,102,102,102,103,103,103]

name = ["apple","pearl","orange", "apple","pearl","orange","apple","pearl","orange"]

price = [1.0,2.0,3.0,4.00,5.0,6.0,7.0,8.0,9.0]

df0 = pd.DataFrame({ "fruit": name, "price" : price, "supplier" :idx})

print "*" * 30

print df0

dg = df0.groupby("fruit")

print "*" * 30

for n, g in dg:

print "group_name:", n, "\n|",g,"|"

print "*" * 30

for n,_ in dg:

print "group_name:", n, "\n|",dg.get_group(n),"|"

print "*" * 30

print dg.describe()

print "*" * 30

程序执行结果:

******************************

fruit price supplier

0 apple 1 101

1 pearl 2 101

2 orange 3 101

3 apple 4 102

4 pearl 5 102

5 orange 6 102

6 apple 7 103

7 pearl 8 103

8 orange 9 103

******************************

group_name: apple

| fruit price supplier

0 apple 1 101

3 apple 4 102

6 apple 7 103 |

group_name: orange

| fruit price supplier

2 orange 3 101

5 orange 6 102

8 orange 9 103 |

group_name: pearl

| fruit price supplier

1 pearl 2 101

4 pearl 5 102

7 pearl 8 103 |

******************************

group_name: apple

| fruit price supplier

0 apple 1 101

3 apple 4 102

6 apple 7 103 |

group_name: orange

| fruit price supplier

2 orange 3 101

5 orange 6 102

8 orange 9 103 |

group_name: pearl

| fruit price supplier

1 pearl 2 101

4 pearl 5 102

7 pearl 8 103 |

******************************

price supplier

fruit

apple count 3.0 3.0

mean 4.0 102.0

std 3.0 1.0

min 1.0 101.0

25% 2.5 101.5

50% 4.0 102.0

75% 5.5 102.5

max 7.0 103.0

orange count 3.0 3.0

mean 6.0 102.0

std 3.0 1.0

min 3.0 101.0

25% 4.5 101.5

50% 6.0 102.0

75% 7.5 102.5

max 9.0 103.0

pearl count 3.0 3.0

mean 5.0 102.0

std 3.0 1.0

min 2.0 101.0

25% 3.5 101.5

50% 5.0 102.0

75% 6.5 102.5

max 8.0 103.0

******************************

多列数据分组。在groupby函数内给出一个列表,列表里列出多列列名。

import pandas as pd

idx = [101,101,101,102,102,102,103,103,103]

name = ["apple","pearl","orange", "apple","pearl","orange","apple","pearl","orange"]

price = [1.0,2.0,3.0,4.00,5.0,6.0,7.0,8.0,9.0]

df0 = pd.DataFrame({ "fruit": name, "price" : price, "supplier" :idx})

print "*" * 30

print df0

print "*" * 30

dg2 = df0.groupby(["fruit", "supplier"])

for n, g in dg2:

print "multiGroup on:", n, "\n|",g ,"|"

print "*" * 30

程序里基于["fruit", "supplier"]两列进行分组。执行结果如下:

******************************

fruit price supplier

0 apple 1 101

1 pearl 2 101

2 orange 3 101

3 apple 4 102

4 pearl 5 102

5 orange 6 102

6 apple 7 103

7 pearl 8 103

8 orange 9 103

******************************

multiGroup on: ('apple', 101)

| fruit price supplier

0 apple 1 101 |

multiGroup on: ('apple', 102)

| fruit price supplier

3 apple 4 102 |

multiGroup on: ('apple', 103)

| fruit price supplier

6 apple 7 103 |

multiGroup on: ('orange', 101)

| fruit price supplier

2 orange 3 101 |

multiGroup on: ('orange', 102)

| fruit price supplier

5 orange 6 102 |

multiGroup on: ('orange', 103)

| fruit price supplier

8 orange 9 103 |

multiGroup on: ('pearl', 101)

| fruit price supplier

1 pearl 2 101 |

multiGroup on: ('pearl', 102)

| fruit price supplier

4 pearl 5 102 |

multiGroup on: ('pearl', 103)

| fruit price supplier

7 pearl 8 103 |

******************************

使用set_index进行分组。

import pandas as pd

idx = [101,101,101,102,102,102,103,103,103]

name = ["apple","pearl","orange", "apple","pearl","orange","apple","pearl","orange"]

price = [1.0,2.0,3.0,4.00,5.0,6.0,7.0,8.0,9.0]

df0 = pd.DataFrame({ "fruit": name, "price" : price, "supplier" :idx})

print "*" * 30

print df0

print "*" * 30

dg2 = df0.groupby(["fruit", "supplier"])

for n, g in dg2:

print "multiGroup on:", n, "\n|",g ,"|"

print "*" * 30

si = df0.set_index(["fruit", "supplier"])

print si

print "*" * 30

程序的执行结果:

******************************

fruit price supplier

0 apple 1 101

...

8 orange 9 103

******************************

multiGroup on: ('apple', 101)

| fruit price supplier

0 apple 1 101 |

...

multiGroup on: ('pearl', 103)

| fruit price supplier

7 pearl 8 103 |

******************************

price

fruit supplier

apple 101 1

...

orange 103 9

******************************

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值