pandas_Sample-Beginner

本篇主要是pandas50练习题的进阶部分,也就是23~31题,难度已经逐渐上来了,尤其是从26题开始已经有较多逻辑的融合需要理解,后续更新++++++++
import pandas as pd
import numpy as np

23.有一列整数列A的DatraFrame,删除数值重复的行

df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})
df
A
01
12
22
33
44
55
65
75
86
97
107
df1 = df.loc[df['A'].shift() != df['A']]
df1
A
01
12
33
44
55
86
97
df2 = df.drop_duplicates(subset='A')
df2
A
01
12
33
44
55
86
97

24.一个全数值DatraFrame,每个数字减去该行的平均数

df = pd.DataFrame(np.random.random(size=(5,3)))
df
012
00.8609840.7625100.880017
10.3643290.0221230.755357
20.3248320.9443990.585825
30.9377630.2410150.798840
40.4981580.5843220.634986
df3 = df.sub(df.mean(axis=1),axis=0)
df3
012
00.026480-0.0719930.045513
1-0.016274-0.3584800.374754
2-0.2935200.326047-0.032527
30.278557-0.4181910.139634
4-0.0743300.0118340.062497

25.一个有5列的DataFrame,求哪一列的和最小

df = pd.DataFrame(np.random.random(size=(5,5)),columns=list('abcde'))
print(df)
          a         b         c         d         e
0  0.364215  0.569260  0.625360  0.616201  0.366357
1  0.310033  0.669937  0.403421  0.632327  0.252816
2  0.698762  0.510754  0.946358  0.802647  0.855209
3  0.346315  0.361260  0.675928  0.980343  0.906240
4  0.986167  0.037916  0.235484  0.712136  0.207447
df.sum().idxmin()
#可以自己试一下顺序:df.sum()>>>df.sum().min()>>>df.sum().idxmin()
'b'

26.给定DataFrame,求A列每个值的前3大的B的和

df = pd.DataFrame({'A': list('aaabbcaabcccbbc'), 
                   'B': [12,345,3,1,45,14,4,52,54,23,235,21,57,3,87]})
print(df)
    A    B
0   a   12
1   a  345
2   a    3
3   b    1
4   b   45
5   c   14
6   a    4
7   a   52
8   b   54
9   c   23
10  c  235
11  c   21
12  b   57
13  b    3
14  c   87
df1 = df.groupby('A')['B'].nlargest(3).sum(level=0)
df1
A
a    409
b    156
c    345
Name: B, dtype: int64

27.给定DataFrame,有列A, B,A的值在1-100(含),对A列每10步长,求对应的B的和

df = pd.DataFrame({'A': [1,2,11,11,33,34,35,40,79,99], 
                   'B': [1,2,11,11,33,34,35,40,79,99]})
print(df)
df1 = df.groupby(pd.cut(df['A'], np.arange(0, 101, 10)))['B'].sum()
print(df1)
    A   B
0   1   1
1   2   2
2  11  11
3  11  11
4  33  33
5  34  34
6  35  35
7  40  40
8  79  79
9  99  99
A
(0, 10]        3
(10, 20]      22
(20, 30]       0
(30, 40]     142
(40, 50]       0
(50, 60]       0
(60, 70]       0
(70, 80]      79
(80, 90]       0
(90, 100]     99
Name: B, dtype: int64

28.给定DataFrame,计算每个元素至左边最近的0(或者至开头)的距离,生成新列y

df = pd.DataFrame({'X': [7, 2, 0, 3, 4, 2, 5, 0, 3, 4]})

izero = np.r_[-1, (df['X'] == 0).to_numpy().nonzero()[0]] # 标记0的位置
idx = np.arange(len(df))
df['Y'] = idx - izero[np.searchsorted(izero - 1, idx) - 1]
print(df)

# 方法二
# x = (df['X'] != 0).cumsum()
# y = x != x.shift()
# df['Y'] = y.groupby((y != y.shift()).cumsum()).cumsum()

# 方法三
# df['Y'] = df.groupby((df['X'] == 0).cumsum()).cumcount()
#first_zero_idx = (df['X'] == 0).idxmax()
# df['Y'].iloc[0:first_zero_idx] += 1
   X  Y
0  7  1
1  2  2
2  0  0
3  3  1
4  4  2
5  2  3
6  5  4
7  0  0
8  3  1
9  4  2

29.一个全数值的DataFrame,返回最大3值的坐标

df = pd.DataFrame(np.random.random(size=(5, 3)))
print(df)
          0         1         2
0  0.655695  0.208415  0.770685
1  0.538737  0.653572  0.423824
2  0.324956  0.747991  0.128974
3  0.894755  0.053863  0.039284
4  0.866307  0.405564  0.889443
df.unstack().sort_values()[-3:].index.tolist()
[(0, 4), (2, 4), (0, 3)]

30.给定DataFrame,将负值代替为同组的平均值

df = pd.DataFrame({'grps': list('aaabbcaabcccbbc'), 
                   'vals': [-12,345,3,1,45,14,4,-52,54,23,-235,21,57,3,87]})
print(df)
   grps  vals
0     a   -12
1     a   345
2     a     3
3     b     1
4     b    45
5     c    14
6     a     4
7     a   -52
8     b    54
9     c    23
10    c  -235
11    c    21
12    b    57
13    b     3
14    c    87
def replace(group):
    mask = group<0
    group[mask] = group[~mask].mean()
    return group
df['vals'] = df.groupby(['grps'])['vals'].transform(replace)
print(df)
   grps        vals
0     a  117.333333
1     a  345.000000
2     a    3.000000
3     b    1.000000
4     b   45.000000
5     c   14.000000
6     a    4.000000
7     a  117.333333
8     b   54.000000
9     c   23.000000
10    c   36.250000
11    c   21.000000
12    b   57.000000
13    b    3.000000
14    c   87.000000

31.计算3位滑动窗口的平均值,忽略NAN

df = pd.DataFrame({'group': list('aabbabbbabab'),
                    'value': [1, 2, 3, np.nan, 2, 3, np.nan, 1, 7, 3, np.nan, 8]})
print(df)

g1 = df.groupby(['group'])['value']
g2 = df.fillna(0).groupby(['group'])['value'] 

s = g2.rolling(3, min_periods=1).sum() / g1.rolling(3, min_periods=1).count()

s.reset_index(level=0, drop=True).sort_index() 
   group  value
0      a    1.0
1      a    2.0
2      b    3.0
3      b    NaN
4      a    2.0
5      b    3.0
6      b    NaN
7      b    1.0
8      a    7.0
9      b    3.0
10     a    NaN
11     b    8.0





0     1.000000
1     1.500000
2     3.000000
3     3.000000
4     1.666667
5     3.000000
6     3.000000
7     2.000000
8     3.666667
9     2.000000
10    4.500000
11    4.000000
Name: value, dtype: float64
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值