pandas Exercises (Part 1)

Sweeney Chen

Article link: https://blog.csdn.net/weixin_41792682/article/details/90114657

Create a Series indexed by every day of 2018, with random values

import numpy as np
import pandas as pd

dti = pd.date_range(start='2018-01-01', end='2018-12-31', freq='D')
s = pd.Series(np.random.rand(len(dti)), index=dti)
s

Sum the values of `s` that fall on a Wednesday

s[s.index.weekday == 2].sum()
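
Because `s` holds random values, the result above cannot be checked by eye. As a minimal sketch (the names `idx`, `ones`, `mask_weekday`, and `mask_name` are illustrative and not part of the original exercise), the same filter can be applied to a small series of ones, where the sum is simply the number of Wednesdays:

```python
import numpy as np
import pandas as pd

# Every day of January 2018, each with the value 1.0
idx = pd.date_range("2018-01-01", "2018-01-31", freq="D")
ones = pd.Series(np.ones(len(idx)), index=idx)

# DatetimeIndex.weekday counts from Monday=0, so Wednesday is 2
mask_weekday = ones.index.weekday == 2
# The same mask, spelled out with day names for readability
mask_name = ones.index.day_name() == "Wednesday"

wednesday_total = ones[mask_weekday].sum()
print(wednesday_total)  # January 2018 has five Wednesdays: 3, 10, 17, 24, 31
```

Comparing against `day_name()` is a quick way to confirm that the integer used for the weekday is the intended one.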

Compute the mean of `s` for each month

s.resample('M').mean()  # on pandas 2.2+, 'M' is deprecated here; use 'ME' (month end) instead

Resample a Series from seconds to minutes

s = pd.date_range('today', periods=100, freq='S')  # on pandas 2.2+, prefer the lowercase alias 's'
ts = pd.Series(np.random.randint(0, 500, len(s)), index=s)
ts.resample('Min').sum()

Localize to UTC (Coordinated Universal Time)

s = pd.date_range('today', periods=1, freq='D')
ts = pd.Series(np.random.randn(len(s)), s)
ts_utc = ts.tz_localize('UTC')
ts_utc

Convert to the Shanghai time zone

ts_utc.tz_convert('Asia/Shanghai')
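
The two calls do different jobs: `tz_localize` attaches a time zone to naive timestamps, while `tz_convert` re-expresses an already zone-aware index in another zone without changing the instant. A small sketch with a fixed timestamp (the names `ts_demo`, `utc`, and `shanghai` are illustrative) makes this visible:

```python
import pandas as pd

# One naive timestamp: midnight on 1 January 2018
ts_demo = pd.Series([1.0], index=pd.DatetimeIndex(["2018-01-01 00:00:00"]))

utc = ts_demo.tz_localize("UTC")             # interpret the naive time as UTC
shanghai = utc.tz_convert("Asia/Shanghai")   # same instant, shown at UTC+8

print(utc.index[0])
print(shanghai.index[0])
```

The wall-clock hour shifts by eight, but the two timestamps still compare equal because they denote the same moment.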

Convert between time representations (timestamps and periods)

rng = pd.date_range('1/1/2018', periods=5, freq='M')  # on pandas 2.2+, use 'ME' for month end
ts = pd.Series(np.random.randn(len(rng)), index=rng)
print(ts)
ps = ts.to_period()
print(ps)
ps.to_timestamp()

Create a Series with a MultiIndex

letters = ['A', 'B', 'C']
numbers = list(range(10))
mi = pd.MultiIndex.from_product([letters, numbers])
s = pd.Series(np.random.rand(30), index=mi)
s

Select from a MultiIndex Series

s.loc[:, [1, 3, 6]]

Slice a MultiIndex Series

s.loc[pd.IndexSlice[:'B', 5:]]
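
With random values it is hard to see which rows the two selections return. The sketch below repeats them on a series whose values are simply 0 to 29 (the names `mi_demo`, `s_demo`, `by_number`, and `sliced` are illustrative), so each value reveals its position:

```python
import numpy as np
import pandas as pd

# Same index shape as above: letters A-C crossed with numbers 0-9
mi_demo = pd.MultiIndex.from_product([["A", "B", "C"], range(10)])
s_demo = pd.Series(np.arange(30), index=mi_demo)

# Second-level labels 1, 3 and 6 under every letter: 3 letters x 3 numbers
by_number = s_demo.loc[:, [1, 3, 6]]

# Label slices are inclusive: letters up to and including 'B', numbers from 5 on
sliced = s_demo.loc[pd.IndexSlice[:"B", 5:]]

print(by_number)
print(sliced)
```

Note that label-based slices include both endpoints, unlike positional slices.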

Create a DataFrame with a MultiIndex

frame = pd.DataFrame(np.arange(12).reshape(6, 2),
                     index=[list('AAABBB'), list('123123')],
                     columns=['hello', 'python'])
frame

Name the levels of the MultiIndex

frame.index.names = ['first', 'second']
frame

Group a MultiIndex DataFrame by an index level and sum

frame.groupby('first').sum()

Pivot column labels into the row index (stack)

print(frame)
frame.stack()

Pivot an index level into the columns (unstack)

print(frame)
frame.unstack()
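
`stack` and `unstack` are inverse reshapes: the first moves the column labels into the innermost index level, the second moves the innermost index level into the columns. A minimal sketch on a rebuilt copy of the frame (named `frame_demo` here so the example is self-contained; `stacked`, `roundtrip`, and `wide` are also illustrative names) shows the shapes involved:

```python
import numpy as np
import pandas as pd

frame_demo = pd.DataFrame(np.arange(12).reshape(6, 2),
                          index=[list("AAABBB"), list("123123")],
                          columns=["hello", "python"])
frame_demo.index.names = ["first", "second"]

# Columns become a third index level: a Series with 6 * 2 = 12 entries
stacked = frame_demo.stack()

# Unstacking the stacked Series restores the original 6 x 2 layout
roundtrip = stacked.unstack()

# Unstacking the frame itself moves 'second' into the columns: 2 rows x 6 columns
wide = frame_demo.unstack()

print(stacked.shape, roundtrip.shape, wide.shape)
```

Here the second index level holds the strings '1', '2', '3', so a cell of `wide` is addressed as, for example, `wide.loc['A', ('hello', '1')]`.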

Filter a DataFrame by a condition

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)
df[df['age'] > 3]

Slice by row and column position

df.iloc[2:4, 1:3]

Filter a DataFrame on multiple conditions

df = pd.DataFrame(data, index=labels)
df[(df['animal'] == 'cat') & (df['age'] < 3)]

Filter a DataFrame by membership in a list of values

df[df['animal'].isin(['cat', 'dog'])]

Select by row label and column name

df.loc[df.index[[3, 4, 8]], ['animal', 'age']]

Sort a DataFrame by multiple columns

df.sort_values(by=['age', 'visits'], ascending=[False, True])

Replace multiple values in a column

df['priority'].map({'yes':True, 'no':False})

Group a DataFrame and sum

df.groupby('animal').sum()

Concatenate multiple DataFrames from a list

temp_df1 = pd.DataFrame(np.random.randn(5, 4))  # DataFrame 1, filled with random numbers
temp_df2 = pd.DataFrame(np.random.randn(5, 4))  # DataFrame 2, filled with random numbers
temp_df3 = pd.DataFrame(np.random.randn(5, 4))  # DataFrame 3, filled with random numbers

print(temp_df1)
print(temp_df2)
print(temp_df3)

pieces = [temp_df1, temp_df2, temp_df3]
pd.concat(pieces)

Find the column of a DataFrame with the smallest sum

df = pd.DataFrame(np.random.random(size=(5, 10)),
                  columns=list('abcdefghij'))
print(df)
df.sum().idxmin()  # idxmax() and idxmin() are Series methods that return the index label of the max/min value

Subtract each row's mean from every element in that row

df = pd.DataFrame(np.random.random(size=(5, 3)))
print(df)
df.sub(df.mean(axis=1), axis=0)
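
The `axis=0` argument is what aligns the row means with the row index; without it, pandas would try to match the means against the column labels. A small worked example with fixed numbers (the names `df_demo`, `row_means`, and `centered` are illustrative) shows the effect:

```python
import pandas as pd

df_demo = pd.DataFrame([[1.0, 2.0, 3.0],
                        [10.0, 20.0, 30.0]])

# One mean per row: 2.0 for row 0, 20.0 for row 1
row_means = df_demo.mean(axis=1)

# axis=0 matches row_means to the row index, so each row loses its own mean
centered = df_demo.sub(row_means, axis=0)

print(centered)
```

After this step every row of `centered` has a mean of zero.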

Group a DataFrame and sum the three largest values in each group

df = pd.DataFrame({'A': list('aaabbcaabcccbbc'),
                   'B': [12, 345, 3, 1, 45, 14, 4, 52, 54, 23, 235, 21, 57, 3, 87]})
print(df)
df.groupby('A')['B'].nlargest(3).groupby(level=0).sum()  # Series.sum(level=0) was removed in pandas 2.0
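
Since this data is fixed, the expected totals can be worked out by hand: group `a` has top values 345, 52, 12; group `b` has 57, 54, 45; group `c` has 235, 87, 23. The sketch below (with illustrative names `df_demo`, `top3`, and `alt`) computes the result in two equivalent ways that both run on pandas 2.x, where `Series.sum(level=...)` is no longer available:

```python
import pandas as pd

df_demo = pd.DataFrame({"A": list("aaabbcaabcccbbc"),
                        "B": [12, 345, 3, 1, 45, 14, 4, 52, 54, 23, 235, 21, 57, 3, 87]})

# nlargest(3) keeps the top three per group; regroup on level 0 (the key 'A') to sum them
top3 = df_demo.groupby("A")["B"].nlargest(3).groupby(level=0).sum()

# Alternative: take the top three and sum them inside a single apply per group
alt = df_demo.groupby("A")["B"].apply(lambda x: x.nlargest(3).sum())

print(top3)  # a: 409, b: 156, c: 345
```

The first form stays within built-in groupby operations; the `apply` form is shorter to read but calls a Python function once per group.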
