[Python for Data Analysis] Chapter 12: Advanced pandas (notes)

IoniaProphet

Last modified 2023-03-05 19:36:02

Category: Python Data Analysis and Machine Learning. Tags: python, pandas, data analysis

First published 2023-03-05 16:55:00

Article link: https://blog.csdn.net/m0_68738986/article/details/129345110

Dimension-table representation (take)

import pandas as pd

values = pd.Series([0, 1, 0, 0] * 2)
dim = pd.Series(['apple', 'orange'])
dim.take(values)  # values holds integer positions into dim: 0 maps to 'apple', 1 maps to 'orange'

Categorical

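fruit_cat = df['fruit'].astype('category')  # a Series with category dtype
c = fruit_cat.values  # the underlying pandas.Categorical instance
# attributes
c.categories  # Index holding the distinct category values
c.codes  # integer array: the position of each element within categories
# convert a DataFrame column to categorical
df['fruit'] = df['fruit'].astype('category')
# build a Categorical directly
c = pd.Categorical(['foo', 'bar'])  # optionally pass categories=[...]
# build a Categorical from already-encoded data (a list of codes plus a list of categories)
c = pd.Categorical.from_codes(codes, categories)  # ordered=True makes the order of the categories list meaningful
c.as_ordered()  # return an ordered version of an unordered Categorical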
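
The attributes and constructors above can be tried end to end with a small sketch. The fruit column below is hypothetical sample data, not something from the original notes; it round-trips a column through its codes and categories.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({"fruit": ["apple", "orange", "apple", "apple"]})

fruit_cat = df["fruit"].astype("category")  # Series with category dtype
c = fruit_cat.values                        # underlying pandas.Categorical

categories = list(c.categories)  # distinct values in sorted order
codes = c.codes.tolist()         # position of each element within categories

# Rebuild an ordered Categorical from the codes and the categories.
rebuilt = pd.Categorical.from_codes(codes, categories, ordered=True)
```

Here codes is a plain list of small integers, which is why a category column is usually cheaper to store than the repeated strings it replaces.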

Computations with Categorical

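# binning
bins = pd.qcut(draws, 4)  # split draws (a 1-D numeric array) into quartiles; bins is a Categorical
# optional argument: labels=['Q1', 'Q2', 'Q3', 'Q4']
# summary statistics
bins = pd.Series(bins, name='quartiles')
result = pd.Series(draws).groupby(bins).agg(['count', 'min', 'max'])  # group by quartile, then summarize each group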
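
A runnable version of the binning steps might look like the sketch below. The random draws, the seed, and the choice of count/min/max as summary statistics are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(12345)  # arbitrary seed, only for reproducibility
draws = rng.standard_normal(1000)   # hypothetical input data

# Quartile binning with readable labels; qcut returns a Categorical here.
bins = pd.qcut(draws, 4, labels=["Q1", "Q2", "Q3", "Q4"])
bins = pd.Series(bins, name="quartiles")

# Summary statistics per quartile, with the quartile label as a column.
result = (pd.Series(draws)
          .groupby(bins, observed=False)
          .agg(["count", "min", "max"])
          .reset_index())
```

Because qcut splits on sample quantiles, each of the four bins receives the same number of observations.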

Categorical methods

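# change the set of categories
cat_s.cat.set_categories(new_categories)  # new_categories is a list of category values
cat_s.cat.remove_unused_categories()  # drop categories that do not occur in the data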
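
As a small sketch of both methods, using a made-up four-letter Series: the category set is first widened, then trimmed after filtering.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "d"])  # hypothetical sample data
cat_s = s.astype("category")

# Widen the category set beyond the values actually observed.
cat_s2 = cat_s.cat.set_categories(["a", "b", "c", "d", "e"])

# Filtering keeps all four categories, even the ones no longer present...
cat_s3 = cat_s[cat_s.isin(["a", "b"])]
# ...so trim them explicitly.
trimmed = cat_s3.cat.remove_unused_categories()
```

Both methods are reached through the .cat accessor and return new objects; cat_s itself is left unchanged.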

Converting a Categorical to a DataFrame (get_dummies)

cat_s = pd.Series(['a', 'b', 'c', 'd'] * 2, dtype='category')
pd.get_dummies(cat_s)  # one indicator column per category

GroupBy

transform

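g = df.groupby('key')['value']
g.mean()  # caveat: the result has one row per distinct key, not one per row of df
g.transform(lambda x: x.mean())  # returns a Series the same length as df
g.transform('mean')  # same result, using the built-in aggregation name
g.transform(lambda x: x.rank(ascending=False))  # rank within each group, descending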
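
The difference between aggregating and transforming is easier to see on concrete numbers. The four-row frame below is hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"key": ["a", "b", "a", "b"],
                   "value": [1.0, 2.0, 3.0, 6.0]})
g = df.groupby("key")["value"]

means = g.mean()                 # one row per key
broadcast = g.transform("mean")  # group mean repeated for every row of df
ranks = g.transform(lambda x: x.rank(ascending=False))  # rank within each group

# A typical use: subtract each row's group mean.
demeaned = df["value"] - broadcast
```

Since broadcast shares df's index, it can be combined with the original columns directly, without a merge.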

Grouped time resampling

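df.set_index('time').resample('5min').count()
# resample separately for each key
time_key = pd.Grouper(freq='5min')  # pd.TimeGrouper has been removed from pandas; pd.Grouper(freq=...) replaces it
resampled = df.set_index('time').groupby(['key', time_key])  # the time column must be the index of the Series or DataFrame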
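
A self-contained sketch of per-key resampling follows. The frame (15 one-minute timestamps, three keys per timestamp) and the use of sum() as the aggregation are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

N = 15
times = pd.date_range("2017-05-20 00:00", freq="1min", periods=N)

# Hypothetical data: each timestamp appears once per key.
df = pd.DataFrame({
    "time": times.repeat(3),
    "key": np.tile(["a", "b", "c"], N),
    "value": np.arange(N * 3.0),
})

# Group by key and by 5-minute bins of the time index, then aggregate.
time_key = pd.Grouper(freq="5min")
resampled = df.set_index("time").groupby(["key", time_key]).sum()
```

The result is indexed by (key, time); calling reset_index() on it turns both levels back into ordinary columns.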

Method chaining

Assigning columns with assign

df2 = df.assign(k=v)  # k is the new column name; v is a value, or a function that takes the DataFrame and returns the values

The pipe method (calling your own or third-party functions)

# apply several functions to df in sequence
result = (df.pipe(f)
            .pipe(g)
            .pipe(h))
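
To make the chain concrete, here is a minimal sketch that combines assign and pipe. The helper group_demean, the column names, and the data are all hypothetical.

```python
import pandas as pd

def group_demean(frame, by, cols):
    """Subtract the per-group mean from each column in cols (hypothetical helper)."""
    result = frame.copy()
    grouped = frame.groupby(by)
    for col in cols:
        result[col] = frame[col] - grouped[col].transform("mean")
    return result

# Hypothetical sample data.
df = pd.DataFrame({"key": ["a", "a", "b", "b"],
                   "col1": [1.0, 3.0, 10.0, 20.0]})

# assign returns a new frame with the extra column; pipe passes that frame
# as the first argument of group_demean, followed by the keyword arguments.
result = (df.assign(col1_x2=lambda d: d["col1"] * 2)
            .pipe(group_demean, by="key", cols=["col1"]))
```

Wrapping the chain in parentheses lets each step sit on its own line, and df itself is not modified by any of the steps.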
