Commonly Used pandas Functions for Data Processing

baixiaofu

Categories: Python, Data Mining

Article link: https://blog.csdn.net/baixiaofu/article/details/80756479

This section covers functions that are commonly used in data processing.

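I recently had to process a large amount of data, all of it with basic statistical operations. Everything could be done in Excel, but not automatically, so I used Python to automate the work. To get familiar with the tools first, the functions I ended up using in practice are listed below.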

Function / method / attribute	Parameters	Return value	Notes
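Method DataFrame.groupby()	groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs); by is a column label, a list of labels, a mapping, or a function. There are many parameters; see the pandas documentation for details and examples	GroupBy object	squeeze was removed in pandas 2.0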
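Function or method map()	map(func, seq1[, seq2, …])	A collection of results: a list in Python 2, a lazy iterator in Python 3; as a method, Series.map(func) returns a new Series	func is a function or a lambda expression; it is applied to every element of seq1, seq2, …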
lambda expression	lambda x, y, z, …: an expression in x, y, z, …	An anonymous function whose parameters are x, y, z, …
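Attribute DataFrame.columns	A DataFrame attribute, so it takes no parameters; assigning to it replaces the column names	The column labels of the DataFrame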
Method DataFrame.rename()	rename(columns={'old_column': 'new_column'}, inplace=True)	Renames the specified columns	inplace=True modifies the original DataFrame directly instead of returning a renamed copy
Method DataFrame.nlargest()	nlargest(n, columns, keep='first'); keep is 'first' or 'last', with 'first' as the default	Returns the first n rows with the largest values in columns, in descending order. Columns that are not specified are returned as well, but are not used for ordering.
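Method DataFrame.fillna()	fillna(value, inplace=True)	The DataFrame with missing values filled; with inplace=True the original is filled directly and None is returned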
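Method DataFrame.dropna()	dropna(axis=0)	The DataFrame with rows that contain missing values removed	With axis=1, columns that contain missing values are dropped instead; this is rarely used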
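Method or function notnull()	Available both as a method (obj.notnull()) and as a function; the function form is pd.notnull(obj)	Boolean values, in a container of the same type as obj	Can be used as a filter condition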
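Method DataFrame.apply() / Series.apply()	A function; for a DataFrame it is applied to each column (axis=0, the default) or to each row (axis=1)	A Series or a DataFrame, depending on what the function returns: a function that reduces each column to a single value gives a Series, while one that returns a Series of the same length gives a DataFrame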
Method DataFrame.drop_duplicates()	drop_duplicates(subset=None, keep='first', inplace=False); 1. subset: column label or sequence of labels, optional. Only consider certain columns for identifying duplicates; by default all columns are used. 2. keep: {'first', 'last', False}, default 'first'. 'first': drop duplicates except for the first occurrence. 'last': drop duplicates except for the last occurrence. False: drop all duplicates.	DataFrame with the duplicate rows removed
Series.value_counts(); the Series can be a single column of a DataFrame	Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True); 1. dropna=True: do not include counts of NaN. 2. normalize=False: count how many times each value occurs; with normalize=True the relative frequency (proportion) of each value is returned instead	A Series containing the counts of unique values. The unique values form the index; with sort=True (the default) the result is sorted by count
Method DataFrame.reset_index()	reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill=''); 1. level: the level(s) of the row index to reset. Only the given levels are removed from the index; all levels are removed by default. 2. drop, default False: when True, the index is not inserted into the DataFrame columns and is discarded. This resets the index to the default integer index. 3. col_level: int or str, default 0. If the columns have multiple levels, determines which level the labels are inserted into; by default the first level. Only relevant when the column labels are multi-level. 4. col_fill: object, default ''. If the columns have multiple levels, determines how the other levels are named; if None, the index name is repeated.	DataFrame	Useful when the DataFrame has a multi-level index, and also after grouping
Rounding: DataFrame.round()	round(decimals=0, *args, **kwargs); decimals: int, dict, or Series	DataFrame
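
As a rough illustration of how groupby(), reset_index(), nlargest() and round() from the table fit together, here is a minimal sketch. The sales table and its column names are invented for this example and are not from the original data.

```python
import pandas as pd

# Hypothetical sales data, invented for this example.
df = pd.DataFrame({
    "region": ["east", "east", "east", "west", "west", "north"],
    "sales": [10, 20, 25, 7, 12, 30],
})

# Group by one column and aggregate; the group keys become the index.
totals = df.groupby("region")["sales"].sum()

# reset_index() turns the group keys back into an ordinary column.
totals_df = totals.reset_index()

# Keep the two regions with the largest totals, in descending order.
top2 = totals_df.nlargest(2, "sales")

# Mean sales per region, rounded to two decimal places.
means = df.groupby("region")["sales"].mean().round(2)

print(top2)
print(means)
```

Calling reset_index() right after the aggregation is what makes the result usable as a normal table again, which is why the two often appear together.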
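
The map(), lambda and apply() rows can be tried out with something like the following sketch. The scores table is hypothetical, and the pass mark of 70 is an assumption made only for the example.

```python
import pandas as pd

# A lambda is an anonymous function; this one takes two parameters.
add = lambda x, y: x + y

# The built-in map() applies the function element-wise across the sequences.
# In Python 3 it returns a lazy iterator, so list() is used to collect values.
sums = list(map(add, [1, 2, 3], [10, 20, 30]))

# Hypothetical scores table, invented for this example.
df = pd.DataFrame({"math": [90, 60, 75], "art": [80, 70, 95]})

# Series.map() applies a function to each value of one column.
grades = df["math"].map(lambda score: "pass" if score >= 70 else "fail")

# DataFrame.apply() with a reducing function returns a Series
# (one value per column with the default axis=0).
col_max = df.apply(lambda col: col.max())

# With axis=1 the function receives one row at a time.
row_total = df.apply(lambda row: row["math"] + row["art"], axis=1)

print(sums)
print(grades.tolist())
print(col_max)
print(row_total.tolist())
```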
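
For the column-name and missing-value rows (columns, rename(), fillna(), dropna(), notnull()), a small sketch on a made-up table might look like this. The column names and the fill value 0 are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical table with missing values, invented for this example.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# Assigning to .columns replaces all column labels at once.
df.columns = ["height", "weight"]

# rename() changes only the listed columns; without inplace=True it
# returns a renamed copy and leaves df untouched.
renamed = df.rename(columns={"height": "height_cm"})

# fillna() replaces missing values; dropna() removes rows containing them.
filled = df.fillna(0)
complete_rows = df.dropna(axis=0)

# notnull() yields a boolean mask that can be used to filter rows.
has_height = df[pd.notnull(df["height"])]

print(renamed.columns.tolist())
print(filled)
print(complete_rows)
print(has_height)
```

None of these calls uses inplace=True, so each result is a new object and df itself keeps its missing values.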
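
The keep options of drop_duplicates() and the normalize option of value_counts() are easiest to see side by side. The sketch below uses a small invented table; the index values in the comments refer to that table only.

```python
import pandas as pd

# Hypothetical table with repeated rows, invented for this example.
df = pd.DataFrame({
    "city": ["x", "x", "y", "x", "z"],
    "year": [2017, 2017, 2018, 2018, 2018],
})

# Default keep='first': of the two identical rows 0 and 1, row 0 is kept.
deduped = df.drop_duplicates()

# Only the "city" column decides what counts as a duplicate;
# keep='last' keeps the last row seen for each city.
by_city_last = df.drop_duplicates(subset="city", keep="last")

# keep=False drops every row that has a duplicate, including the first.
no_repeats = df.drop_duplicates(keep=False)

# Absolute counts per value, sorted by count in descending order.
counts = df["city"].value_counts()

# Relative frequencies instead of counts.
shares = df["city"].value_counts(normalize=True)

print(deduped.index.tolist())
print(by_city_last.index.tolist())
print(no_repeats.index.tolist())
print(counts)
print(shares)
```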
