[Function] The Pandas Library

咸鱼鲸

Last modified: 2024-01-30 18:45:08

Tags: pandas, python

First published: 2024-01-27 18:48:52

Article link: https://blog.csdn.net/goldfisheye/article/details/135886652

Pandas (Python): Part 1

Contents

Pandas (Python): Part 1
1. The read_csv function
2. The describe() function

1. The read_csv function

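encoding specifies the character encoding used to decode the file. 'utf-8' is the usual choice, and 'ISO-8859-1' is sometimes needed; every example in this article uses encoding='gbk', a common encoding for Chinese text. The value must be a codec name that Python recognizes, so a label such as "non-unicode" will not work. If a Chinese file still fails to decode with 'gbk', its superset 'gb18030' is worth trying.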

import pandas as pd  # file_path below stands for the path to your CSV file
pd.read_csv(file_path, encoding='gbk')
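
To make the effect of encoding concrete, here is a minimal sketch. The sample bytes and column names are made up for illustration, and an in-memory buffer stands in for a real file on disk.

```python
import io
import pandas as pd

# Hypothetical sample data: a tiny CSV with Chinese headers, encoded as GBK bytes.
raw_bytes = "名称,发行日\n债券A,2024/01/27\n".encode("gbk")

# Decoding with the matching codec recovers the original text.
df_gbk = pd.read_csv(io.BytesIO(raw_bytes), encoding="gbk")
print(df_gbk.columns.tolist())  # ['名称', '发行日']

# Decoding the same bytes as UTF-8 is expected to fail,
# because these GBK byte sequences are not valid UTF-8.
try:
    pd.read_csv(io.BytesIO(raw_bytes), encoding="utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
print(utf8_failed)
```

A decoding error like this is usually the first sign that the encoding argument does not match how the file was saved.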

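usecols=[0, 1] reads only the listed columns (here the first and second). usecols also accepts column names, or a callable that receives each column name and returns True for the columns to keep.

# Keep only the columns whose name is exactly three characters long
pd.read_csv(file_path, encoding='gbk', usecols=lambda x: len(x) == 3)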
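
The two forms of usecols can be compared side by side. This is a small sketch on invented data; the column names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV with four columns of differing name lengths.
csv_text = "abc,de,xyz,long_name\n1,2,3,4\n5,6,7,8\n"

# Select by position: the first and second columns.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1])
print(by_position.columns.tolist())  # ['abc', 'de']

# Select by predicate: keep columns whose name has exactly three characters.
by_name_len = pd.read_csv(io.StringIO(csv_text), usecols=lambda name: len(name) == 3)
print(by_name_len.columns.tolist())  # ['abc', 'xyz']
```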

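skiprows=[0] skips the listed line numbers, counted from 0 at the top of the file. Note that line 0 is the header line, so with skiprows=[0] the next line is used as the column names.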

pd.read_csv(file_path, encoding='gbk', skiprows=[0])
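
Because skiprows counts physical lines, skipping line 0 and skipping line 1 behave quite differently. The sketch below uses invented data to show both.

```python
import io
import pandas as pd

# Hypothetical CSV: one header line followed by three data lines.
csv_text = "name,value\na,1\nb,2\nc,3\n"

# Skipping line 0 drops the header, so the line "a,1" becomes the column names.
skip_header = pd.read_csv(io.StringIO(csv_text), skiprows=[0])
print(skip_header.columns.tolist())  # ['a', '1']

# Skipping line 1 keeps the header and drops the first data row.
skip_first_data = pd.read_csv(io.StringIO(csv_text), skiprows=[1])
print(skip_first_data["name"].tolist())  # ['b', 'c']
```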

nrows=4 limits how many data rows are read (here the first 4; the header line is not counted).

pd.read_csv(file_path, encoding='gbk', nrows=4)
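
A quick check of nrows on invented data, assuming a ten-row file held in memory:

```python
import io
import pandas as pd

# Hypothetical CSV: a header plus ten data rows (id 0..9).
csv_text = "id,score\n" + "".join(f"{i},{i * 10}\n" for i in range(10))

# Only the first four data rows are parsed; the header does not count toward nrows.
head4 = pd.read_csv(io.StringIO(csv_text), nrows=4)
print(len(head4))            # 4
print(head4["id"].tolist())  # [0, 1, 2, 3]
```

This is handy for previewing a large file without loading all of it.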

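parse_dates=['date'] parses the named column (not a row) as dates.
date_parser supplies a function that converts each date string; in the example below it applies the format '%Y/%m/%d'. date_parser is deprecated since pandas 2.0, where date_format='%Y/%m/%d' is the replacement.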

from datetime import datetime
pd.read_csv(file_path, encoding='gbk', parse_dates=['发行日'], date_parser=lambda x:datetime.strptime(x, '%Y/%m/%d'))
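
As an alternative that does not depend on date_parser, the column can be read as text and converted afterwards with pd.to_datetime. This is a different technique from the call above, sketched on invented data; the column name 发行日 (issue date) is taken from the example, and the rows are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample rows using the same 'YYYY/MM/DD' layout as the example above.
csv_text = "名称,发行日\n债券A,2024/01/27\n债券B,2024/01/30\n"

# Read the column as plain text first, then convert it with an explicit format.
df = pd.read_csv(io.StringIO(csv_text))
df["发行日"] = pd.to_datetime(df["发行日"], format="%Y/%m/%d")

print(pd.api.types.is_datetime64_any_dtype(df["发行日"]))  # True
print(df["发行日"].dt.day.tolist())  # [27, 30]
```

Converting after the read keeps the parsing step explicit and makes a format mismatch easy to spot.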

2. The describe() function

DataFrame.describe(percentiles=None, include=None, exclude=None)
# Returns: Series or DataFrame. Summary statistics of the Series or DataFrame provided.

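include='all' computes statistics for every column; without it, only numeric columns are summarized.
percentiles sets which percentiles are reported for numeric columns. Values must lie between 0 and 1, and the default [.25, .5, .75] returns the 25th, 50th, and 75th percentiles. It can be changed, for example: df.describe(percentiles=[.10, .75, .8]).
exclude lists data types (not column names) to leave out of the result, for example exclude=[np.number] to drop the numeric columns.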
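
The three parameters can be tried on a small frame. This is a sketch only; the frame and its column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric column and one text column.
df = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "city": ["a", "b", "a", "c", "a"],
})

# Default: only the numeric column is summarized.
default_stats = df.describe()
print(default_stats.columns.tolist())  # ['price']

# include='all': both columns appear; statistics that do not apply are NaN.
all_stats = df.describe(include="all")
print(all_stats.columns.tolist())  # ['price', 'city']

# Custom percentiles show up as extra rows labelled '10%' and '80%'.
custom = df.describe(percentiles=[0.1, 0.8])
print("10%" in custom.index, "80%" in custom.index)  # True True

# exclude takes dtypes: dropping numeric types leaves only the text column.
no_numbers = df.describe(exclude=[np.number])
print(no_numbers.columns.tolist())  # ['city']
```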

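Output rows:
count: number of non-missing values in the column
unique: number of distinct values (reported for non-numeric columns)
std: standard deviation
min: minimum
25%: first quartile
50%: median (second quartile)
75%: third quartile
max: maximum
mean: arithmetic mean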
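
The rows listed above can be inspected directly on a Series. The values below are invented; the sketch mainly shows that count ignores missing values and that unique appears for text data.

```python
import pandas as pd

# Hypothetical numeric series with one missing value.
numeric_stats = pd.Series([1.0, 2.0, None, 4.0]).describe()
print(numeric_stats["count"])  # 3.0, the missing value is not counted
print(numeric_stats["min"], numeric_stats["50%"], numeric_stats["max"])  # 1.0 2.0 4.0

# Hypothetical text series: describe() reports count, unique, top, and freq instead.
text_stats = pd.Series(["x", "y", "x"]).describe()
print(text_stats["unique"])  # 2
print(text_stats["top"], text_stats["freq"])  # x 2
```

Note that std here is the sample standard deviation (pandas uses ddof=1 by default).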
