Learn pandas in Ten Minutes (10 Minutes to pandas)


Watch_dou



Category: Python    Tags: pandas

Article link: https://blog.csdn.net/u012111465/article/details/77803102


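This article is based on the official pandas document 10 Minutes to pandas and covers the basics of pandas in detail: creating Series and DataFrame objects, viewing data, selection, handling missing values, statistics and other operations, merging, grouping, reshaping, time series, the Categorical type, plotting, and importing/exporting data. pandas is a powerful data-analysis tool with a rich set of features for processing and analyzing data.

Summary generated automatically by CSDN.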

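10 Minutes to pandas on the official pandas website explains the basics in a simple, accessible way; this article builds on it with some additional material. For more detail, see the Cookbook. pandas is a very powerful data-analysis package: it is built on NumPy and provides higher-level data structures and tools for data analysis. Just as NumPy is centered on the ndarray, pandas is organized around two core data structures, Series and DataFrame, which correspond to a one-dimensional sequence and a two-dimensional table, respectively.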
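To make the "one-dimensional vs. two-dimensional" distinction concrete, here is a minimal sketch; the labels and values are arbitrary illustrations, not taken from the official tutorial. It checks `ndim` and `shape`, and shows that selecting one column of a DataFrame gives back a Series.

```python
import pandas as pd

# A Series is one-dimensional: a sequence of values with a label index.
s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# A DataFrame is two-dimensional: rows x columns.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)

# Each column of a DataFrame is itself a Series.
print(isinstance(df["x"], pd.Series))  # True
```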

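1. Object Creation
2. Viewing Data
3. Selection
4. Handling Missing Values
5. Operations
6. Merging
7. Grouping
8. Reshaping
9. Time Series
10. Categorical
11. Visualization
12. Importing and Exporting Data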

By convention, pandas and its companion libraries are imported as follows:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: import matplotlib.pyplot as plt

1. Object Creation

This section looks at several kinds of objects in detail:

1.1 Series

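A Series can be thought of as a fixed-length, ordered dict: it is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). A Series object has two main attributes, index and values. The basic way to create a Series is to call: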

In [1]: s = pd.Series(data, index=index)
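Because a Series behaves much like a fixed-length ordered dict, a short sketch of that dict-like behaviour may help. The labels and values below are arbitrary examples, not from the original article. Note that `in` tests the index labels, not the values.

```python
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0], index=["a", "b", "c"])

# The two main attributes: the labels and the underlying values.
print(list(s.index))  # ['a', 'b', 'c']
print(s.values)       # [0. 1. 2.]

# Dict-like behaviour: look up by label and test label membership.
print(s["b"])          # 1.0
print("c" in s)        # True  (membership checks the index)
print(1.0 in s)        # False (1.0 is a value, not a label)
print(s.get("z", -1))  # -1    (missing label -> default, like dict.get)
```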

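Here, data can be many different things:

    a Python dict
    an ndarray
    a scalar value (e.g. 5)

The passed index is a list of axis labels. So, depending on what data is, there are several cases: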

From ndarray

If data is an ndarray, the index must be the same length as data. If no index is passed, one will be created with values [0, ..., len(data) - 1].
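Both rules can be checked directly. In this sketch the three-element array is an arbitrary example; the default index comes out as a `RangeIndex`, and an index of the wrong length is rejected with a `ValueError` (the exact message text varies between pandas versions, so it is only printed, not relied on).

```python
import numpy as np
import pandas as pd

data = np.array([1.0, 2.0, 3.0])

# No index passed: pandas builds a default index 0 .. len(data) - 1.
s = pd.Series(data)
print(list(s.index))  # [0, 1, 2]

# An explicit index must have the same length as the ndarray.
raised = False
try:
    pd.Series(data, index=["a", "b"])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```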

1. dict

In [7]: d = {'a' : 0., 'b' : 1., 'c' : 2.}

In [8]: pd.Series(d)
Out[8]: 
a    0.0
b    1.0
c    2.0
dtype: float64

In [9]: pd.Series(d, index=['b', 'c', 'd', 'a'])
Out[9]: 
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64

2. ndarray

# With an explicit index

In [5]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

In [6]: s
Out[6]: 
a    0.2941
b    0.2869
c    1.7098
d   -0.2126
e    0.2696
dtype: float64

In [7]: s.index
Out[7]: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

# Default case: no index specified

In [8]: pd.Series(np.random.randn(5))
Out[8]: 
0   -0.4531
1   -1.8215
2   -0.1263
3   -0.1533
4    0.4055
dtype: float64

3. scalar value

In [10]: pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
Out[10]: 
a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64

4. Accessing elements

In [8]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

In [9]: s.iloc[0]
Out[9]: -0.054946649009729349

In [10]: s[:3]
Out[10]:
a   -0.054947
b   -0.662185
c    0.932696
dtype: float64

In [11]: s[s > s.median()]
Out[11]:
a   -0.054947
c    0.932696
dtype: float64
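Plain `s[...]` mixes label-based and position-based lookup, which is why recent pandas versions warn when an integer key is used as a position on a label index. A sketch of the explicit accessors follows; it swaps the random data for `np.arange(5.0)` so the results are predictable. Note that label slices with `.loc` include the end label, unlike positional slices.

```python
import numpy as np
import pandas as pd

# Deterministic values 0.0 .. 4.0 instead of random numbers.
s = pd.Series(np.arange(5.0), index=["a", "b", "c", "d", "e"])

print(s.iloc[0])                   # 0.0 (by position)
print(s.loc["a"])                  # 0.0 (by label)
print(s.iloc[:3].tolist())         # [0.0, 1.0, 2.0] (end position excluded)
print(s.loc["b":"d"].tolist())     # [1.0, 2.0, 3.0] (end label included)
print(s[s > s.median()].tolist())  # [3.0, 4.0] (boolean mask)
```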


The name attribute, and how to change it
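The original illustration for this part is lost, so here is a minimal sketch of the `name` attribute; the names "something" and "different" are arbitrary. Passing a scalar to `Series.rename` returns a new Series with the new name and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(3.0), name="something")
print(s.name)  # something

# rename() with a scalar returns a new Series carrying a different name.
s2 = s.rename("different")
print(s2.name)  # different
print(s.name)   # something (the original is not modified)
```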

1.2 DataFrame

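A DataFrame is a two-dimensional labeled data structure. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input: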

    Dict of 1D ndarrays, lists, dicts, or Series
    2-D numpy.ndarray
    Structured or record ndarray
    A Series
    Another DataFrame

1. From dict of Series or dicts

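The original figure for this case is missing; the sketch below shows the usual behaviour with arbitrary example data. The row index becomes the union of the Series indexes, with NaN wherever a Series has no value. Passing `index` and `columns` selects and orders rows and columns, and a requested column with no data ("three" here) is filled with NaN. A dict of dicts behaves the same way, with inner keys becoming row labels.

```python
import pandas as pd

d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

# Row index = union of the Series indexes; missing entries become NaN.
df = pd.DataFrame(d)
print(df)

# index/columns pick (and order) the rows and columns to keep.
df2 = pd.DataFrame(d, index=["d", "b", "a"], columns=["two", "three"])
print(df2)

# Dict of dicts: inner keys become row labels.
df3 = pd.DataFrame({"one": {"a": 1.0, "b": 2.0}, "two": {"a": 3.0}})
print(df3)
```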

2. From dict of ndarrays / lists

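For a dict of ndarrays or lists, every array must have the same length; if an index is passed, it must match that length too, otherwise the default index 0 .. n - 1 is used. A sketch with arbitrary values:

```python
import pandas as pd

d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}

df = pd.DataFrame(d)                                      # default index 0..3
df_labeled = pd.DataFrame(d, index=["a", "b", "c", "d"])  # explicit index, same length
print(df_labeled)

# Lists of unequal length are rejected.
raised = False
try:
    pd.DataFrame({"one": [1, 2, 3], "two": [1, 2]})
except ValueError as err:
    raised = True
    print("ValueError:", err)
```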

3. From structured or record array

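A structured (record) ndarray is handled much like a dict of arrays: each named field becomes a column. The sketch below uses a small two-row array. The field names and the `"U10"` string type are illustrative choices; a `"U10"` field gives plain Python strings rather than bytes. The `columns` argument can reorder the fields.

```python
import numpy as np
import pandas as pd

# Two records with an int, a float and a string field.
data = np.zeros((2,), dtype=[("A", "i4"), ("B", "f4"), ("C", "U10")])
data[:] = [(1, 2.0, "Hello"), (2, 3.0, "World")]

df = pd.DataFrame(data)
print(df)

print(pd.DataFrame(data, index=["first", "second"]))

# Reorder the fields via columns.
df_reordered = pd.DataFrame(data, columns=["C", "A", "B"])
print(df_reordered)
```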

4. From a list of dicts

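With a list of dicts, each dict becomes one row and the keys become column labels. A key that appears in only some of the dicts produces NaN in the other rows. A sketch with arbitrary data:

```python
import pandas as pd

data2 = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]

# Each dict is a row; the first dict has no "c", so that cell is NaN.
df = pd.DataFrame(data2)
print(df)

# Row labels and a column subset can be supplied as usual.
print(pd.DataFrame(data2, index=["first", "second"]))
df_ab = pd.DataFrame(data2, columns=["a", "b"])
print(df_ab)
```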

5. From a dict of tuples

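When the keys of a dict (and of its inner dicts) are tuples, pandas builds a MultiIndex on the columns and on the rows. The sketch below uses made-up labels. Cells with no entry in the corresponding inner dict come out as NaN.

```python
import pandas as pd

df = pd.DataFrame(
    {
        ("a", "b"): {("A", "B"): 1, ("A", "C"): 2},
        ("a", "a"): {("A", "C"): 3, ("A", "B"): 4},
        ("a", "c"): {("A", "B"): 5, ("A", "C"): 6},
        ("b", "a"): {("A", "C"): 7, ("A", "B"): 8},
        ("b", "b"): {("A", "D"): 9, ("A", "B"): 10},
    }
)
print(df)

# Both axes are two-level MultiIndexes.
print(df.columns.nlevels, df.index.nlevels)  # 2 2

# Full tuple keys select a single cell.
print(df.loc[("A", "D"), ("b", "b")])  # 9.0
```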

2. Viewing Data

1. View the first and last rows of the frame

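A minimal sketch of `head` and `tail`. The six-day date range, random values and column names A to D are arbitrary choices for illustration. `head()` returns the first 5 rows by default, and `tail(n)` returns the last n.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

print(df.head())   # first 5 rows (the default)
print(df.tail(3))  # last 3 rows
```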

2. Display the index, columns, and the underlying NumPy data
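A sketch using the same kind of arbitrary six-row frame. `df.index` and `df.columns` give the row and column labels. `df.to_numpy()` (available since pandas 0.24; older code often uses `df.values`) returns the data as a NumPy ndarray.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

print(df.index)    # DatetimeIndex holding the six dates
print(df.columns)  # the column labels A, B, C, D

# The underlying data as a NumPy array.
arr = df.to_numpy()
print(arr.shape)   # (6, 4)
```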
