Learning Pandas - CSDN Blog

Original sources (follow the links to read them):

Pandas, Intro to Data Structures
http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dsintro
Pandas cheat sheet (in Chinese), Zhihu
https://zhuanlan.zhihu.com/p/25630700

First, don't forget the imports:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Data Structures

The assign function

In [69]: iris = pd.read_csv('data/iris.data')

In [70]: iris.head()
Out[70]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa

In [71]: (iris.assign(sepal_ratio = iris['SepalWidth'] / iris['SepalLength'])
   ....:      .head())
   ....: 
Out[71]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  sepal_ratio
0          5.1         3.5          1.4         0.2  Iris-setosa       0.6863
1          4.9         3.0          1.4         0.2  Iris-setosa       0.6122
2          4.7         3.2          1.3         0.2  Iris-setosa       0.6809
3          4.6         3.1          1.5         0.2  Iris-setosa       0.6739
4          5.0         3.6          1.4         0.2  Iris-setosa       0.7200
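Two more properties of assign, shown in a minimal sketch: it returns a new DataFrame and leaves the original untouched, and it also accepts a callable, which is evaluated on the DataFrame it is called on (handy in a method chain). The small iris frame is a hypothetical stand-in built from the first rows printed above, so the sketch runs without data/iris.data.

```python
import pandas as pd

# Hypothetical stand-in for the first three rows of data/iris.data
iris = pd.DataFrame({
    'SepalLength': [5.1, 4.9, 4.7],
    'SepalWidth': [3.5, 3.0, 3.2],
    'PetalLength': [1.4, 1.4, 1.3],
    'PetalWidth': [0.2, 0.2, 0.2],
    'Name': ['Iris-setosa'] * 3,
})

# assign returns a new DataFrame; iris itself is not modified
with_ratio = iris.assign(sepal_ratio=iris['SepalWidth'] / iris['SepalLength'])

# A callable receives the frame it is called on (here the filtered one),
# so it works inside a chain without naming the intermediate result
long_sepals = (iris[iris['SepalLength'] > 4.8]
               .assign(sepal_ratio=lambda x: x['SepalWidth'] / x['SepalLength']))

print(with_ratio.round(4))
print(long_sepals)
```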

Order of computation in assign

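Warning (from older pandas documentation): Since the function signature of assign is **kwargs, a dictionary, the order of the new columns in the resulting DataFrame cannot be guaranteed to match the order you pass in. To make things predictable, items are inserted alphabetically (by key) at the end of the DataFrame. All expressions are computed first and then assigned, so you can't refer to another column being assigned in the same call to assign.
Since pandas 0.23 (on Python 3.6+), keyword order is preserved: columns are added in the order you pass them, and each item is computed and assigned in turn, so a later expression may refer to a column created earlier in the same call. On older versions, split the work into two assign calls. For example: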

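df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# pandas < 0.23: don't do this, `C` does not exist yet when `D` is computed
# (this works on pandas >= 0.23, where items are assigned in order)
df.assign(C = lambda x: x['A'] + x['B'],
          D = lambda x: x['A'] + x['C'])

# Instead, break it into two assigns (works on every version)
(df.assign(C = lambda x: x['A'] + x['B'])
   .assign(D = lambda x: x['A'] + x['C']))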
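A sketch comparing the single-call and two-call forms. It assumes pandas 0.23 or later on Python 3.6+, where assign keeps keyword order and assigns each item before computing the next; on older versions the single call fails because 'C' does not exist yet.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# Single call: D refers to C, created earlier in the same call
# (assumes pandas >= 0.23, where kwargs are assigned in order)
one_call = df.assign(C=lambda x: x['A'] + x['B'],
                     D=lambda x: x['A'] + x['C'])

# Two chained calls: works on every pandas version
two_calls = (df.assign(C=lambda x: x['A'] + x['B'])
               .assign(D=lambda x: x['A'] + x['C']))

print(one_call)
print(one_call.equals(two_calls))
```

Both produce the columns A, B, C, D in that order, with C = A + B and D = A + C.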

Selection and slicing

Operation                       Syntax          Result
Select column                   df[col]         Series
Select row by label             df.loc[label]   Series
Select row by integer location  df.iloc[loc]    Series
Slice rows                      df[5:10]        DataFrame
Select rows by boolean vector   df[bool_vec]    DataFrame
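The operations in the table can be tried on a small frame. This sketch uses made-up data (12 rows labelled 'r0' to 'r11', columns A and B) so that label-based and position-based selection can be compared.

```python
import numpy as np
import pandas as pd

# Made-up frame: row i holds [2*i, 2*i + 1], labelled 'r0' .. 'r11'
df = pd.DataFrame(np.arange(24).reshape(12, 2),
                  index=[f'r{i}' for i in range(12)],
                  columns=['A', 'B'])

col = df['A']                # select column -> Series
by_label = df.loc['r3']      # select row by label -> Series
by_position = df.iloc[3]     # select row by integer location -> Series
sliced = df[5:10]            # slice rows by position -> DataFrame (rows 5..9)
filtered = df[df['A'] > 15]  # select rows by boolean vector -> DataFrame

print(type(col).__name__, type(sliced).__name__)
print(filtered)
```

Note that df[5:10] follows Python slicing, so the end position 10 is excluded.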

Transposing (swapping rows and columns)

To transpose, access the T attribute (also the transpose function), similar to an ndarray:

# only show the first 5 rows
In [95]: df[:5].T
Out[95]: 
   2000-01-01  2000-01-02  2000-01-03  2000-01-04  2000-01-05
A     -0.0817     -0.5056     -0.0259      0.0492      1.2432
B      1.3905      0.0213      0.8407      0.4879     -0.6222
C     -1.9620     -0.3171      1.4135      0.4263     -0.5386
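A sketch that rebuilds a frame shaped like the one above (a daily date index and columns A, B, C); the values are random, so only the shapes and labels matter. It shows that .T and .transpose() give the same result and swap the index with the columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 8 daily rows starting 2000-01-01, columns A, B, C
dates = pd.date_range('2000-01-01', periods=8)
df = pd.DataFrame(np.random.randn(8, 3), index=dates, columns=['A', 'B', 'C'])

t_attr = df[:5].T              # attribute form
t_method = df[:5].transpose()  # method form, same result

print(df[:5].shape, '->', t_attr.shape)
print(t_attr.index.tolist())
```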

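pandas works with NumPy's elementwise functions (ufuncs such as np.exp and np.sqrt) and supports matrix multiplication through the dot method.
pandas also lets you configure how data is displayed in the console.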
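A short sketch of the points above on made-up data: an elementwise NumPy ufunc applied to a DataFrame, matrix multiplication with dot, and console display options. pd.option_context changes the options only inside the with block; pd.set_option would change them globally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0], 'B': [4.0, 9.0]})

# Elementwise ufunc: the result keeps df's index and columns
roots = np.sqrt(df)

# Matrix multiplication: df.columns must line up with the other operand's index
gram = df.dot(df.T)  # 2 x 2 result, indexed by df.index on both axes

# Temporary display settings, restored when the block ends
with pd.option_context('display.precision', 3, 'display.max_rows', 20):
    print(roots)
    print(gram)
```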

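Containers for three- or higher-dimensional data (the old Panel type) probably aren't needed for now; Panel has since been removed from pandas (in 0.25), and a DataFrame with a MultiIndex is the usual replacement.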

The important part is here.

I'll fill this in once I actually need it.
Visualization (data visualization)

Original article:
http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization

Basic plotting function
plot()
If the index consists of dates, plot() calls gcf().autofmt_xdate() to try to format the date labels on the x-axis nicely.
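A minimal sketch along the lines of the docs' first example: a Series with a date index, turned into a random walk with cumsum() and drawn with plot(). It assumes the non-interactive Agg backend so it also runs without a display; the output file name ts.png is arbitrary.

```python
import matplotlib
matplotlib.use('Agg')  # render to files only; no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random walk indexed by 1000 consecutive days
ts = pd.Series(np.random.randn(1000),
               index=pd.date_range('2000-01-01', periods=1000))
ts = ts.cumsum()

fig, ax = plt.subplots()
ts.plot(ax=ax)  # date index, so the x-axis gets date tick labels
fig.savefig('ts.png')
```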

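The example data in the docs accumulate step by step (a running total, e.g. ts = ts.cumsum()), so the plots show the result of a cumulative sum, as computed by NumPy's cumsum() function: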

>>> a = np.array([[1,2,3], [4,5,6]])
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> np.cumsum(a)
array([ 1,  3,  6, 10, 15, 21])
>>> np.cumsum(a, dtype=float)     # specifies type of output value(s)
array([  1.,   3.,   6.,  10.,  15.,  21.])
>>> np.cumsum(a,axis=0)      # sum over rows for each of the 3 columns
array([[1, 2, 3],
       [5, 7, 9]])
>>> np.cumsum(a,axis=1)      # sum over columns for each of the 2 rows
array([[ 1,  3,  6],
       [ 4,  9, 15]])
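pandas has its own cumsum() method. One difference worth knowing, shown in this sketch with the same numbers as above: DataFrame.cumsum() defaults to axis=0 (down each column), whereas np.cumsum without an axis flattens the array first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['x', 'y', 'z'])

down = df.cumsum()             # default axis=0: running total down each column
across = df.cumsum(axis=1)     # running total along each row
flat = np.cumsum(df.values)    # NumPy with no axis: flattened running total

print(down.values.tolist())    # [[1, 2, 3], [5, 7, 9]]
print(across.values.tolist())  # [[1, 3, 6], [4, 9, 15]]
print(flat.tolist())           # [1, 3, 6, 10, 15, 21]
```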

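# kind: 'bar' vertical bars, 'barh' horizontal bars, 'kde' density estimate (needs SciPy); 'scatter', 'hist', 'area', 'pie' etc. are also available
# x, y: which columns to put on the x and y axes (DataFrame only); color: e.g. 'r' for red, 'g' for green
# stacked=True stacks the columns on top of each other (bar / barh / area)
# cumulative=True draws a cumulative histogram (passed through to matplotlib with kind='hist')
df.plot(kind='bar', x='col_x', y='col_y', color='r', stacked=False)  # 'col_x', 'col_y': placeholder column names
df.plot(kind='hist', cumulative=True)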
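A runnable sketch of the parameters listed above, on made-up data with hypothetical column names a and b. It assumes the Agg backend; kde is left out because it additionally needs SciPy. Each plot gets its own axes via ax=, since Series.plot otherwise draws on the current axes.

```python
import matplotlib
matplotlib.use('Agg')  # no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: two columns, three labelled rows
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=['x', 'y', 'z'])
samples = pd.Series(np.random.randn(500))

fig, axes = plt.subplots(2, 2)
# Stacked vertical bars, one colour per column
df.plot(kind='bar', stacked=True, color=['r', 'g'], ax=axes[0, 0])
# Horizontal bars, side by side
df.plot(kind='barh', ax=axes[0, 1])
# Cumulative histogram: the last bar's height equals the number of samples
samples.plot(kind='hist', bins=20, cumulative=True, ax=axes[1, 0])
# Scatter plot: x and y name the columns to plot
df.plot(kind='scatter', x='a', y='b', ax=axes[1, 1])
fig.savefig('kinds.png')
```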