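1 Introduction to Pandas
- An open-source Python library designed for data mining and analysis
- Built on NumPy, so it inherits NumPy's performance advantages in numerical computation
- Builds on Matplotlib, which makes plotting straightforward
- Provides its own distinctive data structures
Pandas is a data processing tool. Its name combines three words:
panel + data + analysis
panel: "panel data" is a term common in econometrics for observations of multiple entities over time, generally represented as three-dimensional data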
2 Why use Pandas
- Convenient data processing
- Easy file reading
- Wraps Matplotlib's plotting and NumPy's computation
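As a small illustration of the points above, the sketch below reads tabular data and computes a new column in one line. The CSV content, the column names, and the variable names are made up for this example; an in-memory string stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file path.
csv_text = """stock,open,close
A,10.0,10.5
B,20.0,19.0
C,5.0,5.5
"""

# read_csv accepts a path or any file-like object.
prices = pd.read_csv(io.StringIO(csv_text), index_col="stock")

# Column arithmetic is vectorized; the work is delegated to NumPy.
prices["change"] = prices["close"] - prices["open"]
print(prices)
```

The same `prices` object also exposes a `plot` method that calls Matplotlib, provided Matplotlib is installed.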
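3 DataFrame
As the figure shows, a DataFrame consists of three parts:
- column labels
- index labels (the row labels)
- data
What does the axis marker mean? An axis takes one of two values, 0 or 1. Axis 1 is the horizontal axis, running from left to right across the columns; axis 0 is the vertical axis, running from top to bottom down the rows.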
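To make the two axis values concrete, here is a minimal sketch on a tiny hand-made frame. The labels `r0`, `r1`, `a`, `b`, `c` and the variable names are assumptions for this example only.

```python
import numpy as np
import pandas as pd

# A 2x3 frame holding the integers 0..5.
frame = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["r0", "r1"],
    columns=["a", "b", "c"],
)

# axis=0 moves top to bottom: rows are collapsed, one result per column.
col_sums = frame.sum(axis=0)

# axis=1 moves left to right: columns are collapsed, one result per row.
row_sums = frame.sum(axis=1)

print(col_sums)
print(row_sums)
```

Note that the axis names the direction of travel, so `axis=0` yields a result labelled by columns and `axis=1` a result labelled by rows.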
import numpy as np
# Create normally distributed daily price changes for 10 stocks over 5 days
stock_change = np.random.normal(0, 1, (10, 5))
stock_change
array([[ 0.55780125, 0.47366431, 0.58266456, -0.29946146, -0.03390217],
[-0.24385523, 0.08817049, 1.38707642, -0.57688673, -0.34760394],
[ 0.95549368, 0.9414475 , -1.25056314, 0.18178455, -0.29557978],
[ 1.61507705, 1.99202826, 2.80758189, 0.03192688, -0.57838353],
[ 1.4956878 , -1.23262134, 2.50024192, -0.58850329, 0.7102027 ],
[-0.662319 , 1.76285879, 1.51286286, -0.53192944, -0.47949495],
[ 0.73735599, 0.48964047, -1.32854508, -0.07826431, 0.36766669],
[ 0.79199457, -1.74662017, -0.334844 , -1.47935611, -0.12609656],
[-1.77942406, -1.67284383, -0.90279781, 0.06015451, 0.66952752],
[-1.93908274, -1.93232172, 0.8559445 , 1.13113002, 1.33307564]])
import pandas as pd
pd.DataFrame(stock_change)
| | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 0.557801 | 0.473664 | 0.582665 | -0.299461 | -0.033902 |
1 | -0.243855 | 0.088170 | 1.387076 | -0.576887 | -0.347604 |
2 | 0.955494 | 0.941448 | -1.250563 | 0.181785 | -0.295580 |
3 | 1.615077 | 1.992028 | 2.807582 | 0.031927 | -0.578384 |
4 | 1.495688 | -1.232621 | 2.500242 | -0.588503 | 0.710203 |
5 | -0.662319 | 1.762859 | 1.512863 | -0.531929 | -0.479495 |
6 | 0.737356 | 0.489640 | -1.328545 | -0.078264 | 0.367667 |
7 | 0.791995 | -1.746620 | -0.334844 | -1.479356 | -0.126097 |
8 | -1.779424 | -1.672844 | -0.902798 | 0.060155 | 0.669528 |
9 | -1.939083 | -1.932322 | 0.855944 | 1.131130 | 1.333076 |
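Add row and column labels to the stock data
pd.DataFrame(stock_change, index=<row labels>, columns=<column labels>)
# Add row labels with a list comprehension (股票 means "stock")
stock = ["股票{}".format(i) for i in range(10)]
pd.DataFrame(stock_change, index=stock)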
| | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
股票0 | 0.557801 | 0.473664 | 0.582665 | -0.299461 | -0.033902 |
股票1 | -0.243855 | 0.088170 | 1.387076 | -0.576887 | -0.347604 |
股票2 | 0.955494 | 0.941448 | -1.250563 | 0.181785 | -0.295580 |
股票3 | 1.615077 | 1.992028 | 2.807582 | 0.031927 | -0.578384 |
股票4 | 1.495688 | -1.232621 | 2.500242 | -0.588503 | 0.710203 |
股票5 | -0.662319 | 1.762859 | 1.512863 | -0.531929 | -0.479495 |
股票6 | 0.737356 | 0.489640 | -1.328545 | -0.078264 | 0.367667 |
股票7 | 0.791995 | -1.746620 | -0.334844 | -1.479356 | -0.126097 |
股票8 | -1.779424 | -1.672844 | -0.902798 | 0.060155 | 0.669528 |
股票9 | -1.939083 | -1.932322 | 0.855944 | 1.131130 | 1.333076 |
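# Add column labels (row labels are passed first, then column labels)
date = pd.date_range(start="20180101", periods=5, freq="B")
data = pd.DataFrame(stock_change, index=stock, columns=date)
Structure: a two-dimensional array that has both a row index and a column index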
| | 2018-01-01 00:00:00 | 2018-01-02 00:00:00 | 2018-01-03 00:00:00 | 2018-01-04 00:00:00 | 2018-01-05 00:00:00 |
---|---|---|---|---|---|
股票0 | 0.557801 | 0.473664 | 0.582665 | -0.299461 | -0.033902 |
股票1 | -0.243855 | 0.088170 | 1.387076 | -0.576887 | -0.347604 |
股票2 | 0.955494 | 0.941448 | -1.250563 | 0.181785 | -0.295580 |
股票3 | 1.615077 | 1.992028 | 2.807582 | 0.031927 | -0.578384 |
股票4 | 1.495688 | -1.232621 | 2.500242 | -0.588503 | 0.710203 |
股票5 | -0.662319 | 1.762859 | 1.512863 | -0.531929 | -0.479495 |
股票6 | 0.737356 | 0.489640 | -1.328545 | -0.078264 | 0.367667 |
股票7 | 0.791995 | -1.746620 | -0.334844 | -1.479356 | -0.126097 |
股票8 | -1.779424 | -1.672844 | -0.902798 | 0.060155 | 0.669528 |
股票9 | -1.939083 | -1.932322 | 0.855944 | 1.131130 | 1.333076 |
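The column labels above come from `pd.date_range` with `freq="B"`, which generates business days only. The sketch below extends the range to seven periods so that the weekend gap becomes visible; the variable names `business` and `calendar` are chosen for this example.

```python
import pandas as pd

# 2018-01-01 was a Monday, so the first five business days are Mon-Fri.
business = pd.date_range(start="20180101", periods=7, freq="B")

# For comparison, freq="D" counts every calendar day, weekends included.
calendar = pd.date_range(start="20180101", periods=7, freq="D")

print(business.strftime("%Y-%m-%d").tolist())
print(calendar.strftime("%Y-%m-%d").tolist())
```

With `freq="B"` the sixth and seventh dates skip Saturday and Sunday, which suits trading data where no prices exist on weekends. Exchange holidays are not excluded by this frequency.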
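3.1 Attributes
- shape: the dimensions as (rows, columns)
- index: the row labels
- columns: the column labels
- values: the underlying data as a NumPy ndarray
- T: the transpose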
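As a self-contained sketch of the five attributes listed above, the snippet below uses a small hand-made frame rather than the random stock data, so the results are reproducible. The frame `small` and its labels are assumptions for this example.

```python
import numpy as np
import pandas as pd

small = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["r0", "r1"],
    columns=["a", "b", "c"],
)

print(small.shape)         # tuple of (rows, columns)
print(small.index)         # row labels
print(small.columns)       # column labels
print(type(small.values))  # the data as a NumPy ndarray

# Transposing swaps the two label sets: columns become the row index.
transposed = small.T
print(transposed.shape)
print(transposed.index)
```

All five are attributes rather than methods, so they are accessed without parentheses.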
data.shape
(10, 5)
data.index
Index(['股票0', '股票1', '股票2', '股票3', '股票4', '股票5', '股票6', '股票7', '股票8', '股票9'], dtype='object')
data.values
array([[ 0.55780125, 0.47366431, 0.58266456, -0.29946146, -0.03390217],
[-0.24385523, 0.08817049, 1.38707642, -0.57688673, -0.34760394],
[ 0.95549368, 0.9414475 , -1.25056314, 0.18178455, -0.29557978],
[ 1.61507705, 1.99202826, 2.80758189, 0.03192688, -0.57838353],
[ 1.4956878 , -1.23262134, 2.50024192, -0.58850329, 0.7102027 ],
[-0.662319 , 1.76285879, 1.51286286, -0.53192944, -0.47949495],
[ 0.73735599, 0.48964047, -1.32854508, -0.07826431, 0.36766669],
[ 0.79199457, -1.74662017, -0.334844 , -1.47935611, -0.12609656],
[-1.77942406, -1.67284383, -0.90279781, 0