Python: Using Pandas

尔玉先生

Article link: https://blog.csdn.net/weixin_44330955/article/details/107800825

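1. Introduction to Pandas

What is Pandas

A data-processing tool
An open-source Python library for data analysis and data mining
Built on NumPy, so it benefits from NumPy's fast array computation
Plotting support built on Matplotlib, so charts are easy to draw
Its own labeled data structures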

Why use Pandas

Convenient data handling
Easy file reading
Wraps Matplotlib plotting and NumPy computation

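The three core Pandas data structures

DataFrame: a two-dimensional array with row and column indexes
Panel: a structure for three-dimensional data (deprecated in pandas 0.20 and removed in 0.25; a DataFrame with a MultiIndex is the usual replacement)
Series: a one-dimensional array with an index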
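
To make the relationship between these structures concrete, here is a minimal sketch. The labels and numbers are invented for illustration and are not from the article. Selecting one column of a DataFrame yields a Series that shares the DataFrame's row index.

```python
import pandas as pd

# Illustrative data only; the labels and numbers are made up.
df = pd.DataFrame(
    {"day1": [0.1, -0.2, 0.3], "day2": [1.0, 0.5, -0.7]},
    index=["stock_a", "stock_b", "stock_c"],
)

# Selecting a single column returns a one-dimensional Series.
col = df["day1"]
print(type(col).__name__)          # Series
print(col.index.equals(df.index))  # True: same row index
print(df.shape, col.shape)         # (3, 2) (3,)
```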

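Introduction to DataFrame

Structure: a DataFrame can be viewed as a two-dimensional array that has both a row index and a column index.

Attributes: shape, index (row index), columns (column index), values (the underlying data), T (transpose)

Methods: head() shows the first rows together with their index (5 rows by default); tail() shows the last rows (5 rows by default)

Note: a DataFrame index can only be replaced as a whole; a single label cannot be modified in place. A DataFrame can also be built from a dictionary, in which case the keys become the column labels and a default row index is generated automatically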
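
The note above can be checked with a short sketch. The column names and values are hypothetical, chosen only for illustration: building a DataFrame from a dictionary, then trying to change one index label versus replacing the whole index.

```python
import pandas as pd

# Hypothetical prices; dictionary keys become the column labels.
df = pd.DataFrame({"open": [1.0, 2.0], "close": [1.5, 1.8]})
print(list(df.columns))  # ['open', 'close']
print(list(df.index))    # [0, 1] (default integer index)

# Assigning to one position of an index is rejected ...
try:
    df.index[0] = "stock_a"
    changed_in_place = True
except TypeError:
    changed_in_place = False

# ... but replacing the whole index works.
df.index = ["stock_a", "stock_b"]
print(changed_in_place, list(df.index))
```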

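import numpy as np
import pandas as pd

# Simulate price changes for 10 stocks over 5 days, drawn from a standard normal distribution
data = np.random.normal(0, 1, (10, 5))
# Wrap the array in a DataFrame with default integer indexes
data1 = pd.DataFrame(data)
# Row labels ("股票N" means "stock N")
data_index = ["股票{}".format(i + 1) for i in range(10)]
# Column labels ("第N天" means "day N")
data_columns = ["第{}天".format(i + 1) for i in range(5)]
# Build a DataFrame that uses these row and column labels
data2 = pd.DataFrame(data, index=data_index, columns=data_columns)

# Attribute: shape
print('shape:' + str(data2.shape))
# Attribute: index
print('index:' + str(data2.index))
# Attribute: columns
print('columns:' + str(data2.columns))
# Attribute: values
print('values:' + str(data2.values))
# Transpose: T
print('T:' + str(data2.T))

# head() prints the first 5 rows by default; pass a number to choose how many
print(data2.head())
# tail() prints the last 5 rows by default; pass a number to choose how many
print(data2.tail())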

shape:(10, 5)
index:Index(['股票1', '股票2', '股票3', '股票4', '股票5', '股票6', '股票7', '股票8', '股票9', '股票10'], dtype='object')
columns:Index(['第1天', '第2天', '第3天', '第4天', '第5天'], dtype='object')
values:[[ 0.36833502 -0.10985847  0.01758469  2.8953737  -0.8990899 ]
 [ 0.82203116  0.73794113 -0.18325558 -1.03071591  0.80655588]
 [-0.55074501  0.62553913  0.5536485   1.26707348 -1.7806026 ]
 [ 0.85206036 -0.2663546  -0.09123825 -0.42519567  0.40458683]
 [ 0.44465066  1.13645238  2.95282522 -0.27825732 -0.97490694]
 [ 0.98894857 -0.09091942  1.98937608  1.4541496   0.81665986]
 [ 1.80891512  1.17705914  1.40490257 -0.25588641 -0.64963894]
 [-1.29888722  0.53645297  1.3262874   1.66382867 -0.17785908]
 [ 0.69582494 -0.60796553 -0.17765606  0.57865517  1.39733349]
 [-0.60679997  1.07496135  0.30156879 -0.47350182  1.14001339]]
T:          股票1       股票2       股票3  ...       股票8       股票9      股票10
第1天  0.368335  0.822031 -0.550745  ... -1.298887  0.695825 -0.606800
第2天 -0.109858  0.737941  0.625539  ...  0.536453 -0.607966  1.074961
第3天  0.017585 -0.183256  0.553649  ...  1.326287 -0.177656  0.301569
第4天  2.895374 -1.030716  1.267073  ...  1.663829  0.578655 -0.473502
第5天 -0.899090  0.806556 -1.780603  ... -0.177859  1.397333  1.140013

[5 rows x 10 columns]
          第1天       第2天       第3天       第4天       第5天
股票1  0.368335 -0.109858  0.017585  2.895374 -0.899090
股票2  0.822031  0.737941 -0.183256 -1.030716  0.806556
股票3 -0.550745  0.625539  0.553649  1.267073 -1.780603
股票4  0.852060 -0.266355 -0.091238 -0.425196  0.404587
股票5  0.444651  1.136452  2.952825 -0.278257 -0.974907
           第1天       第2天       第3天       第4天       第5天
股票6   0.988949 -0.090919  1.989376  1.454150  0.816660
股票7   1.808915  1.177059  1.404903 -0.255886 -0.649639
股票8  -1.298887  0.536453  1.326287  1.663829 -0.177859
股票9   0.695825 -0.607966 -0.177656  0.578655  1.397333
股票10 -0.606800  1.074961  0.301569 -0.473502  1.140013

Introduction to Series

Creating a Series from a list or array
Creating a Series from a dictionary
The index attribute
The values attribute

import numpy as np
import pandas as pd

# Create a Series from an array, with an explicit index
data = pd.Series(np.arange(3, 9, 2), index=["a", "b", "c"])
print(data)

# Create a Series from a dictionary; the keys become the index
data1 = pd.Series({'red': 100, 'blue': 200, 'green': 500, 'yellow': 1000})
print(data1)

# Series attribute: index
print('index:' + str(data.index))
# Series attribute: values
print('values:' + str(data.values))

a    3
b    5
c    7
dtype: int32
red        100
blue       200
green      500
yellow    1000
dtype: int64
index:Index(['a', 'b', 'c'], dtype='object')
values:[3 5 7]

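2. Basic data operations

2.1 Indexing

There are four ways to index:

Direct indexing
Indexing by label
Indexing by position
Mixed indexing: combining positions and labels with .ix. Because .ix was removed in pandas 1.0, it is not demonstrated here

Let's look at each in turn: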

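import numpy as np
import pandas as pd

# Create the raw stock data
data = np.random.normal(0, 1, (10, 5))
# Row labels ("股票N" means "stock N")
data_index = ["股票{}".format(i + 1) for i in range(10)]
# Column labels ("第N天" means "day N")
data_columns = ["第{}天".format(i + 1) for i in range(5)]
# Build a DataFrame that uses these row and column labels
data2 = pd.DataFrame(data, index=data_index, columns=data_columns)

# Direct indexing: column first, then row, both by label
print('Direct indexing: ' + str(data2['第2天']['股票2']))
# Label indexing: row first, then column, through the loc indexer
print('Label indexing: ' + str(data2.loc['股票2', '第2天']))
# Position indexing: integer positions, through the iloc indexer
print('Position indexing: ' + str(data2.iloc[1, 1]))

Direct indexing: 0.23412717542123185
Label indexing: 0.23412717542123185
Position indexing: 0.23412717542123185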
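
Since .ix is no longer available in current pandas, a mixed lookup (row by position, column by label) has to go through loc or iloc. The following is a minimal sketch on a small made-up DataFrame, not the article's stock data; it converts the position to a label, or the label to a position, before indexing.

```python
import numpy as np
import pandas as pd

# Made-up 3x2 table with values 0..5, for illustration only.
df = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    index=["stock_a", "stock_b", "stock_c"],
    columns=["day1", "day2"],
)

# Row by position, column by label: turn the row position into a label.
by_loc = df.loc[df.index[1], "day2"]

# Same cell, but turning the column label into a position instead.
by_iloc = df.iloc[1, df.columns.get_loc("day2")]

print(by_loc, by_iloc)  # 3 3
```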

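2.2 Assignment

A whole row or a whole column can be assigned in one step, and a single value can also be modified on its own.

Let's look at an example.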

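import numpy as np
import pandas as pd

# Create the raw stock data
data = np.random.normal(0, 1, (10, 5))
# Row labels ("股票N" means "stock N")
data_index = ["股票{}".format(i + 1) for i in range(10)]
# Column labels ("第N天" means "day N")
data_columns = ["第{}天".format(i + 1) for i in range(5)]
# Build a DataFrame that uses these row and column labels
data2 = pd.DataFrame(data, index=data_index, columns=data_columns)

# Modify a single value; use one iloc[row, col] call rather than chained iloc[1][1],
# because chained assignment may write to a temporary copy
data2.iloc[1, 1] = 100
print('Single value: ' + str(data2.iloc[1, 1]))

# Modify a whole column
data2['第2天'] = 100
print('Whole column:' + '\n' + str(data2))

Single value: 100.0
Whole column:
           第1天  第2天       第3天       第4天       第5天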
股票1   0.740752  100 -1.095628  1.826934  1.102712
股票2   0.653884  100  0.569641  1.879457 -1.125036
股票3  -0.758834  100 -0.431924  1.117955 -0.737071
股票4   1.822570  100  0.322414 -0.660553  0.067220
股票5   0.223287  100 -0.469243  0.261170 -0.100051
股票6   0.289240  100  2.060748  1.292384 -0.120213
股票7  -0.122596  100  0.201011 -0.624944  1.845011
股票8  -1.295979  100  0.909599 -0.374124 -0.379970
股票9   0.847421  100  0.139327  0.798434 -1.631011
股票10 -0.011450  100 -1.293688 -0.979475 -0.114795
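
The example above assigns a single value and a whole column. A whole row can be assigned in the same way through loc. The sketch below uses a small made-up DataFrame rather than the article's data, and also shows the at and iat accessors for single cells.

```python
import numpy as np
import pandas as pd

# Made-up 3x2 table of zeros, for illustration only.
df = pd.DataFrame(
    np.zeros((3, 2)),
    index=["stock_a", "stock_b", "stock_c"],
    columns=["day1", "day2"],
)

# Assign a whole row by label.
df.loc["stock_b"] = 100
# Assign a whole column by label.
df["day2"] = 7
# Assign a single cell by label (.at) or by position (.iat).
df.at["stock_a", "day1"] = -1
df.iat[2, 0] = 5

print(df)
```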

2.3 Sorting

There are two kinds of sorting: sorting by value and sorting by index.

Let's look at an example.

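import numpy as np
import pandas as pd

# Create the raw stock data
data = np.random.normal(0, 1, (10, 5))
# Row labels ("股票N" means "stock N")
data_index = ["股票{}".format(i + 1) for i in range(10)]
# Column labels ("第N天" means "day N")
data_columns = ["第{}天".format(i + 1) for i in range(5)]
# Build a DataFrame that uses these row and column labels
data2 = pd.DataFrame(data, index=data_index, columns=data_columns)

# Sort by value
# by names the column to sort on; a list of several columns is allowed
# ascending=False sorts from largest to smallest, True from smallest to largest
data3 = data2.sort_values(by=['第1天'], ascending=False)
print('Sorted by value:' + '\n' + str(data3))

Sorted by value: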
           第1天       第2天       第3天       第4天       第5天
股票3   0.875219 -0.248079  0.718127  1.161606 -0.612040
股票8   0.765177 -1.491168 -0.111170 -0.214186  0.561835
股票7   0.563078 -1.424150  1.437518 -0.663060 -0.701299
股票6   0.538858 -1.519885 -0.143532 -1.078606  0.545076
股票4   0.527372 -0.880388 -0.630165 -1.243945  2.359285
股票5   0.331653 -1.032393  0.462573 -0.464426 -0.999261
股票9   0.220232  0.682369 -1.048880  0.394690 -1.253890
股票10 -0.092479 -0.187950 -2.105753  0.272641  1.354318
股票1  -0.553340  0.306372  0.646005  0.971399  1.462445
股票2  -0.629727  0.071026 -0.042803 -0.768057 -0.581646
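
The example above covers sorting by value on one column. Sorting by index, and sorting on several columns at once, can be sketched as follows on a small made-up DataFrame (the labels and numbers are for illustration only).

```python
import pandas as pd

# Made-up data with an unsorted index and a tie in day1.
df = pd.DataFrame(
    {"day1": [2, 1, 2], "day2": [0.5, 0.9, 0.1]},
    index=["c", "a", "b"],
)

# Sort rows by their index labels (ascending by default).
by_index = df.sort_index()
print(list(by_index.index))  # ['a', 'b', 'c']

# Sort by several columns: day1 descending, ties broken by day2 ascending.
by_values = df.sort_values(by=["day1", "day2"], ascending=[False, True])
print(list(by_values.index))  # ['b', 'c', 'a']
```

Both methods return a new DataFrame and leave the original unchanged.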