Python Data Analysis: Getting Started with Pandas (Part 3)

Larissa857

Last modified: 2023-09-13 23:48:23


First published: 2023-09-08 14:09:54

Article link: https://blog.csdn.net/m0_68045296/article/details/132758335


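Preface

This series consists of three articles covering, in order: pandas data types and structures, the data processing features of its built-in modules, and visualization tools together with case studies such as house price prediction. It is based on the book Python for Data Analysis (Second Edition). All code in this article has been tested; see [Resources] for the dataset downloads.
📢 Note: keep the code files in the same directory as the extracted data files and folders so that the relative paths resolve. Absolute paths also work.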

VIII. Plotting and Visualization

In IPython, run %matplotlib first.

import numpy as np
import pandas as pd
PREVIOUS_MAX_ROWS = pd.options.display.max_rows
pd.options.display.max_rows = 20
np.random.seed(12345)
import matplotlib.pyplot as plt
import matplotlib
plt.rc('figure', figsize=(10, 6))
np.set_printoptions(precision=4, suppress=True)

1. Figures and Subplots

fig = plt.figure()
ax3 = fig.add_subplot(2, 2, 3)
# 'k--' is a style option that draws a black dashed line
plt.plot(np.random.randn(50).cumsum(), 'k--')

fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)

# Histogram -> hist, with 20 bins, black bars, and 30% opacity
_ = ax1.hist(np.random.randn(100), bins=20, color='k', alpha=0.3)
# Scatter plot -> scatter
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))
# plt.plot draws on the current (most recently created) subplot, here ax3
plt.plot(np.random.randn(50).cumsum(), 'k--')
plt.close('all')

Output:

pyplot.subplots options

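Argument	Description
nrows	Number of rows of subplots
ncols	Number of columns of subplots
sharex	All subplots use the same x-axis ticks (adjusting xlim affects all subplots)
sharey	All subplots use the same y-axis ticks (adjusting ylim affects all subplots)
subplot_kw	Dictionary of keyword arguments passed to `add_subplot` to create each subplot
**fig_kw	Additional keyword arguments used when creating the figure, e.g. `plt.subplots(2, 2, figsize=(8, 6))`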
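
As a small sketch of the options in the table above, `plt.subplots` creates the figure and a NumPy array of subplot objects in a single call. The grid size, the `figsize` value, and the non-interactive `Agg` backend are illustrative assumptions and are not taken from the original article.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# 2 x 3 grid; sharex/sharey make every subplot use the same axis limits
fig, axes = plt.subplots(2, 3, sharex=True, sharey=True, figsize=(8, 6))

# axes is a 2-D NumPy array of subplot objects, indexed as axes[row, col]
axes[0, 1].plot(np.arange(10), 'k--')

print(axes.shape)
plt.close(fig)
```

Indexing the returned array is often more convenient than calling `fig.add_subplot` once per panel.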

Adjusting the spacing around subplots

subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)
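
A minimal sketch of how `wspace` and `hspace` are typically used: setting both to 0 removes the gaps between subplots. The random data, the seed, and the `Agg` backend are assumptions made for illustration. `fig.subplots_adjust` is used here; `plt.subplots_adjust` does the same for the current figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, for reproducibility only

fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i in range(2):
    for j in range(2):
        axes[i, j].hist(rng.standard_normal(500), bins=50, color='k', alpha=0.5)

# wspace/hspace are fractions of the average subplot width/height used as padding
fig.subplots_adjust(wspace=0, hspace=0)
```

With zero spacing the tick labels of neighbouring subplots can overlap, so shared axes are usually combined with this setting.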

① Colors, Markers, and Line Styles

from numpy.random import randn
plt.plot(randn(30).cumsum(), color='k', linestyle='dashed', marker='o')

data = np.random.randn(30).cumsum()
plt.plot(data, 'k--', label='Default')
plt.plot(data, 'k-', drawstyle='steps-post', label='steps-post')
plt.legend(loc='best')      # 'best' lets matplotlib choose the least obstructive legend location

Output:
Colors, Markers, and Line Styles
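
The compact style string and the explicit keyword form describe the same line properties. The sketch below, with made-up data and the `Agg` backend as assumptions, plots one line each way and reads the resulting properties back from the `Line2D` objects.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
data = np.arange(10)

# Style string: color 'k' (black), marker 'o', line style '--'
(line_a,) = ax.plot(data, 'ko--')
# The same properties spelled out as keyword arguments
(line_b,) = ax.plot(data + 1, color='k', linestyle='dashed', marker='o')

print(line_a.get_linestyle(), line_b.get_linestyle())
print(line_a.get_marker(), line_b.get_marker())
```

The keyword form is longer but easier to read and to build programmatically.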

② Ticks, Labels, and Legends

Setting the title, axis labels, ticks, and tick labels

# Plot a random walk
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(np.random.randn(1000).cumsum())

# Place the ticks at these positions within the data range
ticks = ax.set_xticks([0, 250, 500, 750, 1000])
# Assign a label to each of those ticks
labels = ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'],
                            rotation=30, fontsize='small')      # rotation turns the x tick labels by 30 degrees
# Set the title
ax.set_title('My first matplotlib plot')
# Set the x-axis label
ax.set_xlabel('Stages')
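
Several of these properties can also be set in one call with `ax.set`, which accepts the same names as keyword arguments. This is a short sketch with placeholder data; the `Agg` backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.arange(100))

# Each key corresponds to a set_<key> method on the subplot
props = {
    'title': 'My first matplotlib plot',
    'xlabel': 'Stages',
}
ax.set(**props)

print(ax.get_title(), '|', ax.get_xlabel())
```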

Adding legends

from numpy.random import randn
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(randn(1000).cumsum(), 'k', label='one')         # solid line
ax.plot(randn(1000).cumsum(), 'k--', label='two')       # dashed line
ax.plot(randn(1000).cumsum(), 'k.', label='three')      # point markers
# Build the legend automatically from the labels above
ax.legend(loc='best')

Output:
Setting the title, axis labels, ticks, and ticklabels
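
To keep a line out of the legend, give it a label that starts with an underscore, such as `'_nolegend_'`. The sketch below uses made-up data and assumes the `Agg` backend; it reads the legend entries back to show which lines were included.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], 'k', label='one')
ax.plot([2, 1, 0], 'k--', label='_nolegend_')  # skipped when the legend is built
ax.plot([1, 1, 1], 'k.', label='three')

legend = ax.legend(loc='best')
labels = [text.get_text() for text in legend.get_texts()]
print(labels)
```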

③ Annotations and Drawing on a Subplot

from datetime import datetime

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

data = pd.read_csv('examples/spx.csv', index_col=0, parse_dates=True)
spx = data['SPX']

spx.plot(ax=ax, style='k-')

crisis_data = [
    (datetime(2007, 10, 11), 'Peak of bull market'),
    (datetime(2008, 3, 12), 'Bear Stearns Fails'),
    (datetime(2008, 9, 15), 'Lehman Bankruptcy')
]

for date, label in crisis_data:
    ax.annotate(label, xy=(date, spx.asof(date) + 75),
                xytext=(date, spx.asof(date) + 225),
                arrowprops=dict(facecolor='black', headwidth=4, width=2,
                                headlength=4),
                horizontalalignment='left', verticalalignment='top')

# Zoom in on 2007-2010
ax.set_xlim(['1/1/2007', '1/1/2011'])
ax.set_ylim([600, 1800])

ax.set_title('Important dates in the 2008-2009 financial crisis')

Annotations and Drawing on a Subplot
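
Besides annotations, shapes can be drawn on a subplot. matplotlib calls these shapes patches: create the patch object, then attach it with `ax.add_patch`. The coordinates and colors below are arbitrary illustrative values, and the `Agg` backend is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Polygon, Rectangle

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# Rectangle((x, y) of lower-left corner, width, height)
rect = Rectangle((0.2, 0.75), 0.4, 0.15, color='k', alpha=0.3)
# Circle((x, y) of center, radius)
circ = Circle((0.7, 0.2), 0.15, color='b', alpha=0.3)
# Polygon(list of (x, y) vertices)
pgon = Polygon([[0.15, 0.15], [0.35, 0.4], [0.2, 0.6]], color='g', alpha=0.5)

for patch in (rect, circ, pgon):
    ax.add_patch(patch)

print(len(ax.patches))
```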

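④ Saving Plots to File

savefig() arguments:

Argument	Description
fname	A string containing a file path, or a Python file-like object. The image format is inferred from the file extension
dpi	Resolution in dots per inch; defaults to the figure's dpi (100 by default) and can be configured
facecolor, edgecolor	Color of the figure background outside the subplots; 'w' (white) by default
format	File format ('png', 'jpg', 'pdf', 'svg', 'ps', 'eps', ...)
bbox_inches	Portion of the figure to save; passing 'tight' trims the empty space around the figure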

# plt.savefig('figpath.png', dpi=400, bbox_inches='tight')

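# The figure does not have to be written to disk; it can go to any file-like object
from io import BytesIO
buffer = BytesIO()
plt.savefig(buffer)     # writes the current figure into the in-memory buffer
plot_data = buffer.getvalue()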
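
A short sketch combining several arguments from the table. Because an in-memory buffer has no file extension, `format` is passed explicitly here. The data, the `dpi` value, and the `Agg` backend are illustrative assumptions.

```python
from io import BytesIO

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], 'k-')

buffer = BytesIO()
# format is explicit because a buffer has no extension to infer it from
fig.savefig(buffer, format='png', dpi=150, bbox_inches='tight')
plot_data = buffer.getvalue()

# Every PNG file begins with the same 8-byte signature
print(plot_data[:8] == b'\x89PNG\r\n\x1a\n')
```

The resulting bytes can be sent over a network or embedded in a page without creating a temporary file.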

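2. Plotting with pandas and seaborn

① Line Plots

Series.plot method arguments

Argument	Description
label	Label for the plot legend
ax	matplotlib subplot object to plot on; if nothing is passed, the current matplotlib subplot is used
style	Style string passed to matplotlib, such as `ko--`
alpha	Fill opacity of the plot (from 0 to 1)
kind	Can be 'area', 'bar', 'barh', 'density', 'hist', 'kde', 'line'
logy	Use logarithmic scaling on the y-axis
use_index	Use the object's index for the tick labels
rot	Rotation of the tick labels (0 to 360)
xticks	Values to use for the x-axis ticks
yticks	Values to use for the y-axis ticks
xlim	x-axis limits
ylim	y-axis limits
grid	Show the axis grid (off by default with the default matplotlib style)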

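DataFrame-specific plot arguments

Argument	Description
subplots	Plot each DataFrame column in a separate subplot
sharex	If subplots=True, share the same x-axis, ticks, and limits
sharey	If subplots=True, share the same y-axis, ticks, and limits
figsize	Tuple giving the size of the figure to create
title	Plot title as a string
legend	Add a subplot legend
sort_columns	Plot the columns in alphabetical order (deprecated in pandas 1.5 and removed in 2.0)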

s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))
s.plot()        # a line plot by default

df = pd.DataFrame(np.random.randn(10, 4).cumsum(0),
                  columns=['A', 'B', 'C', 'D'],
                  index=np.arange(0, 100, 10))
df.plot()   # equivalent to: df.plot.line()
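
The DataFrame-specific arguments can be combined as in this sketch, which draws each column in its own subplot. The seed, the `figsize`, the title text, and the `Agg` backend are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # hypothetical seed, for reproducibility only
df = pd.DataFrame(rng.standard_normal((10, 4)).cumsum(0),
                  columns=['A', 'B', 'C', 'D'],
                  index=np.arange(0, 100, 10))

# One subplot per column, all sharing the same x-axis
axes = df.plot(subplots=True, sharex=True, figsize=(8, 8),
               title='Random walks', legend=True)

print(len(axes))
```

With `subplots=True` the call returns an array of subplot objects, one per column, instead of a single subplot.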

② Bar Plots

fig, axes = plt.subplots(2, 1)
data = pd.Series(np.random.rand(16), index=list('abcdefghijklmnop'))

# Vertical bar plot
data.plot.bar(ax=axes[0], color='k', alpha=0.7)
# Horizontal bar plot
data.plot.barh(ax=axes[1], color='k', alpha=0.7)

df = pd.DataFrame(np.random.rand(6, 4),
                  index=['one', 'two', 'three', 'four', 'five', 'six'],
                  columns=pd.Index(['A', 'B', 'C', 'D'], name='Genus'))
df
df.plot.bar()

df.plot.barh(stacked=True, alpha=0.5)

# Count the parties by day and party size, then compute each size's share of the parties on each day

tips = pd.read_csv('examples/tips.csv')
party_counts = pd.crosstab(tips['day'], tips['size'])
party_counts
# Not many 1- and 6-person parties
party_counts = party_counts.loc[:, 2:5]

# Normalize to sum to 1
party_pcts = party_counts.div(party_counts.sum(axis=1), axis=0)
party_pcts
party_pcts.plot.bar()
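
The normalization step can be checked on a tiny example. The frame below is made up and stands in for `examples/tips.csv`; only the `day` and `size` column names are taken from the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tips data: one row per party
tips = pd.DataFrame({
    'day': ['Fri', 'Fri', 'Sat', 'Sat', 'Sat', 'Sun'],
    'size': [2, 3, 2, 2, 4, 3],
})

# Rows are days, columns are party sizes, values are counts
party_counts = pd.crosstab(tips['day'], tips['size'])

# Divide each row by its own total so that every row sums to 1
party_pcts = party_counts.div(party_counts.sum(axis=1), axis=0)

print(party_pcts)
print(np.allclose(party_pcts.sum(axis=1), 1.0))
```

Passing `axis=0` to `div` aligns the row totals with the row index, so each day is divided by its own total.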

plt.close('all')

import seaborn as sns
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])
print(tips.head(), '\n')
sns.barplot(x='tip_pct', y='day', data=tips, orient='h')

plt.close('all')

# Change the plot appearance
sns.set(style="whitegrid")

4. Scatter or Point Plots

macro = pd.read_csv('examples/macrodata.csv')
data = macro[['cpi', 'm1', 'tbilrate', 'unemp']]
trans_data = np.log(data).diff().dropna()
trans_data[-5:]

plt.figure()

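# Draw a scatter plot and fit a linear regression line
# (recent seaborn releases require x and y as keyword arguments)
sns.regplot(x='m1', y='unemp', data=trans_data)
plt.title('Changes in log %s versus log %s' % ('m1', 'unemp'))
# Pair plot: scatter plots for every pair of variables, with density estimates on the diagonal
sns.pairplot(trans_data, diag_kind='kde', plot_kws={'alpha': 0.2})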
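
The transformation `np.log(data).diff()` used above computes log differences, which are close to the period-over-period growth rates when the changes are small. The sketch below checks this on made-up values; the numbers are assumptions and are not taken from `macrodata.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical series growing by 1% and then by 2%
data = pd.DataFrame({'cpi': [100.0, 101.0, 103.02]})

log_diff = np.log(data).diff().dropna()   # same transformation as trans_data
pct = data.pct_change().dropna()          # exact growth rates

print(log_diff['cpi'].round(4).tolist())
print(pct['cpi'].round(4).tolist())

# A log difference equals log(1 + growth rate) exactly
print(np.allclose(log_diff, np.log1p(pct)))
```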


5. Facet Grids and Categorical Data

sns.catplot() (named sns.factorplot() before seaborn 0.9)

sns.catplot(x='day', y='tip_pct', hue='time', col='smoker',
            kind='bar', data=tips[tips.tip_pct < 1])


6. Other Python Visualization Tools

Bokeh
Plotly

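IX. Advanced pandas

Excerpted from Python for Data Analysis (Second Edition)
- See [Resources]: 案例分析.zip (case studies)

Categorical Data
Introduction to Modeling Libraries in Python
Data Analysis Examples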
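
As a brief taste of the first topic, the sketch below converts a string column to the pandas categorical type, which stores each distinct value once plus an integer code per row. The fruit names are made up for illustration and are not from the referenced case studies.

```python
import pandas as pd

fruits = pd.Series(['apple', 'orange', 'apple', 'apple'])

# Convert the strings to the categorical dtype
cat = fruits.astype('category')

# Distinct values, stored once
print(list(cat.cat.categories))
# Integer code of each row, pointing into the categories
print(list(cat.cat.codes))
```

For columns with many repeated values this representation usually takes less memory than plain strings.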
