Python Data Analysis Study Notes: Getting Started with Matplotlib

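I. Introduction to Matplotlib

1. Features of Matplotlib

Built mainly for 2D charts, and can also draw 3D charts;
Simple and convenient to use;
Supports visualizing data in an incremental, interactive way.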

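2. Why learn Matplotlib

Visualization is an important way to present results: seeing the data clearly helps with later analysis and processing, and Matplotlib makes that presentation more intuitive.
Here is a quick look at the result: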

# Import the module
import matplotlib.pyplot as plt
plt.figure(figsize=(10,8),dpi=80)
plt.plot([1,2,3],[4,5,6])
plt.show()


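3. Architecture of Matplotlib

Matplotlib's architecture has three layers that form a stack; each upper layer can call into the layer below it.
Scripting -> Artist -> Backend
Backend layer: the bottom layer of Matplotlib, which implements a large number of abstract interface classes;
Artist layer: every visible element of a figure is an Artist object, e.g. the title, axis labels, and ticks:
• Figure: the whole figure, including all of its elements;
• Axes (coordinate system): the region in which data is plotted;
• Axis: one axis of a coordinate system, which holds the limits, ticks, and tick labels;
• A Figure can contain several Axes, but an Axes can belong to only one Figure;
• An Axes contains several Axis objects: two for a 2D coordinate system, three for a 3D one.
Scripting layer: used for everyday plotting code. The pyplot module exposes the API through which users drive the rest of the package to draw the figures they need.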
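
The containment relationships in the Artist layer can be checked directly on the objects. The following is a minimal sketch rather than part of the original notes; the variable names are chosen for illustration, and it assumes an installation in which Matplotlib's built-in 3D projection is available.

```python
import matplotlib.pyplot as plt

# One Figure holding two coordinate systems (Axes): a 2D one and a 3D one.
fig = plt.figure(figsize=(8, 4))
ax2d = fig.add_subplot(1, 2, 1)
ax3d = fig.add_subplot(1, 2, 2, projection="3d")

# A Figure keeps a list of every Axes it contains.
print(len(fig.axes))

# Each Axes belongs to exactly one Figure.
print(ax2d.figure is fig, ax3d.figure is fig)

# A 2D Axes has two Axis objects; a 3D Axes adds a third one.
possible = ("xaxis", "yaxis", "zaxis")
axis_names_2d = [name for name in possible if hasattr(ax2d, name)]
axis_names_3d = [name for name in possible if hasattr(ax3d, name)]
print(axis_names_2d, axis_names_3d)
```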

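II. Plotting with Matplotlib

matplotlib.pyplot contains a collection of MATLAB-like plotting functions. Each function acts on the current axes of the current figure.
Matplotlib can draw line charts, scatter plots, bar charts, histograms, pie charts, and more.
For details, see the gallery in the official Matplotlib documentation: https://matplotlib.org/stable/gallery/index.html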
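
To make the idea of the "current" figure and axes concrete, here is a small sketch (an illustration, not from the original notes). It relies on plt.gcf() and plt.gca(), which return the current figure and the current axes; the names fig_a and fig_b are arbitrary.

```python
import matplotlib.pyplot as plt

fig_a = plt.figure()   # fig_a becomes the current figure
fig_b = plt.figure()   # now fig_b is the current figure

# plt.plot draws on the current axes of the current figure (fig_b),
# creating that axes on demand.
lines = plt.plot([1, 2, 3], [4, 5, 6])
current_axes = plt.gca()

print(plt.gcf() is fig_b)
print(lines[0].axes is current_axes)
print(len(fig_a.axes), len(fig_b.axes))

# plt.figure(num) switches the current figure back to an existing one.
plt.figure(fig_a.number)
print(plt.gcf() is fig_a)
```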

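1. Drawing a line chart

A line chart is a statistical chart that uses the rise and fall of a line to show how a quantity increases or decreases.
Characteristics: it shows the trend in the data and reflects how things change.
A line chart is drawn with the plot() function:
plt.plot(x, y) # plot x and y using the default line style and color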
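
As a quick check of what plot() receives and returns, the sketch below (illustrative only; the variable names are made up) plots once with both x and y and once with y alone, then reads the data back from the returned Line2D objects.

```python
import matplotlib.pyplot as plt

plt.figure()

# With x and y: the points are (1, 4), (2, 5), (3, 6).
(line_xy,) = plt.plot([1, 2, 3], [4, 5, 6])

# With y only: x defaults to the index 0..N-1.
(line_y,) = plt.plot([4, 5, 6])

# plot() returns a list of Line2D objects, one per line drawn.
print(list(line_xy.get_xdata()), list(line_xy.get_ydata()))
print(list(line_y.get_xdata()), list(line_y.get_ydata()))
```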

Example 1: plot the daily high temperature in Shanghai over one week. The code is as follows:

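# Import the module
import matplotlib.pyplot as plt

# Create the figure
# figsize: width and height of the figure in inches; dpi: resolution in dots per inch
plt.figure(figsize=(10,5),dpi=100)
# Create the data
week = [1,2,3,4,5,6,7]
temp = [20,23,17,19,21,20,25]
# Draw the line chart
plt.plot(week, temp)
# Save the figure to disk (before plt.show(), otherwise the saved image is typically blank)
plt.savefig("test.png")
# Display the figure
plt.show()

In the example above, week supplies the x-axis values and temp supplies the y-axis values. After plt.plot() draws the line, plt.show() displays the figure and then releases it. For that reason plt.savefig() is called before plt.show(); called afterwards, it would usually write an empty image.
The chart is still rather bare and lacks many elements; the later examples fill them in step by step.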
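
One way to avoid depending on the order of savefig() and show() is to keep a reference to the Figure and save through it. This is a sketch under a few assumptions: the Agg backend is selected only so that it runs without a display, and the output path is a hypothetical file in a temporary directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

week = [1, 2, 3, 4, 5, 6, 7]
temp = [20, 23, 17, 19, 21, 20, 25]

fig = plt.figure(figsize=(10, 5), dpi=100)
plt.plot(week, temp)

# Save through the Figure object, so it does not matter which figure is "current".
out_path = os.path.join(tempfile.mkdtemp(), "week_temp.png")  # hypothetical file name
fig.savefig(out_path)

# Release the figure explicitly once it is no longer needed.
plt.close(fig)

print(os.path.getsize(out_path) > 0)
```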

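2. The plt.plot() method

plt.plot() is in fact quite elaborate; its commonly used parameters cover the line style, the data label, the marker shape, and so on.
There is no need to memorize them. In Jupyter, running plt.plot?? shows the docstring and source, which document how these parameters are used: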

Signature: plt.plot(*args, scalex=True, scaley=True, data=None, **kwargs)
Docstring:
Plot y versus x as lines and/or markers.

Call signatures::

    plot([x], y, [fmt], *, data=None, **kwargs)
    plot([x], y, [fmt], [x2], y2, [fmt2], ..., **kwargs)

The coordinates of the points or line nodes are given by *x*, *y*.

The optional parameter *fmt* is a convenient way for defining basic
formatting like color, marker and linestyle. It's a shortcut string
notation described in the *Notes* section below.

>>> plot(x, y)        # plot x and y using default line style and color
>>> plot(x, y, 'bo')  # plot x and y using blue circle markers
>>> plot(y)           # plot y using x as index array 0..N-1
>>> plot(y, 'r+')     # ditto, but with red plusses

You can use `.Line2D` properties as keyword arguments for more
control on the appearance. Line properties and *fmt* can be mixed.
The following two calls yield identical results:

>>> plot(x, y, 'go--', linewidth=2, markersize=12)
>>> plot(x, y, color='green', marker='o', linestyle='dashed',
...      linewidth=2, markersize=12)

When conflicting with *fmt*, keyword arguments take precedence.


**Plotting labelled data**

There's a convenient way for plotting objects with labelled data (i.e.
data that can be accessed by index ``obj['y']``). Instead of giving
the data in *x* and *y*, you can provide the object in the *data*
parameter and just give the labels for *x* and *y*::

>>> plot('xlabel', 'ylabel', data=obj)

All indexable objects are supported. This could e.g. be a `dict`, a
`pandas.DataFrame` or a structured numpy array.


**Plotting multiple sets of data**

There are various ways to plot multiple sets of data.

- The most straight forward way is just to call `plot` multiple times.
  Example:

  >>> plot(x1, y1, 'bo')
  >>> plot(x2, y2, 'go')

- Alternatively, if your data is already a 2d array, you can pass it
  directly to *x*, *y*. A separate data set will be drawn for every
  column.

  Example: an array ``a`` where the first column represents the *x*
  values and the other columns are the *y* columns::

  >>> plot(a[0], a[1:])

- The third way is to specify multiple sets of *[x]*, *y*, *[fmt]*
  groups::

  >>> plot(x1, y1, 'g^', x2, y2, 'g-')

  In this case, any additional keyword argument applies to all
  datasets. Also this syntax cannot be combined with the *data*
  parameter.

By default, each line is assigned a different style specified by a
'style cycle'. The *fmt* and line property parameters are only
necessary if you want explicit deviations from these defaults.
Alternatively, you can also change the style cycle using
:rc:`axes.prop_cycle`.


Parameters
----------
x, y : array-like or scalar
    The horizontal / vertical coordinates of the data points.
    *x* values are optional and default to ``range(len(y))``.

    Commonly, these parameters are 1D arrays.

    They can also be scalars, or two-dimensional (in that case, the
    columns represent separate data sets).

    These arguments cannot be passed as keywords.

fmt : str, optional
    A format string, e.g. 'ro' for red circles. See the *Notes*
    section for a full description of the format strings.

    Format strings are just an abbreviation for quickly setting
    basic line properties. All of these and more can also be
    controlled by keyword arguments.

    This argument cannot be passed as keyword.

data : indexable object, optional
    An object with labelled data. If given, provide the label names to
    plot in *x* and *y*.

    .. note::
        Technically there's a slight ambiguity in calls where the
        second label is a valid *fmt*. ``plot('n', 'o', data=obj)``
        could be ``plt(x, y)`` or ``plt(y, fmt)``. In such cases,
        the former interpretation is chosen, but a warning is issued.
        You may suppress the warning by adding an empty format string
        ``plot('n', 'o', '', data=obj)``.

Returns
-------
list of `.Line2D`
    A list of lines representing the plotted data.

Other Parameters
----------------
scalex, scaley : bool, default: True
    These parameters determine if the view limits are adapted to the
    data limits. The values are passed on to `autoscale_view`.

**kwargs : `.Line2D` properties, optional
    *kwargs* are used to specify properties like a line label (for
    auto legends), linewidth, antialiasing, marker face color.
    Example::

    >>> plot([1, 2, 3], [1, 2, 3], 'go-', label='line 1', linewidth=2)
    >>> plot([1, 2, 3], [1, 4, 9], 'rs', label='line 2')

    If you make multiple lines with one plot call, the kwargs
    apply to all those lines.

    Here is a list of available `.Line2D` properties:

    Properties:
    agg_filter: a filter function, which takes a (m, n, 3) float array and a dpi value, and returns a (m, n, 3) array
    alpha: float or None
    animated: bool
    antialiased or aa: bool
    clip_box: `.Bbox`
    clip_on: bool
    clip_path: Patch or (Path, Transform) or None
    color or c: color
    contains: unknown
    dash_capstyle: {'butt', 'round', 'projecting'}
    dash_joinstyle: {'miter', 'round', 'bevel'}
    dashes: sequence of floats (on/off ink in points) or (None, None)
    data: (2, N) array or two 1D arrays
    drawstyle or ds: {'default', 'steps', 'steps-pre', 'steps-mid', 'steps-post'}, default: 'default'
    figure: `.Figure`
    fillstyle: {'full', 'left', 'right', 'bottom', 'top', 'none'}
    gid: str
    in_layout: bool
    label: object
    linestyle or ls: {'-', '--', '-.', ':', '', (offset, on-off-seq), ...}
    linewidth or lw: float
    marker: marker style string, `~.path.Path` or `~.markers.MarkerStyle`
    markeredgecolor or mec: color
    markeredgewidth or mew: float
    markerfacecolor or mfc: color
    markerfacecoloralt or mfcalt: color
    markersize or ms: float
    markevery: None or int or (int, int) or slice or List[int] or float or (float, float) or List[bool]
    path_effects: `.AbstractPathEffect`
    picker: unknown
    pickradius: float
    rasterized: bool or None
    sketch_params: (scale: float, length: float, randomness: float)
    snap: bool or None
    solid_capstyle: {'butt', 'round', 'projecting'}
    solid_joinstyle: {'miter', 'round', 'bevel'}
    transform: `matplotlib.transforms.Transform`
    url: str
    visible: bool
    xdata: 1D array
    ydata: 1D array
    zorder: float

See Also
--------
scatter : XY scatter plot with markers of varying size and/or color (
    sometimes also called bubble chart).

Notes
-----
**Format Strings**

A format string consists of a part for color, marker and line::

    fmt = '[marker][line][color]'

Each of them is optional. If not provided, the value from the style
cycle is used. Exception: If ``line`` is given, but no ``marker``,
the data will be a line without markers.

Other combinations such as ``[color][marker][line]`` are also
supported, but note that their parsing may be ambiguous.

**Markers**

=============    ===============================
character        description
=============    ===============================
``'.'``          point marker
``','``          pixel marker
``'o'``          circle marker
``'v'``          triangle_down marker
``'^'``          triangle_up marker
``'<'``          triangle_left marker
``'>'``          triangle_right marker
``'1'``          tri_down marker
``'2'``          tri_up marker
``'3'``          tri_left marker
``'4'``          tri_right marker
``'s'``          square marker
``'p'``          pentagon marker
``'*'``          star marker
``'h'``          hexagon1 marker
``'H'``          hexagon2 marker
``'+'``          plus marker
``'x'``          x marker
``'D'``          diamond marker
``'d'``          thin_diamond marker
``'|'``          vline marker
``'_'``          hline marker
=============    ===============================

**Line Styles**

=============    ===============================
character        description
=============    ===============================
``'-'``          solid line style
``'--'``         dashed line style
``'-.'``         dash-dot line style
``':'``          dotted line style
=============    ===============================

Example format strings::

    'b'    # blue markers with default shape
    'or'   # red circles
    '-g'   # green solid line
    '--'   # dashed line with default color
    '^k:'  # black triangle_up markers connected by a dotted line

**Colors**

The supported color abbreviations are the single letter codes

=============    ===============================
character        color
=============    ===============================
``'b'``          blue
``'g'``          green
``'r'``          red
``'c'``          cyan
``'m'``          magenta
``'y'``          yellow
``'k'``          black
``'w'``          white
=============    ===============================

and the ``'CN'`` colors that index into the default property cycle.

If the color is the only part of the format string, you can
additionally use any  `matplotlib.colors` spec, e.g. full names
(``'green'``) or hex strings (``'#008000'``).
Source:   
@_copy_docstring_and_deprecators(Axes.plot)
def plot(*args, scalex=True, scaley=True, data=None, **kwargs):
    return gca().plot(
        *args, scalex=scalex, scaley=scaley,
        **({"data": data} if data is not None else {}), **kwargs)
File:      d:\anaconda3\lib\site-packages\matplotlib\pyplot.py
Type:      function
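
The docstring above says that a format string and the equivalent Line2D keyword arguments describe the same line. The sketch below (an illustration with made-up data) draws one line each way and prints the marker, line style, width, and marker size that Matplotlib stored for both.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [1, 4, 9]

plt.figure()
# Short form: 'go--' means green, circle markers, dashed line.
(short_form,) = plt.plot(x, y, "go--", linewidth=2, markersize=12)
# Long form: the same settings spelled out as keyword arguments.
(long_form,) = plt.plot(x, y, color="green", marker="o", linestyle="dashed",
                        linewidth=2, markersize=12)

for line in (short_form, long_form):
    print(line.get_marker(), line.get_linestyle(),
          line.get_linewidth(), line.get_markersize())
```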

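3. More on line charts

Example 2: plot the minute-by-minute temperature in Shanghai between 11:00 and 12:00 on some day, with the temperature varying randomly between 15 and 18 degrees. The code is as follows: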

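import matplotlib.pyplot as plt
import random
# Set the font so that Chinese text renders correctly
plt.rcParams['font.sans-serif']=['SimHei'] # use SimHei as the sans-serif font
plt.rcParams['axes.unicode_minus']=False # render the minus sign on the axes correctly

plt.figure(figsize=(10,5),dpi=100)
# x-axis data: the 60 minutes from 11:00 to 12:00
x_time = range(60)

# One y value per x value, drawn uniformly from [15, 18]
sh_temp = [random.uniform(15,18) for i in x_time]
# y-axis ticks every 5 degrees, from 5 to 20
plt.yticks([i*5 for i in range(1,5)])

# Draw the temperature line
plt.plot(x_time, sh_temp, color='g', linestyle='--', label='上海') # label: "Shanghai"

# Add the title and axis labels
plt.title('国内某些城市中午11点到12点的气温变化图') # "Temperature in some Chinese cities from 11:00 to 12:00"
plt.xlabel('时间') # "Time"
plt.ylabel('温度') # "Temperature"

# Show the legend
plt.legend(loc='best')

plt.show()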

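Notes:
Custom x and y ticks, plt.xticks(ticks, labels, **kwargs) / plt.yticks(ticks, labels, **kwargs): ticks gives the tick positions to show, and the optional labels gives the text displayed at those positions;
Chinese text: the code above replaces the default font, which I personally find the simplest approach. It requires the SimHei font to be installed on the system.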
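
Since SimHei is not installed everywhere, it can help to check which fonts Matplotlib has actually found before setting font.sans-serif. The following is a sketch under assumptions: the candidate list is a hypothetical set of common fonts with Chinese glyphs and should be adjusted to your own system.

```python
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Names of all fonts Matplotlib has found on this machine.
available = {font.name for font in font_manager.fontManager.ttflist}

# Hypothetical preference list of fonts with Chinese glyphs.
candidates = ["SimHei", "Microsoft YaHei", "PingFang SC",
              "Noto Sans CJK SC", "WenQuanYi Micro Hei"]
usable = [name for name in candidates if name in available]

if usable:
    # Put the usable fonts in front of the existing sans-serif fallbacks.
    plt.rcParams["font.sans-serif"] = usable + list(plt.rcParams["font.sans-serif"])
    plt.rcParams["axes.unicode_minus"] = False
else:
    print("No candidate font found; Chinese text may render as empty boxes.")
```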

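Some functions commonly used when plotting are listed in the table below:

Function                               Description
plt.figure(figsize=None, dpi=None)     Create a new figure; figsize: figure size in inches, dpi: resolution in dots per inch
plt.savefig(fname)                     Save the figure to a file
plt.xticks(ticks=None)                 Set the x-axis tick values
plt.yticks(ticks=None)                 Set the y-axis tick values
plt.xlabel(xlabel)                     Set the x-axis label
plt.ylabel(ylabel)                     Set the y-axis label
plt.title()                            Set the figure title
plt.grid()                             Show grid lines at the x-axis and y-axis tick positions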
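
The functions in the table are usually combined in one script. The sketch below is illustrative only: the temperature values are made up, and the labels are in English so that it does not depend on a Chinese font.

```python
import matplotlib.pyplot as plt

minutes = list(range(60))
temps = [15 + (m % 4) for m in minutes]   # made-up values between 15 and 18

plt.figure(figsize=(10, 5), dpi=100)
plt.plot(minutes, temps)

# One x tick every 10 minutes, labelled as a clock time.
tick_positions = list(range(0, 60, 10))
tick_labels = [f"11:{m:02d}" for m in tick_positions]
plt.xticks(tick_positions, tick_labels)
# y ticks every 5 degrees from 0 to 20.
plt.yticks(range(0, 25, 5))

plt.xlabel("Time")
plt.ylabel("Temperature")
plt.title("Temperature between 11:00 and 12:00")
plt.grid(True, linestyle="--", alpha=0.5)

ax = plt.gca()
print([label.get_text() for label in ax.get_xticklabels()])
```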

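Basic usage of plt.annotate():
• text: the annotation text
• xy: the coordinates of the point being annotated
• xytext: the coordinates at which the text is placed
• arrowprops: the style properties of the arrow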
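
A minimal sketch of these four parameters working together, using the data from Example 1 to point an arrow at the weekly high. The offsets used for xytext and the "->" arrow style are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

week = [1, 2, 3, 4, 5, 6, 7]
temp = [20, 23, 17, 19, 21, 20, 25]

plt.figure(figsize=(10, 5), dpi=100)
plt.plot(week, temp, marker="o")

# Locate the highest temperature and point an arrow at it.
peak_temp = max(temp)
peak_day = week[temp.index(peak_temp)]
note = plt.annotate(
    f"high: {peak_temp}",                   # text: the annotation text
    xy=(peak_day, peak_temp),               # xy: the point being annotated
    xytext=(peak_day - 2, peak_temp - 1),   # xytext: where the text is placed
    arrowprops={"arrowstyle": "->"},        # arrowprops: style of the arrow
)
print(note.get_text(), note.xy)
```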

Extension: add a title, x- and y-axis labels, and text annotations to the data from Example 1.

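import matplotlib.pyplot as plt
# Set the font so that Chinese text renders correctly
plt.rcParams['font.sans-serif']=['SimHei'] # use SimHei as the sans-serif font
plt.rcParams['axes.unicode_minus']=False # render the minus sign on the axes correctly

# Create the figure
plt.figure(figsize=(10,5),dpi=100)
# Create the data
week = ['一','二','三','四','五','六','日'] # Monday to Sunday
temp = [20,23,17,19,21,20,25]
# Draw the line chart
plt.plot(week, temp)

# Relabel the x-axis ticks as 星期一 ... 星期日 (Monday ... Sunday)
weekday = ['星期{}'.format(i) for i in week]
plt.xticks(week, weekday, rotation=45)

# Annotate every data point with its (x, y) value
# zip pairs the two lists element by element; enumerate supplies each point's position on the x axis
for index,(x_i, y_i) in enumerate(zip(week,temp)):
    plt.annotate(f"{x_i, y_i}", xy=(x_i, y_i), xytext=(index-0.25, y_i))

# Add the title and the x- and y-axis labels
plt.title('上海一周内最高气温变化图') # "Daily high temperature in Shanghai over one week"
plt.xlabel('时间') # "Time"
plt.ylabel('温度') # "Temperature"

# Display the figure
plt.show()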



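Summary

This article covered the basic concepts of Matplotlib, how to draw line charts, and the functions commonly used when plotting, together with their parameters.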
