Data Visualization: From Basics to Advanced


ERCO123


Tags: visualization, python, data visualization, data analysis

Article link: https://blog.csdn.net/KaelCui/article/details/105868592


Data Visualization

References

1. Introduction to matplotlib and seaborn

1.1 matplotlib

1.1.1 Overview

Official documentation

1.1.2 Import convention

import matplotlib.pyplot as plt

1.1.3 pyplot vs. pylab

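matplotlib.pyplot is a collection of command-style functions that make Matplotlib work like MATLAB. Each pyplot function makes some change to a figure: for example, it creates a figure, creates a plotting area in a figure, draws some lines in the plotting area, or decorates the plot with labels.
pylab is a module that brings matplotlib.pyplot, numpy and some additional functions into a single namespace. Its original purpose was to mimic the MATLAB way of working by importing all of these functions into the global namespace.
Because importing that many names into the global namespace can cause unexpected behavior (names silently shadowing one another), using pylab is strongly discouraged. Use matplotlib.pyplot instead.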

import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
# Jupyter Notebook / IPython only: figures are rendered automatically below the cell, so plt.show() is not needed
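To make the pyplot description concrete, here is a minimal sketch contrasting the command style (calls act on an implicit "current" figure and Axes) with the object-oriented style (methods called on explicit Figure/Axes objects). The Agg backend is selected only so the sketch runs without a display; that choice is an assumption, not part of the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# pyplot (MATLAB-like) style: each call modifies the current figure/Axes
plt.figure()
plt.plot(x, np.sin(x))
plt.title("pyplot style")
current_ax = plt.gca()  # the Axes the calls above drew on

# Object-oriented style: the same plot through explicit Figure/Axes objects
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_title("object-oriented style")
```

Both styles produce the same kind of plot; the object-oriented form is easier to follow once several figures or subplots are involved.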

1.2 seaborn

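Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
Seaborn wraps commonly used combinations of matplotlib functionality, so even beginners can produce useful plots.
The trade-off: highly customized plots for specific requirements are harder to achieve.
If you are new to visualization, starting with seaborn is recommended; it covers most needs.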

Official tutorial

Import convention

import seaborn as sns

2. Basic plotting

from sklearn.datasets import load_boston  # Boston house-price dataset; removed in scikit-learn 1.2, so this needs an older version

data = load_boston()
data

{'data': array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
         4.9800e+00],
        [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02,
         9.1400e+00],
        [2.7290e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9283e+02,
         4.0300e+00],
        ...,
        [6.0760e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
         5.6400e+00],
        [1.0959e-01, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9345e+02,
         6.4800e+00],
        [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
         7.8800e+00]]),
 'target': array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
        18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17.5, 20.2, 18.2, 13.6, 19.6,
        15.2, 14.5, 15.6, 13.9, 16.6, 14.8, 18.4, 21. , 12.7, 14.5, 13.2,
        13.1, 13.5, 18.9, 20. , 21. , 24.7, 30.8, 34.9, 26.6, 25.3, 24.7,
        21.2, 19.3, 20. , 16.6, 14.4, 19.4, 19.7, 20.5, 25. , 23.4, 18.9,
        35.4, 24.7, 31.6, 23.3, 19.6, 18.7, 16. , 22.2, 25. , 33. , 23.5,
        19.4, 22. , 17.4, 20.9, 24.2, 21.7, 22.8, 23.4, 24.1, 21.4, 20. ,
        20.8, 21.2, 20.3, 28. , 23.9, 24.8, 22.9, 23.9, 26.6, 22.5, 22.2,
        23.6, 28.7, 22.6, 22. , 22.9, 25. , 20.6, 28.4, 21.4, 38.7, 43.8,
        33.2, 27.5, 26.5, 18.6, 19.3, 20.1, 19.5, 19.5, 20.4, 19.8, 19.4,
        21.7, 22.8, 18.8, 18.7, 18.5, 18.3, 21.2, 19.2, 20.4, 19.3, 22. ,
        20.3, 20.5, 17.3, 18.8, 21.4, 15.7, 16.2, 18. , 14.3, 19.2, 19.6,
        23. , 18.4, 15.6, 18.1, 17.4, 17.1, 13.3, 17.8, 14. , 14.4, 13.4,
        15.6, 11.8, 13.8, 15.6, 14.6, 17.8, 15.4, 21.5, 19.6, 15.3, 19.4,
        17. , 15.6, 13.1, 41.3, 24.3, 23.3, 27. , 50. , 50. , 50. , 22.7,
        25. , 50. , 23.8, 23.8, 22.3, 17.4, 19.1, 23.1, 23.6, 22.6, 29.4,
        23.2, 24.6, 29.9, 37.2, 39.8, 36.2, 37.9, 32.5, 26.4, 29.6, 50. ,
        32. , 29.8, 34.9, 37. , 30.5, 36.4, 31.1, 29.1, 50. , 33.3, 30.3,
        34.6, 34.9, 32.9, 24.1, 42.3, 48.5, 50. , 22.6, 24.4, 22.5, 24.4,
        20. , 21.7, 19.3, 22.4, 28.1, 23.7, 25. , 23.3, 28.7, 21.5, 23. ,
        26.7, 21.7, 27.5, 30.1, 44.8, 50. , 37.6, 31.6, 46.7, 31.5, 24.3,
        31.7, 41.7, 48.3, 29. , 24. , 25.1, 31.5, 23.7, 23.3, 22. , 20.1,
        22.2, 23.7, 17.6, 18.5, 24.3, 20.5, 24.5, 26.2, 24.4, 24.8, 29.6,
        42.8, 21.9, 20.9, 44. , 50. , 36. , 30.1, 33.8, 43.1, 48.8, 31. ,
        36.5, 22.8, 30.7, 50. , 43.5, 20.7, 21.1, 25.2, 24.4, 35.2, 32.4,
        32. , 33.2, 33.1, 29.1, 35.1, 45.4, 35.4, 46. , 50. , 32.2, 22. ,
        20.1, 23.2, 22.3, 24.8, 28.5, 37.3, 27.9, 23.9, 21.7, 28.6, 27.1,
        20.3, 22.5, 29. , 24.8, 22. , 26.4, 33.1, 36.1, 28.4, 33.4, 28.2,
        22.8, 20.3, 16.1, 22.1, 19.4, 21.6, 23.8, 16.2, 17.8, 19.8, 23.1,
        21. , 23.8, 23.1, 20.4, 18.5, 25. , 24.6, 23. , 22.2, 19.3, 22.6,
        19.8, 17.1, 19.4, 22.2, 20.7, 21.1, 19.5, 18.5, 20.6, 19. , 18.7,
        32.7, 16.5, 23.9, 31.2, 17.5, 17.2, 23.1, 24.5, 26.6, 22.9, 24.1,
        18.6, 30.1, 18.2, 20.6, 17.8, 21.7, 22.7, 22.6, 25. , 19.9, 20.8,
        16.8, 21.9, 27.5, 21.9, 23.1, 50. , 50. , 50. , 50. , 50. , 13.8,
        13.8, 15. , 13.9, 13.3, 13.1, 10.2, 10.4, 10.9, 11.3, 12.3,  8.8,
         7.2, 10.5,  7.4, 10.2, 11.5, 15.1, 23.2,  9.7, 13.8, 12.7, 13.1,
        12.5,  8.5,  5. ,  6.3,  5.6,  7.2, 12.1,  8.3,  8.5,  5. , 11.9,
        27.9, 17.2, 27.5, 15. , 17.2, 17.9, 16.3,  7. ,  7.2,  7.5, 10.4,
         8.8,  8.4, 16.7, 14.2, 20.8, 13.4, 11.7,  8.3, 10.2, 10.9, 11. ,
         9.5, 14.5, 14.1, 16.1, 14.3, 11.7, 13.4,  9.6,  8.7,  8.4, 12.8,
        10.5, 17.1, 18.4, 15.4, 10.8, 11.8, 14.9, 12.6, 14.1, 13. , 13.4,
        15.2, 16.1, 17.8, 14.9, 14.1, 12.7, 13.5, 14.9, 20. , 16.4, 17.7,
        19.5, 20.2, 21.4, 19.9, 19. , 19.1, 19.1, 20.1, 19.9, 19.6, 23.2,
        29.8, 13.8, 13.3, 16.7, 12. , 14.6, 21.4, 23. , 23.7, 25. , 21.8,
        20.6, 21.2, 19.1, 20.6, 15.2,  7. ,  8.1, 13.6, 20.1, 21.8, 24.5,
        23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9]),
 'feature_names': array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
        'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7'),
 'DESCR': ".. _boston_dataset:\n\nBoston house prices dataset\n---------------------------\n\n**Data Set Characteristics:**  \n\n    :Number of Instances: 506 \n\n    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.\n\n    :Attribute Information (in order):\n        - CRIM     per capita crime rate by town\n        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.\n        - INDUS    proportion of non-retail business acres per town\n        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)\n        - NOX      nitric oxides concentration (parts per 10 million)\n        - RM       average number of rooms per dwelling\n        - AGE      proportion of owner-occupied units built prior to 1940\n        - DIS      weighted distances to five Boston employment centres\n        - RAD      index of accessibility to radial highways\n        - TAX      full-value property-tax rate per $10,000\n        - PTRATIO  pupil-teacher ratio by town\n        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town\n        - LSTAT    % lower status of the population\n        - MEDV     Median value of owner-occupied homes in $1000's\n\n    :Missing Attribute Values: None\n\n    :Creator: Harrison, D. and Rubinfeld, D.L.\n\nThis is a copy of UCI ML housing dataset.\nhttps://archive.ics.uci.edu/ml/machine-learning-databases/housing/\n\n\nThis dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.\n\nThe Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic\nprices and the demand for clean air', J. Environ. Economics & Management,\nvol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics\n...', Wiley, 1980.   N.B. 
Various transformations are used in the table on\npages 244-261 of the latter.\n\nThe Boston house-price data has been used in many machine learning papers that address regression\nproblems.   \n     \n.. topic:: References\n\n   - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.\n   - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.\n",
 'filename': 'C:\\Users\\CK\\anaconda3\\lib\\site-packages\\sklearn\\datasets\\data\\boston_house_prices.csv'}

import pandas as pd

x_df = pd.DataFrame(data['data'],columns=data['feature_names'])
x_df.head()

	CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT
0	0.00632	18.0	2.31	0.0	0.538	6.575	65.2	4.0900	1.0	296.0	15.3	396.90	4.98
1	0.02731	0.0	7.07	0.0	0.469	6.421	78.9	4.9671	2.0	242.0	17.8	396.90	9.14
2	0.02729	0.0	7.07	0.0	0.469	7.185	61.1	4.9671	2.0	242.0	17.8	392.83	4.03
3	0.03237	0.0	2.18	0.0	0.458	6.998	45.8	6.0622	3.0	222.0	18.7	394.63	2.94
4	0.06905	0.0	2.18	0.0	0.458	7.147	54.2	6.0622	3.0	222.0	18.7	396.90	5.33

x_df.T.head()

	0	1	2	3	4	5	6	7	8	9	...	496	497	498	499	500	501	502	503	504	505
CRIM	0.00632	0.02731	0.02729	0.03237	0.06905	0.02985	0.08829	0.14455	0.21124	0.17004	...	0.2896	0.26838	0.23912	0.17783	0.22438	0.06263	0.04527	0.06076	0.10959	0.04741
ZN	18.00000	0.00000	0.00000	0.00000	0.00000	0.00000	12.50000	12.50000	12.50000	12.50000	...	0.0000	0.00000	0.00000	0.00000	0.00000	0.00000	0.00000	0.00000	0.00000	0.00000
INDUS	2.31000	7.07000	7.07000	2.18000	2.18000	2.18000	7.87000	7.87000	7.87000	7.87000	...	9.6900	9.69000	9.69000	9.69000	9.69000	11.93000	11.93000	11.93000	11.93000	11.93000
CHAS	0.00000	0.00000	0.00000	0.00000	0.00000	0.00000	0.00000	0.00000	0.00000	0.00000	...	0.0000	0.00000	0.00000	0.00000	0.00000	0.00000	0.00000	0.00000	0.00000	0.00000
NOX	0.53800	0.46900	0.46900	0.45800	0.45800	0.45800	0.52400	0.52400	0.52400	0.52400	...	0.5850	0.58500	0.58500	0.58500	0.58500	0.57300	0.57300	0.57300	0.57300	0.57300

5 rows × 506 columns

y_df = pd.DataFrame(data['target'])
y_df.head()

	0
0	24.0
1	21.6
2	34.7
3	33.4
4	36.2
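Later cells plot the target with y=0 on x_df.join(y_df). The column is called 0 because pd.DataFrame on a 1-D array names its single column 0. A small sketch using a hypothetical three-row stand-in for the dataset dict:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataset dict: 3 samples, 2 features
bunch = {
    "data": np.array([[0.1, 6.5], [0.2, 6.4], [0.3, 7.2]]),
    "feature_names": ["CRIM", "RM"],
    "target": np.array([24.0, 21.6, 34.7]),
}

x_demo = pd.DataFrame(bunch["data"], columns=bunch["feature_names"])
y_demo = pd.DataFrame(bunch["target"])  # single column, automatically named 0

# join() aligns on the shared 0..n-1 index and appends the target column
full = x_demo.join(y_demo)
print(full.columns.tolist())  # ['CRIM', 'RM', 0]
```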

2.1 Basic elements of a chart

%matplotlib inline 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns


plt.plot(x_df)

[<matplotlib.lines.Line2D at 0x1e844f9f2c8>,
 <matplotlib.lines.Line2D at 0x1e844f9f448>,
 <matplotlib.lines.Line2D at 0x1e844f9f608>,
 <matplotlib.lines.Line2D at 0x1e844f9f7c8>,
 <matplotlib.lines.Line2D at 0x1e844f9fa08>,
 <matplotlib.lines.Line2D at 0x1e844f9fc88>,
 <matplotlib.lines.Line2D at 0x1e844f9fe88>,
 <matplotlib.lines.Line2D at 0x1e844fa5108>,
 <matplotlib.lines.Line2D at 0x1e844f9f988>,
 <matplotlib.lines.Line2D at 0x1e844f9fc08>,
 <matplotlib.lines.Line2D at 0x1e844f7b208>,
 <matplotlib.lines.Line2D at 0x1e844fa5948>,
 <matplotlib.lines.Line2D at 0x1e844fa5b88>]


plt.plot(np.linspace(1,10,50),np.sin(np.linspace(1,10,50)))

[<matplotlib.lines.Line2D at 0x1e845024a48>]


Title
x-axis label
y-axis label
Legend
x-axis limits
y-axis limits
x ticks
y ticks
x tick labels
y tick labels
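Each element in the list above maps to one Axes method in the object-oriented API. A sketch on synthetic data (the data and label texts are made up for illustration; Agg is used so it runs headless):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(20)
fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(x, x ** 0.5, label="sqrt")                      # data, labeled for the legend
ax.set_title("title")                                   # chart title
ax.set_xlabel("index")                                  # x-axis label
ax.set_ylabel("value")                                  # y-axis label
ax.legend(loc="upper right")                            # legend
ax.set_xlim(0, 20)                                      # x-axis limits
ax.set_ylim(0, 5)                                       # y-axis limits
ax.set_xticks(range(0, 21, 5))                          # x tick positions: 0, 5, ..., 20
ax.set_yticks([0, 2.5, 5])                              # y tick positions
ax.set_xticklabels(["a", "b", "c", "d", "e"])           # x tick labels, one per tick
ax.set_yticklabels([f"{v:.2f}" for v in (0, 2.5, 5)])   # y tick labels
```

Setting the tick positions before the tick labels keeps the number of labels matched to the number of ticks.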

data_df = pd.DataFrame(x_df[['AGE', 'RM']])  # keep only the AGE and RM columns
data_df.head()

ax = data_df.plot(figsize=(9, 6))  # figsize: width and height of the figure in inches; DataFrame.plot returns an Axes

# plt.title('age and rm')  # chart title
# plt.xlabel('index')  # x-axis label
# plt.ylabel('value')  # y-axis label

# plt.legend(loc='upper right')  # show the legend; loc sets its position
# plt.xlim([0, 20])  # x-axis limits
# plt.ylim([0, max(x_df['AGE'])])  # y-axis limits

# plt.xticks(range(1, 21, 5))  # x tick positions
# plt.yticks(range(int(np.min(data_df.values)) - 5, int(np.max(data_df.values)) + 5, 10))  # y tick positions

# ax.set_xticklabels(list('abcd'))  # x tick labels (arbitrary placeholder text, one per tick)
# ax.set_yticklabels(["%.2f" % i for i in [0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]])  # y tick labels


2.2 Chart styles and annotations

linestyle
color
marker
style (a shorthand string combining linestyle, marker and color)
alpha
colormap  # colormaps bundled with Matplotlib
grid
text
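A sketch of how the style options above relate: a format string such as "r--o" is shorthand for color='r', linestyle='--' and marker='o', and the keyword form also accepts alpha. The random data is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

y = np.random.default_rng(0).integers(0, 100, size=20)
fig, ax = plt.subplots()

# Shorthand format string: color 'r', linestyle '--', marker 'o'
(short,) = ax.plot(y, "r--o")
# The same styling spelled out with keyword arguments, plus transparency
(explicit,) = ax.plot(y + 5, color="r", linestyle="--", marker="o", alpha=0.5)

ax.grid(True)                                   # background grid
ax.text(0, y[0], "first point", fontsize=12)    # free text at data coordinates (x, y)
```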

help(plt.plot)

df = x_df['AGE'][0:20]
df.plot(linestyle = '--',
       marker = 'o',
       color="r",
      grid=True)

<matplotlib.axes._subplots.AxesSubplot at 0x1e845117f08>


# df.plot(style="--.")
# df.plot(style='o')
# df.plot(style="r")
x_df[0:20].plot(colormap='Dark2_r')  # appending _r to a colormap name (e.g. Dark2_r) gives the reversed version of that colormap

<matplotlib.axes._subplots.AxesSubplot at 0x1a28a57790>

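A colormap is a callable that maps a float in [0, 1] to an RGBA tuple, so "reversed" can be checked directly: the first color of Dark2 is the last color of Dark2_r. A minimal sketch:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

cmap = plt.get_cmap("Dark2")
cmap_r = plt.get_cmap("Dark2_r")

# Each call returns an RGBA tuple; the _r variant reads the color list backwards
start, start_reversed = cmap(0.0), cmap_r(1.0)
```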

cmaps = [('Perceptually Uniform Sequential', [
            'viridis', 'plasma', 'inferno', 'magma', 'cividis']),
         ('Sequential', [
            'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds',
            'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu',
            'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']),
         ('Sequential (2)', [
            'binary', 'gist_yarg', 'gist_gray', 'gray', 'bone', 'pink',
            'spring', 'summer', 'autumn', 'winter', 'cool', 'Wistia',
            'hot', 'afmhot', 'gist_heat', 'copper']),
         ('Diverging', [
            'PiYG', 'PRGn', 'BrBG', 'PuOr', 'RdGy', 'RdBu',
            'RdYlBu', 'RdYlGn', 'Spectral', 'coolwarm', 'bwr', 'seismic']),
         ('Cyclic', ['twilight', 'twilight_shifted', 'hsv']),
         ('Qualitative', [
            'Pastel1', 'Pastel2', 'Paired', 'Accent',
            'Dark2', 'Set1', 'Set2', 'Set3',
            'tab10', 'tab20', 'tab20b', 'tab20c']),
         ('Miscellaneous', [
            'flag', 'prism', 'ocean', 'gist_earth', 'terrain', 'gist_stern',
            'gnuplot', 'gnuplot2', 'CMRmap', 'cubehelix', 'brg',
            'gist_rainbow', 'rainbow', 'jet', 'nipy_spectral', 'gist_ncar'])]

gradient = np.linspace(0, 1, 256)
gradient = np.vstack((gradient, gradient))
# np.vstack stacks arrays vertically (row by row) into a new array; here it yields a 2 x 256 image

def plot_color_gradients(cmap_category, cmap_list):
    nrows = len(cmap_list)
    figh = 0.35 + 0.15 + (nrows + (nrows-1)*0.1)*0.22
    fig, axes = plt.subplots(nrows=nrows, figsize=(6.4, figh))
    fig.subplots_adjust(top=1-.35/figh, bottom=.15/figh, left=0.2, right=0.99)

    axes[0].set_title(cmap_category + ' colormaps', fontsize=14)

    for ax, name in zip(axes, cmap_list):
        ax.imshow(gradient, aspect='auto', cmap=plt.get_cmap(name))
        ax.text(-.01, .5, name, va='center', ha='right', fontsize=10,
                transform=ax.transAxes)

    for ax in axes:
        ax.set_axis_off()


for cmap_category, cmap_list in cmaps:
    plot_color_gradients(cmap_category, cmap_list)


df.plot(style = 'o')
plt.plot(df.argmax(),df.max(),marker = 'o',color = 'r')
plt.text(df.argmax(),max(df),'max_age',fontsize=12)

Text(8, 100.0, 'max_age')

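The cell above works because df = x_df['AGE'][0:20] has the index 0..19, so the position returned by argmax equals the x value pandas uses on the plot. For a slice such as x_df['AGE'][20:40] the two differ, and idxmax (the index label) is the one to annotate at. A sketch with a small hypothetical Series:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical slice whose index does not start at 0
s = pd.Series([10.0, 40.0, 25.0], index=[20, 21, 22])

pos = s.argmax()    # 1: position within the Series
label = s.idxmax()  # 21: index label, i.e. the x value used by s.plot()

ax = s.plot(style="o")
ax.plot(label, s.max(), marker="o", color="r")
ax.annotate("max", xy=(label, s.max()), xytext=(label, s.max() + 5),
            arrowprops=dict(arrowstyle="->"))
```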


2.3 Subplots

help(plt.figure)

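fig_1 = plt.figure(num=1, figsize=(8, 6))
plt.plot(df, 'r--')
fig_1 = plt.figure(num=1, figsize=(8, 6))  # num=1 again: reactivates figure 1 instead of creating a new one
plt.plot(x_df['AGE'][20:40])  # so this line is drawn on figure 1 as well
fig_2 = plt.figure(num=2, figsize=(8, 6))  # a new number creates a separate figure
plt.plot(x_df['AGE'][40:60])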

[<matplotlib.lines.Line2D at 0x1a27d7d3d0>]

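The num argument works as a figure identifier: asking for an existing number returns that same Figure object, while a new number creates a new figure. A quick check (figures are closed first so the numbering starts clean):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

plt.close("all")  # start with no open figures

f1 = plt.figure(num=1, figsize=(8, 6))
f1_again = plt.figure(num=1)   # same number: the existing figure is reactivated
f2 = plt.figure(num=2)         # new number: a second, separate figure

print(f1 is f1_again, f1 is f2)  # True False
```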

help(plt.subplots)

fig, axes = plt.subplots(2, 3, figsize=(10, 4))  # create a new figure and return a NumPy array of Axes objects (2 rows x 3 columns)


fig


axes

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001E845124B88>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E8451A4F08>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E8451DF748>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000001E845215E88>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E8452ADD88>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E8452E6CC8>]],
      dtype=object)

ax1 = axes[0,2]
ax1.plot(df)
fig


fig, axes = plt.subplots(2, 3, figsize=(10, 4), sharex=True, sharey=True)  # all subplots share the same x and y axes

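With sharex/sharey, the subplots are linked: changing the limits of one changes all of them. The axes array is indexed as axes[row, col]. A small sketch on made-up data:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 3, figsize=(10, 4), sharex=True, sharey=True)
print(axes.shape)  # (2, 3)

axes[0, 2].plot(np.arange(10), np.arange(10) ** 2)  # draw into row 0, column 2
axes[1, 0].set_xlim(0, 5)  # with sharex=True every subplot gets the same x limits
```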

df_4 = x_df[['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX']]
df_4.plot(style = '-',alpha = 0.4,figsize = (20,8),
       subplots = True,
       layout = (1,5),
        sharey = True)
plt.subplots_adjust(wspace=0,hspace=0.2)
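With subplots=True, DataFrame.plot draws one column per subplot and returns the Axes as a NumPy array shaped by layout, so individual panels can still be adjusted afterwards. A sketch with a hypothetical five-column DataFrame:

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 5)), columns=list("ABCDE"))  # hypothetical data

# One column per subplot; layout=(rows, cols) arranges the panels
axes = demo.plot(subplots=True, layout=(1, 5), sharey=True, figsize=(20, 4))
print(axes.shape)  # (1, 5)
```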

x_df

	CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT
0	0.00632	18.0	2.31	0.0	0.538	6.575	65.2	4.0900	1.0	296.0	15.3	396.90	4.98
1	0.02731	0.0	7.07	0.0	0.469	6.421	78.9	4.9671	2.0	242.0	17.8	396.90	9.14
2	0.02729	0.0	7.07	0.0	0.469	7.185	61.1	4.9671	2.0	242.0	17.8	392.83	4.03
3	0.03237	0.0	2.18	0.0	0.458	6.998	45.8	6.0622	3.0	222.0	18.7	394.63	2.94
4	0.06905	0.0	2.18	0.0	0.458	7.147	54.2	6.0622	3.0	222.0	18.7	396.90	5.33
...	...	...	...	...	...	...	...	...	...	...	...	...	...
501	0.06263	0.0	11.93	0.0	0.573	6.593	69.1	2.4786	1.0	273.0	21.0	391.99	9.67
502	0.04527	0.0	11.93	0.0	0.573	6.120	76.7	2.2875	1.0	273.0	21.0	396.90	9.08
503	0.06076	0.0	11.93	0.0	0.573	6.976	91.0	2.1675	1.0	273.0	21.0	396.90	5.64
504	0.10959	0.0	11.93	0.0	0.573	6.794	89.3	2.3889	1.0	273.0	21.0	393.45	6.48
505	0.04741	0.0	11.93	0.0	0.573	6.030	80.8	2.5050	1.0	273.0	21.0	396.90	7.88

506 rows × 13 columns

df_4.plot(style = '-',alpha = 0.4,figsize = (20,8),
       subplots = False,
       layout = (1,5),
        sharey = True)
plt.subplots_adjust(wspace=0,hspace=0)


3. Distribution data


x_df.head()

	CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT
0	0.00632	18.0	2.31	0.0	0.538	6.575	65.2	4.0900	1.0	296.0	15.3	396.90	4.98
1	0.02731	0.0	7.07	0.0	0.469	6.421	78.9	4.9671	2.0	242.0	17.8	396.90	9.14
2	0.02729	0.0	7.07	0.0	0.469	7.185	61.1	4.9671	2.0	242.0	17.8	392.83	4.03
3	0.03237	0.0	2.18	0.0	0.458	6.998	45.8	6.0622	3.0	222.0	18.7	394.63	2.94
4	0.06905	0.0	2.18	0.0	0.458	7.147	54.2	6.0622	3.0	222.0	18.7	396.90	5.33

y_df.head()

	0
0	24.0
1	21.6
2	34.7
3	33.4
4	36.2

3.1 Histograms

A histogram shows how the values of a variable are distributed.

3.1.1 matplotlib

plt.hist(x_df['AGE'])

(array([ 14.,  31.,  29.,  42.,  32.,  38.,  39.,  42.,  71., 168.]),
 array([  2.9 ,  12.61,  22.32,  32.03,  41.74,  51.45,  61.16,  70.87,
         80.58,  90.29, 100.  ]),
 <a list of 10 Patch objects>)

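plt.hist returns three things: the count in each bin, the bin edges (one more edge than bins), and the drawn patches. The default bins are evenly spaced with width (max - min) / bins, which explains the AGE output above: (100 - 2.9) / 10 = 9.71, giving edges 2.9, 12.61, 22.32, and so on. The same arithmetic on toy data:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

values = np.array([1, 2, 2, 3, 3, 3])  # toy data

# plt.hist returns (counts per bin, bin edges, drawn patches)
counts, edges, patches = plt.hist(values, bins=3)

# Edges are evenly spaced: width = (max - min) / bins, so there are bins + 1 edges
width = (values.max() - values.min()) / 3
print(counts)  # [1. 2. 3.]
```

The last bin includes its right edge, which is why the three 3s land in the final bin.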

3.1.2 seaborn

sns.distplot(x_df['AGE'])  # note: distplot is deprecated since seaborn 0.11; newer code uses sns.histplot or sns.displot

<matplotlib.axes._subplots.AxesSubplot at 0x1e84721ce08>


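sns.distplot(x_df['AGE'],
             bins=10,         # bins: number of bins
             hist=True,       # hist, kde: whether to draw the histogram / the density curve
             kde=True,
             norm_hist=True,  # norm_hist: show density instead of counts on the histogram
             rug=True,        # rug: draw a small tick for every observation
             vertical=False,  # vertical: if True, put the values on the y-axis
             color='y',       # color: plot color
             axlabel='x')     # axlabel: x-axis label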

<matplotlib.axes._subplots.AxesSubplot at 0x1a314d2f10>

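What norm_hist=True means numerically: each bar height becomes count / (total count x bin width), so the bar areas add up to 1 and the histogram sits on the same scale as the density curve. A sketch with NumPy on hypothetical uniform data standing in for AGE:

```python
import numpy as np

rng = np.random.default_rng(42)
ages = rng.uniform(0, 100, size=500)  # hypothetical stand-in for the AGE column

counts, edges = np.histogram(ages, bins=10)             # raw counts
density, _ = np.histogram(ages, bins=10, density=True)  # normalized bar heights

widths = np.diff(edges)
area = np.sum(density * widths)  # total area of a density histogram is 1
```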

sns.distplot(x_df['AGE'],
             rug=True,
             rug_kws={'color': 'g'},  # rug color
             kde_kws={"color": "k", "lw": 1, "label": "AGE", 'linestyle': '--'},  # density curve: color, line width, legend label, line style
             hist_kws={"histtype": "step", "linewidth": 1, "alpha": 1, "color": "g"})  # histogram: bar style, line width, opacity, color

<matplotlib.axes._subplots.AxesSubplot at 0x1a32621d90>


3.1.3 Density plots

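sns.kdeplot(x_df['AGE'], x_df['RM'],  # two variables -> bivariate KDE (pre-0.11 seaborn API; newer versions expect x=, y= and fill= instead of shade=)
            cbar=True,           # show a colorbar
            shade=True,          # fill the contours
            cmap='Reds',         # colormap
            shade_lowest=False,  # leave the lowest (outermost) density level unfilled
            n_levels=10          # number of contour levels (more levels look smoother)
            )
# Bivariate density plot of two variables; color encodes density, fading away from the peak

sns.rugplot(x_df['AGE'], color="y", axis='x', alpha=0.5)
sns.rugplot(x_df['RM'], color="g", axis='y', alpha=0.5)
# Note: assign each rug to the correct axis (x or y)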

<matplotlib.axes._subplots.AxesSubplot at 0x1a320a8990>

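Conceptually, a bivariate density plot fits a 2-D kernel density estimate and draws filled contours of it. The sketch below swaps in scipy.stats.gaussian_kde (scipy is already imported in section 2.1) on hypothetical correlated data; seaborn's own bandwidth and level choices may differ, so treat this as an approximation of what kdeplot shows, not its implementation:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=300)                        # hypothetical stand-ins for AGE and RM
y = 0.5 * x + rng.normal(scale=0.5, size=300)

kde = stats.gaussian_kde(np.vstack([x, y]))     # fit a 2-D Gaussian KDE

# Evaluate the density on a regular grid
xs = np.linspace(-5, 5, 200)
ys = np.linspace(-5, 5, 200)
gx, gy = np.meshgrid(xs, ys)
z = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

fig, ax = plt.subplots()
cs = ax.contourf(gx, gy, z, levels=10, cmap="Reds")  # filled contours, similar to shade=True
fig.colorbar(cs)                                     # similar to cbar=True

cell = (xs[1] - xs[0]) * (ys[1] - ys[0])
total = z.sum() * cell  # numerical integral over the grid, close to 1 for a density
```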

sns.kdeplot(x_df['AGE'][0:200],x_df['RM'][0:200],cmap = 'Greens',
            shade = True,shade_lowest=False)
sns.kdeplot(x_df['AGE'][200:400],x_df['RM'][200:400],cmap = 'Blues',
            shade = True,shade_lowest=False)
# add rug plots along both axes
sns.rugplot(x_df['AGE'][0:400], color="g", axis='x',alpha = 0.5)
sns.rugplot(x_df['RM'][0:400], color="r", axis='y',alpha = 0.5)

<matplotlib.axes._subplots.AxesSubplot at 0x1a3254af50>


3.2 Scatter plots

3.2.1 matplotlib

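plt.scatter(range(0, y_df.shape[0]),
            x_df['AGE'],
            marker='.',
            s=(y_df[0] - y_df[0].min() + 1) * 10,  # s: marker area, shifted so every size is positive (negative sizes are not drawn)
            c=y_df[0],      # color each point by price; cmap only takes effect when c is numeric
            cmap='Reds_r',
            alpha=1)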

<matplotlib.collections.PathCollection at 0x1a33592390>

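In plt.scatter, s is the marker area in points squared and must not be negative, and cmap only matters when c carries numeric values. A common pattern is to min-max scale the variable into a fixed size range. A sketch on hypothetical price/age data:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
price = rng.uniform(5, 50, size=100)  # hypothetical target values
age = rng.uniform(0, 100, size=100)

# Rescale price linearly into marker areas between 10 and 100 points^2
sizes = 10 + 90 * (price - price.min()) / (price.max() - price.min())

fig, ax = plt.subplots()
sc = ax.scatter(np.arange(100), age, s=sizes, c=price, cmap="Reds_r", alpha=0.8)
fig.colorbar(sc, label="price")  # the colorbar shows the c -> color mapping
```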

3.2.2 seaborn

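sns.jointplot(x=range(0, y_df.shape[0]), y=x_df['AGE'],  # x and y data; Series names become the axis labels
              data=x_df,   # data source
              s=(y_df[0] - y_df[0].min() + 1) * 10,  # marker size (must be positive)
              edgecolor="w", linewidth=1,  # marker edge color and width (scatter only)
              kind='scatter',   # plot type: "scatter", "reg", "resid", "kde", "hex"
              space=0.2,  # gap between the joint plot and the marginal plots
              height=8,   # figure size (the figure is square); this parameter was formerly called size
              ratio=5,    # ratio of joint-axes height to marginal-axes height (integer)
              marginal_kws=dict(bins=15, rug=True)  # marginal histograms: number of bins, whether to add a rug
              )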

<seaborn.axisgrid.JointGrid at 0x1a343e4c10>


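sns.jointplot(x=x_df['LSTAT'], y=x_df['AGE'],  # x and y data; Series names become the axis labels
              data=x_df,   # data source
              s=(y_df[0] - y_df[0].min() + 1) * 10,  # marker size (must be positive)
              edgecolor="w", linewidth=1,  # marker edge color and width (scatter only)
              kind='scatter',   # plot type: "scatter", "reg", "resid", "kde", "hex"
              marginal_kws=dict(bins=15, rug=True)  # marginal histograms: number of bins, whether to add a rug
              )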

<seaborn.axisgrid.JointGrid at 0x1a34bb3d10>

with sns.axes_style("white"):
    sns.jointplot(x=x_df['LSTAT'], y=x_df['AGE'],data = x_df, kind="hex", color="g",
                 marginal_kws=dict(bins=20))


g = sns.jointplot(x=x_df['LSTAT'], y=x_df['AGE'], data=x_df,
                  kind="kde", color="k",
                  shade_lowest=False)
# draw a bivariate density plot

g.plot_joint(plt.scatter, c="w", s=30, linewidth=1, marker="*")
# overlay the raw points as white star markers

<seaborn.axisgrid.JointGrid at 0x1a350b7cd0>


sns.set_style("white")
# set the plot style

g = sns.JointGrid(x='LSTAT', y='RM', data=x_df)
# create the grid of axes and bind the x and y columns
g.plot_joint(plt.scatter, color='m', edgecolor='white')  # main (joint) plot: scatter
g.ax_marg_x.hist(x_df['LSTAT'], color="b", alpha=.6)     # histogram on the top (x) margin
g.ax_marg_y.hist(x_df['RM'], color="r", alpha=.6,
                 orientation="horizontal")               # histogram on the right (y) margin; needs orientation="horizontal"

(array([  2.,   4.,  14.,  45., 177., 151.,  69.,  22.,  13.,   9.]),
 array([3.561 , 4.0829, 4.6048, 5.1267, 5.6486, 6.1705, 6.6924, 7.2143,
        7.7362, 8.2581, 8.78  ]),
 <a list of 10 Patch objects>)

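The JointGrid layout (a large joint Axes plus two marginal Axes) can also be built with plain matplotlib through a GridSpec, which helps when a plot needs customization seaborn does not expose. A sketch using hypothetical normal data in place of LSTAT and RM; the size ratios and spacing are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(12, 7, size=300)    # hypothetical stand-ins for LSTAT and RM
y = rng.normal(6.3, 0.7, size=300)

fig = plt.figure(figsize=(6, 6))
gs = fig.add_gridspec(2, 2, width_ratios=(5, 1), height_ratios=(1, 5),
                      hspace=0.05, wspace=0.05)
ax_joint = fig.add_subplot(gs[1, 0])                       # main scatter area
ax_marg_x = fig.add_subplot(gs[0, 0], sharex=ax_joint)     # top margin
ax_marg_y = fig.add_subplot(gs[1, 1], sharey=ax_joint)     # right margin

ax_joint.scatter(x, y, color="m", edgecolor="white")
ax_marg_x.hist(x, color="b", alpha=0.6)
ax_marg_y.hist(y, color="r", alpha=0.6, orientation="horizontal")

ax_marg_x.tick_params(labelbottom=False)  # hide tick labels duplicated by the shared axes
ax_marg_y.tick_params(labelleft=False)
```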

3.3 Scatter matrix

3.3.1 matplotlib

from sklearn.datasets import load_iris

iris = load_iris()
iris

{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
        [5.5, 4.2, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.2],
        [5. , 3.2, 1.2, 0.2],
        [5.5, 3.5, 1.3, 0.2],
        [4.9, 3.6, 1.4, 0.1],
        [4.4, 3. , 1.3, 0.2],
        [5.1, 3.4, 1.5, 0.2],
        [5. , 3.5, 1.3, 0.3],
        [4.5, 2.3, 1.3, 0.3],
        [4.4, 3.2, 1.3, 0.2],
        [5. , 3.5, 1.6, 0.6],
        [5.1, 3.8, 1.9, 0.4],
        [4.8, 3. , 1.4, 0.3],
        [5.1, 3.8, 1.6, 0.2],
        [4.6, 3.2, 1.4, 0.2],
        [5.3, 3.7, 1.5, 0.2],
        [5. , 3.3, 1.4, 0.2],
        [7. , 3.2, 4.7, 1.4],
        [6.4, 3.2, 4.5, 1.5],
        [6.9, 3.1, 4.9, 1.5],
        [5.5, 2.3, 4. , 1.3],
        [6.5, 2.8, 4.6, 1.5],
        [5.7, 2.8, 4.5, 1.3],
        [6.3, 3.3, 4.7, 1.6],
        [4.9, 2.4, 3.3, 1. ],
        [6.6, 2.9, 4.6, 1.3],
        [5.2, 2.7, 3.9, 1.4],
        [5. , 2. , 3.5, 1. ],
        [5.9, 3. , 4.2, 1.5],
        [6. , 2.2, 4. , 1. ],
        [6.1, 2.9, 4.7, 1.4],
        [5.6, 2.9, 3.6, 1.3],
        [6.7, 3.1, 4.4, 1.4],
        [5.6, 3. , 4.5, 1.5],
        [5.8, 2.7, 4.1, 1. ],
        [6.2, 2.2, 4.5, 1.5],
        [5.6, 2.5, 3.9, 1.1],
        [5.9, 3.2, 4.8, 1.8],
        [6.1, 2.8, 4. , 1.3],
        [6.3, 2.5, 4.9, 1.5],
        [6.1, 2.8, 4.7, 1.2],
        [6.4, 2.9, 4.3, 1.3],
        [6.6, 3. , 4.4, 1.4],
        [6.8, 2.8, 4.8, 1.4],
        [6.7, 3. , 5. , 1.7],
        [6. , 2.9, 4.5, 1.5],
        [5.7, 2.6, 3.5, 1. ],
        [5.5, 2.4, 3.8, 1.1],
        [5.5, 2.4, 3.7, 1. ],
        [5.8, 2.7, 3.9, 1.2],
        [6. , 2.7, 5.1, 1.6],
        [5.4, 3. , 4.5, 1.5],
        [6. , 3.4, 4.5, 1.6],
        [6.7, 3.1, 4.7, 1.5],
        [6.3, 2.3, 4.4, 1.3],
        [5.6, 3. , 4.1, 1.3],
        [5.5, 2.5, 4. , 1.3],
        [5.5, 2.6, 4.4, 1.2],
        [6.1, 3. , 4.6, 1.4],
        [5.8, 2.6, 4. , 1.2],
        [5. , 2.3, 3.3, 1. ],
        [5.6, 2.7, 4.2, 1.3],
        [5.7, 3. , 4.2, 1.2],
        [5.7, 2.9, 4.2, 1.3],
        [6.2, 2.9, 4.3, 1.3],
        [5.1, 2.5, 3. , 1.1],
        [5.7, 2.8, 4.1, 1.3],
        [6.3, 3.3, 6. , 2.5],
        [5.8, 2.7, 5.1, 1.9],
        [7.1, 3. , 5.9, 2.1],
        [6.3, 2.9, 5.6, 1.8],
        [6.5, 3. , 5.8, 2.2],
        [7.6, 3. , 6.6, 2.1],
        [4.9, 2.5, 4.5, 1.7],
        [7.3, 2.9, 6.3, 1.8],
        [6.7, 2.5, 5.8, 1.8],
        [7.2, 3.6, 6.1, 2.5],
        [6.5, 3.2, 5.1, 2. ],
        [6.4, 2.7, 5.3, 1.9],
        [6.8, 3. , 5.5, 2.1],
        [5.7, 2.5, 5. , 2. ],
        [5.8, 2.8, 5.1, 2.4],
        [6.4, 3.2, 5.3, 2.3],
        [6.5, 3. , 5.5, 1.8],
        [7.7, 3.8, 6.7, 2.2],
        [7.7, 2.6, 6.9, 2.3],
        [6. , 2.2, 5. , 1.5],
        [6.9, 3.2, 5.7, 2.3],
        [5.6, 2.8, 4.9, 2. ],
        [7.7, 2.8, 6.7, 2. ],
        [6.3, 2.7, 4.9, 1.8],
        [6.7, 3.3, 5.7, 2.1],
        [7.2, 3.2, 6. , 1.8],
        [6.2, 2.8, 4.8, 1.8],
        [6.1, 3. , 4.9, 1.8],
        [6.4, 2.8, 5.6, 2.1],
        [7.2, 3. , 5.8, 1.6],
        [7.4, 2.8, 6.1, 1.9],
        [7.9, 3.8, 6.4, 2. ],
        [6.4, 2.8, 5.6, 2.2],
        [6.3, 2.8, 5.1, 1.5],
        [6.1, 2.6, 5.6, 1.4],
        [7.7, 3. , 6.1, 2.3],
        [6.3, 3.4, 5.6, 2.4],
        [6.4, 3.1, 5.5, 1.8],
        [6. , 3. , 4.8, 1.8],
        [6.9, 3.1, 5.4, 2.1],
        [6.7, 3.1, 5.6, 2.4],
        [6.9, 3.1, 5.1, 2.3],
        [5.8, 2.7, 5.1, 1.9],
        [6.8, 3.2, 5.9, 2.3],
        [6.7, 3.3, 5.7, 2.5],
        [6.7, 3. , 5.2, 2.3],
        [6.3, 2.5, 5. , 1.9],
        [6.5, 3. , 5.2, 2. ],
        [6.2, 3.4, 5.4, 2.3],
        [5.9, 3. , 5.1, 1.8]]),
 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]),
 'target_names': array(['setosa', 'versicolor', 'virginica'], dtype='<U10'),
 'DESCR': '.. _iris_dataset:\n\nIris plants dataset\n--------------------\n\n**Data Set Characteristics:**\n\n    :Number of Instances: 150 (50 in each of three classes)\n    :Number of Attributes: 4 numeric, predictive attributes and the class\n    :Attribute Information:\n        - sepal length in cm\n        - sepal width in cm\n        - petal length in cm\n        - petal width in cm\n        - class:\n                - Iris-Setosa\n                - Iris-Versicolour\n                - Iris-Virginica\n                \n    :Summary Statistics:\n\n    ============== ==== ==== ======= ===== ====================\n                    Min  Max   Mean    SD   Class Correlation\n    ============== ==== ==== ======= ===== ====================\n    sepal length:   4.3  7.9   5.84   0.83    0.7826\n    sepal width:    2.0  4.4   3.05   0.43   -0.4194\n    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)\n    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)\n    ============== ==== ==== ======= ===== ====================\n\n    :Missing Attribute Values: None\n    :Class Distribution: 33.3% for each of 3 classes.\n    :Creator: R.A. Fisher\n    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\n    :Date: July, 1988\n\nThe famous Iris database, first used by Sir R.A. Fisher. The dataset is taken\nfrom Fisher\'s paper. Note that it\'s the same as in R, but not as in the UCI\nMachine Learning Repository, which has two wrong data points.\n\nThis is perhaps the best known database to be found in the\npattern recognition literature.  Fisher\'s paper is a classic in the field and\nis referenced frequently to this day.  (See Duda & Hart, for example.)  The\ndata set contains 3 classes of 50 instances each, where each class refers to a\ntype of iris plant.  One class is linearly separable from the other 2; the\nlatter are NOT linearly separable from each other.\n\n.. topic:: References\n\n   - Fisher, R.A. 
"The use of multiple measurements in taxonomic problems"\n     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to\n     Mathematical Statistics" (John Wiley, NY, 1950).\n   - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.\n     (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.\n   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System\n     Structure and Classification Rule for Recognition in Partially Exposed\n     Environments".  IEEE Transactions on Pattern Analysis and Machine\n     Intelligence, Vol. PAMI-2, No. 1, 67-71.\n   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions\n     on Information Theory, May 1972, 431-433.\n   - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II\n     conceptual clustering system finds 3 classes in the data.\n   - Many, many more ...',
 'feature_names': ['sepal length (cm)',
  'sepal width (cm)',
  'petal length (cm)',
  'petal width (cm)'],
 'filename': '/Users/edz/opt/anaconda3/lib/python3.7/site-packages/sklearn/datasets/data/iris.csv'}

iris_x = pd.DataFrame(iris['data'],columns=iris['feature_names'])
iris_x.head()

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)
0	5.1	3.5	1.4	0.2
1	4.9	3.0	1.4	0.2
2	4.7	3.2	1.3	0.2
3	4.6	3.1	1.5	0.2
4	5.0	3.6	1.4	0.2

iris_y = pd.DataFrame(iris['target'])
iris_y

	0
0	0
1	0
2	0
3	0
4	0
...	...
145	2
146	2
147	2
148	2
149	2

150 rows × 1 columns

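from pandas.plotting import scatter_matrix
scatter_matrix(iris_x, figsize=(10, 6),
               marker='o',
               diagonal='kde',      # diagonal: 'kde' (density curve) or 'hist'
               alpha=0.5,
               range_padding=0.5,   # extra margin around each axis range, relative to the data range
               cmap='summer')       # colormap names are lower case; only used if c= is also passed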

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x1a412669d0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1a4488cf90>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1a445d9f90>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1a45a3f850>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x1a3d349bd0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1a42c02890>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1a424ecd10>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1a54eab8d0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x1a54ebc450>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1a376e5dd0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1a53ad8c90>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1a4f219950>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x1a379d5cd0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1a3863b990>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1a51013d10>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1a376b99d0>]],
      dtype=object)

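scatter_matrix returns an n x n array of Axes, one row and one column per variable, with variable names on the outer labels. A sketch with a hypothetical four-column DataFrame; it uses diagonal='hist' because diagonal='kde' additionally requires SciPy:

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(3)
demo = pd.DataFrame(rng.normal(size=(60, 4)), columns=["a", "b", "c", "d"])  # hypothetical data

# Off-diagonal cells: pairwise scatter plots; diagonal cells: each variable's histogram
axes = scatter_matrix(demo, figsize=(8, 8), diagonal="hist", alpha=0.5)
print(axes.shape)  # (4, 4)
```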

3.3.2 seaborn

sns.pairplot(iris_x.join(iris_y),
             kind='reg',        # off-diagonal plots: {'scatter', 'reg'} (plain scatter or with a regression fit)
             diag_kind="kde",   # diagonal plots: {'hist', 'kde'}
             hue=0,             # color by this column (column 0 holds the class label)
             palette="husl",    # color palette
             markers=["o", "s", "D"],  # one marker style per hue level (three classes here)
             height=2,          # height of each facet in inches (formerly called size)
             )

<seaborn.axisgrid.PairGrid at 0x1a4bb45b90>


iris['feature_names']

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

sns.pairplot(iris_x.join(iris_y),vars=['sepal length (cm)', 'petal length (cm)'],
             kind = 'reg', diag_kind="kde", 
             hue=0, palette="husl")

<seaborn.axisgrid.PairGrid at 0x1a41860a90>


4. Visualizing categorical data

data['feature_names']

array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
       'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7')

4.1 Categorical scatter plots

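sns.stripplot(x="CHAS",         # x: the grouping (categorical) column
              y=0,              # y: the values whose distribution is shown
              # swapping x and y makes the strip plot horizontal
              data=x_df.join(y_df),  # data source
              jitter=True,      # jitter: spread out overlapping points; a width can also be given, e.g. jitter=0.1
              size=5, edgecolor='w', linewidth=1, marker='o'  # point size, edge color and width, marker style
              )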

<matplotlib.axes._subplots.AxesSubplot at 0x1a3d276910>

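The idea behind jitter is to add a small random horizontal offset to each point so that points sharing a category and a similar value no longer sit exactly on top of each other. The sketch below reproduces that idea with plain matplotlib and a uniform offset; the uniform distribution and the 0.1 width are assumptions for illustration, not a description of seaborn's internals. The data is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
demo = pd.DataFrame({
    "CHAS": rng.integers(0, 2, size=200),   # hypothetical 0/1 category
    "price": rng.uniform(5, 50, size=200),
})

codes = demo["CHAS"].to_numpy()
jitter = rng.uniform(-0.1, 0.1, size=len(demo))  # small random horizontal offsets

fig, ax = plt.subplots()
ax.scatter(codes + jitter, demo["price"], s=20, edgecolor="w", linewidth=1)
ax.set_xticks([0, 1])
ax.set_xlabel("CHAS")
```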


sns.stripplot(x="CHAS", 
              y=0,
              hue="RAD",
              data=x_df.join(y_df), 
              jitter=True)

<matplotlib.axes._subplots.AxesSubplot at 0x1a43defc10>


sns.stripplot(x="RAD", 
              y=0,
              hue="CHAS",
              data=x_df.join(y_df), 
              jitter=True,
              palette="Set2",  # color palette
              dodge=True,  # draw each hue level at its own offset instead of overlapping
             )

<matplotlib.axes._subplots.AxesSubplot at 0x1a43defa10>


# stripplot()
# Choosing which categories to show

print(x_df['RAD'].value_counts())
# count how many rows fall into each RAD value

sns.stripplot(x='RAD', y=0, data=x_df.join(y_df), jitter=True,
              order=[4.0, 5.0, 24.0])
# order: which categories to plot, and in which order

24.0    132
5.0     115
4.0     110
3.0      38
6.0      26
8.0      24
2.0      24
1.0      20
7.0      17
Name: RAD, dtype: int64





<matplotlib.axes._subplots.AxesSubplot at 0x1a394e9490>

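Only the categories listed in order are plotted, so the rows that actually appear can be checked in pandas with isin. The three values chosen above (4, 5 and 24) are also the three most frequent RAD values in the value_counts output. A sketch with a small hypothetical stand-in for x_df.join(y_df):

```python
import pandas as pd

# Hypothetical stand-in for x_df.join(y_df): a RAD column plus the target column 0
demo = pd.DataFrame({"RAD": [1.0, 4.0, 24.0, 5.0, 4.0, 8.0],
                     0: [24.0, 21.6, 15.0, 30.1, 22.2, 19.9]})

order = [4.0, 5.0, 24.0]
counts = demo["RAD"].value_counts()      # frequency of each category
subset = demo[demo["RAD"].isin(order)]   # rows belonging to the categories in order
```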