Data Visualization (1): Drawing Scatter Plots

Latest recommended article published 2024-10-16 20:41:28

枪枪枪

Category: Machine Learning

Article link: https://blog.csdn.net/az9996/article/details/86439755

======================================================================= Machine Learning notebook
Introduction to Machine Learning with Python (Chinese edition: Python机器学习基础教程)
https://github.com/amueller/introduction_to_ml_with_python/blob/master/01-introduction.ipynb

=======================================================================

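One of the best ways to inspect data is to visualize it, and one way to do that is with a scatter plot.
A scatter plot puts one feature on the x-axis and another on the y-axis, and draws each data point as a single dot. Unfortunately, a computer screen has only two dimensions, so we can plot only two features at a time (or possibly three, using a 3D Cartesian coordinate system).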
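As a minimal sketch of a single scatter plot, the snippet below draws two iris features against each other and colors the points by class. The choice of columns 2 and 3, the `viridis` colormap, the `Agg` backend, and the output file name are all assumptions made for illustration, not something the book prescribes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Columns 2 and 3 are petal length and petal width; the choice is arbitrary.
fig, ax = plt.subplots(figsize=(6, 5))
points = ax.scatter(X[:, 2], X[:, 3], c=y, cmap="viridis", s=40, alpha=0.8)
ax.set_xlabel(iris.feature_names[2])
ax.set_ylabel(iris.feature_names[3])

# Hypothetical file name; call plt.show() instead when working interactively.
fig.savefig("iris_two_features.png")
```

Only two of the four features are visible here, which is exactly the limitation discussed in the text.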
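Drawback:
It is hard to plot a dataset with more than three features this way.
Workaround:
One option is a scatter plot matrix (pair plot), which shows every pair of features. This is reasonable when the number of features is small. Keep in mind, though, that a pair plot cannot show the relationships among all features at once, so some interesting aspects of the data may stay hidden in this view.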
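To see why a pair plot only stays practical for a small number of features, it may help to count the panels. The sketch below lists the unique feature pairs for the four iris features (the names are typed in by hand here, matching `load_iris().feature_names`); the variable names are illustrative only.

```python
from itertools import combinations

# The four iris feature names, as returned by load_iris().feature_names
feature_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]

# Every unordered pair of distinct features gets its own scatter plot
pairs = list(combinations(feature_names, 2))
for x_name, y_name in pairs:
    print(x_name, "vs", y_name)

n = len(feature_names)
print("unique pairs:", len(pairs))   # n * (n - 1) / 2
print("panels in the full grid:", n * n)
```

With 4 features there are 6 unique pairs in a 4 x 4 grid. The count grows quadratically, so with 20 features there would already be 190 pairs.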

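First convert the NumPy array to a pandas DataFrame. pandas provides a function for drawing a scatter plot matrix, called scatter_matrix (available as pd.plotting.scatter_matrix). The diagonal of the matrix holds a histogram of each feature: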

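(The data comes from the load_iris function in scikit-learn's datasets module. train_test_split shuffles the dataset before splitting it at random. The original data is sorted by class, so shuffling is what lets both parts contain samples of every class; the split is not stratified unless the stratify parameter is passed. By default the training set gets 75% of the rows and the test set 25%.
To make sure that repeated runs give the same output, we use the random_state parameter to fix the seed of the random number generator. The function's output is then deterministic, so this line always produces the same result.)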
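The effect of random_state can be checked directly. The following sketch, using illustrative variable names, splits the iris data twice with the same seed and compares the results, then prints the sizes of the two parts and the per-class counts.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Two independent calls with the same seed
split_a = train_test_split(X, y, random_state=0)
split_b = train_test_split(X, y, random_state=0)

# True if every returned array is identical between the two calls
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("identical splits:", same)

X_train, X_test, y_train, y_test = split_a
print("train:", X_train.shape, "test:", X_test.shape)

# Class counts per part; without stratify these are only roughly balanced
print("train class counts:", np.bincount(y_train))
print("test class counts:", np.bincount(y_test))
```

With the default test size of 25%, the 150 iris samples are divided into 112 training rows and 38 test rows.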

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()

X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)

print("X_test shape:", X_test.shape)
print("y_test shape:", y_test.shape)

import mglearn
import pandas as pd

# Create a DataFrame from the data in X_train
# Label the columns using the strings in iris_dataset.feature_names
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)
# Create a scatter matrix from the DataFrame, colored by y_train
pd.plotting.scatter_matrix(iris_dataframe, c=y_train, figsize=(15, 15),
                           marker='o', hist_kwds={'bins': 20}, s=60,
                           alpha=.8, cmap=mglearn.cm3)

# Display the figure with matplotlib.pyplot
import matplotlib.pyplot as plt
plt.show()
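The code above depends on mglearn, the helper package that accompanies the book, only for its three-color colormap. If mglearn is not installed, a colormap built with matplotlib could be substituted. The sketch below is one such variant; the three hex colors, the `Agg` backend, and the output file name are arbitrary assumptions and do not reproduce mglearn.cm3 exactly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import ListedColormap
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

# Stand-in for mglearn.cm3: one arbitrary color per iris class
three_colors = ListedColormap(['#0000aa', '#ff2020', '#50ff50'])

iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)

# scatter_matrix returns a 2D array of Axes, one per panel
axes = pd.plotting.scatter_matrix(iris_dataframe, c=y_train, figsize=(15, 15),
                                  marker='o', hist_kwds={'bins': 20}, s=60,
                                  alpha=.8, cmap=three_colors)

# Hypothetical file name; call plt.show() instead when working interactively
plt.savefig("iris_scatter_matrix.png")
```

The returned array has one row and one column per feature, so individual panels can be adjusted afterwards, for example `axes[0, 1].set_title(...)`.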