A Beginner's Day 1: Data Visualization (Basic Scatter and Line Plots)

优化采样

Last modified: 2024-05-22 12:50:01

Tags: information visualization

First published: 2024-05-22 12:47:05

Article link: https://blog.csdn.net/qq_41282884/article/details/139092576

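Chart types

You may want to study the effect of a single variable or of several variables. Depending on how the variables relate, for example whether two variables influence each other or are meant to be compared, charts fall into the following four types:

Comparison: line charts, showing how different variables change with respect to the same variable
Relationship: scatter plots, showing the relationship between variables
Composition: pie charts
Distribution: showing what kind of distribution the data follows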
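
As a rough illustration of the four types, the sketch below draws one small panel for each. The function name `plot_four_view_types`, the made-up data, and the choice of a histogram for the distribution type are assumptions for the example, not part of the original notes.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_four_view_types(seed=0):
    """Draw one small example panel for each of the four chart types."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 10, 50)
    fig, axes = plt.subplots(2, 2, figsize=(8, 6))

    # Comparison: two quantities plotted against the same x variable
    axes[0, 0].plot(x, np.sin(x), label='sin')
    axes[0, 0].plot(x, np.cos(x), label='cos')
    axes[0, 0].legend()
    axes[0, 0].set_title('comparison: line chart')

    # Relationship: how one variable changes with another (made-up data)
    a = rng.normal(size=100)
    b = 2 * a + rng.normal(scale=0.5, size=100)
    axes[0, 1].scatter(a, b, s=10)
    axes[0, 1].set_title('relationship: scatter plot')

    # Composition: shares of a whole (made-up values)
    axes[1, 0].pie([50, 30, 20], labels=['A', 'B', 'C'])
    axes[1, 0].set_title('composition: pie chart')

    # Distribution: shape of a single variable (histogram is an assumed choice)
    axes[1, 1].hist(rng.normal(size=500), bins=30)
    axes[1, 1].set_title('distribution: histogram')

    fig.tight_layout()
    return fig
```

Calling `plot_four_view_types()` followed by `plt.show()` displays the four panels side by side.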

Visualization tools

The matplotlib library

Plotting is usually done with its pyplot module.

import matplotlib.pyplot as plt

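The seaborn package

Provides higher-level plotting for data analysis and can draw scatter plots, box plots, heatmaps, and more. It ships with some example datasets and can plot directly from a DataFrame.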

import seaborn as sns
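
To make "plotting directly from a DataFrame" concrete, here is a minimal sketch that draws a box plot and a heatmap from one DataFrame. The column names (`group`, `value`, `other`), the random data, and the helper name `plot_box_and_heatmap` are invented for the example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def plot_box_and_heatmap(seed=0):
    """Draw a box plot and a correlation heatmap straight from a DataFrame."""
    rng = np.random.default_rng(seed)
    # Made-up data: three groups with 50 values each
    df = pd.DataFrame({
        'group': np.repeat(['a', 'b', 'c'], 50),
        'value': rng.normal(size=150),
        'other': rng.normal(size=150),
    })

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
    # Columns are referred to by name; seaborn looks them up in df
    sns.boxplot(data=df, x='group', y='value', ax=ax1)
    # heatmap takes a 2-D table, here the correlation matrix of two columns
    sns.heatmap(df[['value', 'other']].corr(), annot=True, ax=ax2)
    fig.tight_layout()
    return fig
```

Passing `ax=` tells seaborn which matplotlib axes to draw on, which is what lets the two libraries share a figure.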

Comparing plots from the two libraries:

Scatter plots (sns.jointplot also shows the distributions)

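# Imports and sample data (x and y are not defined in the original; random values are assumed here)
import numpy as np
import pandas as pd
x=np.random.randn(100)
y=np.random.randn(100)

# Scatter plots with matplotlib
fig,(ax1,ax2,ax3)=plt.subplots(1,3,figsize=(12,4))
ax1.scatter(x,y,marker='*')
ax1.set_title('pyplot')  # plt.title would title the current axes (ax3), not ax1
ax2.scatter(x,y,marker='^')
ax2.set_title('pyplot')

# Scatter plot with seaborn, drawn on the third axes
sns.scatterplot(x=x,y=y,ax=ax3)
ax3.set_title('seaborn')
plt.tight_layout()  # adjust the layout
plt.show()

# jointplot also shows the distribution of each variable
df=pd.DataFrame({'x':x,'y':y})  # build a DataFrame from the sample data
sns.jointplot(x="x",y="y",data=df,kind='scatter',height=3)  # explore the relationship between two variables and their individual distributions
plt.show()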

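Basic line plots:

Comparing plt and seaborn shows that when the generated x values are random (unsorted), sns.lineplot sorts the data by x before drawing (its sort parameter defaults to True), whereas plt.plot connects the points in the order they are given.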

# Line plots: seaborn (left) vs. matplotlib (right)
fig3,(ax1,ax2)=plt.subplots(1,2,figsize=(8,4))
x1=np.random.randn(100)
# With x1=np.linspace(-1,1,100) (already sorted), both methods give the same result.
y1=np.sin(x1)
df1=pd.DataFrame({'x':x1,'y':y1})
sns.lineplot(x='x',y='y',data=df1,ax=ax1)  # sorted by x before drawing
ax2.plot(x1,y1)  # points connected in the order given
plt.tight_layout()
plt.show()
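
To get a clean curve from matplotlib with unsorted x values, the points can be sorted by x before plotting. A minimal sketch using np.argsort (the helper name `plot_sorted_line` is made up for this example):

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_sorted_line(x, y, ax=None):
    """Plot y against x with matplotlib after sorting the points by x."""
    x = np.asarray(x)
    y = np.asarray(y)
    order = np.argsort(x)  # indices that put x in ascending order
    if ax is None:
        _, ax = plt.subplots()
    # Apply the same ordering to x and y so each pair stays together
    ax.plot(x[order], y[order])
    return ax
```

For example, `plot_sorted_line(x1, np.sin(x1))` with random `x1` draws a smooth sine curve instead of a zigzag.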

Line plot with error (using seaborn here)

# Generate the test data first
x=(np.random.randn(400))*2
y=np.sin(x)
err=0.1*np.random.rand(len(x))  # uniform noise in [0, 0.1)
y_real=y+err
df1=pd.DataFrame({'x': x,'y':y})
df2=pd.DataFrame({'x':x,'y':y_real})
fig1,axes=plt.subplots(1,2,figsize=(8,4))
sns.lineplot(data=df1,x='x',y='y',ax=axes[0])
axes[0].set_title('y_desired')
sns.lineplot(data=df2,x='x',y='y',ax=axes[1])
axes[1].set_title('y_real')
plt.tight_layout()
plt.show()

Plotting multiple lines that share the same x variable

x = np.linspace(0, 10, 200)
err=0.1*np.random.rand(len(x))
y_sin = np.sin(x)+err
y_cos = np.cos(x)+err

# Create the category labels while merging the data
category=['sin']*len(y_sin)+['cos']*len(y_cos)
x_total=np.concatenate([x,x])  # concatenate the two datasets
y_total=np.concatenate([y_sin,y_cos])
df3=pd.DataFrame({'x':x_total,'y':y_total,'category':category})

# Plot
sns.lineplot(data=df3,x='x',y='y',hue='category')
plt.show()

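Merge the datasets with np.concatenate.
Then use category to label every row (data that already comes as a DataFrame usually has its own labels).
The hue parameter of sns.lineplot then draws a line in a different color for each category.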