Visualizing linear relationships in data
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
np.random.seed(sum(map(ord, "regression")))
tips = sns.load_dataset("tips")
tips.head()
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
Linear regression models
# sns.regplot?
sns.regplot(x="total_bill", y="tip", data=tips);
sns.lmplot(x="total_bill", y="tip", data=tips);
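**sns.regplot() vs sns.lmplot()**
- Similarities:
  - Both draw a scatter plot of two variables, fit a regression line, and shade its 95% confidence interval.
  - Both share the same core plotting code.
- Differences:
  - The resulting figures differ in shape: regplot() draws onto a single matplotlib Axes, while lmplot() builds its own figure-level grid.
  - They accept different arguments:
    - regplot() accepts x and y in several forms: NumPy arrays, pandas Series, or names of columns in a DataFrame passed as data.
    - lmplot() requires data to be a DataFrame, and x and y must be strings, i.e. the DataFrame column names in quotes.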
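The difference in accepted inputs can be sketched as follows. This is a minimal illustration on synthetic data: the arrays `x` and `y` and the frame `df` are made up for the example and are not part of the tips dataset, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + rng.normal(0, 1, size=50)
df = pd.DataFrame({"x": x, "y": y})

# regplot takes the arrays directly and returns a matplotlib Axes
ax = sns.regplot(x=x, y=y)

# lmplot needs a DataFrame plus quoted column names and returns a FacetGrid
grid = sns.lmplot(x="x", y="y", data=df)

print(type(ax).__name__, type(grid).__name__)
```

Because regplot() returns an Axes, it can be dropped into an existing matplotlib layout, whereas the FacetGrid returned by lmplot() owns its whole figure.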
sns.lmplot(x="size", y="tip", data=tips, x_jitter=.25);
sns.lmplot(x="size", y="tip", data=tips, x_estimator=np.mean);
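As a rough sketch of what `x_estimator=np.mean` does: for each distinct value of `x`, the `y` values are collapsed to a single estimate, and that estimate is plotted instead of the raw points. The snippet below reproduces only this grouping step with NumPy, using the five rows of `tips` shown earlier. It is not seaborn's own code, and the confidence interval that seaborn draws around each estimate is left out.

```python
import numpy as np

# "size" and "tip" from the first five rows of the tips table shown earlier
size = np.array([2, 3, 3, 2, 4])
tip = np.array([1.01, 1.66, 3.50, 3.31, 3.61])

# One estimate per unique x value: the mean tip for each party size
means = {int(s): float(tip[size == s].mean()) for s in np.unique(size)}

for s, m in means.items():
    print(s, round(m, 2))
```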
Fitting different kinds of models
anscombe = sns.load_dataset("anscombe")
anscombe.head()
| | dataset | x | y |
|---|---|---|---|
| 0 | I | 10.0 | 8.04 |
| 1 | I | 8.0 | 6.95 |
| 2 | I | 13.0 | 7.58 |
| 3 | I | 9.0 | 8.81 |
| 4 | I | 11.0 | 8.33 |
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'I'"), ci=None, scatter_kws={"s": 120});
sns.lmplot(x="x", y="y", data=anscombe, ci=None, scatter_kws={"s": 120}, col="dataset");
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"), order=2, ci=None, scatter_kws={"s": 80});
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'III'"), robust=True, ci=None, scatter_kws={"s": 80});
tips["big_tip"] = (tips.tip / tips.total_bill) > .15  # True when the tip exceeds 15% of the total bill
sns.lmplot(x="total_bill", y="big_tip", data=tips, y_jitter=.03);
# y="big_tip" is a binary variable, so we can fit a logistic regression
sns.lmplot(x="total_bill", y="big_tip", data=tips, logistic=True, y_jitter=.03);
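For readers curious about what a logistic fit involves, here is a minimal sketch using Newton's method in plain NumPy. It is an illustration under assumptions, not what seaborn runs internally: according to the seaborn documentation, `logistic=True` relies on statsmodels for the fit. The data are synthetic, and the coefficients `2.0` and `-0.1` are arbitrary choices for the example.

```python
import numpy as np

# Synthetic binary outcome whose probability falls as x grows (hypothetical data)
rng = np.random.default_rng(0)
x = rng.uniform(5, 50, size=200)
p_true = 1 / (1 + np.exp(-(2.0 - 0.1 * x)))
y = (rng.uniform(size=200) < p_true).astype(float)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])

# Newton's method on the logistic log-likelihood
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))       # current predicted probabilities
    grad = X.T @ (y - p)                  # gradient of the log-likelihood
    hess = X.T @ (X * (p * (1 - p))[:, None])  # negative Hessian
    beta = beta + np.linalg.solve(hess, grad)

# Fitted curve: always between 0 and 1, unlike a straight-line fit
p_hat = 1 / (1 + np.exp(-X @ beta))
print(beta)
```

The fitted probabilities stay inside the unit interval, which is why a logistic curve suits a binary `y` better than the straight line from the preceding call.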
sns.lmplot(x="total_bill", y="tip", data=tips, lowess=False);
sns.lmplot(x="total_bill", y="tip", data=tips, lowess=True);
Residual plots
sns.residplot(x="x", y="y", data=anscombe.query("dataset == 'I'"), scatter_kws={"s": 80});
sns.residplot(x="x", y="y", data=anscombe.query("dataset == 'II'"), scatter_kws={"s": 80}, color="g");
sns.residplot(x="x", y="y", data=anscombe.query("dataset == 'II'"), order=2, scatter_kws={"s": 80});
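What `residplot` shows can be approximated by hand: fit a polynomial, then look at `y` minus the fitted values. The sketch below uses `np.polyfit` in place of seaborn's own fitting code, and the dataset II values are typed in from the published Anscombe quartet so that it runs offline. The helper name `residuals` is hypothetical.

```python
import numpy as np

# Anscombe dataset II, entered by hand
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

def residuals(x, y, order):
    """Fit a polynomial of the given order and return y minus the fit."""
    coefs = np.polyfit(x, y, deg=order)
    return y - np.polyval(coefs, x)

res_linear = residuals(x, y, order=1)
res_quadratic = residuals(x, y, order=2)

# Largest absolute residual under each model
print(np.abs(res_linear).max(), np.abs(res_quadratic).max())
```

A straight line leaves large, systematically curved residuals for dataset II, while the quadratic fit leaves almost nothing, which is the contrast the `order=2` call is meant to show.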
Fitting models after grouping by other variables
sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips);
sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips, markers=["o", "x"], palette="Set1");
sns.lmplot(x="total_bill", y="tip", hue="smoker", col="time", data=tips);
sns.lmplot(x="total_bill", y="tip", hue="smoker", col="time", row="sex", data=tips);
[外链图片转存失败(img-HsjrsyDT-1562743123690)(output_30_0.png)]
Controlling the size and shape of the plot
f, ax = plt.subplots(figsize=(5, 6))
sns.regplot(x="total_bill", y="tip", data=tips, ax=ax);
sns.lmplot(x="total_bill", y="tip", col="day", data=tips, col_wrap=4, height=4);  # size was renamed to height in seaborn 0.9
sns.lmplot?
sns.lmplot(x="total_bill", y="tip", col="day", data=tips, aspect=.8);
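# height: height of each facet in inches (called size before seaborn 0.9)
# aspect: width-to-height ratio of each facet, so width = aspect * height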
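How `height` and `aspect` translate into the overall figure size can be checked directly. The sketch below assumes a recent seaborn (one where the grid exposes `.figure`) and uses synthetic data with a made-up four-level `group` column. With one row of four facets, the figure should come out roughly `4 * height * aspect` inches wide and `height` inches tall.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with four groups (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n = 80
df = pd.DataFrame({
    "x": rng.uniform(0, 10, size=n),
    "group": np.tile(["a", "b", "c", "d"], n // 4),
})
df["y"] = 0.5 * df["x"] + rng.normal(0, 1, size=n)

height, aspect = 4, 0.8
grid = sns.lmplot(x="x", y="y", col="group", data=df,
                  height=height, aspect=aspect)

# Each facet is `height` tall and `height * aspect` wide
width_in, height_in = grid.figure.get_size_inches()
print(width_in, height_in)
```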
Plotting a regression in other contexts
sns.jointplot(x="total_bill", y="tip", data=tips, kind="reg");
sns.pairplot(tips, x_vars=["total_bill", "size"], y_vars=["tip"], height=5, aspect=.9, kind="reg");
sns.pairplot(tips, x_vars=["total_bill", "size"], y_vars=["tip"],
             hue="smoker", height=5, aspect=.8, kind="reg");