《Machine Learning A-Z》- Part 2 (2) - Multiple Linear Regression

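This article introduces multiple linear regression, in which several independent variables influence a dependent variable. It discusses the assumptions of linear regression, such as linearity and homoscedasticity, explains how to handle categorical variables with dummy variables, and mentions how to avoid the dummy variable trap. It also covers model-building options such as all-in, forward selection, and backward elimination. Using the 50_Startups.csv data set as an example, it shows multiple linear regression without feature scaling. Finally, it gives the option of implementing backward elimination manually and emphasizes the convenience of the Scikit-Learn library in handling dummy variables.
(Summary generated automatically by CSDN.)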

05.08.2021

  

Part 2

Regression

(2) Multiple Linear Regression

Multiple independent variables x1, x2, ..., xn can influence the dependent variable y: y = b0 + b1*x1 + b2*x2 + ... + bn*xn, where b0 is the constant.

The data set does not need to show a perfect linear correlation.
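To make the idea concrete, here is a minimal sketch of fitting a model with two independent variables by ordinary least squares. The data is synthetic and the true coefficients (3.0, 2.0, -1.5) are made up for illustration; NumPy is an assumption here, not something the notes prescribe.

```python
import numpy as np

# Synthetic data: y depends on two independent variables plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
noise = rng.normal(0, 0.5, size=200)
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + noise

# Prepend a column of ones so the first coefficient is the constant b0.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimise ||X_design @ b - y||^2.
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = coeffs
print(b0, b1, b2)
```

The fitted values should land close to the coefficients used to generate the data, even though the points do not lie on a perfect plane.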

Assumptions of a Linear Regression:

1. Linearity
2. Homoscedasticity
3. Multivariate normality
4. Independence of errors
5. Lack of multicollinearity

Dummy variables 

When there is a categorical variable, we create a new column for each category, placing 1 on the rows that belong to that category and 0 on the other rows. These columns are called dummy variables. With two categories we keep only one of the columns, and no information is lost, because a 0 in the kept column already implies the other category. Here we keep NY: its coefficient b4 works like a light switch, which is on for NY (dummy = 1) and off for CA (dummy = 0).
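The encoding step can be sketched in a few lines. The sample rows below are made up, and building the columns with NumPy broadcasting is just one way to do it, not necessarily how the course does.

```python
import numpy as np

states = np.array(["NY", "CA", "CA", "NY", "CA"])  # made-up sample rows

# One column per category: 1 where the row belongs to it, 0 elsewhere.
categories = np.unique(states)                      # sorted: ['CA', 'NY']
dummies = (states[:, None] == categories).astype(int)

# Keep only the NY column; a 0 there already implies CA.
ny = dummies[:, categories.tolist().index("NY")]
print(ny)
```

Each row of `dummies` sums to 1, which is why a single column is enough to recover the category.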

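Dummy Variable Trap - you can't include the constant b0 and all of the dummy variables (the columns D1 and D2 that go with b4 and b5) at the same time. The dummy columns always sum to 1, so D2 = 1 - D1, and one column is fully predicted by the other, which is perfect multicollinearity. So, for example, if a categorical variable produces 9 dummy variables, include only 8. Never include all of the dummy variables.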
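A quick way to see the trap is to check the rank of the design matrix. This is a small sketch with made-up rows: one dummy column equals 1 minus the other, so adding both alongside the constant column gives three columns but only two independent ones.

```python
import numpy as np

ny = np.array([1, 0, 0, 1, 0, 1])   # made-up dummy column for NY
ca = 1 - ny                          # the CA dummy is fully determined by NY
ones = np.ones_like(ny)              # column for the constant b0

X_trap = np.column_stack([ones, ny, ca])  # constant + both dummies
X_ok = np.column_stack([ones, ny])        # constant + one dummy

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print(X_trap.shape[1], rank_trap)  # more columns than independent directions
print(X_ok.shape[1], rank_ok)      # full column rank
```

When the rank is smaller than the number of columns, the coefficients are not uniquely determined, which is why one dummy is dropped.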

Understanding the P-value and Significance Level (SL)
