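Simple (one-variable) linear regression model: y = a + bx + ε
ε: the model's error term, which accounts for the gap between the two sides of the equation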
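import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the salary data set
income = pd.read_csv(r'Salary_Date.csv')
# Scatter plot with a fitted regression line (ci=None hides the confidence band)
sns.lmplot(x='YearsExperience', y='Salary',
           data=income, ci=None)
plt.show()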
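Solving the linear fit:
Minimizing the error term is recast as minimizing the sum of squared errors
At the minimum, the partial derivative with respect to each parameter is 0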
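The two steps above can be written out as a short derivation. This is a sketch, assuming the model is written y = a + bx + ε with intercept a and slope b; the symbols Q, x_i, y_i, \bar{x} and \bar{y} are notation introduced here for illustration.

```latex
\begin{aligned}
Q(a,b) &= \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial Q}{\partial a} &= -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0 \\
\frac{\partial Q}{\partial b} &= -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0 \\
b &= \frac{\sum_{i} x_i y_i - \frac{1}{n} \sum_{i} x_i \sum_{i} y_i}
          {\sum_{i} x_i^2 - \frac{1}{n} \left( \sum_{i} x_i \right)^2},
\qquad a = \bar{y} - b \bar{x}
\end{aligned}
```

The first condition gives a = \bar{y} - b\bar{x}; substituting it into the second and solving for b gives the ratio shown.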
① Solving with basic syntax
n = income.shape[0]
sum_x = income.YearsExperience.sum()
sum_y = income.Salary.sum()
sum_x2 = income.YearsExperience.pow(2).sum()
xy = income.YearsExperience * income.Salary
sum_xy = xy.sum()
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
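# Intercept: mean of y minus slope times mean of x (use the column means, not the sums)
a = income.Salary.mean() - b * income.YearsExperience.mean()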
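As a sanity check on the closed-form expressions, the same arithmetic can be run on a few made-up points whose true line is known. This is a minimal sketch using only plain Python lists; the function name fit_simple_ols and the sample data are hypothetical and are not taken from the salary data set.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares estimates (a, b) for y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Slope: same expression as in the notes
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: mean of y minus slope times mean of x
    a = sum_y / n - b * sum_x / n
    return a, b


# Made-up points lying exactly on y = 2 + 3x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 8.0, 11.0, 14.0]
a, b = fit_simple_ols(xs, ys)
print(a, b)
```

Because the points lie exactly on a line, the estimates should recover its intercept and slope.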
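② Using the ols function from statsmodels
ols(formula, data, subset=None, drop_cols=None)
formula: a model formula string of the form 'y ~ x'
subset: boolean mask selecting the subset of rows used for modeling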
import statsmodels.api as sm
# Column names in the formula are looked up in data
fit = sm.formula.ols('Salary ~ YearsExperience', data=income).fit()
# Estimated intercept and slope
fit.params
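Multiple linear regression
The data set for a multiple linear regression contains n observations and p+1 variables (p independent variables and 1 dependent variable)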
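To make the n by (p+1) layout concrete, here is a small sketch on synthetic data, assuming NumPy is available. The sizes n = 50 and p = 3, the coefficients, and all variable names are made up for illustration; the least-squares call is just one possible way to fit such a data set.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                               # hypothetical sizes

X = rng.normal(size=(n, p))                # p independent variables
true_coef = np.array([1.5, -2.0, 0.5])     # made-up slopes
y = 4.0 + X @ true_coef                    # noise-free dependent variable

# The full data set: n observations, p + 1 variables
dataset = np.column_stack([X, y])
print(dataset.shape)

# Prepend a column of ones for the intercept and solve by least squares
design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta)                                # intercept first, then the p slopes
```

Since no noise is added to y, the fitted coefficients should match the made-up ones up to floating-point error.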