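1. Correlation
The correlation used here is the Pearson correlation coefficient:
It measures the strength of the linear relationship between two variables and takes values in [-1, 1]: positive correlation: > 0, negative correlation: < 0, no linear correlation: = 0
r = Cov(X, Y) / sqrt(Var(X) * Var(Y)), where Cov is the covariance and Var is the variance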
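The definition above can be sketched directly with NumPy. This is a minimal illustration, not the implementation used later in these notes: the function name pearson_r is made up for the example, and the sample values reuse the test data from section 3. The same ddof is used for the covariance and both variances so that the normalizing factors cancel.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r computed as Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.cov returns the 2x2 covariance matrix; entry [0, 1] is Cov(X, Y).
    cov_xy = np.cov(x, y, ddof=1)[0, 1]
    # Use the same ddof for the variances so the 1/(n-1) factors cancel.
    return cov_xy / np.sqrt(np.var(x, ddof=1) * np.var(y, ddof=1))

x = [1, 3, 8, 7, 9]
y = [10, 12, 24, 21, 34]
r = pearson_r(x, y)
print("r:", r)
```

Because the normalizing factors cancel, the result does not depend on whether the sample (ddof=1) or population (ddof=0) convention is chosen, as long as it is chosen consistently.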
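2. R-squared:
R-squared (the coefficient of determination) measures how much of the variation in the dependent variable the model explains through the independent variables.
Simple linear regression: R^2 = r * r
Multiple linear regression: R^2 = SSR / SST = sum((yhat_i - ybar)^2) / sum((y_i - ybar)^2), where yhat_i is the predicted value for sample i and ybar is the mean of y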
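For the multiple regression case, R-squared is the regression sum of squares divided by the total sum of squares. The sketch below shows one way to compute it with np.linalg.lstsq. The data are made-up toy values chosen only for illustration, and the variable names are assumptions of this example.

```python
import numpy as np

# Hypothetical toy data: six observations, two independent variables.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

# Prepend a column of ones so the model has an intercept b0.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

y_hat = A @ beta
y_bar = y.mean()
ssr = np.sum((y_hat - y_bar) ** 2)  # regression (explained) sum of squares
sse = np.sum((y - y_hat) ** 2)      # residual sum of squares
sst = np.sum((y - y_bar) ** 2)      # total sum of squares

r2 = ssr / sst
print("R^2 = SSR/SST:", r2)
print("1 - SSE/SST:  ", 1 - sse / sst)
```

For a least-squares fit that includes an intercept, SST = SSR + SSE, so the two printed expressions agree up to rounding. Without an intercept that identity does not hold in general.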
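R-squared has limitations: it never decreases as independent variables are added, and it also depends on the sample size. It therefore needs a correction, which gives the adjusted R-squared used to judge the goodness of fit of a linear regression model. The correction:
R^2_adjusted = 1 - (1 - R^2) * (N - 1) / (N - p - 1), where N is the sample size and p is the number of independent variables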
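The effect can be illustrated by fitting polynomials of increasing degree to the test data from section 3. This is a rough sketch under one assumption that is not in the original notes: each polynomial degree is counted as one independent variable, so p = degree. The helper names adjusted_r2 and poly_r2 are made up for the example.

```python
import numpy as np

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p independent variables."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 requires n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def poly_r2(x, y, degree):
    """R^2 of a least-squares polynomial fit of the given degree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_hat = np.polyval(np.polyfit(x, y, degree), x)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

x = [1, 3, 8, 7, 9]
y = [10, 12, 24, 21, 34]

results = []
for degree in (1, 2, 3):
    r2 = poly_r2(x, y, degree)
    adj = adjusted_r2(r2, n=len(x), p=degree)  # assumption: p = degree
    results.append((degree, r2, adj))
    print("degree", degree, "R^2:", r2, "adjusted R^2:", adj)
```

The plain R-squared cannot go down as the degree grows, because each higher-degree model contains the lower-degree one. The adjusted value is never larger than the plain one and penalizes the extra terms through the factor (N - 1) / (N - p - 1).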
3. Two implementations in Python:
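import math

import numpy as np


def computeCorrelation(x, y):
    """Return the Pearson correlation coefficient r of x and y."""
    xBar = np.mean(x)
    ybar = np.mean(y)
    SSR = 0   # despite the name, this is the sum of cross-deviations (numerator of r)
    varX = 0  # sum of squared deviations of x
    varY = 0  # sum of squared deviations of y
    for i in range(0, len(x)):  # loop over every sample
        diffxxBar = x[i] - xBar
        diffyyBar = y[i] - ybar
        SSR += diffxxBar * diffyyBar
        varX += diffxxBar ** 2  # square and accumulate
        varY += diffyyBar ** 2  # square and accumulate
    SST = math.sqrt(varX * varY)  # denominator of r
    return SSR / SST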
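

def polyfit(x, y, degree):
    """Fit a polynomial of the given degree and return its coefficients and R^2."""
    result = {}  # dictionary holding the fit results
    coeffs = np.polyfit(x, y, degree)  # least-squares estimates, highest power first
    result["polynomial"] = coeffs.tolist()
    p = np.poly1d(coeffs)  # polynomial object built from the coefficients
    yhat = p(x)  # predicted values at the given x
    ybar = np.sum(y) / len(y)  # mean of y
    ssreg = np.sum((yhat - ybar) ** 2)  # regression sum of squares
    sstot = np.sum((np.asarray(y) - ybar) ** 2)  # total sum of squares
    result["determination"] = ssreg / sstot
    return result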


testX = [1, 3, 8, 7, 9]
testY = [10, 12, 24, 21, 34]
print("r:", computeCorrelation(testX, testY))
print("r**2:", computeCorrelation(testX, testY) ** 2)
print("r**2:", polyfit(testX, testY, 1)["determination"])  # degree=1: straight-line fit
print(polyfit(testX, testY, 1)["polynomial"])  # prints the slope and the intercept
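
As a sanity check, the two hand-written results can be compared against NumPy's built-in routines. The snippet below is a small independent cross-check on the same test data, not part of the original implementation. It relies only on np.corrcoef and np.polyfit.

```python
import numpy as np

testX = [1, 3, 8, 7, 9]
testY = [10, 12, 24, 21, 34]

# np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r between X and Y.
r = np.corrcoef(testX, testY)[0, 1]
# For degree 1, np.polyfit returns [slope, intercept] (highest power first).
slope, intercept = np.polyfit(testX, testY, 1)

print("r:", r)
print("r**2:", r ** 2)
print("slope, intercept:", slope, intercept)
```

If the printed r and r**2 match the values from computeCorrelation and polyfit above to within rounding, both implementations agree with the library.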