First, here is the longley.csv dataset:
Year,GNP.deflator,GNP,Unemployed,Armed.Forces,Population,Year,Employed
1947,83.0,234.289,235.6,159.0,107.608,1947,60.323
1948,88.5,259.426,232.5,145.6,108.632,1948,61.122
1949,88.2,258.054,368.2,161.6,109.773,1949,60.171
1950,89.5,284.599,335.1,165.0,110.929,1950,61.187
1951,96.2,328.975,209.9,309.9,112.075,1951,63.221
1952,98.1,346.999,193.2,359.4,113.270,1952,63.639
1953,99.0,365.385,187.0,354.7,115.094,1953,64.989
1954,100.0,363.112,357.8,335.0,116.219,1954,63.761
1955,101.2,397.469,290.4,304.8,117.388,1955,66.019
1956,104.6,419.180,282.2,285.7,118.734,1956,67.857
1957,108.4,442.769,293.6,279.8,120.445,1957,68.169
1958,110.8,444.546,468.1,263.7,121.950,1958,66.513
1959,112.6,482.704,381.3,255.2,123.366,1959,68.655
1960,114.2,502.601,393.1,251.4,125.368,1960,69.564
1961,115.7,518.173,480.6,257.2,127.852,1961,69.331
1962,116.9,554.894,400.7,282.7,130.081,1962,70.551
The following example predicts the GNP.deflator value, using the columns from GNP onward as features.
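The normal-equation (closed-form) formula for ridge regression is as follows, where X is the feature matrix with a leading column of ones, y is the target vector, \lambda is the ridge coefficient, and I is the identity matrix:
$$w = (X^T X + \lambda I)^{-1} X^T y$$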
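For readers who want to see where the normal-equation form comes from, here is a short derivation sketch. It assumes the usual ridge objective (squared error plus an L2 penalty on the weights), which the text does not state explicitly; the symbol J is introduced here only for illustration.

```latex
\begin{aligned}
J(w) &= (y - Xw)^T (y - Xw) + \lambda\, w^T w \\
\nabla_w J(w) &= -2 X^T (y - Xw) + 2 \lambda w = 0 \\
(X^T X + \lambda I)\, w &= X^T y
\end{aligned}
```

Solving the last line for w gives the closed-form weights. Because \lambda is added to every diagonal entry of X^T X, the weight attached to the column of ones (the intercept) is penalized along with the others.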
Based on the ridge regression equation, the Python implementation is as follows:
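import numpy as np
import pandas as pd

# Load the data: column 1 is the target (GNP.deflator), columns 2 onward are the features
data = pd.read_csv('longley.csv')
x_data = data.iloc[:, 2:].values
y_data = data.iloc[:, 1].values.reshape(-1, 1)

# Prepend a column of ones so that the first weight acts as the intercept
x0 = np.ones((x_data.shape[0], 1))
x_combination = np.concatenate((x0, x_data), axis=1)


def get_weights(x_arr, y_arr, lam=0.2):
    # Plain ndarrays with @ are used in place of np.mat, which NumPy no longer recommends
    x_mat = np.asarray(x_arr, dtype=float)
    y_mat = np.asarray(y_arr, dtype=float)
    x_t_x = x_mat.T @ x_mat
    # x_t_x is square, so np.eye(x_t_x.shape[0]) is an identity matrix of the same size
    bias = x_t_x + np.eye(x_t_x.shape[0]) * lam
    # A determinant of 0 means the matrix has no inverse, so stop instead of continuing
    if np.linalg.det(bias) == 0.0:
        raise ValueError('This matrix cannot do inverse')
    w = np.linalg.inv(bias) @ x_mat.T @ y_mat
    return w


weights = get_weights(x_combination, y_data)
print(weights)
y_predict = x_combination @ weights
print(y_predict)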
The output is as follows (the weights first, then the predicted values):
[[ 7.38107630e-04]
[ 2.07703836e-01]
[ 2.10076376e-02]
[ 5.05385441e-03]
[-1.59173066e+00]
[ 1.10442920e-01]
[-2.42280461e-01]]
[[ 83.55075226]
[ 86.92588689]
[ 88.09720228]
[ 90.95677622]
[ 96.06951002]
[ 97.81955375]
[ 98.36444357]
[ 99.99814266]
[103.26832266]
[105.03165135]
[107.45224671]
[109.52190685]
[112.91863666]
[113.98357055]
[115.29845063]
[117.64279933]]
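
As a complement to the implementation above, the same closed form can be computed by solving the linear system directly rather than forming an explicit inverse. The following is only a minimal sketch: the function name `ridge_weights` and the synthetic data are made up for illustration and are not part of the original example, and the values it prints are unrelated to the longley results.

```python
import numpy as np


def ridge_weights(x, y, lam=0.2):
    """Solve (X^T X + lam * I) w = X^T y for w (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = x.T @ x + lam * np.eye(x.shape[1])
    b = x.T @ y
    # np.linalg.solve avoids computing the inverse of `a` explicitly
    return np.linalg.solve(a, b)


# Synthetic data, invented for this sketch only (not the longley values)
rng = np.random.default_rng(0)
x_demo = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])
true_w = np.array([[1.0], [2.0], [-1.0], [0.5]])
y_demo = x_demo @ true_w + 0.1 * rng.normal(size=(20, 1))

w_small = ridge_weights(x_demo, y_demo, lam=0.2)
w_large = ridge_weights(x_demo, y_demo, lam=50.0)
print(w_small.ravel())
print(w_large.ravel())
```

A larger `lam` pulls the weights toward zero, so the second printed vector has a smaller norm than the first. Comparing the two is a quick way to see what the ridge coefficient does.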