Predicting National Happiness Scores with Multivariate Linear Regression

Latest recommended article published on 2023-01-03 14:25:17

PearNotBear

Tags: machine learning, deep learning

Article link: https://blog.csdn.net/pearbear/article/details/120052401

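This article explores predicting world happiness scores with multivariate linear regression by combining several indicators (such as GDP and freedom). The data distribution is shown in a 3D visualization, and the trained model is compared against single-variable regression; the results show improved model performance.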

Multivariate Linear Regression

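Previously I reproduced a linear regression model that predicts the happiness score from a single feature.
Here I use a multi-feature input vector to see whether the error decreases.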
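For reference, the two-feature model can be written out as below. This is a sketch under assumptions: the symbols ($x_1$ for GDP per capita, $x_2$ for Freedom, $m$ for the number of training samples) and the half mean squared error cost are the conventional textbook choices, not something the post confirms about the library it uses.

```latex
% Hypothesis: a plane over the two input features
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2

% Cost (assumed form): half the mean squared error over m training samples
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2
```

With a single feature the same hypothesis reduces to a line, $h_\theta(x) = \theta_0 + \theta_1 x_1$, which is why the two cases behave so similarly.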

Importing libraries and inspecting the data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly
import plotly.graph_objs as go

plotly.offline.init_notebook_mode()

from homemade.linear_regression import LinearRegression


data = pd.read_csv('F:/MLdata/data/world-happiness-report-2017.csv')


data.head(10)

Inspecting the distributions of the features and the target variable

histograms = data.hist(grid=False, figsize=(10, 10))

[Figure: histograms of the dataset columns]

Splitting into training and test sets

input_param_name_1 = 'Economy..GDP.per.Capita.'
input_param_name_2 = 'Freedom'
output_param_name = 'Happiness.Score'

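# Assumption: the post does not show how train_data and test_data are created;
# a random 80/20 split is used here so the code below can run.
train_data = data.sample(frac=0.8)
test_data = data.drop(train_data.index)

# Split training set input and output.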
x_train = train_data[[input_param_name_1, input_param_name_2]].values
y_train = train_data[[output_param_name]].values

# Split test set input and output.
x_test = test_data[[input_param_name_1, input_param_name_2]].values
y_test = test_data[[output_param_name]].values

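This is where things differ from the single-feature version: the input is now a multi-dimensional vector. Taking GDP and Freedom as the two features, plot the 3D distribution of the data.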

plot_training_trace = go.Scatter3d(
    x=x_train[:, 0].flatten(),
    y=x_train[:, 1].flatten(),
    z=y_train.flatten(),
    name='Training Set',
    mode='markers',
    marker={
        'size': 10,
        'opacity': 1,
        'line': {
            'color': 'rgb(255, 255, 255)',
            'width': 1
        },
    }
)

plot_test_trace = go.Scatter3d(
    x=x_test[:, 0].flatten(),
    y=x_test[:, 1].flatten(),
    z=y_test.flatten(),
    name='Test Set',
    mode='markers',
    marker={
        'size': 10,
        'opacity': 1,
        'line': {
            'color': 'rgb(255, 255, 255)',
            'width': 1
        },
    }
)

plot_layout = go.Layout(
    title='Data Sets',
    scene={
        'xaxis': {'title': input_param_name_1},
        'yaxis': {'title': input_param_name_2},
        'zaxis': {'title': output_param_name} 
    },
    margin={'l': 0, 'r': 0, 'b': 0, 't': 0}
)

plot_data = [plot_training_trace, plot_test_trace]

plot_figure = go.Figure(data=plot_data, layout=plot_layout)

plotly.offline.iplot(plot_figure)

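The resulting plot is shown below. Because of image format limits only a screenshot is included here; after running the code, the figure is interactive and each point can be inspected.
[Figure: 3D scatter plot of the training and test sets]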

Model training

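num_iterations = 500  # Number of gradient descent iterations.
regularization_param = 0  # Regularization strength (0 disables it).
learning_rate = 0.01  # Step size of each gradient descent update.
polynomial_degree = 0  # Degree of additional polynomial features.
sinusoid_degree = 0  # Degree of sinusoid multipliers of additional features.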

# Init linear regression instance.
linear_regression = LinearRegression(x_train, y_train, polynomial_degree, sinusoid_degree)

# Train linear regression.
(theta, cost_history) = linear_regression.train(
    learning_rate,
    regularization_param,
    num_iterations
)


print('Initial cost: {:.2f}'.format(cost_history[0]))
print('Optimized cost: {:.2f}'.format(cost_history[-1]))


theta_table = pd.DataFrame({'Model Parameters': theta.flatten()})
theta_table.head()
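To make the training step less of a black box, here is a minimal NumPy sketch of batch gradient descent for linear regression. It is not the `homemade` library's implementation: the function name `fit_linear_regression`, the half mean squared error cost, the absence of feature normalization and regularization, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_regression(x, y, learning_rate=0.01, num_iterations=500):
    """Batch gradient descent for linear regression (illustrative sketch).

    x: array of shape (m, n) with n features; y: array of shape (m, 1).
    Returns (theta, cost_history), where theta has shape (n + 1, 1).
    """
    m = x.shape[0]
    # Prepend a column of ones so that theta[0] acts as the intercept.
    x_b = np.hstack((np.ones((m, 1)), x))
    theta = np.zeros((x_b.shape[1], 1))
    cost_history = []
    for _ in range(num_iterations):
        residuals = x_b @ theta - y  # prediction minus target, per sample
        cost_history.append(float((residuals ** 2).sum() / (2 * m)))
        # Gradient of the half mean squared error with respect to theta.
        theta -= learning_rate * (x_b.T @ residuals) / m
    return theta, cost_history


# Hypothetical synthetic data: y = 1 + 2*x1 + 3*x2, without noise.
rng = np.random.default_rng(0)
x_demo = rng.random((50, 2))
y_demo = 1 + 2 * x_demo[:, [0]] + 3 * x_demo[:, [1]]
theta_demo, cost_demo = fit_linear_regression(
    x_demo, y_demo, learning_rate=0.5, num_iterations=5000
)
```

On this noise-free data the recovered parameters approach the true intercept and slopes, and the recorded cost falls over the iterations, which is the same behavior the cost history in this post is meant to show.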

Analysis of results

Gradient descent plot

plt.plot(range(num_iterations), cost_history)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Gradient Descent Progress')
plt.show()

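[Figure: cost versus iterations during gradient descent]
Plotting the prediction plane makes it easy to see how multivariate linear regression works; it is analogous to the single-variable case, with a plane in place of a line.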


predictions_num = 10


x_min = x_train[:, 0].min()
x_max = x_train[:, 0].max()

y_min = x_train[:, 1].min()
y_max = x_train[:, 1].max()

x_axis = np.linspace(x_min, x_max, predictions_num)
y_axis = np.linspace(y_min, y_max, predictions_num)


x_predictions = np.zeros((predictions_num * predictions_num, 1))
y_predictions = np.zeros((predictions_num * predictions_num, 1))


x_y_index = 0
for x_index, x_value in enumerate(x_axis):
    for y_index, y_value in enumerate(y_axis):
        x_predictions[x_y_index] = x_value
        y_predictions[x_y_index] = y_value
        x_y_index += 1


z_predictions = linear_regression.predict(np.hstack((x_predictions, y_predictions)))
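The nested loop above builds every (x, y) combination on a regular grid. As an alternative sketch, `np.meshgrid` produces the same points in the same order without an explicit loop. The axis ranges below are hypothetical stand-ins for the feature minima and maxima, and `grid_points` is a name introduced here for illustration.

```python
import numpy as np

predictions_num = 10
# Hypothetical axis ranges standing in for the min/max of the two features.
x_axis = np.linspace(0.0, 1.8, predictions_num)
y_axis = np.linspace(0.0, 0.6, predictions_num)

# indexing='ij' makes the first axis vary slowest, which matches the order
# of the nested loop (outer loop over x, inner loop over y).
x_grid, y_grid = np.meshgrid(x_axis, y_axis, indexing='ij')

# One row per grid point: column 0 is the x value, column 1 is the y value.
grid_points = np.column_stack((x_grid.ravel(), y_grid.ravel()))
```

The resulting array has shape `(predictions_num ** 2, 2)`, the same shape as the stacked `x_predictions` and `y_predictions` arrays passed to `predict`.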


plot_predictions_trace = go.Scatter3d(
    x=x_predictions.flatten(),
    y=y_predictions.flatten(),
    z=z_predictions.flatten(),
    name='Prediction Plane',
    mode='markers',
    marker={
        'size': 1,
    },
    opacity=0.8,
    surfaceaxis=2, 
)

plot_data = [plot_training_trace, plot_test_trace, plot_predictions_trace]
plot_figure = go.Figure(data=plot_data, layout=plot_layout)
plotly.offline.iplot(plot_figure)

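[Figure: 3D scatter plot of the data with the prediction plane]
The predicted plane is clearly visible.
Next, compare the predictions on the test set with the actual scores to see how large the prediction error is.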

test_predictions = linear_regression.predict(x_test)

test_predictions_table = pd.DataFrame({
    'Economy GDP per Capita': x_test[:, 0].flatten(),
    'Freedom': x_test[:, 1].flatten(),
    'Test Happiness Score': y_test.flatten(),
    'Predicted Happiness Score': test_predictions.flatten(),
    'Prediction Diff': (y_test - test_predictions).flatten()
})

test_predictions_table.head(10)
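The table lists per-row differences, but a single summary number makes it easier to judge whether the second feature actually reduces the error. The sketch below computes MSE, RMSE and R² with plain NumPy; the helper name `regression_metrics` and the sample values are hypothetical and are not results from the dataset.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and R^2 for two arrays of equal length."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    ss_res = float(np.sum(residuals ** 2))  # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total variation
    return {'mse': mse, 'rmse': mse ** 0.5, 'r2': 1.0 - ss_res / ss_tot}


# Hypothetical values, for illustration only (not taken from the dataset).
demo_metrics = regression_metrics([5.0, 6.0, 7.0], [5.5, 6.0, 6.5])
```

With the variables from this post the call would be `regression_metrics(y_test, test_predictions)`. Running it for both the single-feature and the two-feature model, on the same train/test split, gives a direct comparison of the two.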