[Translation] An Introduction to Support Vector Regression (SVR)

氵文大师

Last modified 2022-08-14 22:13:58

Column: 每日一氵 · Tags: regression, support vector machine, machine learning

First published 2022-08-14 16:37:15

Original article: https://towardsdatascience.com/an-introduction-to-support-vector-regression-svr-a3ebc1672c2

X. Introduction

This post starts by translating:
http://www.saedsayad.com/support_vector_machine_reg.htm

Support Vector Machine can also be used as a regression method, maintaining all the main features that characterize the algorithm (maximal margin).
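That is, the support vector machine (SVM) can also be used as a regression method (SVR), keeping the core idea of the SVM algorithm: maximizing the margin.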

The Support Vector Regression (SVR) uses the same principles as the SVM for classification, with only a few minor differences.
Support Vector Regression (SVR) follows the same principles as the SVM classifier, with only a few minor differences.

First of all, because output is a real number it becomes very difficult to predict the information at hand, which has infinite possibilities.
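First, because the output of a regression problem is a real number, it is hard to predict exactly, since there are infinitely many possible values.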

In the case of regression, a margin of tolerance (epsilon) is set in approximation to the SVM which would have already requested from the problem.
For regression, a margin of tolerance (epsilon) is set, which can be seen as analogous to solving an SVM. (I am not sure how to translate this sentence.)

But besides this fact, there is also a more complicated reason, the algorithm is more complicated therefore to be taken in consideration.
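Apart from this, there is a more complicated reason: the algorithm itself is more complex. (I am not sure how to translate this one either.)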

However, the main idea is always the same: to minimize error, individualizing the hyperplane which maximizes the margin, keeping in mind that part of the error is tolerated.
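However, the main idea is always the same: minimize the error by finding the hyperplane that maximizes the margin, bearing in mind that part of the "error" is tolerated.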
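The idea that part of the error is tolerated can be made concrete with the epsilon-insensitive loss commonly used for SVR. The following is a minimal sketch, assuming that standard loss (the translated text does not write it out); the function name `epsilon_insensitive_loss` and the sample values are made up for illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample loss that is zero whenever |error| <= epsilon.

    Assumption: this is the usual epsilon-insensitive loss,
    max(0, |y - f(x)| - epsilon); the source only describes it in words.
    """
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


# Hypothetical targets and predictions.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.05, 2.5, 2.0]

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
for t, p, loss in zip(y_true, y_pred, losses):
    print(f"target={t}, prediction={p}, loss={loss:.2f}")
```

The first prediction misses by less than epsilon, so it costs nothing; the other two are penalized only for the part of the error that exceeds epsilon.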

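The sentences above do not really say much, and they probably did not click for you; they did not for me either. No need to dwell on them. Let me switch to another blog post instead. This one is quite good, and I recommend reading the original directly.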

An Introduction to Support Vector Regression (SVR)

https://towardsdatascience.com/an-introduction-to-support-vector-regression-svr-a3ebc1672c2

In this article, I will walk through the usefulness of SVR compared to other regression models, do a deep-dive into the math behind the algorithm, and provide an example using the Boston Housing Price dataset.
In short, the author takes you through the math behind SVR and gives an example on the Boston housing price data.

1. Simple regression

In most regression models, the goal is to minimize the sum of squared errors. Taking Ordinary Least Squares (OLS) as an example, the objective function is:
$\min_{w} \sum_{i=1}^{n} (y_i - w^{T} x_i) ^ 2$
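To see what minimizing this objective means in practice, here is a small sketch for the simplest case: a single feature, a single weight `w`, and no intercept, which are simplifying assumptions made here and not part of the original article. The data points are invented, and the helper names `ols_slope` and `sum_squared_error` are hypothetical.

```python
def sum_squared_error(w, xs, ys):
    """OLS objective for one feature: sum of (y_i - w * x_i)^2."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))


def ols_slope(xs, ys):
    """Closed-form minimizer of the objective above.

    Setting the derivative with respect to w to zero gives
    w = sum(x_i * y_i) / sum(x_i^2).
    """
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Invented toy data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_hat = ols_slope(xs, ys)
print("fitted w:", w_hat)
print("objective at fitted w:", sum_squared_error(w_hat, xs, ys))
print("objective at w + 0.1: ", sum_squared_error(w_hat + 0.1, xs, ys))
```

Moving `w` away from the fitted value in either direction increases the objective, which is all that "least squares" asks for.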

[Figure: OLS results for Boston housing price prediction]

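Lasso, Ridge, and ElasticNet are all extensions of this simple equation: each adds a penalty term that reduces model complexity or the number of features used in the final model. In every case, the goal is to reduce the error on the test set.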

Appendix: formulas for the regularized variants

Lasso: L1-norm regularization
$J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2}+\lambda \sum_{j=1}^{n}\left|\theta_{j}\right|$

Ridge: L2-norm regularization
$J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2}+\lambda \sum_{j=1}^{n} \theta_{j}^{2}$

ElasticNet: a combination of L1-norm and L2-norm regularization
$J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2}+\lambda\left(\rho \sum_{j=1}^{n}\left|\theta_{j}\right|+(1-\rho) \sum_{j=1}^{n} \theta_{j}^{2}\right)$
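The three objectives differ only in the penalty term, which a direct transcription into code makes easy to see. This is a sketch that evaluates the formulas as written above for a given `theta`; it does not fit a model, and library implementations may scale the terms differently. The function names and the toy data are hypothetical.

```python
def half_squared_error(theta, X, y):
    """Shared data term: 1/2 * sum_i (y_i - theta^T x_i)^2."""
    return 0.5 * sum(
        (yi - sum(t * xj for t, xj in zip(theta, xi))) ** 2
        for xi, yi in zip(X, y)
    )


def lasso_objective(theta, X, y, lam):
    """Data term plus lambda * sum_j |theta_j| (L1 penalty)."""
    return half_squared_error(theta, X, y) + lam * sum(abs(t) for t in theta)


def ridge_objective(theta, X, y, lam):
    """Data term plus lambda * sum_j theta_j^2 (squared L2 penalty)."""
    return half_squared_error(theta, X, y) + lam * sum(t * t for t in theta)


def elastic_net_objective(theta, X, y, lam, rho):
    """Data term plus a rho-weighted mix of the L1 and squared L2 penalties."""
    l1 = sum(abs(t) for t in theta)
    l2 = sum(t * t for t in theta)
    return half_squared_error(theta, X, y) + lam * (rho * l1 + (1 - rho) * l2)


# Invented toy problem: two samples, two features.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
theta = [1.0, -2.0]

print("Lasso:     ", lasso_objective(theta, X, y, lam=0.5))
print("Ridge:     ", ridge_objective(theta, X, y, lam=0.5))
print("ElasticNet:", elastic_net_objective(theta, X, y, lam=0.5, rho=0.5))
```

With `rho=1` the ElasticNet objective reduces to Lasso, and with `rho=0` it reduces to Ridge, which matches the formulas above.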