Locally Weighted Regression (Lowess) Explained

酒酿小圆子～

Last modified 2024-05-15 10:33:29


Column: Machine Learning & Deep Learning; Tags: algorithm, regression, data mining

First published 2024-05-15 10:31:05

Original link: https://blog.csdn.net/u012856866/article/details/138897327


Contents

I. Applicable Tasks
- 1.1 Prediction Problems
- 1.2 Smoothing Problems
II. Algorithm Overview
References

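The main idea of Lowess (locally weighted regression) is this. At each point of the dataset, fit a low-degree polynomial to a subset of the data around that point, and use it to estimate the value of the dependent variable at that point. The polynomial is fitted by weighted least squares, and the farther a data point lies from the point being estimated, the smaller its weight.

The regression value at that point is taken from this local polynomial. The subset of data used in each weighted least-squares fit is chosen by a nearest-neighbor rule.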
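The local fit can also be written out as formulas. This is a sketch that assumes the two most common choices, a degree-1 (linear) local polynomial and the tricube weight function, as in the classic formulation. The text above does not commit to either choice. Here frac denotes the fraction of the $n$ points used in each local fit, and $x_0$ is the point being estimated:

$$
q = \lceil \text{frac}\cdot n \rceil,\qquad d(x_0) = \text{distance from } x_0 \text{ to its } q\text{-th nearest } x_i
$$

$$
w_i(x_0) = W\!\left(\frac{|x_i - x_0|}{d(x_0)}\right),\qquad
W(u) = \begin{cases} (1-u^3)^3, & 0 \le u < 1 \\ 0, & u \ge 1 \end{cases}
$$

$$
(\hat\beta_0,\hat\beta_1) = \arg\min_{\beta_0,\beta_1} \sum_{i=1}^{n} w_i(x_0)\,\bigl(y_i - \beta_0 - \beta_1 (x_i - x_0)\bigr)^2,
\qquad \hat{y}(x_0) = \hat\beta_0
$$

The regressor is centered at $x_0$, so the fitted intercept $\hat\beta_0$ is exactly the value of the local line at $x_0$. Points at or beyond the $q$-th nearest neighbor receive zero weight.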

LOWESS (locally weighted scatterplot smoothing) belongs to the family of methods that combine multiple regression models in a k-nearest-neighbor-based meta-model. LOESS is a later generalization of LOWESS. Although it is not a true acronym, it may be understood as standing for "LOcal regrESSion".

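I. Applicable Tasks

1.1 Prediction Problems

For prediction problems, the simplest form of regression is linear regression, which fits the trend of the data with a straight line. Periodic or fluctuating data cannot be fitted well by a straight line, and forcing a linear fit leaves the model with a large bias. Lowess handles this kind of data better. It fits a curve that follows the overall trend, and that curve can then be used for prediction.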

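1.2 Smoothing Problems

Lowess is well suited to smoothing. When smoothing data that has a trend or seasonality, we cannot simply discard every point outside the mean ± 3 standard deviations as an outlier, because the trend has to be taken into account. With Lowess we can fit a trend line and use it as the baseline. Points that lie far from this baseline are the genuine outliers.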
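Below is a minimal sketch of the baseline idea. It assumes the Lowess fitted values are already available as `baseline`, produced by any Lowess implementation. Rather than applying mean ± 3 standard deviations to the raw data, it applies a 3-sigma-style rule to the residuals around the baseline. The spread is estimated robustly with the median absolute deviation (MAD, scaled by 1.4826). That scale estimate, the default `k = 3.0`, and the name `flag_outliers` are all assumptions for illustration.

```python
import statistics
from typing import List, Sequence


def flag_outliers(ys: Sequence[float], baseline: Sequence[float], k: float = 3.0) -> List[bool]:
    """Flag points whose residual from a Lowess baseline is unusually large.

    Spread is measured with 1.4826 * MAD of the residuals (a robust stand-in
    for the standard deviation), so the outliers barely inflate the threshold.
    """
    residuals = [y - b for y, b in zip(ys, baseline)]
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    scale = 1.4826 * mad
    # a point is an outlier if it sits more than k robust "sigmas" from the typical residual
    return [abs(r - med) > k * scale for r in residuals]
```

For example, with `ys = [1.1, 1.9, 3.1, 10.0, 4.9]` and `baseline = [1, 2, 3, 4, 5]`, only the fourth point is flagged.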

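In practice, Lowess is used far more often for smoothing than for prediction, because other models usually predict more accurately. For smoothing, however, Lowess is intuitive and convincing.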

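II. Algorithm Overview

2.1 Core Idea

The rough idea of Lowess is as follows. Center on a point $x$ and take a window of data before and after it, with the window holding a fraction frac of the data. Fit a weighted linear regression to this window using the weight function $w$. Take $(x,\hat{y})$ as the center value of this regression line, where $\hat{y}$ is the value of the fitted line at $x$. Repeating this for all $n$ data points gives $n$ weighted regression lines, and connecting their center values yields the Lowess curve for the data.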
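The procedure above can be sketched in plain Python. This is a minimal, unoptimized sketch that assumes the tricube kernel and a local linear fit, as in classic Lowess. The names `tricube`, `weighted_line_at`, and `lowess_sketch` are hypothetical. The sketch makes a single pass, with no robustness iterations and no delta shortcut, and its output is not guaranteed to match statsmodels numerically.

```python
import math
from typing import Callable, List, Sequence


def tricube(u: float) -> float:
    """Tricube weight W(u) = (1 - |u|^3)^3 for |u| < 1, otherwise 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def weighted_line_at(x0: float, xs: Sequence[float], ys: Sequence[float],
                     ws: Sequence[float]) -> float:
    """Fit y = a + b*x by weighted least squares and evaluate the line at x0."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # flat line if all weighted x coincide
    return my + slope * (x0 - mx)


def lowess_sketch(xs: Sequence[float], ys: Sequence[float], frac: float = 2 / 3,
                  kernel: Callable[[float], float] = tricube) -> List[float]:
    """One pass of Lowess: returns one fitted value per input x."""
    n = len(xs)
    q = min(n, max(2, math.ceil(frac * n)))  # number of neighbours in each window
    fitted = []
    for x0 in xs:
        # the q observations nearest to x0 form the local window
        idx = sorted(range(n), key=lambda j: abs(xs[j] - x0))[:q]
        h = max(abs(xs[j] - x0) for j in idx) or 1.0  # window radius (guard h == 0)
        ws = [kernel(abs(xs[j] - x0) / h) for j in idx]
        fitted.append(weighted_line_at(x0, [xs[j] for j in idx], [ys[j] for j in idx], ws))
    return fitted
```

The kernel is an ordinary argument, so another weight function can be passed in, for example `lowess_sketch(xs, ys, frac=0.5, kernel=lambda u: 1.0 if u < 1 else 0.0)`. Every point sorts all $n$ inputs, so this version only suits small illustrative datasets.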

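2.2 Parameters

This idea exposes the following tunable parameters:
1. frac, the window length: how much data to take for each local fit, expressed as a fraction of the total number of data points;
2. the weight function w: which weight function is appropriate;
3. the number of iterations it: whether to iterate after one pass of local regression and fit again;
4. delta, the regression spacing: whether a weighted regression really has to be computed at every point, or whether it can be computed once every delta units of distance, with interpolation filling in the points in between.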
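Two of these parameters, it and delta, can be illustrated with small helper functions. This sketch rests on stated assumptions. The robustness step follows Cleveland's classic bisquare reweighting, which scales residuals by 6 × median |residual|; this is what "residual-based reweighting" usually refers to. The anchor rule for delta is my own simplification. All three function names are hypothetical. In a full implementation, each extra iteration would multiply every point's kernel weight by its robustness weight and then redo the local fits.

```python
import statistics
from typing import List, Sequence


def bisquare_robustness_weights(residuals: Sequence[float]) -> List[float]:
    """Robustness weights for one extra Lowess iteration (bisquare scheme).

    Each residual r is scaled by 6 * median(|r|); points with large residuals
    get weights near 0, so the next round of local fits largely ignores them.
    """
    s = statistics.median([abs(r) for r in residuals])
    if s == 0:
        # degenerate case: most points are fitted exactly, so skip reweighting
        return [1.0] * len(residuals)
    weights = []
    for r in residuals:
        u = r / (6.0 * s)
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return weights


def delta_anchor_indices(xs: Sequence[float], delta: float) -> List[int]:
    """Indices (into sorted xs) where a full local regression would be run.

    A new anchor is placed at the first point more than `delta` beyond the
    previous anchor; the last point is always an anchor.
    """
    if not xs:
        return []
    anchors = [0]
    for i in range(1, len(xs)):
        if xs[i] - xs[anchors[-1]] > delta:
            anchors.append(i)
    if anchors[-1] != len(xs) - 1:
        anchors.append(len(xs) - 1)
    return anchors


def interpolate_between_anchors(xs: Sequence[float], anchors: Sequence[int],
                                anchor_fits: Sequence[float]) -> List[float]:
    """Fill every point by linear interpolation between neighbouring anchor fits."""
    fitted = [0.0] * len(xs)
    fitted[anchors[0]] = anchor_fits[0]
    for k in range(len(anchors) - 1):
        a, b = anchors[k], anchors[k + 1]
        span = xs[b] - xs[a]
        for i in range(a, b + 1):
            t = (xs[i] - xs[a]) / span if span > 0 else 0.0
            fitted[i] = anchor_fits[k] + t * (anchor_fits[k + 1] - anchor_fits[k])
    return fitted
```

With `xs = [0, 1, 2, 3, 4, 5, 6]` and `delta = 2.5`, local regressions would run only at indices 0, 3, and 6. The other four points would be filled in by linear interpolation.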

2.3 Implementation

The following code uses the lowess function from statsmodels to fit and smooth periodic, fluctuating data:

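```python
from statsmodels.nonparametric.smoothers_lowess import lowess

# x, y: 1-D arrays of the observed data (x values and y values)
smooth_data = lowess(y, x, frac=0.5, it=3, delta=0.0)
# returns a 2-column array: sorted x in column 0, smoothed y in column 1
```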

The signature and docstring of the statsmodels lowess function, which define its parameters, are as follows:

```python
def lowess(endog, exog, frac=2.0/3.0, it=3, delta=0.0, xvals=None, is_sorted=False, missing='drop', return_sorted=True):
    '''LOWESS (Locally Weighted Scatterplot Smoothing)

    A lowess function that outs smoothed estimates of endog
    at the given exog values from points (exog, endog)

    Parameters
    ----------
    endog : 1-D numpy array
        The y-values of the observed points
    exog : 1-D numpy array
        The x-values of the observed points
    frac : float
        Between 0 and 1. The fraction of the data used
        when estimating each y-value.
    it : int
        The number of residual-based reweightings
        to perform.
    delta : float
        Distance within which to use linear-interpolation
        instead of weighted regression.
    xvals: 1-D numpy array
        Values of the exogenous variable at which to evaluate the regression.
        If supplied, cannot use delta.
    is_sorted : bool
        If False (default), then the data will be sorted by exog before
        calculating lowess. If True, then it is assumed that the data is
        already sorted by exog. If xvals is specified, then it too must be
        sorted if is_sorted is True.
    missing : str
        Available options are 'none', 'drop', and 'raise'. If 'none', no nan
        checking is done. If 'drop', any observations with nans are dropped.
        If 'raise', an error is raised. Default is 'drop'.
    return_sorted : bool
        If True (default), then the returned array is sorted by exog and has
        missing (nan or infinite) observations removed.
        If False, then the returned array is in the same length and the same
        sequence of observations as the input array.

    Returns
    -------
    out : {ndarray, float}
        The returned array is two-dimensional if return_sorted is True, and
        one dimensional if return_sorted is False.
        If return_sorted is True, then a numpy array with two columns. The
        first column contains the sorted x (exog) values and the second column
        the associated estimated y (endog) values.
        If return_sorted is False, then only the fitted values are returned,
        and the observations will be in the same order as the input arrays.
        If xvals is provided, then return_sorted is ignored and the returned
        array is always one dimensional, containing the y values fitted at
        the x values provided by xvals.

    '''
    ...  # implementation omitted here; see the statsmodels source
```

In statsmodels, you will find that:
1. the weight function w cannot be changed;
2. once delta is used, the interpolation function cannot be changed either.
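The statsmodels function fixes the weight function, so trying a different one means writing the local fit yourself. As a sketch, a few common candidate weight functions of the scaled distance are collected below. The dictionary `KERNELS` and the helper `local_weights` are hypothetical names. Scaling each window by its largest distance is an assumption that matches the classic Lowess definition of the window radius.

```python
from typing import Callable, Dict, List, Sequence

# Candidate weight functions W(u) of the scaled distance u = |x_i - x0| / h.
KERNELS: Dict[str, Callable[[float], float]] = {
    "tricube": lambda u: (1 - abs(u) ** 3) ** 3 if abs(u) < 1 else 0.0,
    "epanechnikov": lambda u: 1 - u * u if abs(u) < 1 else 0.0,  # rescaled so W(0) = 1
    "uniform": lambda u: 1.0 if abs(u) < 1 else 0.0,  # plain k-NN linear fit
}


def local_weights(distances: Sequence[float], kernel: str = "tricube") -> List[float]:
    """Weights for one window: distances are scaled by the largest one, then passed to W."""
    h = max(distances) or 1.0  # guard against a window where every distance is 0
    w = KERNELS[kernel]
    return [w(d / h) for d in distances]
```

Under each kernel, the farthest point in the window gets weight 0. They differ in how quickly the weight falls off toward that edge.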

References

[Algorithm] Locally Weighted Regression (Lowess)
