Basic Steps for Building a Statistical Regression Model: Basic Regression Models

Linear Regression and Regression Trees

by Satoru Hayasaka and Rosaria Silipo, KNIME

When we talk about machine learning algorithms, we often think of classification problems. Indeed, the most common problems in machine learning are about classification, mainly because predicting one of a few classes is often easier than predicting an exact number. A less commonly used branch of data science involves numerical predictions. The family of algorithms dedicated to numerical prediction problems is regression, in its basic and ensemble forms. In this article, we describe two basic regression algorithms: linear regression and the regression tree.

The problem of numeric predictions

An overarching goal of regression analysis is to model known numerical outcomes based on the available input features in the training set. Classic case studies are stock price prediction, demand prediction, revenue forecasting, and even anomaly detection [1]. Most forecasting and prediction problems generally require numerical outcomes.

Many algorithms have been proposed over the years, many regression algorithms among them. Two basic, classic, and widely adopted regression algorithms are linear regression and the regression tree. We want to explore the theory behind each of them, along with their pros and cons, to better understand when to use one rather than the other.

Let’s take a toy example for our exploration: a small dataset with two numeric features (one is the target, one is the input). The “auto-MPG” dataset from the UC Irvine Repository describes 398 car types by brand, engine measures, and chassis features. Two of these attributes look interesting for our little experiment: horsepower (HP) and miles per gallon (MPG) (Figure 1). It is likely that the two attributes are related.

Is it possible to build a regression model where MPG (outcome y) can be described through HP (input feature x)? The goal of the regression model is to build that function f(), so that y=f(x).
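
To make "building f()" concrete, the sketch below scores two candidate functions by how far their predictions fall from the known outcomes. It is only an illustration under stated assumptions: the (HP, MPG) pairs are made up and are not taken from the auto-MPG dataset, the names `candidate_linear` and `candidate_constant` are hypothetical, and mean squared error is one common choice of score rather than the one the article necessarily uses.

```python
from typing import Callable, Sequence


def mean_squared_error(
    f: Callable[[float], float], xs: Sequence[float], ys: Sequence[float]
) -> float:
    """Average squared gap between the predictions f(x) and the known outcomes y."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up (HP, MPG) pairs for illustration only; NOT values from auto-MPG.
hp = [60.0, 90.0, 120.0, 150.0]
mpg = [34.0, 28.0, 22.0, 16.0]


def candidate_linear(x: float) -> float:
    """Hypothetical candidate: MPG falls linearly as HP grows."""
    return 46.0 - 0.2 * x


def candidate_constant(x: float) -> float:
    """Hypothetical candidate: always predict the same MPG, ignoring HP."""
    return 25.0


mse_linear = mean_squared_error(candidate_linear, hp, mpg)
mse_constant = mean_squared_error(candidate_constant, hp, mpg)

# The candidate with the lower error is the better description of y = f(x)
# on this toy data.
best = "linear" if mse_linear < mse_constant else "constant"
print(f"linear MSE={mse_linear:.3f}, constant MSE={mse_constant:.3f}, best={best}")
```

A regression algorithm automates this comparison: instead of scoring a handful of hand-written candidates, it searches a whole family of functions for the one with the lowest error on the training set.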

Linear Regression

There are different approaches to regression analysis. One of the most popular approaches is linear regression [2], in which we model the target variable y as a linear combination of input features x.
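
With a single input feature, "a linear combination of input features" reduces to a straight line, y = b0 + b1 * x, where the intercept b0 and slope b1 are notation assumed here rather than taken from the article. A minimal sketch of fitting such a line by ordinary least squares with NumPy follows. The data are synthetic and noise-free, chosen so the fit is easy to check by hand; they are not values from the auto-MPG dataset, and the article's own workflow may fit the model differently.

```python
import numpy as np

# Synthetic, noise-free values for illustration only; NOT from auto-MPG.
hp = np.array([50.0, 100.0, 150.0, 200.0])
mpg = 40.0 - 0.1 * hp  # generated from a known line so the fit can be verified

# Design matrix: a column of ones (for the intercept) next to the input feature.
X = np.column_stack([np.ones_like(hp), hp])

# Ordinary least squares: find the coefficients minimizing ||X @ coeffs - mpg||^2.
coeffs, residuals, rank, singular_values = np.linalg.lstsq(X, mpg, rcond=None)
intercept, slope = coeffs


def predict(x: float) -> float:
    """Predict MPG from HP using the fitted line."""
    return float(intercept + slope * x)


print(f"intercept={intercept:.3f}, slope={slope:.3f}")
print(f"predicted MPG at 120 HP: {predict(120.0):.3f}")
```

Because the toy data lie exactly on a line, the fit recovers the generating coefficients. On real data such as HP versus MPG, the points scatter around the line and the fitted coefficients are only the best linear compromise.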
