Basic Steps for Building a Statistical Regression Model: Basic Regression Models

Original article: https://medium.com/analytics-vidhya/basic-regression-models-5153454fe62f

Linear Regression and Regression Trees

by Satoru Hayasaka and Rosaria Silipo, KNIME

When we talk about machine learning algorithms, we often think of classification problems. Indeed, the most common problems in machine learning are about classification, mainly because predicting one of a few classes is often easier than predicting an exact number. A less commonly used branch of data science involves numerical predictions. The family of algorithms dedicated to solving numerical prediction problems is regression, in its basic and ensemble forms. In this article, we describe two basic regression algorithms: linear regression and the regression tree.

The problem of numeric predictions

An overarching goal of regression analysis is to model known numerical outcomes based on the available input features in the training set. Classic case studies are stock price prediction, demand prediction, revenue forecasting, and even anomaly detection [1]. Most forecasting and prediction problems require numerical outcomes.

Many algorithms have been proposed over the years, among them many regression algorithms. Two basic, classic, and widely adopted regression algorithms are linear regression and the regression tree. We want to explore the theory behind each of them, along with their pros and cons, to better understand when to use one rather than the other.

Let’s take a toy example to run our exploration: a small dataset with two numeric features (one is the target, one is the input). The “auto-MPG” dataset from the UC Irvine Machine Learning Repository describes 398 car types by brand, engine measures, and chassis features. Two of these attributes sound interesting for our little experiment: horsepower (HP) and miles per gallon (MPG) (Figure 1). It is likely that the two attributes are related.

Is it possible to build a regression model where MPG (outcome y) can be described through HP (input feature x)? The goal of the regression model is to build that function f(), so that y=f(x).
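
To make the notation concrete, a regression model can be thought of as any function f that maps an input x to a predicted y, and candidate functions can be compared by how closely f(x) matches the known outcomes. The sketch below assumes NumPy and uses a handful of synthetic (x, y) pairs, not the auto-MPG values. Mean squared error is one common choice of error measure, picked here as an assumption, and the helper names `mean_squared_error`, `f_baseline`, and `f_line` are hypothetical.

```python
import numpy as np

# Synthetic (x, y) pairs for illustration only; NOT the auto-MPG data.
x = np.array([60.0, 80.0, 100.0, 120.0, 140.0])
y = np.array([31.0, 28.0, 25.0, 22.0, 19.0])  # generated as y = 40 - 0.15 * x


def mean_squared_error(y_true, y_pred):
    """Average squared difference between known and predicted outcomes."""
    return float(np.mean((y_true - y_pred) ** 2))


def f_baseline(x):
    """Candidate model that ignores x and always predicts the mean of y."""
    return np.full_like(x, y.mean())


def f_line(x):
    """Candidate model that uses x: the line the synthetic data came from."""
    return 40.0 - 0.15 * x


# A lower error means f(x) is closer to the known outcomes y.
print("baseline MSE:", mean_squared_error(y, f_baseline(x)))
print("line MSE:    ", mean_squared_error(y, f_line(x)))
```

On this made-up data the function that uses x fits far better than the constant baseline, which is the sense in which x "describes" y.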

Linear Regression

There are different approaches to regression analysis. One of the most popular approaches is linear regression [2], in which we model the target variable y as a linear combination of input features x.
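
As a minimal sketch of the single-feature case, the linear combination reduces to y = slope * x + intercept, and the two coefficients can be estimated by ordinary least squares. The example below assumes NumPy and synthetic data generated from an assumed line plus noise. The slope, intercept, and noise level are made up for illustration and are not estimates from the auto-MPG dataset, and the variable names are hypothetical.

```python
import numpy as np

# Synthetic stand-in for (horsepower, MPG) pairs; NOT the real auto-MPG data.
rng = np.random.default_rng(seed=0)
hp = np.linspace(50.0, 200.0, 40)           # input feature x
true_slope, true_intercept = -0.15, 40.0    # assumed values, illustration only
mpg = true_intercept + true_slope * hp + rng.normal(0.0, 1.0, size=hp.shape)

# Design matrix with a column of ones, so the model is y = slope * x + intercept.
X = np.column_stack([hp, np.ones_like(hp)])

# Ordinary least squares: minimize the sum of squared residuals ||X @ coef - y||^2.
coef, *_ = np.linalg.lstsq(X, mpg, rcond=None)
slope, intercept = coef


def f(x):
    """Fitted linear model: predicted y for input x."""
    return slope * x + intercept


print(f"slope={slope:.3f}, intercept={intercept:.2f}")
print(f"prediction at x=100: {f(100.0):.2f}")
```

With more than one input feature, the same call works after adding one column per feature to the design matrix.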
