pandas Linear Regression: Predicting Housing Prices with Linear Regression Using Python, pandas, and statsmodels

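This article shows how to build linear regression models with Python's pandas and statsmodels libraries to predict housing prices. It moves from the basic concepts of regression, variable selection, and reading in the data, through the ordinary least squares (OLS) assumptions and building simple and multiple linear regression models, to evaluating model quality, and closes with a discussion of the model's possible problems and pitfalls. A worked example shows linear regression applied to predicting housing prices resulting from economic activity.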

This post was originally published here

In this post, we’ll walk through building linear regression models to predict housing prices resulting from economic activity. Topics covered will include:

Future posts will cover related topics such as exploratory analysis, regression diagnostics, and advanced regression modeling, but I wanted to jump right in so readers could get their hands dirty with data.

What is Regression?

Linear regression models the dependent variable (plotted on the vertical, or Y, axis) as a linear function of the predictor variables (plotted on the X axis). With a single predictor, the fit is a straight line, like so:

[Figure: linear regression]

Linear regression will be discussed in greater detail as we move through the modeling process.
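
As a quick preview, here is a minimal sketch of what fitting a straight line means. It uses NumPy's polyfit on made-up points instead of statsmodels and the housing data, purely to keep the example tiny; the values and variable names are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

# Made-up points that lie exactly on y = 2x + 1, so the fit is easy to check.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# A degree-1 polynomial fit is a least-squares straight line.
# polyfit returns the coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The intercept is the predicted value of Y when X is zero, which is why a fitted line does not have to pass through the origin.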

Variable Selection

For our dependent variable we’ll use housing_price_index (HPI), which measures price changes of residential housing.

For our predictor variables, we use our intuition to select drivers of macro- (or “big picture”) economic activity, such as unemployment, interest rates, and gross domestic product (total economic output). For an explanation of our variables, including assumptions about how they impact housing prices, and all the sources of data used in this post, see here.

Reading in the Data with pandas

Once we’ve downloaded the data, read it in using pandas’ read_csv function.

import pandas as pd
# read in from csv using pd.read_csv
# be sure to use the file path where you saved the data
housing_price_index = pd.read_csv('/Users/tdobbins/Downloads/hpi/monthly-hpi.csv')
unemployment = pd.read_csv('/Users/tdobbins/Downloads/hpi/unemployment.csv')
federal_funds_rate = pd.read_csv('/Users/tdobbins/Downloads/hpi/fed_funds.csv')
shiller = pd.read_csv('/Users/tdobbins/Downloads/hpi/shiller.csv')
gross_domestic_product = pd.read_csv('/Users/tdobbins/Downloads/hpi/gdp.csv') 
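
If the date column is read as plain text, a merge on it still matches identical strings, but parsing it as dates up front makes sorting and filtering by time more reliable. Here is a minimal sketch, assuming a CSV with a `date` column. The inline CSV text, its column names, and its values are made up so the example runs without the downloaded files.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a downloaded file (hypothetical columns).
csv_text = """date,housing_price_index
2011-01-01,181.35
2011-02-01,179.66
"""

# parse_dates converts the named column to datetime64 while reading.
hpi_sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(hpi_sample.dtypes)
```

With the real files, the same `parse_dates` argument can be passed alongside the file path, provided the column is actually named `date` in each file.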

Once we have the data, invoke pandas’ merge method to join it into a single dataframe for analysis. Some series are reported monthly, others quarterly. No worries. We merge the dataframes on a shared column so that each row lines up with the same time period. In this example, the best column to merge on is the date column. See below.
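
The merge step can be sketched as follows. This is an illustration under stated assumptions rather than the post's exact code: it assumes every dataframe has a column named `date`, and it uses two small made-up frames (one monthly, one quarterly) in place of the downloaded files.

```python
import pandas as pd

# Made-up stand-in data: a monthly series and a quarterly series.
housing_price_index = pd.DataFrame({
    "date": ["2011-01-01", "2011-02-01", "2011-03-01", "2011-04-01"],
    "housing_price_index": [181.35, 179.66, 178.84, 180.80],
})
gross_domestic_product = pd.DataFrame({
    "date": ["2011-01-01", "2011-04-01"],
    "gross_domestic_product": [14881.3, 14989.6],
})

# merge defaults to an inner join: only dates present in both frames survive,
# so the monthly series is reduced to the dates the quarterly series reports.
df = housing_price_index.merge(gross_domestic_product, on="date")

print(df)
```

Further dataframes can be joined by chaining more `.merge(..., on="date")` calls. Because the join is an inner join, the merged frame keeps only the dates shared by every source.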

Let’s get a quick look at our variables with pandas’ head method. The headers in bold text represent the date and the variables we’ll test for our model. Each row represents a different time period.
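
For reference, a minimal sketch of the head call on a made-up frame; the dataframe name and values here are hypothetical stand-ins for the merged data.

```python
import pandas as pd

# Made-up frame with six monthly rows, standing in for the merged dataframe.
df_sample = pd.DataFrame({
    "date": pd.date_range("2011-01-01", periods=6, freq="MS"),
    "housing_price_index": [181.35, 179.66, 178.84, 180.80, 184.25, 186.00],
})

# head() returns the first five rows by default; pass n to change that.
preview = df_sample.head()

print(preview)
```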

Out[23]:
date sp500 consumer_price_index long_interest_rate housing_price_index total_unemployed more_than_15_weeks not_in_labor_searched_for_work multi_jobs leavers losers federal_funds_rate total_expenditures labor_force_pr producer_price_index gross_domestic_product
0 2011-01-01