Boston House Price Prediction

In this post, we will perform data analysis in Python on the Boston house price dataset. Before getting started, we need to understand the data, so let's look at it first.

- CRIM per capita crime rate by town
- ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS proportion of non-retail business acres per town
- CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX nitric oxides concentration (parts per 10 million)
- RM average number of rooms per dwelling
- AGE proportion of owner-occupied units built prior to 1940
- DIS weighted distances to five Boston employment centres
- RAD index of accessibility to radial highways
- PTRATIO pupil-teacher ratio by town
- B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT percentage lower status of the population
- MEDV Median value of owner-occupied homes in USD 1000's
- TAX full-value property-tax rate per USD 10,000

First, let's decide what we want to predict from this data, which means separating the dependent variable from the independent variables. Here, MEDV is the dependent variable, because it holds the median value of owner-occupied homes. Its value depends on other factors such as RM, LSTAT, TAX, and AGE. These are among the features most closely related to MEDV, although not all in the same direction: MEDV tends to rise with RM and to fall as LSTAT increases. That relationship is what makes them useful for predicting house prices.

Let's get started.

1. Import the libraries we need

2. Load the dataset and convert it into a pandas data frame

After loading the dataset from sklearn.datasets, we store it in the variable Boston. The loaded object has two main parts, data and target, both of which are plain arrays. To work with them more easily, we convert each array into a pandas data frame using the pd.DataFrame() method.

[Image: creating a data frame]
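The original code for this step was shown only as an image, so here is a minimal sketch of the conversion. Note that `load_boston` was removed from scikit-learn in version 1.2, so the sketch does not call it. Instead it builds a small stand-in object with the same `data`, `target`, and `feature_names` attributes; the random values, the reduced column list, and the variable names are assumptions for illustration only, not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.utils import Bunch

# Stand-in for the object that load_boston() used to return.
# The numbers are random placeholders, NOT real Boston data.
rng = np.random.default_rng(0)
feature_names = ["CRIM", "RM", "AGE", "TAX", "LSTAT"]
boston = Bunch(
    data=rng.random((5, len(feature_names))),
    target=rng.random(5),
    feature_names=feature_names,
)

# Convert the two arrays into separate data frames.
data = pd.DataFrame(boston.data, columns=boston.feature_names)
target = pd.DataFrame(boston.target, columns=["MEDV"])

print(data.head())
print(target.head())
```

With the real dataset, the same two `pd.DataFrame()` calls would apply unchanged; only the way `boston` is obtained differs.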

Here, you can see that data and target are two separate data frames, so we join them into a single data frame using the concat() method.

[Image: concatenation of a data frame]
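As a rough illustration of the concatenation, the sketch below joins a feature frame and a target frame column-wise with `pd.concat`. The three rows of values are made up for the example, and the names `data`, `target`, and `df` are assumptions, since the original code is not available as text.

```python
import pandas as pd

# Made-up example rows; not taken from the real dataset.
data = pd.DataFrame({"RM": [6.5, 5.9, 7.1], "LSTAT": [5.0, 12.4, 3.2]})
target = pd.DataFrame({"MEDV": [24.0, 19.5, 33.1]})

# axis=1 places the frames side by side, matching rows by index.
df = pd.concat([data, target], axis=1)
print(df)
```

Using `axis=1` matters here: the default `axis=0` would stack the frames vertically and fill the non-matching columns with NaN.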

Now let's remove the unwanted columns from the data frame.

[Image: removing columns]
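The post does not say in text which columns were removed, so the choice below (ZN and CHAS) is purely hypothetical. The sketch only shows the mechanics of dropping columns with `DataFrame.drop`, on made-up values.

```python
import pandas as pd

# Made-up example rows; not taken from the real dataset.
df = pd.DataFrame({
    "ZN": [18.0, 0.0, 0.0],
    "CHAS": [0, 0, 1],
    "RM": [6.5, 5.9, 7.1],
    "LSTAT": [5.0, 12.4, 3.2],
    "MEDV": [24.0, 19.5, 33.1],
})

# Hypothetical choice of columns to drop; adjust to the analysis at hand.
unwanted = ["ZN", "CHAS"]
df = df.drop(columns=unwanted)
print(df.columns.tolist())
```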

3.1 Plot a heatmap to see the correlations in the data

With the help of a heatmap, we can inspect the correlations between the variables visually. We are using the seaborn library for visualization, so have a look at its documentation if you are not familiar with it. For the correlation matrix of this data, we use a seaborn heatmap.

[Image: creating a heatmap]
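A minimal sketch of this step is to compute the correlation matrix with `DataFrame.corr()` and pass it to `sns.heatmap`. The data below is synthetic (generated so that MEDV rises with RM and falls with LSTAT) and is only a stand-in for the real data frame; the function name `plot_heatmap` is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data with arbitrary coefficients, NOT the real dataset.
rng = np.random.default_rng(0)
n = 200
rm = rng.normal(6.3, 0.7, n)
lstat = rng.uniform(2, 35, n)
medv = 9.0 * rm - 0.6 * lstat - 25 + rng.normal(0, 3, n)
df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "MEDV": medv})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()
print(corr.round(2))

def plot_heatmap(corr):
    """Draw the correlation matrix as an annotated seaborn heatmap."""
    # Imported here so the numeric part above runs without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, annot=True, cmap="coolwarm", ax=ax)
    return fig
```

Calling `plot_heatmap(corr)` followed by `plt.show()` would display the figure. Values close to 1 or -1 mark strongly related pairs, and the sign shows the direction of the relationship.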

3.2 Pairplot to visualize

Here we look at the relationships in more detail. Using seaborn's pairplot, we plot every pair of variables as a scatter plot, with the distribution of each single variable shown as a histogram on the diagonal.


4. Regression plot

The plots we have seen so far can show many variables at once, but a regression plot always has exactly one independent variable and one dependent variable. So, here we draw linear regression plots using regplot. The purple plot uses RM as the independent variable and MEDV as the dependent variable. The green plot uses LSTAT as the independent variable and MEDV as the dependent variable.

Here, we use the subplots() method of matplotlib.pyplot to draw more than one plot in a single figure. To draw the data and the fitted line on each plot, we use the regplot function of the seaborn library.

[Image: plotting values on a regression plot]
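The two regression plots could be sketched as follows. The data is again synthetic and only stands in for the real data frame, and the function name `plot_regressions` and the figure size are assumptions. The `np.polyfit` lines compute a straight-line least-squares fit for each feature, which is the kind of line regplot draws by default.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data with arbitrary coefficients, NOT the real dataset.
rng = np.random.default_rng(0)
n = 200
rm = rng.normal(6.3, 0.7, n)
lstat = rng.uniform(2, 35, n)
medv = 9.0 * rm - 0.6 * lstat - 25 + rng.normal(0, 3, n)
df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "MEDV": medv})

# Straight-line fits of MEDV on each feature (polyfit returns slope first).
slope_rm, intercept_rm = np.polyfit(df["RM"], df["MEDV"], deg=1)
slope_lstat, intercept_lstat = np.polyfit(df["LSTAT"], df["MEDV"], deg=1)
print(f"RM slope: {slope_rm:.2f}, LSTAT slope: {slope_lstat:.2f}")

def plot_regressions(df):
    """Draw MEDV against RM (purple) and against LSTAT (green) side by side."""
    # Imported here so the numeric part above runs without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
    sns.regplot(x="RM", y="MEDV", data=df, color="purple", ax=ax1)
    sns.regplot(x="LSTAT", y="MEDV", data=df, color="green", ax=ax2)
    fig.tight_layout()
    return fig
```

Passing `ax=` to each `regplot` call is what places the two plots on the axes created by `subplots()`.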

5. Create the linear model

We could build the model directly, but here I created a function called "linear_model" that fits the linear model, predicts the values, and computes the mean squared error.

[Image: linear model function]
[Image: getting values using the function]
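The body of the author's function is not available as text, so the following is only one plausible shape for it. The signature, the return values, and the choice to evaluate on the same data used for fitting are all assumptions; the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data with arbitrary coefficients, NOT the real dataset.
rng = np.random.default_rng(0)
n = 200
rm = rng.normal(6.3, 0.7, n)
lstat = rng.uniform(2, 35, n)
medv = 9.0 * rm - 0.6 * lstat - 25 + rng.normal(0, 3, n)
df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "MEDV": medv})

def linear_model(df, feature, target="MEDV"):
    """Fit a one-variable linear regression and report its training MSE."""
    X = df[[feature]]          # 2-D input, as scikit-learn expects
    y = df[target]
    model = LinearRegression().fit(X, y)
    predictions = model.predict(X)
    mse = mean_squared_error(y, predictions)
    return model, predictions, mse

model_rm, pred_rm, mse_rm = linear_model(df, "RM")
model_lstat, pred_lstat, mse_lstat = linear_model(df, "LSTAT")
print(f"MSE using RM:    {mse_rm:.2f}")
print(f"MSE using LSTAT: {mse_lstat:.2f}")
```

Because the error here is measured on the training data, it is an optimistic estimate. Splitting the data with `train_test_split` and scoring on the held-out part would give a fairer comparison between the two features.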

From the result, we can see that the mean squared error for LSTAT is slightly lower than for RM. In other words, on this data a one-variable linear model based on LSTAT predicts MEDV a little more accurately than one based on RM.

6. Let's take input from the user

In this section, we take a feature value from the user and use the model to predict the house price.

[Image: dynamic input]
[Image: output for the user values]
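A sketch of the prediction step might look like the code below. It assumes a model fitted on RM alone, uses synthetic data, and introduces a hypothetical helper `predict_medv`. To keep the example runnable without a keyboard, the user's answer is hard-coded as a string; in an interactive script it would come from `input()`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data with arbitrary coefficients, NOT the real dataset.
rng = np.random.default_rng(0)
rm = rng.normal(6.3, 0.7, 200)
df = pd.DataFrame({"RM": rm, "MEDV": 9.0 * rm - 30 + rng.normal(0, 3, 200)})

model = LinearRegression().fit(df[["RM"]], df["MEDV"])

def predict_medv(model, rooms):
    """Predict MEDV (in USD 1000s) for one RM value given as text or number."""
    # Build a one-row frame with the same column name used during fitting.
    query = pd.DataFrame({"RM": [float(rooms)]})
    return float(model.predict(query)[0])

# Interactive version: rooms = input("Average number of rooms (RM): ")
rooms = "6.5"
print(f"Predicted MEDV: {predict_medv(model, rooms):.1f} (USD 1000s)")
```

Converting the text with `float()` will raise `ValueError` on non-numeric input, so a real script would probably wrap the call in a `try`/`except` and ask again.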

Feel free to reach me at jivaniutsav007@gmail.com.

Thank you.

Translated from: https://medium.com/@utsavjivani/boston-house-price-81abd86fa908
