Boston House Price Prediction

Latest recommended article published on 2024-07-22 10:36:53

李_涛

Tags: python

Original article: https://medium.com/@utsavjivani/boston-house-price-81abd86fa908


In this post, we will perform data analysis in Python on the Boston house price dataset. Before getting started, it is essential to understand the data, so let's look at the columns first.

    - CRIM     per capita crime rate by town
    - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
    - INDUS    proportion of non-retail business acres per town
    - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
    - NOX      nitric oxides concentration (parts per 10 million)
    - RM       average number of rooms per dwelling
    - AGE      proportion of owner-occupied units built prior to 1940
    - DIS      weighted distances to five Boston employment centres
    - RAD      index of accessibility to radial highways
    - TAX      full-value property-tax rate per USD 10,000
    - PTRATIO  pupil-teacher ratio by town
    - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
    - LSTAT    percentage lower status of the population
    - MEDV     median value of owner-occupied homes in USD 1000s

First, let's be clear about what we want to calculate from this data. We need to know which column is the dependent variable and which are the independent variables. Here, "MEDV" is the dependent variable, because it holds the median value of owner-occupied homes, which is the quantity we want to predict. Its value depends on other factors such as RM, LSTAT, TAX and AGE. These are the main features considered in this post because they are closely associated with MEDV, although not all in the same direction: MEDV tends to rise with RM, for example, and to fall as LSTAT rises.

Let's get started.

1. Import the libraries we need
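The import listing from the original post did not survive in this copy. A plausible set, assuming only the libraries named later in the post (pandas, seaborn, matplotlib and scikit-learn) plus NumPy, might look like this:

```python
# Assumed import set; the original post's exact imports are not available.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
```

All of these are third-party packages, so they need to be installed (for example with pip) before the imports will succeed.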

2. Load the dataset and convert it into a pandas data frame

After loading the dataset from sklearn.datasets, we store it in a variable (called Boston in the original post). The returned object has two main fields, data and target, which are plain arrays. To work with them conveniently, we convert each array into a pandas data frame using the pd.DataFrame() method.
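A minimal sketch of the conversion step follows. Note that `load_boston` was removed from scikit-learn in version 1.2, so this sketch does not call it. Instead it builds a stand-in object named `boston` with the same three attributes (`data`, `target`, `feature_names`), filled with random placeholder numbers. The values are not the real dataset; only the `pd.DataFrame()` calls are the point.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

# Stand-in for the object load_boston() used to return.
# Assumption: random placeholder values, NOT the real Boston data.
rng = np.random.default_rng(0)
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]
boston = SimpleNamespace(
    data=rng.random((5, len(feature_names))),
    target=rng.random(5),
    feature_names=feature_names,
)

# Wrap each NumPy array in a DataFrame with labelled columns.
data = pd.DataFrame(boston.data, columns=boston.feature_names)
target = pd.DataFrame(boston.target, columns=["MEDV"])

print(data.shape, target.shape)
```

With an older scikit-learn release that still ships `load_boston`, the stand-in could be replaced by the real loader and the two `pd.DataFrame()` lines would stay the same.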

At this point the data and target data frames are two separate tables, so we combine them into a single data frame using the concat() method.
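The concatenation can be sketched as follows. The two miniature frames and their values are hypothetical and exist only to show how `pd.concat` with `axis=1` joins tables column-wise.

```python
import pandas as pd

# Hypothetical miniature frames standing in for the feature and target tables.
data = pd.DataFrame({"RM": [6.5, 5.9, 7.1], "LSTAT": [5.0, 12.4, 3.2]})
target = pd.DataFrame({"MEDV": [24.0, 19.5, 33.1]})

# axis=1 places the frames side by side, aligning rows on the shared index.
df = pd.concat([data, target], axis=1)

print(df.columns.tolist())
```

Because `concat` aligns on the index, both frames should share the same row index; frames built from the same dataset with default indices satisfy this.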

Now let's remove the unwanted columns from the data frame.
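The post does not say which columns were removed. As an illustration only, the sketch below assumes the analysis keeps the four features named earlier (RM, LSTAT, TAX, AGE) plus MEDV and drops everything else. The one-row frame is hypothetical.

```python
import pandas as pd

# Hypothetical one-row frame using a few of the dataset's column names.
df = pd.DataFrame({"CRIM": [0.1], "ZN": [18.0], "RM": [6.5], "AGE": [65.2],
                   "TAX": [296.0], "LSTAT": [5.0], "MEDV": [24.0]})

# Assumption: keep only the columns the post focuses on.
keep = ["RM", "LSTAT", "TAX", "AGE", "MEDV"]
unwanted = [col for col in df.columns if col not in keep]

# drop(columns=...) returns a new frame without the listed columns.
df = df.drop(columns=unwanted)

print(df.columns.tolist())
```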

3.1 Plot a heatmap to see the correlations in the data

With the help of a heatmap, we can inspect the correlations between columns graphically. We use the seaborn library for visualization; have a look at its documentation if you are not familiar with it. The correlation matrix of the data frame is drawn with seaborn's heatmap.
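A sketch of the heatmap step is shown below. It assumes synthetic stand-in data (random numbers shaped loosely like RM, LSTAT and MEDV, not the real values) and a headless matplotlib backend so it runs without a display. The colour map and figure size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (assumption: NOT the real Boston values).
rng = np.random.default_rng(0)
rm = rng.normal(6.3, 0.7, 100)
lstat = rng.normal(12.0, 5.0, 100)
medv = 9.0 * rm - 0.9 * lstat + rng.normal(0.0, 3.0, 100)
df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "MEDV": medv})

# Pairwise Pearson correlation between all columns.
corr = df.corr()

# annot=True writes each coefficient into its cell.
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
```

In an interactive session, `plt.show()` would display the figure; fixing `vmin` and `vmax` to -1 and 1 keeps the colour scale symmetric around zero.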

3.2 Pairplot to visualize

Here we look at the relationship between every pair of columns in detail. Using seaborn's pairplot, we plot each pair of variables as a scatter plot, with the distribution of each single variable shown on the diagonal.

4. Regression plot

The plots above cover many variables at once, but a regression plot always has exactly one independent variable and one dependent variable. So here we draw linear regression plots using regplot. The purple plot uses RM as the independent variable and MEDV as the dependent variable. The green plot uses LSTAT as the independent variable and MEDV as the dependent variable.

We use the subplots() function of matplotlib.pyplot to draw more than one plot in the same figure, and the regplot function of the seaborn library to draw the data together with the fitted regression line.
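The two regression plots described above might be produced roughly as follows. The colours match the post; the data is a synthetic stand-in, and the figure size is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (assumption: NOT the real Boston values).
rng = np.random.default_rng(2)
rm = rng.normal(6.3, 0.7, 80)
lstat = rng.normal(12.0, 5.0, 80)
medv = 9.0 * rm - 0.9 * lstat + rng.normal(0.0, 3.0, 80)
df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "MEDV": medv})

# One row, two columns: a separate Axes object for each regression plot.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# regplot draws the scatter points plus a fitted straight line.
sns.regplot(x="RM", y="MEDV", data=df, color="purple", ax=ax1)
sns.regplot(x="LSTAT", y="MEDV", data=df, color="green", ax=ax2)
```

Passing `ax=` explicitly is what lets each regplot land in its own panel instead of both drawing on the current axes.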

5. Create the linear model

We could build the model inline, but here I created a function called "linear_model" that fits the linear model, predicts values and computes the mean squared error.
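The body of the author's `linear_model` function is not included in this copy. The sketch below is one possible shape for it, assuming a single-feature scikit-learn `LinearRegression`, a held-out test split, and the signature shown; the parameters, split size and random seed are all assumptions. The data is a synthetic stand-in.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def linear_model(df, feature, target="MEDV"):
    """Fit target ~ feature; return (model, predictions, mse) on a held-out split.

    Hypothetical signature: the original function is not shown in the post.
    """
    X = df[[feature]]  # double brackets keep X two-dimensional
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    model = LinearRegression().fit(X_train, y_train)
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    return model, predictions, mse


# Synthetic stand-in data (assumption: NOT the real Boston values).
rng = np.random.default_rng(0)
rm = rng.normal(6.3, 0.7, 100)
lstat = rng.normal(12.0, 5.0, 100)
medv = 9.0 * rm - 0.9 * lstat + rng.normal(0.0, 3.0, 100)
df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "MEDV": medv})

for feature in ("RM", "LSTAT"):
    model, predictions, mse = linear_model(df, feature)
    print(feature, "MSE:", round(mse, 2))
```

Computing the error on rows the model did not see during fitting gives a fairer comparison between features than scoring on the training data.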

From the result, we can see that the mean squared error for LSTAT is slightly lower than for RM. In other words, a model that takes LSTAT as its input predicts MEDV slightly more accurately than one that takes RM.

6. Take input from the user

In this section, we take a feature value from the user and predict the corresponding house price.
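A sketch of the prediction step, under stated assumptions: the helper name `predict_medv` is hypothetical, the model is a single-feature `LinearRegression` on RM, and the training data is a synthetic straight line rather than the real dataset. The user's text is hard-coded here so the sketch runs non-interactively.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data on an exact straight line (NOT the real Boston values).
rm = np.linspace(4.0, 8.0, 20)
df = pd.DataFrame({"RM": rm, "MEDV": 9.0 * rm - 30.0})
model = LinearRegression().fit(df[["RM"]], df["MEDV"])


def predict_medv(model, feature, raw_value):
    """Convert the user's text to a number and return the predicted MEDV.

    Hypothetical helper; float() raises ValueError on non-numeric text.
    """
    value = float(raw_value)
    X_new = pd.DataFrame({feature: [value]})  # same column name as in training
    return float(model.predict(X_new)[0])


# In an interactive session, raw could come from input("Average rooms (RM): ").
raw = "6.5"
estimate = predict_medv(model, "RM", raw)
print(f"Estimated MEDV: {estimate:.1f} (in USD 1000s)")
```

Wrapping the value in a DataFrame with the training column name keeps the input in the same form the model was fitted on.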

Feel free to reach me at jivaniutsav007@gmail.com.

Thank you.

Translated from: https://medium.com/@utsavjivani/boston-house-price-81abd86fa908
