Kaggle-ML-How Models Work(1)

How Machine Learning Models Work


Introduction

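We'll start with an overview of how machine learning models work and how they are used. This may feel basic if you've done statistical modeling or machine learning before. Don't worry, we will progress to building powerful models soon.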

This course will have you build models as you work through the following scenario:

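Your cousin has made millions of dollars speculating on real estate. Because of your interest in data science, he has offered to become business partners with you. He will supply the money, and you will supply models that predict how much various houses are worth.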

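You ask your cousin how he has predicted real estate values in the past, and he says it is just intuition. But more questioning reveals that he has identified price patterns in houses he has seen before, and he uses those patterns to make predictions for the new houses he is considering.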

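Machine learning works the same way. We'll start with a model called the decision tree. There are fancier models that give more accurate predictions, but decision trees are easy to understand, and they are the basic building block for some of the best models in data science.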

To keep things easy to follow, we'll start with the simplest possible decision tree.

First Decision Trees

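It divides houses into just two categories. The predicted price for any house under consideration is the average historical price of houses in the same category.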

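We use data to decide how to break the houses into two groups, and then again to determine the predicted price in each group. This step of capturing patterns from data is called fitting or training the model. The data used to fit the model is called the training data.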
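As a minimal sketch of the idea above, the two-group model can be written in a few lines of plain Python. The training rows, the `BEDROOM_THRESHOLD` split rule, and the function names are all assumptions made up for illustration; they do not come from the course data.

```python
from statistics import mean

# Hypothetical training data: (number of bedrooms, sale price in dollars).
training_data = [
    (1, 150_000),
    (2, 170_000),
    (2, 190_000),
    (3, 240_000),
    (4, 260_000),
    (4, 280_000),
]

# Assumed split rule for illustration: "more than 2 bedrooms?"
BEDROOM_THRESHOLD = 2


def fit_two_group_model(rows, threshold):
    """Return the average historical price of each of the two groups."""
    small = [price for bedrooms, price in rows if bedrooms <= threshold]
    large = [price for bedrooms, price in rows if bedrooms > threshold]
    return {"small": mean(small), "large": mean(large)}


def predict(model, bedrooms, threshold):
    """Predict with the average price of the group the house falls into."""
    group = "large" if bedrooms > threshold else "small"
    return model[group]


model = fit_two_group_model(training_data, BEDROOM_THRESHOLD)
print(model)
print(predict(model, 3, BEDROOM_THRESHOLD))
```

Here the split is fixed by hand, so "fitting" only computes the two group averages. In a real decision tree the split itself is also chosen from the data.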

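The details of how the model is fit (for example, how to split the data) are complex enough that we will save them for later. After the model has been fit, you can apply it to new data to predict the prices of additional houses.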
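To make "fit, then apply to new data" concrete, here is a rough sketch that picks the split from the data. It assumes one common criterion, choosing the threshold that minimizes the total squared error of the two groups; that criterion is an assumption for this example and is not necessarily how the course later explains fitting. The data and the names `fit_stump` and `predict` are hypothetical.

```python
from statistics import mean

# Hypothetical training data: (number of bedrooms, sale price in dollars).
training_data = [
    (1, 150_000), (2, 170_000), (2, 190_000),
    (3, 240_000), (4, 260_000), (4, 280_000),
]


def squared_error(prices):
    """Sum of squared differences between each price and the group mean."""
    if not prices:
        return 0.0
    avg = mean(prices)
    return sum((p - avg) ** 2 for p in prices)


def fit_stump(rows):
    """Try every candidate threshold and keep the one with the lowest error.

    Returns None if all rows share the same bedroom count (no split possible).
    """
    best = None
    # The largest value is skipped because it would leave the right group empty.
    for threshold in sorted({bedrooms for bedrooms, _ in rows})[:-1]:
        left = [p for b, p in rows if b <= threshold]
        right = [p for b, p in rows if b > threshold]
        error = squared_error(left) + squared_error(right)
        if best is None or error < best["error"]:
            best = {"threshold": threshold, "error": error,
                    "left_mean": mean(left), "right_mean": mean(right)}
    return best


def predict(stump, bedrooms):
    """Apply the fitted split to a house that was not in the training data."""
    if bedrooms <= stump["threshold"]:
        return stump["left_mean"]
    return stump["right_mean"]


stump = fit_stump(training_data)
print(stump["threshold"])
print(predict(stump, 5))  # a new house with 5 bedrooms
```

The fitted model is just a threshold and two averages, which is why it can be applied to houses it has never seen.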


Improving the Decision Tree

Which of the two decision trees below is more likely to result from fitting the real estate training data?

First Decision Trees

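The decision tree on the left (Decision Tree 1) probably makes more sense, because it captures the fact that houses with more bedrooms tend to sell at higher prices than houses with fewer bedrooms. The biggest shortcoming of this model is that it does not capture most of the other factors that affect house prices, such as the number of bathrooms, the size of the house, and its location.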

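You can capture more factors by using a tree that has more splits. These are called deeper trees. A decision tree that also considers the size of each house might look like this: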

Depth 2 Tree

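You predict the price of any house by tracing a path through the decision tree, always picking the branch that corresponds to that house's characteristics. The predicted price for the house is at the bottom of the tree. The point at the bottom where we make a prediction is called a leaf.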
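The tracing procedure can be sketched with a nested dictionary standing in for the tree. Every feature name, threshold, and leaf price below is made up for illustration and does not reproduce the tree in the figure.

```python
# Hypothetical depth-2 tree. Internal nodes hold a feature and a threshold;
# leaves hold a predicted price. All numbers are invented for illustration.
tree = {
    "feature": "bedrooms", "threshold": 2,
    "no": {"feature": "size_sqft", "threshold": 1200,
           "no": {"leaf": 146_000}, "yes": {"leaf": 188_000}},
    "yes": {"feature": "size_sqft", "threshold": 1800,
            "no": {"leaf": 210_000}, "yes": {"leaf": 280_000}},
}


def predict(node, house):
    """Follow the branch matching the house at each split until a leaf."""
    while "leaf" not in node:
        exceeds = house[node["feature"]] > node["threshold"]
        node = node["yes" if exceeds else "no"]
    return node["leaf"]


house = {"bedrooms": 3, "size_sqft": 2000}
print(predict(tree, house))
```

Each house passes through exactly two splits here, one per level of the tree, before it reaches a leaf.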

The splits and the values at the leaves are determined by the data, so it is time to look at the data you will be working with.

Continue

Examine your data.

Data download

Data description

  • train.csv - the training set
  • test.csv - the test set
  • sample_submission.csv - a benchmark submission from a linear regression on year and month of sale, lot square footage, and number of bedrooms
  • SalePrice - the property’s sale price in dollars. This is the target variable that you’re trying to predict.
  • MSSubClass: The building class
  • MSZoning: The general zoning classification
  • LotFrontage: Linear feet of street connected to property
  • LotArea: Lot size in square feet
  • Street: Type of road access
  • Alley: Type of alley access
  • LotShape: General shape of property
  • LandContour: Flatness of the property
  • Utilities: Type of utilities available
  • LotConfig: Lot configuration
  • LandSlope: Slope of property
  • Neighborhood: Physical locations within Ames city limits
  • Condition1: Proximity to main road or railroad
  • Condition2: Proximity to main road or railroad (if a second is present)
  • BldgType: Type of dwelling
  • HouseStyle: Style of dwelling
  • OverallQual: Overall material and finish quality
  • OverallCond: Overall condition rating
  • YearBuilt: Original construction date
  • YearRemodAdd: Remodel date
  • RoofStyle: Type of roof
  • RoofMatl: Roof material
  • Exterior1st: Exterior covering on house
  • Exterior2nd: Exterior covering on house (if more than one material)
  • MasVnrType: Masonry veneer type
  • MasVnrArea: Masonry veneer area in square feet
  • ExterQual: Exterior material quality
  • ExterCond: Present condition of the material on the exterior
  • Foundation: Type of foundation
  • BsmtQual: Height of the basement
  • BsmtCond: General condition of the basement
  • BsmtExposure: Walkout or garden level basement walls
  • BsmtFinType1: Quality of basement finished area
  • BsmtFinSF1: Type 1 finished square feet
  • BsmtFinType2: Quality of second finished area (if present)
  • BsmtFinSF2: Type 2 finished square feet
  • BsmtUnfSF: Unfinished square feet of basement area
  • TotalBsmtSF: Total square feet of basement area
  • Heating: Type of heating
  • HeatingQC: Heating quality and condition
  • CentralAir: Central air conditioning
  • Electrical: Electrical system
  • 1stFlrSF: First Floor square feet
  • 2ndFlrSF: Second floor square feet
  • LowQualFinSF: Low quality finished square feet (all floors)
  • GrLivArea: Above grade (ground) living area square feet
  • BsmtFullBath: Basement full bathrooms
  • BsmtHalfBath: Basement half bathrooms
  • FullBath: Full bathrooms above grade
  • HalfBath: Half baths above grade
  • Bedroom: Number of bedrooms above basement level
  • Kitchen: Number of kitchens
  • KitchenQual: Kitchen quality
  • TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
  • Functional: Home functionality rating
  • Fireplaces: Number of fireplaces
  • FireplaceQu: Fireplace quality
  • GarageType: Garage location
  • GarageYrBlt: Year garage was built
  • GarageFinish: Interior finish of the garage
  • GarageCars: Size of garage in car capacity
  • GarageArea: Size of garage in square feet
  • GarageQual: Garage quality
  • GarageCond: Garage condition
  • PavedDrive: Paved driveway
  • WoodDeckSF: Wood deck area in square feet
  • OpenPorchSF: Open porch area in square feet
  • EnclosedPorch: Enclosed porch area in square feet
  • 3SsnPorch: Three season porch area in square feet
  • ScreenPorch: Screen porch area in square feet
  • PoolArea: Pool area in square feet
  • PoolQC: Pool quality
  • Fence: Fence quality
  • MiscFeature: Miscellaneous feature not covered in other categories
  • MiscVal: $Value of miscellaneous feature
  • MoSold: Month Sold
  • YrSold: Year Sold
  • SaleType: Type of sale
  • SaleCondition: Condition of sale
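A first look at a few of these columns could be sketched as follows, using only the Python standard library. The rows are made-up stand-ins in the shape of train.csv so that the example runs without the download, and the helper `summarize` is a hypothetical name.

```python
import csv
import io
from statistics import mean

# Made-up rows shaped like train.csv, with only three of the listed columns.
# To read the real file, replace `sample` with open("train.csv", newline="")
# (assuming the file sits in the working directory).
sample = io.StringIO(
    "LotArea,YearBuilt,SalePrice\n"
    "8000,1990,200000\n"
    "10000,2000,250000\n"
    "12000,2010,300000\n"
)

rows = list(csv.DictReader(sample))


def summarize(rows, column):
    """Return count, min, mean, and max of a numeric column."""
    values = [float(row[column]) for row in rows]
    return {"count": len(values), "min": min(values),
            "mean": mean(values), "max": max(values)}


for column in ("LotArea", "YearBuilt", "SalePrice"):
    print(column, summarize(rows, column))
```

In the real file, some columns are categorical or contain missing entries, so this numeric summary would only apply to columns whose values all parse as numbers.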

Original course link
Machine Learning Course Home Page.
