逻辑回归(Logistic+Regression)经典实例

机器学习算法完整版见fenghaootong-github

房价预测

数据集描述

数据共有81个特征

SalePrice - the property’s sale price in dollars. This is the target variable that you’re trying to predict.
MSSubClass: The building class
MSZoning: The general zoning classification
LotFrontage: Linear feet of street connected to property
LotArea: Lot size in square feet
Street: Type of road access
Alley: Type of alley access
LotShape: General shape of property
LandContour: Flatness of the property
Utilities: Type of utilities available
LotConfig: Lot configuration
LandSlope: Slope of property
….

导入所需模块

import numpy as np
import pandas as pd 

import matplotlib.pyplot as plt
import seaborn as sns
import math as mat

from scipy import stats
from scipy.stats import norm
from sklearn import preprocessing

import statsmodels.api as sm
from patsy import dmatrices

import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

import sklearn.linear_model as LinReg
import sklearn.metrics as metrics

导入数据

#loading the data 
data_train = pd.read_csv('../DATA/SalePrice_train.csv')
data_test = pd.read_csv('../DATA/SalePrice_test.csv')

数据共有81个特征,为了便于说明只挑选7个特征
OverallQual
GrLivArea
GarageCars
TotalBsmtSF
1stFlrSF
FullBath
YearBuilt
因为这些数据与房子的售卖价格相关性比较大

具体如何选择特征,见数据清理

数据预处理

data_train.shape
(1460, 81)  
vars = ['OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath','YearBuilt']
Y = data_train[['SalePrice']] #dim (1460, 1)
ID_train = data_train[['Id']] #dim (1460
  • 3
    点赞
  • 33
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值