Original article: https://blog.csdn.net/yaoqiang2011/article/details/52665396
There are three points to watch out for.
1. First, the data needs to be preprocessed.
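Two of the preprocessing steps the script describes (replacing a sparse column with a missing-value flag, and median imputation) can be sketched on a toy DataFrame. This is only an illustration: the column names come from the dataset, but the values are made up, and the variable `toy` is hypothetical rather than part of the original script.

```python
import numpy as np
import pandas as pd

# Toy data: the values are invented for illustration; only the
# column names are taken from the dataset used in this post.
toy = pd.DataFrame({
    "EMI_Loan_Submitted": [1200.0, np.nan, 850.0, np.nan],
    "Existing_EMI": [0.0, 500.0, np.nan, 0.0],
})

# Missing-value flag: 1 if EMI_Loan_Submitted was missing, else 0.
toy["EMI_Loan_Submitted_Missing"] = toy["EMI_Loan_Submitted"].isnull().astype(int)

# The original column is dropped once the flag exists.
toy = toy.drop(columns=["EMI_Loan_Submitted"])

# Median imputation: fill the gaps in Existing_EMI with the column median.
median_emi = toy["Existing_EMI"].median()
toy["Existing_EMI"] = toy["Existing_EMI"].fillna(median_emi)

print(toy)
```

The flag keeps the information that a value was absent, which can itself be predictive, while the median fill is a simple choice that works when only a few values are missing.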
# -*- coding: utf-8 -*-
"""
Created on Wed Sep 19 14:16:42 2018
@author: Administrator
"""
import pandas as pd
import numpy as np
# Load the training and test sets
train = pd.read_csv('Train_nyOWmfK.csv')
test = pd.read_csv('Test_bCtAN1w.csv')
# Check the dimensions and column types of the loaded data
print(train.shape, test.shape)
print(train.dtypes)
"""
City variable dropped because of too many categories
DOB converted to Age | DOB dropped
EMI_Loan_Submitted_Missing created which is 1 if EMI_Loan_Submitted was missing else 0 | Original variable EMI_Loan_Submitted dropped
EmployerName dropped because of too many categories
Existing_EMI imputed with 0 (median) since only 111 values were missing
Interest_Rate_Missing created which is 1 if Interest_Rate was miss