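Python and R are the two main tools for data analysis. Here I walk through a case I worked on while learning with Python.
Step 1: set the working directory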
#encoding:utf8
import os
os.chdir("G:\\Anaconda3\\Scripts\\lecture01\\Feature_engineering_and_model_tuning\\Feature-engineering_and_Parameter_Tuning_XGBoost")
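Hard-coding an absolute path in os.chdir ties the script to one machine. Below is a minimal sketch of an alternative that builds file paths with pathlib and leaves the working directory alone. DATA_DIR and data_path are hypothetical names introduced for illustration; they are not part of the original code.

```python
from pathlib import Path

# Hypothetical: the folder that holds Train.csv and Test.csv; adjust to your machine.
DATA_DIR = Path(".")

def data_path(filename, base=DATA_DIR):
    """Return the path of a data file without changing the working directory."""
    return Path(base) / filename

print(data_path("Train.csv"))
```

The result can be passed straight to pd.read_csv, so the rest of the script would not depend on where it is launched from.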
Step 2: load the packages
import pandas as pd
import numpy as np
%matplotlib inline
Step 3: load the data
# Load the data:
train = pd.read_csv('Train.csv',encoding = "ISO-8859-1")
test = pd.read_csv('Test.csv',encoding = "ISO-8859-1")
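The encoding argument matters because a file saved in a single-byte encoding can contain bytes that are not valid UTF-8, which is what pd.read_csv assumes by default. The sketch below uses made-up in-memory data (not the real Train.csv) to show the failure and the fix.

```python
import io
import pandas as pd

# Made-up sample: "é" is stored as the single byte 0xE9 in ISO-8859-1,
# which is not a valid UTF-8 sequence on its own.
raw = "ID,Employer_Name\n1,Société X\n".encode("ISO-8859-1")

try:
    pd.read_csv(io.BytesIO(raw))  # default encoding is UTF-8
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Reading with the matching encoding decodes the accented characters correctly.
sample = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
print(utf8_ok, sample.loc[0, "Employer_Name"])
```

Whether the real files actually need ISO-8859-1 is taken from the original code; the sketch only illustrates why such an argument may be required.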
Step 4: inspect the data
- Dimensions
train.shape, test.shape
((87020, 26), (37717, 24))
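The shapes show that train has two more columns than test (26 versus 24). A minimal sketch for finding which columns differ is given here, using small made-up frames. The column names are placeholders and do not claim to be the actual difference in this dataset.

```python
import pandas as pd

# Toy stand-ins for train and test; the column names are hypothetical.
train_demo = pd.DataFrame({"ID": [1, 2], "feature": [0.5, 0.7], "target": [0, 1]})
test_demo = pd.DataFrame({"ID": [3], "feature": [0.2]})

# Columns present in the training frame but absent from the test frame.
extra_cols = train_demo.columns.difference(test_demo.columns).tolist()
print(train_demo.shape, test_demo.shape, extra_cols)
```

On the real data, train.columns.difference(test.columns) would list the columns that appear only in train.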
- Data types
# Look at the basic information about the data
train.dtypes
ID object
Gender object
City object
Monthly_Income int64
DOB object
Lead_Creation_Date object
Loan_Amount_Applied float64
Loan_Tenure_Applied float64
Existing_EMI float64
Employer_Name object
Salary_Account object
Mobile_Verifie