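Datacastle training competition: customer churn prediction
1. Objective: build a classification model that predicts whether an enterprise customer will churn
2. Procedure:
2.1 Data analysis
2.2.1 Import the required packages and read the data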
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from lightgbm import LGBMClassifier
# Directory that holds the competition files (adjust to your local path)
data_dir = 'C:/Users/Administrator/Desktop/秋招项目/project2/customer_churn_judgment/'
train = pd.read_csv(data_dir + 'train.csv')
test = pd.read_csv(data_dir + 'test_noLabel.csv')
sub = pd.read_csv(data_dir + 'submit_example.csv')
sample_no_label = pd.read_csv(data_dir + 'samples_noLabel.csv')
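2.2.2 Get an overview of the data
Check the size of the data. The training set has 5,227 records with 15 features and 1 label. The test set has 1,037 records with 15 features.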
train.shape,test.shape
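Beyond comparing the two shapes by eye, it can help to confirm which columns the training and test sets share and which appear only in the training set (normally just the label). The following is a minimal sketch on toy data: the helper `describe_split` and the column names `age`, `plan`, and `label` are made up for illustration and are not taken from the competition files.

```python
import pandas as pd


def describe_split(train: pd.DataFrame, test: pd.DataFrame) -> dict:
    """Summarise row counts and the columns that exist only in the training set."""
    shared = [c for c in train.columns if c in test.columns]
    train_only = [c for c in train.columns if c not in test.columns]
    return {
        "train_rows": len(train),
        "test_rows": len(test),
        "shared_features": shared,
        "train_only_columns": train_only,
    }


# Toy frames standing in for train.csv and test_noLabel.csv (hypothetical columns)
toy_train = pd.DataFrame(
    {"age": [30, 41, 25], "plan": ["a", "b", "a"], "label": [0, 1, 0]}
)
toy_test = pd.DataFrame({"age": [52, 33], "plan": ["b", "a"]})

summary = describe_split(toy_train, toy_test)
print(summary)
```

Applied to the real data, the same call would be `describe_split(train, test)`. If `train_only_columns` holds anything other than the label, the two files are not aligned and the feature set should be reconciled before modelling.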
Check the data types and missing values. The output shows that the data has no missing values and that the column dtypes are int, object, and float.
train.info()
test.info()
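The `info()` output is easy to misread when there are many columns, so the same facts can be collected into one small table. This is a sketch on toy data, not the competition data: `column_overview` is a hypothetical helper, and the columns `tenure`, `monthly_fee`, and `contract` are invented for the example.

```python
import pandas as pd


def column_overview(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, count of missing values, count of distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isnull().sum(),
            "n_unique": df.nunique(),
        }
    )


# Toy frame with one int, one float, and one text column (hypothetical names)
toy = pd.DataFrame(
    {
        "tenure": [1, 5, 12],
        "monthly_fee": [9.9, 19.5, 29.0],
        "contract": ["monthly", "yearly", "monthly"],
    }
)

overview = column_overview(toy)
print(overview)

# Non-numeric columns will need encoding before they reach a tree model
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_cols)
```

On the real data, `column_overview(train)` and `column_overview(test)` would confirm the "no missing values" observation directly (every `n_missing` equal to 0), and the non-numeric column list identifies the object columns that need encoding later.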