I. Logistic Regression Model
1. Load the data
import pandas as pd
df = pd.read_csv('/Users/cy_ariel/Downloads/adultTest.csv')
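2. Data preprocessing
Check the values of the class field
df['class'].value_counts()
<=50K    24720
>50K      7841
Name: class, dtype: int64
Recode class as 0/1 in a new target column (the class column itself is dropped further down)
df.loc[df['class'] == "<=50K",'target'] = 0
df.loc[df['class'] != "<=50K",'target'] = 1
Check the data types
df.dtypes
Convert the object-type columns to numeric dummy columns, and drop the class column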
df1 = pd.get_dummies(data=df,columns=['workclass','education','marital-status','occupation','relationship','race','sex','native-country'])
df1.drop("class",inplace=True,axis=1)
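To make the recoding and dummy-encoding steps concrete, here is a minimal sketch on a tiny made-up frame. The rows and the reduced column set are invented for illustration and are not taken from adultTest.csv; only the `class` labels mirror the values shown above.

```python
import pandas as pd

# Toy frame: values are invented for illustration only
toy = pd.DataFrame({
    "age": [39, 50, 28],
    "sex": ["Male", "Female", "Male"],
    "class": ["<=50K", ">50K", "<=50K"],
})

# Binary target: 1 for anything other than "<=50K", 0 otherwise
toy["target"] = (toy["class"] != "<=50K").astype(int)

# One-hot encode the categorical column, then drop the original label column
encoded = pd.get_dummies(toy, columns=["sex"]).drop(columns="class")

print(encoded.columns.tolist())
print(encoded["target"].tolist())
```

Each category value becomes its own indicator column named `<column>_<value>`, which is why the full dataset ends up with many more columns after this step.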
3. Build the training and test sets
Get X and Y
xdata = df1.drop('target',axis=1)
ydata = df1['target']
Split into training and test sets
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest = train_test_split(xdata,ydata,test_size=0.3)
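The split above is random on every run. As a hedged variation, the sketch below fixes `random_state` for reproducibility and passes `stratify` so both subsets keep the same class ratio, which can matter when the classes are imbalanced as they are here. The synthetic arrays and the 80/20 class ratio are assumptions made purely for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 features, 80/20 class imbalance
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 80 + [1] * 20)

# random_state makes the split repeatable; stratify keeps the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())
```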
4. Modeling
Build the LR model
from sklearn.linear_model import LogisticRegression
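# Note: max_iter=1 allows only a single solver iteration, so the fit will likely stop before converging
lr = LogisticRegression(C=2.0,max_iter=1)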
Train the model
lr.fit(xtrain,ytrain)
Apply the model
lr.predict(xtest)
Score the model (mean accuracy on the test set)
lr.score(xtest,ytest)
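Since `max_iter=1` stops the solver after one iteration, the score above may understate what logistic regression can do on this data. The sketch below is one possible variation, not the original workflow: it records whether a one-iteration fit raises a `ConvergenceWarning`, then fits a pipeline that standardizes the features and allows up to 1000 iterations. The synthetic dataset from `make_classification`, the scaling step, and the value 1000 are all assumptions chosen for illustration.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit with a single iteration and record whether the solver complains
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LogisticRegression(C=2.0, max_iter=1).fit(X_train, y_train)
hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)

# Scale the features and give the solver room to converge
model = make_pipeline(StandardScaler(), LogisticRegression(C=2.0, max_iter=1000))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print("one-iteration fit hit the iteration limit:", hit_limit)
print("accuracy with scaling and max_iter=1000:", accuracy)
```

Standardizing before the fit usually helps the solver converge in fewer iterations, which is why the two steps are wrapped in a single pipeline here.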