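Data Mining No. 3: sklearn

1. Feature engineering determines the upper bound of what a model can achieve!
Algorithms trained with gradient descent need standardized input data.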
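To make the standardization step concrete, here is a minimal sketch using only the standard library. It computes z-scores with the population standard deviation, which is the same convention `StandardScaler` uses. The helper name `standardize` and the sample values are hypothetical and are not taken from the dataset.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column has no spread; map it to zeros (a choice made here).
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical columns on very different scales.
balance = [120.0, 2500.0, 48000.0, 310.0]
campaign = [1.0, 2.0, 1.0, 4.0]

balance_z = standardize(balance)
campaign_z = standardize(campaign)

# After scaling, both columns have mean close to 0 and standard deviation close to 1.
print(round(fmean(balance_z), 6), round(pstdev(balance_z), 6))
print(round(fmean(campaign_z), 6), round(pstdev(campaign_z), 6))
```

Once every feature is on a comparable scale, a single learning rate suits all coefficients, which is why gradient-based solvers tend to converge faster on standardized data.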
My own code:

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn import preprocessing



train=pd.read_csv("C:\\Users\\Administrator.SC-201903262346\\Desktop\\train_set - 22.csv")
test=pd.read_csv("C:\\Users\\Administrator.SC-201903262346\\Desktop\\test_set -11.csv")

# Treat the "unknown" job category as "blue-collar".
train=train.replace({"job":{"unknown":"blue-collar"}})
test=test.replace({"job":{"unknown":"blue-collar"}})

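# Bin age into three groups: under 40 -> 0, 40 to 59 -> 1, 60 and over -> 2.
# The middle bin requires both conditions to hold, hence "&" rather than "|".
train.loc[train["age"]<40,"age_code"]= 0
train.loc[((train["age"]>=40) & (train["age"]<60)) ,"age_code"]= 1
train.loc[train["age"]>=60 ,"age_code"]= 2

test.loc[test["age"]<40,"age_code"]= 0
test.loc[((test["age"]>=40) & (test["age"]<60)) ,"age_code"]= 1
test.loc[test["age"]>=60 ,"age_code"]= 2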

train.drop("age",axis=1,inplace=True)
test.drop("age",axis=1,inplace=True)

train.drop("ID",axis=1,inplace=True)
test.drop("ID",axis=1,inplace=True)

train.drop(["day","month"],axis=1,inplace=True)
test.drop(["day","month"],axis=1,inplace=True)

# Fit the encoder on the training data only and reuse it for the test data,
# so that each job maps to the same integer code in both sets.
le=preprocessing.LabelEncoder()
le_job=le.fit(train["job"])
train["job_label"]=le_job.transform(train["job"])
test["job_label"]=le_job.transform(test["job"])

train=pd.get_dummies(train,columns=["job_label","marital","education","default","housing","contact","poutcome"])
test=pd.get_dummies(test,columns=["job_label","marital","education","default","housing","contact","poutcome"])

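# The raw "job" column is already encoded as job_label_*, so drop it.
# Pass columns=["loan"] explicitly: a bare second argument is read as the prefix.
train=pd.get_dummies(train.drop("job",axis=1),columns=["loan"])
test=pd.get_dummies(test.drop("job",axis=1),columns=["loan"])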

# Standardize the numeric columns before building x, so that x holds the scaled values.
# The scaler is fitted on train only and then applied to test as well.
num_cols=["balance","duration","campaign","pdays","previous"]
SS=preprocessing.StandardScaler()
SS.fit(train[num_cols])
train[num_cols]=SS.transform(train[num_cols])
test[num_cols]=SS.transform(test[num_cols])

x=train.drop("y",axis=1)
y=train["y"]

lr=LogisticRegression()
X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.3)

lr.fit(X_train,y_train)

y_pre=lr.predict(X_test)

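# F1 score on the held-out 30% of the training data.
print(f1_score(y_test,y_pre))

# Give test the same columns, in the same order, as the training features.
test=test.reindex(columns=x.columns,fill_value=0)
y=lr.predict(test)

a=pd.Series(y)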
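The script scores the model with `f1_score`. As a rough illustration of what that number means, the sketch below computes F1 by hand from precision and recall for a binary problem. The function name `f1_from_labels` and the demo labels are hypothetical; returning 0.0 when there are no true positives is a convention chosen here.

```python
def f1_from_labels(y_true, y_pred, positive=1):
    """Compute the F1 score of the positive class from two label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Precision or recall is zero or undefined; report 0.0 in that case.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

print(f1_from_labels(y_true_demo, y_pred_demo))
```

Because F1 is the harmonic mean of precision and recall, it stays low unless both are reasonably high, which makes it more informative than accuracy when the positive class is rare.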
