Logistic Regression in Practice


吴帮吕


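Introduction

Logistic regression is a machine learning classification algorithm used to predict the probability of a categorical dependent variable. In logistic regression the dependent variable is binary: it contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In other words, the logistic regression model predicts P(Y = 1) as a function of X.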
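To make "P(Y = 1) as a function of X" concrete, the sketch below applies the logistic (sigmoid) function to a linear combination of features. The weights, intercept, and feature values are made-up numbers for illustration only; they are not fitted to the bank data used later.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, weights, intercept):
    """P(Y = 1 | X) for a logistic regression model with the given parameters."""
    return sigmoid(X @ weights + intercept)

# Hypothetical parameters and two hypothetical observations (not from the dataset).
weights = np.array([0.8, -0.5])
intercept = -0.2
X = np.array([[1.0, 2.0],
              [3.0, 0.5]])

probs = predict_proba(X, weights, intercept)
labels = (probs >= 0.5).astype(int)  # threshold at 0.5 to obtain a 0/1 class
print(probs, labels)
```

The 0.5 threshold is a common default, not a requirement; a different cut-off may suit a problem where the two classes are unevenly represented.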

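Data

The dataset comes from the UCI Machine Learning Repository and relates to the direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether a client will subscribe to a term deposit (variable y). The dataset can be downloaded from the UCI repository.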

import pandas as pd
import numpy as np
from sklearn import preprocessing
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import seaborn as sns

# Plot styling
plt.rc("font", size=14)
sns.set(style="whitegrid", color_codes=True)

# Load the data and drop rows with missing values
data = pd.read_csv('bank.csv', header=0)
data = data.dropna()
print(data.shape)
print(list(data.columns))

The dataset provides information about the bank's clients. It contains 41,188 records and 21 fields.

Input variables

age (numeric)
job : type of job (categorical: “admin”, “blue-collar”, “entrepreneur”, “housemaid”, “management”, “retired”, “self-employed”, “services”, “student”, “technician”, “unemployed”, “unknown”)
marital : marital status (categorical: “divorced”, “married”, “single”, “unknown”)
education (categorical: “basic.4y”, “basic.6y”, “basic.9y”, “high.school”, “illiterate”, “professional.course”, “university.degree”, “unknown”)
default: has credit in default? (categorical: “no”, “yes”, “unknown”)
housing: has housing loan? (categorical: “no”, “yes”, “unknown”)
loan: has personal loan? (categorical: “no”, “yes”, “unknown”)
contact: contact communication type (categorical: “cellular”, “telephone”)
month: last contact month of year (categorical: “jan”, “feb”, “mar”, …, “nov”, “dec”)
day_of_week: last contact day of the week (categorical: “mon”, “tue”, “wed”, “thu”, “fri”)
duration: last contact duration, in seconds (numeric). Important note: this attribute strongly affects the output target (e.g., if duration=0 then y=“no”). The duration is not known before a call is performed, and after the end of the call y is obviously known. This input should therefore be included only for benchmark purposes and discarded if the intention is to build a realistic predictive model.
campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
previous: number of contacts performed before this campaign and for this client (numeric)
poutcome: outcome of the previous marketing campaign (categorical: “failure”, “nonexistent”, “success”)
emp.var.rate: employment variation rate (numeric)
cons.price.idx: consumer price index (numeric)
cons.conf.idx: consumer confidence index (numeric)
euribor3m: euribor 3 month rate (numeric)
nr.employed: number of employees (numeric)
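The notes on duration and pdays in the list above can be sketched as follows: drop duration when a realistic model is wanted, and turn the 999 sentinel in pdays into an explicit flag. The three-row DataFrame and the column name previously_contacted are illustrative assumptions and do not come from the dataset.

```python
import pandas as pd

# Hypothetical three-row sample using column names from the list above.
sample = pd.DataFrame({
    "age": [35, 52, 41],
    "duration": [210, 0, 95],
    "pdays": [999, 6, 999],
})

# duration is only known once the call has ended, so leave it out
# of a model that is meant to predict before the call.
features = sample.drop(columns=["duration"])

# 999 means "not previously contacted"; expose that as a 0/1 flag.
# previously_contacted is a made-up name for this sketch.
features["previously_contacted"] = (features["pdays"] != 999).astype(int)

print(features)
```

Whether to keep the raw pdays column next to the flag is a modelling choice; this sketch keeps both.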

Target variable

y: has the client subscribed to a term deposit? (binary: “1” means “yes”, “0” means “no”)

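For reference, the loading code above prints the shape of the data and the list of column names:

(41188, 21)
['age', 'job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'month', 'day_of_week', 'duration', 'campaign', 'pdays', 'previous', 'poutcome', 'emp_var_rate', 'cons_price_idx', 'cons_conf_idx', 'euribor3m', 'nr_employed', 'y']

The education column has many categories, and we need to reduce them for better modelling. It contains the following categories: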

data['education'].unique()

array(['basic.4y', 'unknown', 'university.degree', 'high.school',
       'basic.9y', 'professional.course', 'basic.6y', 'illiterate'],
      dtype=object)

Let us group "basic.4y", "basic.9y" and "basic.6y" together and call them "Basic".

# Collapse the three "basic.*" levels into a single "Basic" category
basic_levels = ['basic.4y', 'basic.6y', 'basic.9y']
data['education'] = np.where(data['education'].isin(basic_levels), 'Basic', data['education'])

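If you are not familiar with np.where: np.where(condition, a, b) returns, element by element, a where the condition is true and b otherwise.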
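A small standalone illustration of np.where may help here. The array below is a hand-written stand-in for a few education values, not data read from bank.csv.

```python
import numpy as np

# Hypothetical sample of education levels.
levels = np.array(["basic.4y", "high.school", "basic.9y", "unknown"])

# Where the condition is True take "Basic", otherwise keep the original value.
is_basic = np.isin(levels, ["basic.4y", "basic.6y", "basic.9y"])
grouped = np.where(is_basic, "Basic", levels)

print(grouped)
```

Values that do not match the condition pass through unchanged, which is why the other categories survive the grouping step.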

After grouping, the education column contains these categories:

data['education'].unique()

array(['Basic', 'unknown', 'university.degree', 'high.school',
       'professional.course', 'illiterate'], dtype=object)

Data exploration

The class counts of the target variable y are:

0    36548
1     4640
Name: y, dtype: int64
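The counts show that the two classes are far from balanced. A minimal sketch of turning them into proportions follows, with the counts typed in by hand from the output above rather than recomputed from bank.csv:

```python
import pandas as pd

# Class counts copied from the output above (0 = no, 1 = yes).
counts = pd.Series({0: 36548, 1: 4640}, name="y")

# Share of each class in the whole dataset.
shares = counts / counts.sum()
print(shares.round(3))
```

Roughly 89% of the clients did not subscribe and about 11% did, so plain accuracy can look good even for a model that always predicts 0.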

Sources:

1. https://towardsdatascience.com/building-a-logistic-regression-in-python-step-by-step-becd4d56c9c8
2. http://www.cnblogs.com/jin-liang/p/9534801.html
