Statistical Analysis and Prediction Using Logistic Regression in Python Statsmodels

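This article walks through an example of using Python's Statsmodels library for logistic regression analysis to predict a binary outcome variable. It covers the process from importing the data and building the model to prediction and visualization, and discusses odds, log-odds, and the effect of covariates.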

This article explains a statistical modeling technique with an example: logistic regression modeling for binary outcome variables, meaning the outcome variable can take only two values, 0 or 1. We will also analyze the correlation among the predictor variables (the input variables used to predict the outcome variable), extract useful information from the model results, and use visualization techniques to better present and understand the data and the predicted outcomes. I assume you have basic knowledge of statistics and Python.

The Tools Used

For this tutorial, we will use:

  1. Numpy Library

  2. Pandas Library

  3. Matplotlib Library

  4. Seaborn Library

  5. Statsmodels Library

  6. Jupyter Notebook environment.

Dataset

I used the Heart dataset from Kaggle, which I also keep in my GitHub repository. If you want to follow along, please feel free to download it from this link:

Let’s import the necessary packages and the dataset.

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import statsmodels.api as sm
import numpy as np

# Load the Heart dataset (assumes Heart.csv is in the current working directory)
df = pd.read_csv('Heart.csv')
df.head()

The last column, ‘AHD’, contains only ‘Yes’ or ‘No’, which tells you whether a person has heart disease. Replace ‘Yes’ with 1 and ‘No’ with 0.

df['AHD'] = df.AHD.replace({"No":0, "Yes": 1})
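A minimal sketch of the same recoding on a small hypothetical DataFrame (the toy name and its values are made up for illustration), assuming the column holds exactly the strings "Yes" and "No". It uses Series.map rather than replace: with map, any label missing from the dictionary becomes NaN, so an unexpected value such as "yes" is caught instead of silently surviving.

```python
import pandas as pd

# Hypothetical toy data standing in for the AHD column of Heart.csv
toy = pd.DataFrame({"AHD": ["No", "Yes", "Yes", "No"]})

# map() turns any label not in the dict into NaN, so typos do not slip through
toy["AHD"] = toy["AHD"].map({"No": 0, "Yes": 1})

# Fail loudly if some value was neither "Yes" nor "No"
assert toy["AHD"].notna().all(), "unexpected AHD label"
print(toy["AHD"].tolist())  # [0, 1, 1, 0]
```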

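The logistic regression model estimates the odds of an event; more precisely, it models the log-odds (the logarithm of the odds) as a linear function of the predictor variables.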
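To make the terminology concrete: if p is the probability of the event, the odds are p / (1 - p) and the log-odds (logit) are log(p / (1 - p)); the logistic function maps log-odds back to a probability. The sketch below uses hypothetical coefficients b0 and b1, chosen only for illustration and not fitted to the Heart data, to show that a one-unit increase in the covariate multiplies the odds by exp(b1).

```python
import numpy as np

def odds(p):
    """Odds of an event that occurs with probability p."""
    return p / (1 - p)

def log_odds(p):
    """Log-odds (logit) of probability p."""
    return np.log(odds(p))

def inv_logit(z):
    """Logistic (sigmoid) function: maps log-odds z back to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, for illustration only (not fitted to Heart.csv)
b0, b1 = -3.0, 0.05

def prob_of_event(age):
    # The model is linear on the log-odds scale: logit(p) = b0 + b1 * age
    return inv_logit(b0 + b1 * age)

print(odds(0.75))         # 3.0
print(prob_of_event(60))  # 0.5, because b0 + b1 * 60 = 0

# One extra year of age multiplies the odds by exp(b1)
ratio = odds(prob_of_event(61)) / odds(prob_of_event(60))
print(ratio, np.exp(b1))  # both about 1.0513
```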

A Basic Logistic Regression With One Variable

Let’s dive into the modeling. I will explain each step, and I suggest running the code yourself as you read to better absorb the material.

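Logistic regression is closely related to linear regression: it is a Generalized Linear Model (GLM) that models the log-odds of a binary outcome as a linear function of the covariates. In this example, we will fit it as a GLM using Statsmodels.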
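By default, Statsmodels fits a GLM with iteratively reweighted least squares (IRLS). For the binomial family with its default logit link, each IRLS step is a weighted least-squares solve, which the bare-bones NumPy sketch below spells out. It runs on simulated data; the fit_logistic_irls name, the simulated ages, and the "true" coefficients are all hypothetical, and the sketch omits the standard errors, diagnostics, and convergence safeguards a library fit provides.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25, tol=1e-10):
    """Logistic regression (binomial GLM, logit link) fitted by IRLS.

    X: (n, k) design matrix that already includes an intercept column.
    y: (n,) array of 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                   # linear predictor (log-odds)
        mu = 1.0 / (1.0 + np.exp(-eta))  # fitted probabilities
        w = mu * (1.0 - mu)              # IRLS weights (binomial variance)
        z = eta + (y - mu) / w           # working response
        # Weighted least-squares step: solve (X' W X) beta = X' W z
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Simulated data (hypothetical): true intercept -3, true slope 0.05
rng = np.random.default_rng(0)
age = rng.uniform(30, 80, size=2000)
p_true = 1.0 / (1.0 + np.exp(-(-3.0 + 0.05 * age)))
y = rng.binomial(1, p_true)

X = np.column_stack([np.ones_like(age), age])
beta_hat = fit_logistic_irls(X, y)
print(beta_hat)  # close to [-3.0, 0.05], up to sampling noise
```

Fitting the same X and y with sm.GLM(y, X, family=sm.families.Binomial()).fit() should give coefficients that agree to several decimal places, since both maximize the same binomial likelihood.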

There are many variables, so which single one should we start with? Heart disease generally occurs more often in older people, and younger people are less likely to develop it. So I am taking “Age” as the only covariate. We will add more covariates later.
