广义线性模型(逻辑回归、泊松回归)

线性回归模型也并不适用于所有情况,有些结果可能包含而元数据(比如正面与反面)或者计数数据,广义线性模型可用于解释这类数据,使用的仍然是自变量的线性组合。

目录

逻辑回归

使用statsmodels

 使用sklearn

泊松回归 

使用statsmodels

负二项回归


逻辑回归

当响应变量为二元数据时,常用逻辑回归对数据进行建模。

以下数据来源于pandas活用所提供的数据,如需要可在此下载https://download.csdn.net/download/qq_57099024/79301082

import pandas as pd
d=pd.read_csv('D:/pandas活用/pandas_for_everyone-master/data/acs_ny.csv')
print(d.columns)
print('@'*66)#输出特殊符号以区分两次输出
print(d.head())


'''以下为输出结果:
Index(['Acres', 'FamilyIncome', 'FamilyType', 'NumBedrooms', 'NumChildren',
       'NumPeople', 'NumRooms', 'NumUnits', 'NumVehicles', 'NumWorkers',
       'OwnRent', 'YearBuilt', 'HouseCosts', 'ElectricBill', 'FoodStamp',
       'HeatingFuel', 'Insurance', 'Language'],
      dtype='object')
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
  Acres  FamilyIncome   FamilyType  NumBedrooms  NumChildren  NumPeople  \
0  1-10           150      Married            4            1          3   
1  1-10           180  Female Head            3            2          4   
2  1-10           280  Female Head            4            0          2   
3  1-10           330  Female Head            2            1          2   
4  1-10           330    Male Head            3            1          2   

   NumRooms         NumUnits  NumVehicles  NumWorkers   OwnRent    YearBuilt  \
0         9  Single detached            1           0  Mortgage    1950-1959   
1         6  Single detached            2           0    Rented  Before 1939   
2         8  Single detached            3           1  Mortgage    2000-2004   
3         4  Single detached            1           0    Rented    1950-1959   
4         5  Single attached            1           0  Mortgage  Before 1939   

   HouseCosts  ElectricBill FoodStamp HeatingFuel  Insurance        Language  
0        1800            90        No         Gas       2500         English  
1         850            90        No         Oil          0         English  
2        2600           260        No         Oil       6600  Other European  
3        1800           140        No         Oil          0         English  
4         860           150        No         Gas        660         Spanish  '''

 以下对FamilyIncome 进行分箱操作:

d['income_15w']=pd.cut(d['FamilyIncome'],[0,150000,d['FamilyIncome'].max()],labels=[0,1])
d['income_15w']=d['income_15w'].astype(int)

使用cut分箱操作,创建二值响应变量_我就是一个小怪兽的博客-CSDN博客

使用statsmodels

import statsmodels.formula.api as smf
model=smf.logit('income_15w~HouseCosts+NumWorkers+OwnRent+NumBedrooms+FamilyType',data=d)
results=model.fit()
print(results.summary())
Optimization terminated successfully.
         Current function value: 0.391651
         Iterations 7
                           Logit Regression Results                           
==============================================================================
Dep. Variable:             income_15w   No. Observations:                22745
Model:                          Logit   Df Residuals:                    22737
Method:                           MLE   Df Model:                            7
Date:                Sat, 05 Feb 2022   Pseudo R-squ.:                  0.2078
Time:                        08:46:18   Log-Likelihood:                -8908.1
converged:                       True   LL-Null:                       -11244.
Covariance Type:            nonrobust   LLR p-value:                     0.000
===========================================================================================
                              coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------------
Intercept                  -5.8081      0.120    -48.456      0.000      -6.043      -5.573
OwnRent[T.Outright]         1.8276      0.208      8.782      0.000       1.420       2.236
OwnRent[T.Rented]          -0.8763      0.101     -8.647      0.000      -1.075      -0.678
FamilyType[T.Male Head]     0.2874      0.150      1.913      0.056      -0.007       0.582
FamilyType[T.Married]       1.3877      0.088     15.781      0.000       1.215       1.560
HouseCosts                  0.0007   1.72e-05     42.453      0.000       0.001       0.001
NumWorkers                  0.5873      0.026     22.393      0.000       0.536       0.639
NumBedrooms                 0.2365      0.017     13.985      0.000       0.203       0.270
==================================================================================

 使用sklearn

predictors=pd.get_dummies(d[['HouseCosts','NumWorkers','OwnRent','NumBedrooms','FamilyType']],drop_first=True)
from sklearn import linear_model
lr=linear_model.LogisticRegression()
results=lr.fit(X=predictors,y=d['income_15w'])
print(results.coef_)
print('-*-'*10)
print(results.intercept_)
[[ 5.86894916e-04  7.32489391e-01  2.86764784e-01  7.17542587e-02
  -2.13282748e+00 -1.03910262e+00  2.63647146e-01]]
-*--*--*--*--*--*--*--*--*--*-
[-4.86108187]

泊松回归 

常用于计数数据分析

使用statsmodels

results=smf.poisson('NumChildren~FamilyIncome+FamilyType+OwnRent',data=d).fit()
print(results.summary())
Optimization terminated successfully.
         Current function value: nan
         Iterations 1
                          Poisson Regression Results                          
==============================================================================
Dep. Variable:            NumChildren   No. Observations:                22745
Model:                        Poisson   Df Residuals:                    22739
Method:                           MLE   Df Model:                            5
Date:                Sat, 05 Feb 2022   Pseudo R-squ.:                     nan
Time:                        09:05:28   Log-Likelihood:                    nan
converged:                       True   LL-Null:                       -30977.
Covariance Type:            nonrobust   LLR p-value:                       nan
===========================================================================================
                              coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------------
Intercept                      nan        nan        nan        nan         nan         nan
FamilyType[T.Male Head]        nan        nan        nan        nan         nan         nan
FamilyType[T.Married]          nan        nan        nan        nan         nan         nan
OwnRent[T.Outright]            nan        nan        nan        nan         nan         nan
OwnRent[T.Rented]              nan        nan        nan        nan         nan         nan
FamilyIncome                   nan        nan        nan        nan         nan         nan
==================================================================================

负二项回归

如果泊松回归的假设不理想(例如数据过度离散),可使用负二项回归来代替

statsmodels的GLM文档列入了可以传入GLM参数的许多分布族,可在sm.familiese.<FAMILY>.links下找到连接函数::

Binomial(二项式分布)

Gamma(伽马分布)

InverseGaussian(逆高斯分布)

NegativeBinomial(负二项式分布)

Poisson(泊松分布)

Tweedie分布

import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf
model=smf.glm('NumChildren~FamilyIncome+FamilyType+OwnRent',data=d,family=sm.families.NegativeBinomial(sm.genmod.families.links.log))
results=model.fit()
print(results.summary())
                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:            NumChildren   No. Observations:                22745
Model:                            GLM   Df Residuals:                    22739
Model Family:        NegativeBinomial   Df Model:                            5
Link Function:                    log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -29749.
Date:                Sat, 05 Feb 2022   Deviance:                       20731.
Time:                        10:06:21   Pearson chi2:                 1.77e+04
No. Iterations:                     6                                         
Covariance Type:            nonrobust                                         
===========================================================================================
                              coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------------
Intercept                  -0.3345      0.029    -11.672      0.000      -0.391      -0.278
FamilyType[T.Male Head]    -0.0468      0.052     -0.905      0.365      -0.148       0.055
FamilyType[T.Married]       0.1529      0.029      5.200      0.000       0.095       0.211
OwnRent[T.Outright]        -1.9737      0.243     -8.113      0.000      -2.450      -1.497
OwnRent[T.Rented]           0.4164      0.030     13.754      0.000       0.357       0.476
FamilyIncome             5.398e-07   9.55e-08      5.652      0.000    3.53e-07    7.27e-07
=================================================================================
  • 3
    点赞
  • 10
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

爱打羽毛球的小怪兽

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值