When the observed values of the variable x contain measurement error, a regression of (x, y) should use geometric mean regression. The following is a Python conversion of the MATLAB function:
Trujillo-Ortiz, A. and R. Hernandez-Walls. (2010). gmregress:
Geometric Mean Regression (Reduced Major Axis Regression).
A MATLAB file. [WWW document]. URL: http://www.mathworks.com/matlabcentral/fileexchange/27918-gmregress
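As a quick sanity check on the idea (an illustration added here, not part of the original file): the reduced major axis slope can be computed two equivalent ways, as sign(r)·s_y/s_x, or as the geometric mean of the OLS slope of y on x and the reciprocal of the OLS slope of x on y. A minimal sketch using the Sokal and Rohlf Box 14.12 data that also appears in the example below:

```python
import numpy as np

# Sokal and Rohlf (1995), Box 14.12 data (California fish cabezon)
x = np.array([14, 17, 24, 25, 27, 33, 34, 37, 40, 41, 42], dtype=float)
y = np.array([61, 37, 65, 69, 54, 93, 87, 89, 100, 90, 97], dtype=float)

sxy = np.cov(x, y)[0, 1]
b_yx = sxy / np.var(x, ddof=1)   # OLS slope of y on x
b_xy = sxy / np.var(y, ddof=1)   # OLS slope of x on y
r = np.corrcoef(x, y)[0, 1]

# geometric mean of b_yx and 1/b_xy, with the sign of r
slope_gm = np.sign(r) * np.sqrt(b_yx / b_xy)
# equivalent form: ratio of standard deviations, with the sign of r
slope_sd = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
print(slope_gm, slope_sd)  # both ≈ 2.1194
```

The two forms agree because sqrt(b_yx · (1/b_xy)) algebraically reduces to s_y/s_x.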
# This function is converted from the MATLAB function by Trujillo-Ortiz, A.
# and R. Hernandez-Walls. (2010). gmregress:
# Geometric Mean Regression (Reduced Major Axis Regression).
# A MATLAB file. [WWW document]. URL http://www.mathworks.com/
# matlabcentral/fileexchange/27918-gmregress
# GMREGRESS Geometric Mean Regression (Reduced Major Axis Regression).
# Model II regression should be used when the two variables in the
# regression equation are random and subject to error, i.e. not
# controlled by the researcher. Model I regression using ordinary least
# squares underestimates the slope of the linear relationship between the
# variables when they both contain error. According to Sokal and Rohlf
# (1995), the subject of Model II regression is one on which research and
# controversy are continuing and definitive recommendations are difficult
# to make.
#
# GMREGRESS is a Model II procedure. It standardizes the variables before
# the slope is computed: each of the two variables is transformed to have
# a mean of zero and a standard deviation of one. The resulting slope is
# the geometric mean of the linear regression coefficient of Y on X and
# the reciprocal of the regression coefficient of X on Y. Ricker (1973)
# coined this term and gave an extensive review of Model II regression.
# It is also known as Standard Major Axis regression.
#
# B,BINTR,BINTJM = GMREGRESS(X,Y,ALPHA) returns the vector (ndarray) B of
# regression coefficients in the linear Model II, and the arrays BINTR and
# BINTJM of the given confidence intervals for B, computed by the Ricker
# (1973) procedure and by the Jolicoeur and Mosimann (1968)-McArdle (1988)
# procedure, respectively.
#
# GMREGRESS treats NaNs in X or Y as missing values, and removes them.
#
# Syntax: b,bintr,bintjm = gmregress(x,y,alpha)
#
# Example. From the Box 14.12 (California fish cabezon [Scorpaenichthys
# marmoratus]) of Sokal and Rohlf (1995). The data are:
#
# x=np.array([14,17,24,25,27,33,34,37,40,41,42],dtype='float')
# y=np.array([61,37,65,69,54,93,87,89,100,90,97],dtype='float')
#
# Calling the function in Python:
# b,bintr,bintjm = gmregress(x,y)
#
# Answer is:
#
# b =
# array([12.19378495, 2.11936636])
#
# bintr =
# array([[-10.64446401,  35.03203392],
#        [  1.36720846,   2.87152426]])
#
# bintjm =
# array([[-14.57692871,  31.09956922],
#        [  1.49672077,   3.00103657]])
#
# Matlab Code Created by A. Trujillo-Ortiz and R. Hernandez-Walls
# Facultad de Ciencias Marinas
# Universidad Autonoma de Baja California
# Apdo. Postal 453
# Ensenada, Baja California
# Mexico.
# atrujo@uabc.edu.mx
#
# Copyright (C) June 15, 2010.
#
# To cite the MATLAB file, an appropriate format would be:
# Trujillo-Ortiz, A. and R. Hernandez-Walls. (2010). gmregress:
# Geometric Mean Regression (Reduced Major Axis Regression).
# A MATLAB file. [WWW document]. URL http://www.mathworks.com/
# matlabcentral/fileexchange/27918-gmregress
#
# References:
# Jolicoeur, P. and Mosimann, J. E. (1968), Intervalles de confiance pour
# la pente de l'axe majeur d'une distribution normale bidimensionnelle.
# Biométrie-Praximétrie, 9:121-140.
# McArdle, B. (1988), The structural relationship: regression in biology.
# Can. Jour. Zool. 66:2329-2339.
# Ricker, W. E. (1973), Linear regression in fishery research. J. Fish.
# Res. Board Can., 30:409-434.
# Sokal, R. R. and Rohlf, F. J. (1995), Biometry. The principles and
# practice of statistics in biological research. 3rd ed.
# New York: W. H. Freeman. [Sections 14.13 and 15.7]
#
import numpy as np
from scipy.stats import f as f_dist
from scipy.stats import t as t_dist

def gmregress(x, y, alpha=0.05):
    # Check that x and y are numpy arrays of equal length
    if type(x) is not np.ndarray or type(y) is not np.ndarray:
        raise ValueError('X and Y must be np.ndarray.')
    n = x.size
    if y.size != n:
        raise ValueError('The number of elements in Y must equal the number of elements in X.')
    # Treat NaNs in X or Y as missing values and remove them
    wasnan = np.isnan(y) | np.isnan(x)
    if np.any(wasnan):
        y = y[~wasnan]
        x = x[~wasnan]
        n = x.size
    R = np.corrcoef(x, y)
    r = R[0, 1]    # correlation coefficient
    s = r/abs(r)   # sign of the correlation coefficient; the slope must
                   # carry this sign (a negative relationship always gives
                   # a negative slope). This former bug was corrected thanks
                   # to the valuable suggestions of Holger Goerlitz and
                   # Joel E. Cohen.
    S = np.cov(x, y)
    SCX = S[0, 0]*(n - 1)  # sum of squares of x
    SCY = S[1, 1]*(n - 1)  # sum of squares of y
    SCP = S[0, 1]*(n - 1)  # sum of cross products
    v = s*np.sqrt(SCY/SCX)               # slope
    u = np.average(y) - np.average(x)*v  # intercept
    b = np.array([u, v])

    # Ricker procedure
    SCv = SCY - (SCP**2)/SCX
    N = SCv/(n - 2)
    sv = np.sqrt(N/SCX)
    t = t_dist.isf(alpha/2, n - 2)
    vi = v - t*sv  # lower confidence limit of the slope
    vs = v + t*sv  # upper confidence limit of the slope
    ui = np.average(y) - np.average(x)*vs  # lower confidence limit of the intercept
    us = np.average(y) - np.average(x)*vi  # upper confidence limit of the intercept
    uint = np.array([ui, us])
    vint = np.array([vi, vs])
    if ui > us:
        uint = uint[::-1]
    if vi > vs:
        vint = vint[::-1]
    bintr = np.array([uint, vint])

    # Jolicoeur and Mosimann procedure
    F = f_dist.isf(alpha, 1, n - 2)
    B = F*(1 - r**2)/(n - 2)
    a = np.sqrt(B + 1)
    c = np.sqrt(B)
    qi = v*(a - c)  # lower confidence limit of the slope
    qs = v*(a + c)  # upper confidence limit of the slope
    pi = np.average(y) - np.average(x)*qs  # lower confidence limit of the intercept
    ps = np.average(y) - np.average(x)*qi  # upper confidence limit of the intercept
    pint = np.array([pi, ps])
    qint = np.array([qi, qs])
    if pi > ps:
        pint = pint[::-1]
    if qi > qs:
        qint = qint[::-1]
    bintjm = np.array([pint, qint])
    return b, bintr, bintjm
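The confidence-interval arithmetic can also be checked independently of the function. The following self-contained sketch (a re-derivation added for illustration, not part of the original file) recomputes the slope limits for the Box 14.12 example by both procedures; the values match the slope rows of bintr and bintjm quoted in the header:

```python
import numpy as np
from scipy.stats import f, t

x = np.array([14, 17, 24, 25, 27, 33, 34, 37, 40, 41, 42], dtype=float)
y = np.array([61, 37, 65, 69, 54, 93, 87, 89, 100, 90, 97], dtype=float)
n, alpha = x.size, 0.05

SCX = np.var(x, ddof=1) * (n - 1)   # sum of squares of x
SCY = np.var(y, ddof=1) * (n - 1)   # sum of squares of y
SCP = np.cov(x, y)[0, 1] * (n - 1)  # sum of cross products
r = SCP / np.sqrt(SCX * SCY)
v = np.sign(r) * np.sqrt(SCY / SCX)  # RMA slope, ≈ 2.1194

# Ricker procedure: symmetric t-interval with standard error sqrt(MSE/SCX)
sv = np.sqrt((SCY - SCP**2 / SCX) / ((n - 2) * SCX))
tc = t.isf(alpha / 2, n - 2)
ricker = (v - tc * sv, v + tc * sv)  # ≈ (1.3672, 2.8715)

# Jolicoeur-Mosimann procedure: F-based multiplicative interval
B = f.isf(alpha, 1, n - 2) * (1 - r**2) / (n - 2)
jm = (v * (np.sqrt(B + 1) - np.sqrt(B)),
      v * (np.sqrt(B + 1) + np.sqrt(B)))  # ≈ (1.4967, 3.0010)
```

Note the design difference: the Ricker interval is symmetric about the slope, while the Jolicoeur-Mosimann interval scales the slope multiplicatively and is therefore asymmetric.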