SAS code: 1:N case-control matching (gmatch)

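This post introduces the SAS macro gmatch, which performs 1:N case-control matching with a greedy algorithm. It covers the matching principle, the parameter settings, the weights on the matching variables, and the distance calculation, and it shows the matching call on an example dataset. By setting the weights on the matching variables, gmatch can meet different matching requirements, such as exact matching or distance-based matching.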

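 /* timex feeds time= below: 2 for cases (gdm=1), 3 for controls (gdm=0) */
 data a; set a;
   if gdm=1 then timex=2; if gdm=0 then timex=3;
 run;
 /* Keep only rows where _COL2 is "A1_1" or "A1_2" */
 data b; set a;
   if _COL2="A1_1" or _COL2="A1_2" then output b;
 run;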

 %gmatch(data=b, group=gdm, id=sample,
         mvars=age _COL3 batch_ad, wts=5 0 0, dmaxk=5 0 0, dmax=, transf=0,
         time=timex, dist=2,
         ncontls=1, seedca=2022, seedco=2021,
         out=GDMcc, outnmca=non_case, outnmco=non_control, print=Y);
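
The macro documentation reproduced below defines two common choices for the distance Dij between case i and control j: a weighted Euclidean distance and a weighted sum of absolute differences. As a minimal sketch of those two formulas (not the macro's internal code), the DATA step below evaluates both for one case/control pair. The values, the weights, and the names `dij_demo`, `x_case`, `x_ctrl`, and `w` are all hypothetical and chosen only for illustration.

```sas
/* Illustrative only: hypothetical values, not taken from the post's data */
data dij_demo;
   array x_case{2} _temporary_ (30 1);   /* case i: age 30, two-level factor 1    */
   array x_ctrl{2} _temporary_ (33 0);   /* control j: age 33, two-level factor 0 */
   array w{2}      _temporary_ (1 10);   /* weights W.k, large on the second one  */

   sum_sq  = 0;
   sum_abs = 0;
   do k = 1 to 2;
      sum_sq  = sum_sq  + w{k} * (x_case{k} - x_ctrl{k})**2;
      sum_abs = sum_abs + w{k} * abs(x_case{k} - x_ctrl{k});
   end;

   dij_euclid = sqrt(sum_sq);   /* SQRT [SUM { W.k*(X.ik-X.jk)**2 }] */
   dij_abs    = sum_abs;        /* SUM { W.k*ABS(X.ik-X.jk) }        */

   put dij_euclid= dij_abs=;    /* write both distances to the log   */
   drop k sum_sq sum_abs;
run;
```

With these made-up numbers, the weighted absolute distance is 3 + 10 = 13 and the weighted Euclidean distance is SQRT(9 + 10) = SQRT(19), about 4.36. The large weight on the second (two-level) factor dominates both distances, which is why the documentation suggests large relative weights to push toward exact matches on such factors.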
 

 

/*------------------------------------------------------------------*
   | The documentation and code below is supplied by HSR CodeXchange.             
   |              
   *------------------------------------------------------------------*/
                                                                                      
                                                                                      
                                                                                      
  /*------------------------------------------------------------------*
   | MACRO NAME  : gmatch
   | SHORT DESC  : Match 1 or more controls to cases using the
   |               GREEDY algorithm
   *------------------------------------------------------------------*
   | CREATED BY  : Kosanke, Jon                  (04/07/2004 16:32)
   |             : Bergstralh, Erik
   *------------------------------------------------------------------*
   | PURPOSE
   |
   | GMATCH Macro to match 1 or more controls for each of N cases
   | using the GREEDY algorithm--REPLACES GREEDY option of MATCH macro.
   | Changes:
   | --cases and controls in same dataset
   | --not mandatory to randomly pre-sort cases and controls, but recommended
   | --options to transform X's and to choose distance metric
   | --input parameters consistent with %DIST macro for optimal matching
   |
   | *******
   |
   | Macro name: %gmatch
   |
   | Authors: Jon Kosanke and Erik Bergstralh
   |
   | Date: July 23, 2003
   |       October 31, 2003...tweaked print/means based on "time" var
   |
   | Macro function:
   |
   | Matching using the GREEDY algorithm
   |
   | The purpose of this macro is to match 1 or more controls(from a total
   | of M) for each of N cases.  The controls may be matched to the cases by
   | one or more factors(X's).  The control selected for a particular
   | case(i) will be the control(j) closest to the case in terms of Dij.
   | Dij can be defined in multiple ways. Common choices are the Euclidean
   | distance and the weighted sum of the absolute differences between the
   | case and control matching factors.  I.e.,
   |
   |     Dij= SQRT [SUM { W.k*(X.ik-X.jk)**2} ],  or
   |
   |     Dij= SUM { W.k*ABS(X.ik-X.jk) },
   |
   |                                      where the sum is over the number
   |                                      of matching factors X(with index
   |                                      k) and W.k = the weight assigned
   |                                      to matching factor k and X.ik =
   |                                      the value of variable X(k) for
   |                                      subject i.
   |
   | The control(j) selected for a case(i) is the one with the smallest Dij
   | (subject to constraints DMAX and DMAXK, defined below). In the case of
   | ties, the first one encountered will be used. The higher the user-defined
   | weight, the more likely it is that the case and control will be matched
   | on the factor.  Assign large weights (relative to the other weights) to
   | obtain exact matches for two-level factors such as gender. An alternative to
   | using weights might be to standardize the X's in some fashion. The macro
   | has options to standardize all X's to mean 0 and variance 1 and to use
   | ranks.
   |
   | The matching algorithm used is the GREEDY method. Using the greedy method,
   | once a match is made it is never broken.  This may result in inefficiencies
   | if a previously matched control would be a better match for the current
   | case than those controls currently available. (An alternative method is to
   | do optimal matching using the VMATCH & DIST macros. This method guarantees
   | the best possible matched set in terms of minimizing the total Dij.)
   | The GREEDY method generally produces very good matches, especially if the
   | control pool is large relative to the number of cases. When  multiple
   | controls/case are desired, the algorithm first matches 1 control to all
   | cases and then proceeds to select second controls.
   |
   |
   | The gmatch macro checks for missing values of matching variables and the
   | time variable(if specified) and deletes those observations from the input
   | dataset.
   |
   | Call statement:
   |
   |
   | %gmatch(data=,group=,id=,
   |       mvars=,wts=,dmaxk=,dmax=,transf=,
   |       time=, dist=,
   |       ncontls=,seedca=,seedco=,
   |       out=,outnmca=,outnmco=,print=);
   |
   | Parameter definitions(R=required parameter):
   |
   |
   |  R    data  SAS data set containing cases and potential controls. Must
   |             contain the ID, GROUP, and the matching variables.
   |
   |  R    group SAS variable defining cases. Group=1 if case, 0 if control.
   |
   |  R     id   SAS CHARACTER ID variable for the cases and controls.
   |
   |
   |  R   mvars  List of numeric matching variables common to both case and
   |             contr
