SAS: do you really know how to use nodup (also noduprec, which I just learned) and nodupkey?

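So what exactly is the difference between the two? Let's work through an example. What I used to remember was this: nodupkey de-duplicates on the BY values, so whenever several observations share the same BY values only one of them is kept, while nodup removes every fully duplicated observation. I looked it up again today and that is roughly right, but I also learned something new.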

Question:

In the example below, how many ID=1 observations do you think nodup will keep?
/* Nine observations, three per ID, with exact duplicates inside each group */
data MyData;
input ID var;
datalines;
1 10
1 20
1 10
2 30
2 30
2 40
3 50
3 50
3 50
;
run;

*******************************************nodup**********************************************************

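Before today's reading, and before looking it up again and trying it myself, I would have said that only two of the ID=1 observations are kept, because observations 1 and 3 are identical. Today I can tell you that all three are kept: not a single one is eliminated. Don't believe me? Take a look.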
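
The step that produces this result is not shown in the text, so here is a minimal sketch of what it might look like. The output data set name NoDupData is an assumption made for illustration (without OUT=, PROC SORT overwrites MyData), and the rows listed in the comment assume the default EQUALS behavior, which preserves the input order within each BY group.

```sas
/* Sort by ID and drop any observation that is identical, on every variable,
   to the observation written to the output just before it */
proc sort data=MyData out=NoDupData nodup;   /* NoDupData: hypothetical name */
   by ID;
run;

proc print data=NoDupData noobs;
run;

/* Rows expected in NoDupData (6 of the original 9):
   1 10
   1 20
   1 10
   2 30
   2 40
   3 50  */
```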

 

 

See? All three ID=1 observations are written to the new data set. Why? Here is how it works:

  • The Nodup option considers entire observations. When Nodup is specified, PROC SORT compares the current observation to the previous observation written to the output data set. If the two observations match on all variables, the current observation is left out of the output data set.
  • How each ID group is handled:
  • ID=1: The ID=1 group has no observations that follow each other and are an exact match. Observations 1 and 3 match exactly, but they do not immediately follow each other. Therefore all observations in the group are written to the new data set.
  • ID=2: The first two observations in the ID=2 group are an exact match. Consequently, when PROC SORT considers the second observation, it finds that it matches the preceding observation exactly and leaves it out. The third observation differs, so it is written to the new data set.
  • ID=3: All three observations in the ID=3 BY group match exactly, so only the first observation is written to the new data set.

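To sum up: nodup considers the entire observation. It compares the current observation with the previous one, variable by variable, and if the two match completely, the current observation is left out of the new data set. It cannot compare across non-adjacent rows. (I find this particularly unfortunate: in practice we almost always want to remove every exact duplicate, and I don't care which row it sits in.)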
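
If the goal is to drop every exact duplicate regardless of position, one common workaround is to sort on all variables so that identical observations end up next to each other. A minimal sketch, assuming the original nine-observation MyData; the output name AllUnique is hypothetical:

```sas
/* BY _ALL_ makes every variable a sort key, so identical rows become adjacent
   and are removed wherever they appeared in the input */
proc sort data=MyData out=AllUnique nodupkey;   /* AllUnique: hypothetical name */
   by _all_;
run;

/* Rows expected in AllUnique (5 of the original 9):
   1 10
   1 20
   2 30
   2 40
   3 50  */
```

With every variable in the BY statement, the key is the whole observation, so nodupkey here removes exactly the fully duplicated rows.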

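One more thing: nodup has a synonym, noduprec (SAS documents the option as NODUPRECS, with NODUP as its alias). To me the longer name carries a hint of the "previous record" comparison, so from now on I will use noduprec.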

****************************************************nodupkey***********************************************

proc sort data=MyData nodupkey;
   by ID;
run;

 

 

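The log for nodupkey reports that 6 observations with duplicate key values were deleted, which tells us that nodupkey works on the BY key values. In other words, the Nodupkey option considers only the variables in the By statement: it looks only at the BY values and keeps the first observation for each distinct BY value.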
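
To check which observations nodupkey removed, the kept and the dropped rows can be written to separate data sets with OUT= and DUPOUT=. This is a minimal sketch that assumes MyData still holds the original nine observations (the step above has no OUT=, so it overwrites MyData with the de-duplicated result). The names FirstPerID and DroppedRows are hypothetical, and the listed rows assume the default EQUALS behavior.

```sas
/* Keep the first observation per ID; send the discarded ones to DroppedRows */
proc sort data=MyData
          out=FirstPerID        /* hypothetical name: one row per ID     */
          dupout=DroppedRows    /* hypothetical name: the deleted rows   */
          nodupkey;
   by ID;
run;

/* Rows expected in FirstPerID (3 observations):
   1 10
   2 30
   3 50
   DroppedRows is expected to hold the other 6 observations:
   1 20, 1 10, 2 30, 2 40, 3 50, 3 50  */
```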

**********************In one sentence: the difference between nodup and nodupkey in proc sort******

The Nodupkey option considers only the variables in the By statement; the Nodup option considers entire observations.
