Combining DATA Sets

One-to-one reading

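  • One-to-one reading (multiple SET statements)
    -the output contains all variables from all input data sets; when variables share a name, the value from the last data set read overwrites the value from the earlier one
    -the number of output observations equals the number of observations in the smallest input data set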
DATA output-SAS-data-set;          
          SET SAS-data-set-1;               
          SET SAS-data-set-2;   
          ...            
RUN; 
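
A minimal sketch of one-to-one reading, assuming two small hypothetical data sets (a and b, with made-up values) that are not part of the original notes. Because b has only two observations, the DATA step stops once b is exhausted, and the shared variable num takes its value from b:

```sas
/* Hypothetical input data, made up for illustration */
data a;
   input num vara $;
   datalines;
1 A1
3 A2
5 A3
;
run;

data b;
   input num varb $;
   datalines;
2 B1
4 B2
;
run;

/* One-to-one reading: observations are paired by position */
data one2one;
   set a;   /* reads the next observation from a */
   set b;   /* reads the next observation from b; num from b overwrites num from a */
run;

proc print data=one2one;
run;
```

The printed data set should have two observations (num=2 and num=4), each carrying vara from a and varb from b. The third observation of a is never written.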

Concatenating

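-concatenating: all observations from the first data set are read, then all observations from the second, and so on
-any common variable must have the same type attribute in every data set
-length attribute: concatenation takes the length from the first data set that contains the variable
-the label, format, and informat attributes are handled the same way as the length attribute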

DATA output-SAS-data-set;            
          SET SAS-data-set-1 SAS-data-set-2 ...;                 
RUN;
 
data work.combined(drop=rechr);
   set sasuser.stress98
       sasuser.stress99(drop=timemin timesec);
   if resthr<72;
run;
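
The length rule above can be checked with a small sketch. The data sets short and long and the variable code are hypothetical names used only for illustration:

```sas
/* Hypothetical data sets in which the same variable has different lengths */
data short;
   length code $ 3;
   code = 'abc';
run;

data long;
   length code $ 8;
   code = 'abcdefgh';
run;

/* code takes its length (3) from short, the first data set listed */
data concat;
   set short long;
run;

proc contents data=concat;   /* shows the length of code */
run;

proc print data=concat;      /* the value read from long is truncated to 3 characters */
run;
```

Listing long first in the SET statement would give code a length of 8 and avoid the truncation. SAS may also write a warning about multiple lengths to the log in this situation.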

Interleaving

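-interleaving: a SET statement with a BY statement combines sorted data sets into one data set that is still in BY-variable order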

     data interlv;
        set c d;   *input data sets must be sorted or indexed in ascending order based on the BY variable(s);
        by num;
     run;
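
A self-contained sketch of interleaving, assuming hypothetical contents for c and d (the notes do not show their data). Both inputs are sorted first so that the BY statement is valid:

```sas
/* Hypothetical input data, made up for illustration */
data c;
   input num varc $;
   datalines;
1 C1
3 C2
5 C3
;
run;

data d;
   input num vard $;
   datalines;
2 D1
4 D2
;
run;

/* Sort both inputs by the BY variable (already in order here, shown for completeness) */
proc sort data=c;
   by num;
run;

proc sort data=d;
   by num;
run;

/* Interleave: the output stays in ascending order of num */
data interlv_demo;
   set c d;
   by num;
run;

proc print data=interlv_demo;
run;
```

The output should list num as 1, 2, 3, 4, 5. varc is missing for observations that came from d, and vard is missing for observations that came from c.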

Match-Merging

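  • Data sets named in the MERGE statement must (1) be sorted by the BY variable(s) or have an appropriate index, and (2) have BY variables of the same type
  • If a BY value exists in data set 1 but not in data set 2, the variables contributed only by data set 2 are set to missing in that observation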
DATA output-SAS-data-set;            
       MERGE SAS-data-set-1 SAS-data-set-2;
       BY <DESCENDING> variable(s);             
RUN; 

     proc sort data=clinic.demog;
        by descending id;
     run;
     proc sort data=clinic.visit;
        by descending id;
     run; *sort both data sets first;
     data clinic.merged;
        merge clinic.demog clinic.visit;
        by descending id; *with a BY statement, one-to-many matches are possible, and without it MERGE performs a one-to-one merge;
     run;
     proc print data=clinic.merged;
     run;
  • Match-Merging Processing
    -two phases: the compilation phase and the execution phase
    -compilation phase: if variables that have the same name appear in more than one data set, the variable from the first data set that contains it (in the order listed in the MERGE statement) determines the length of the variable
    -(however, the value of a same-named variable is the value from the last data set that contains it)
  • Details to understand
    -how the DATA step sets up the new output data set
    -what happens when variables in different data sets have the same name
    -how the DATA step matches observations in input data sets
    -what happens when observations don't match
    -how missing values are handled
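
The matching and missing-value behavior can be seen in a small sketch. The data sets demog_demo and visit_demo and their values are hypothetical, chosen so that one BY value has two visits and two BY values have no match:

```sas
/* Hypothetical input data, already in ascending order of id */
data demog_demo;
   input id age;
   datalines;
1 30
2 41
3 25
;
run;

data visit_demo;
   input id weight;
   datalines;
1 150
1 152
3 180
4 200
;
run;

/* Match-merge on id */
data merged_demo;
   merge demog_demo visit_demo;
   by id;
run;

proc print data=merged_demo;
run;
```

The result should have five observations: id 1 appears twice (age 30 is carried across both visits), id 2 has a missing weight because it has no visit, id 3 matches once, and id 4 has a missing age because it has no demographic record.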

Renaming Variables

  • to prevent overwriting
     data clinic.merged;       
        merge clinic.demog(rename=(date=BirthDate name=IDname))
              clinic.visit(rename=(date=VisitDate));
        by id;
     run;
     proc print data=clinic.merged;
     run;

Excluding Unmatched Observations

  • Exclude observations that have no match
  • (IN=variable): the IN= data set option creates and names a temporary variable that indicates whether the data set contributed data to the current observation (1 if it did, 0 if it did not)
  • IF statement: in the DATA step below, the subsetting IF statement checks the values of inad and instr and continues processing only those observations that meet the condition of the expression
data sasuser.merged;
   merge work.adsort
         (rename=(date=AdmitDate) in=inad)
         work.strsort
         (rename=(date=VisitDate) in=instr);
   by id;
   if inad and instr;
run;
proc print data=sasuser.merged;
run;
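
Because IN= variables are temporary and are not written to the output data set, it can help to copy them into ordinary variables while exploring a merge. The sketch below does that with hypothetical data sets ds1 and ds2 (made-up names and values):

```sas
/* Hypothetical input data, already in ascending order of id */
data ds1;
   input id x;
   datalines;
1 10
2 20
;
run;

data ds2;
   input id y;
   datalines;
1 100
3 300
;
run;

/* Keep every observation and record which data set contributed to it */
data flags;
   merge ds1(in=in1) ds2(in=in2);
   by id;
   from_ds1 = in1;   /* copy the temporary IN= variables so they are kept */
   from_ds2 = in2;
run;

proc print data=flags;
run;
```

Here id 1 should show 1 in both flags, id 2 should show from_ds1=1 and from_ds2=0, and id 3 the reverse. Adding a subsetting IF such as `if in1 and in2;` would keep only id 1.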

Selecting Variables

  • DROP=variable1 variable2 … and KEEP=variable1 variable2 …
     data clinic.merged(drop=id); *id is still read and can be used during DATA step processing, unlike DROP= on a data set in the MERGE statement;
        merge clinic.demog(in=indemog
                           rename=(date=BirthDate))
              clinic.visit(drop=weight in=invisit
                           rename=(date=VisitDate));
        by id;
        if indemog and invisit;
     run;
     proc print data=clinic.merged;
     run;
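
The difference between DROP= on the output data set and DROP= on an input data set can be illustrated with a short sketch. The data set scores and its variables are hypothetical:

```sas
/* Hypothetical input data, made up for illustration */
data scores;
   input id test1 test2;
   datalines;
1 80 90
2 70 60
;
run;

/* DROP= on the output data set: test1 and test2 are read and usable, */
/* but they are not written to totals                                  */
data totals(drop=test1 test2);
   set scores;
   total = test1 + test2;
run;

/* DROP= on the input data set: test2 is never read from scores,      */
/* so test2 here is a new, uninitialized variable and total is missing */
data no_test2;
   set scores(drop=test2);
   total = test1 + test2;
run;

proc print data=totals;
run;

proc print data=no_test2;
run;
```

In totals, total should be 170 and 130. In no_test2, total should be missing for both observations, and the log should note that test2 is uninitialized.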