SAS training - Lesson 1.1
- SAS培训- 1.1课
Table of Contents(目录)
1.数据位置 (Data location)
2.数据类型 (Data type)
3.读取SAS数据集 (Reading a SAS dataset)
4.日期与时间 (Date vs Datetime)
5.数据格式 (Data format)
6.日期之间的时间间隔 (Interval between dates)
7.数据类型转换 (Data type conversion)
8.局部变量 (Automatic variable)
9.比较运算符 (Comparison operators)
1.数据位置
●SAS Libraries is used to locate the dataset (SAS库用于定位数据集)
●libref
○Must be 8 characters or fewer (必须是8个字符或更少)
○Start with character (从字符开始)
●Default library = WORK (temporary library) => 默认库= WORK(临时库)
○If NO library specified, libref is work. (如果没有指定库,则libref为work)
2.数据类型
●Numeric (数值)
●Characters (字符)
●Date (日期)
○Day count = 0 (for the date of Jan 1, 1960)
○Time count = 0 (for the date of Jan 1, 1960 00:00:00)
3.读取SAS数据集
command: set (命令:设置 ) from -> set
data-step (系统的数据集)
proc 程序,步骤(process)
**创建表 customer **(数据来自表 data1.cust_info)
command: keep (命令:保持 ) select -> keep
创建customer表: (cust_name)
数据来源于表data1.cust_info 的 cust_name 字段
command:if…then…else…
**创建customer_id表: ** (cust_id, cust_gp)
表data1.cust_info的cust_id字段第1列第1个字符是 ‘A’,'Z’其中之一的数据, 并将其分组到gp1, 其他数据分组到gp2, 组字段为cust_gp
command: output (命令:输出)
创建cust_gp1表: (cust_name, cust_id)
表data1.cust_info的cust_id字段第1列第1个字符是 ‘A’,'Z’其中之一的数据
创建cust_gp2表: (cust_name, cust_id)
表data1.cust_info的cust_id字段第1列第1个字符不是 ‘A’,'Z’其中之一的数据
4.日期与时间
command:datepart (命令:函数)
创建convert表: (time, date)
time 字段 是 2021.01.01 23:59:59 与 1960年的时间差 (单位是秒)
date 字段 是 2021.01.01 与 1960年的时间差 (单位是天)
5.数据格式
6.日期之间的时间间隔
Command : INTCK/INTNX (命令: intck/intnx)
●intck function - returns the number of time units between two dates
intck函数- 返回两个日期之间的时间单位数
●intnx function - returns a SAS date that is a specified number of time units away from a specified date
intnx 函数-返回一个SAS日期,该日期与指定日期之间的距离为指定的时间单位
time units - year, month, week, day (时间单位-年,月,周,日)
7.数据类型转换
Command : PUT/INPUT (命令:PUT/输入)
●PUT function - convert numeric variable to character variable
PUT函数-将数值变量转换为字符变量
●INPUT function - convert character variable to numeric variable
输入函数-转换字符变量为数值变量
8.局部变量
Automatic variable(局部变量) (_n_)
●The N variable is commonly used to keep track of the number of times the data step has iterated (_N_变量通常用于跟踪数据步骤的迭代次数 )
9.比较运算符
Symbolic(象征)、Mnemonic(助记符)、Meaning(意义)、Example(例子)
Sas调控演示
End of Lesson 1.1
- 第1.1课结束
SAS Training - Lesson 1.2
- SAS培训-第1.2课
Table of Contents(目录)
1.SAS数据集排序 (Sorting SAS Data Sets)
2.合并SAS数据集 (Combining SAS Data Sets)
3.外部文件导入/导出 (Importing/Exporting External File)
1. SAS数据集排序
Command :
PROC SORT DATA=…;
Command :
PROC SORT DATA=…
OUT=…
NODUPKEY;
2.1 合并SAS数据集**(垂直-Vertical)**
Command :
SET
Command :
SET…BY…
步骤1: 得到排序的数据集ds1和ds2 按 customer account_no 字段排序(默认升序)
步骤2: 合并ds1和ds2数据集并按 customer account_no 字段排序(默认升序)
Command :
SET
重命名account_number为account_no
2.2 合并SAS数据集**(水平-Horizontal)**
Command :
MERGE
内连接 if a and b --> a inner join b (by --> on)
左连接 if a --> left join (by --> on)
右连接
3.1 导入外部文件- Excel
Importing External File - Excel
Command :
PROC IMPORT
●Datafile = The path of the external excel file (外部excel文件的路径)
●Dbms = The type of excel file format (excel文件格式的类型)
●Out = The output SAS dataset name (输出SAS数据集名称)
●Getnames
○Yes = To generate the variable names from the data value in the first row of import file
(从导入文件第一行的数据值生成变量名)
○No = To generate the variable names as VAR1, VAR2, and so on
(将变量名生成为VAR1、VAR2,等等)
●Mixed
○Yes = To assign the variable type as Character when mixed data types are found in the same column (当在同一列中发现混合数据类型时,将变量类型赋值为Character)
○No = To assign the variable type depending on the majority of the data type that is found (根据找到的大部分数据类型分配变量类型)
●Sheets = To identify one worksheet when the import file having more than one worksheet
(在导入文件有多个工作表时标识一个工作表)
3.2 导入外部文件- CSV
Importing External File - CSV
Command :
PROC IMPORT
●Datafile = The path of the external excel file (外部excel文件的路径)
●Dbms = The type of file format (文件格式的类型)
●Out = The output SAS dataset name (输出SAS数据集名称)
●Getnames
○Yes = To generate the variable names from the data value in the first row of import file
(从导入文件第一行的数据值生成变量名)
○No = To generate the variable names as VAR1, VAR2, and so on
(将变量名生成为VAR1、VAR2,等等)
●Guessingrows = To specifies the number of rows to determine the appropriate data type for the columns (指定行数,以确定列的适当数据类型)
○Default (20) = Import procedure scans the first 20 rows to determine the data type
(导入过程扫描前20行以确定数据类型)
3.3 导出外部文件
Exporting External File
Command :
PROC EXPORT
●Data = Output SAS dataset name (输出SAS数据集名称)
●Outfile = The output path of the export file (导出文件的输出路径)
●Dbms = The type of file format (文件格式的类型)
●Replace = To overwrite the exported excel file (覆盖导出的excel文件)
●Sheets = To identify the worksheet in the exported excel file (在导出的excel文件中识别工作表)
End of Lesson 1.2
Tutorial
1. Import customer_base.xlsx and account_base.xlsx and create customer_base.sas7bdat and account_base.sas7bdat respectively
(分别创建 customer_base.sas7bdat表和 account_base.sas7bdat表,并分别导入customer_base.xlsx和account_base.xlsx的数据)
2. Classify the customer by cust_age from customer_base.sas7bdat
(根据customer_base.sas7bdat中的cust_age对客户进行分类)
cust_age age_gp
18-30 1
31-50 2
51-65 3
>65 4
3. Select customer with SAV account and age_gp=2/3
(选择有SAV账户和age_gp=2/3的客户)