SAS Statistics by Example (study notes)

Chapter 1 an introduction to SAS

proc print, proc contents

  • proc print to list the observations in a SAS data set
proc print data=sampledata;
run;
  • proc contents to display the descriptor portion of a SAS data set
    • proc contents is not only for showing the descriptor portion once; it can also be rerun to show the updated information after the data set has been changed
title'displaying the data set';
proc contents data=sampledata;
run;

creating a SAS data set from raw data

  • reading data from a txt file that uses spaces as the delimiter
data sample2;
infile'C:\Users\apple\Desktop\63671_example\delim.txt';
length gender $1;
input ID Age Gender $;
run;
title'listing of  data set sample2';
proc print data=sample2;
run;
  • reading CSV files
data sample3;
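infile'C:\Users\apple\Desktop\63671_example\comma.csv' dsd; /*dsd: comma becomes the default delimiter, two consecutive commas mean a missing value, and quotes around values are removed*/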
input ID Age Gender $;
run;
title'listing of  data set sample3';
proc print data=sample3;
run;
  • data values in fixed columns
  • @ column pointer: specifies the starting column for a variable by its column number
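The notes give no code for fixed-column data, so here is a minimal sketch of the two usual approaches. The data set names sample4 and sample5, the column layout, and the inline data lines are all made up for illustration; they are not from the book's example files.

```sas
/* Assumed layout (hypothetical): ID in columns 1-3, Age in 4-5, Gender in column 6 */

/* Column input: give the start and end column after each variable name */
data sample4;
    input ID 1-3 Age 4-5 Gender $ 6;
    datalines;
00123M
00234F
;
run;

/* Formatted input: @n moves the pointer to column n, then an informat reads the value */
data sample5;
    input @1 ID 3.
          @4 Age 2.
          @6 Gender $1.;
    datalines;
00123M
00234F
;
run;

title'listing of data set sample5';
proc print data=sample5;
run;
```

To read an external file instead, replace the datalines block with an infile statement, as in the earlier examples.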

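Chapter 2 continuous variables

computing descriptive statistics using proc means

  • Descriptive statistics: using tabulation and classification, graphs, and computed summary measures to describe the characteristics of the data, analyzing the relevant data for all variables of the population under study.
    • frequency analysis: frequency and cross-frequency tables can be used to check for outliers
    • central tendency: reflects the typical level of the data (mean, median, mode, etc.)
    • dispersion: reflects how much the data values differ from one another (variance and standard deviation)
    • distribution: whether the data follow a normal distribution, checked with two indicators, skewness and kurtosis
    • basic statistical graphs: bar charts, pie charts, line charts, etc.
    • case series analysis: organizing, tabulating, analyzing, and summarizing the clinical data of a group of cases of the same disease (a few, dozens, hundreds, or thousands of cases) and drawing conclusions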

generating descriptive statistics

proc means data=example.blood_pressure n nmiss mean std median maxdec=3;/*maxdec=# sets the number of decimal places*/
var SBP DBP;
run;

classification variable (used to break descriptive statistics down by group)

-use the class statement

proc means data=example.blood_pressure n nmiss mean std median maxdec=3;
class drug;
var SBP DBP;
run;

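  • without a class statement, proc means summarizes all observations together; with class drug, it reports the requested statistics for each analysis variable separately for each value of drug
  • the printalltypes option displays every type (each combination of class variables), including the overall statistics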
proc means data=example.blood_pressure n nmiss mean std median maxdec=3 printalltypes ;
class drug;
var SBP DBP;
run;

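computing a 95% confidence interval and standard error

  • 95% confidence interval: an interval estimate of a population parameter, constructed from a sample statistic. If the sampling were repeated many times, about 95% of the intervals built this way would contain the true parameter value and about 5% would not
  • standard error: measures how far the sample mean is expected to be from the population mean
    • the smaller the standard error, the closer the sample mean is expected to be to the population mean
    • the larger the standard error, the further the sample mean may be from the population mean
    • the CI helps judge how well the sample mean estimates the population mean
    • the SE serves the same purpose as the CI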
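For reference, the quantities described above can be written out as follows. This is the standard t-based interval for a mean, with $\bar{x}$ the sample mean, $s$ the sample standard deviation, and $n$ the number of nonmissing values; the symbols are introduced here and do not appear in the notes.

```latex
\mathrm{SE} = \frac{s}{\sqrt{n}},
\qquad
95\%\ \mathrm{CI} = \bar{x} \pm t_{0.975,\,n-1} \cdot \mathrm{SE}
```

Here $t_{0.975,\,n-1}$ is the 97.5th percentile of the t distribution with $n-1$ degrees of freedom, which corresponds to the default significance level of 0.05.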

computing a 95% confidence interval and standard error with the clm and stderr options

title'95% CL';
proc means data=example.blood_pressure n nmiss mean std median maxdec=3 clm stderr;
class drug;
var SBP DBP;
run;

producing descriptive statistics, histograms, and probability plots

  • proc univariate produces output similar to that of proc means, but it can additionally produce histograms and probability plots

using proc univariate to produce histograms and probability plots

title'histogram and probability plots';
proc univariate data=example.blood_pressure;
id subj;/* not required but useful with proc univariate: specifies a variable that identifies each observation*/
var SBP DBP;
histogram;/*request a histogram*/
probplot/normal(mu=est sigma=est);/*request a probability plot; normal asks for a normal reference distribution, mu= specifies the mean, sigma= specifies the standard deviation, and est means estimate them from the data*/
run;

changing the midpoint values on the histogram

  • use the midpoints= option on the histogram statement
title'changing midpoint by midpoints option';
proc univariate data=example.blood_pressure;
id subj;
var SBP DBP;
histogram/ midpoints= 100 to 170 by 5 normal;/* midpoints run from 100 to 170, with each bin spanning 5 points*/
probplot/normal(mu=est sigma=est);
run;

generating a variety of graphical displays of your data

  • use SGPLOT and SGSCATTER
  • SGPLOT produces histograms, box plots, scatter plots, and much more
  • SGSCATTER displays several plots on a single page
  • SG procedures come with a number of built-in styles
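The notes show SGPLOT examples but none for SGSCATTER, so here is a minimal sketch of several plots on one page. It uses the sashelp.class data set that ships with SAS rather than the book's blood pressure data, so the variable names (weight, height, age) are not from the notes.

```sas
title'sgscatter: several plots on one page';
proc sgscatter data=sashelp.class;
    /* each y*x pair becomes its own panel on the same page */
    plot weight*height weight*age;
run;
```

The same pattern should apply to the blood pressure data by swapping in its data set and variables such as SBP and DBP.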

using proc sgplot to produce a histogram

title'sgplot histogram';
proc sgplot data=example.blood_pressure;
histogram SBP;
run;

using proc sgplot to produce a horizontal box plot with the hbox statement

title'sgplot box plot';
proc sgplot data=example.blood_pressure;
hbox SBP;
run;

displaying a separate box plot for each value of a categorical variable

  • use the category= option on the hbox statement
title'categorical variable';
proc sgplot data=example.blood_pressure;
    hbox SBP/ category= drug;
run;

Chapter 3 descriptive statistics - categorical variables

  • produce frequencies for single variables and create cross-tabulation tables
  • display categorical variables graphically
  • group continuous variables into categories using different techniques

computing frequency counts and percentages with proc freq

  • for categorical variables
  • count unique values for character or numeric variables

using proc freq

title'computing freq and percentage';
proc freq data=example.blood_pressure;
table gender drug /nocum missing ;/*list the variables you want to process*/
run;
  • nocum table option: suppresses the cumulative statistics columns (cumulative frequency and cumulative percent)
  • missing option: tells proc freq to treat missing values as a valid category and to include them in the body of the table. (With this option, missing values are counted as a level of their own, so the percentages of the other levels change as well)

computing frequencies on a continuous variable

title'computing continuous with proc freq';
proc freq data=example.blood_pressure;
table SBP/ nocum missing;
run;

using formats to group observations (grouping continuous values into categories)

  • proc format + proc freq
title'format with group observation' ;
 proc format;
    value $gender 'M'='male'
	              'F'='female';/* each value statement names one format; unique values, groups of values, or ranges of values go on the left side of the equal sign and the label goes on the right*/
	value sbpgroup low-140 ='normal'
	               140<-high ='high';/* 140<-high excludes 140 itself, so values between 140 and 141 are not left unformatted*/
	value dbpgroup low-80='normal'
	               80<-high='high';
 run;
proc freq data=example.blood_pressure;
table gender sbp dbp/ nocum;
format gender $gender.
       sbp sbpgroup.
       DBP dbpgroup.;
run;
  • a format statement can be placed in either a data step or a proc step
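As a sketch of the data step placement: a format statement in a data step stores the format with the variable, so later procedures pick it up without their own format statement. The data set bp_demo and its values are made up for illustration and are not the book's data.

```sas
proc format;
    value sbpgroup low-140   = 'normal'
                   140<-high = 'high';
run;

/* hypothetical data, read inline so the example is self-contained */
data bp_demo;
    input subj SBP;
    format SBP sbpgroup.;   /* association is stored in the data set */
    datalines;
1 120
2 155
3 138
;
run;

title'format assigned in the data step';
proc freq data=bp_demo;
    tables SBP / nocum;     /* no format statement needed here */
run;
```

A format statement in a proc step, as in the proc freq example above, applies only for the duration of that step.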

histograms and bar charts

  • useful for showing frequencies in a graphical display
  • the older SAS procedure is called GCHART
  • the newer proc sgplot produces a wide variety of plots and charts

generating a bar chart using proc GCHART

goptions reset=all; 
pattern value = solid color = blue; /*type of bar and color of the bars*/
title "Generating a Bar Chart - Using PROC GCHART"; 
proc gchart data=store; 
vbar Region; /*the vbar statement lists the variable(s) to chart; vbar draws vertical bars, hbar draws horizontal bars*/
run; 
quit; /*remember the quit statement: it ends the procedure*/

creating a bar chart using proc sgplot

  • built-in styles make it easy to customize output

generating a bar chart using proc sgplot

title'generating a bar chart-using proc sgplot';
proc sgplot data=store;
vbar region;/*vbar requests vertical bars*/
run;

using ODS to send output to alternate destinations

  • ODS (Output Delivery System) sends procedure output to destinations other than the default listing, such as PDF

using ods to create PDF output

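 ods listing close;/*stop sending output to the listing destination*/
 ods pdf file='C:\Users\apple\Desktop\63671_example';/*file= should name the PDF file to create, so the path normally ends in a file name with a .pdf extension*/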
title 'generating a bar chart';
proc sgplot data=store;
vbar region;
run;
quit;
ods pdf close;
ods listing;
