proc rank

一个中文博客:http://blog.sina.com.cn/s/blog_6849f0730100we95.html

排序方法详解(文档):

Proc rank 计算 观测值对应数值型变量的秩次

语法:

      Proc rank <options>;

          By <descending> variable-1 <descending> variable-n

<notsorted> ;*分组变量;

          Var data-set-variables(s);*设定待排序求秩变量;

          Ranks new-variable(s);*含秩次的变量;

Options中求秩排序的方法:

1.1FRACTION

computes fractional ranks by dividing each rank by the number of observations having nonmissing values of the ranking variable

TIES=HIGH is the default with the FRACTION option. With TIES=HIGH, fractional ranks are considered values of a right-continuous empirical cumulative distribution function.

1.2NPLUS1

computes fractional ranks by dividing each rank by the denominator n+1, where n is the number of observations having nonmissing values of the ranking variable.

2.GROUPS=number-of-groups

assigns group values ranging from 0 to number-of-groups minus 1. Common specifications are GROUPS=100 for percentiles, GROUPS=10 for deciles, and GROUPS=4 for quartiles. For example, GROUPS=4 partitions the original values into four groups, with the smallest values receiving, by default, a quartile value of 0 and the largest values receiving a quartile value of 3.

The formula for calculating group values is

where FLOOR is the FLOOR function, rank is the value's order rank, k is the value of GROUPS=, and n is the number of observations having nonmissing values of the ranking variable.

If the number of observations is evenly divisible by the number of groups, each group has the same number of observations, provided there are no tied values at the boundaries of the groups. Grouping observations by a variable that has many tied values can result in unbalanced groups because PROC RANK always assigns observations with the same value to the same group.

3.NORMAL=BLOM | TUKEY | VW

computes normal scores from the ranks. The resulting variables appear normally distributed. The formulas are

 

where ri is the rank of the ith observation, and n is the number of nonmissing observations for the ranking variable.

VW stands for van der Waerden. With NORMAL=VW, you can use the scores for a nonparametric location test. All three normal scores are approximations to the exact expected order statistics for the normal distribution, also called normal scores. The BLOM version appears to fit slightly better than the others (Blom 1958; Tukey 1962).

4. PERCENT

divides each rank by the number of observations that have nonmissing values of the variable and multiplies the result by 100 to get a percentage.

5. SAVAGE

computes Savage (or exponential) scores from the ranks by the following formula (Lehman 1998):

TIES=HIGH | LOW | MEAN

specifies how to compute normal scores or ranks for tied data values.

HIGH

assigns the largest of the corresponding ranks (or largest of the normal scores when NORMAL= is specified).

LOW

assigns the smallest of the corresponding ranks (or smallest of the normal scores when NORMAL= is specified).

MEAN

assigns the mean of the corresponding rank (or mean of the normal scores when NORMAL= is specified).

 

 

 

 

数据分析,数据科学及AI算法是当前最热门的职业。这些职业有着共同的特点:面向数字的,针对编程的以及采取分析手段的。 这些当代热点特性使得在就业市场上对以上职位需求激增也就不足为奇了。但是,市场上提供这方面的大型综合的培训课程是有限,如果说有,大多是知识范围狭窄且非综合性的,而且大多培训都缺乏方法论与实务结合。一般的情况是讲师讲述某种语言的一堆代码,学生听完后甚至连使用方法及代码的前提都不清楚,更别提实际应用场景了。这里,掌握一门数据分析软件本身没错,但仅通过单一的编程培训很难获得聘用为数据分析师或数据科学家所需的技能。那我的解决方案是什么呢?首先,我把所有数据分析中的典型问题都归类总结出来,再结合相应的实际问题,数据以及案例,同时采用世界上最流行的两种数据分析软件:PYTHON 和 SAS去解决这些问题,并将这些解决方法传授给学生。学生在完成培训后更重要的收获是知道每一问题从产生直至解决的前因后果和应用场景,这是因为我在每一课程章节最前都会交代方法论,知识要点及应用场合。SAS和PYTHON可以一起学吗?当然可以。因为我就是这样做到的。具体步骤是,我在课程当中安排了一系列主题,然后使用两种编程语言解决同样的问题。我总结出这样做的好处是边学习边比较,最后在不知不觉当中掌握了两门语言的精华和数据分析的通用方法或模式。过程虽有点长,但十分有趣。最后,为了巩固已学的知识和技能,我还专门安排了针对PYTHON 和 SAS的中小型项目及详细代码讲解。另外,课程当中使用的全部编程代码及数据文件都将免费地提供给注册的学生。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值