SAS and R 学习记录

# Monte Carlo Simulation

A.STEP,(FROM《SAS For Monte Carlo Studies》)

1. Designing the system (What are the parameters of the system? What are the relationships among these parameters? What is the unknown we are after? What are
the parameters changing by chance? What precision do we wish to achieve?).
2. Identifying the a priori distributions of the probability variables in the system.
3. Programming the whole system in SAS.
4. Executing the simulation.
5. Saving all relevant results and necessary intermediate values.
6. Checking the required randomness in the system.
7. Calculating the results.
8. Presenting the results.

B.CASES

B.1) FROM:Dr. Karl L. Wuensch'S homepage

In the data step NORMAL1, I randomly sample 2,500,000 scores from a normal population which has a mean of 100 and a standard deviation of 15.  I arrange these in a matrix with 25 columns and 100,000 rows. Then I use PROC UNIVARIATE to look at the distribution of one sample of 100,000 of these scores.

In the NORMAL2 step, I compute sample mean, standard deviation, variance, z, and t on 100,000 samples with N = 9 and 100,000 samples with N = 25.  The z and the t test the null hypothesis that the population mean is 100.  This may be your one chance in life to see an absolutely true null hypothesis be tested (assuming that SAS' random number generator is flawless).

These statistics are output to data set NORMAL3.  Keep in mind that these statistics are being computed on sampling distributions (the 'scores' in each of these distributions are statistics computed on samples randomly drawn from the same population).  Various statistics from the sampling distributions are displayed, u and TITLE statements are used to label the output.

options formdlim='-' pageno=min nodate;
DATA normal1; drop obs I;
do obs=1 to 100000;
array Ys[25] Y1-Y25;
do I = 1 to 25;
Ys[I]=100+15*NORMAL(0);
end; output;
end;
proc univariate plot normal; var Y1;
title 'One Sample of 100,000 scores from Normal Distribution'; run;
run;
*************************************************************************;
data normal2;
set normal1;
mean9 = mean(of Y1-Y9);     mean25 = mean(of Y1-Y25);
std9 = std(of Y1-Y9);      std25=std(of Y1-Y25);
var9 = std9*std9;      var25 = std25*std25;
Z9 = (mean9 - 100)/5;      Z25 = (mean25 - 100)/3;
T9 = (mean9 - 100)/(std9/3);     T25 = (mean25 - 100)/(std25/5);
Type1_N9 = 'No ';
If ABS(T9)GE 2.306 then Type1_N9 = 'Yes';
Type1_N25 = 'No ';
If ABS(T25)GE 2.064 then Type1_N25 = 'Yes';
proc freq;
tables Type1_N9 Type1_N25;
title 'Frequency of Type I Errors';
proc univariate noprint;
output out=normal3
mean=mean_mean9 mean_var9 mean_std9 mean_Z9 mean_T9
mean=mean_mean25 mean_var25 mean_std25 mean_Z25 mean_T25
median=med_mean9 med_var9 med_std9 med_Z9 med_T9
median=med_mean25 med_var25 med_std25 med_Z25 med_T25
std=std_mean9 std_var9 std_std9 std_Z9 std_T9
std=std_mean25 std_var25 std_std25 std_Z25 std_T25
skewness=g1_mean9 g1_var9 g1_std9 g1_Z9 g1_T9
skewness=g1_mean25 g1_var25 g1_std25 g1_Z25 g1_T25
kurtosis=g2_mean9 g2_var9 g2_std9 g2_Z9 g2_T9
kurtosis=g2_mean25 g2_var25 g2_std25 g2_Z25 g2_T25
min=min_mean9 min_var9 min_std9 min_Z9 min_T9
min=min_mean25 min_var25 min_std25 min_Z25 min_T25
max=max_mean9 max_var9 max_std9 max_Z9 max_T9
max=max_mean25 max_var25 max_std25 max_Z25 max_T25;
var mean9 var9 std9 Z9 T9 mean25 var25 std25 Z25 T25; run;
*************************************************************************;
data normal4;
set normal3;
proc print;
var mean_mean9 med_mean9 std_mean9 g1_mean9 g2_mean9 min_mean9 max_mean9;
title 'Normal Population, Distribution of Sample Means, N = 9';
proc print;
var mean_mean25 med_mean25 std_mean25 g1_mean25 g2_mean25 min_mean25 max_mean25;
title 'Normal Population, Distribution of Sample Means, N = 25';
**;
proc print;
var mean_var9 med_var9 std_var9 g1_var9 g2_var9 min_var9 max_var9;
title 'Normal Population, Distribution of Sample Variances, N = 9';
proc print;
var mean_var25 med_var25 std_var25 g1_var25 g2_var25 min_var25 max_var25;
title 'Normal Population, Distribution of Sample Variances, N = 25';
**;
proc print;
var mean_std9 med_std9 std_std9 g1_std9 g2_std9 min_std9 max_std9;
title 'Normal Population, Distribution of Sample Standard Deviations, N = 9';
proc print;
var mean_std25 med_std25 std_std25 g1_std25 g2_std25 min_std25 max_std25;
title 'Normal Population, Distribution of Sample Standard Deviations, N = 25';
**;
proc print;
var mean_Z9 med_Z9 std_Z9 g1_Z9 g2_Z9 min_Z9 max_Z9;
title 'Normal Population, Distribution of Z Test Statistic, N = 9';
proc print;
var mean_Z25 med_Z25 std_Z25 g1_Z25 g2_Z25 min_Z25 max_Z25;
title 'Normal Population, Distribution of Z Test Statistic, N = 25';
**;
proc print;
var mean_T9 med_T9 std_T9 g1_T9 g2_T9 min_T9 max_T9;
title 'Normal Population, Distribution of T Test Statistic, N = 9';
proc print;
var mean_T25 med_T25 std_T25 g1_T25 g2_T25 min_T25 max_T25;
title 'Normal Population, Distribution of T Test Statistic, N = 25';
run;
*############################################################################;
DATA skewed1;
drop obs J;
do obs=1 to 100000;
array Xs[25] X1-X25;
do J = 1 to 25;
Xs[J]=100*RANEXP(0);
end; output;
end;
run;
proc univariate plot normal;
var X1;
title 'One Sample of 100,000 scores from Exponential Distribution';
run;
*************************************************************************;
data skewed2;
set skewed1;
mean9 = mean(of X1-X9);     mean25 = mean(of X1-X25);
Z9 = 3*(mean9 - 100)/100;      Z25 = (mean25 - 100)/20;
std9 = std(of X1-X9);      std25=std(of X1-X25);
T9 = (mean9 - 100)/(std9/3);     T25 = (mean25 - 100)/(std25/5);
Type1_N9 = 'No ';
If ABS(T9)GE 2.306 then Type1_N9 = 'Yes';Type1_N25 = 'No ';
If ABS(T25)GE 2.064 then Type1_N25 = 'Yes';
proc freq;
tables Type1_N9 Type1_N25;
title 'Frequency of Type I Errors';
proc univariate noprint;
output out=skewed3
mean=mean_mean9 mean_std9 mean_Z9 mean_T9
mean=mean_mean25 mean_std25 mean_Z25 mean_T25
median=med_mean9 med_std9 med_Z9 med_T9
median=med_mean25 med_std25 med_Z25 med_T25
std=std_mean9 std_std9 std_Z9 std_T9
std=std_mean25 std_std25 std_Z25 std_T25
skewness=g1_mean9 g1_std9 g1_Z9 g1_T9
skewness=g1_mean25 g1_std25 g1_Z25 g1_T25
kurtosis=g2_mean9 g2_std9 g2_Z9 g2_T9
kurtosis=g2_mean25 g2_std25 g2_Z25 g2_T25
min=min_mean9 min_std9 min_Z9 min_T9
min=min_mean25 min_std25 min_Z25 min_T25
max=max_mean9 max_std9 max_Z9 max_T9
max=max_mean25 max_std25 max_Z25 max_T25;
var mean9 std9 Z9 T9 mean25 std25 Z25 T25;
run;
*************************************************************************;
proc print; var mean_mean9 med_mean9 std_mean9 g1_mean9 g2_mean9 min_mean9 max_mean9;
title 'Exponential Population, Distribution of Sample Means, N = 9';
proc print; var mean_mean25 med_mean25 std_mean25 g1_mean25 g2_mean25 min_mean25 max_mean25;
title 'Exponential Population, Distribution of Sample Means, N = 25';
**;
proc print; var mean_std9 med_std9 std_std9 g1_std9 g2_std9 min_std9 max_std9;
title 'Exponential Population, Distribution of Sample Standard Deviations, N = 9';
proc print; var mean_std25 med_std25 std_std25 g1_std25 g2_std25 min_std25 max_std25;
title 'Exponential Population, Distribution of Sample Standard Deviations, N = 25';
**;
proc print; var mean_Z9 med_Z9 std_Z9 g1_Z9 g2_Z9 min_Z9 max_Z9;
title 'Exponential Population, Distribution of Z Test Statistic, N = 9';
proc print; var mean_Z25 med_Z25 std_Z25 g1_Z25 g2_Z25 min_Z25 max_Z25;
title 'Exponential Population, Distribution of Z Test Statistic, N = 25';
**;
proc print; var mean_T9 med_T9 std_T9 g1_T9 g2_T9 min_T9 max_T9;
title 'Exponential Population, Distribution of T Test Statistic, N = 9';
proc print; var mean_T25 med_T25 std_T25 g1_T25 g2_T25 min_T25 max_T25;
title 'Exponential Population, Distribution of T Test Statistic, N = 25';
run;


Problem1:   Estimate PI using Monte Carlo Integration

Strategy:  Equation of a circle with radius=1:  x2 + y2 = 1, which can be written y = sqrt(1-x2); Area of this circle = p; Area of this circle in the first quadrant = p/4;

Generate Ux ~ Uniform(0,1) and Uy ~ Uniform(0,1),  Check to see if Uy <= sqrt(1-Ux2);

The proportion of generated points when this Condition is true is an estimate of p/4.

data MCint;
/* initialize seed                       */
seed1 = 12345;
do isim = 1 to 10000;
/* generate point in first quadrant       */
Ux = ranuni(seed1);
Uy = ranuni(seed1);
/* check to see if point under the circle */
Under = (Uy <= sqrt(1-Ux**2));
/* output simulation result               */
drop isim;
output;
end;
* ODS TRACE ON;
ODS OUTPUT OneWayFreqs=data_freq;
proc freq;
table Under;
run;
ODS OUTPUT CLOSE;
* ODS TRACE OFF;
proc print data=data_freq; run;
data summary; set data_freq;
if under = 1;
PI_est = 4*Percent/100;
prop_est = Percent/100;
SE_PI_EST = 4*sqrt(prop_est * (1 – prop_est)/10000 );
PI_CI_Low = PI_est – 2*SE_PI_EST;
PI_CI_Up  = PI_est + 2*SE_PI_EST;
put ‘-----------------------------------------------‘;
put ‘| MC Integration estimate of PI               |’;
put ‘-----------------------------------------------‘;
put ‘     PI (estimate) = ‘ PI_est;
put ‘SE [PI (estimate)] = ‘ SE_PI_EST;
put ‘    Approx. 95% CI = ‘ PI_CI_Low ‘, ‘ PI_CI_Up;
put ‘ ‘;
put ‘Based on 10,000 simulated points. ‘;
put ‘-----------------------------------------------‘;
run;
ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\week6-MC-fig.rtf'
proc gplot data=MCint;
plot Uy*Ux=Under;
run;
ODS RTF CLOSE;

Problem2:Simulating sex ratio in family with 4 kids

data monte2;
title2 “Problem 2: #boys & girls in families of size 4”;

array x x1-x4;        * array containing family of 4 kids;
do j=1 to 1000;       * generate 1000 familes;
numboys = 0;        * initialize counters of #boys and #girls;
numgirls=0;
do over x;          * generate a family of 4 kids;
p = ranuni(0);
if p<.5 then do;
numboys = numboys + 1;
x = 0;
end;
else do;
numgirls = numgirls + 1;
x = 1;
end;
end;                * end of the loop for KIDS;
outcome = 0;
if numboys=numgirls then outcome=1;
drop j p;
output;
end;                  * end of the loop for FAMILIES;
proc print;
proc freq;
table outcome numboys*numgirls;
run;

Problem3:  Explore whether t-test really is robust to violations of the equal variance assumption

Strategy: See if the t-test operates at the nominal  Type I error rate when the unequal variance assumption is violated

Test case:  n1=n2=10; Population 1:  N(0,1) ; Population 2:  N(0,4)

data twogroup;
array x{10} x1-x10;
array y{10} y1-y10;
do isim = 1 to 10000;
/* generate samples X~N(0,1)  Y~N(0,4) - normal case */
do isample = 1 to 10;
x{isample} = rannor(0);
y{isample} = 2*rannor(0);
end;
/* calculate the t-statistic                        */
xbar = mean(of x1-x10);
ybar = mean(of y1-y10);
xvar = var(of x1-x10);
yvar = var(of y1-y10);
s2p = (9*xvar + 9*yvar)/18;
tstat = (xbar-ybar)/sqrt(s2p*(2/10));
Pvalue = 2*(1-probt(abs(tstat),18));
Reject05 = (Pvalue <= 0.05);
keep xbar ybar xvar yvar s2p tstat Pvalue Reject05;
output;
end;   * end of the simulation loop;

proc freq;
table Reject05;
run;

Other sourses:

<<SAS FOR Monte Carlo Studies>>

http://blog.sina.com.cn/s/blog_5d3b177c0100dfxw.html

#### Monte Carlo Simulation technique

2007-04-09 00:24:00

#### Markov chain Monte Carlo

2013-09-02 18:51:09

#### Handbook of Markov Chain Monte Carlo - Steve影印版

2016年03月11日 15.1MB 下载

#### 如何用R进行蒙特卡罗模拟（Monte Carlo Simulation with R）

2016-11-20 23:22:39

#### Introduction to Monte Carlo Tree Search

2017-05-01 17:27:43

#### Monte Carlo simulation, resampling

2018-04-07 12:05:55

#### Sequential Monte Carlo Methods

2015-09-09 18:39:12

#### 蒙特卡罗方法（Monte Carlo method）浅入

2017-09-25 18:37:34

#### Gibbs Sampling(一)：随机数产生方法介绍 & Monte Carlo方法

2015-09-12 14:04:38

#### Molecular Gas Dynamics and the Direct Simulation of Monte Carlo

2015年03月12日 18.6MB 下载