yugao1986的专栏

SAS and R 学习记录

Monte Carlo Simulation

A.STEP,(FROM《SAS For Monte Carlo Studies》)

1. Designing the system (What are the parameters of the system? What are the relationships among these parameters? What is the unknown we are after? What are
the parameters changing by chance? What precision do we wish to achieve?).
2. Identifying the a priori distributions of the probability variables in the system.
3. Programming the whole system in SAS.
4. Executing the simulation.
5. Saving all relevant results and necessary intermediate values.
6. Checking the required randomness in the system.
7. Calculating the results.
8. Presenting the results.

B.CASES

B.1) FROM:Dr. Karl L. Wuensch'S homepage

In the data step NORMAL1, I randomly sample 2,500,000 scores from a normal population which has a mean of 100 and a standard deviation of 15.  I arrange these in a matrix with 25 columns and 100,000 rows. Then I use PROC UNIVARIATE to look at the distribution of one sample of 100,000 of these scores. 

In the NORMAL2 step, I compute sample mean, standard deviation, variance, z, and t on 100,000 samples with N = 9 and 100,000 samples with N = 25.  The z and the t test the null hypothesis that the population mean is 100.  This may be your one chance in life to see an absolutely true null hypothesis be tested (assuming that SAS' random number generator is flawless). 

These statistics are output to data set NORMAL3.  Keep in mind that these statistics are being computed on sampling distributions (the 'scores' in each of these distributions are statistics computed on samples randomly drawn from the same population).  Various statistics from the sampling distributions are displayed, u and TITLE statements are used to label the output.

options formdlim='-' pageno=min nodate;
DATA normal1; drop obs I;
   do obs=1 to 100000;
      array Ys[25] Y1-Y25;
      do I = 1 to 25;
         Ys[I]=100+15*NORMAL(0);
      end; output; 
   end;
proc univariate plot normal; var Y1;
title 'One Sample of 100,000 scores from Normal Distribution'; run;
run;
*************************************************************************;
data normal2; 
   set normal1;
   mean9 = mean(of Y1-Y9);     mean25 = mean(of Y1-Y25);
   std9 = std(of Y1-Y9);      std25=std(of Y1-Y25);
   var9 = std9*std9;      var25 = std25*std25;
   Z9 = (mean9 - 100)/5;      Z25 = (mean25 - 100)/3;
   T9 = (mean9 - 100)/(std9/3);     T25 = (mean25 - 100)/(std25/5);
   Type1_N9 = 'No ';
   If ABS(T9)GE 2.306 then Type1_N9 = 'Yes';
      Type1_N25 = 'No ';
   If ABS(T25)GE 2.064 then Type1_N25 = 'Yes';
proc freq; 
   tables Type1_N9 Type1_N25;
title 'Frequency of Type I Errors';
proc univariate noprint; 
   output out=normal3 
   mean=mean_mean9 mean_var9 mean_std9 mean_Z9 mean_T9
   mean=mean_mean25 mean_var25 mean_std25 mean_Z25 mean_T25
   median=med_mean9 med_var9 med_std9 med_Z9 med_T9
   median=med_mean25 med_var25 med_std25 med_Z25 med_T25
   std=std_mean9 std_var9 std_std9 std_Z9 std_T9
   std=std_mean25 std_var25 std_std25 std_Z25 std_T25
   skewness=g1_mean9 g1_var9 g1_std9 g1_Z9 g1_T9
   skewness=g1_mean25 g1_var25 g1_std25 g1_Z25 g1_T25
   kurtosis=g2_mean9 g2_var9 g2_std9 g2_Z9 g2_T9
   kurtosis=g2_mean25 g2_var25 g2_std25 g2_Z25 g2_T25
   min=min_mean9 min_var9 min_std9 min_Z9 min_T9
   min=min_mean25 min_var25 min_std25 min_Z25 min_T25
   max=max_mean9 max_var9 max_std9 max_Z9 max_T9
   max=max_mean25 max_var25 max_std25 max_Z25 max_T25;
   var mean9 var9 std9 Z9 T9 mean25 var25 std25 Z25 T25; run;
*************************************************************************;
data normal4; 
   set normal3;
proc print;
   var mean_mean9 med_mean9 std_mean9 g1_mean9 g2_mean9 min_mean9 max_mean9;
title 'Normal Population, Distribution of Sample Means, N = 9';
proc print; 
   var mean_mean25 med_mean25 std_mean25 g1_mean25 g2_mean25 min_mean25 max_mean25;
title 'Normal Population, Distribution of Sample Means, N = 25';
**;
proc print; 
   var mean_var9 med_var9 std_var9 g1_var9 g2_var9 min_var9 max_var9;
title 'Normal Population, Distribution of Sample Variances, N = 9';
proc print; 
   var mean_var25 med_var25 std_var25 g1_var25 g2_var25 min_var25 max_var25;
title 'Normal Population, Distribution of Sample Variances, N = 25';
**;
proc print; 
   var mean_std9 med_std9 std_std9 g1_std9 g2_std9 min_std9 max_std9;
title 'Normal Population, Distribution of Sample Standard Deviations, N = 9';
proc print; 
   var mean_std25 med_std25 std_std25 g1_std25 g2_std25 min_std25 max_std25;
title 'Normal Population, Distribution of Sample Standard Deviations, N = 25';
**;
proc print; 
   var mean_Z9 med_Z9 std_Z9 g1_Z9 g2_Z9 min_Z9 max_Z9;
title 'Normal Population, Distribution of Z Test Statistic, N = 9';
proc print; 
   var mean_Z25 med_Z25 std_Z25 g1_Z25 g2_Z25 min_Z25 max_Z25;
title 'Normal Population, Distribution of Z Test Statistic, N = 25';
**;
proc print; 
   var mean_T9 med_T9 std_T9 g1_T9 g2_T9 min_T9 max_T9;
title 'Normal Population, Distribution of T Test Statistic, N = 9';
proc print; 
   var mean_T25 med_T25 std_T25 g1_T25 g2_T25 min_T25 max_T25;
title 'Normal Population, Distribution of T Test Statistic, N = 25';
run;
*############################################################################;
DATA skewed1; 
   drop obs J;
   do obs=1 to 100000;
      array Xs[25] X1-X25;
      do J = 1 to 25;
         Xs[J]=100*RANEXP(0);
      end; output; 
   end;
run;
proc univariate plot normal; 
   var X1;
title 'One Sample of 100,000 scores from Exponential Distribution'; 
run;
*************************************************************************;
data skewed2; 
   set skewed1;
   mean9 = mean(of X1-X9);     mean25 = mean(of X1-X25);
   Z9 = 3*(mean9 - 100)/100;      Z25 = (mean25 - 100)/20;
   std9 = std(of X1-X9);      std25=std(of X1-X25);
   T9 = (mean9 - 100)/(std9/3);     T25 = (mean25 - 100)/(std25/5);
   Type1_N9 = 'No ';
   If ABS(T9)GE 2.306 then Type1_N9 = 'Yes';Type1_N25 = 'No ';
   If ABS(T25)GE 2.064 then Type1_N25 = 'Yes';
proc freq; 
   tables Type1_N9 Type1_N25;
title 'Frequency of Type I Errors';
proc univariate noprint;
   output out=skewed3 
   mean=mean_mean9 mean_std9 mean_Z9 mean_T9
   mean=mean_mean25 mean_std25 mean_Z25 mean_T25
   median=med_mean9 med_std9 med_Z9 med_T9
   median=med_mean25 med_std25 med_Z25 med_T25
   std=std_mean9 std_std9 std_Z9 std_T9
   std=std_mean25 std_std25 std_Z25 std_T25
   skewness=g1_mean9 g1_std9 g1_Z9 g1_T9
   skewness=g1_mean25 g1_std25 g1_Z25 g1_T25
   kurtosis=g2_mean9 g2_std9 g2_Z9 g2_T9
   kurtosis=g2_mean25 g2_std25 g2_Z25 g2_T25
   min=min_mean9 min_std9 min_Z9 min_T9
   min=min_mean25 min_std25 min_Z25 min_T25
   max=max_mean9 max_std9 max_Z9 max_T9
   max=max_mean25 max_std25 max_Z25 max_T25;
   var mean9 std9 Z9 T9 mean25 std25 Z25 T25;
run;
*************************************************************************;
proc print; var mean_mean9 med_mean9 std_mean9 g1_mean9 g2_mean9 min_mean9 max_mean9;
title 'Exponential Population, Distribution of Sample Means, N = 9';
proc print; var mean_mean25 med_mean25 std_mean25 g1_mean25 g2_mean25 min_mean25 max_mean25;
title 'Exponential Population, Distribution of Sample Means, N = 25';
**;
proc print; var mean_std9 med_std9 std_std9 g1_std9 g2_std9 min_std9 max_std9;
title 'Exponential Population, Distribution of Sample Standard Deviations, N = 9';
proc print; var mean_std25 med_std25 std_std25 g1_std25 g2_std25 min_std25 max_std25;
title 'Exponential Population, Distribution of Sample Standard Deviations, N = 25';
**;
proc print; var mean_Z9 med_Z9 std_Z9 g1_Z9 g2_Z9 min_Z9 max_Z9;
title 'Exponential Population, Distribution of Z Test Statistic, N = 9';
proc print; var mean_Z25 med_Z25 std_Z25 g1_Z25 g2_Z25 min_Z25 max_Z25;
title 'Exponential Population, Distribution of Z Test Statistic, N = 25';
**;
proc print; var mean_T9 med_T9 std_T9 g1_T9 g2_T9 min_T9 max_T9;
title 'Exponential Population, Distribution of T Test Statistic, N = 9';
proc print; var mean_T25 med_T25 std_T25 g1_T25 g2_T25 min_T25 max_T25;
title 'Exponential Population, Distribution of T Test Statistic, N = 25';
run;

B.2)EXAMPLE FROM google

Problem1:   Estimate PI using Monte Carlo Integration

   Strategy:  Equation of a circle with radius=1:  x2 + y2 = 1, which can be written y = sqrt(1-x2); Area of this circle = p; Area of this circle in the first quadrant = p/4;

                    Generate Ux ~ Uniform(0,1) and Uy ~ Uniform(0,1),  Check to see if Uy <= sqrt(1-Ux2);

                    The proportion of generated points when this Condition is true is an estimate of p/4.

data MCint;
    /* initialize seed                       */
  seed1 = 12345;
  do isim = 1 to 10000;
    /* generate point in first quadrant       */ 
    Ux = ranuni(seed1);
    Uy = ranuni(seed1);
    /* check to see if point under the circle */
    Under = (Uy <= sqrt(1-Ux**2));
    /* output simulation result               */
    drop isim;
    output;
  end;
* ODS TRACE ON;
ODS OUTPUT OneWayFreqs=data_freq;
proc freq; 
  table Under;
  run;
ODS OUTPUT CLOSE;
* ODS TRACE OFF;  
proc print data=data_freq; run;
data summary; set data_freq;
  if under = 1;
  PI_est = 4*Percent/100;
  prop_est = Percent/100;
  SE_PI_EST = 4*sqrt(prop_est * (1 – prop_est)/10000 );
  PI_CI_Low = PI_est – 2*SE_PI_EST;
  PI_CI_Up  = PI_est + 2*SE_PI_EST;
  put ‘-----------------------------------------------‘;
  put ‘| MC Integration estimate of PI               |’;
  put ‘-----------------------------------------------‘;
  put ‘     PI (estimate) = ‘ PI_est;
  put ‘SE [PI (estimate)] = ‘ SE_PI_EST;
  put ‘    Approx. 95% CI = ‘ PI_CI_Low ‘, ‘ PI_CI_Up;
  put ‘ ‘;
  put ‘Based on 10,000 simulated points. ‘;
  put ‘-----------------------------------------------‘;
run;
ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\week6-MC-fig.rtf'
proc gplot data=MCint;
  plot Uy*Ux=Under;
  run;
ODS RTF CLOSE;


Problem2:Simulating sex ratio in family with 4 kids

data monte2;
title2 “Problem 2: #boys & girls in families of size 4”;

array x x1-x4;        * array containing family of 4 kids;
do j=1 to 1000;       * generate 1000 familes;
  numboys = 0;        * initialize counters of #boys and #girls;
  numgirls=0;
  do over x;          * generate a family of 4 kids;
    p = ranuni(0);
    if p<.5 then do;
      numboys = numboys + 1; 
      x = 0;
      end;
    else do;
      numgirls = numgirls + 1;
      x = 1;
      end;
  end;                * end of the loop for KIDS;
  outcome = 0;
  if numboys=numgirls then outcome=1;
  drop j p;
  output;
end;                  * end of the loop for FAMILIES;
proc print;
proc freq;
  table outcome numboys*numgirls;
run;


Problem3:  Explore whether t-test really is robust to violations of the equal variance assumption 

    Strategy: See if the t-test operates at the nominal  Type I error rate when the unequal variance assumption is violated

    Test case:  n1=n2=10; Population 1:  N(0,1) ; Population 2:  N(0,4)

data twogroup;
   array x{10} x1-x10;
   array y{10} y1-y10;
   do isim = 1 to 10000;
  /* generate samples X~N(0,1)  Y~N(0,4) - normal case */
      do isample = 1 to 10;
         x{isample} = rannor(0);
         y{isample} = 2*rannor(0);
      end;
/* calculate the t-statistic                        */
      xbar = mean(of x1-x10);
      ybar = mean(of y1-y10);
      xvar = var(of x1-x10);
      yvar = var(of y1-y10);
      s2p = (9*xvar + 9*yvar)/18;
      tstat = (xbar-ybar)/sqrt(s2p*(2/10));
      Pvalue = 2*(1-probt(abs(tstat),18));
      Reject05 = (Pvalue <= 0.05);
      keep xbar ybar xvar yvar s2p tstat Pvalue Reject05;
      output;
   end;   * end of the simulation loop;

proc freq; 
  table Reject05;
run;

Other sourses:

<<SAS FOR Monte Carlo Studies>>

http://blog.sina.com.cn/s/blog_5d3b177c0100dfxw.html

http://blog.csdn.net/JiangtangHu
 

阅读更多
想对作者说点什么? 我来说一句

没有更多推荐了,返回首页

加入CSDN,享受更精准的内容推荐,与500万程序员共同成长!
关闭
关闭