Assessment 1

Latest recommended article published on 2024-09-10 13:47:50

Aspirationuio

Article link: https://blog.csdn.net/Aspirationuio/article/details/140934456

Assessment 1 (Due by 23:59 on 9 August 2024)

Question 1 [50 marks]

We observe data $\{(y_i, x_i),\ i = 1, 2, \ldots, n\}$ from the linear regression model

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{1}$$

where $x_i = (x_{1i}, x_{2i}, \ldots, x_{pi})^T$ contains the $p$ covariates.

(a). [10 marks] Generate a set of sample observations $(y_i, x_i)$, $i = 1, 2, \ldots, n$, in the statistical software R by following the data generating process (DGP) below.

1. Parameters. Set $p = 20$, $n = 26$ and $\beta_0 = 1$, $\beta_1 = \beta_2 = \cdots = \beta_{10} = 0.8$, $\beta_{11} = \beta_{12} = \cdots = \beta_{20} = 1.3$.

2. Covariates. All $p$ covariates (i.e. predictors) follow a normal distribution with mean 0.4 and variance 1.1.

3. Error Component. The error component $\varepsilon_i$ follows the standard normal distribution.
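
The three steps above can be sketched in R as follows. This is only one possible reading of the DGP: it assumes the covariates are mutually independent and independent of the errors (the text does not say), and the seed and the object names `X`, `y` and `train` are illustrative choices rather than part of the assignment.

```r
# Minimal sketch of the DGP in part (a); seed and names are assumptions.
set.seed(2024)

p <- 20
n <- 26
beta0 <- 1
beta <- c(rep(0.8, 10), rep(1.3, 10))   # beta_1..beta_10 = 0.8, beta_11..beta_20 = 1.3

# Covariates: N(0.4, 1.1). rnorm() expects a standard deviation, hence sqrt(1.1).
# Independence across covariates is an assumption.
X <- matrix(rnorm(n * p, mean = 0.4, sd = sqrt(1.1)), nrow = n, ncol = p)
colnames(X) <- paste0("x", seq_len(p))

# Errors: standard normal.
eps <- rnorm(n)

# Response from model (1).
y <- as.vector(beta0 + X %*% beta + eps)

# Collect everything in one data frame for later model fitting.
train <- data.frame(y = y, X)
str(train)
```

Note that the variance 1.1 enters `rnorm()` as `sd = sqrt(1.1)`; passing 1.1 directly would generate covariates with variance 1.21.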

(b). [15 marks] With the data generated in (a), estimate the regression coefficients $\beta_0$ and $\beta_k$, $k = 1, 2, \ldots, p$, by ordinary least squares (OLS) in R. Design an experiment (i.e. a simulation) to evaluate the prediction accuracy of this OLS estimator for the response variable $y$ on test data. Please write down the procedure of the designed experiment and present the results in R.
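
One way such an experiment could be organised is sketched below. It is a possible design, not the required one: in each replication a training set of size $n = 26$ and an independent test set are drawn from the same DGP, OLS is fitted with `lm()`, and the mean squared prediction error on the test set is recorded. The test size `n_test`, the number of replications `n_rep`, the seed and the helper `gen_data()` are all assumptions introduced for illustration.

```r
# Sketch of a Monte Carlo experiment for OLS prediction accuracy.
# n_test, n_rep, the seed and gen_data() are illustrative assumptions.
set.seed(1)

p <- 20
n <- 26
n_test <- 1000
n_rep <- 200
beta0 <- 1
beta <- c(rep(0.8, 10), rep(1.3, 10))

# Draw n_obs observations from the DGP of part (a).
gen_data <- function(n_obs) {
  X <- matrix(rnorm(n_obs * p, mean = 0.4, sd = sqrt(1.1)), nrow = n_obs)
  colnames(X) <- paste0("x", seq_len(p))
  y <- as.vector(beta0 + X %*% beta + rnorm(n_obs))
  data.frame(y = y, X)
}

# One replication: fit OLS on fresh training data, score on fresh test data.
test_mse <- replicate(n_rep, {
  train <- gen_data(n)
  test <- gen_data(n_test)
  fit <- lm(y ~ ., data = train)
  mean((test$y - predict(fit, newdata = test))^2)
})

mean(test_mse)                 # average test MSE over replications
sd(test_mse) / sqrt(n_rep)     # Monte Carlo standard error of that average
```

Because the model has 21 coefficients and only 26 training observations, the test MSE should be compared with the noise variance of 1, which is the best achievable value under this DGP. To follow the wording "with the generated data in (a)" literally, the training set could instead be held fixed and only the test set regenerated.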

(c). [25 marks] With the data generated in (a), propose another estimation approach for the linear regression model that achieves higher prediction accuracy than OLS. Please implement the proposed approach in R and present the estimates of the linear coefficients. Further, please explain why the proposed method is better than OLS in terms of prediction accuracy.
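
Ridge regression is one candidate for such an alternative; the assignment does not name a method, so this choice is an assumption. The sketch below implements ridge in base R with an unpenalised intercept, selects the penalty by leave-one-out cross-validation over a hypothetical grid `lambdas`, and compares test errors with OLS. Whether ridge actually wins on a given sample is something to check from the output, not something the code guarantees.

```r
# Sketch: ridge regression as one possible alternative to OLS.
# The method, the lambda grid, the seed and all helper names are assumptions.
set.seed(1)

p <- 20
n <- 26
n_test <- 1000
beta0 <- 1
beta <- c(rep(0.8, 10), rep(1.3, 10))

gen_data <- function(n_obs) {
  X <- matrix(rnorm(n_obs * p, mean = 0.4, sd = sqrt(1.1)), nrow = n_obs)
  list(X = X, y = as.vector(beta0 + X %*% beta + rnorm(n_obs)))
}

# Ridge with an unpenalised intercept: centre X and y, then solve
# (Xc'Xc + lambda * I) b = Xc'(y - mean(y)).
ridge_fit <- function(X, y, lambda) {
  x_bar <- colMeans(X)
  y_bar <- mean(y)
  Xc <- sweep(X, 2, x_bar)
  b <- solve(crossprod(Xc) + lambda * diag(ncol(X)), crossprod(Xc, y - y_bar))
  list(intercept = y_bar - sum(x_bar * b), coef = as.vector(b))
}

ridge_predict <- function(fit, X) as.vector(fit$intercept + X %*% fit$coef)

# Leave-one-out cross-validation error for a given lambda.
loo_cv <- function(X, y, lambda) {
  errs <- sapply(seq_along(y), function(i) {
    fit <- ridge_fit(X[-i, , drop = FALSE], y[-i], lambda)
    (y[i] - ridge_predict(fit, X[i, , drop = FALSE]))^2
  })
  mean(errs)
}

train <- gen_data(n)
test <- gen_data(n_test)

lambdas <- c(0, 10^seq(-3, 2, length.out = 30))
cv_err <- sapply(lambdas, function(l) loo_cv(train$X, train$y, l))
lambda_hat <- lambdas[which.min(cv_err)]

ols <- ridge_fit(train$X, train$y, 0)          # lambda = 0 reproduces OLS
ridge <- ridge_fit(train$X, train$y, lambda_hat)

round(c(intercept = ridge$intercept, ridge$coef), 3)   # ridge coefficient estimates
c(lambda = lambda_hat,
  mse_ols = mean((test$y - ridge_predict(ols, test$X))^2),
  mse_ridge = mean((test$y - ridge_predict(ridge, test$X))^2))
```

The argument for ridge rests on the bias-variance trade-off: with $p = 20$ covariates and only $n = 26$ observations, the OLS estimator is unbiased but has large variance, and a small penalty can reduce that variance by more than the squared bias it introduces. Repeating the comparison over many replications, as in the experiment for (b), would give a more reliable picture than a single split.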

Question 2 [50 marks]

Consider two sets of sample observations $\{x_1, x_2, \ldots, x_n\}$ and $\{y_1, y_2, \ldots, y_m\}$ from normal distributions with population mean vectors $\mu$ and $\nu$, respectively. The population covariance matrices are both identity matrices. The dimensions of $\mu$ and $\nu$ are both equal to $p$. Statisticians are interested in the hypothesis test

$$H_0 : \mu = \nu \quad \text{vs} \quad H_a : \mu \neq \nu. \tag{2}$$

A popular test statistic for this hypothesis testing problem is the Hotelling T square statistic

where $S_x$ and $S_y$ are the sample covariance matrices constructed from $\{x_1, x_2, \ldots, x_n\}$ and $\{y_1, y_2, \ldots, y_m\}$, respectively, and $\bar{x}$ and $\bar{y}$ are the corresponding sample mean vectors, which estimate $\mu$ and $\nu$. The Hotelling $T^2$ statistic has the following asymptotic distribution:

$$T^2 \to \chi^2_p, \quad \text{as } n, m \to \infty, \tag{4}$$

where $\chi^2_p$ is the chi-square distribution with $p$ degrees of freedom.

(a). [25 marks] Please generate the two sets of sample observations in R by setting $p = 60$, $n = 80$, $m = 90$, $\mu = \nu = (1, 1, \ldots, 1)^T$, and then calculate the value of the Hotelling $T^2$ statistic. Repeat this experiment $N = 200$ times and then plot the histogram of the statistic $T^2$.
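
A possible R sketch of this simulation follows. The defining formula of the statistic is not reproduced in this copy of the assignment, so the code assumes the standard pooled-covariance two-sample form $T^2 = \frac{nm}{n+m}(\bar{x} - \bar{y})^T S_{\text{pool}}^{-1} (\bar{x} - \bar{y})$ with $S_{\text{pool}} = \frac{(n-1)S_x + (m-1)S_y}{n+m-2}$; if the intended definition differs, only `hotelling_t2()` needs to change. The seed and the helper names are also assumptions.

```r
# Sketch of the Monte Carlo experiment in Question 2(a).
# ASSUMPTION: pooled-covariance form of the two-sample Hotelling T^2;
# the assignment's own formula is missing from this copy.
set.seed(1)

p <- 60
n <- 80
m <- 90
N <- 200
mu <- rep(1, p)
nu <- rep(1, p)

hotelling_t2 <- function(x, y) {
  n <- nrow(x)
  m <- nrow(y)
  d <- colMeans(x) - colMeans(y)
  S_pool <- ((n - 1) * cov(x) + (m - 1) * cov(y)) / (n + m - 2)
  as.numeric((n * m / (n + m)) * crossprod(d, solve(S_pool, d)))
}

# Rows are observations. Identity covariance means the coordinates are
# independent N(mean, 1) variables, so rnorm() is sufficient.
r_sample <- function(size, mean_vec) {
  matrix(rnorm(size * p), nrow = size) +
    matrix(mean_vec, nrow = size, ncol = p, byrow = TRUE)
}

t2 <- replicate(N, hotelling_t2(r_sample(n, mu), r_sample(m, nu)))

hist(t2, breaks = 30, freq = FALSE,
     main = "Hotelling T^2 under H0 (N = 200)", xlab = "T^2")
# Overlay the asymptotic chi-square density with p degrees of freedom.
curve(dchisq(x, df = p), add = TRUE, lty = 2)
```

Since $p = 60$ is not small relative to $n = 80$ and $m = 90$, the histogram may sit noticeably to the right of the $\chi^2_{60}$ density; the overlay makes it easy to see how well approximation (4) holds at these sample sizes.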

(b). [25 marks] Please apply the bootstrap method in R to estimate the variance of the Hotelling $T^2$ statistic when $p = 60$, $n = 80$, $m = 90$, $\mu = \nu = (1, 1, \ldots, 1)^T$. Write down the details of the bootstrap procedure and present the bootstrap estimate. In addition, please comment on the accuracy of the bootstrap estimate and give reasons.
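
A nonparametric bootstrap for this variance could be sketched as follows: draw one observed pair of samples, resample the rows of each sample with replacement (keeping sizes $n$ and $m$), recompute $T^2$ on each resample, and take the sample variance of the $B$ bootstrap values. As in part (a), the pooled-covariance form of $T^2$ is an assumption, and so are `B`, the seed, and the Monte Carlo benchmark added at the end for comparison.

```r
# Sketch of a nonparametric bootstrap for Var(T^2) in Question 2(b).
# ASSUMPTIONS: pooled-covariance T^2, B = 500, seed, and the benchmark step.
set.seed(1)

p <- 60
n <- 80
m <- 90
B <- 500

hotelling_t2 <- function(x, y) {
  n <- nrow(x)
  m <- nrow(y)
  d <- colMeans(x) - colMeans(y)
  S_pool <- ((n - 1) * cov(x) + (m - 1) * cov(y)) / (n + m - 2)
  as.numeric((n * m / (n + m)) * crossprod(d, solve(S_pool, d)))
}

# One observed data set with mu = nu = (1, ..., 1)^T and identity covariance.
x <- matrix(rnorm(n * p, mean = 1), nrow = n)
y <- matrix(rnorm(m * p, mean = 1), nrow = m)

# Bootstrap: resample rows within each sample, recompute T^2.
t2_boot <- replicate(B, {
  xb <- x[sample.int(n, replace = TRUE), , drop = FALSE]
  yb <- y[sample.int(m, replace = TRUE), , drop = FALSE]
  hotelling_t2(xb, yb)
})
var_boot <- var(t2_boot)

# Monte Carlo benchmark: variance of T^2 over fresh samples from the true DGP.
t2_mc <- replicate(B, hotelling_t2(
  matrix(rnorm(n * p, mean = 1), nrow = n),
  matrix(rnorm(m * p, mean = 1), nrow = m)
))
var_mc <- var(t2_mc)

c(bootstrap = var_boot, monte_carlo = var_mc)
```

Comparing the two numbers gives a direct check on accuracy. Two features of this setting are worth keeping in mind when commenting: each resample contains repeated rows, so its covariance matrices are estimated from fewer distinct points and are poorly conditioned when $p$ is close to $n$ and $m$, and the resamples are centred at $\bar{x}$ and $\bar{y}$ rather than at a common mean, so the bootstrap world does not satisfy $H_0$ exactly.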

Note: This homework is to be submitted through Wattle in digital form only as per ANU policy. The R codes for any computational question must be supplied
