MAST90083 Computational Statistics and Data Mining


Assignment 1
Due time: 5PM, Monday September 16th
You must submit your report via LMS
1 Data Analysis
Gross domestic product is a standard measure of the size of an economy; it’s the total value
of all goods and services bought and sold in a country over the course of a year. It’s not a
perfect measure of prosperity, but it is a very common one, and many important questions
in economics turn on what leads GDP to grow faster or slower. One common idea is that
poorer economies, those with lower initial GDPs, should grow faster than richer ones.
The reasoning behind this catching up is that poor economies can copy technologies and
procedures from richer ones, but already-developed countries can only grow as technology
advances. A second, separate idea is that countries can boost their growth rate by undervaluing
their currency, making the goods and services they export cheaper. Our dataset
“uval.csv” contains the following variables:
• Country, in a three-letter code.
• Year (in five-year increments).
• Per-capita GDP, in dollars per person per year.
• Average percentage growth rate in GDP over the next five years.
• An index of currency under-valuation. The index is 0 if the currency is neither over- nor
under-valued, positive if it is under-valued, and negative if it is over-valued.
Note that not all countries have data for all years. However, there are no missing values in
the data table.
1. Linearly regress the growth rate on the under-valuation index and the log of GDP.
Report the coefficients and their standard errors. Do the coefficients support the
idea of catching up? Do they support the idea that under-valuing a currency boosts
economic growth?
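For reference, the quantities asked for in item 1 can be written out as follows. This is the standard least-squares setup, stated under the usual assumption of homoskedastic errors; the symbols (X for the design matrix with an intercept column, y for the growth rates, n observations, p = 3 coefficients) are notation introduced here and do not come from the assignment.

```latex
\begin{align*}
\text{growth}_i &= \beta_0 + \beta_1\,\text{underval}_i + \beta_2 \log(\text{GDP}_i) + \varepsilon_i, \\
\hat{\beta} &= (X^\top X)^{-1} X^\top y, \\
\hat{\sigma}^2 &= \frac{1}{n-p}\sum_{i=1}^{n} \bigl(y_i - x_i^\top \hat{\beta}\bigr)^2, \\
\widehat{\mathrm{se}}(\hat{\beta}_j) &= \sqrt{\hat{\sigma}^2\,\bigl[(X^\top X)^{-1}\bigr]_{jj}} .
\end{align*}
```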
2. Repeat the linear regression but add as covariates the country, and the year. Use
factor(year), not year, in the regression formula.
(a) Report the coefficients for log GDP and undervaluation, and their standard errors.
(b) Explain why it is more appropriate to use factor(year) in the formula than just
year.
(c) Plot the coefficients on year versus time.
(d) Does this expanded model support the idea of catching up? Of undervaluation
boosting growth?
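As background for item 2(b), the sketch below shows what treating year as a categorical variable means mechanically: each year other than a baseline gets its own 0/1 indicator column, so each year can have its own coefficient. It is written in Python purely for illustration (the assignment itself uses R), and the helper name `dummy_columns` and the sample years are hypothetical. The coding mirrors, in spirit, the default treatment contrasts R applies to a factor.

```python
def dummy_columns(values):
    """Return (levels, rows): one 0/1 indicator per level except the first (the baseline)."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels[1:]] for v in values]
    return levels, rows


years = [1955, 1960, 1965, 1960, 1955]  # hypothetical sample, not taken from uval.csv
levels, rows = dummy_columns(years)
for year, row in zip(years, rows):
    # A numeric year would contribute one column (one slope);
    # the categorical coding contributes one column per non-baseline year.
    print(year, row)
```

With a numeric year the model is forced to change linearly from one period to the next, whereas the indicator columns let each period shift the intercept freely.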
3. Does adding in year and country as covariates improve the predictive ability of a linear
model which includes log GDP and under-valuation?
(a) What are the R² and the adjusted R² of the two models?
(b) Use leave-one-out cross-validation to find the mean squared errors of the two
models. Which one actually predicts better, and by how much?
(c) Explain why using 5-fold cross-validation would be hard here.
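The leave-one-out procedure in item 3(b) can be sketched as follows. This is a minimal Python illustration for a one-predictor linear model on made-up numbers, not the assignment's models: it refits the line once per held-out point, and also computes the same quantity from a single fit using the leverage identity that holds for least-squares fits. All function names and data values here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loocv_mse(xs, ys):
    """Refit the line n times, each time predicting the single held-out point."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        squared_errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return sum(squared_errors) / len(squared_errors)


def loocv_mse_shortcut(xs, ys):
    """Same quantity from one fit: mean of (residual / (1 - leverage))^2."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - mean_x) ** 2 / sxx
        total += ((y - (a + b * x)) / (1.0 - leverage)) ** 2
    return total / n


xs = [0.0, 1.0, 2.0, 3.0, 4.0]  # made-up predictor values
ys = [1.1, 2.9, 5.2, 6.8, 9.1]  # made-up responses
print(loocv_mse(xs, ys), loocv_mse_shortcut(xs, ys))
```

The two functions should agree up to floating-point error; the shortcut matters in practice because it avoids n separate refits.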
4. Kernel regression. Use kernel regression, as implemented in the np package, to non-parametrically
regress growth on log GDP, under-valuation, country, and year (treating
year as a categorical variable). Hint: read chapter four of Shalizi carefully. In particular,
try setting tol to about 10^-3 and ftol to about 10^-4 in the npreg command,
and allow several minutes for it to run.
(a) Give the coefficients of the kernel regression, or explain why you cannot.
(b) Plot the predicted values of the kernel regression, for each country and year,
against the predicted values of the linear model.
(c) Plot the residuals of the kernel regression against its predicted values. Should
these points be scattered around a flat line, if the model is right? Are they?
(d) The npreg function reports a cross-validated estimate of the mean squared error
for the model it fits. What is that? Does the kernel regression predict better or
worse than the linear model with the same variables?
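To make the idea of kernel regression in item 4 concrete, here is a minimal Python sketch of a Nadaraya-Watson (local-constant) estimator with a Gaussian kernel and one continuous predictor. It illustrates the general technique only; it is not a description of how the np package is implemented, it ignores categorical predictors, and the function names and the fixed bandwidth `h` are assumptions made for this example.

```python
import math


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel; the constant cancels in the weighted average."""
    return math.exp(-0.5 * u * u)


def nw_predict(x0, xs, ys, h):
    """Predict at x0 as a kernel-weighted average of the observed responses."""
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights underflowed; bandwidth too small for this x0")
    return sum(w * y for w, y in zip(weights, ys)) / total


xs = [0.0, 1.0, 2.0, 3.0]  # made-up predictor values
ys = [0.1, 0.9, 2.2, 2.8]  # made-up responses
print(nw_predict(1.5, xs, ys, h=0.5))
```

The prediction at each point is a weighted average of the training responses, with weights that depend on the distance to that point and on the bandwidth.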
2 Kernel regression and varying smoothness
Starter code for this problem is in starter.R. That code will generate a data set to be used
for this problem, and will also provide a true mean function µ(x). The resulting data frame
has an x column (your predictor) and a y column (your response).
1. Plot y versus x. Overlay the true mean function µ(x) using the curve function in R.
What do you notice for x < 4π and x > 4π?
2. Using the np library in R, fit a kernel regression on each of the following datasets:
(a) Only those data points with x < 4π.
(b) Only those data points with x > 4π.
(c) All the data points.
For each of these regressions, what is the optimal bandwidth? How does the optimal
bandwidth for the overall data set compare to the optimal bandwidth for each of the
halves?
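One common way to pick an "optimal" bandwidth is leave-one-out least-squares cross-validation over a set of candidates. The Python sketch below illustrates that idea on synthetic, noise-free data; it is an assumption-laden toy (a fixed candidate grid, a Gaussian local-constant smoother, hypothetical function names) rather than the numerical optimisation the np package performs.

```python
import math


def nw_predict(x0, xs, ys, h):
    """Gaussian-kernel local-constant (Nadaraya-Watson) prediction at x0."""
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights underflowed; bandwidth too small")
    return sum(w * y for w, y in zip(weights, ys)) / total


def loocv_score(xs, ys, h):
    """Mean squared error when each point is predicted from all the others."""
    total = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        total += (ys[i] - nw_predict(xs[i], rest_x, rest_y, h)) ** 2
    return total / len(xs)


def select_bandwidth(xs, ys, candidates):
    """Return the candidate bandwidth with the smallest leave-one-out score."""
    return min(candidates, key=lambda h: loocv_score(xs, ys, h))


xs = [0.1 * i for i in range(60)]      # synthetic, evenly spaced predictor
ys = [math.sin(x) for x in xs]         # synthetic noise-free response
candidates = [0.05, 0.2, 1.0, 5.0]     # hypothetical bandwidth grid
best = select_bandwidth(xs, ys, candidates)
print(best, loocv_score(xs, ys, best))
```

Running the same selection separately on subsets of the data is how one would compare the bandwidth chosen for each half against the one chosen for the full data set.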
3. For each of the three selected bandwidths, make a plot showing:
• The true mean µ(x).
• The data points.
• The kernel regression predictions, with the bandwidth specified to be the selected
bandwidth.
• The 95% confidence band for the regression curve µ using resampling of residuals.
• The 95% confidence band for the regression curve µ using resampling of cases.
The result should be three plots, each tuned to one of the selected bandwidths. Give
these plots clear titles to distinguish them.
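The two resampling schemes named in item 3 differ in what gets redrawn: resampling residuals keeps the x values fixed and adds resampled residuals to the fitted values, while resampling cases redraws whole (x, y) pairs. The Python sketch below illustrates both on synthetic data with a Gaussian local-constant smoother. It uses simple percentile bands, which is only one of several possible constructions (pivotal intervals are another), and every name, the bandwidth, and the number of replicates are assumptions made for this example.

```python
import math
import random


def nw_predict(x0, xs, ys, h):
    """Gaussian-kernel local-constant (Nadaraya-Watson) prediction at x0."""
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights underflowed; bandwidth too small")
    return sum(w * y for w, y in zip(weights, ys)) / total


def resample_residuals(xs, ys, h, rng):
    """Keep xs fixed; new response = fitted value + a randomly drawn residual."""
    fitted = [nw_predict(x, xs, ys, h) for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    return xs, [f + rng.choice(residuals) for f in fitted]


def resample_cases(xs, ys, rng):
    """Draw whole (x, y) pairs with replacement."""
    idx = [rng.randrange(len(xs)) for _ in xs]
    return [xs[i] for i in idx], [ys[i] for i in idx]


def bootstrap_band(resampler, h, grid, n_boot=200, seed=0, level=0.95):
    """Pointwise percentile band for the curve from n_boot bootstrap refits."""
    rng = random.Random(seed)
    curves = []
    for _ in range(n_boot):
        boot_x, boot_y = resampler(rng)
        curves.append([nw_predict(g, boot_x, boot_y, h) for g in grid])
    lo_idx = int((1 - level) / 2 * (n_boot - 1))
    hi_idx = int((1 + level) / 2 * (n_boot - 1))
    lower, upper = [], []
    for j in range(len(grid)):
        column = sorted(curve[j] for curve in curves)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper


noise = random.Random(1)
xs = [0.2 * i for i in range(30)]                    # synthetic predictor
ys = [math.sin(x) + noise.gauss(0, 0.2) for x in xs]  # synthetic noisy response
h = 0.5                                               # hypothetical bandwidth
grid = [0.5 * k for k in range(12)]

band_resid = bootstrap_band(lambda rng: resample_residuals(xs, ys, h, rng), h, grid)
band_cases = bootstrap_band(lambda rng: resample_cases(xs, ys, rng), h, grid)
print(band_resid[0][:3], band_cases[0][:3])
```

Both bands here hold the bandwidth fixed across replicates, matching the instruction to plot predictions with the bandwidth specified to be the selected one.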
4. How do these three plots differ? In particular, how well do the regressions trained on
the left and right halves do on each half of the data set? How well does the bandwidth
fit on the overall data set do on each half? (Be specific about the types of problems
that occur.) What lesson might this tell about functions of varying smoothness and
kernel regression, if any?
3 Theoretical questions
1. Exercise 1.2 in Shalizi
2. Exercise 1.4 in Shalizi
3. Exercise 7.4 in ESL

Reposted from: https://www.cnblogs.com/lpka/p/11519847.html
