Java Python Specific information for this piece of coursework
• Answer ALL questions.
• Students must work alone.
• AI tools cannot be used.
• You may submit only one answer to each question.
• The total number of marks is 50.
• The numbers in square brackets indicate the relative weights attached to each part question.
• Marks are awarded not only for a final answer but also for the clarity and coherence of your solution. Do not include large amounts of R output.
• Your answers should be submitted as one pdf document (no larger than 100MB) through the submission link on the STAT0031 moodle page, within the ‘In-course
Assessment’ section. Your answers may include hand-written and/or type sections.
• You must complete an ICA cover sheet to include as the first page of your submitted document. The cover sheet can be found within the ‘In-course Assessment’ section of the STAT0031 moodle page.
• Your answers will be marked anonymously. Please DO NOT include your name in any submitted material.
• You may use your course materials to answer questions.
• You may contact the course lecturer to ask questions concerning the ICA using the ‘In-course assessment discussion forum’ on the course moodle page. Please note, the lecturer will limit the amount of help ofered to students and cannot check answers
(or part-answers) or give detailed guidance.
You may use the following notation and results:
The Poisson distribution, Poisson(λ), has probability mass function
where λ > 0. The mean is λ .
The Gamma distribution, Gamma(Q, β), has probability density function
where Q > 0 and β > 0. The mean is Q/β and the variance is Q/β2.
A multinational electrical components manufacturer wants to limit the number of faulty components that it produces. The company chooses C factories for the experiment. At each factory, B large batches of components are tested. The number of faulty components in the i-th batch at the j-th factory is Yi,j . Let Y = (Y1,1, . . . , YB,1, Y1,2, . . . , YB,2, . . . , Y1,C, . . . , YB,C ). The data can be downloaded in the file faulty_comp . txt from the ICA section of the STAT0031 Moodle page and contains results for 10 batches taken at 60 factories.
The company’s Bayesian stati STAT0031 AssessmentR stician proposes the model
Yi,jjθj ~ Poisson(θj ), i = 1, 2, . . . , B, j = 1, 2, . . . , C
where Yi,j are independent given θj and
θj i..d. Gamma(Q, β), j = 1, 2, . . . , C.
1. Choose appropriate hyperpriors for α and β and briefly justify your choice. [3]
2. Derive the full conditional distribution of θj . [6]
3. Use NIMBLE to implement a Gibbs sampler to sample from the posterior distribution of this model with the data in the file faulty_comp . txt. In your answer include all R code needed to run the sampler with two chains and to monitor the chains for Q and β , i.e. all steps needed to use the function nimbleMCMC. [13]
4. Draw trace plots and densities of Q and β using the first 1000 iterations of your Gibbs sampler and comment on the convergence and mixing. [4]
5. Decide on an appropriate burn-in (using graphs of the Gelman-Rubin diagnostic) and a suitable run length to answer the next question (question 6) (briefly justify your choices). [7]
6. Use your code to provide estimates (by reporting the posterior mean and a 95% central credible interval for each parameter) of the following:
7. The company’s Bayesian statistician is worried that some batches have been incor- rectly tested. She proposes extending the model by introducing the parameter Zi,j which indicates whether the i-th batch at the j-th factory was accurately tested (Zi,j = 0) or inaccurately tested (Zi,j = 1). The new model is
Yi,jjθj , Zi,j ~ Poisson((1 + 2Zi,j )θj ), i = 1, 2, . . . , B, j = 1, 2, . . . , C
Zi,j ~ Bernoulli(φ), i = 1, 2, . . . , B, j = 1, 2, . . . , C
θj ~ Gamma(2, β), j = 1, 2, . . . , C
β ~ Gamma(1, 0.001)
(a) Choose a prior for φ and extend your NIMBLE code in Question 3 to sample from the posterior of this model with the data in file faulty_comp . txt. You should only include the model definition (i.e. the nimbleCode part). [4]
(b) Consider the first 3 factories (j = 1; 2; 3) and use your model and MCMC output to decide which batches are faulty. Briefly justify your answer