COMP2271-WE01
Data Science
2023
Section A Probability and Statistics
Question 1
(a) Consider the joint probability mass function (PMF) f_XY(x, y) of two discrete random variables X and Y given by the following table:
i. What value does the constant c need to have in order to make this a valid joint PMF? [2 Marks]
ii. Determine the marginal distribution f_X of X. [2 Marks]
iii. Calculate the conditional probability P(Y = 2 | X = 1). [2 Marks]
iv. Calculate the covariance Cov(X, Y). [4 Marks]
v. Calculate E[Y² − 2X]. [2 Marks]
(b) A server farm has 20 servers. When a request arrives, it is allocated to one of the 20 servers, selected uniformly at random. Let X be the random variable that counts the number of servers that have not been allocated any request after 15 requests have arrived.
i. Calculate P(X = 19). [2 Marks]
ii. Calculate P(X = 5). [2 Marks]
iii. Calculate E[X]. [4 Marks]
iv. Let Y be a random variable that is equal to 1 if the first two requests are allocated to the same server and equal to 0 otherwise. Determine whether X and Y are independent random variables. Justify your answer. [3 Marks]
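The server-allocation model in (b) can be checked numerically. The following is a minimal sketch, not a model answer: it assumes each request picks a server independently and uniformly at random, and the function names expected_empty and simulate_empty are hypothetical. It compares the value given by linearity of expectation with a Monte Carlo estimate.

```python
import random


def expected_empty(servers: int, requests: int) -> float:
    """Expected number of servers with no request, by linearity of expectation.

    A fixed server is missed by one request with probability 1 - 1/servers,
    so it is missed by all requests with probability (1 - 1/servers)**requests.
    """
    return servers * (1 - 1 / servers) ** requests


def simulate_empty(servers: int, requests: int, trials: int, seed: int = 0) -> list[int]:
    """Simulate the allocation and return the number of empty servers per trial."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    counts = []
    for _ in range(trials):
        # The set keeps only the distinct servers that received a request.
        used = {rng.randrange(servers) for _ in range(requests)}
        counts.append(servers - len(used))
    return counts


if __name__ == "__main__":
    counts = simulate_empty(20, 15, trials=20_000, seed=1)
    print("simulated mean:", sum(counts) / len(counts))
    print("linearity of expectation:", expected_empty(20, 15))
```

With 15 requests at most 15 servers can be used, so every simulated count lies between 5 and 19, which matches the two extreme cases asked about in (b)(i) and (b)(ii).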
Note: If you need to refer to a Z-table to answer the following questions, you can find one on the following pages.
(c) A company observes that the time its customers spend on its website has a mean of 10 minutes and a standard deviation of 2 minutes. They employ a web designer to improve the website with the goal of increasing the mean time customers spend on the website. After the new design has been implemented, the company claims that the mean time that customers spend on the website has not changed. We believe that the mean time has actually increased. We wish to carry out a hypothesis test in order to investigate the company’s claim.
i. Formulate a suitable null hypothesis and a suitable alternative hypothesis. [2 Marks]
ii. Is the test a two-tailed, left-tailed or right-tailed test? [2 Marks]
iii. If we sample n customers and determine that the mean time they spend on the new website is x̄ minutes, what is the formula for calculating the test statistic z from the sample data? In the formula, use actual values (instead of variables representing unknown values) where known. [2 Marks]
iv. With a significance level of α = 0.05, what is the critical region (that is, the range of values of the test statistic for which we reject the null hypothesis)? [2 Marks]
v. If α = 0.05, n = 100, and the sample mean is x̄ = 10.4 minutes, does the test statistic lie in the critical region? How should we formulate the result of the hypothesis test? [4 Marks]
vi. Assume that the time customers spend on the new website actually has a mean of 10.5 minutes with a standard deviation of 3 minutes. If a significance level of α = 0.05 is used for the hypothesis test, how large does the sample size n need to be in order for the statistical power 1 − β of the hypothesis test to be at least 0.95? [5 Marks]
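The mechanics of the one-sample z-test in (c) can be sketched in code. This is an illustrative sketch under stated assumptions rather than a model answer: it assumes the null hypothesis fixes the mean at 10 minutes with the known standard deviation of 2 minutes, that the test is right-tailed, and that the sample mean is approximately normal. The names z_statistic, critical_value and rejects_null are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 10.0   # mean under the null hypothesis (from the question)
SIGMA = 2.0   # known standard deviation under the null hypothesis


def z_statistic(sample_mean: float, n: int) -> float:
    """Standardise the sample mean using its standard error SIGMA / sqrt(n)."""
    return (sample_mean - MU_0) / (SIGMA / sqrt(n))


def critical_value(alpha: float) -> float:
    """Right-tailed critical value: the (1 - alpha) quantile of N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha)


def rejects_null(sample_mean: float, n: int, alpha: float) -> bool:
    """Reject the null hypothesis when z falls in the right tail."""
    return z_statistic(sample_mean, n) > critical_value(alpha)


if __name__ == "__main__":
    print("z =", z_statistic(10.4, 100))
    print("critical value =", critical_value(0.05))
    print("reject H0:", rejects_null(10.4, 100, 0.05))
```

Using statistics.NormalDist from the standard library avoids a Z-table lookup; the quantile it returns for α = 0.05 is approximately 1.645.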
(d) A company purchases components that have a mean lifetime of 100 hours.
i. If you know nothing else about the distribution of the component lifetimes, what inequality can you use to calculate an upper bound on the probability that a newly purchased component has a lifetime of at least 150 hours, and what is the resulting upper bound? [3 Marks]
ii. Assume you additionally know that the standard deviation of the component lifetimes is 8 hours. Calculate a lower bound on the probability that the lifetime of a newly purchased component is between 90 and 110 hours. State the name of the inequality that you have used in your calculation (if any). [3 Marks]
iii. Assume now that you know that the lifetime of the components can be modelled very well as a normal distribution with mean μ = 100 and standard deviation σ = 8. What is the probability that the lifetime of a newly purchased component is between 90 and 110 hours? [4 Marks]
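The three parts of (d) use progressively more information about the distribution, and the effect on the resulting bounds can be compared numerically. The sketch below is one possible reading, not a model answer: it assumes Markov's inequality for part (i) (which requires lifetimes to be non-negative), Chebyshev's inequality for part (ii), and the exact normal probability for part (iii). The function names are hypothetical.

```python
from statistics import NormalDist


def markov_upper_bound(mean: float, threshold: float) -> float:
    """Markov: P(X >= a) <= E[X] / a for a non-negative random variable X."""
    return mean / threshold


def chebyshev_lower_bound(sigma: float, half_width: float) -> float:
    """Chebyshev: P(|X - mu| < d) >= 1 - sigma^2 / d^2."""
    return 1 - (sigma / half_width) ** 2


def normal_interval_probability(mu: float, sigma: float, low: float, high: float) -> float:
    """Exact P(low < X < high) when X is normal with mean mu and sd sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)


if __name__ == "__main__":
    print("Markov upper bound on P(X >= 150):", markov_upper_bound(100, 150))
    print("Chebyshev lower bound on P(90 < X < 110):", chebyshev_lower_bound(8, 10))
    print("Normal P(90 < X < 110):", normal_interval_probability(100, 8, 90, 110))
```

The Chebyshev bound holds for any distribution with the given mean and standard deviation, so it is expected to be weaker than the probability obtained once normality is assumed.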
Section B Computer Graphics
Question 2
This question relates to the graphics modelling and rendering of a football game environment. The football playground is modelled using a large rectangle with proper texture mapping applied. There are 22 football players moving on the playground during a game. Two movable virtual cameras are set up. One is called the 'global camera', which is designed to visualise the whole playground. The other is called the 'local camera', which is designed to visualise a local region of the playground, focusing on a few active football players at any time.
Figure 1: A simplified view of the football game environment.
(a) Suppose the global camera is enabled and the local camera is disabled. Draw a scene graph for the football game environment, which incorporates the football playground, the 22 football players, a football, and the 2 virtual cameras. Assume each football player is modelled by a 'player' node in the scene graph, so that you are not required to expand the modelling details of each football player. Marks will be given based on:
i. Correct structure and organisation of the football game environment. [8 Marks]
ii. Correct transformation operations involved. [6 Marks]
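As background for (a), a scene graph can be represented as a tree whose nodes carry a local transformation that is composed with the parent's transformation during traversal. The sketch below is a minimal illustration under strong assumptions, not a model answer: transformations are reduced to translations, all coordinates are invented, and the class and function names (Node, world_positions, build_scene) and the particular tree layout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Local translation relative to the parent node (rotation and scale omitted).
    offset: tuple[float, float, float] = (0.0, 0.0, 0.0)
    children: list["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it so that calls can be chained."""
        self.children.append(child)
        return child


def world_positions(node: Node, parent=(0.0, 0.0, 0.0)) -> dict:
    """Depth-first traversal that accumulates translations from root to leaf."""
    position = tuple(p + o for p, o in zip(parent, node.offset))
    result = {node.name: position}
    for child in node.children:
        result.update(world_positions(child, position))
    return result


def build_scene() -> Node:
    """One possible layout: cameras under the root, movable objects under the playground."""
    root = Node("world")
    root.add(Node("global_camera", (0.0, 50.0, 0.0)))  # invented position
    root.add(Node("local_camera", (10.0, 5.0, 0.0)))   # invented position
    playground = root.add(Node("playground"))
    playground.add(Node("football", (0.0, 0.2, 0.0)))
    for i in range(22):
        playground.add(Node(f"player_{i + 1}", (float(i), 0.0, 0.0)))
    return root


if __name__ == "__main__":
    for name, position in world_positions(build_scene()).items():
        print(name, position)
```

Placing the players and the football under the playground node means that moving the playground moves everything on it, which is the usual motivation for grouping objects hierarchically.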
(b) Suppose the local camera is now used for visualising the football game environment most of the time during a football game.
i. Analyse why the scene graph in (a) is not optimal for supporting efficient rendering. Justify your answer. [6 Marks]
ii. Construct a new scene graph for the football game that can support efficient rendering with the local camera. Explain how the new scene graph should be updated during runtime and how the football game should be rendered based on the local camera. [9 Marks]
(c) Suppose you want to add some spotlights around the football playground to illuminate it.
i. Suggest what lighting method you should apply. In addition, state how the suggested lighting method can be implemented in WebGL. [7 Marks]
ii. Assume that the spotlights are all static in terms of their positions and illumination effects during a football game. Suggest a method to accelerate the rendering process and describe how this can be implemented. [5 Marks]
(d) To further enhance the football game environment, it is suggested to add a large audience (e.g., around 1000 people) sitting around the football playground.
i. Suggest a rendering-efficient solution to implement the above enhancement. Justify your answer. [5 Marks]
ii. Based on your answer in (d)(i), describe a solution to support animation of the audience, e.g. animating audience applause