ISE529 Predictive Analytics 2024 Fall Homework 1 (Python)

ISE529 Predictive Analytics

2024 Fall

Homework 1

Due by: Sept. 17, 2024, 11:59 PM

Instructions:

1.   Print your First and Last name and NetID on your answer sheets

2.   Submit all your answers, including Python scripts and report, in a single Jupyter Lab file (.ipynb), or along with a single PDF, to Brightspace by the due date. No other file formats will be graded. No late submissions will be accepted.

3.   Total 7 questions. Total points: 100

1. (16 points)

For each of parts (a) through (d), indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method. Justify your answer.

(a) The sample size n is extremely large, and the number of predictors p is small.

(b) The number of predictors p is extremely large, and the number of observations n is small.

(c) The relationship between the predictors and response is highly non-linear.

(d) The variance of the error terms, i.e. σ² = Var(ε), is extremely high.

2. (20 points)

Answer the following questions.

(a) Provide a sketch of typical (squared) bias, model variance, training error, test error, and irreducible error curves, on a single plot, as we go from less flexible statistical learning methods towards more flexible approaches. The x-axis should represent the amount of flexibility in the method, and the y-axis should represent the values for each curve. There should be five curves. Make sure to label each one.

(b) Explain why each of the five curves has the shape displayed in part (a).

(c) According to the bias-variance decomposition, show that the following equation holds.

3. (12 points)

If a random variable X follows a Poisson distribution with parameter λ, use the definitions of expectation and variance learned in class to show that E(X) = Var(X) = λ.
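A numerical sanity check can sit alongside the written derivation. The sketch below is not a proof: it sums the Poisson probability mass function P(X = k) = e^(-λ) λ^k / k! over a truncated range and compares the resulting mean and variance with λ. The function names, the cutoff `k_max`, and the value λ = 3.5 are assumptions chosen for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with parameter lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_moments(lam: float, k_max: int = 100) -> tuple[float, float]:
    """Approximate E(X) and Var(X) by truncating the infinite sums at k_max.

    The truncation is an assumption: it is only accurate when lam is small
    relative to k_max, so that the tail probability beyond k_max is negligible.
    """
    ks = range(k_max + 1)
    mean = sum(k * poisson_pmf(k, lam) for k in ks)
    second_moment = sum(k * k * poisson_pmf(k, lam) for k in ks)
    # Var(X) = E(X^2) - [E(X)]^2
    return mean, second_moment - mean**2


if __name__ == "__main__":
    lam = 3.5  # illustrative value only
    mean, var = poisson_moments(lam)
    print(f"lambda = {lam}, E(X) ~ {mean:.6f}, Var(X) ~ {var:.6f}")
```

Both printed values should agree with λ up to floating-point error, which matches the identity the question asks you to derive analytically.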

4. (12 points)

Let X be the amount (in ounces) of soft drink in a randomly chosen bottle from company A, and Y be the amount of soft drink in a randomly chosen bottle from company B. A study has shown that the probability distributions of X and Y are as follows:

Find E(X), E(Y), Var(X), and Var(Y) and interpret them.
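For a discrete random variable, E(X) = Σ x·p(x) and Var(X) = Σ (x − E(X))²·p(x). A minimal sketch of that computation follows. The distribution in `hypothetical_pmf` is made up for illustration and is not the distribution from the study; substitute the actual values and probabilities for X and Y.

```python
def mean_and_variance(pmf: dict[float, float]) -> tuple[float, float]:
    """Return (E(X), Var(X)) for a discrete distribution given as {value: probability}."""
    total = sum(pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, expected 1")
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, variance


if __name__ == "__main__":
    # Hypothetical distribution (ounces -> probability), for illustration only.
    hypothetical_pmf = {15.0: 0.25, 16.0: 0.5, 17.0: 0.25}
    mean, variance = mean_and_variance(hypothetical_pmf)
    print(f"E(X) = {mean}, Var(X) = {variance}")
```

The mean describes the typical fill amount, while the variance describes how consistently the bottles are filled, which is the basis for the interpretation the question asks for.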

5. (15 points)

The table below provides a training data set containing six observations, three predictors, and one qualitative response variable. Use Jupyter Lab with Python to answer the following questions.

Suppose we wish to use this data set to make a prediction for Y when X1 = X2 = X3 = 0 using K-nearest neighbors.

(a) Compute the Euclidean distance between each observation and the test point, X1 = X2 = X3 = 0.

(b) What is our prediction with K = 1 and with K = 3? Why?

(c) If the Bayes decision boundary in this problem is highly nonlinear, then what would be the best choice for the value of K? Why?
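The mechanics of parts (a) and (b) can be sketched in a few lines of Python. The training rows in `hypothetical_train` are invented for illustration and are not the six observations from the assignment's table; the function names are likewise assumptions.

```python
import math
from collections import Counter


def euclidean(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k: int) -> str:
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(train, key=lambda row: euclidean(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical (X1, X2, X3) -> label rows, for illustration only.
    hypothetical_train = [
        ((1.0, 0.0, 0.0), "A"),
        ((0.0, 2.0, 0.0), "B"),
        ((0.0, 0.0, 3.0), "B"),
        ((2.0, 2.0, 1.0), "A"),
    ]
    query = (0.0, 0.0, 0.0)
    for point, label in hypothetical_train:
        print(point, label, round(euclidean(point, query), 3))
    for k in (1, 3):
        print(f"K = {k}: predicted class {knn_predict(hypothetical_train, query, k)}")
```

With this made-up data the K = 1 and K = 3 predictions differ, which illustrates how the choice of K changes the vote among neighbors.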

6. (25 points)

Given the Auto data set (see attached Auto.csv), use Jupyter Lab with Python to answer the following questions. Make sure that the missing values in the data set have been removed before analysis is performed.

(a) Which of the predictors are quantitative, and which are qualitative?

(b) What is the range of each quantitative predictor?

(c) What are the mean and standard deviation of each quantitative predictor?

(d) Now remove the 10th through 85th observations. What are the range, mean, and standard deviation of each predictor in the subset of the data that remains?

(e) Using the full data set, investigate the predictors graphically, using scatterplots or other tools of your choice. Create a matrix of scatter plots highlighting the relationships among the predictors.

(f) Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables         
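One possible pandas workflow for the summary statistics in this question is sketched below. It does not read the attached Auto.csv: the in-memory CSV is a made-up stand-in, and the assumption that missing values are coded as "?" should be checked against the real file. The `quantitative` list is also only illustrative; deciding which predictors are quantitative is the point of part (a).

```python
import io

import pandas as pd

# Hypothetical stand-in for Auto.csv: a few invented rows, "?" marks a missing value.
sample_csv = io.StringIO(
    "mpg,cylinders,horsepower,weight,origin,name\n"
    "18.0,8,130,3504,1,car a\n"
    "15.0,8,165,3693,1,car b\n"
    "25.0,4,?,2046,1,car c\n"
    "24.0,4,95,2372,3,car d\n"
    "26.0,4,46,1835,2,car e\n"
)

# Treat "?" as missing and drop every row that contains a missing value.
auto = pd.read_csv(sample_csv, na_values="?").dropna()

# Illustrative choice of columns to summarize (an assumption, see the note above).
quantitative = ["mpg", "cylinders", "horsepower", "weight"]

# One row per predictor with min, max, mean, std, plus the range as max - min.
summary = auto[quantitative].agg(["min", "max", "mean", "std"]).T
summary["range"] = summary["max"] - summary["min"]
print(summary)

# Remove a block of rows by position. For the 10th through 85th observations
# of the real data, the slice would be auto.index[9:85].
subset = auto.drop(auto.index[1:3])
print(subset[quantitative].agg(["min", "max", "mean", "std"]).T)
```

For the scatter plot matrix in part (e), `pandas.plotting.scatter_matrix(auto[quantitative])` is one option; it requires matplotlib to be installed.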
