COMP3032 – Machine Learning
Assignment One (20 marks)
Due date: 11:59 pm Monday, 16 September 2024
Main objective:
• This assignment is to apply supervised and unsupervised machine learning techniques.
• You will have an opportunity to employ various regression models for predictions and classifications, to utilize the cross-validation approach for model selection, and to perform PCA for dimensionality reduction.
Task 1: Systolic Pressure Prediction Using Different Models (12 marks):
1. Dataset Description: The blood pressure dataset pressure.csv contains examples of systolic pressures along with various features from different individuals.
2. Polynomial Regression:
(a) Create polynomial regression models using the whole dataset to predict systolic pressure using the "WEIGHT" feature, for polynomial degrees ranging from 1 to 14.
(b) Perform 10-fold cross-validation.
(c) Compute and display the mean RMSEs of the 10-fold cross-validation for each of the 14 polynomial degrees.
(d) Produce a cross-validation error plot showing the mean RMSE for polynomial degrees from 1 to 14.
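As a rough illustration of items 2(a) to 2(d), the sketch below runs 10-fold cross-validation for polynomial degrees 1 to 14 with scikit-learn. It is not a reference solution: the data are synthetic stand-ins for the "WEIGHT" feature and the systolic pressure target, and the variable names, the random seeds, and the choice to standardize the feature before expanding it are all assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (assumption); the real values come from pressure.csv.
rng = np.random.default_rng(0)
weight = rng.uniform(50, 100, size=200).reshape(-1, 1)
systolic = 90 + 0.5 * weight.ravel() + rng.normal(0, 5, size=200)

# Shuffled 10-fold split with a fixed seed so the run is repeatable.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

mean_rmse = {}
for degree in range(1, 15):
    model = make_pipeline(
        StandardScaler(),  # keeps high powers numerically manageable
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    # scikit-learn returns negated errors so that larger is always better.
    scores = cross_val_score(
        model, weight, systolic, cv=cv,
        scoring="neg_root_mean_squared_error",
    )
    mean_rmse[degree] = -scores.mean()

for degree, rmse in mean_rmse.items():
    print(f"degree {degree:2d}: mean RMSE = {rmse:.3f}")
```

The `mean_rmse` dictionary maps each degree to its mean RMSE, so its keys and values can be passed directly to a plotting library such as matplotlib to draw the cross-validation error plot.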
3. Model selection:
(a) Select the best polynomial degree and briefly explain your choice.
(b) Print the intercept and coefficients of the selected model.
4. Multiple linear regression:
(a) Create a multiple linear regression model to predict systolic pressure using all the other relevant features in the dataset.
(b) Print the intercept and coefficients of the model.
(c) Perform 10-fold cross-validation.
(d) Compute and display the mean RMSE for the 10-fold cross-validation.
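The steps in item 4 could look roughly like the following sketch. It assumes three hypothetical numeric features generated at random rather than the actual columns of pressure.csv, so the printed intercept and coefficients only demonstrate where those values live on a fitted scikit-learn model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Three hypothetical features and a synthetic target (assumptions).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 120 + X @ np.array([4.0, -2.0, 1.0]) + rng.normal(0, 3, size=200)

# Fit on the whole dataset to inspect the learned parameters.
model = LinearRegression().fit(X, y)
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)

# 10-fold cross-validation; scores are negated RMSE values.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(
    LinearRegression(), X, y, cv=cv,
    scoring="neg_root_mean_squared_error",
)
mean_rmse = -scores.mean()
print(f"mean 10-fold RMSE: {mean_rmse:.3f}")
```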
5. Ridge regression:
(a) Build a ridge regression model for the multiple linear regression model created in item 4 with a regularization parameter α = 0.1.
(b) Print the intercept and coefficients of the model.
(c) Perform 10-fold cross-validation.
(d) Compute and display the mean RMSE for the 10-fold cross-validation.
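For item 5, scikit-learn's `Ridge` estimator minimizes the squared error plus α times the squared L2 norm of the coefficients, with the intercept left unpenalized. The sketch below sets `alpha=0.1` as the task specifies; the features and target are again synthetic placeholders (an assumption), not the real dataset.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic placeholder data (assumption), same shape idea as item 4.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = 120 + X @ np.array([4.0, -2.0, 1.0]) + rng.normal(0, 3, size=200)

# alpha is the regularization strength (the α in the task description).
ridge = Ridge(alpha=0.1).fit(X, y)
print("intercept:", ridge.intercept_)
print("coefficients:", ridge.coef_)

# Same 10-fold protocol as before, so RMSE values are comparable.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(
    Ridge(alpha=0.1), X, y, cv=cv,
    scoring="neg_root_mean_squared_error",
)
ridge_rmse = -scores.mean()
print(f"mean 10-fold RMSE: {ridge_rmse:.3f}")
```

Reusing the same `KFold` settings for every model is one way to keep the mean RMSEs comparable when choosing among them in item 6.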
6. Model comparison:
(a) Select the best model among the three (polynomial regression, multiple linear regression, and ridge regression).
(b) Briefly explain the reasons behind your choice.
Task 2: MNIST Digit Classification Using PCA and Logistic Regression (8 marks):
1. Load the renowned MNIST ('mnist_784') dataset, which consists of a large collection of handwritten digit images. Your task is to reduce the number of features first, and then build a binary classification model to distinguish between the digit "7" and all other digits (not "7").
2. Perform Principal Component Analysis (PCA) on the feature data to reduce its dimensionality while retaining 90% of the overall explained variance.
3. Split the data into training and testing sets, using a common split ratio of 80% for training and 20% for testing.
4. Create a Logistic Regression model using the reduced feature dataset.
5. Use this model to predict the labels for both the training and testing datasets.
6. Print the number of principal components preserved. Print the prediction accuracy (proportion of correct predictions) of your model on the training set. Also, print the prediction accuracy, the confusion matrix, and the misclassified digits (i.e. wrong predictions) of your model on the testing set.
7. Evaluate the model: What do you think of the model generated (good, underfitting, overfitting)? Briefly explain your reasoning.
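The Task 2 workflow can be sketched end to end as below. To keep the example small and runnable offline, it assumes scikit-learn's bundled 8x8 digits dataset (`load_digits`) in place of 'mnist_784', so the component count and accuracies it prints will not match those for MNIST. It follows the order of the items above: PCA first, then the 80/20 split. The stratified split, the random seed, and `max_iter` are choices made for this illustration only.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Small bundled digits dataset as a stand-in for MNIST (assumption).
X, digits = load_digits(return_X_y=True)
y = (digits == 7).astype(int)  # 1 for "7", 0 for every other digit

# A float n_components keeps just enough components to reach 90% variance.
pca = PCA(n_components=0.90, svd_solver="full")
X_reduced = pca.fit_transform(X)
print("components kept:", pca.n_components_)

# 80/20 split; the original digit labels are split alongside for reporting.
X_train, X_test, y_train, y_test, d_train, d_test = train_test_split(
    X_reduced, y, digits, test_size=0.2, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
train_acc = accuracy_score(y_train, clf.predict(X_train))
test_pred = clf.predict(X_test)
test_acc = accuracy_score(y_test, test_pred)
cm = confusion_matrix(y_test, test_pred)

print(f"training accuracy: {train_acc:.4f}")
print(f"testing accuracy:  {test_acc:.4f}")
print("confusion matrix (rows = true, columns = predicted):")
print(cm)

# True digit labels of the test samples the model got wrong.
wrong = np.flatnonzero(test_pred != y_test)
print("misclassified test digits:", d_test[wrong])
```

Comparing `train_acc` with `test_acc` is one simple input to the judgement asked for in item 7: a large gap between them suggests overfitting, while low values on both suggest underfitting.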
Documentation:
1. You should write a readme file which contains:
(a) your name and student ID
(b) instructions on how to run your code
(c) test runs and their outputs (You can include screenshots)
(d) descriptions of your findings in Task 1 (items 3 and 6) and Task 2 (item 7)
(e) any limitations or issues if your program does not output the expected results
2. Your code should include necessary comments to clearly explain what each part of the code does and how it works.
Submission:
ALL relevant files (including the readme file, Python program and data) should be zipped into a single file named StudentID.zip and submitted via vUWS. Be prepared to demonstrate your program if requested. Please note:
1. It is students’ responsibility to ensure that they can successfully upload their submissions before the deadline.
2. It is students’ responsibility to ensure that their programs are runnable on the school’s lab machines.
3. It is students’ responsibility to ensure that they keep a copy of their submission.