COMP 309 — Machine Learning Tools and Techniques Assignment 3: Kaggle Competition

Article link: https://blog.csdn.net/weixin_41993251/article/details/126596126

1 Objectives

The goal of this assignment is to help you tie together the concepts you have learnt in the lectures and assignments during the first half of this course. To aid you in completing this assignment, you should review the major aspects of the course that have been explored so far, such as:

• Data understanding, cleansing, and pre-processing,

• Machine learning concepts,

• CRISP-DM and pipelines in general,

• Feature manipulation, including feature selection, feature construction and imputation,

• Statistical design and analysis of results.
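
As a reminder of what the pre-processing and imputation topics look like in practice, here is a minimal sketch of median imputation followed by standardisation, where the statistics are learned from the training rows only and then reused on unseen rows. The column names (`duration_ms`, `tempo`), the toy values, and the helper functions are assumptions made for illustration; they are not taken from the assignment data. The sketch uses only the Python standard library, although a library such as pandas or scikit-learn would be the more usual choice.

```python
import statistics

def fit_imputer_scaler(rows):
    """Learn per-column median, mean and std from training rows (None = missing)."""
    params = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        median = statistics.median(observed)
        filled = [r[j] if r[j] is not None else median for r in rows]
        mean = statistics.fmean(filled)
        std = statistics.pstdev(filled) or 1.0  # guard against constant columns
        params.append((median, mean, std))
    return params

def transform(rows, params):
    """Fill missing values with the training median, then standardise."""
    return [
        [((v if v is not None else med) - mean) / std
         for v, (med, mean, std) in zip(r, params)]
        for r in rows
    ]

# Hypothetical toy data: columns are (duration_ms, tempo); None marks a missing value.
train = [[200000, 120.0], [180000, None], [240000, 90.0], [None, 150.0]]
test = [[210000, None]]

params = fit_imputer_scaler(train)       # statistics come from the training rows only
train_scaled = transform(train, params)
test_scaled = transform(test, params)    # test rows reuse the training statistics
```

Fitting on the training rows and only applying the result to the test rows keeps information from the held-out data out of the pipeline, which matters when you later analyse your results.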

These topics are (to be) covered in Weeks 1–7. Research into online resources for AI is encouraged; the rabbit-hole¹ provides useful jumping-off points for further exploration.

2 Question Description

Many of us use a music streaming app to listen to music. These apps usually build personalized playlists to cater to each user’s needs. But what is the logic behind a personalized playlist? One general example is a Music Genre Classification System: a machine learning model that classifies music samples into different genres using various audio features.

The overall aim of this assignment is to develop the best possible machine learning system to predict the genres of music. The task is to classify each music track into one of ten genres based on the provided audio features. The data for each track includes both textual features (e.g. artist and track names) and numerical descriptors (e.g. duration and various audio features). The hope is that your model will identify the relationship between music genres and the audio features.
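
To make the classification task concrete, the following is a minimal sketch of a nearest-centroid baseline compared against a majority-class baseline on a held-out split. Everything specific in it is an assumption for illustration: the two feature columns, the three genre labels (the real task has ten), and the toy values are invented, and nearest-centroid is just one simple starting point, not a recommended or required method.

```python
from collections import Counter, defaultdict
import math

def fit_centroids(X, y):
    """Average the feature vectors of each genre to get one centroid per genre."""
    groups = defaultdict(list)
    for features, genre in zip(X, y):
        groups[genre].append(features)
    return {g: [sum(col) / len(col) for col in zip(*rows)]
            for g, rows in groups.items()}

def predict(centroids, features):
    """Return the genre whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda g: math.dist(centroids[g], features))

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical, already-scaled features and only three made-up genres for brevity.
X_train = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8], [0.5, 0.5], [0.6, 0.4]]
y_train = ["rock", "rock", "classical", "classical", "pop", "pop"]
X_val = [[0.85, 0.15], [0.15, 0.85], [0.55, 0.45]]
y_val = ["rock", "classical", "pop"]

centroids = fit_centroids(X_train, y_train)
y_pred = [predict(centroids, x) for x in X_val]

# Majority-class baseline: always predict the most common training genre.
majority = Counter(y_train).most_common(1)[0][0]
baseline_acc = accuracy(y_val, [majority] * len(y_val))
model_acc = accuracy(y_val, y_pred)
```

Comparing `model_acc` with `baseline_acc` gives a quick sanity check that a model has learnt something from the features before you invest in a more elaborate pipeline.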

We have set up a Kaggle InClass Competition² to facilitate finding the best machine learning system for officials to use. You are expected to analyse the provided data, design and improve your own machine learning pipeline, and consider the consequences of applying your pipeline to this data.

Note that the data is real, so you could attempt to find the original dataset and create a look-up table. This is not permitted, as it misses the point of the course. We want to see your analyses rather than perfect results.