COMM5000 Data Literacy

Case Study Project

Milestone 1 Information

Term 3, 2024

Case Study Information

Business context: In recent years, the growing interest in wine has fuelled the expansion of the wine industry. As a result, companies are investing in new technologies to enhance both wine production and sales. Quality certification plays a vital role in these processes and currently relies heavily on wine tasting by human experts.

Case/Scenario: You are consulting for a winery, helping the company predict or estimate human wine taste preferences at the certification step. Knowing the wine quality will leave the winery better positioned to predict available amounts and yearly sales. It will also support the oenologist's wine-tasting evaluations by potentially improving the quality and speed of their decisions, and improve wine production. Furthermore, similar techniques can help in target marketing by modelling consumer tastes from niche markets. To predict wine quality, you will use a dataset consisting of 4898 white and 1599 red vinho verde samples from Portugal's northwest region, together with the statistical methods covered in this course.

Project objectives

1. Looking at the provided dataset, is there any relationship (positive or negative) between wine quality and any of the other variables that can be used for prediction? Does wine type (red or white) have an impact on our predictions of wine quality? If so, how can these relationships be used to predict wine quality with the methods covered in this class? The data provided is everything we know about these wines, and no external data sources are to be used. Moreover, no knowledge about the chemical properties of wines is assumed or required: this project should be seen as a business / statistical exercise. Please provide both quantitative and qualitative analysis supporting any findings.

2. In addition, based on your analysis for Objective (1), identify weaknesses and limitations of the chosen approach, and propose, at least in broad terms, a better approach. The proposed approach could include additional data, or a methodology that deals better with data limitations. Please also provide any supporting analysis for these additional considerations.

COMM5000 Context

This is a business question that is based on real data, although simulated for assessment purposes. The role you are to play is one of a consultant contracted by a winery to assist with the analysis of data using the COMM5000 data analysis tools, which include descriptive and inferential statistics.

The work will be scaffolded into two milestones M1 (20%) & M2 (20%) and a final project report (60%). Every milestone will require you to use what you have learned to address specific aspects of the data. Generally, M1 consists of an exploratory data analysis, whereas M2 is concerned with identifying hypotheses and formulating key inferential questions. In the final project report, all the insights gathered from M1 and M2 are used to model the data to answer the project questions.

M1 is a peer-reviewed assessment, which means that your submission will be assessed by randomly assigned fellow students. More details on this process are given below.

Schedule of engagement for the entire course

Upon request, and as part of additional support for assessments, a member of the course teaching team may hold consultation sessions throughout the term. It is very important that you attend these live synchronous sessions, which provide more detailed information about the case study. During these sessions, you are free to ask questions and discuss any aspects of the project.

Milestone 1: Preliminary Insight Development

Description of assessment task

This first milestone aims to give you a better understanding of the datasets, variables, and questions in this Case Study. This exploratory data analysis seeks to get the necessary insights so that a development plan can be formulated to address the following key points of the case study project:

1) Data analysis: an in-depth description of the variables included in the dataset and the relationship between wine quality and alcohol.

2) Effect of Wine Type on estimated wine quality.
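As a rough illustration of key points (1) and (2), the sketch below computes the Pearson correlation between alcohol content and quality for the whole sample and for each wine type separately. It is only a minimal sketch in Python rather than the Excel workflow used in the course. The records in `samples` are made-up toy values, not taken from the provided dataset, and the names `samples`, `pearson_r` and `correlation_for` are hypothetical.

```python
from math import sqrt

# Hypothetical toy records: (wine type, alcohol %, quality score).
# These values are invented for illustration and are NOT from the Vinho Verde file.
samples = [
    ("red", 9.4, 5),
    ("red", 10.5, 6),
    ("red", 11.2, 6),
    ("red", 12.8, 7),
    ("white", 8.8, 5),
    ("white", 10.1, 6),
    ("white", 11.0, 6),
    ("white", 12.4, 7),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_for(wine_type=None):
    """Correlation of alcohol with quality, optionally for one wine type only."""
    rows = [s for s in samples if wine_type is None or s[0] == wine_type]
    alcohol = [r[1] for r in rows]
    quality = [r[2] for r in rows]
    return pearson_r(alcohol, quality)


# One coefficient for the entire sample and one per wine type.
results = {
    "all": correlation_for(),
    "red": correlation_for("red"),
    "white": correlation_for("white"),
}
for label, r in results.items():
    print(f"{label:>5}: r = {r:.3f}")
```

A coefficient near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear association. Comparing the red and white coefficients gives a first, informal indication of whether wine type matters.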

You must submit a written development plan summarising the findings from the data exploration, describing any patterns that emerge from comparing summary statistics of the variables of interest, and providing a plan for how you may address the key questions (1) and (2). The report should be concise and well written.

Please note that you are not required to fully answer (1) and (2) in this milestone. Instead, you are required to develop insights and understand the problem, as well as the datasets for your final project.

As a style guide, you may include some or all tables/graphs as an appendix and refer to them as appropriate in your report. You should only include graphics and tables that support your analysis, conclusions, and findings. While preparing your paper, you will encounter numerous tables and graphs, many of which are irrelevant to the analysis. So be very selective and make good use of the page limit!

Approach to the assessment task

In week 1, we learnt how to represent the data using graphical tools, as well as numerical summaries. All these tools are meant to give us an idea of what the data are ‘trying to tell us’. Can we make sense of the large numbers of observations and tell a simple story or pick up a trend? This is what you will do in this milestone: understand the data and what we are trying to find out from the data.

(A) Expected Tasks

(i) Download the data. This assessment requires you to download the Excel file provided on your course Moodle page (file name: “Vinho_Verde.xlsm”). The dataset is related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: https://www.vinhoverde.pt/en/homepage or the reference [Cortez et al., 2009] which can be accessed from https://repositorium.sdum.uminho.pt/bitstream/1822/10029/1/wine5.pdf.

(ii) Data preparation. For this class, the data are already cleaned and complete. No cleaning is needed. However, you must explain any data manipulations you perform and provide a rationale for them.

(iii) Variables of interest. Consider (1) and (2) above and focus only on the variables included in the provided dataset.

(B) The expected outcomes

The written work must provide a brief description of the Case Study problem and a clear plan of how the dataset provided will address the key questions raised in the project description. You will have the opportunity to adjust, revise and review this plan as we progress throughout the term. M1 analysis is based on COMM5000 content covered in weeks 1, 2 and 3.

(i) Numerical summaries of the key variables of interest: present descriptive statistics of the variables in the data. You may present these results in the form of tables.

These numerical summaries must be presented for 1) the entire sample, 2) red wines only and 3) white wines only. For example, for each variable:

                   Mean    Mode    Median    SD    Min    Max
Variable name 1

(ii) Graphical representations of some variables, if you deem them important for capturing a trend or some interesting patterns in the data.

(iii) Analysis of the relationship between wine quality and alcohol content. Use scatter plots and describe your findings.

(iv) What conclusions can you draw from inspecting these data summaries in the form of tables and graphs? For example, is there a pattern that you can identify?

(v) Your analysis should inform your development plan to address points (1) and (2) in Milestone 2 and in the final report. This plan may be revised later during your work on Milestone 2.
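The numerical summaries requested in (i) can be sketched as follows. This is a minimal illustration in Python rather than Excel, using made-up alcohol values that are not taken from the provided dataset; the names `wines`, `summarise` and `table` are hypothetical. It assumes SD means the sample standard deviation (n - 1 denominator), which is an assumption about the intended convention.

```python
import statistics

# Hypothetical toy records: (wine type, alcohol %).
# These values are invented for illustration and are NOT from the Vinho Verde file.
wines = [
    ("red", 9.4), ("red", 9.8), ("red", 9.8), ("red", 11.0),
    ("white", 8.8), ("white", 9.5), ("white", 10.1),
    ("white", 10.1), ("white", 10.1), ("white", 12.0),
]


def summarise(values):
    """Return the six descriptive statistics for one variable."""
    return {
        "mean": statistics.mean(values),
        "mode": statistics.mode(values),      # most frequent value
        "median": statistics.median(values),
        "sd": statistics.stdev(values),       # sample SD (n - 1), an assumed convention
        "min": min(values),
        "max": max(values),
    }


# The three required views: entire sample, red only, white only.
groups = {
    "all": [v for _, v in wines],
    "red": [v for t, v in wines if t == "red"],
    "white": [v for t, v in wines if t == "white"],
}
table = {name: summarise(vals) for name, vals in groups.items()}

# Print one row per group, one column per statistic.
columns = ["mean", "mode", "median", "sd", "min", "max"]
print(f"{'group':<6}" + "".join(f"{c:>8}" for c in columns))
for name, row in table.items():
    print(f"{name:<6}" + "".join(f"{row[c]:>8.2f}" for c in columns))
```

Repeating the same summary for each variable and placing the three groups side by side makes differences between red and white wines easier to spot.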

Structure of the report

* Introduction: You should briefly introduce the topic and summarise the purpose and importance of this project for the client. Then outline how this preliminary insight development plan will be structured. It is important to provide some background information on this topic. You can find relevant information at https://www.vinhoverde.pt/en/homepage or in the reference [Cortez et al., 2009] which can be accessed from https://repositorium.sdum.uminho.pt/bitstream/1822/10029/1/wine5.pdf

* Data Summaries and Descriptive Statistics: Provide the necessary analysis to explore the variables. Describe the trends and stories that emerge from the data summaries. Are any patterns emerging from the graphs or tables you have constructed so far? Note: now that you have completed the first stage of data summaries, you have some basic insight into the dataset. You can use this information to develop some plans of action to address points (1) and (2) in Milestone 2 and in the final report.

* Conclusion: The conclusion should summarise the findings of your investigation and any concluding comments. It should also provide your plan for the next step of the analysis.

* References: Every external document you use must be properly referenced. Please include the page number and a link so that your lecturer and tutor can efficiently check your work.

* It is suggested that you limit your report to a maximum of 8 pages including tables, graphs, and references.

Schedule of engagement for M1

Below you can find a summary of the deadlines related to M1. M1 requires not only the submission of the report (15%), but also a high-quality, constructive contribution to the assessment of other students’ work (5%). This peer assessment involves being randomly assigned to critically review a number of other students’ submissions. You cannot choose which students to assess. You will 1) mark these submissions using a marking rubric provided to guide you and 2) leave constructive feedback. It is important that your feedback is high-quality and constructive. Your feedback will be assessed, and you will also have the opportunity to evaluate the feedback you have received.

1. Week 4, Friday, deadline for submission of M1

2. Week 6, deadline for marking the submissions you were assigned to

3. Week 7, deadline for evaluating the feedback you have received

Submission instructions

• Via the Moodle course site.

Supporting resources and links

- Dataset files: The Excel dataset file is available on Moodle. You only need to analyse the data that is included in this file.

- Weekly seminar: The seminar coordinator will cover relevant project aspects using Excel during the SEM session.

- Background information: See https://www.vinhoverde.pt/en/homepage or the reference [Cortez et al., 2009] which can be accessed from https://repositorium.sdum.uminho.pt/bitstream/1822/10029/1/wine5.pdf
