MCD2080 Business Statistics

Trimester 2, 2024

Group Assignment

Problem background: Glassdoor.com

Glassdoor is a free digital platform that gathers information and reviews from current and former employees about companies, salaries, and job openings.

The dataset used for this group assignment contains a random sample of job advertisements from Glassdoor.com. It is used to analyse the current job trends in the data science field based on job positions, company size, software skills, etc.

Refer to the workbook labelled Job Advertisements.xlsx in the Group assignment section on Moodle. This data can be used to understand various software skill requirements and other factors in job advertisements for Data Analysts, Data Engineers and Data Scientists. In this assignment, your task is to investigate and report how the expected salary is associated with various factors such as job types and software skills requirements.

Data definition:

In the file “Job Advertisements.xlsx”, you are provided with both numeric and categorical data. Note that this data has already been cleaned for you, and any records with missing values have been removed. The following table contains the data definition.

Column | Column Name      | Data Definition
-------|------------------|----------------
A      | Advertisement ID | The unique identifier for the job posting
B      | Job Type         | A simplified job title
C      | Company Name     | Full name of the company the advertisement is posted for
D      | Company Size     | Range of number of employees in the company
E      | Ownership Type   | Company type of ownership (8 ownership types provided)
F      | Industry         | The industry to which the organisation belongs
G      | Min Salary       | Minimum expected salary ($000 per year) for the job
H      | Expected Salary  | Average expected salary ($000 per year) for the job
I      | Python           | A binary indicator of whether the job requires Python knowledge/skills (1: Yes, 0: No)
J      | AWS              | A binary indicator of whether the job requires AWS knowledge/skills (1: Yes, 0: No)
K      | Excel            | A binary indicator of whether the job requires Excel knowledge/skills (1: Yes, 0: No)

Purpose:

We wish to explore the relationships between the expected salary and other independent variables. This is done by utilising the following statistical tools:

1. Pivot Tables and Charts

2. Summary Statistics

3. Confidence Intervals

4. Hypothesis Testing

5. Regression Analysis

Assignment questions:

Answer all questions.

Week 4 Checkpoint: Do question 1

1 a). Discuss and compare the average expected salary for Data Engineers and Data Analysts using the following factors:

Ownership

Industry

Construct appropriate charts to support your discussion. Keep your discussion succinct.

Your answer to this question should not be longer than 1-2 pages.
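For groups who want to cross-check their pivot table outside the spreadsheet, the averaging behind question 1 a) can be sketched as follows. This is only an illustration: the rows are hypothetical values made up for the example (not taken from Job Advertisements.xlsx), and the helper name `average_by` is an assumption, not part of the assignment.

```python
from collections import defaultdict

# Hypothetical rows: (job type, ownership type, expected salary in $000 per year).
# These values are invented for illustration only.
rows = [
    ("Data Analyst", "Private", 70.0),
    ("Data Analyst", "Private", 80.0),
    ("Data Analyst", "Public", 90.0),
    ("Data Engineer", "Private", 100.0),
    ("Data Engineer", "Public", 110.0),
    ("Data Engineer", "Public", 130.0),
]


def average_by(rows):
    """Return {(job_type, factor): mean expected salary}, like a pivot table of averages."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for job_type, factor, salary in rows:
        cell = totals[(job_type, factor)]
        cell[0] += salary
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


pivot = average_by(rows)
for (job_type, ownership), mean_salary in sorted(pivot.items()):
    print(f"{job_type:<14} {ownership:<8} {mean_salary:6.1f}")
```

The same function works for Industry by passing rows whose second field is the industry instead of the ownership type.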

b). We wish to compare the distribution of the expected salary between data analysts and engineers.

Generate Summary statistics and histograms and use them to compare the distributions. In your discussion, include measures of central tendency, variability and shape.

When discussing, include contextual interpretations of the measures used.

Your answer to this question should not be longer than 2 pages. (14 marks)
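As a rough guide to the measures question 1 b) asks for, the sketch below computes central tendency (mean, median), variability (sample standard deviation, range) and shape (adjusted sample skewness) for one group. The salary list is hypothetical and the function name `summarise` is an assumption made for this example.

```python
import statistics


def summarise(salaries):
    """Return central tendency, variability and shape measures for a list of salaries."""
    n = len(salaries)
    mean = statistics.mean(salaries)
    sd = statistics.stdev(salaries)  # sample standard deviation (n - 1 divisor)
    # Adjusted sample skewness: n / ((n - 1)(n - 2)) * sum of cubed z-scores.
    skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in salaries)
    return {
        "mean": mean,
        "median": statistics.median(salaries),
        "std_dev": sd,
        "range": max(salaries) - min(salaries),
        "skewness": skewness,
    }


# Hypothetical expected salaries ($000 per year), invented for illustration only.
analyst_salaries = [60, 65, 70, 75, 80, 120]
stats = summarise(analyst_salaries)
for name, value in stats.items():
    print(f"{name:<9} {value:8.2f}")
```

In this made-up sample the mean sits above the median and the skewness is positive, which is the pattern to describe as a right-skewed distribution in the discussion.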

Week 7 Checkpoint: Do questions 2 & 3.

2. We will now explore the relationship between the expected salary of Data Analysts and Data Engineers.

a). Calculate the 95% Confidence Interval estimate of the true average expected salary for Data Analysts and Engineers. Report your results using the table below.

Confidence Interval Estimate of Average Expected Salary for Job Types

Job Type       | Lower Boundary / Limit | Upper Boundary / Limit
---------------|------------------------|-----------------------
Data Analysts  |                        |
Data Engineers |                        |
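The interval requested in question 2 a) has the form mean ± critical value × standard error. A minimal sketch is given below, assuming a large sample so that the normal critical value (about 1.96 for 95%) is a reasonable approximation; for small samples the t critical value with n - 1 degrees of freedom is the usual choice. The sample values and the function name `mean_confidence_interval` are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) limits for the mean using a normal critical value."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 when confidence = 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return mean - margin, mean + margin


# Hypothetical expected salaries ($000 per year); a real group would be far larger.
sample = [70, 80, 90, 100, 110]
lower, upper = mean_confidence_interval(sample)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

For part b), the same function can be applied to each subgroup in turn, for example the salaries of Data Analysts with Excel = 1 and then those with Excel = 0.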

b). Calculate the 95% Confidence Interval estimate of the true average expected salary for Data Analysts and Engineers who have the following software skills:

• Excel

• Python

• AWS

For each variable, report your results using the following format in the examples provided.

Confidence Interval Estimate of Average Expected Salary of Data Analysts requiring Excel Skills

Excel Skills | Lower Boundary / Limit | Upper Boundary / Limit
-------------|------------------------|-----------------------
0 (No)       |                        |
1 (Yes)      |                        |

Confidence Interval Estimate of Average Expected Salary of Data Engineers requiring Excel Skills

Excel Skills | Lower Boundary / Limit | Upper Boundary / Limit
-------------|------------------------|-----------------------
0 (No)       |                        |
1 (Yes)      |                        |

(Please use a similar format for Python and AWS)

c). Discuss your results obtained in (a) and (b). Remember to discuss answers for all tables produced.

For part (c) only, the expected length of the answer should be less than a page. (20 marks)

3. We wish to disentangle the relationship between expected salary and Excel skills in each job type.

Use your knowledge in Hypothesis Testing to answer the following questions.

a). Do a majority/minority of data analyst roles require Excel skills?

b). Do a majority/minority of data analyst roles require Python skills?

c). Do a majority/minority of data engineer roles require Excel skills?

d). Do a majority/minority of data engineer roles require Python skills?

Hint: For each test, state the hypotheses, p-value and conclusion in the context of the question. (6 marks)
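A "majority" or "minority" claim is a one-sample test of a proportion against 0.5. The sketch below shows one way to obtain the test statistic and p-value, assuming the sample is large enough for the normal approximation. The counts (60 of 100 advertisements) are hypothetical, and the function name `proportion_z_test` is an assumption made for this example.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, n, p0=0.5, alternative="greater"):
    """One-sample z-test for a proportion; returns (z statistic, p-value)."""
    p_hat = successes / n
    # Standard error is computed under the null hypothesis p = p0.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    if alternative == "greater":      # H1: p > p0 (a majority)
        p_value = 1 - NormalDist().cdf(z)
    elif alternative == "less":       # H1: p < p0 (a minority)
        p_value = NormalDist().cdf(z)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return z, p_value


# Hypothetical counts: 60 of 100 advertisements require the skill.
z, p_value = proportion_z_test(successes=60, n=100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

With these made-up counts the p-value is below 0.05, so at the 5% level the null hypothesis p = 0.5 would be rejected in favour of a majority; the conclusion for the real data depends on the actual counts.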

Week 11 Final presentation and report submission: Do questions 4 & 5.

4. Estimate a multiple regression model to analyse the relationship between:

Expected salary and all other variables, such as three software skills, the two job types (data analysts and data engineers), and the minimum salary. You are required to produce one multiple regression output.

This section should include an analysis of the statistical significance of the various factors in the model. Highlight the key factors that the multiple regression reveals as drivers of Expected Salary.

Your answer to this question should be approximately 1 to 1.5 pages. (15 marks)
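To show how the variables in question 4 enter a single model, the sketch below estimates least-squares coefficients from the normal equations using only the standard library. It assumes the data has been filtered to the two job types, so that one 0/1 dummy (1 = Data Engineer) is enough. The rows are hypothetical and were constructed so that the fit is exact; the function name `ols` is an assumption. The sketch does not produce standard errors or p-values, which still have to come from the regression output of your statistics tool.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical rows: (is_engineer, python, aws, excel, min_salary), invented for illustration.
data = [
    (0, 0, 0, 1, 50), (0, 1, 0, 1, 55), (0, 1, 0, 0, 60), (1, 1, 1, 0, 80),
    (1, 1, 0, 0, 70), (1, 0, 1, 1, 75), (0, 0, 1, 0, 65), (1, 1, 1, 1, 90),
]
# Hypothetical expected salaries ($000 per year), built to lie exactly on a plane.
expected_salary = [67, 78, 87, 134, 114, 120, 96, 143]

X = [[1, *row] for row in data]  # leading 1 is the intercept column
coefficients = ols(X, expected_salary)
names = ["intercept", "is_engineer", "python", "aws", "excel", "min_salary"]
for name, b in zip(names, coefficients):
    print(f"{name:<12} {b:8.3f}")
```

Each coefficient is read as the change in expected salary ($000) associated with that variable while the others are held fixed; for a 0/1 skill indicator it is the difference between advertisements that require the skill and those that do not.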

5. Based on the statistical analysis and results in questions 1 to 4, draw conclusions on the following:

a). All factors associated with Expected Salary.

b). The importance of software skills for different job types

c). Recommendations for job seekers to improve their ability to obtain higher-paying employment.

Your answer to this question should be approximately 1 to 1.5 pages. (20 marks)

Assignment marks

The maximum total mark for the assignment is 175. Your total score will be composed of two parts:

• Final assignment report (Questions 1-5): maximum marks of 75.

• Presentation: a maximum mark of 100

(i). Week 4 checkpoint - 20 (staff: 10 & peer-to-peer evaluation: 10)

(ii). Week 7 checkpoint - 30 (staff: 15 & peer-to-peer evaluation: 15)

(iii). Week 11 checkpoint - 40 (staff: 20 & peer-to-peer evaluation: 20)

Please note that any group member who does not give feedback to other group members will be awarded zero marks.

You will be required to fill in the peer evaluation on Teammates to be eligible for this component.

Please note that the Unit Leader reserves the right to adjust individual report marks based on the peer evaluation. Should the feedback indicate that an individual did not contribute to the group assignment, the reporting mark will be adjusted to zero, implying that the individual’s group assignment contribution to their final grade will be 0%.

Report requirements:

● All answers should be in font size 12pt and 1.5 spacing.

● Plots and tables must be legible, with appropriate labels to aid readers.

● Statistical results need to be summarised in succinct table formats.

● You will lose marks for poor presentation.

Presentation:

Use PowerPoint or a cloud-based app, e.g. Google Slides, Prezi or Visme.

Week 11 Final Assignment submission guidelines

• The link is set up using an Assignment Tool on Moodle. Please submit the group Report/Answers as a Word document or PDF.

• If the question has sub-parts, for example, (a), (b) …, please indicate the labels for each part clearly.

• DO NOT click on "submit all and finish" before you finish all questions.

ONLY 1 attempt is allowed for the Assignment. Group members should appoint one member to submit on behalf of the group.
