CS39A0190 Data analytics basics


ASSIGNMENT 2 INSTRUCTIONS
CS39A0190 Data analytics basics
”DATASETS”-file has three sheets. First sheet ”DATASET1” is its own task and sheets two
and three, ”DATASET2_1” and ”DATASET2_2” are another one.
The datasets needs to be cleaned before they can be analysed. You can turn the data into
table or keep as a range, whichever you find easier to work with.
DATASET1 basic info
The data is process data where Events go through a process from A to G.
Process step colums A to G have dates as data. The date depicts when the process step began. Process
step G means completion of the whole process.
DATASET2 basic info
The data is sales data for product range. SKU (stock keeping unit) is assigned to each unique product.
Agency depicts a retailer. Volume is the amount sold during the time and average max temp describes
the temperature at the retailer for the sales period.
DATASETS
Cleaning the data
Remove empty rows.
Remove incomplete Event records. Every process step must have a date.
Remove duplicate Event records. Each Event ID is unique.
If Excel does not recognize the process step data as dates, convert the cell contents into dates.
Event IDs and Event types are numbers by format.
Analysing the data
Calculate the duration (in days) between each process step and the total duration for all Events.
Calculate the average duration between each process step and the average total process duration for all
Event types.
Example for a single Event:
DATASET1 – 3 POINTS
First, use Text to Columns to get the comma separated data into separate columns.
Cleaning the data
Both DATASET2_1 and DATASET2_2 sheets have a column that uses dot as a decimal separator. If your
Excel has comma as a decimal separator, convert the dots to commas.
Check for duplicates, missing data, empty rows... and handle accordingly if found.
Analysing the data
Add the Avg_Max_Temp data from DATASET2_2 to DATASET_1. See example below.
Calculate monthly total sales (all SKUs) for each Agency.
Calculate annual total sales for each SKU.
DATASET2 – 3 POINTS
Data is from Eurostat, population data. Actual national population data from 2021 and
projections for 2022-2100.
Projections has multiple scenarios in separate sheets.
Your task is to combine the data for EU countries + Iceland, Norway and Switzerland (30
countries in total).
Open a blank Excel spreadsheet and connect to both of the Excel files to get data.
Clean the data, merge the datasets, and clean the merged data if necessary.
You will return results only for the 30 countries.
As a result, you will have for each country a 2021 actual population (total, males, and
females) and projections for 2022-2100 (total, males, and females).
From projections, use two scenarios. Baseline and a free choice for the second.
POWER QUERY – 6 POINTS
Find two datasets (from anywhere) that you can connect to and combine in Power Query.
Use two different sources for the data.
E.g., one from CSV and one from Web
Clean the data using Power Query and combine them into one dataset.
FIND AND MERGE – 3 POINTS
Return three Excel-files.
DATASETS
POWER QUERY
FIND AND MERGE
All used functions and Queries must remain in the submitted files. Being able to see the
data transformation process in the files is more important than the end result.
WX:codehelp​

  • 7
    点赞
  • 6
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值