For part one you are required to submit two files a PDF/Word Document and an Excel
working file.
PDF/Word document:
1) Introduction (150 words): explain from your perspective the
objective of this coursework. 2) Stage one (200 words): summarise the tasks completed in this
part and reflect on the change of sample size from one task to another. 3) Stage two (500
words): reflect on your analysis and designing your display panel. Stage three (500 words):
provide short essay type of answers to the questions. 4) Conclusion (150): a self- reflection on
your learning after completing this coursework by providing (e.g. pros vs cons analysis).
Excel Spreadsheet:
you need to structure a similar spreadsheet to the one explained in the
“Case Study: US Crude Oil Trade Flows”, which is part of week 5 learning material.
In doing so you are required to complete the following tasks:
Stage One:
cleaning, manipulating and structuring the dataset.
You are required to clean and manipulate the dataset “AG-Tradeflows-2020” prior running
analysis and structuring your final display panel.
1) Before cleaning the data (by deleting not needed data columns) you need to filter the
dataset to only include shipments from “Saudi Arabia” and only for the period 2020 by
filtering the data columns “Load Country” and “Departure Data”, respectively and
accordingly.
(2%)
2) Create the following new variable “Cargo” by multiplying the column “Volume” by
1000 (Volume × 1000).
(2%)
3) Filter the dataset to include only the following indicators (columns): Vessel Name,
Vessel IMO, Load Port, Departure Date, Discharge Country, Discharge Port, Product,
Grade, Cargo, Discharge Country/Sub-Country, Discharge Region, Discharge Zone.
(2%)
4) Filter the dataset to exclude observations with missing values by deleting observations
that include blank or error data for the following data columns: Vessel Name, Load
Port, Departure Date and Discharge Port. (2%)
5) Create the following new variables: Vessel Type and DWT by merging information
from the second table (LOOKUP) sheet into the main dataset. (6%)
6) After merging both datasets use the new structured column data indicator “Vessel
Type” to filter the dataset to only include four vessel types namely, 1) Crude Oil Tanker,
2) Products Tanker, 3) Chemical/ Products Tanker, 4) Crude/Oil Products Tanker.
(2%)
7) The final sample should only include the following variables “Vessel”, “Vessel Type”,
“DWT”, “Load Port”, “Departure Date”, “Discharge Country”, “Discharge Port”,
“Product”, “Grade”, “Cargo”, “Discharge Country/Sub-Country”, “Discharge
Region” and “Discharge Zone”. (2%)
i.
Check the data type (format) of these variables and if necessary, modify the data
type.
ii.
Check if these variables contain missing values. Exclude all observations where at
least one of these variables contain missing values.
1
(2%)
Note: each step should be clearly shown in a separate sheet of the spreadsheet
Stage Two:
data analysis and designing the display panel.
8) Create a monthly time series of vessels shipments (a count of number of monthly fixed
ships), total cargo shipped and cargo capacity utilization.
(3%)
9) Plot a monthly time series showing total number of vessels shipments and total cargo
capacity loaded onboard ships. You need to provide a table with the data used to plot
the time series.
(3%)
10) Identify the month that had the highest number of vessel shipments, the most loaded
cargo in tonnes and percentages of cargo capacity utilization.
(3%)
11) Structure tables and provide suitable illustrations that categorises total shipments and
cargo capacity by vessel type, load port, type of product and discharge zone. (3%)
12) Similar to the Case Study: US Crude Oil Trade Flows, which is part of week 5 learning
material, you need to structure your spreadsheet providing tables and illustrations and
design a display panel.
(3%)
Note: each step should be clearly shown in a separate sheet of the spreadsheet
Stage Three:
answer the following questions.
13) Historical time series may contain useful information that are useful for decision
makers. Do you see any pattern in the monthly time series of vessels shipments and
total cargo capacity?
(5%)
14) Forecasts are required to support decisions in the future. We need to provide forecast
that supports operational planning one month in advance. Use naïve and simple moving
average to provide one-month ahead forecast. Reflect on which approach do you
recommend using for this forecasting task? Explain your answer and plot your
forecasts.
(5%)
15) Reflect on how useful a Linear Programming method for this type of data (e.g. cargo
capacity, amount of cargo shipped, different sizes of vessels, … etc.).
(5%)
Data analytics is generally used to provide evidence and inform decisions. In a typical business
data analytic task, you can use data to inform decisions, verify claims and assumptions, answer
or refine questions. In this part of the coursework, you are first asked to choose a dataset and
discuss a relevant problem to the dataset that needs to be informed by data analysis, it could be
in the form of questions, claims or assumptions.
For you to have the greatest chance of success with this coursework it is important that you
choose a manageable dataset. This means that the data should be readily accessible and large
enough that multiple relationships can be explored. As such, your dataset must have at least 50
observations (rows) and between 3 to 5 variables (columns). The variables in the data should
include categorical variables, numerical variables, or date/time variables. The dataset format
could be in the format of text(.txt) or excel (.csv /.xls, .xlsx)
If you are using a dataset that comes in a format that we haven’t encountered in class, make
sure that you are able to load it into R as this can be tricky depending on the source. If you are
having trouble, ask for help before it is too late.
Note on reusing datasets from class:
Do not reuse datasets used in examples, homework
assignments, or labs in the class.