ITS70504
Data Engineering
Assignment 1 - Individual
Subject Code ITS70504
Subject Name Data Engineering
Weightage 15%
Assignment Individual Assignment
Handout Date 18th June 2024
Submission Date 1st July 2024
Learning outcomes assessed by this assignment
1. Demonstrate the concept of data engineering.
Instructions
1. This individual assignment carries 15% of the total marks available for the module.
2. The output of this assignment is a report of between 1800 and 2500 words.
3. You are required to submit your report in softcopy (through MyTIMeS-Moodle). Kindly ensure your name and ID are written on the cover sheet.
4. Using AI tools is not recommended. However, if any AI tools are used, the instructions below must be followed:
a. The AI tool must be cited properly.
b. The output of the AI tool must be interpreted.
c. At least 3 improvements to the AI-suggested answers must be discussed.
d. The student must present the report in class, defend the answers, and provide justifications where required.
5. Students who do not submit any assignments may receive zero marks and be barred from sitting the final examination.
GENTLE REMINDER:
- Plagiarism is a serious offence and plagiarized work will result in an F grade.
- Failure to submit the report by the deadline shall be penalized with zero marks.
Assessment Criteria
Assessment Task | Weightage | MLO Assessed | Formative/Summative | Assessment Instrument | Topics | Week | MQF2.0 |
Assignment 1 | 15% | MLO 1 | Formative | Individual Assignment. Case Study | 1, 2, 3 | 3 | C1, C2, C3A |
C1 = Knowledge & Understanding, C2 = Cognitive Skills, C3A = Practical Skills, C3B = Interpersonal Skills, C3C = Communication Skills, C3D = Digital Skills, C3E = Numeracy Skills, C3F = Leadership, Autonomy & Responsibility, C4A = Personal Skills, C4B = Entrepreneurial Skills, C5 = Ethics & Professionalism
Case Study: Data Engineering at a Retail Company
A large retail company, ShopEase, operates both online and offline stores, serving millions of customers globally. The company has accumulated vast amounts of data from various sources, including customer transactions, online browsing behaviours, inventory management systems, and social media interactions. The data is stored in different formats and systems, creating challenges for analysis and decision-making. ShopEase aims to improve its data infrastructure to enable better data-driven decisions, enhance customer experience, and optimise operations.
Answer the following questions:
Criteria 1: Introduction to Data Engineering (20%)
1) Discuss the concept of data engineering and explain its importance in the context of ShopEase.
a. Discuss what data engineering is in a general context. (10 Marks)
b. Explain how data engineering can help ShopEase manage and utilize its vast amount of data effectively. (10 Marks)
Criteria 2: Data Preprocessing: Concepts & Techniques (20%)
2) Identify and describe FOUR (4) data preprocessing techniques that would be critical for preparing ShopEase's data for analysis.
a. Explain each technique and its relevance to the case study. (10 Marks)
b. Provide examples of how these techniques can be applied to ShopEase's data. (10 Marks)
Criteria 3: Data Storage Technologies (20%)
3) Evaluate different data storage technologies suitable for ShopEase’s diverse data types (structured, semi-structured, unstructured).
a. Compare at least TWO (2) data storage technologies. (10 Marks)
b. Discuss the advantages and disadvantages of each in the context of ShopEase’s needs. (10 Marks)
Criteria 4: Big Data Frameworks for Data Engineering (20%)
4) Recommend a big data framework that ShopEase should adopt for its data processing needs. Justify your choice.
a. Describe the selected big data framework. (10 Marks)
b. Explain why it is suitable for handling ShopEase’s large-scale data processing. (10 Marks)
Criteria 5: Distributed File System (20%)
5) Explain the role of a distributed file system in ShopEase’s data architecture.
a. Define a distributed file system. (10 Marks)
b. Discuss its benefits and potential challenges for ShopEase. (10 Marks)
Note 1: Using tables to summarise your points and adding figures to provide some illustrations are highly recommended.
Note 2: In the case of using any figures or tables from any external sources, proper citations must be added to their captions and inside your description of those figures/tables.
Note 3: The total marks achieved will be capped at 15 marks.
Deliverables
The output should consist of:
1. Assignment Report (Softcopy in PDF)
2. Cover page
3. References (APA Referencing Style: www.apa.org or http://www.apastyle.org/index.aspx or https://owl.english.purdue.edu/owl/resource/560/01/).
4. Similarity report (maximum accepted similarity is 20%)