UFCFVQ-15-M Programming for Data Science


College of Arts, Technology and Environment
Academic Year 2023/24

Resit Assessment Brief
Submission and feedback dates
Submission deadline:    Before 14:00 on 15th July 2024 
This assessment is eligible for the 48-hour late submission window.
Marks and Feedback due on: 12th …
All times are current local time (at time of submission) in the UK.
Submission details
Module title and code:    UFCFVQ-15-M Programming for Data Science        
Assessment type:    Practical Skill Assessment
Assessment title:    Practical Coursework        
Assessment weighting:    100% of total module mark
Size or length of assessment: No word limit; Development time 20 hours
Module learning outcomes assessed by this task:
1. Apply the principles of programming and data management to solve problems.
2. Demonstrate the use of an object-oriented paradigm when solving software problems.
3. Design and implement algorithms for numerical analysis.
4. Demonstrate the use of proactive error handling techniques to address software reliability and program vulnerability issues.
5. Critique and reflect on alternative solutions to a given problem or on their own work in a constructive way.
6. Undertake independent research activities with relation to innovative approaches to data science problem solving.
7. Demonstrate the use of Data Visualisation techniques for supporting numerical data analysis.
8. Demonstrate the use of a version control system (such as Git) as part of an integrated development process.
Completing your assessment 
What am I required to do on this assessment?
For this assessment, you are required to complete three different tasks. A brief outline is given below. Exact details of what is required are given in Appendix 1.
1. Develop a set of functions to solve a programming problem using ONLY built-in Python functions and data structures.
2. Perform basic data analysis of a given dataset and identify an “interesting” pattern or trend within the data.
3. Write a reflective report about the process you followed while developing solutions to the two main programming tasks (i.e., 1 & 2 above).
Where should I start?
To demonstrate your understanding and programming skills it is important that you develop a sufficient knowledge of the module materials and gain practical experience of coding in Python before you begin this assessment. You should read the detailed description of each task given in Appendix 1. 
Firstly, you should create a GitHub account and follow the instructions given by the tutor for accessing the GitHub Classroom that has been set up for this assessment. How to complete this will be covered during one of your workshops. In addition, there is a pre-recorded explanation of how to do this available in the Assessment folder on Blackboard. Secondly, you need to clone your GitHub repository to your local machine. Now, you should open a Jupyter Notebook console from Anaconda Navigator and load the Resit Programming Task 1 Template. You can now begin working through the programming requirements set out in Section A of Appendix 1.
What do I need to do to pass? 
To pass this coursework assessment you will need to achieve an overall mark of 50% or above. Realistically, this will not be possible without at least attempting both programming tasks. However, you should also attempt the remaining task (the process development report) to ensure that you have maximised your mark for this assessment.
How do I achieve high marks in this assessment? 
High marks can be achieved by carefully following the requirements set out in Appendix 1. Marks will be deducted for solutions which do not follow the requirements precisely. In addition, you should make sure that you demonstrate good coding standards, write an insightful reflective (rather than descriptive) process report, and follow all naming conventions set out in this assessment.
How does the learning and teaching relate to the assessment? 
Week 1 focuses on Git and so following this material is important for accessing the assessment materials and submitting your work. Weeks 2 through 6 focus on basic Python programming. You should pay particular attention to Week 6 to identify built-in functions. These are important for the first task. Weeks 7 through 9 focus on how to use Python for data analysis and are important for the second task. Week 11’s Data Science demonstration may also be useful for the second task.
What additional resources may help me complete this assessment?
Additional resources that you might find useful for completing this assessment include:
Reflective Writing course at https://xerte.uwe.ac.uk/play_4988
Referencing information at https://www.uwe.ac.uk/study/study-support/study-skills/referencing
Module Discussion Boards: Coursework Queries and FAQs
The Module Leader and Module Tutors will also be available via email to clarify any issues you may be having with the assessment. Formative feedback can be requested during the tutorial sessions.
What do I do if I am concerned about completing this assessment?
UWE Bristol offers a range of Assessment Support Options, and both Academic Support and Wellbeing Support are available.
For further information, please see the Academic Survival Guide.
How do I avoid an Assessment Offence on this module?
Use the support above if you feel unable to submit your own work for this module. The most common form of Assessment Offence for this type of assessment is copying code from another source (such as a forum, webpage, or another student) without referencing (and citing) it correctly. Referencing is an important part of academia, and you should become clear about when you need to reference an external source and how to reference it (more information is available in the study skills link above). However, note that any copied code may result in only partial marks for any sub-task in which it is used.
During the marking phase, an analysis of submissions will be made across the cohort to identify any evidence of collusion and/or plagiarism.  
UWE Bristol’s Assessment Offences Policy requires that you submit work that is entirely your own and reflects your own learning, so it is important to:
Ensure you reference all sources used, using UWE Harvard referencing and the guidance available on UWE’s Study Skills referencing pages.
Avoid copying and pasting any work into this assessment, including your own previous assessments, work from other students or internet sources.
Develop your own style, arguments, and wording, so avoid copying sources and changing individual words but keeping, essentially, the same sentences and/or structures from other sources.
Never give your work to others who may copy it.
For an individual assessment, develop your own work and preparation, and do not allow anyone to make amendments to your work (including proof-readers, who may highlight issues but not edit the work).
 
When submitting your work, you will be required to confirm that the work is your own, and text-matching software and other methods are routinely used to check submissions against other submissions to the university and internet sources. Details of what constitutes plagiarism and how to avoid it can be found on UWE’s Study Skills pages about avoiding plagiarism.
Marks and Feedback
Your assessment will be marked according to the marking criteria set out in each task in Appendix 1. You can use these to evaluate your own work before you submit. 

Appendix 1 – Assessment Overview
This single coursework assessment involves three separate tasks. The requirements for each task are detailed below together with deliverables, submission details and grading criteria. Below is a breakdown of the percentage weighting per task:
Task    % Weighting
Programming Task 1    48
Programming Task 2    38
Process Development Report    14
Total    100

Section A. Programming Task 1
This programming task focuses on using Python to calculate a set of Student’s t-test statistics for a given dataset using ONLY built-in functions and data structures.
- For Programming Task 1, you MUST NOT import any Python library functions. This means you cannot use Python modules such as math or csv, or libraries such as Pandas, NumPy, or SciPy.
To calculate the Student’s t-test statistic for a given pair of Python Lists, it would be very easy to use the ttest_rel() function provided in the SciPy library. However, this programming task is designed to assess your coding abilities, and preventing you from using this function forces you to gain a deeper understanding of how to complete the task. To do this, you will need to develop your own algorithm. Try typing “calculate Student’s t-test statistic by hand” into your favourite search engine.
For your information, a t-test statistic value greater than 1.972 indicates a statistically significant result at the 5% level (assuming a paired two-tailed test).
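As a rough illustration of the kind of algorithm involved (a sketch only, not a model answer), the paired t-test statistic for two equal-length lists can be computed from the differences between paired values using only built-in functions. The function and variable names below are illustrative:

def paired_t_statistic(xs, ys):
    """Sketch: paired Student's t-test statistic using only built-in functions."""
    if len(xs) != len(ys):
        raise ValueError("Lists must be of equal length")
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # sample standard deviation of the differences (n - 1 in the denominator)
    sd_d = (sum((d - mean_d) ** 2 for d in diffs) / (n - 1)) ** 0.5
    return mean_d / (sd_d / n ** 0.5)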
There is a single data file available in your resit GitHub repository for use in this programming task. The file contains data about the prevalence of mental health disorders in countries around the world in 2017 based on different age groups.
- The data file is called resit_task1.csv. This CSV file includes a header row with multiple named data values.
- This file is available in the Resit Materials section on Blackboard.
Students are expected to follow appropriate coding standards such as code commenting, docstrings, consistent identifier naming, code readability, and appropriate use of data structures.
A.1. Requirements
ID    Requirement    Description    Marks Available
FR1    Develop a function to read a single specified column of data from a CSV file    The function should accept two parameters: the data file name and a column number. The column number specifies which of the columns to read. It can range between 0 and n-1 (where n is the number of columns). The function should return two values: the column name and a List containing all the specified column’s data values. You should use the resit_task1.csv data file to test your function, but your function should also work for other CSV files. An illustration of this is given in Appendix 2, and an illustrative sketch of one possible approach is given below this table.    6
FR2    Develop a function to read CSV data from a file into memory    The resit_task1.csv data file contains several columns of data values. This function should accept a single parameter: the data file name. It should make use of the function developed in FR1 to read all columns of data from the data file and add them to a Dictionary data structure. The Dictionary should contain one entry for each column in the CSV data file. An illustration of this is given in Appendix 3.    6
FR3    Develop a function to calculate a paired Student’s t-test statistic for two lists of data     This function should calculate a paired Student’s t-test statistic for two lists of data. The function should take two lists of data (of equal length) as parameters. The function should ensure that the lists are of equal length otherwise raise an error. The function should return the calculated statistic value.    12
FR4    Develop a function to generate a set of paired Student’s t-test statistics for a given data file    The function should accept one parameter: the Dictionary data structure generated in FR2. This function should make use of the function developed in FR3 to generate a paired Student’s t-test statistic for every pair of columns in the input data structure parameter. The function should return a list of tuples, each tuple containing the two column names and associated statistic value. An illustration of this is given in Appendix 4.     10
FR5    Develop a function to print a custom table    This function should output the paired Student’s t-test statistics for a subset of the column pairs generated in FR4. The function should take three parameters: the list of Student’s t-test statistic tuples, the border character to use, and which columns to include. You should indicate statistically significant values (at the 5% level) using stars, e.g., *2.43*. High marks will be given for good use of padding in the table cells to improve readability. An illustration of this is given in Appendix 5.    9
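To illustrate the kind of approach FR1 is asking for, a minimal sketch is given below. It assumes purely numeric data values; your own function will need to cope with the actual contents of resit_task1.csv and similar files, and the identifier names here are illustrative only:

def read_column(file_name, column_number):
    """Sketch for FR1: return (column name, list of values) for one CSV column."""
    with open(file_name, "r") as csv_file:
        lines = [line.strip() for line in csv_file if line.strip()]
    header = lines[0].split(",")
    column_name = header[column_number]
    # values are assumed to be numeric; a full solution may need to choose
    # between int and float conversion (or keep strings where appropriate)
    values = [float(line.split(",")[column_number]) for line in lines[1:]]
    return column_name, values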

A.2. Deliverables
A Jupyter Notebook file (in .ipynb format) containing a complete solution to this Programming Task. 
- You must use the template provided (there is a Jupyter Notebook template available in your GitHub repository: UFCFVQ-15-M_Resit_Programming_Task 1_Template.ipynb).
A.3. Submission
You should commit your completed Jupyter Notebook file to your resit GitHub repository with an appropriate commit message.
A.4. Grading Criteria
Marks are allocated as follows: 
- up to 43 marks for the Python code solution
Marks will be awarded for each requirement according to the level of completion.
To gain high marks you must follow the requirement instructions precisely.
- up to 5 marks for adherence to good coding standards.
Section B. Programming Task 2
This programming task focuses on using NumPy/SciPy, Pandas, and Matplotlib/Seaborn to combine and analyse two datasets related to bike sharing in London between 2015 and 2017.
Two data files have been provided in your GitHub repository for this task. 
- The resit_task2a.csv data file contains the number of bike shares per hour between January 2015 and January 2017.
- The resit_task2b.csv data file contains the temperature, “feels like” temperature, humidity, and wind speed for every hour between 2015 and 2017.
Students are expected to follow appropriate coding standards such as code commenting, consistent identifier naming, code readability, and appropriate use of data structures.
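As a minimal sketch of how the merging step (FR6) might start, the snippet below reads both files and joins them; the shared column name "timestamp" is an assumption, so check the actual headers in the two files before merging:

import pandas as pd

# Sketch for FR6: read both files and merge them on a shared time column.
shares = pd.read_csv("resit_task2a.csv")
weather = pd.read_csv("resit_task2b.csv")
merged = pd.merge(shares, weather, on="timestamp", how="inner")  # "timestamp" is assumed
print(merged.head())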
B.1. Requirements
ID    Requirement    Description    Marks Available
FR6    Read CSV data from two files and merge it into a single Data Frame    For this task you should use the resit_task2a.csv and resit_task2b.csv data files.    4
FR7    Explore the dataset to identify an "interesting" pattern or trend (an “interesting” pattern or trend might include a correlation between two columns of data, equality of two columns of data, or a linear or non-linear relationship between columns of data)    Use an appropriate visualisation tool (such as Matplotlib or Seaborn) to illustrate your exploration. You should include at least three visualisations as part of your exploration. You could consider other ways to explore the data, such as data summaries or transformations. You must include an explanation of the dataset exploration, your selected "interesting" pattern or trend, and your reasons for selecting it.    10
FR8    Detect and remove any outliers in the data used for your "interesting" pattern or trend    Use an appropriate technique to detect and remove any outliers in the data used for your "interesting" pattern or trend. You must include an explanation of the detection method used, how it works, and any outliers detected. NOTE: there may not be any detectable outliers using the selected detection method – if this is the case, please state this clearly in the explanation given.    6
FR9    Define a hypothesis to test your “interesting” pattern or trend    Use an appropriate hypothesis-testing formulation to define a hypothesis, and provide an explanation for your choices.    6
FR10    Test your hypothesis at a statistical significance level of 0.05    Using an appropriate Python library, test the hypothesis stated in FR9. You must include a detailed explanation of your findings to achieve good marks for this task.    7
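As a minimal sketch of an FR9/FR10 workflow using SciPy, the snippet below assumes the chosen pattern is a correlation between two hypothetical columns named "t1" (temperature) and "cnt" (bike share count) in the merged DataFrame from the FR6 sketch above; substitute whichever columns your own exploration identifies:

from scipy import stats

# H0: there is no linear correlation between temperature and bike share count.
# H1: there is a linear correlation between temperature and bike share count.
r, p_value = stats.pearsonr(merged["t1"], merged["cnt"])  # column names are assumed
if p_value < 0.05:
    print(f"Reject H0 at the 5% level: r = {r:.3f}, p = {p_value:.4f}")
else:
    print(f"Fail to reject H0 at the 5% level: r = {r:.3f}, p = {p_value:.4f}")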

B.2. Deliverables
A Jupyter Notebook file (in .ipynb format) containing a complete solution to this Programming Task. 
- You must use the template provided (there is a Jupyter Notebook template available in your GitHub repository: UFCFVQ-15-M_Resit_Programming_Task_2_Template.ipynb).
B.3. Submission
You should commit your completed Jupyter Notebook file to your GitHub repository with an appropriate commit message.
B.4. Grading Criteria
Marks are allocated as follows: 
- up to 33 marks for the Python code solution
Marks will be awarded for each requirement according to the level of completion.
To gain high marks you must follow the requirement instructions precisely.
- up to 5 marks for adherence to good coding standards.
Section C. Process Development Report
You are expected to identify the strengths/weaknesses of your approach to your coding tasks. 
For this coursework, you must write a reflective report which focuses on the process you took to develop a solution to the two programming tasks described in Section A and Section B above. Please reflect on your experiences rather than simply describing what you did. 
The report must be split into TWO different sections – one for each programming task. 
Each section should: 
- include an explanation of how you approached the task:
describe your thought process.
did you find it easy or difficult? Why?
what problems did you encounter? How did you overcome them?
- identify any strengths/weaknesses of the approaches used.
- consider how the approaches used could be improved.
- suggest alternative approaches that could have been taken instead of the ones you used.
C.1. Requirements
The development process report MUST be submitted in .docx format – PDF, Pages, or any other file format will NOT be accepted for this task.
The report must not exceed 800 words. Please indicate the word count at the end of the document.
C.2. Deliverables
A development process report written in .docx format.
C.3. Submission
You should commit the report to your GitHub repository with an appropriate commit message.
C.4. Grading Criteria
There are 14 marks available for the report – 7 marks per section.
- Marks will be awarded for appropriate use of technical language, critical reflection on the development process, and the quality of engagement with the reflective process.

Appendix 2 – Example Column Extraction
For the following illustration, you should assume that the column number parameter is equal to 1 for the data file. There are 9 columns in this file and so column number can range between 0 and 8. For this data, the function would return two values: “Glucose” and [148,85,183,89,137,116,78,115,197,125,110,168,139]

Appendix 3 – In-Memory Data Structure
Using the file illustrated in Appendix 2, the Dictionary produced in FR2 should look something like the illustration below. However, you must ensure that your function can work for any CSV file with a similar structure (such as a file with 5 columns and 100 rows or with 20 columns and 1000 rows).
{
    "Pregnancies" : [6,1,8,1,0,5,3,10,2,8,4,10,10],
    "Glucose" : [148,85,183,89,137,116,78,115,197,125,110,168,139],
    "BloodPressure" : [72,66,64,66,40,74,50,0,70,96,92,74,80],
    "SkinThickness" : [35,29,0,23,35,0,32,0,45,0,0,0,0],
    "Insulin" : [0,0,0,94,168,0,88,0,543,0,0,0,0],
    "BMI" : [33.6,26.6,23.3,28.1,43.1,25.6,31,35.3,30.5,0,37.6,38,27.1],
    "DiabetesPedigreeFunction" : [0.627,0.351,0.672,0.167,2.288,0.201,   0.248,0.134,0.158,0.232,0.191,0.537,1.441],
    "Age" : [50,31,32,21,33,30,26,29,53,54,30,34,57],
    "Outcome" : [1,0,1,0,1,0,1,0,1,1,0,1,0]
}
Appendix 4 – Statistical data based on In-Memory Data Structure 
Using the in-memory data structure illustrated in Appendix 3, the List of Tuples produced in FR4 should look something like the illustration below. The full data output is too large to include here and so only some of the data has been included to help illustrate what is required. Remember that different CSV data files will result in different data being stored. The data file you have been provided with does not include any of the data shown below. Don’t be tempted to simply copy the result below into your Jupyter Notebook.
[
    ("Pregnancies", "Glucose", 0.337),
    ("Pregnancies", "BloodPressure", -0.0025),
    ("Pregnancies", "SkinThickness", -0.7481),
    ("Pregnancies", "Insulin",  -0.4772),
    ("Pregnancies", "BMI", -0.2313),
    ("Pregnancies", "DiabetesPedigreeFunction", -0.0872),
    ("Pregnancies", "Age", 0.3428),
    ("Pregnancies", "Outcome", 0.0167),

    ("Glucose", "Pregnancies", 0.337),
    ("Glucose", "BloodPressure", 0.1429),
    ("Glucose", "SkinThickness", -0.0028),
    ("Glucose", "Insulin", 0.4304),
    ("Glucose", "BMI", 0.0584),
    ("Glucose", "DiabetesPedigreeFunction", 0.2192),
    ("Glucose", "Age", 0.5328),
    ("Glucose", "Outcome", 0.5465),

    +++++++ More data would be included here ++++++++

    ("Outcome", "Pregnancies", 0.0167),
("Outcome", "Glucose", 0.5465),
    ("Outcome", "BloodPressure", 0.0755),
    ("Outcome", "SkinThickness", 0.3585),
    ("Outcome", "Insulin", 0.3355),
    ("Outcome", "BMI", -0.0768),
    ("Outcome", "DiabetesPedigreeFunction", 0.2185),
    ("Outcome", "Age", 0.314)
]
Appendix 5 – Output table for Statistics
Using the output from the function produced in FR4, the following table outputs a subset of the available columns (as defined by the function parameter) using the border character * and padding within the cells to ensure the table is readable:
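[Illustration: example output table with * borders, padded cells, and starred significant values]

The exact layout is flexible; the sketch below shows one possible way such a table could be produced with padded cells and starred significant values (the 1.972 threshold is taken from Section A, and the function name and column names used are illustrative only):

def print_ttest_table(stat_tuples, border, include_columns):
    """Sketch for FR5: print a padded table of t-test statistics for selected columns."""
    rows = [(a, b, v) for (a, b, v) in stat_tuples
            if a in include_columns and b in include_columns]
    if not rows:
        return
    # star values that are significant at the 5% level
    cells = [(a, b, f"*{v}*" if abs(v) > 1.972 else str(v)) for (a, b, v) in rows]
    widths = [max(len(cell[i]) for cell in cells) for i in range(3)]
    for cell in cells:
        padded = "  ".join(cell[i].ljust(widths[i]) for i in range(3))
        print(f"{border} {padded} {border}")

# Example call (illustrative column names):
# print_ttest_table(results, "*", ["Glucose", "Age", "Outcome"])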


 
