36103 Statistical Thinking for Data Science

SUBJECT OUTLINE


Subject description

Statistical thinking is the foundational mindset of data science: it emphasizes the use of statistical principles and methods to understand, analyze, and derive meaningful insights from data. This subject equips students with essential skills and concepts for applying statistical thinking in applied data science. Students are first introduced to fundamental statistical principles, while simultaneously developing an understanding of modern methods for statistical inference and gaining hands-on experience with real-world data. They then explore a range of statistical models and estimation techniques, applying their knowledge across a complete data science research cycle. Working in teams, students learn how to formulate research questions, address them using formal statistics and real-world datasets, and communicate their findings effectively through both oral presentations and written reports.

The subject begins with more teaching-intensive methods, such as workshops and lectures, to give students the technical and conceptual know-how to work as practicing data scientists. As the subject progresses, students move increasingly towards a self-directed learning mode, giving both teams and individuals the flexibility to enhance their statistical thinking and skills.

Upon completion of the subject, students will have a robust foundation in the technical, conceptual, and practical aspects of data science, empowering them to continue their development as data scientists.

Subject learning objectives (SLOs)

Upon successful completion of this subject students should be able to:

1. Manage the complexity of real data science projects and their inevitable compromises

2. Formulate authentic data science questions precise enough to be answered by valid statistical techniques

3. Justify the use of different statistical concepts and tools to audiences from a wide range of backgrounds

4. Find, clean, and merge datasets from a range of sources to answer real world data science problems

5. Apply statistical methods that are appropriate to a dataset and stakeholder requirements

6. Interpret the results of a statistical analysis correctly, visualizing and reporting upon them in ways that create value for, and are sensitive to the needs of, a wide range of stakeholders

7. Collaborate with and contribute to the professional community of data scientists, both local and global

Course intended learning outcomes (CILOs)

This subject also contributes specifically to the development of the following course outcomes:

Exploring and testing models and describing behaviours of complex systems

Explore and test models and generalisations for describing the behaviour of sociotechnical systems and selecting data sources, taking into account the needs and values of different contexts and stakeholders (1.2)

Making the invisible visible

Use transdisciplinary approaches to seeing and doing to uncover underrepresented, or misrepresented, elements of a system (1.4)

Exploring, interpreting and visualising data

Explore, analyse, manipulate, interpret and visualise data using data science techniques, software and technologies to make sense of data rich environments (2.2)

Designing and managing data investigations

Apply and assess data science concepts, theories, practices and tools for designing and managing data discovery investigations in professional environments that draw upon diverse data sources, including efforts to shed light on underrepresented components (2.4)

Developing strategies for innovation

Explore, interrogate, generate, apply, test and evaluate problem-solving strategies to extract economic, business, social, strategic or other value from data (3.1)

Working together

Develop a collaborative and team-oriented mindset to harness value for stakeholders to produce innovative solutions to challenges (3.3)

Engaging audiences

Explore and craft interpretative narratives that engage key audiences with data analytics and potential significance for action, at a societal, industrial, organisational, group or individual level (4.2)

Informing decision making

Develop, test, justify and deliver data project propositions, methodologies, analytics outcomes and recommendations for informing decision-making, both to specialist and non-specialist audiences (4.3)

Contribution to the development of graduate attributes

Your experiences as a student in this subject support you to develop the following graduate attributes (GA):

GA 1 Sociotechnical systems thinking

GA 2 Creative, analytical and rigorous sense making

GA 3 Create value in problem solving and inquiry

GA 4 Persuasive and robust communication

Teaching and learning strategies

Authentic problem-based learning: This subject relies heavily upon the principle that students learn best by doing. It offers a range of authentic data science problems to solve that help to develop students’ statistical thinking about complex problems. Students work on real-world data analysis problems using datasets that they create using modern data harvesting techniques. These are used to answer realistic data science questions in broad areas of topical interest. This exposes them to the true ambiguities, constraints, and complexities of working as a data scientist for a variety of different stakeholders.

Blend of online and face-to-face activities: This subject is offered through a series of block sessions blending online with face-to-face learning. Students interact face-to-face with each other and the teaching team in three intensive modules that require the completion of both preparation and after-class activities. They concurrently use a range of complementary online resources to develop their statistical thinking according to identified weaknesses in their background knowledge. They are expected to engage in online discussion and to actively participate in other blended activities.

Collaborative work: We place a strong emphasis on group activities and collaboration in diverse teams. As a data science professional you need to approach professional projects and challenges by working with people from different backgrounds, expectations, and expertise. This course simulates that environment by requiring students to work with a team of peers who come from many different backgrounds. Group assessments help students to develop effective strategies for working as a part of a data science team, as well as an appreciation that there are diverse perspectives on many different topics in data science and innovation.

Self-paced evaluation and improvement: This subject takes students from an exceptionally wide range of backgrounds, some of whom are better versed in statistical methods and Python than others. We help all students to self-diagnose their strengths and weaknesses, and to work to improve in areas that they identify as a priority for the professional niche that they would like to occupy as a practicing data scientist. Students choose their own path through a wide variety of curated resources as needed.

Embedding English Language: An aim of this subject is to help you develop academic and professional language and communication skills in order to succeed at university and in the workplace. To determine your current academic language proficiency, you are required to complete an online language screening task, OPELA (information available at https://www.edu.au/research-and-teaching/learning-and-teaching/enhancing/language-and-learning/about-opela-student). If you receive a Basic grade for OPELA, you must attend additional Language Development Tutorials (each week from week [3/4] to week [11/12]) in order to pass the subject. These tutorials are designed to support you to develop your language and communication skills. Students who do not complete the OPELA and/or do not attend 80% of the Language Development Tutorials will receive a Fail X grade.

Assessment

This subject is 100% coursework-based, with no exams. A detailed assessment brief for each task is available on Canvas; please refer to it throughout the subject.

Assessments are a blend of individual and team-based work.

Assessment task 1: Exploration of data skills and issues

Objective(s): 3 and 5

Type: Report

Groupwork: Individual

Weight: 20%

Task: In this assessment, students conduct exploratory data analysis (EDA) on a marketing campaign dataset from a telecommunication company. The company recently launched a marketing campaign to promote the adoption of its new subscription plan among customers. It seeks assistance in gaining a comprehensive understanding of its customers and identifying the customer segments that are most responsive to marketing campaigns. The response variable, subscribed, indicates whether the client subscribed to the new plan, which was the objective of the campaign.

The dataset may have issues such as missing information and data errors. Identifying and handling these issues is part of the assessment.

The task requires applying at least three distinct exploratory data analysis techniques to gain preliminary insights from the data.

Length: A maximum of 7 pages

Due: 11.59pm Sunday 10 March 2024
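As an illustration only, the following Python sketch shows the kind of checks Assessment task 1 calls for: counting missing values, normalising or dropping malformed values of the `subscribed` response, and comparing subscription rates across customer segments. The records and the `age_group` and `channel` columns are hypothetical stand-ins (only `subscribed` is named in the brief), and the function names are illustrative; the brief does not prescribe any particular tool or technique.

```python
from collections import Counter, defaultdict

# Hypothetical records standing in for the campaign dataset. Only the
# `subscribed` column is named in the brief; the others are illustrative.
records = [
    {"age_group": "18-29", "channel": "email", "subscribed": "yes"},
    {"age_group": "18-29", "channel": "sms", "subscribed": "Yes "},   # inconsistent label
    {"age_group": "30-44", "channel": "email", "subscribed": "yes"},
    {"age_group": "30-44", "channel": None, "subscribed": "no"},      # missing channel
    {"age_group": "30-44", "channel": "sms", "subscribed": "unknown"},  # data error
    {"age_group": "45+", "channel": "sms", "subscribed": "no"},
    {"age_group": "45+", "channel": "email", "subscribed": "no"},
    {"age_group": None, "channel": "sms", "subscribed": "no"},        # missing age group
]


def missing_counts(rows):
    """Count missing (None or blank-string) values per column."""
    counts = Counter()
    for row in rows:
        for col, val in row.items():
            if val is None or (isinstance(val, str) and not val.strip()):
                counts[col] += 1
    return dict(counts)


def clean_response(rows):
    """Normalise `subscribed` labels and drop rows with unrecognised values."""
    cleaned = []
    for row in rows:
        label = (row["subscribed"] or "").strip().lower()
        if label in ("yes", "no"):
            cleaned.append({**row, "subscribed": label})
    return cleaned


def rate_by_segment(rows, col):
    """Share of subscribers within each level of `col`, skipping missing levels."""
    totals, hits = defaultdict(int), defaultdict(int)
    for row in rows:
        level = row[col]
        if level is None:
            continue
        totals[level] += 1
        hits[level] += int(row["subscribed"] == "yes")
    return {level: hits[level] / totals[level] for level in totals}


print(missing_counts(records))
clean = clean_response(records)
print(rate_by_segment(clean, "age_group"))
print(rate_by_segment(clean, "channel"))
```

On the real dataset the same steps would more likely be done with a data-frame library, and segment-level rate tables like these are usually paired with visualisations.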

Assessment task 2: Data analysis project

Objective(s): 1, 2, 3, 4, 5, 6 and 7

Type: Project

Groupwork: Group, group assessed

Weight: 30%

Task: Students work in teams of 5-7 people with complementary skills and backgrounds. Each team selects a context and works to define research questions that help them to propose, execute, and disseminate a data science project.

Project presentation (group) worth 15%

Students work in teams to carry out their proposed project. Projects are presented to the class.

Project report (group) worth 15%

Students work in teams to carry out their proposed project. Project reports are submitted in written format.

Length: Group Presentation: 10-15 minutes

Group Report: 500-700 words

Due: See Further information.

Further information:

Presentation: Saturday 27 April 2024, online

Report: Due 11.59pm Sunday 12 May 2024

Assessment task 3: Individual project exploration

Objective(s): 2, 3 and 6

Type: Project

Groupwork: Individual

Weight: 50%

Task: Assessment 3 builds on Assessment 1.

The objective of this assessment is to develop data science models that provide insights into the business question of which customer segments are most responsive to marketing campaigns. Your report must show results for at least two different sets of predictions.

• At least one of your models should be a parametric model.

• At least one of your models should be a non-parametric model.

You should use at least one estimation method introduced in Module 3. Your report should include the following elements:

1. Justification for model selection, including an explanation of the configuration and training choices made.

2. Parametric estimates and their corresponding interpretations.

3. A comparative analysis of the models, using cross-validation or validation metrics.

4. Proficiency in data mining, demonstrated by extracting relevant business insights from the data and articulating them effectively.

Length: 700 to 1000 words

Submission: via Canvas

Due: 11.59pm Sunday 2 June 2024
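To make the parametric versus non-parametric comparison in Assessment task 3 concrete, here is a minimal, standard-library-only Python sketch. It assumes logistic regression as the parametric model and k-nearest neighbours as the non-parametric model, fitted to synthetic data with a single feature, and compares them by 5-fold cross-validated accuracy. The brief does not prescribe these models, and the estimation methods of Module 3 are not listed in this outline, so maximum-likelihood estimation by gradient ascent is an assumption; all function names are hypothetical. In practice a library such as scikit-learn would normally be used instead.

```python
import math
import random


def make_data(n, seed=0):
    """Synthetic (feature, label) pairs: higher x raises the chance of y = 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-3, 3)
        p = 1 / (1 + math.exp(-(0.5 + 2.0 * x)))  # assumed true model
        data.append((x, 1 if rng.random() < p else 0))
    return data


def fit_logistic(train, lr=0.1, epochs=500):
    """Parametric model: estimate (b0, b1) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(train)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in train:
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def predict_logistic(params, x):
    """Predict 1 when the fitted log-odds are positive (probability > 0.5)."""
    b0, b1 = params
    return 1 if b0 + b1 * x > 0 else 0


def predict_knn(train, x, k=5):
    """Non-parametric model: majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return 1 if 2 * sum(y for _, y in nearest) > k else 0


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(data, k=5):
    """Mean held-out accuracy of each model across k folds."""
    scores = {"logistic": [], "knn": []}
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [d for i, d in enumerate(data) if i not in held_out]
        test = [data[i] for i in fold]
        params = fit_logistic(train)
        scores["logistic"].append(
            sum(predict_logistic(params, x) == y for x, y in test) / len(test))
        scores["knn"].append(
            sum(predict_knn(train, x) == y for x, y in test) / len(test))
    return {name: sum(s) / len(s) for name, s in scores.items()}


data = make_data(200)
print("Fitted (b0, b1):", fit_logistic(data))
print("5-fold CV accuracy:", cross_validate(data))
```

The fitted slope `b1` is the estimated change in log-odds of subscribing per unit increase in the feature, so `math.exp(b1)` is the corresponding odds ratio; the k-nearest-neighbours model yields predictions but no comparable coefficients, which is the trade-off the parametric/non-parametric requirement is meant to surface.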

Minimum requirements

To pass the subject, students must achieve a minimum overall mark of 50%.

Additionally, it is a requirement of this subject that all students complete OPELA. Students who received a Basic grade in the OPELA are required to attend 80% of the Language Development Tutorials in order to pass the subject.

Students who do not complete the OPELA and/or do not attend 80% of the Language Development Tutorials will receive a Fail X grade.

Recommended texts

Other learning resources:

Depending on your background and what you plan to learn, you will find at least one of the following resources useful. You are not expected to read all of these resources cover-to-cover. Use them to help you solve specific problems.

To learn statistical concepts:

James, G., Witten, D., Hastie, T. and Tibshirani, R. (2021). An Introduction to Statistical Learning with Applications in R (Second Edition). New York: Springer. (An Introduction to Statistical Learning (statlearning.com))

Bruce, P., & Bruce, A. (2017). Practical Statistics for Data Scientists: 50 Essential Concepts. O'Reilly Media, Inc. You can get it here. We will refer to it as PSDS in this subject.

To learn linear regression modelling: Caffo, B. Regression Models for Data Science in R. Leanpub. You can get a free copy here: leanpub.com/regmods/read. It is written as a companion book to the Coursera Regression Models class, and also has a series of accompanying YouTube videos. We will refer to it as RM throughout this subject.

To run a good Data Science project: Godsey, B. (2017). Think Like a Data Scientist: Tackle the data science process step-by-step. Manning Publications Co. You can get it here.
