Thinking Through Analytics Data
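This article walks through a data analysis from start to finish. We will explore anonymized analytics data from users of the Dataquest website to see how they learn. The data comes from two main sources:
- The database
- Data collected by the website's front end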
A Quick Look At Dataquest
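First, it helps to be clear about how the Dataquest site is structured. A learner is always working inside a mission, which is made up of a remote database and a set of concepts to learn. Each mission contains multiple screens. The list of screens is on the right, and clicking an entry jumps to that screen. A screen is either a code screen or a text screen. A code screen usually asks you to write an answer and then click run to check whether it is correct. The language used by the system is Python 3.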
Looking At Student Data
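The first dataset comes from the database and contains:
- Progress data: whether the student successfully completed a screen, along with the code the student wrote. For example, each time you finish a screen, a new record is created (whether you completed it, plus your code). Each progress record is uniquely identified by a pk value.
- Attempt data: a record of each code attempt a student made on each mission. Each progress record has one or more attempt records associated with it, and each attempt record is uniquely identified by a pk value. The screen_progress attribute of an attempt holds the pk of a progress record. It is the attempt's foreign key, and it is what links attempts to progress.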
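The foreign-key relationship described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's own code: the sample records and the helper name group_attempts_by_progress are hypothetical, and only the pk, fields, and screen_progress keys come from the data itself.

```python
from collections import defaultdict

# Hypothetical sample records that follow the structure described above.
progress = [
    {"pk": 1, "model": "missions.screenprogress",
     "fields": {"screen": 1, "user": 10, "complete": True}},
    {"pk": 2, "model": "missions.screenprogress",
     "fields": {"screen": 2, "user": 10, "complete": False}},
]
attempts = [
    {"pk": 100, "model": "missions.screenattempt",
     "fields": {"screen_progress": 1, "correct": False}},
    {"pk": 101, "model": "missions.screenattempt",
     "fields": {"screen_progress": 1, "correct": True}},
    {"pk": 102, "model": "missions.screenattempt",
     "fields": {"screen_progress": 2, "correct": False}},
]


def group_attempts_by_progress(attempts):
    """Map each progress pk to the list of attempts that point at it."""
    grouped = defaultdict(list)
    for attempt in attempts:
        # screen_progress is the foreign key holding a progress pk.
        grouped[attempt["fields"]["screen_progress"]].append(attempt)
    return grouped


attempts_by_progress = group_attempts_by_progress(attempts)
for record in progress:
    # Fall back to an empty list for a progress record with no attempts.
    related = attempts_by_progress.get(record["pk"], [])
    print(record["pk"], [a["pk"] for a in related])
```

Building the lookup once keeps the join to a single pass over the attempts, rather than rescanning the whole attempts list for every progress record.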
To keep the analysis simple, this article extracts the database records for 50 students:
# The attempts are stored in the attempts variable, and progress is stored in the progress variable.
# Here's how one progress record looks.
print("Progress Record:")
# Pretty print is a custom function we made to output json data in a nicer way.
pretty_print(progress[0])
print("\n")
# Here's how one attempt record looks.
print("Attempt Record:")
pretty_print(attempts[0])
'''
# A Progress record has three keys: fields, model, and pk. The fields key holds more detailed keys such as attempts, complete, and user.
Progress Record:
{
"fields": {
"attempts": 0,
"complete": true,
"created": "2015-04-07T21:21:57.316Z",
"last_code": "# We'll be coding in python.\n# Python is a great general purpose language, and is used in a lot of data science and machine learning applications.\n# If you don't know python, that's okay -- important concepts will be introduced as we go along.\n# In python, any line that starts with a # is called a comment, and is used to put in notes and messages.\n# It isn't part of the code, and isn't executed.",
"last_context": null,
"last_correct_code": "# We'll be coding in python.\n# Python is a great general purpose language, and is used in a lot of data science and machine learning applications.\n# If you don't know python, that's okay -- important concepts will be introduced as we go along.\n# In python, any line that starts with a # is called a comment, and is used to put in notes and messages.\n# It isn't part of the code, and isn't executed.",
"last_output": "{\"check\":true,\"output\":\"\",\"hint\":\"\",\"vars\":{},\"code\":\"# We'll be coding in python.\\n# Python is a great general purpose language, and is used in a lot of data science and machine learning applications.\\n# If you don't know python, that's okay -- important concepts will be introduced as we go along.\\n# In python, any line that starts with a # is called a comment, and is used to put in notes and messages.\\n# It isn't part of the code, and isn't executed.\"}",
"screen": 1,
"updated": "2015-04-07T21:25:07.799Z",
"user": 48309
},
"model": "missions.screenprogress",
"pk": 299076
}
# An Attempt record has the same three keys (fields, model, and pk), and its fields key likewise holds more detailed keys such as screen_progress.
Attempt Record:
{
"fields": {
"code": "# We'll be coding in python.\n# Python is a great general purpose language, and is used in a lot of data science and machine learning applications.\n# If you don't know python, that's okay -- important concepts will be introduced as we go along.\n# In python, any line that starts with a # is called a comment, and is used to put in notes and messages.\n# It isn't part of the code, and isn't executed.",
"correct": true,
"created": "2015-03-01T16:33:56.537Z",
"screen_progress": 231467,
"updated": "2015-03-01T16:33:56.537Z"
},
"model": "missions.screenattempt",
"pk": 62474
}
'''
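The pretty_print helper is custom and its source is not shown. One plausible way to get similar output, assuming the records are plain JSON-compatible dictionaries, is to wrap json.dumps from the standard library. The version below is a guess at its behavior, and the record is a shortened copy of the progress record above. It also shows that last_output is a JSON string nested inside the record, so it has to be decoded separately.

```python
import json


def pretty_print(record):
    """Assumed stand-in for the custom helper: indented JSON, sorted keys."""
    print(json.dumps(record, indent=4, sort_keys=True))


# A shortened copy of the progress record shown above.
record = {
    "pk": 299076,
    "model": "missions.screenprogress",
    "fields": {
        "attempts": 0,
        "complete": True,
        "last_output": "{\"check\":true,\"output\":\"\",\"hint\":\"\",\"vars\":{}}",
        "screen": 1,
        "user": 48309,
    },
}
pretty_print(record)

# last_output is itself a JSON string, so it needs a second decoding step.
last_output = json.loads(record["fields"]["last_output"])
print(last_output["check"])
```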
The Structure Of The Data
As the output shows, the records in both progress and attempts are dictionaries.
Progress record
- pk – the id of the record in the database
- fields
  - attempts – a count of how many attempts the student made on the screen.
  - complete – whether the student successfully passed the screen (True if they have / False if not).
  - created – what time the student first saw the screen.
  - last_code – the text of the last code the student wrote.
  - last_correct_code – the last code the student wrote that was correct. Null if they don’t have anything correct.
  - screen – the id of the screen this progress is associated with.
  - user – the id of the user this progress is associated with.
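To show how these fields can be read in practice, here is a small sketch that computes, for each user, the share of screens they have completed. The sample records and the function name completion_rate_by_user are hypothetical. Only the user and complete keys are taken from the structure listed above.

```python
from collections import Counter

# Hypothetical progress records using the fields listed above.
progress = [
    {"pk": 1, "fields": {"user": 10, "screen": 1, "complete": True, "attempts": 2}},
    {"pk": 2, "fields": {"user": 10, "screen": 2, "complete": False, "attempts": 4}},
    {"pk": 3, "fields": {"user": 11, "screen": 1, "complete": True, "attempts": 1}},
]


def completion_rate_by_user(progress):
    """Return {user id: fraction of that user's screens marked complete}."""
    seen = Counter()
    completed = Counter()
    for record in progress:
        fields = record["fields"]
        seen[fields["user"]] += 1
        if fields["complete"]:
            completed[fields["user"]] += 1
    return {user: completed[user] / seen[user] for user in seen}


rates = completion_rate_by_user(progress)
for user, rate in sorted(rates.items()):
    print(user, rate)
```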
Attempt record
- pk – the id of the record in the database
- fields
  - code – the code that was submitted for this attempt.
  - correct – whether or not the student got the answer right.
  - screen_progress – the id of the progress record this at