Dataquest User Data Analysis

Thinking Through Analytics Data

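This article walks through a data analysis from start to finish. We will explore anonymized analytics data from users of the Dataquest website to see how they learn. The data comes from two main sources:

  • The database
  • Data collected by the website's front end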

A Quick Look At Dataquest

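First, it helps to understand how Dataquest is structured. A learner is always working within a mission, and a mission is made up of a remote database and a set of concepts. Each mission contains several screens; the list of screens appears on the right, and clicking an entry jumps to that screen. A screen is either a code screen or a text screen. A code screen usually asks you to write an answer and then click Run to check whether it is correct. The platform uses Python 3.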

Looking At Student Data

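The first dataset comes from the database and contains:

  • Progress data: whether a student successfully completed a screen, plus the code the student wrote. For example, finishing a screen creates a new record holding the completion status and your code. Each progress record is uniquely identified by its pk value.
  • Attempt data: a record of each code attempt a student made. Every progress record has one or more attempt records associated with it, and each attempt record is uniquely identified by its own pk value. The screen_progress field of an attempt holds the pk of a progress record; it is the attempt's foreign key and links the attempt to its progress record.

To keep the analysis simple, this article uses the database records of 50 students: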

# The attempts are stored in the attempts variable, and progress is stored in the progress variable.

# Here's how one progress record looks.
print("Progress Record:")
# Pretty print is a custom function we made to output json data in a nicer way.
pretty_print(progress[0])
print("\n")

# Here's how one attempt record looks.
print("Attempt Record:")
pretty_print(attempts[0])
'''
# A Progress record has three keys: fields, model, and pk. fields holds the more detailed keys such as attempts, complete, and user.
Progress Record:
{
    "fields": {
        "attempts": 0,
        "complete": true,
        "created": "2015-04-07T21:21:57.316Z",
        "last_code": "# We'll be coding in python.\n# Python is a great general purpose language, and is used in a lot of data science and machine learning applications.\n# If you don't know python, that's okay -- important concepts will be introduced as we go along.\n# In python, any line that starts with a # is called a comment, and is used to put in notes and messages.\n# It isn't part of the code, and isn't executed.",
        "last_context": null,
        "last_correct_code": "# We'll be coding in python.\n# Python is a great general purpose language, and is used in a lot of data science and machine learning applications.\n# If you don't know python, that's okay -- important concepts will be introduced as we go along.\n# In python, any line that starts with a # is called a comment, and is used to put in notes and messages.\n# It isn't part of the code, and isn't executed.",
        "last_output": "{\"check\":true,\"output\":\"\",\"hint\":\"\",\"vars\":{},\"code\":\"# We'll be coding in python.\\n# Python is a great general purpose language, and is used in a lot of data science and machine learning applications.\\n# If you don't know python, that's okay -- important concepts will be introduced as we go along.\\n# In python, any line that starts with a # is called a comment, and is used to put in notes and messages.\\n# It isn't part of the code, and isn't executed.\"}",
        "screen": 1,
        "updated": "2015-04-07T21:25:07.799Z",
        "user": 48309
    },
    "model": "missions.screenprogress",
    "pk": 299076
}

# An Attempt record has the same three keys: fields, model, and pk. Its fields also holds more detailed keys, such as screen_progress.
Attempt Record:
{
    "fields": {
        "code": "# We'll be coding in python.\n# Python is a great general purpose language, and is used in a lot of data science and machine learning applications.\n# If you don't know python, that's okay -- important concepts will be introduced as we go along.\n# In python, any line that starts with a # is called a comment, and is used to put in notes and messages.\n# It isn't part of the code, and isn't executed.",
        "correct": true,
        "created": "2015-03-01T16:33:56.537Z",
        "screen_progress": 231467,
        "updated": "2015-03-01T16:33:56.537Z"
    },
    "model": "missions.screenattempt",
    "pk": 62474
}
'''
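
The foreign-key link between the two record types, where an attempt's screen_progress holds the pk of a progress record, can be sketched in a few lines. This is only an illustration, not Dataquest's own code: the sample_progress and sample_attempts lists are hand-made stand-ins that keep just the keys needed here, and group_attempts_by_progress is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical, trimmed-down records that mimic the structure shown above.
sample_progress = [
    {"pk": 1, "model": "missions.screenprogress",
     "fields": {"screen": 1, "user": 10, "complete": True}},
    {"pk": 2, "model": "missions.screenprogress",
     "fields": {"screen": 2, "user": 10, "complete": False}},
]
sample_attempts = [
    {"pk": 100, "model": "missions.screenattempt",
     "fields": {"screen_progress": 1, "correct": True}},
    {"pk": 101, "model": "missions.screenattempt",
     "fields": {"screen_progress": 2, "correct": False}},
    {"pk": 102, "model": "missions.screenattempt",
     "fields": {"screen_progress": 2, "correct": False}},
]


def group_attempts_by_progress(attempts):
    """Map each progress pk to the list of attempts that reference it."""
    grouped = defaultdict(list)
    for attempt in attempts:
        # screen_progress is the foreign key pointing at a progress pk.
        grouped[attempt["fields"]["screen_progress"]].append(attempt)
    return grouped


attempts_by_progress = group_attempts_by_progress(sample_attempts)

for record in sample_progress:
    # Fall back to an empty list if no attempt references this record.
    related = attempts_by_progress.get(record["pk"], [])
    print("progress", record["pk"], "has", len(related), "attempt(s)")
```

Building the lookup once keeps the join to a single pass over the attempts, which matters little for 50 students but avoids rescanning the attempts list for every progress record.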

The Structure Of The Data

Each record in progress and attempts is a dictionary.

Progress record

  • pk – the id of the record in the database
  • fields
    • attempts – a count of how many attempts the student made on the
      screen.
    • complete – whether the student successfully passed the screen (True
      if they have / False if not).
    • created – what time the student first saw the screen.
    • last_code – the text of the last code the student wrote.
    • last_correct_code – the last code the student wrote that was
      correct. Null if they don’t have anything correct.
    • screen – the id of the screen this progress is associated with.
    • user – the id of the user this progress is associated with.
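
As a quick illustration of how these fields can be read, the sketch below computes the share of screens each user completed. It assumes records shaped like the progress record above; the sample data and the completion_rate_by_user helper are made up for this example and are not part of the original dataset.

```python
from collections import Counter

# Hypothetical progress records with only the fields used below.
sample_progress = [
    {"pk": 1, "fields": {"user": 10, "screen": 1, "complete": True, "attempts": 1}},
    {"pk": 2, "fields": {"user": 10, "screen": 2, "complete": False, "attempts": 3}},
    {"pk": 3, "fields": {"user": 11, "screen": 1, "complete": True, "attempts": 2}},
]


def completion_rate_by_user(progress_records):
    """Return {user id: fraction of that user's screens marked complete}."""
    seen = Counter()
    completed = Counter()
    for record in progress_records:
        fields = record["fields"]
        seen[fields["user"]] += 1
        if fields["complete"]:
            completed[fields["user"]] += 1
    return {user: completed[user] / seen[user] for user in seen}


rates = completion_rate_by_user(sample_progress)
print(rates)
```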

Attempt record

  • pk – the id of the record in the database
  • fields
    • code – the code that was submitted for this attempt.
    • correct – whether or not the student got the answer right.
    • screen_progress – the id of the progress record this attempt is associated with.
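
Attempt records also carry a created timestamp, as the sample output earlier shows (for example "2015-03-01T16:33:56.537Z"). Assuming every timestamp follows that same format, a rough sketch for ordering one screen's attempts chronologically looks like this. The sample records and the created_at helper are hypothetical.

```python
from datetime import datetime

# Format inferred from the sample records; treat it as an assumption.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Hypothetical attempts for a single progress record, deliberately out of order.
sample_attempts = [
    {"pk": 201, "fields": {"screen_progress": 7, "correct": True,
                           "created": "2015-03-01T16:35:10.120Z"}},
    {"pk": 200, "fields": {"screen_progress": 7, "correct": False,
                           "created": "2015-03-01T16:33:56.537Z"}},
]


def created_at(attempt):
    """Parse an attempt's created field into a datetime object."""
    return datetime.strptime(attempt["fields"]["created"], TIMESTAMP_FORMAT)


# Sort attempts from earliest to latest.
ordered = sorted(sample_attempts, key=created_at)

# Seconds between the first and the last attempt on this screen.
seconds_between = (created_at(ordered[-1]) - created_at(ordered[0])).total_seconds()
print([attempt["pk"] for attempt in ordered], seconds_between)
```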