Hands-On Practice: Student Answer-Grading Logs from an Online Exam System

Author: 大数据与云计算开发者Cd

Tags: hbase hadoop spark

Article link: https://blog.csdn.net/qq_56795768/article/details/122340815

I. Environment Requirements: a Hadoop + Hive + Spark + HBase development environment

III. Data Description

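This is a student answer-grading log from an online exam system. Each record contains the log generation time, the question difficulty coefficient, the ID of the knowledge point the question belongs to, the ID of the student who answered, the question ID, and the grading result. The log structure is as follows: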

IV. Functional Requirements

1. Data preparation

Create the directory /app/data/exam in HDFS and upload answer_question.log to it.

[root@gree2 exam]# hdfs dfs -mkdir -p /app/data/exam

[root@gree2 exam]# hdfs dfs -put ./answer_question.log /app/data/exam

2. In Spark-Shell, load the answer_question.log file from HDFS and complete the following analysis with RDDs; other Spark APIs may also be used.

scala> val answer_logRdd=sc.textFile("/app/data/exam/answer_question.log")

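① Extract the values of four fields from the log: knowledge point ID, student ID, question ID, and grading result.

② Save the extracted knowledge point ID, student ID, question ID, and grading result values as files under the HDFS directory /app/data/result.

// Split each line on "_": x(1) is the knowledge point ID, x(2) is the student ID, and x(3) holds "<question ID>r<result>,...". The fields are joined with "," so that they match the field delimiter of the Hive table ex_exam_record.

scala> answer_logRdd.map(x => x.split("_")).map(x => { val parts = x(3).split("r"); (x(1), x(2), parts(0), parts(1).trim.split(",")(0)) }).map(x => x.productIterator.mkString(",")).saveAsTextFile("/app/data/result")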
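The parsing inside the map step can be tried without a cluster. The following is a minimal plain-Scala sketch of that logic, not the author's code: the object name LogLineParser, the helper names, and above all the sample line are hypothetical, since the real log layout is not reproduced in this article. The sample only mimics the "_"-separated shape that the split calls expect.

```scala
// Plain-Scala sketch (no Spark required) of the per-line field extraction.
object LogLineParser {
  // Hypothetical sample line: invented for illustration only, shaped so that
  // splitting on "_" yields the knowledge point, student, and question fields.
  val sampleLine: String =
    "2019-06-16 14:24:34 INFO demo_34_8195023659_3004r0.5,2019-06-16_14_24_34"

  /** Returns (knowledgePointId, studentId, questionId, result) as strings. */
  def parse(line: String): (String, String, String, String) = {
    val fields = line.split("_")        // fields(1) = knowledge point, fields(2) = student
    val parts = fields(3).split("r")    // "3004r0.5,2019-06-16" -> "3004" and "0.5,2019-06-16"
    val questionId = parts(0)
    val result = parts(1).trim.split(",")(0) // keep only the text before the first comma
    (fields(1), fields(2), questionId, result)
  }

  /** Joins the four extracted fields with the given separator. */
  def render(line: String, sep: String): String =
    parse(line).productIterator.mkString(sep)
}
```

Calling LogLineParser.render(LogLineParser.sampleLine, sep) with the separator used by the save step shows what one output line would look like. Whatever separator is chosen has to agree with the field delimiter declared for the Hive table that reads /app/data/result.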

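3. Create the HBase table

In HBase, create the namespace exam and, within it, the table analysis, using the student ID as the RowKey. The table has two column families, accuracy and question. The accuracy column family stores each student's answer-accuracy statistics (total score accuracy:total_score, number of questions answered accuracy:question_count, accuracy rate accuracy:accuracy). The question column family stores the IDs of the questions each student got right, wrong, or half right (right question:right, wrong question:error, half right question:half).

hbase(main):003:0> create_namespace 'exam'

hbase(main):004:0> create 'exam:analysis','accuracy','question'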

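4. In Hive, create the database exam. In that database, create the external table ex_exam_record pointing to the Spark-processed log data under /app/data/result; create the external table ex_exam_anlysis mapped to the accuracy column family of the HBase analysis table; and create the external table ex_exam_question mapped to the question column family of the HBase analysis table.

The structure of the ex_exam_anlysis table is as follows:

-- Create the database, then the external table ex_exam_record

create database if not exists exam;
use exam;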

create external table ex_exam_record(
    topic_id string,
    student_id string,
    question_id string,
    score Double
)
row format delimited fields terminated by ","
stored as textfile location "/app/data/result/";

-- Create the external table ex_exam_anlysis mapped to the accuracy column family of the HBase analysis table, and the external table ex_exam_question mapped to the question column family of the same table

create external table ex_exam_anlysis(
    student_id string,
    total_score float,
    question_count int,
    accuracy float
)
stored by "org.apache.hadoop.hive.hbase.HBaseStorageHandler"
with serdeproperties ("hbase.columns.mapping"=":key,accuracy:total_score,accuracy:question_count,accuracy:accuracy")
tblproperties ("hbase.table.name"="exam:analysis");



create external table if not exists ex_exam_question
(
    student_id string,
    `right`    string,
    half       string,
    error      string
)
stored by "org.apache.hadoop.hive.hbase.HBaseStorageHandler"
with serdeproperties ("hbase.columns.mapping"=":key,question:right,question:half,question:error")
tblproperties ("hbase.table.name"="exam:analysis");

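5. Using the data in ex_exam_record, compute each student's total score, number of questions answered, and accuracy rate, and save the results to ex_exam_anlysis. The accuracy rate is calculated as: accuracy rate = total score / number of questions answered.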

insert into ex_exam_anlysis
select t.student_id,t.total,t.num,t.rate from
(select student_id,sum(score) as total,count(1) as num,(sum(score)/count(1)) as rate
from ex_exam_record
group by student_id) t;
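To make the aggregation easier to check by hand, here is a small in-memory Scala sketch of the same computation: group by student, sum the scores, count the questions, and divide. The names AccuracyStats, Record, and Stats are hypothetical and exist only for this illustration; the Hive statement above remains the actual solution.

```scala
// In-memory sketch of "total score, question count, accuracy" per student.
object AccuracyStats {
  // Hypothetical record type mirroring the columns of ex_exam_record.
  final case class Record(topicId: String, studentId: String, questionId: String, score: Double)

  // Hypothetical result type mirroring the accuracy column family.
  final case class Stats(totalScore: Double, questionCount: Int, accuracy: Double)

  def perStudent(records: Seq[Record]): Map[String, Stats] =
    records.groupBy(_.studentId).map { case (studentId, rows) =>
      val total = rows.map(_.score).sum          // sum(score)
      val count = rows.size                      // count(1)
      studentId -> Stats(total, count, total / count) // accuracy = total / count
    }
}
```

For example, a student with scores 1, 0.5, and 0 on three questions gets a total score of 1.5, a question count of 3, and an accuracy rate of 0.5.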

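6. Using the data in ex_exam_record, compile for each student the lists of questions answered correctly, incorrectly, and half correctly.

① Separate the question IDs with commas and save them to the ex_exam_question table.

② After the statistics are complete, scan the exam:analysis table in HBase Shell and display only the data in the question column family, for example with scan 'exam:analysis', {COLUMNS => 'question'}.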

insert into ex_exam_question
select student_id,
concat_ws(",",collect_list(case when score=1 then question_id end)) as `right`,
concat_ws(",",collect_list(case when score=0.5 then question_id end)) as half,
concat_ws(",",collect_list(case when score=0 then question_id end)) as error
from ex_exam_record group by student_id;
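The query relies on a case expression without an else branch, which yields NULL for non-matching rows; as far as I know, collect_list skips NULL values, so each list ends up holding only the matching question IDs. The same bucketing can be sketched in plain Scala as below. The names QuestionLists, Record, and Lists are hypothetical and used only for this illustration.

```scala
// In-memory sketch of bucketing question IDs by score for each student.
object QuestionLists {
  // Hypothetical record type: the three columns the query actually uses.
  final case class Record(studentId: String, questionId: String, score: Double)

  // Hypothetical result type mirroring the question column family.
  final case class Lists(right: String, half: String, error: String)

  def perStudent(records: Seq[Record]): Map[String, Lists] =
    records.groupBy(_.studentId).map { case (studentId, rows) =>
      // Comma-joined IDs of the questions with exactly this score
      // (an empty string when no question matches).
      def ids(score: Double): String =
        rows.filter(_.score == score).map(_.questionId).mkString(",")
      studentId -> Lists(ids(1.0), ids(0.5), ids(0.0))
    }
}
```

A student with no half-right questions gets an empty string in the half slot, which corresponds to concat_ws being applied to an empty list.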
