Comprehensive Exercise

1. Data preprocessing stage
2. Data loading (into Hive) stage
3. Data analysis stage
4. Saving the data to the database (HBase) stage
5. Data query and display stage
The data format table and sample data are given below. Read the data description first, then work through the corresponding exercises.

Data description:
Table 1-1 Video table (image not reproduced; the ten fields are videoId, uploader, age, category, length, views, rate, ratings, comments, relatedId, as used in the create-table statements below)
Raw data:
qR8WRLrO2aQ:mienge:406:People & Blogs:599:2788:5:1:0:4UUEKhr6vfA:zvDPXgPiiWI:TxP1eXHJQ2Q:k5Kb1K0zVxU:hLP_mJIMNFg:tzNRSSTGF4o:BrUGfqJANn8:OVIc-mNxqHc:gdxtKvNiYXc:bHZRZ-1A-qk:GUJdU6uHyzU:eyZOjktUb5M:Dv15_9gnM2A:lMQydgG1N2k:U0gZppW_-2Y:dUVU6xpMc6Y:ApA6VEYI8zQ:a3_boc9Z_Pc:N1z4tYob0hM:2UJkU2neoBs
Data after preprocessing:
qR8WRLrO2aQ:mienge:406:People,Blogs:599:2788:5:1:0:4UUEKhr6vfA,zvDPXgPiiWI,TxP1eXHJQ2Q,k5Kb1K0zVxU,hLP_mJIMNFg,tzNRSSTGF4o,BrUGfqJANn8,OVIc-mNxqHc,gdxtKvNiYXc,bHZRZ-1A-qk,GUJdU6uHyzU,eyZOjktUb5M,Dv15_9gnM2A,lMQydgG1N2k,U0gZppW_-2Y,dUVU6xpMc6Y,ApA6VEYI8zQ,a3_boc9Z_Pc,N1z4tYob0hM,2UJkU2neoBs

1. Preprocess the raw data into the format of the preprocessed sample data shown above.

Looking at the raw data, each field is separated by ":". A video can belong to multiple categories, which are separated by the "&" character with a space on each side of it; a video can likewise have multiple related videos, which are also separated by ":". To make the later analysis easier, we first clean and restructure the data.
That is: separate the categories of each record with "," (dropping the surrounding spaces), and separate the multiple related-video ids with "," as well.
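A minimal standalone sketch of this cleanup rule is shown here (the class and method names are only illustrative and are not part of the original answer; the full MapReduce implementation follows below):

import org.apache.commons.lang.StringUtils;

public class CleanLine {

    // Clean one raw record: the ":" separators between related-video ids become ",",
    // and " & " between categories becomes "," (which also drops the surrounding spaces).
    public static String clean(String line) {
        // The first 9 fields end at the 9th ":"; everything after it is related-video ids.
        int index = StringUtils.ordinalIndexOf(line, ":", 9);
        if (index != -1) {
            line = line.substring(0, index + 1) + line.substring(index + 1).replace(":", ",");
        }
        return line.replace(" & ", ",");
    }

    public static void main(String[] args) {
        String raw = "qR8WRLrO2aQ:mienge:406:People & Blogs:599:2788:5:1:0:4UUEKhr6vfA:zvDPXgPiiWI";
        // Prints: qR8WRLrO2aQ:mienge:406:People,Blogs:599:2788:5:1:0:4UUEKhr6vfA,zvDPXgPiiWI
        System.out.println(clean(raw));
    }
}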
1. Data preprocessing stage
Result [screenshot not reproduced]

Implementation code:
Mapper code

package split;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class VodeoMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        String line = value.toString();
        if (line.isEmpty()) {
            return;
        }
        // The first 9 fields end at the 9th ":"; everything after it is the related-video ids.
        int index = StringUtils.ordinalIndexOf(line, ":", 9);
        if (index != -1) {
            // Replace the ":" separators between the related-video ids with ",".
            String related = line.substring(index + 1).replace(":", ",");
            line = line.substring(0, index + 1) + related;
        }
        // Replace " & " between categories with "," (this also drops the surrounding spaces).
        // Records whose category contains no "&" are passed through unchanged instead of being dropped.
        line = line.replace(" & ", ",");
        context.write(NullWritable.get(), new Text(line));
    }
}

Reducer code

package split;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class VodeiReducer extends Reducer<NullWritable,Text,NullWritable,Text> {

    @Override
    protected void reduce(NullWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // No aggregation is needed; every cleaned record is simply written back out.
        for (Text value : values) {
            context.write(NullWritable.get(),value);
        }
    }
}

Utility code [screenshot not reproduced]
Driver code

package split;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class VideoSplit {

    public static void main(String[] args) throws Exception {
        Configuration con = new Configuration();
        Job job = Job.getInstance(con);
        job.setMapperClass(VodeoMapper.class);
        job.setReducerClass(VodeiReducer.class);
        job.setJarByClass(VideoSplit.class);

        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        // Local input file and output directory (the output directory must not exist before the job runs).
        FileInputFormat.setInputPaths(job,new Path("G:\\Adata\\考核\\考试练习题\\video.txt"));
        FileOutputFormat.setOutputPath(job,new Path("G:\\Adata\\考核\\考试练习题\\video"));

        boolean b = job.waitForCompletion(true);
        System.exit(b?0:1);

    }
}

2. Load the preprocessed data into Hive

2.1 Create the database and tables

	Create a database named: video (the create statement is given right after this list)
	Create the original-data tables:
	video table: video_ori   user table: video_user_ori
	Create the ORC-format tables:
	video table: video_orc   user table: video_user_orc
	Give the create statements for the original tables.
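The original write-up does not show the statement that creates the video database; a minimal version would be:

create database if not exists video;
use video;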

Create the video_ori video table:
create table video_ori(
videoId string,
uploader string,
age int,
category array<string>,
length int,
views int,
rate float,
ratings int,
comments int,
relatedId array<string>)
row format delimited
fields terminated by ":"
collection items terminated by ","
stored as textfile;
Create the video_user_ori user table:
create table video_user_ori(
uploader string,
videos int,
friends int)
row format delimited
fields terminated by ","
stored as textfile;

Write the create statements for the ORC-format tables:
Create the video_orc table (the column types match video_ori so that a plain insert ... select * works):
create table video_orc(
videoId string,
uploader string,
age int,
category array<string>,
length int,
views int,
rate float,
ratings int,
comments int,
relatedId array<string>)
stored as orc;

Create the video_user_orc table:
create table video_user_orc(
uploader string,
videos int,
friends int)
stored as orc;

2.2 Load the preprocessed video data into the original table video_ori and the raw user data into video_user_ori

Write the load statements:
video_ori:
load data local inpath '/opt/part-r-00000' overwrite into table video_ori;
video_user_ori:
load data local inpath '/opt/user.txt' into table video_user_ori;

2.3 Query the original tables and insert the data into the corresponding ORC tables

Write the insert statements:
video_orc:

insert into table video_orc select * from video_ori;
video_user_orc:
insert into table video_user_orc select * from video_user_ori;
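As a quick sanity check (not part of the original answer), a row count on the ORC tables should match the source tables:

select count(*) from video_orc;
select count(*) from video_user_orc;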

3. Run HiveQL queries on the loaded data

3.1 From the video table, find the videos whose rating is 5 and save the query result to /export/rate.txt

Write the SQL statement:
hive -e "select * from video.video_orc where rate=5" > /export/rate.txt

3.2 From the video table, find the videos with more than 100 comments and save the query result to /export/comments.txt

Write the SQL statement:
hive -e "select * from video.video_orc where comments > 100" > /export/comments.txt

4. Save the Hive analysis results to HBase

4.1 Create the corresponding Hive external tables

Statement for creating the rate external table:
create external table rate(
videoId string,
uploader string,
age string,
category string,
length string,
views string,
rate string,
ratings string,
comments string,
relatedId string)
row format delimited
fields terminated by "\t"
stored as textfile;

Statement for creating the comments external table:
create external table comments(
videoId string,
uploader string,
age string,
category string,
length string,
views string,
rate string,
ratings string,
comments string,
relatedId string)
row format delimited
fields terminated by "\t"
stored as textfile;

4.2 Load the result data from step 3 into the external tables

Write the load statement for the rate table:
load data local inpath '/export/rate.txt' into table rate;
Write the load statement for the comments table:
load data local inpath '/export/comments.txt' into table comments;

4.3 Create Hive managed tables mapped to HBase

Give the statements for this step.
The Hive tables rate and comments map to the HBase tables hbase_rate and hbase_comments, respectively.
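Before creating the mapped tables, the Hive session needs to know where HBase's ZooKeeper quorum is (the quorum below is assumed to be the same node01/node02/node03 cluster used by the Java code in step 5):

set hbase.zookeeper.quorum=node01,node02,node03;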

Create the hbase_rate table and map it to HBase:

create table video.hbase_rate(
videoId string,
uploader string,
age string,
category string,
length string,
views string,
rate string,
ratings string,
comments string,
relatedId string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,cf:uploader,cf:age,cf:category,cf:length,cf:views,cf:rate,cf:ratings,cf:comments,cf:relatedId")
tblproperties("hbase.table.name" = "hbase_rate");

Create the hbase_comments table and map it to HBase:
create table video.hbase_comments(
videoId string,
uploader string,
age string,
category string,
length string,
views string,
rate string,
ratings string,
comments string,
relatedId string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,cf:uploader,cf:age,cf:category,cf:length,cf:views,cf:rate,cf:ratings,cf:comments,cf:relatedId")
tblproperties("hbase.table.name" = "hbase_comments");

4.4 Write the insert overwrite ... select statement that populates the hbase_rate table:

insert overwrite table hbase_rate select * from rate;

Write the insert overwrite ... select statement that populates the hbase_comments table:

insert overwrite table hbase_comments select * from comments;
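A quick way (outside the original answer) to confirm the data reached HBase is to scan a few rows from the HBase shell:

scan 'hbase_rate', {LIMIT => 5}
scan 'hbase_comments', {LIMIT => 5}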

5. Query the data through the HBase API

5.1 Using the HBase API, scan the hbase_rate table from startRowKey=1 to endRowKey=100 and print the results.

Result [screenshot not reproduced]; the code is given below.

5.2 Using the HBase API, query only the value of the comments column of the hbase_comments table.

Code for 5.1 (scan hbase_rate with startRowKey=1 and endRowKey=100):

public static void main(String[] args) throws Exception {
    // HBaseConfiguration.create() picks up hbase-default.xml/hbase-site.xml from the classpath.
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "node01:2181,node02:2181,node03:2181");
    Connection connection = ConnectionFactory.createConnection(conf);
    Table table = connection.getTable(TableName.valueOf("hbase_rate"));
    // Scan only the rows from startRowKey=1 (inclusive) to endRowKey=100 (exclusive).
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("1"));
    scan.setStopRow(Bytes.toBytes("100"));
    ResultScanner scanner = table.getScanner(scan);
    for (Result result : scanner) {
        Cell[] cells = result.rawCells();
        for (Cell cell : cells) {
            System.out.println(
                    "RowKey= " + Bytes.toString(CellUtil.cloneRow(cell)) +
                    "\tFamily= " + Bytes.toString(CellUtil.cloneFamily(cell)) +
                    "\tQualifier= " + Bytes.toString(CellUtil.cloneQualifier(cell)) +
                    "\tValue= " + Bytes.toString(CellUtil.cloneValue(cell))
            );
        }
    }
    scanner.close();
    table.close();
    connection.close();
}

Code for 5.2 (query only the comments column of hbase_comments):

public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "node01:2181,node02:2181,node03:2181");
    Connection connection = ConnectionFactory.createConnection(conf);
    Table table = connection.getTable(TableName.valueOf("hbase_comments"));
    Scan scan = new Scan();
    // Fetch only the comments column; the column family is "cf", matching the Hive mapping above.
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("comments"));
    ResultScanner scanner = table.getScanner(scan);
    for (Result result : scanner) {
        Cell[] cells = result.rawCells();
        for (Cell cell : cells) {
            System.out.println(
                    "RowKey= " + Bytes.toString(CellUtil.cloneRow(cell)) +
                            "\tFamily= " + Bytes.toString(CellUtil.cloneFamily(cell)) +
                            "\tQualifier= " + Bytes.toString(CellUtil.cloneQualifier(cell)) +
                            "\tValue= " + Bytes.toString(CellUtil.cloneValue(cell))
            );
        }
    }
    scanner.close();
    table.close();
    connection.close();
}