hadoop学习之hive练习题+答案

10 篇文章 0 订阅

所用数据:https://pan.baidu.com/s/1XJBCKCMG727XQ6KA1E5Pow
提取码:dsdz

hadoop学习之hive练习题+答案

第一题

建表

create table visit(
user_id STRING,
shop STRING
)
row format delimited
fields terminated by ' ';
load data local inpath '/root/kb08/hive/1/jd.txt' into table visit;

1、每个店铺UV(访客数)

select shop,count(*)UV
from visit
group by shop;
+-------+-----+
| shop  | uv  |
+-------+-----+
| a     | 80  |
| b     | 40  |
| c     | 32  |
+-------+-----+

2、每个店铺访问次数top3的访客信息

with
r1 as 
(select
shop,user_id,count(*)UV
from visit
group by shop,user_id
having shop='a'
order by UV DESC
limit 3),
r2 as 
(select
shop,user_id,count(*)UV
from visit
group by shop,user_id
having shop='b'
order by UV DESC
limit 3),
r3 as 
(select
shop,user_id,count(*)UV
from visit
group by shop,user_id
having shop='c'
order by UV DESC
limit 3)
select * from r1 union all select * from r2 union all select * from r3;

+-----------+--------------+---------+--+
| _u1.shop  | _u1.user_id  | _u1.uv  |
+-----------+--------------+---------+--+
| a         | u1           | 32      |
| a         | u5           | 24      |
| a         | u2           | 16      |
| b         | u4           | 16      |
| b         | u5           | 8       |
| b         | u2           | 8       |
| c         | u2           | 16      |
| c         | u6           | 8       |
| c         | u3           | 8       |
+-----------+--------------+---------+--+

第二题

建表

create table storeconsume(
shop STRING,
month STRING,
price DOUBLE
)
row format delimited
fields terminated by ',';
load data local inpath '/root/kb08/hive/2/store.txt' into table storeconsume;

每个店铺的当月销售额和累计到当月的总销售额

select 
shop,
month,
SUM,
sum(SUM) over(distribute by shop sort by month)
from
(select 
shop,
month,
sum(price) SUM
from storeconsume
group by shop,month) r1;

+-------+--------+---------+----------+--+
| shop  | month  |   sum   |  _wcol0  |
+-------+--------+---------+----------+--+
| a     | 01     | 350.0   | 350.0    |
| a     | 02     | 5000.0  | 5350.0   |
| a     | 03     | 600.0   | 5950.0   |
| b     | 01     | 7800.0  | 7800.0   |
| b     | 02     | 2500.0  | 10300.0  |
| c     | 01     | 470.0   | 470.0    |
| c     | 02     | 630.0   | 1100.0   |
+-------+--------+---------+----------+--+

第三题

建表

create table sale(
name STRING,
month STRING,
pv INT
)
row format delimited
fields terminated by ',';
load data local inpath '/root/kb08/hive/3/pv.txt' into table sale;

每个用户截止到每月为止的最大单月访问次数和累计到该月的总访问次数。

select
name,
month,
max_pv,
sum(sum_pv) over(distribute by name sort by month) sum
from
(select
name,
month,
max(pv) max_pv,
sum(pv) sum_pv
from sale
group by name,month) r1;

+-------+----------+---------+------+--+
| name  |  month   | max_pv  | sum  |
+-------+----------+---------+------+--+
| A     | 2015-01  | 15      | 33   |
| A     | 2015-02  | 6       | 43   |
| A     | 2015-03  | 22      | 81   |
| B     | 2015-01  | 25      | 30   |
| B     | 2015-02  | 10      | 45   |
| B     | 2015-03  | 23      | 89   |
+-------+----------+---------+------+--+

第四题

建表

-- 科目表(编号,名称,任课老师编号)
create table course(
sub_id INT,
sub_name STRING,
teacher_id INT
)
row format delimited
fields terminated by '\t';
load data local inpath '/root/kb08/hive/4/course.txt' into table course;
-- 成绩表(学生编号,科目编号,分数)
create table stuscore(
stu_id INT,
sub_id INT,
score DOUBLE
)
row format delimited
fields terminated by '\t';
load data local inpath '/root/kb08/hive/4/score.txt' into table stuscore;
-- 学生表(学生编号,学生姓名,出生日期,性别)
create table student(
stu_id INT,
stu_name STRING,
brithday DATE,
stu_gender STRING
)
row format delimited
fields terminated by '\t';
load data local inpath '/root/kb08/hive/4/student.txt' into table student;
-- 老师表(教师编号,教师姓名)
create table teacher(
teacher_id INT,
teacher_name STRING
)
row format delimited
fields terminated by '\t';
load data local inpath '/root/kb08/hive/4/teacher.txt' into table teacher;

1、查询"01"课程比"02"课程成绩高的学生的信息及课程分数:

select T.stu_id,T.stu_name,T.brithday,T.stu_gender,r1.score
from
(select stu_id,score from stuscore where sub_id='1') r1
inner join
(select stu_id,score from stuscore where sub_id='2') r2
on r1.stu_id = r2.stu_id
join student T
on T.stu_id = r2.stu_id
where r1.score > r2.score;
+-----------+-------------+-------------+---------------+-----------+--+
| t.stu_id  | t.stu_name  | t.brithday  | t.stu_gender  | r1.score  |
+-----------+-------------+-------------+---------------+-----------+--+
| 2         | 钱电          | 1990-12-21  || 70.0      |
| 4         | 李云          | 1990-08-06  || 50.0      |
+-----------+-------------+-------------+---------------+-----------+--+

2、查询"01"课程比"02"课程成绩低的学生的信息及课程分数:

select T.stu_id,T.stu_name,T.brithday,T.stu_gender,r1.score
from
(select stu_id,score from stuscore where sub_id='1') r1
inner join
(select stu_id,score from stuscore where sub_id='2') r2
on r1.stu_id = r2.stu_id
join student T
on T.stu_id = r2.stu_id
where r1.score < r2.score;
+-----------+-------------+-------------+---------------+-----------+--+
| t.stu_id  | t.stu_name  | t.brithday  | t.stu_gender  | r1.score  |
+-----------+-------------+-------------+---------------+-----------+--+
| 1         | 赵雷          | 1990-01-01  || 80.0      |
| 5         | 周梅          | 1991-12-01  || 76.0      |
+-----------+-------------+-------------+---------------+-----------+--+

3、查询平均成绩大于等于 60 分的同学的学生编号和学生姓名和平均成绩:

select T.stu_id,T.stu_name,r1.avg_score
from
(select stu_id,avg(score) avg_score from stuscore group by stu_id) r1
inner join student T
on T.stu_id = r1.stu_id
where r1.avg_score>=60;
+-----------+-------------+--------------------+--+
| t.stu_id  | t.stu_name  |    r1.avg_score    |
+-----------+-------------+--------------------+--+
| 1         | 赵雷          | 89.66666666666667  |
| 2         | 钱电          | 70.0               |
| 3         | 孙风          | 80.0               |
| 5         | 周梅          | 81.5               |
| 7         | 郑竹          | 93.5               |
+-----------+-------------+--------------------+--+

4、查询平均成绩小于 60 分的同学的学生编号和学生姓名和平均成绩:

select T.stu_id,T.stu_name,r1.avg_score
from
(select stu_id,sum(score)/3 avg_score from stuscore group by stu_id) r1
inner join student T
on T.stu_id = r1.stu_id
where r1.avg_score<60;
+-----------+-------------+---------------------+--+
| t.stu_id  | t.stu_name  |    r1.avg_score     |
+-----------+-------------+---------------------+--+
| 4         | 李云          | 33.333333333333336  |
| 5         | 周梅          | 54.333333333333336  |
| 6         | 吴兰          | 21.666666666666668  |
+-----------+-------------+---------------------+--+

5、查询所有同学的学生编号、学生姓名、选课总数、所有课程的总成绩

select T.stu_id,T.stu_name,r1.sub_num,r1.sum_score
from
(select stu_id,count(*) sub_num,sum(score) sum_score 
from stuscore
group by stu_id) r1
inner join student T
on T.stu_id = r1.stu_id;
+-----------+-------------+-------------+---------------+--+
| t.stu_id  | t.stu_name  | r1.sub_num  | r1.sum_score  |
+-----------+-------------+-------------+---------------+--+
| 1         | 赵雷          | 3           | 269.0         |
| 2         | 钱电          | 3           | 210.0         |
| 3         | 孙风          | 3           | 240.0         |
| 4         | 李云          | 3           | 100.0         |
| 5         | 周梅          | 2           | 163.0         |
| 6         | 吴兰          | 2           | 65.0          |
| 7         | 郑竹          | 2           | 187.0         |
+-----------+-------------+-------------+---------------+--+

6、查询"李"姓老师的数量:

select count(*) num from teacher where teacher_name like '李%';
+------+--+
| num  |
+------+--+
| 1    |
+------+--+

7、查询学过"张三"老师授课的同学的信息:

select T.stu_id,T.stu_name,T.brithday,T.stu_gender
from student T
inner join
stuscore E
on T.stu_id=E.stu_id
join
(select sub_id
from teacher R
inner join course C
on R.teacher_id=C.teacher_id
where R.teacher_name='张三') r1
where E.sub_id=r1.sub_id;

+-----------+-------------+-------------+---------------+--+
| t.stu_id  | t.stu_name  | t.brithday  | t.stu_gender  |
+-----------+-------------+-------------+---------------+--+
| 1         | 赵雷          | 1990-01-01  ||
| 2         | 钱电          | 1990-12-21  ||
| 3         | 孙风          | 1990-05-20  ||
| 4         | 李云          | 1990-08-06  ||
| 5         | 周梅          | 1991-12-01  ||
| 7         | 郑竹          | 1989-07-01  ||
+-----------+-------------+-------------+---------------+--+

8、查询没学过"张三"老师授课的同学的信息:

select T.stu_id,T.stu_name,T.brithday,T.stu_gender
from student T
left join
(select stu_id 
from stuscore E
join  course C
on E.sub_id=C.sub_id
join  teacher R
on C.teacher_id=R.teacher_id and teacher_name='张三') r1
on T.stu_id = r1.stu_id
where r1.stu_id is null;
+-----------+-------------+-------------+---------------+--+
| t.stu_id  | t.stu_name  | t.brithday  | t.stu_gender  |
+-----------+-------------+-------------+---------------+--+
| 6         | 吴兰          | 1992-03-01  ||
| 8         | 王菊          | 1990-01-20  ||
+-----------+-------------+-------------+---------------+--+

9、查询学过编号为"01"并且也学过编号为"02"的课程的同学的信息:

select T.stu_id,T.stu_name,T.brithday,T.stu_gender
from student T
join
(select stu_id from stuscore where sub_id='1') r1
on T.stu_id=r1.stu_id
join
(select stu_id from stuscore where sub_id='2') r2
on T.stu_id=r2.stu_id;
+-----------+-------------+-------------+---------------+--+
| t.stu_id  | t.stu_name  | t.brithday  | t.stu_gender  |
+-----------+-------------+-------------+---------------+--+
| 1         | 赵雷          | 1990-01-01  ||
| 2         | 钱电          | 1990-12-21  ||
| 3         | 孙风          | 1990-05-20  ||
| 4         | 李云          | 1990-08-06  ||
| 5         | 周梅          | 1991-12-01  ||
+-----------+-------------+-------------+---------------+--+

10、查询学过编号为"01"但是没有学过编号为"02"的课程的同学的信息:

select T.stu_id,T.stu_name,T.brithday,T.stu_gender
from student T
join
(select stu_id from stuscore where sub_id='1') r1
on T.stu_id=r1.stu_id
left join
(select stu_id from stuscore where sub_id='2') r2
on T.stu_id=r2.stu_id
where r2.stu_id is null;
+-----------+-------------+-------------+---------------+--+
| t.stu_id  | t.stu_name  | t.brithday  | t.stu_gender  |
+-----------+-------------+-------------+---------------+--+
| 6         | 吴兰          | 1992-03-01  ||
+-----------+-------------+-------------+---------------+--+
  • 1
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
1.上传tar包 2.解压 tar -zxvf hive-1.2.1.tar.gz 3.安装mysql数据库 推荐yum 在线安装 4.配置hive (a)配置HIVE_HOME环境变量 vi conf/hive-env.sh 配置其中的$hadoop_home (b)配置元数据库信息 vi hive-site.xml 添加如下内容: javax.jdo.option.ConnectionURL jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true JDBC connect string for a JDBC metastore javax.jdo.option.ConnectionDriverName com.mysql.jdbc.Driver Driver class name for a JDBC metastore javax.jdo.option.ConnectionUserName root username to use against metastore database javax.jdo.option.ConnectionPassword hadoop password to use against metastore database 5.安装hive和mysq完成后,将mysql的连接jar包拷贝到$HIVE_HOME/lib目录下 如果出现没有权限的问,在mysql授权(在安装mysql的机器上执行) mysql -uroot -p #(执行下面的语句 *.*:所有库下的所有表 %:任何IP地址或主机都可以连接) GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root' WITH GRANT OPTION; FLUSH PRIVILEGES; 6. Jline包版本不一致的问,需要拷贝hive的lib目录中jline.2.12.jar的jar包替换掉hadoop中的 /home/hadoop/app/hadoop-2.6.4/share/hadoop/yarn/lib/jline-0.9.94.jar 启动hive bin/hive ---------------------------------------------------------------------------------------------------- Hive几种使用方式: 1.Hive交互shell bin/hive 2.Hive JDBC服务(参考java jdbc连接mysql) 3.hive启动为一个服务器,来对外提供服务 bin/hiveserver2 nohup bin/hiveserver2 1>/var/log/hiveserver.log 2>/var/log/hiveserver.err & 启动成功后,可以在别的节点上用beeline去连接 bin/beeline -u jdbc:hive2://mini1:10000 -n root 或者 bin/beeline ! connect jdbc:hive2://mini1:10000 4.Hive命令 hive -e ‘sql’ bin/hive -e 'select * from t_test'
HadoopHive、Spark和Zookeeper都是大数据技术栈中重要的组件。 Hadoop是一个由Apache开源的分布式文件系统和计算框架。它能够将大规模的数据分散存储在千台、万台、甚至更多的服务器上,并且实现数据的高效处理和分析。 Hive是在Hadoop之上构建的数据仓库基础设施。它提供了一个类似于SQL的查询语言,可以让用户通过简单的查询语句对存储在Hadoop集群中的数据进行操作和分析。Hive可以将结构化和半结构化的数据映射为一张表格,并提供了诸如过滤、连接、聚合等功能。 Spark是一个快速的、通用的集群计算系统。它提供了分布式数据处理的能力,采用了内存计算方式,相比于Hadoop MapReduce带来了更高的性能和更广泛的应用场景。Spark支持多种编程语言和丰富的组件库,如Spark SQL、Spark Streaming等,可以用于数据处理、机器学习、图计算等任务。 Zookeeper是一个开源的分布式协调服务。它实现了一个分布式的、高可用的、高性能的协调系统,可以提供诸如统一命名服务、配置管理、分布式锁等功能。Zookeeper能够帮助大数据系统中的不同组件进行协调和通信,确保分布式系统的一致性和可靠性。 综上所述,Hadoop用于分布式存储和计算,Hive用于数据仓库和查询分析,Spark用于高性能的集群计算,而Zookeeper用于分布式协调和通信。这四个技术在大数据领域中扮演着不可替代的角色,相辅相成,同构建了现代大数据生态系统。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值