大数据--HiveQL语句(实战练习)

题目:

1、 创建员工信息数据文件employess.txt,数据文件中包含内容如下(姓名、年龄、薪资、):

Lilith Hardy,30,6000,50,Finance Department

Byron Green,36,5000,25,Personnel Department

Yvette Ward,21,4500,15.5,

Arlen Esther,28,8000,20,Finance Department

Rupert Gold,39,10000,66,R&D Department

Deborah Madge,41,6500,0,R&D Department

Tim Springhall,22,6000,36.5,R&D Department

Olga Belloc,36,5600,10,Sales Department

Bruno Wallis,43,6700,0,Personnel Department

Flora Dan,27,4000,35,Sales Department

要求:查询员工信息表中员工年龄小于等于25岁或大于等于35岁的员工信息。

2、 创建学生数据文件students.txt,数据文件中包含内容如下(学号、姓名、专业、成绩):

1,student1, history,88

2,student2,geography,97

3,student3,chinese,65

4,student4,physics,63

5,student5, history,84

6,student6,psychology,72

7,student7, geography,69

8,student8, chinese,89

9,student9, history,86

10,student10, chinese,91

创建教师数据文件teacher.txt,数据文件中包含如下内容

history,teacher01

geography,teacher02

chinese,teacher03

physics,teacher04

psychology,teacher05

要求:

1) 对汉语专业同学按照课程成绩进行升序排序;

2) 使用左外连接,连接学生名单表与教师名单表;

3) 将历史专业的学生与教师合并在一起;

4) 计算地理专业学生的平均成绩

一.第一题

1.显示当前所有数据库

Show databases;

clip_image002

2.创建数据库study

CREATE DATABASE IF NOT EXISTS study 
COMMENT "This is study database"
LOCATION '/user/hive_db/create_db/';

clip_image004

3.切换数据库

USE study;

clip_image006

4.创建数据表employess_table

CREATE EXTERNAL TABLE IF NOT EXISTS 
study.employess_table(
name STRING COMMENT "This is name",
age INT COMMENT "This is age",
money INT COMMENT "This is money",
time FLOAT COMMENT "This is emm,I do not know",
department STRING COMMENT "This is department ")
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '_'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n
STORED AS textfile
LOCATION '/hive_data/class '
TBLPROPERTIES("comment"="This is a employess table");

clip_image008

5.导入数据

(1)创建数据文件employess

clip_image010

(2)准备数据文件

在HDFS创建目录/hive_data/class

hdfs dfs -mkdir /hive_data/class

将文件employess上传到HDFS的/hive_data/class目录

hdfs dfs -put /export/data/hive_data/employess /hive_data/class

(3)加载数据文件

(或者在这里创建数据表,指定/hive_data/class/为location)

向外部表employess_table加载文件employess

LOAD DATA INPATH '/hive_data/class/employess' OVERWRITE INTO TABLE study.employess_table;

clip_image012

6. 查询员工信息表中员工年龄小于等于25岁或大于等于35岁的员工信息

Select * from employess_table where age<=25 or age>=35;

clip_image014

二.第二题

1.创建数据文件

clip_image016

clip_image018

hdfs dfs -mkdir /hive_data/class
hdfs dfs -put /export/data/hive_data/student /hive_data/class
hdfs dfs -put /export/data/hive_data/teacher /hive_data/class

clip_image020

2.创建student_table和teacher_table

Student表手动导入数据(location填一个没有数据文件的地址,不能为空),Teacher表直接加载数据

(浏览器输入192.168.121.130及时查看HDFS文件系统的变化,便于理解)

导入原理参考:

https://wszyx.blog.csdn.net/article/details/111703745

https://blog.csdn.net/chandelierds/article/details/111456580

CREATE EXTERNAL TABLE IF NOT EXISTS 
study.student_table(
id INT COMMENT "This is id",
name STRING COMMENT "This is name",
subject STRING COMMENT "This is subject",
score INT COMMENT "This is score")
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '_'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS textfile
LOCATION '/hive_data/hive'
TBLPROPERTIES("comment"="This is a student table");
LOAD DATA INPATH '/hive_data/class/student' OVERWRITE INTO TABLE study.student_table;

clip_image022

可以看到student文件已经不见了,转移到了表的location地址,hive文件夹下

clip_image024

clip_image026

CREATE EXTERNAL TABLE IF NOT EXISTS 
study.teacher_table(
subject STRING COMMENT "This is subject",
name STRING COMMENT "This is name")
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '_'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS textfile
LOCATION '/hive_data/class'
TBLPROPERTIES("comment"="This is a teacher table");

这里teacher表就直接将class文件夹下的所有文件载入表中,原来的文件还是在class中(location地址)

clip_image028

3.HiveQL操作语句

(1)对汉语专业同学按照课程成绩进行升序排序

这里发现导入的数据有空格..所以直接用=会匹配不上..,可以使用模糊匹配或者trim函数去除两边空格

Select * from student_table where subject like '%chinese%' order by score asc;

clip_image030

(2)使用左外连接,连接学生名单表与教师名单表

select * from student_table s left join teacher_table t on trim(s.subject)=trim(t.subject);

clip_image032

(3)将历史专业的学生与教师合并在一起

select * from student_table s left join teacher_table t on trim(s.subject)=trim(t.subject) where trim(s.subject)= 'history';

clip_image034

(4)计算地理专业学生的平均成绩

Select avg(score) from student_table where trim(subject)='geography';

clip_image036

  • 2
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值