一、优化实战
### --- 数据说明
~~~ 学生信息表(student_txt)定义如下:
~~~ # 创建数据库
hive (default)> create database tuning;
hive (default)> use tuning;
~~~ # 创建表
hive (tuning)> create table if not exists tuning.student_txt(
s_no string comment '学号',
s_name string comment '姓名',
s_birth string comment '出生日期',
s_age int comment '年龄',
s_sex string comment '性别',
s_score int comment '综合得分',
s_desc string comment '自我介绍'
)
row format delimited
fields terminated by '\t';
~~~ # 数据加载
hive (tuning)> load data local inpath '/home/hadoop/data/student/*.txt' into table
tuning.student_txt;
~~~ # 查询数据
~~~ 数据文件位置:/root/hive/student,50个文件,每个文件平均大小 40M 左右,包含4W条左右的信息;
hive (tuning)> select count(*) from tuning.student_txt;
Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 1 //有9个mappers,1个reducers
~~~~~~~~~~
Stage-Stage-1: Map: 9 Reduce: 1 Cumulative CPU: 71.32 sec HDFS Read: 2193236480 HDFS Write: 107 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 11 seconds 320 msec
~~~~~~~~~~
2000000
二、SQL案例
### --- 查询 student_txt 表,每个年龄最晚出生和最早出生的人的出生日期,
~~~ 并将其存入表student_stat 中。student_stat 表结构如下:
hive (tuning)> create table student_stat
(age int, brith string)
partitioned by (tp string);
### --- 需要执行的SQL如下:
~~~ 开启动态分区
hive (tuning)> set hive.exec.dynamic.partition=true;
hive (tuning)> set hive.exec.dynamic.partition.mode=nonstrict;