CC00028.hadoop——|Hadoop&Hive.V28|——|Hive.v28|Hive优化策略|实战.v04|

yanqi_vip

已于 2022-04-10 16:38:58 修改

阅读量118

点赞数

分类专栏： bigdatav007——hadoop.v002 文章标签：大数据 hive java hadoop spark

于 2022-04-07 14:42:00 首次发布

不予转载

本文链接：https://blog.csdn.net/yanqi_vip/article/details/124031111

版权

本文详细探讨了Hadoop与Hive在大数据处理中的优化策略，通过具体的SQL案例展示如何提升查询效率，并对执行计划进行了深入分析，旨在帮助读者掌握Hive的性能调优技巧。

摘要由CSDN通过智能技术生成

一、优化实战

### --- 数据说明

~~~     学生信息表(student_txt)定义如下：

~~~     # 创建数据库
hive (default)> create database tuning;
hive (default)> use tuning;

~~~     # 创建表

hive (tuning)> create table if not exists tuning.student_txt(
s_no string comment '学号',
s_name string comment '姓名',
s_birth string comment '出生日期',
s_age int comment '年龄',
s_sex string comment '性别',
s_score int comment '综合得分',
s_desc string comment '自我介绍'
)
row format delimited
fields terminated by '\t';

~~~     # 数据加载
hive (tuning)> load data local inpath '/home/hadoop/data/student/*.txt' into table
tuning.student_txt;

~~~     # 查询数据
~~~     数据文件位置：/root/hive/student，50个文件，每个文件平均大小 40M 左右，包含4W条左右的信息；

hive (tuning)> select count(*) from tuning.student_txt;
Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 1 //有9个mappers，1个reducers
~~~~~~~~~~
Stage-Stage-1: Map: 9  Reduce: 1   Cumulative CPU: 71.32 sec   HDFS Read: 2193236480 HDFS Write: 107 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 11 seconds 320 msec
~~~~~~~~~~
2000000

二、SQL案例

### --- 查询 student_txt 表，每个年龄最晚出生和最早出生的人的出生日期，
~~~     并将其存入表student_stat 中。student_stat 表结构如下：

hive (tuning)> create table student_stat
(age int, brith string)
partitioned by (tp string);

### --- 需要执行的SQL如下：
~~~     开启动态分区

hive (tuning)> set hive.exec.dynamic.partition=true;
hive (tuning)> set hive.exec.dynamic.partition.mode=nonstrict;