User information table:
select * from b;
+-------+---------+--+
| b.id  | b.name  |
+-------+---------+--+
| 1     | 张三    |
| 2     | 李四    |
| 3     | 赵六    |
+-------+---------+--+
User hobby records table:
select * from c;
+-------+----------+--+
| c.id  | c.aihao  |
+-------+----------+--+
| 1     | 打球     |
| 1     | 唱歌     |
| 1     | 看书     |
| 2     | 看书     |
| 2     | 打球     |
| 2     | 跳舞     |
| 2     | 飙车     |
| 2     | 打游戏   |
| 3     | 看书     |
+-------+----------+--+
Goal: for each user, compute the name and the number of hobbies.
select
  b.name,
  tmp.cts
from
  (select id, sum(1) as cts from c group by id) tmp
join b
  on tmp.id = b.id;
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 2 Cumulative CPU: 5.4 sec HDFS Read: 11670 HDFS Write: 249 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 2 Cumulative CPU: 4.89 sec HDFS Read: 7419 HDFS Write: 249 SUCCESS
Stage-Stage-7: Map: 1 Cumulative CPU: 1.65 sec HDFS Read: 5718 HDFS Write: 168 SUCCESS
Total MapReduce CPU Time Spent: 11 seconds 940 msec
OK
+---------+----------+--+
| b.name | tmp.cts |
+---------+----------+--+
| 李四 | 5 |
| 张三 | 3 |
| 赵六 | 1 |
+---------+----------+--+
3 rows selected (79.642 seconds)
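The subquery's sum(1) is simply a per-id row count; a sketch of an equivalent formulation using count(1), which reads more directly as "count the hobbies per id":

```sql
-- Equivalent query (a sketch): count(1) per id, then join to b
select
  b.name,
  tmp.cts
from
  (select id, count(1) as cts from c group by id) tmp
join b
  on tmp.id = b.id;
```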
Normally this query launches 3 jobs.
In the abnormal case, a job gets stuck. For example, job application_1628137675824_0060 shows the reduce phase stuck at 98% or 99% for a long time:
Starting Job = job_1628137675824_0060, Tracking URL = http://wxt01:8088/proxy/application_1628137675824_0059/
Kill Command = /usr/app/hadoop-2.8.5//bin/hadoop job -kill job_1628137675824_0060
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 2
2021-08-05 19:20:56,449 Stage-2 map = 0%, reduce = 0%
2021-08-05 19:21:01,601 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.91 sec
2021-08-05 19:21:06,753 Stage-2 map = 100%, reduce = 50%, Cumulative CPU 3.68 sec
2021-08-05 19:21:09,904 Stage-2 map = 100%, reduce = 99%, Cumulative CPU 3.68 sec
2021-08-05 19:22:09,904 Stage-2 map = 100%, reduce = 99%, Cumulative CPU 3.68 sec
2021-08-05 19:23:09,904 Stage-2 map = 100%, reduce = 99%, Cumulative CPU 3.68 sec
2021-08-05 19:24:09,904 Stage-2 map = 100%, reduce = 99%, Cumulative CPU 3.68 sec
2021-08-05 19:25:09,904 Stage-2 map = 100%, reduce = 99%, Cumulative CPU 3.68 sec
To check whether this is data skew: find job_1628137675824_0060 in the YARN UI; it shows the hang is in Stage-2. Drill into the reduce tasks. If every reduce task takes roughly the same time, the number of reducers may simply be too small, or the node running the slow task may itself be the problem. We have speculative execution enabled, so if the speculatively launched attempt is also slow, the node is not at fault and data skew is the likely cause.
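The speculative-execution switch mentioned above can be toggled per session. A sketch using the standard Hadoop 2.x and Hive property names (verify the names against your cluster's versions):

```sql
-- Speculative execution (a sketch, not our cluster's exact config):
-- re-launch straggler task attempts on other nodes; if the speculative
-- attempt is also slow, the node is not the problem.
SET mapreduce.map.speculative=true;
SET mapreduce.reduce.speculative=true;
SET hive.mapred.reduce.tasks.speculative.execution=true;
```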
In the YARN web UI, open job_1628137675824_0060 and inspect the counters of its reduce tasks.
The reduce tasks that finished each read roughly 20,000,000 rows,
while the stuck reduce task read about 200,000,000 rows, about 10x more than the others.
So data skew has occurred, in Stage-2 of job_1628137675824_0060.
Next, inspect the SQL execution plan:
explain
select
  b.name,
  tmp.cts
from
  (select id, sum(1) as cts from c group by id) tmp
join b
  on tmp.id = b.id;
The plan shows the skew occurs during the group-by aggregation (the sum),
i.e. table c contains one or more skewed keys.
Check the key distribution in table c:
select
  id,
  count(1) as cts
from c
group by id
order by cts desc
limit 100;
Then apply a fix targeted at the specific situation.
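For a group-by skew like the one found above, two common fixes are sketched below: Hive's built-in skewed group-by handling, and a manual two-stage aggregation with a random salt. The salt bucket count of 10 is an arbitrary choice for illustration:

```sql
-- Option 1 (a sketch): let Hive split the skewed group-by into two jobs.
-- hive.map.aggr and hive.groupby.skewindata are standard Hive parameters.
SET hive.map.aggr=true;
SET hive.groupby.skewindata=true;

-- Option 2 (a sketch): manual two-stage aggregation with a random salt.
-- Inner query spreads a hot id across up to 10 reducers; the outer query
-- merges the partial counts back into one count per id.
select id, sum(part_cts) as cts
from (
  select id, sum(1) as part_cts
  from c
  group by id, floor(rand() * 10)   -- salt: 10 buckets per id (illustrative)
) t
group by id;
```

Option 1 is the quick knob to try first; option 2 gives explicit control when the built-in two-job rewrite is not enough.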