Every time we execute a Hive HQL statement, the shell prints a message like this:
...
Number of reduce tasks not specified. Estimated from input data size: 500
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
...
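To override the estimate, you issue `set` commands in the Hive session before the query, exactly as the message suggests. A sketch (the 256 MB target, the cap of 128, and the table/column names `some_table`/`dt` are all illustrative placeholders, not recommendations):

```sql
-- Shrink the per-reducer input target to 256 MB, so more reducers are launched
set hive.exec.reducers.bytes.per.reducer=256000000;
-- Never launch more than 128 reducers, regardless of input size
set hive.exec.reducers.max=128;
-- Or pin the reducer count outright, bypassing the estimate entirely:
-- set mapred.reduce.tasks=32;
SELECT dt, count(*) FROM some_table GROUP BY dt;
```

These settings apply per session, so they only affect queries run after the `set` commands.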
Adjusting the number of reducers is a common tuning lever. It is governed mainly by the following three properties:
hive.exec.reducers.bytes.per.reducer controls how many reducers a map-reduce job gets, based on the total size of the job's input files: roughly one reducer per this many bytes of input. The default is 1 GB (1,000,000,000 bytes).
hive.exec.reducers.max caps the number of reducers. If input_file_size / hive.exec.reducers.bytes.per.reducer is greater than this value, the job runs with this many reducers instead. It does not affect the mapred.reduce.tasks setting. The default max is 999.
mapred.reduce.tasks sets a constant number of reducers, bypassing the size-based estimate altogether. Its default is -1, which means Hive estimates the count automatically.
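Putting the first two properties together, the estimate reduces to a simple formula: ceil(total input bytes / bytes per reducer), capped at the max. A minimal sketch of that arithmetic (a simplification; real Hive also considers things like the per-job override in mapred.reduce.tasks):

```python
import math

def estimate_reducers(input_bytes,
                      bytes_per_reducer=1_000_000_000,  # hive.exec.reducers.bytes.per.reducer default
                      max_reducers=999):                # hive.exec.reducers.max default
    """Estimate Hive's reducer count: ceil(input / per-reducer bytes), capped at max."""
    if input_bytes <= 0:
        return 1
    return min(max_reducers, math.ceil(input_bytes / bytes_per_reducer))

# 500 GB of input with the 1 GB default matches the "Estimated ... 500" shell message above
print(estimate_reducers(500 * 1_000_000_000))    # 500
# 2 TB of input would exceed the cap, so it is clamped to 999
print(estimate_reducers(2_000 * 1_000_000_000))  # 999
```

Lowering bytes_per_reducer increases parallelism (more, smaller reducers); lowering max_reducers bounds cluster load for very large inputs.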