1、问题描述:
公司的数据平台的HiveJdbc查询一直有一个问题,就是日志获取太慢了,有时候差不多和结果一起出来的,这就非常影响用户的体验,半天都没任何输出。另一个是Beeline客户端不一致,beeline客户端每次都能很快的获取日志。
这里首先我们普及一个经验就是第一批日志获取的快慢,非常影响用户体验。如果第一批日志来的快,用户可以确认任务已经开始跑了,MR的JobId也会返回。日志的示例如下:
INFO : Compiling command(queryId=app_20180412185224_ebd3d373-31bb-430b-9daf-44f01049a9d4): select count(*) from ods.team
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=app_20180412185224_ebd3d373-31bb-430b-9daf-44f01049a9d4); Time taken: 0.057 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=app_20180412185224_ebd3d373-31bb-430b-9daf-44f01049a9d4): select count(*) from ods.team
INFO : Query ID = app_20180412185224_ebd3d373-31bb-430b-9daf-44f01049a9d4
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks determined at compile time: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:3
INFO : Submitting tokens for job: job_1523499276700_0219
hive-jdbc查询后端的流程(hive-jdbc版本2.1.0):
1、启动一个线程T1,T1用于阻塞获取结果,T1还用于启动日志获取线程T2
2、T2启动获取Hive的查询日志
详细的代码请看下面这个DEMO, T1就是main线程获取hive日志的结果,T2就是LogTask获取hive的查询日志
public class HiveJdbcQueryLog {
public static void main(String[] args) throws Exception {
Class.forName("org.apache.hive.jdbc.HiveDriver");
Connection connection = DriverManager.getConnection("jdbc:hive2://hive-server0:10000", "app", "");
HiveStatement stmt = (HiveStatement) connection.createStatement();
String sql = "select count(*) from table";
try {
Thread logThread = new Thread(new LogTask(stmt));
logThread.setDaemon(true);
logThread.st