Preface
When tying a Hive SQL statement back to YARN, one statement can correspond to several yarn applications, so the mapping between the two has to be worked out. The approach here is to enable the Hive session history and parse the Hive session log, and from it derive the mapping between a Hive SQL statement and its Hive jobs (i.e. yarn applications); a parsing sketch is given near the end of this post.
hive hook
Hook types
- PreExecute and PostExecute: extend the Hook interface and are used to run logic before and after a Hive SQL statement is executed.
- ExecuteWithHookContext: extends the Hook interface and receives a HookContext that carries all of the information a hook implementation needs; the HookContext is passed to every hook whose name ends in "WithContext".
- HiveDriverRunHook: extends the Hook interface and allows a custom handler to run while the Hive driver executes a command (e.g. for queries submitted through the Hive JDBC driver).
- HiveSemanticAnalyzerHook: extends the Hook interface and allows pluggable custom SQL semantic-analysis logic; its preAnalyze() and postAnalyze() methods run before and after Hive's semantic analysis of the statement.
- HiveSessionHook: extends the Hook interface and provides a session-level hook that is invoked whenever a new session starts; it is configured via hive.server2.session.hook (a minimal sketch follows this list).
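As an illustration of the last type, here is a minimal session-level hook sketch. It assumes the HiveSessionHook / HiveSessionHookContext API of the hive-service module (normally pulled onto the classpath by hive-jdbc); the class name SessionStartLogHook is made up for this example. Registered through hive.server2.session.hook, it would run once whenever a new HiveServer2 session is opened.

package com.wacai.stanlee.hook;

import org.apache.hive.service.cli.HiveSQLException;
import org.apache.hive.service.cli.session.HiveSessionHook;
import org.apache.hive.service.cli.session.HiveSessionHookContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical example class: logs who opened a new HiveServer2 session.
public class SessionStartLogHook implements HiveSessionHook {

    private static final Logger logger = LoggerFactory.getLogger(SessionStartLogHook.class);

    @Override
    public void run(HiveSessionHookContext sessionHookContext) throws HiveSQLException {
        // Invoked once when the new session starts (registered via hive.server2.session.hook)
        logger.info("session opened by user: " + sessionHookContext.getSessionUser());
    }
}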
Hook examples that ship with Hive
- DriverTestHook: prints the SQL command that Hive executes.
- PreExecutePrinter and PostExecutePrinter: pre- and post-execute hooks that print the parameters of the Hive execution before and after it runs.
- ATSHook: an ExecuteWithHookContext that pushes query and plan information to the YARN timeline server.
- EnforceReadOnlyTables: an ExecuteWithHookContext that blocks modifications to read-only tables.
- LineageLogger: an ExecuteWithHookContext that writes lineage information to a log file; the lineage information covers the full lineage of each query.
- PostExecOrcFileDump: a post-execute hook that prints ORC file information.
- PostExecTezSummaryPrinter: a post-execute hook that prints summaries of the Tez counters.
- UpdateInputAccessTimeHook: a pre-execute hook that updates the last-access time of the input tables before the query runs.
Development workflow
Configure dependencies
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>2.1.1</version>
</dependency>
Code
package com.wacai.stanlee.hook;

import org.apache.commons.lang.CharEncoding;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.QueryPlan;
import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;
import org.apache.hadoop.hive.ql.session.SessionState;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.net.URLEncoder;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class JcRestHook implements ExecuteWithHookContext {

    private static final Logger logger = LoggerFactory.getLogger(JcRestHook.class);

    @Override
    public void run(HookContext hookContext) throws Exception {
        // Get the query plan
        QueryPlan queryPlan = hookContext.getQueryPlan();
        HiveConf conf = hookContext.getConf();
        String queryId = queryPlan.getQueryId();
        if (StringUtils.isEmpty(queryId)) {
            logger.warn("queryId is null or empty, return");
            return;
        }
        logger.info("queryId: " + queryId);

        // Get the query string (the SQL text)
        String queryStr = URLEncoder.encode(queryPlan.getQueryStr(), CharEncoding.UTF_8);
        logger.info("queryStr: " + queryStr);

        // Get the hadoop job name
        //String jobName = conf.getVar(HiveConf.ConfVars.HADOOPJOBNAME);
        String jobName = conf.getVar(HiveConf.ConfVars.HIVEQUERYNAME);
        //logger.info("jobName: " + jobName);

        // Get the HiveServer2 hook server name
        String server = conf.getAllProperties().getProperty("hiveserver2.execute.hook.server");
        if (StringUtils.isEmpty(server)) {
            logger.warn("server is null or empty, return");
            return;
        }
        logger.info("server: " + server);

        // Get the REST endpoint that will receive the hook data
        String rest = conf.getAllProperties().getProperty("hiveserver2.execute.hook.rest");
        logger.info("rest: " + rest);
        if (StringUtils.isEmpty(rest)) {
            logger.warn("rest is null or empty, return");
            return;
        }

        // Collect everything that identifies the query and the session history file
        Map<String, String> params = new HashMap<String, String>();
        params.put("server", server);
        params.put("hook", hookContext.getHookType().toString());
        params.put("queryId", queryId);
        params.put("queryStr", queryStr);
        // TODO: confirm whether this actually captures the hadoopJobName / applicationName that is wanted
        params.put("jobName", jobName);
        params.put("datetime", String.valueOf(new Date().getTime()));
        params.put("histFileName", SessionState.get().getHiveHistory().getHistFileName());
        // The params map would then be sent to the REST endpoint, e.g. as in the sketch below
    }
}
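The hook above only builds the params map and stops there. Below is a minimal sketch of a helper method that could be added to JcRestHook to ship the collected fields, assuming the hiveserver2.execute.hook.rest property holds an HTTP URL that accepts a form-encoded POST; the method name postToRest, the form encoding, and the choice to swallow errors are assumptions, not part of the original hook.

    // Hypothetical helper for JcRestHook: POST the collected params to the configured REST url.
    // A failure to report must never fail the query itself, so all exceptions are only logged.
    private void postToRest(String restUrl, Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(e.getKey()).append('=').append(e.getValue());
        }
        java.net.HttpURLConnection conn = null;
        try {
            conn = (java.net.HttpURLConnection) new java.net.URL(restUrl).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            java.io.OutputStream out = conn.getOutputStream();
            out.write(body.toString().getBytes(CharEncoding.UTF_8));
            out.close();
            logger.info("rest response code: " + conn.getResponseCode());
        } catch (Exception ex) {
            logger.warn("failed to post hook params to " + restUrl, ex);
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }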
After packaging, place the jar in the Hive auxlib directory, or load it some other way (for example via hive.aux.jars.path or ADD JAR).
Set the following parameters in hive-site.xml and restart the Hive service.
<property>
    <name>hive.session.history.enabled</name>
    <value>true</value>
    <description>hive session history for the hive hook</description>
</property>
<property>
    <name>hive.exec.pre.hooks</name>
    <value>com.wacai.stanlee.prototype.UpdateInputAccessTimeHook$PreExec</value>
    <description>update last access time of input tables in the hive metastore</description>
</property>
<property>
    <name>hive.exec.post.hooks</name>
    <value>com.wacai.stanlee.hivehook.GetTbOwnerHook,com.wacai.stanlee.prototype.HqlLinageHook</value>
</property>
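With the hook deployed and the session history enabled, the remaining step is to read the history file (whose path the hook reported as histFileName) and pull out the MapReduce job ids, which correspond one-to-one to YARN applications (job_xxx ↔ application_xxx). The sketch below assumes the HiveHistoryViewer / HiveHistory.Keys API of Hive 2.1, in particular that each TaskInfo's hm map carries QUERY_ID and TASK_HADOOP_ID; the class name HistoryFileParser is made up for this example.

package com.wacai.stanlee.hook;

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hive.ql.history.HiveHistory.Keys;
import org.apache.hadoop.hive.ql.history.HiveHistory.TaskInfo;
import org.apache.hadoop.hive.ql.history.HiveHistoryViewer;

// Hypothetical example class: maps each queryId in a session history file to its hadoop job ids.
public class HistoryFileParser {

    public static Map<String, String> jobIdsByQueryId(String histFileName) {
        Map<String, String> result = new HashMap<String, String>();
        try {
            // HiveHistoryViewer parses the whole history file on construction
            HiveHistoryViewer viewer = new HiveHistoryViewer(histFileName);
            for (TaskInfo task : viewer.getTaskInfoMap().values()) {
                String queryId = task.hm.get(Keys.QUERY_ID.name());
                String jobId = task.hm.get(Keys.TASK_HADOOP_ID.name());
                if (queryId == null || jobId == null) {
                    continue;
                }
                String existing = result.get(queryId);
                result.put(queryId, existing == null ? jobId : existing + "," + jobId);
            }
        } catch (Exception e) {
            // If parsing fails, just return whatever was collected so far
        }
        return result;
    }
}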
References
- http://dharmeshkakadia.github.io/hive-hook/
- https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/PostExecutePrinter.java
- https://www.cnblogs.com/smartloli/p/5928919.html
- https://www.cnblogs.com/yurunmiao/p/4224137.html (demigelemiao)