Custom Hive Hooks

Background

When correlating a Hive SQL statement with YARN, a single statement can spawn multiple YARN applications, so the mapping between the two has to be established explicitly. By enabling Hive session history and parsing the session history log, we can recover the mapping between a Hive SQL statement and its Hive jobs (i.e., YARN applications).

Hive hooks

Hook types

  • PreExecute and PostExecute: extend the Hook interface; they run custom logic before and after a Hive SQL statement executes (a minimal sketch follows this list)
  • ExecuteWithHookContext: extends the Hook interface; it receives a HookContext that carries all the information a hook implementation needs. The HookContext is passed to every hook whose name contains "WithContext"
  • HiveDriverRunHook: extends the Hook interface; it lets custom logic run around the Hive Driver's processing of a command
  • HiveSemanticAnalyzerHook: extends the Hook interface; it allows pluggable custom semantic-analysis logic. Its preAnalyze() and postAnalyze() methods run before and after Hive's semantic analysis of the statement
  • HiveSessionHook: extends the Hook interface; a session-level hook invoked whenever a new session starts, configured via hive.server2.session.hook
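As a concrete starting point, here is a minimal sketch of a pre-execution hook, assuming the Hive 2.1 hook interfaces from hive-exec; the class and package names are made up for illustration:

package com.example.hooks; // hypothetical package

import java.util.Set;

import org.apache.hadoop.hive.ql.hooks.PreExecute;
import org.apache.hadoop.hive.ql.hooks.ReadEntity;
import org.apache.hadoop.hive.ql.hooks.WriteEntity;
import org.apache.hadoop.hive.ql.session.SessionState;
import org.apache.hadoop.security.UserGroupInformation;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Logs the user plus the input/output entities of every statement
// before it runs. Registered via hive.exec.pre.hooks.
public class LoggingPreHook implements PreExecute {
    private static final Logger LOG = LoggerFactory.getLogger(LoggingPreHook.class);

    @Override
    public void run(SessionState sess, Set<ReadEntity> inputs,
                    Set<WriteEntity> outputs, UserGroupInformation ugi)
            throws Exception {
        LOG.info("pre-hook: user={} inputs={} outputs={}",
                ugi.getUserName(), inputs, outputs);
    }
}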

Ready-made hook examples shipped with Hive

  • DriverTestHook: prints the SQL command Hive executes
  • PreExecutePrinter and PostExecutePrinter: print the parameters of a Hive execution before and after it runs
  • ATSHook: an ExecuteWithHookContext that pushes query and plan information to the YARN timeline server
  • EnforceReadOnlyTables: an ExecuteWithHookContext that blocks modifications to read-only tables
  • LineageLogger: an ExecuteWithHookContext that writes lineage information, covering the full lineage of each query, to the log file (see the config snippet after this list)
  • PostExecOrcFileDump: a post-execution hook that prints ORC file information
  • PostExecTezSummaryPrinter: a post-execution hook that prints summaries of the Tez counters
  • UpdateInputAccessTimeHook: a pre-execution hook that updates the access metadata of Hive input tables before they are read (i.e., when the query runs)
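Any of these ships with hive-exec and can be enabled without writing code, by listing the class in the matching hook property; for example, to turn on LineageLogger (the same hive-site.xml pattern is used for the custom hook later in this post):

<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.hadoop.hive.ql.hooks.LineageLogger</value>
</property>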

Development workflow

Declare the dependencies

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>2.1.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>2.1.1</version>
</dependency>

Code

package com.wacai.stanlee.hook;

import org.apache.commons.lang.CharEncoding;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.QueryPlan;
import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;
import org.apache.hadoop.hive.ql.session.SessionState;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.net.URLEncoder;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class JcRestHook implements ExecuteWithHookContext {
    private static Logger logger = LoggerFactory.getLogger(JcRestHook.class);
    public void run(HookContext hookContext) throws Exception {
        // get the query plan and configuration for this statement
        QueryPlan queryPlan = hookContext.getQueryPlan();
        HiveConf conf = hookContext.getConf();

        String queryId = queryPlan.getQueryId();
        if (StringUtils.isEmpty(queryId)) {
            logger.warn("queryId is null or empty, return");
            return;
        }
        logger.info("queryId: " + queryId);

        // get the query string (SQL), URL-encoded for transport
        String queryStr = URLEncoder.encode(queryPlan.getQueryStr(), CharEncoding.UTF_8);

        logger.info("queryStr: "+queryStr);

        // get the job name; HIVEQUERYNAME is used here rather than HADOOPJOBNAME
        //String jobName = conf.getVar(HiveConf.ConfVars.HADOOPJOBNAME);
        String jobName = conf.getVar(HiveConf.ConfVars.HIVEQUERYNAME);
        //logger.info("jobName: " + jobName);

        // read the reporting-server identifier from the custom configuration property
        String server = conf.getAllProperties().getProperty("hiveserver2.execute.hook.server");
        if(StringUtils.isEmpty(server)){
            logger.warn("server is null or empty,return");
            return;
        }
        logger.info("server:  "+server);

        String rest = conf.getAllProperties().getProperty("hiveserver2.execute.hook.rest");
        logger.info("rest: "+rest);


        if(StringUtils.isEmpty(rest)){
            logger.warn("rest is null or empty,return");
            return;
        }

        Map<String,String> params = new HashMap<String, String>();
        params.put("server",server);
        params.put("hook",hookContext.getHookType().toString());
        params.put("queryId",queryId);
        params.put("queryStr",queryStr);
        // TODO: verify that this actually carries the hadoop job name / application name
        params.put("jobName",jobName);
        params.put("datetime",String.valueOf(new Date().getTime()));
        params.put("histFileName", SessionState.get().getHiveHistory().getHistFileName());


    }
}
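The run() method above assembles params but stops short of sending them. Given the class name and the hiveserver2.execute.hook.rest property, the intent is presumably an HTTP POST to that endpoint. A minimal sketch of such a call with java.net.HttpURLConnection, form-encoding the map (the encoding and the endpoint's contract are assumptions), could be added to the class:

    // Hypothetical helper: POST the collected params to the REST endpoint.
    private void postToRest(String rest, Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        try {
            // build an application/x-www-form-urlencoded body from the map
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (body.length() > 0) body.append('&');
                String value = e.getValue() == null ? "" : e.getValue();
                body.append(URLEncoder.encode(e.getKey(), CharEncoding.UTF_8))
                    .append('=')
                    .append(URLEncoder.encode(value, CharEncoding.UTF_8));
            }
            java.net.HttpURLConnection conn = (java.net.HttpURLConnection)
                    new java.net.URL(rest).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (java.io.OutputStream os = conn.getOutputStream()) {
                os.write(body.toString().getBytes(CharEncoding.UTF_8));
            }
            logger.info("hook POST " + rest + " -> HTTP " + conn.getResponseCode());
            conn.disconnect();
        } catch (Exception ex) {
            // a reporting failure must never fail the query itself
            logger.warn("failed to post hook event to " + rest, ex);
        }
    }

Note that execution hooks run synchronously on the query path, so in production this call should be made asynchronous or at least given a short connect/read timeout.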



After packaging, place the jar under Hive's auxlib directory, or load it explicitly (e.g., via ADD JAR).

Set the following hive-site.xml parameters, then restart the Hive service

  • hive-site.xml
<property>
    <name>hive.session.history.enabled</name>
    <value>true</value>
    <description>enable hive session history for the hive hooks</description>
</property>

<property>
    <name>hive.exec.pre.hooks</name>
    <value>com.wacai.stanlee.prototype.UpdateInputAccessTimeHook$PreExec</value>
    <description>update last access time of input tables in the hive metastore</description>
</property>

<property>
    <name>hive.exec.post.hooks</name>
    <value>com.wacai.stanlee.hivehook.GetTbOwnerHook,com.wacai.stanlee.prototype.HqlLinageHook</value>
</property>
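For the JcRestHook from this post to fire, its class name must also be appended to the hook list (for a post-execution hook, to the hive.exec.post.hooks value above), and the two custom properties the hook reads must be defined; the values below are placeholders:

<property>
    <name>hiveserver2.execute.hook.server</name>
    <value>hiveserver2-01</value>
    <description>identifier of this HiveServer2 instance (placeholder)</description>
</property>

<property>
    <name>hiveserver2.execute.hook.rest</name>
    <value>http://collector-host:8080/hive/hook</value>
    <description>REST endpoint that receives the hook events (placeholder)</description>
</property>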

References

  • http://dharmeshkakadia.github.io/hive-hook/
  • https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/PostExecutePrinter.java
  • https://www.cnblogs.com/smartloli/p/5928919.html
  • https://www.cnblogs.com/yurunmiao/p/4224137.html (demigelemiao)