Hive Source Code Analysis (1): CLI Input Processing


天天乐见


Article link: https://blog.csdn.net/a794922102/article/details/105811957


Beijing time: April 28, 2020, 10:30

Environment: Hive 3.1.1

1. Main execution flow (call stack)

main:683, CliDriver (org.apache.hadoop.hive.cli)

Program entry point:

public static void main(String[] args) throws Exception {
    int ret = new CliDriver().run(args);
    System.exit(ret);
}

run:759, CliDriver (org.apache.hadoop.hive.cli)

public int run(String[] args) throws Exception {
    // OptionsProcessor parses the arguments (-f, -hiveconf, -d, -i, etc.)
    OptionsProcessor oproc = new OptionsProcessor();
    if (!oproc.process_stage1(args)) {
        return 1;
    }

    // Load the log4j configuration via initHiveLog4j
    // (this touches HiveConf, which initializes its static fields and reads the logging settings)
    boolean logInitFailed = false;
    String logInitDetailMessage;
    try {
        logInitDetailMessage = LogUtils.initHiveLog4j();
    } catch (LogInitializationException e) {
        logInitFailed = true;
        logInitDetailMessage = e.getMessage();
    }
	
    // Create the CliSessionState and its HiveConf
    CliSessionState ss = new CliSessionState(new HiveConf(SessionState.class));
    ss.in = System.in;
    try {
        ss.out = new PrintStream(System.out, true, "UTF-8");
        ss.info = new PrintStream(System.err, true, "UTF-8");
        ss.err = new CachingPrintStream(System.err, true, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        return 3;
    }
	
    // Continue parsing arguments (-S, --database, -e, -f, -v, -i, etc.) and store them in the CliSessionState ss
    if (!oproc.process_stage2(ss)) {
        return 2;
    }

    // Is the session running in silent mode (-S)?
    // If not, info-level messages are also printed to the console on standard error
    // (here: the log initialization message)
    if (!ss.getIsSilent()) {
        if (logInitFailed) {
            System.err.println(logInitDetailMessage);
        } else {
            SessionState.getConsole().printInfo(logInitDetailMessage);
        }
    }
    
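    // Up to this point, the code mainly parses the configuration arguments on the command line

    // Apply every property specified on the command line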
    HiveConf conf = ss.getConf();
    for (Map.Entry<Object, Object> item : ss.cmdProperties.entrySet()) {
        conf.set((String) item.getKey(), (String) item.getValue());
        ss.getOverriddenConfigurations().put((String) item.getKey(), 
                                             (String) item.getValue());
    }

    // Read the prompt setting (hive.cli.prompt) and substitute variables in it
    prompt = conf.getVar(HiveConf.ConfVars.CLIPROMPT);
    prompt = new VariableSubstitution(new HiveVariableSource() {
        @Override
        public Map<String, String> getHiveVariable() {
            return SessionState.get().getHiveVariables();
        }
    }).substitute(conf, prompt);
    prompt2 = spacesForString(prompt);
	
    // Both branches end up calling org.apache.hadoop.hive.ql.session.SessionState#start;
    // only the arguments differ
    if (HiveConf.getBoolVar(conf, ConfVars.HIVE_CLI_TEZ_SESSION_ASYNC)) {
        // Calls: start(startSs, true, console);
        SessionState.beginStart(ss, console);
    } else {
        // Calls: start(startSs, false, null);
        SessionState.start(ss);
    }

    // Update (and log) the current thread name
    ss.updateThreadName();

    // Create and initialize the materialized views registry cache
    HiveMaterializedViewsRegistry.get().init();

    try {
        /****** Core method: execution starts here ******/
        return executeDriver(ss, conf, oproc);
    } finally {
        ss.resetThreadName();
        ss.close();
    }
}

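Summary: run() mainly handles the arguments of the hive command:

parses the arguments that follow the hive command
sets them in HiveConf
executes the command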
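The property loop in run() copies every command-line property into HiveConf and also records it as an overridden setting. A minimal sketch of that step, assuming plain `Map`s in place of `HiveConf` and `getOverriddenConfigurations()`; `parseHiveconf` is a hypothetical helper that only understands `--hiveconf key=value` (the real parsing is done by OptionsProcessor and covers many more options):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Sketch of the property-override loop in CliDriver#run.
// Plain maps stand in for HiveConf and SessionState#getOverriddenConfigurations().
class HiveConfOverrideSketch {

    // Hypothetical parser: collects "--hiveconf key=value" pairs only.
    static Properties parseHiveconf(List<String> args) {
        Properties cmdProperties = new Properties();
        for (int i = 0; i < args.size() - 1; i++) {
            if (args.get(i).equals("--hiveconf")) {
                String pair = args.get(i + 1);
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    // Split on the first '=' so values may themselves contain '='
                    cmdProperties.setProperty(pair.substring(0, eq), pair.substring(eq + 1));
                }
                i++; // skip the key=value argument just consumed
            }
        }
        return cmdProperties;
    }

    // Same shape as the loop in run(): set the value and remember it as overridden.
    static void applyOverrides(Properties cmdProperties,
                               Map<String, String> conf,
                               Map<String, String> overridden) {
        for (Map.Entry<Object, Object> item : cmdProperties.entrySet()) {
            conf.put((String) item.getKey(), (String) item.getValue());
            overridden.put((String) item.getKey(), (String) item.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hive.cli.print.current.db", "false");
        Map<String, String> overridden = new LinkedHashMap<>();

        Properties props = parseHiveconf(List.of("--hiveconf", "hive.cli.print.current.db=true"));
        applyOverrides(props, conf, overridden);

        System.out.println(conf);       // the command-line value replaces the default
        System.out.println(overridden); // and is recorded as an override
    }
}
```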

executeDriver:821, CliDriver (org.apache.hadoop.hive.cli)

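/**
 * Execute the command(s)
 * @param ss CliSessionState
 * @param conf HiveConf
 * @param oproc the options parsed from the command line
 * @return the execution status
 * @throws Exception
 */
private int executeDriver(CliSessionState ss, HiveConf conf, OptionsProcessor oproc)
    throws Exception {
    // Create a CliDriver and pass it the variables set on the command line
    CliDriver cli = new CliDriver();
    cli.setHiveVariables(oproc.getHiveVariables());

    // Switch to the specified database (automatically runs "use <database>;", e.g. use hive_temp;)
    cli.processSelectDatabase(ss);

    // Run the specified initialization SQL script files
    cli.processInitFiles(ss);

    // -e on the command line: execute the SQL string directly, without entering the interactive shell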
    if (ss.execString != null) {
        int cmdProcessStatus = cli.processLine(ss.execString);
        // Return the result
        return cmdProcessStatus;
    }

    try {
        // -f on the command line: execute the SQL script file directly, without entering the interactive shell
        if (ss.fileName != null) {
            /****** Process the SQL file ******/
            return cli.processFile(ss.fileName);
        }
    } catch (FileNotFoundException e) {
        System.err.println("Could not open input file for reading. (" + e.getMessage() + ")");
        return 3;
    }
    if ("mr".equals(HiveConf.getVar(conf, ConfVars.HIVE_EXECUTION_ENGINE))) {
        console.printInfo(HiveConf.generateMrDeprecationWarning());
    }

    setupConsoleReader();
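    // From here on we are in the interactive hive shell
    String line;
    int ret = 0;
    // Accumulates multi-line input into one complete SQL statement
    String prefix = "";
    // hive.cli.print.current.db: whether to show the current database in the prompt
    String curDB = getFormattedDb(conf, ss);
    String curPrompt = prompt + curDB;
    String dbSpaces = spacesForString(curDB);
    // Read the console one line at a time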
    while ((line = reader.readLine(curPrompt + "> ")) != null) {
        if (!prefix.equals("")) {
            prefix += '\n';
        }
        // Skip comment lines
        if (line.trim().startsWith("--")) {
            continue;
        }
        // This line ends the current statement (it ends with ";")
        // Note: only a trailing ";" is checked here; a ";" in the middle of the line is not split at this point
        if (line.trim().endsWith(";") && !line.trim().endsWith("\\;")) {
            // Assemble the full statement
            line = prefix + line;
            /****** Execute the SQL (core method) ******/
            ret = cli.processLine(line, true);
            // Reset the accumulated SQL
            prefix = "";
            curDB = getFormattedDb(conf, ss);
            curPrompt = prompt + curDB;
            dbSpaces = dbSpaces.length() == curDB.length() ? dbSpaces : spacesForString(curDB);
        } else {
            // Statement not finished yet; keep accumulating
            prefix = prefix + line;
            curPrompt = prompt2 + dbSpaces;
            continue;
        }
    }

    return ret;
}
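The prompt variables in the loop above (prompt, prompt2, curDB, dbSpaces) exist so that continuation lines line up under the first line of a statement. A small sketch, assuming `spacesForString` returns a run of spaces as long as its argument and assuming `getFormattedDb` yields ` (default)` when hive.cli.print.current.db is enabled; both are re-implemented or hard-coded here rather than taken from Hive:

```java
// Sketch of how executeDriver builds the primary and continuation prompts.
// spacesForString is a stand-in re-implementation; the curDB value is assumed.
class PromptSketch {
    // A string of spaces with the same length as s
    static String spacesForString(String s) {
        return " ".repeat(s.length());
    }

    // Prompt for the first line of a statement: prompt + curDB + "> "
    static String firstPrompt(String prompt, String curDB) {
        return prompt + curDB + "> ";
    }

    // Prompt for continuation lines: prompt2 + dbSpaces + "> "
    static String continuationPrompt(String prompt, String curDB) {
        String prompt2 = spacesForString(prompt);
        String dbSpaces = spacesForString(curDB);
        return prompt2 + dbSpaces + "> ";
    }

    public static void main(String[] args) {
        String prompt = "hive";      // value of hive.cli.prompt
        String curDB = " (default)"; // assumed getFormattedDb output
        System.out.println(firstPrompt(prompt, curDB) + "select *");
        System.out.println(continuationPrompt(prompt, curDB) + "from t;");
    }
}
```

Both printed lines have their `>` in the same column, which is the point of keeping `dbSpaces` in sync with `curDB` after every statement.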

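Summary: depending on the arguments, executeDriver handles SQL from different input sources:

If the hive command was given a SQL string (-e), process that string directly and return the result (processLine).
If it was given a SQL file (-f), process that file directly and return the result (processFile).
Otherwise the CLI runs interactively and processes the SQL typed at the prompt (processLine).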
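The interactive branch can be replayed without a terminal. A minimal sketch of the read loop, assuming a `List<String>` replaces the console reader and that each completed statement is collected into a list instead of being passed to `processLine` (simplifications made here, not in Hive):

```java
import java.util.ArrayList;
import java.util.List;

// Replay of the read loop in CliDriver#executeDriver.
// Lines come from a list instead of the console reader, and completed
// statements are collected instead of being passed to processLine.
class StatementAccumulator {

    static List<String> accumulate(List<String> lines) {
        List<String> statements = new ArrayList<>();
        String prefix = "";
        for (String line : lines) {
            if (!prefix.equals("")) {
                prefix += '\n';
            }
            // Lines starting with "--" are skipped
            if (line.trim().startsWith("--")) {
                continue;
            }
            // A trailing ";" (but not "\;") completes the statement
            if (line.trim().endsWith(";") && !line.trim().endsWith("\\;")) {
                statements.add(prefix + line);
                prefix = "";
            } else {
                // Statement not finished yet; keep accumulating
                prefix = prefix + line;
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> input = List.of("select *", "-- a comment", "from t", "where a = 1;", "show tables;");
        for (String s : accumulate(input)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

A completed statement keeps its trailing `;`, a line ending in `\;` does not complete a statement, and a skipped comment line still contributes a newline because `prefix += '\n'` runs before the comment check. Splitting on `;` happens later, in processLine.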

processLine:402, CliDriver (org.apache.hadoop.hive.cli)

public int processLine(String line, boolean allowInterrupting) {
    SignalHandler oldSignal = null;
    Signal interruptSignal = null;
    // Whether interruption (Ctrl+C) is allowed
    if (allowInterrupting) {
        // Install a handler for Ctrl+C (the INT signal)
        interruptSignal = new Signal("INT");
        oldSignal = Signal.handle(interruptSignal, new SignalHandler() {
            private boolean interruptRequested;

            @Override
            public void handle(Signal signal) {
                boolean initialRequest = !interruptRequested;
                interruptRequested = true;

                // Second Ctrl+C: exit the JVM
                if (!initialRequest) {
                    console.printInfo("Exiting the JVM");
                    System.exit(127);
                }

                // Tell the user that the interrupt is in progress
                console.printInfo("Interrupting... Be patient, this might take some time.");
                console.printInfo("Press Ctrl+C again to kill JVM");

                // First Ctrl+C: kill all running MR and Tez jobs and interrupt the current operation
                HadoopJobExecHelper.killRunningJobs();
                TezJobExecHelper.killRunningJobs();
                HiveInterruptUtils.interrupt();
            }
        });
    }

    try {
        int lastRet = 0, ret = 0;

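        // Split the line into commands on ';', dropping the ';' separators.
        // A plain String.split is not used, because a ';' inside a quoted string must not split the command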
        List<String> commands = splitSemiColon(line);

        String command = "";
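        // Iterate over the individual commands
        for (String oneCmd : commands) {
            // A piece ending in '\' had its ';' escaped: drop the '\', restore the ';' and join it with the next piece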
            if (StringUtils.endsWith(oneCmd, "\\")) {
                command += StringUtils.chop(oneCmd) + ";";
                continue;
            } else {
                command += oneCmd;
            }
            // Skip blank commands
            if (StringUtils.isBlank(command)) {
                continue;
            }
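            /****** Execute the SQL (core method) ******/
            ret = processCmd(command);
            // Reset the command buffer
            command = "";
            // Result of the most recent command
            lastRet = ret;
            // Whether to ignore errors (hive.cli.errors.ignore)
            boolean ignoreErrors = HiveConf.getBoolVar(conf, HiveConf.ConfVars.CLIIGNOREERRORS);
            // On failure, unless errors are ignored, return the error code immediately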
            if (ret != 0 && !ignoreErrors) {
                return ret;
            }
        }
        return lastRet;
    } finally {
        // Once we are done processing the line, restore the old handler
        if (oldSignal != null && interruptSignal != null) {
            Signal.handle(interruptSignal, oldSignal);
        }
    }
}
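The signal handler above is a two-state machine: the first Ctrl+C kills running jobs, the second exits the JVM. A sketch of only that state logic, assuming the `sun.misc.Signal` registration is left out and the two actions are replaced by hypothetical `Runnable`s so it can run without sending real signals:

```java
import java.util.ArrayList;
import java.util.List;

// Two-stage interrupt logic modeled on the anonymous SignalHandler in
// CliDriver#processLine. Signal registration is omitted; handle() is called directly.
class TwoStageInterrupt {
    private boolean interruptRequested;
    private final Runnable killJobs; // stands in for killRunningJobs() + HiveInterruptUtils.interrupt()
    private final Runnable exitJvm;  // stands in for System.exit(127)

    TwoStageInterrupt(Runnable killJobs, Runnable exitJvm) {
        this.killJobs = killJobs;
        this.exitJvm = exitJvm;
    }

    void handle() {
        boolean initialRequest = !interruptRequested;
        interruptRequested = true;
        if (!initialRequest) {
            exitJvm.run(); // second Ctrl+C
            return;        // the real handler never reaches this point, since System.exit does not return
        }
        killJobs.run();    // first Ctrl+C
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        TwoStageInterrupt h = new TwoStageInterrupt(() -> log.add("kill jobs"), () -> log.add("exit JVM"));
        h.handle();
        h.handle();
        System.out.println(log); // [kill jobs, exit JVM]
    }
}
```

Because processLine installs a fresh handler on every call and restores the previous one in `finally`, the first/second count starts over for each line processed.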

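Summary: handle interrupts, then process the SQL string:

decide whether interruption is allowed and, if so, install the Ctrl+C handler
split the SQL string into individual commands on ';'
execute each command via processCmd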
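The source of `splitSemiColon` is not shown above. The sketch below is a simplified re-implementation written for this article, not Hive's code; it assumes a `;` inside single or double quotes must not split, and it reproduces the loop in processLine that glues a piece ending in `\` back to the next one with the `;` restored:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch (not Hive's source) of splitting a line into commands:
// splitSemiColon-style splitting followed by the '\' handling in processLine.
class SemicolonSplitSketch {

    // Split on ';' unless it is inside single or double quotes.
    // A quote preceded by an unescaped '\' does not open or close a string.
    static List<String> splitSemiColon(String line) {
        List<String> pieces = new ArrayList<>();
        boolean inSingle = false;
        boolean inDouble = false;
        boolean escape = false;
        int begin = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\'' && !escape && !inDouble) {
                inSingle = !inSingle;
            } else if (c == '"' && !escape && !inSingle) {
                inDouble = !inDouble;
            } else if (c == ';' && !inSingle && !inDouble) {
                pieces.add(line.substring(begin, i)); // the ';' itself is dropped
                begin = i + 1;
            }
            // The next character is escaped only if this one is an unescaped '\'
            escape = !escape && c == '\\';
        }
        pieces.add(line.substring(begin));
        return pieces;
    }

    // Mirrors the loop in processLine: a piece ending in '\' had its ';' escaped,
    // so drop the '\', restore the ';' and keep collecting; skip blank commands.
    static List<String> commands(String line) {
        List<String> result = new ArrayList<>();
        String command = "";
        for (String oneCmd : splitSemiColon(line)) {
            if (oneCmd.endsWith("\\")) {
                command += oneCmd.substring(0, oneCmd.length() - 1) + ";";
                continue;
            }
            command += oneCmd;
            if (command.isBlank()) {
                continue;
            }
            result.add(command); // processLine would call processCmd(command) here
            command = "";
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(commands("select 'a;b' from t; set x=a\\;b;"));
    }
}
```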

processCmd:127, CliDriver (org.apache.hadoop.hive.cli)

public int processCmd(String cmd) {
    CliSessionState ss = (CliSessionState) SessionState.get();
    ss.setLastCommand(cmd);

    ss.updateThreadName();

    // Flush the print stream so it does not include output from the last command
    ss.err.flush();
    String cmd_trimmed = HiveStringUtils.removeComments(cmd).trim();
    String[] tokens = tokenizeCmd(cmd_trimmed);
    int ret = 0;

    // Handle quit/exit
    if (cmd_trimmed.toLowerCase().equals("quit") || cmd_trimmed.toLowerCase().equals("exit")) {
        // If we got this far, either all previous commands succeeded or this is the command line;
        // either way, this counts as a successful run
        ss.close();
        System.exit(0);

    } 
    // source <file>: run a SQL script from inside the hive shell
    else if (tokens[0].equalsIgnoreCase("source")) {
        String cmd_1 = getFirstCmd(cmd_trimmed, tokens[0].length());
        cmd_1 = new VariableSubstitution(new HiveVariableSource() {
            @Override
            public Map<String, String> getHiveVariable() {
                return SessionState.get().getHiveVariables();
            }
        }).substitute(ss.getConf(), cmd_1);
        // Resolve the file
        File sourceFile = new File(cmd_1);
        if (! sourceFile.isFile()){
            console.printError("File: "+ cmd_1 + " is not a file.");
            ret = 1;
        } else {
            try {
                /****** Process the SQL file ******/
                ret = processFile(cmd_1);
            } catch (IOException e) {
                console.printError("Failed processing file "+ cmd_1 +" "+ 
                                   e.getLocalizedMessage(), stringifyException(e));
                ret = 1;
            }
        }
    } 
    // Shell commands (lines starting with '!')
    else if (cmd_trimmed.startsWith("!")) {
        // for shell commands, use unstripped command
        String shell_cmd = cmd.trim().substring(1);
        shell_cmd = new VariableSubstitution(new HiveVariableSource() {
            @Override
            public Map<String, String> getHiveVariable() {
                return SessionState.get().getHiveVariables();
            }
        }).substitute(ss.getConf(), shell_cmd);

        // shell_cmd = "/bin/bash -c \'" + shell_cmd + "\'";
        try {
            ShellCmdExecutor executor = new ShellCmdExecutor(shell_cmd, ss.out, ss.err);
            ret = executor.execute();
            if (ret != 0) {
                console.printError("Command failed with exit code = " + ret);
            }
        } catch (Exception e) {
            console.printError("Exception raised from Shell command " +
                               e.getLocalizedMessage(), stringifyException(e));
            ret = 1;
        }
    } else {
        try {
            // Otherwise: a local command (for example SQL typed in the hive shell), handled by a CommandProcessor
            try (CommandProcessor proc = CommandProcessorFactory.get(tokens, (HiveConf) conf)) {
                if (proc instanceof IDriver) {
                    /****** Continue processing ******/
                    ret = processLocalCmd(cmd, proc, ss);
                } else {
                    ret = processLocalCmd(cmd_trimmed, proc, ss);
                }
            }
        } catch (SQLException e) {
            console.printError("Failed processing command " + tokens[0] + " " + e.getLocalizedMessage(),
                               org.apache.hadoop.util.StringUtils.stringifyException(e));
            ret = 1;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    ss.resetThreadName();
    return ret;
}
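The if/else chain in processCmd can be summarized as a classifier over the comment-free, trimmed command. A sketch, assuming `tokenizeCmd` splits on whitespace and `getFirstCmd` returns the trimmed text after the first token (both re-implemented here as assumptions); `CmdDispatchSketch` and `Branch` are hypothetical names:

```java
import java.util.Locale;

// Hypothetical classifier mirroring the if/else chain in CliDriver#processCmd.
// Input is assumed to be the comment-free, trimmed command (cmd_trimmed).
class CmdDispatchSketch {
    enum Branch { EXIT, SOURCE, SHELL, LOCAL }

    // Assumed equivalent of tokenizeCmd: split on runs of whitespace
    static String[] tokenize(String cmdTrimmed) {
        return cmdTrimmed.split("\\s+");
    }

    static Branch classify(String cmdTrimmed) {
        String[] tokens = tokenize(cmdTrimmed);
        String lower = cmdTrimmed.toLowerCase(Locale.ROOT);
        if (lower.equals("quit") || lower.equals("exit")) {
            return Branch.EXIT;   // ss.close(); System.exit(0)
        } else if (tokens[0].equalsIgnoreCase("source")) {
            return Branch.SOURCE; // processFile(path)
        } else if (cmdTrimmed.startsWith("!")) {
            return Branch.SHELL;  // ShellCmdExecutor
        }
        return Branch.LOCAL;      // CommandProcessorFactory + processLocalCmd
    }

    // Assumed equivalent of getFirstCmd(cmd_trimmed, tokens[0].length()):
    // the text after the first token, trimmed
    static String sourceArgument(String cmdTrimmed) {
        return cmdTrimmed.substring(tokenize(cmdTrimmed)[0].length()).trim();
    }

    public static void main(String[] args) {
        for (String cmd : new String[] {"EXIT", "source /tmp/init.sql", "!ls -l", "select 1"}) {
            System.out.println(cmd + " -> " + classify(cmd));
        }
    }
}
```

Note that `exit now` is not treated as an exit, because the whole trimmed command must equal quit or exit. Also, the real shell branch takes its text from the unstripped `cmd`, not from `cmd_trimmed`.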

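Summary: dispatch the command:

quit/exit closes the session and exits the JVM
source <file> runs the SQL file via processFile
a line starting with ! runs as a shell command
any other (local) SQL command goes to processLocalCmd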
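The source and `!` branches (and the prompt in run()) pass text through VariableSubstitution with the session's hive variables. A minimal sketch of that idea that handles only the `${hivevar:name}` form and, in this sketch, leaves unknown variables untouched; the real class also understands other namespaces such as `hiveconf:`, `system:` and `env:`, which are not modeled here:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of hive-variable substitution, handling only ${hivevar:name}.
// Variables missing from the map are left as written.
class HivevarSubstitutionSketch {
    private static final Pattern HIVEVAR = Pattern.compile("\\$\\{hivevar:([^}]+)\\}");

    static String substitute(String text, Map<String, String> hiveVariables) {
        Matcher m = HIVEVAR.matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = hiveVariables.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in values literal
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = Map.of("dir", "/tmp");
        System.out.println(substitute("source ${hivevar:dir}/init.sql", vars));
    }
}
```

For example, after starting the CLI with `hive --hivevar dir=/tmp`, the command `source ${hivevar:dir}/init.sql;` would resolve to /tmp/init.sql before processFile is called.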

Next, let's look at how SQL files are processed.

org.apache.hadoop.hive.cli.CliDriver#processFile

public int processFile(String fileName) throws IOException {
    Path path = new Path(fileName);
    FileSystem fs;
    if (!path.toUri().isAbsolute()) {
        fs = FileSystem.getLocal(conf);
        path = fs.makeQualified(path);
    } else {
        fs = FileSystem.get(path.toUri(), conf);
    }
    BufferedReader bufferReader = null;
    int rc = 0;
    try {
        bufferReader = new BufferedReader(new InputStreamReader(fs.open(path)));
        // Everything up to here just opens the SQL file for reading

        /****** Process the SQL file ******/
        rc = processReader(bufferReader);
    } finally {
        IOUtils.closeStream(bufferReader);
    }
    return rc;
}
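processFile itself only decides which file system to open the path with. A sketch of that decision, assuming `java.net.URI` as a stand-in for Hadoop's `Path.toUri()` (Hadoop's path parsing differs in details, for example with spaces in paths); `fileSystemScheme` is a hypothetical name:

```java
import java.net.URI;

// Sketch of the file-system choice in CliDriver#processFile, using java.net.URI
// as a stand-in for Hadoop's Path.toUri(). fileSystemScheme is a hypothetical name.
class SqlFileLocationSketch {
    static String fileSystemScheme(String fileName) {
        URI uri = URI.create(fileName); // throws IllegalArgumentException on malformed input
        if (!uri.isAbsolute()) {
            // No scheme: processFile uses FileSystem.getLocal(conf) and makeQualified(path)
            return "file";
        }
        // Has a scheme: processFile uses FileSystem.get(path.toUri(), conf)
        return uri.getScheme();
    }

    public static void main(String[] args) {
        System.out.println(fileSystemScheme("/tmp/init.sql"));
        System.out.println(fileSystemScheme("hdfs://namenode:8020/user/hive/init.sql"));
    }
}
```

A path without a scheme, even an absolute one such as `/tmp/a.sql`, is not an absolute URI, so it goes to the local file system; only a path with a scheme such as `hdfs://` is opened through `FileSystem.get(uri, conf)`.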

org.apache.hadoop.hive.cli.CliDriver#processReader

public int processReader(BufferedReader r) throws IOException {
    String line;
    StringBuilder qsb = new StringBuilder();
    // Read the file line by line
    while ((line = r.readLine()) != null) {
        // Skip comment lines in the SQL file
        if (! line.startsWith("--")) {
            // Append the line to the SQL text
            qsb.append(line + "\n");
        }
    }
    /****** This still ends up calling processLine ******/
    return (processLine(qsb.toString()));
}
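processReader drops comment lines and joins the rest into one string for processLine. A sketch that returns that string instead of calling processLine, assuming IOException is rethrown unchecked for convenience (the real method declares `throws IOException`):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Sketch of CliDriver#processReader that returns the text it would hand to
// processLine. IOException is wrapped here; the real method declares it instead.
class ReaderSketch {
    static String collect(BufferedReader r) {
        StringBuilder qsb = new StringBuilder();
        try {
            String line;
            while ((line = r.readLine()) != null) {
                // Only lines that start with "--" at column 0 are dropped
                if (!line.startsWith("--")) {
                    qsb.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return qsb.toString();
    }

    public static void main(String[] args) {
        String script = "-- header comment\nselect 1;\n  -- indented comment\nselect 2;";
        System.out.print(collect(new BufferedReader(new StringReader(script))));
    }
}
```

An indented comment survives this step because `startsWith` is applied to the untrimmed line; each command still passes through HiveStringUtils.removeComments later in processCmd.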

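Summary: processFile only adds the file reading; through org.apache.hadoop.hive.cli.CliDriver#processReader it still ends up calling org.apache.hadoop.hive.cli.CliDriver#processLine.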

At this point input handling is complete; the next step is compiling and executing the SQL.
