Two Ways to Use Variables in Hive Development

worldchinalee

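When developing data-analysis code with Hive, you often need to change runtime parameters. For example, the value of a date field in a select statement may need to differ from run to run, so the date has to be settable dynamically. When there is a lot of code and many parameters, replacing literal values with variables becomes essential. This article describes two ways to pass parameters into Hive SQL to meet that need.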

Preparing the test table and test data

First, prepare a test table and test data for the tests that follow.

Then run the SQL file that creates the table and loads the data:

    [czt@www.crazyant.net testHivePara]$ hive -f student.sql
    Hive history file=/tmp/crazyant.net/hive_job_log_czt_201309131615_1720869864.txt
    OK
    Time taken: 2.131 seconds
    OK
    Time taken: 0.878 seconds
    Copying data from file:/home/users/czt/testdata_student
    Copying file: file:/home/users/czt/testdata_student
    Loading data to table test.student
    OK
    Time taken: 1.76 seconds

The contents of student.sql are as follows:

    use test;

    -- student information table
    create table IF NOT EXISTS student(
        sno bigint comment 'student number',
        sname string comment 'name',
        sage bigint comment 'age',
        pdate string comment 'enrollment date'
    )
    COMMENT 'student information table'
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;

    LOAD DATA LOCAL INPATH '/home/users/czt/testdata_student'
    INTO TABLE student;

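The test data file testdata_student holds one tab-separated record per line, with the fields sno, sname, sage and pdate in that order (for example: 1, name1, 21, 20130901).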
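
For readers who want to follow along, a file in this format could be produced with a short script such as the sketch below. It writes only the eight rows that appear in this article's query output; the author's original file may well contain more, and the output location (the current directory) is an assumption of this sketch.

```shell
#!/bin/bash
# Sketch: recreate the eight sample rows seen in the article's query output.
# Assumption: the file is written to the current directory; adjust the path
# so that it matches the LOAD DATA statement in student.sql.
outfile="testdata_student"
: > "${outfile}"    # create or truncate the file

i=1
while [ "${i}" -le 8 ]; do
    # Rows 1-4 carry enrollment date 20130901, rows 5-8 carry 20130902
    if [ "${i}" -le 4 ]; then
        pdate="20130901"
    else
        pdate="20130902"
    fi
    # Fields are tab-separated: sno, sname, sage, pdate
    printf '%s\t%s\t%s\t%s\n' "${i}" "name${i}" "$((20 + i))" "${pdate}" >> "${outfile}"
    i=$((i + 1))
done
```

The tab separator matches the FIELDS TERMINATED BY '\t' clause of the table definition.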

Method 1: set variables in the shell and use them directly in hive -e

The test shell script, shellhive.sh:

    #!/bin/bash
    tablename="student"
    limitcount="8"

    hive -S -e "use test; select * from ${tablename} limit ${limitcount};"

Output:

    [czt@www.crazyant.net testHivePara]$ sh -x shellhive.sh
    + tablename=student
    + limitcount=8
    + hive -S -e 'use test; select * from student limit 8;'
    1       name1   21      20130901
    2       name2   22      20130901
    3       name3   23      20130901
    4       name4   24      20130901
    5       name5   25      20130902
    6       name6   26      20130902
    7       name7   27      20130902
    8       name8   28      20130902

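Hive's language is SQL-like and lacks the flexibility and flow control of a shell, so a shell + Hive development model is very common: variables are defined in the shell and can then be referenced directly in the hive -e statement.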
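
To tie this back to the date use case from the introduction, here is a minimal sketch of the same pattern with a date filter. The script argument, the default date and the `sql` variable are assumptions made for illustration; the `hive` call is left commented out so the script can be dry-run on a machine without Hive.

```shell
#!/bin/bash
# Sketch: build the query in the shell, then hand it to hive -e.
# Assumption: the enrollment date arrives as the first script argument and
# falls back to 20130902 when none is given.
tablename="student"
limitcount="8"
pdate="${1:-20130902}"

# Double quotes let the shell expand ${tablename}, ${pdate} and ${limitcount}
sql="use test; select * from ${tablename} where pdate='${pdate}' limit ${limitcount};"

# Print the statement that Hive would receive
echo "${sql}"

# Uncomment on a machine where hive is installed:
# hive -S -e "${sql}"
```

Printing the statement first is a cheap way to confirm that every shell variable expanded as expected before Hive runs it.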

Note: a variable defined with -hiveconf cannot be referenced as-is inside the double-quoted hive -e string, because the shell expands ${...} before Hive ever sees it.

Modify the shell script above to define the date parameter with -hiveconf:

    #!/bin/bash
    tablename="student"
    limitcount="8"

    hive -S \
    -hiveconf enter_school_date="20130902" \
    -hiveconf min_age="26" \
    -e \
    "    use test; \
            select * from ${tablename} \
            where \
                pdate='${hiveconf:enter_school_date}' \
                and \
                sage>'${hiveconf:min_age}' \
            limit ${limitcount};"

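This script does not work as intended. Because it runs in a shell, the shell itself tries to expand ${hiveconf:enter_school_date} and ${hiveconf:min_age} inside the double-quoted string. No such shell variables are defined (bash reads this form as a substring expansion of an unset variable named hiveconf), so an empty string is substituted at each position.

At run time the command is expanded to the following:

    + hive -S -hiveconf enter_school_date=20130902 -hiveconf min_age=26 -e 'use test; explain select * from student where pdate='\'''\'' and sage>'\'''\'' limit 8;'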
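
If a -hiveconf variable does need to be used with hive -e, one workaround (not part of the original test, offered here as a sketch) is to escape the dollar sign so that the shell passes the ${hiveconf:...} reference through untouched and Hive performs the substitution itself. The `sql` variable below is introduced only for illustration, and the `hive` call is commented out so the quoting can be checked without Hive installed.

```shell
#!/bin/bash
# Sketch: keep the shell from expanding Hive's ${hiveconf:...} references.
tablename="student"
limitcount="8"

# \$ stays a literal dollar sign inside double quotes, so ${hiveconf:...}
# survives for Hive, while ${tablename} and ${limitcount} are still expanded
# by the shell.
sql="use test; select * from ${tablename} where pdate='\${hiveconf:enter_school_date}' and sage>'\${hiveconf:min_age}' limit ${limitcount};"

# Print the statement exactly as Hive would receive it
echo "${sql}"

# Uncomment on a machine where hive is installed:
# hive -S -hiveconf enter_school_date="20130902" -hiveconf min_age="26" -e "${sql}"
```

Wrapping the whole statement in single quotes would also stop the shell from expanding it, but then the shell variables ${tablename} and ${limitcount} would no longer be expanded either.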
       
 
 

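Method 2: define variables with -hiveconf and use them in a SQL file

Because line breaks and the like are awkward to handle, hive -e is only suitable for small amounts of SQL. In practice most SQL is written in hql files and run with hive -f. In that case, variables can be defined with -hiveconf and then referenced directly in the SQL.

First, write the calling shell script:

    #!/bin/bash
    hive -hiveconf enter_school_date="20130902" -hiveconf min_ag="26" -f testvar.sql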

The contents of the called testvar.sql file:

    use test;

    select *
    from student
    where
        pdate='${hiveconf:enter_school_date}'
        and
        sage > '${hiveconf:min_ag}'
    limit 8;

Execution:

    [czt@www.crazyant.net testHivePara]$ sh -x shellhive.sh
    + hive -hiveconf enter_school_date=20130902 -hiveconf min_ag=26 -f testvar.sql
    Hive history file=/tmp/czt/hive_job_log_czt_201309131651_2035045625.txt
    OK
    Time taken: 2.143 seconds
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Kill Command = hadoop job -kill job_20130911213659_42303
    2013-09-13 16:52:00,300 Stage-1 map = 0%,  reduce = 0%
    2013-09-13 16:52:14,609 Stage-1 map = 28%,  reduce = 0%
    2013-09-13 16:52:24,642 Stage-1 map = 71%,  reduce = 0%
    2013-09-13 16:52:34,639 Stage-1 map = 98%,  reduce = 0%
    Ended Job = job_20130911213659_42303
    OK
    7       name7   27      20130902
    8       name8   28      20130902
    Time taken: 54.268 seconds
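
Building on this, the wrapper script could take the values from its caller instead of hard-coding them. The sketch below is one way to do that; the function name `run_testvar` and the `HIVE_CMD` override (set it to `echo` for a dry run) are hypothetical names introduced here, not part of Hive or of the original scripts.

```shell
#!/bin/bash
# Sketch: pass caller-supplied values to testvar.sql through -hiveconf.
# Assumption: HIVE_CMD is a hypothetical override for the hive executable;
# HIVE_CMD=echo prints the arguments instead of running Hive.
HIVE_CMD="${HIVE_CMD:-hive}"

# Usage: run_testvar ENTER_SCHOOL_DATE MIN_AGE
run_testvar() {
    # The names after -hiveconf must match the ${hiveconf:...} references
    # in testvar.sql (enter_school_date and min_ag).
    ${HIVE_CMD} -hiveconf enter_school_date="$1" -hiveconf min_ag="$2" -f testvar.sql
}
```

After sourcing this file, calling `run_testvar 20130902 26` issues the same hive command as the shell script shown earlier in this section.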
       
 
 

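Summary

This article described two ways to use variables in Hive. The first defines variables in the shell and references them as ${var_name} directly in the SQL passed to hive -e. The second uses the pattern hive -hiveconf key=value -f run.sql to set variables with -hiveconf and references them in the SQL file as ${hiveconf:varname}. Either method covers the need to pass parameters to Hive during development and can improve both development efficiency and code quality.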