Hive Custom Functions

 

-- List the built-in functions Hive supports
show functions;

-- Show usage details for a specific function
desc function round;

 

Writing a Custom Function

 

Requirement:

Count the number of tools in the votetools field:

select ntools, count_tools_length(votetools) from tb_language_count;

Implementing the function

Coding requirements:

(1) Extend the UDF class.

(2) Method rules:

-a. Implement one or more methods named evaluate.

-b. evaluate should never be a void method; however, it can return null if needed.

-c. Method parameters and return values must be Java types or Hadoop writable types; Hadoop types are recommended.

 

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class CountToolsUDF extends UDF {

    // Returns the number of semicolon-separated tools in votetools
    public IntWritable evaluate(Text votetools) {
        if (votetools == null) {
            return new IntWritable(0);
        }
        String value = votetools.toString();
        if (StringUtils.isBlank(value)) {
            return new IntWritable(0);
        }
        int length = value.trim().split(";").length;
        return new IntWritable(length);
    }

    // Quick local check outside of Hive
    public static void main(String[] args) {
        Text text = new Text("Anaconda;KNIME;scikit-learn;Tableau;Turi (former Dato/GraphLab);Weka;Python;Scala;SQL language;Open Source Hadoop Tools;Spark;SQL on Hadoop tools;Keras;Tensorflow;Theano;PyTorch");
        System.out.println(new CountToolsUDF().evaluate(text)); // 16
    }
}

pom.xml configuration:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.huadian.bigdata</groupId>
    <artifactId>hadoop</artifactId>
    <version>1.0-SNAPSHOT</version>

    <repositories>
        <repository>
            <id>aliyun</id>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        </repository>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
        <repository>
            <id>jboss</id>
            <url>http://repository.jboss.com/nexus/content/groups/public</url>
        </repository>
    </repositories>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.7</maven.compiler.source>
        <maven.compiler.target>1.7</maven.compiler.target>
        <hadoop.version>2.7.3</hadoop.version>
        <hive.version>1.2.1</hive.version>
    </properties>
    <dependencies>

        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <!-- Hive Client -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-service</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>${hive.version}</version>
        </dependency>


    </dependencies>

    <build>
        <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
            <plugins>
                <plugin>
                    <artifactId>maven-clean-plugin</artifactId>
                    <version>3.0.0</version>
                </plugin>
                <!-- see http://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
                <plugin>
                    <artifactId>maven-resources-plugin</artifactId>
                    <version>3.0.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.7.0</version>
                </plugin>
                <plugin>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <version>2.20.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-jar-plugin</artifactId>
                    <version>3.0.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-install-plugin</artifactId>
                    <version>2.5.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-deploy-plugin</artifactId>
                    <version>2.8.2</version>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>

</project>

Then package it into a jar and upload it to the machine running Hive.

Add the jar to the classpath:

add jar /opt/cdh5.7.6/hive-1.1.0-cdh5.7.6/hadoop-1.0-SNAPSHOT.jar;

Register the function:

CREATE FUNCTION db_hive.count_tools_length AS 'com.huadian.bigdata.mapreduce.CountToolsUDF';

The string in single quotes is the fully qualified name of the UDF class written in the IDE.

select ntools,db_hive.count_tools_length(votetools) from tb_language_count limit 10;

Types of custom functions

UDF (User-Defined Function)

One-to-one: takes one value and returns one value, e.g. substring.

UDAF (User-Defined Aggregate Function)

Many-to-one: takes multiple rows and returns one value, e.g. max; usually used together with GROUP BY.

UDTF (User-Defined Table-Generating Function)

One-to-many: takes one value and returns multiple rows, e.g. explode.
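The three shapes correspond to map, reduce, and flatMap. A minimal plain-Java sketch of the semantics (an analogy using java.util.stream, not the Hive API; the sample rows are made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FunctionShapes {
    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a;b", "c", "d;e;f");

        // UDF: one-to-one, like substring -- each row maps to exactly one value
        List<Integer> perRow = rows.stream()
                .map(r -> r.split(";").length)
                .collect(Collectors.toList());
        System.out.println(perRow);       // [2, 1, 3]

        // UDAF: many-to-one, like max -- the whole group collapses to one value
        int maxTools = perRow.stream().max(Integer::compare).orElse(0);
        System.out.println(maxTools);     // 3

        // UDTF: one-to-many, like explode -- each row expands to several rows
        List<String> exploded = rows.stream()
                .flatMap(r -> Arrays.stream(r.split(";")))
                .collect(Collectors.toList());
        System.out.println(exploded);     // [a, b, c, d, e, f]
    }
}
```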

 

Analytic window functions

Requirement: from the employee table (emp), find the top three employees by salary in each department.

Background: window functions originated in databases such as Oracle, DB2, and SQL Server, which are widely used by financial companies for reporting.

Hive's support is documented at:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

Three ranking functions answer this requirement:

ROW_NUMBER(): numbers rows 1, 2, 3, ... within each partition; ties still get distinct numbers.

select
   empno, ename, sal, deptno,
   ROW_NUMBER() OVER(PARTITION BY deptno ORDER BY sal DESC) as rnk
from
   tb_emp;


RANK(): equal values share the same rank and later ranks are skipped; with two employees tied for first place, there is no second place.

select
   empno, ename, sal, deptno,
   RANK() OVER(PARTITION BY deptno ORDER BY sal DESC) as rnk
from
   tb_emp;

DENSE_RANK(): equal values share the same rank without gaps; with two tied for first place, second place still exists.

select
   empno, ename, sal, deptno,
   DENSE_RANK() OVER(PARTITION BY deptno ORDER BY sal DESC) as rnk
from
   tb_emp;
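To finish the requirement, wrap the ranking query in a subquery and keep only rows with rnk <= 3 in the outer WHERE (Hive does not allow a window alias directly in WHERE). How the three functions differ on tied salaries can be sketched in plain Java (illustrative salaries, not the real emp table):

```java
public class RankDemo {
    public static void main(String[] args) {
        // One ORDER BY sal DESC partition; 3000 appears twice (a tie)
        int[] sal = {5000, 3000, 3000, 2975, 2850};
        int rank = 0, denseRank = 0;
        Integer prev = null;
        for (int i = 0; i < sal.length; i++) {
            int rowNumber = i + 1;                // unique even for ties
            if (prev == null || sal[i] != prev) {
                rank = i + 1;                     // ties share a rank, then skip
                denseRank++;                      // ties share a rank, no gaps
            }
            prev = sal[i];
            System.out.printf("sal=%d row_number=%d rank=%d dense_rank=%d%n",
                    sal[i], rowNumber, rank, denseRank);
        }
    }
}
```

For the tied 3000 rows this prints rank=2 twice and then jumps to rank=4, while dense_rank stays at 2 and continues with 3.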

 

Setting the Hive log directory

Rename the config template:

mv hive-log4j.properties.template hive-log4j.properties

Edit the configuration:
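The relevant entries in hive-log4j.properties are hive.log.dir and hive.log.file. A sketch that points logs under the Hive install directory (the target path below is an example based on this install, not a required value):

```properties
# Default is ${java.io.tmpdir}/${user.name}; point it somewhere persistent.
hive.log.dir=/opt/cdh5.7.6/hive-1.1.0-cdh5.7.6/logs
hive.log.file=hive.log
```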
