Hive UDF Custom Functions: Error Analysis

I. The following error is reported

-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"bike"},"value":{"_col0":3,"_col1":10.23}}
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265) 

 1. In the Hive shell, load the custom Python script into Hive:

add file hdfs:///user/hive/lib/udf1.py; 

2. Check that the using 'udf1.py' clause is correct, i.e. that the script name matches the file that was added:

select transform(id,vtype,price) using 'udf1.py' as (vtype string,mean float,var float) from (select * from test cluster by vtype) as temp_table; 
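The subquery's cluster by vtype sends all rows with the same vtype to one reducer and sorts them, which matters when the script groups consecutive rows with itertools.groupby (as the example script later in this post does). A minimal sketch of that behaviour, using hypothetical sample rows and a hypothetical helper named group_keys:

```python
from itertools import groupby
from operator import itemgetter


def group_keys(rows, key_index=1):
    """Return the sequence of group keys that itertools.groupby produces."""
    return [key for key, _ in groupby(rows, key=itemgetter(key_index))]


# Hypothetical sample rows in (id, vtype, price) order.
clustered = [("1", "bike", "10.0"), ("2", "bike", "12.0"), ("3", "car", "99.0")]
unclustered = [("1", "bike", "10.0"), ("3", "car", "99.0"), ("2", "bike", "12.0")]

print(group_keys(clustered))    # ['bike', 'car']
print(group_keys(unclustered))  # ['bike', 'car', 'bike']
```

With unclustered input, 'bike' shows up as two separate groups, so the script would emit two partial result rows for the same vtype.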

II. The following error is reported

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453) 

 1. This error almost always means that the script itself has a problem, so debug the script.
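One way to debug such a script is to run its logic locally on a few tab-separated rows before submitting the query, since Hive only reports the generic error above and not the Python traceback. The sketch below is an illustration under assumptions, not the original script: the function name summarize and the sample rows are hypothetical, and it uses only the standard library so it can run anywhere.

```python
import io
from itertools import groupby
from operator import itemgetter

SEP = "\t"


def summarize(stream):
    """Yield one 'vtype<TAB>sum<TAB>mean' line per run of consecutive vtype rows."""
    rows = (line.strip().split(SEP) for line in stream if line.strip())
    for vtype, group in groupby(rows, key=itemgetter(1)):
        prices = [float(price) for _rowid, _vtype, price in group]
        total = sum(prices)
        yield SEP.join([vtype, str(total), str(total / len(prices))])


# Hypothetical sample input standing in for what Hive would write to stdin.
sample = io.StringIO("1\tbike\t10.0\n2\tbike\t12.0\n3\tcar\t99.0\n")
for out_line in summarize(sample):
    print(out_line)
```

The real script can be checked the same way from a terminal by piping a small sample file into it, for instance cat sample.tsv | python udf1.py, where sample.tsv is a hypothetical file of tab-separated rows. Any exception then appears directly in the terminal.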

For example:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
from itertools import groupby
from operator import itemgetter

import pandas as pd

sep = '\t'


def read_input(input_data):
    # Yield each non-empty input line as a list of tab-separated fields.
    for line in input_data:
        line = line.strip()
        if line == "":
            continue
        yield line.split(sep)


def main():
    data = read_input(sys.stdin)
    # groupby only merges consecutive rows, so the input must arrive grouped by vtype.
    for vtype, group in groupby(data, itemgetter(1)):
        # Convert price to float here. Without the conversion, df['price'].mean()
        # fails and Hive reports the error shown above.
        group = [(int(rowid), vtype, float(price)) for rowid, vtype, price in group]
        df = pd.DataFrame(group, columns=('id', 'vtype', 'price'))
        output = [vtype, df['price'].sum(), df['price'].mean()]
        print(sep.join(str(o) for o in output))


if __name__ == '__main__':
    main()
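Because every field reaches the script as text, a single unparsable value is enough to make the script exit and trigger the error above. The sketch below shows one defensive way to convert the price field. The helper name parse_price and the sample values are hypothetical, and returning None for bad values is an assumption about how such rows should be handled, not part of the original script.

```python
def parse_price(text):
    """Return the price as a float, or None if the field is NULL or malformed."""
    if text == r"\N":  # Hive's TRANSFORM passes NULL as the literal string \N
        return None
    try:
        return float(text)
    except ValueError:
        return None


# Hypothetical field values as they might arrive on stdin.
fields = ["10.23", r"\N", "abc", " 7 "]
print([parse_price(f) for f in fields])  # [10.23, None, None, 7.0]
```

Rows whose price comes back as None can then be skipped or counted before the values are aggregated, instead of letting the exception end the script.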
