java hive udf, Hive UDF: text to array

I'm trying to create a UDF for Hive that gives me more functionality than the built-in split() function.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class LowerCase extends UDF {

        public Text evaluate(final Text text) {
            if (text == null) {
                return null; // Hive passes NULL column values as null
            }
            return new Text(stemWord(text.toString()));
        }

        /**
         * Stems words to normal form.
         *
         * @param word the word to stem
         * @return Stemmed word.
         */
        private String stemWord(String word) {
            word = word.toLowerCase();
            // Remove special characters
            // Porter stemmer
            // ...
            return word;
        }
    }

This works in Hive. I export the class into a JAR file and load it into Hive with

add jar /path/to/myJar.jar;

and create a function using

create temporary function lower_case as 'LowerCase';

I have a table with a string column, text. The query is then:

select lower_case(text) from documents;

Now I want to create a function that returns an array, as split does:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.StringTokenizer;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class WordSplit extends UDF {

        public Text[] evaluate(final Text text) {
            // Typed list so that toArray(new Text[...]) returns Text[]
            List<Text> splitList = new ArrayList<>();
            StringTokenizer tokenizer = new StringTokenizer(text.toString());
            while (tokenizer.hasMoreTokens()) {
                Text word = new Text(stemWord(tokenizer.nextToken()));
                splitList.add(word);
            }
            return splitList.toArray(new Text[splitList.size()]);
        }

        /**
         * Stems words to normal form.
         *
         * @param word the word to stem
         * @return Stemmed word.
         */
        private String stemWord(String word) {
            word = word.toLowerCase();
            // Remove special characters
            // Porter stemmer
            // ...
            return word;
        }
    }

Unfortunately, this function does not work when I use exactly the same loading procedure as above. I get the following error:

FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct<>' but '>' is found.

I haven't found any documentation covering this kind of transformation, so I'm hoping you can give me some advice!
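The 'struct<>' in the error hints at the cause. Hive's reflection bridge for the old-style UDF class appears to map the Java return type of evaluate() to a Hive type, and for a class it does not otherwise recognise it seems to fall back to a struct whose members are the class's non-static fields. A Java array class such as Text[] declares no fields (the JDK documents that Class.getDeclaredFields() returns an empty array for array types), which is consistent with the empty, unparseable struct<>. The sketch below reproduces only that field lookup with plain JDK reflection; it is not Hive's code. WordSplitLike and WordCount are hypothetical stand-ins, and String[] replaces Text[] so that no Hadoop jars are needed.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class ArrayReturnTypeDemo {

    // Hypothetical stand-in for the question's WordSplit (String[] instead of Text[]).
    public static class WordSplitLike {
        public String[] evaluate(String text) {
            return text.split("\\s+");
        }
    }

    // Hypothetical class with fields, for contrast with the array case.
    public static class WordCount {
        String word;
        int count;
    }

    // Collects the declared non-static field names of a class, i.e. what a
    // struct-from-fields fallback would turn into struct members.
    public static List<String> structFieldNames(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                names.add(f.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Class<?> returnType = WordSplitLike.class
                .getMethod("evaluate", String.class)
                .getReturnType();
        System.out.println(returnType.getSimpleName());               // String[]
        System.out.println(structFieldNames(returnType));             // []
        System.out.println(structFieldNames(WordCount.class).size()); // 2
    }
}
```

An array return type therefore leaves the fallback with nothing to put between the angle brackets, which matches the "name expected at the position 7 of 'struct<>'" message.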

Solution

I don't think the UDF interface will give you what you want; you should use GenericUDF instead. I would use the source of the built-in split UDF (GenericUDFSplit) as a guide.
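A minimal sketch of the GenericUDF approach, assuming hive-exec and hadoop-common are on the compile classpath. The class name GenericWordSplit and the function name word_split are hypothetical. Unlike the reflection-based UDF, a GenericUDF declares its return type explicitly in initialize(), here as array<string> via a standard list inspector over Text writables. For simplicity it accepts only STRING arguments (VARCHAR/CHAR would need an argument converter), and stemWord is the question's placeholder reduced to lower-casing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.Text;

public class GenericWordSplit extends GenericUDF {

    // Inspector for the single string argument, captured in initialize().
    private StringObjectInspector inputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("word_split takes exactly one argument");
        }
        if (!(arguments[0] instanceof StringObjectInspector)) {
            throw new UDFArgumentTypeException(0, "word_split expects a STRING argument");
        }
        inputOI = (StringObjectInspector) arguments[0];
        // Declare the result type: array<string> whose elements are Text writables.
        return ObjectInspectorFactory.getStandardListObjectInspector(
                PrimitiveObjectInspectorFactory.writableStringObjectInspector);
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object value = arguments[0].get();
        if (value == null) {
            return null; // NULL in, NULL out
        }
        String text = inputOI.getPrimitiveJavaObject(value);
        List<Text> words = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            words.add(new Text(stemWord(tokenizer.nextToken())));
        }
        // A java.util.List is what the standard list inspector expects.
        return words;
    }

    @Override
    public String getDisplayString(String[] children) {
        return "word_split(" + String.join(", ", children) + ")";
    }

    // Placeholder for the question's stemmer: only lower-cases for now.
    private String stemWord(String word) {
        return word.toLowerCase();
    }
}
```

Registration would follow the same steps as in the question: add jar /path/to/myJar.jar; then create temporary function word_split as 'GenericWordSplit'; and finally select word_split(text) from documents;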
