For example, given rows like the following, the requirement is to extract the value for a given key in Hive:
sex=1&hight=180&weight=100&sal=2000000
sex=0&hight=176&weight=101&sal=1000000
sex=1&hight=170&weight=102&sal=2000000
sex=0&hight=169&weight=103&sal=3000000
sex=0&hight=188&weight=104&sal=7000000
sex=1&hight=187&weight=105&sal=9000000
Method 1: a custom Hive UDF
The basic idea is to convert the string into JSON and let a JSON library do the parsing.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.codehaus.jettison.json.JSONException;
import org.codehaus.jettison.json.JSONObject;

public class Key2Value extends UDF {
    public String evaluate(String str, String key) throws JSONException {
        if (str == null || key == null || str.isEmpty() || key.isEmpty()) {
            return null;
        }
        // Convert to JSON format so a JSON library can do the parsing
        String s1 = str.replace("&", ",");
        String s2 = s1.replace("=", ":");
        String s3 = "{" + s2 + "}";
        JSONObject jsonObject = new JSONObject(s3);
        return jsonObject.getString(key);
    }

    public static void main(String[] args) throws JSONException {
        Key2Value key2Value = new Key2Value();
        String str = "sex=1&hight=187&weight=105&sal=9000000";
        String sal = key2Value.evaluate(str, "sal");
        System.out.println(sal); // 9000000
    }
}
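As a side note, the same lookup can be done without a JSON library at all, by splitting on the delimiters directly; a minimal sketch (the class name KvLookup is hypothetical, not part of the original UDF):

```java
// Sketch: extract a value by key from a "k1=v1&k2=v2" string without JSON.
public class KvLookup {
    public static String lookup(String str, String key) {
        if (str == null || key == null || str.isEmpty() || key.isEmpty()) {
            return null;
        }
        for (String pair : str.split("&")) {
            // Limit to 2 parts so values containing '=' are kept intact
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals(key)) {
                return kv[1];
            }
        }
        return null; // key not found
    }

    public static void main(String[] args) {
        System.out.println(lookup("sex=1&hight=187&weight=105&sal=9000000", "sal")); // 9000000
    }
}
```

This avoids both the jettison dependency and the risk that a value containing "&", "=", "," or quotes would break the improvised JSON string.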
Package it into a jar with Maven, upload it to the server, then add the jar, create a temporary function name, and call it:
hive> add jar /mysoft/test.jar;
Added [/mysoft/test.jar] to class path
Added resources: [/mysoft/test.jar]
hive> create temporary function key2Value as 'Key2Value';
OK
Time taken: 0.019 seconds
hive> select key2Value("sex=1&hight=180&weight=100&sal=2000000","sal");
2000000
Method 2: use the built-in parse_url function
select parse_url(concat("http://127.0.0.1:8000/?","sex=1&hight=187&weight=105&sal=9000000"),"QUERY","sal");
Result:
9000000
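Hive's parse_url builds on standard Java URL parsing; a roughly equivalent plain-Java sketch (for illustration only, not Hive's actual implementation):

```java
import java.net.URL;

public class ParseUrlDemo {
    // Mimic parse_url(url, "QUERY", key): pull one key's value out of the query string.
    public static String queryParam(String urlStr, String key) throws Exception {
        String query = new URL(urlStr).getQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals(key)) {
                return kv[1];
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Wrap the bare key=value string in a throwaway URL, as the Hive query does.
        String url = "http://127.0.0.1:8000/?" + "sex=1&hight=187&weight=105&sal=9000000";
        System.out.println(queryParam(url, "sal")); // 9000000
    }
}
```

The host and port are irrelevant; any well-formed URL prefix works, since parse_url only reads the part after "?" when asked for QUERY.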
Notes
If registration fails with an error like the following, an old copy of the jar is likely still on the classpath:
hive> create temporary function key2Value as 'Key2Value';
FAILED: Class Key2Value does not implement UDF, GenericUDF, or UDAF
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
Delete the stale jar, restart Hive, and register again.
Summary:
- To extract a value by key, you can build a custom UDF or call the parse_url function; if the string is not a URL, construct one around it first.
- Of the two, the second approach is preferable, since maintaining a custom function is more of a hassle!