-- Indexation processing
Multiply the heat (search volume) by (π - 1.8), round the product to the nearest integer, then map it into a segment and round up to the segment boundary. Segment widths:
  result <= 50: 1 per segment
  50 < result <= 100: 10 per segment
  100 < result <= 1000: 30 per segment
  1000 < result <= 5000: 100 per segment
  result > 5000: 1000 per segment
Every result is rounded up to the upper boundary of its segment.
Examples:
  Search volume 0: treated by default as 1.34, which rounds to 1; rounding up within a width-1 segment gives 1.
  Search volume 1: 1 * (π - 1.8) = 1.34, which rounds to 1; rounding up gives 1.
  Search volume 40: 40 * (π - 1.8) = 53.6, which rounds to 54; rounding up to the next multiple of 10 gives 60.
And so on.
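The rules above can be sketched as a standalone method with the segment table hard-coded (a minimal sketch that takes π as 3.14, matching the worked examples; the full UDF below parameterizes the thresholds instead):

```java
// Minimal sketch of the indexation rule with the segment table hard-coded.
// Assumes pi is taken as 3.14, as in the worked examples above.
public class IndexationSketch {
    static final long[] THRESHOLDS = {0, 50, 100, 1000, 5000};
    static final long[] WIDTHS = {1, 10, 30, 100, 1000};

    static long indexation(long searchVolume) {
        if (searchVolume == 0) {
            return 1; // documented default: a volume of 0 maps to segment 1
        }
        // volume * (pi - 1.8), rounded to the nearest integer
        long rounded = Math.round(searchVolume * (3.14 - 1.8));
        // pick the width of the segment the value falls into;
        // values beyond the last threshold use the last width
        long width = WIDTHS[WIDTHS.length - 1];
        for (int i = 0; i < THRESHOLDS.length - 1; i++) {
            if (rounded <= THRESHOLDS[i + 1]) {
                width = WIDTHS[i];
                break;
            }
        }
        // round up to the next multiple of the width (exact multiples are kept)
        return ((rounded + width - 1) / width) * width;
    }

    public static void main(String[] args) {
        System.out.println(indexation(1));    // 1
        System.out.println(indexation(40));   // 60
        System.out.println(indexation(1000)); // 1000 * 1.34 = 1340 -> width 100 -> 1400
    }
}
```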
import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * UDFIndexation: converts a raw pv (search volume) into its indexed value.
 * Sample: UDFIndexation(pv) returns the indexation of the pv value.
 *
 * @author pengxuan.lipx
 */
public class UDFIndexation extends UDF {

    public long evaluate(long arguments, String varValues, String grade, String index) {
        long indexResult = 1L;
        // varValues holds the two constants of the formula, e.g. "3.14, 1.8"
        String[] varValuesStr = varValues.replace(" ", "").split(",");
        double paiValue = Double.parseDouble(varValuesStr[0]);
        double indexValue = Double.parseDouble(varValuesStr[1]);
        // grade holds the segment thresholds, e.g. "0, 50, 100, 1000, 5000"
        String[] gradeStr = grade.replace(" ", "").split(",");
        long[] gradeDle = new long[gradeStr.length];
        for (int i = 0; i < gradeStr.length; i++) {
            gradeDle[i] = Long.parseLong(gradeStr[i]);
        }
        // index holds the segment widths, e.g. "1, 10, 30, 100, 1000"
        String[] indexStr = index.replace(" ", "").split(",");
        int[] indexDle = new int[indexStr.length];
        for (int i = 0; i < indexStr.length; i++) {
            indexDle[i] = Integer.parseInt(indexStr[i]);
        }
        // pv * (pi - 1.8), rounded to the nearest integer
        long indexResult1 = Math.round(arguments * (paiValue - indexValue));
        // find the segment (gradeDle[i], gradeDle[i+1]] the value falls into and
        // round up to its width; values beyond the last threshold use the last width
        for (int i = 0; i < gradeDle.length - 1; i++) {
            if (gradeDle[i] < indexResult1 && indexResult1 <= gradeDle[i + 1]) {
                indexResult = roundedUp(indexResult1, indexDle[i]);
            } else if (gradeDle[i + 1] < indexResult1) {
                indexResult = roundedUp(indexResult1, indexDle[i + 1]);
            }
        }
        return indexResult;
    }

    /** Rounds roundedArg up to the next multiple of median; exact multiples are kept. */
    public static long roundedUp(long roundedArg, int median) {
        if (roundedArg % median == 0) {
            return roundedArg;
        }
        return ((roundedArg + median) / median) * median;
    }

    public static void main(String[] args) {
        UDFIndexation udfin = new UDFIndexation();
        long argtest = 1000;
        String varValues = "3.14, 1.8";
        String grade = "0, 50, 100, 1000, 5000";
        String index = "1, 10, 30, 100, 1000";
        System.out.println(udfin.evaluate(argtest, varValues, grade, index));
    }
}
1. A UDF can be applied directly in a select statement to format the query result before it is output.
2. When writing a UDF, note the following points:
a) A custom UDF must extend org.apache.hadoop.hive.ql.exec.UDF.
b) It must implement an evaluate method.
c) The evaluate method supports overloading.
3. Write the UDF class (as above).
4. Usage:
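Points a)–c) can be illustrated with a skeleton like the following. This is a hedged sketch with a hypothetical class name (UDFUpper): in a real deployment the class declares `extends org.apache.hadoop.hive.ql.exec.UDF`, which needs hive-exec on the classpath, so the extends clause is shown as a comment here to keep the sketch compilable on its own:

```java
// Skeleton of a custom Hive UDF (hypothetical example, not part of the UDF above).
// a) In a real deployment the class would declare
//      extends org.apache.hadoop.hive.ql.exec.UDF
//    (requires hive-exec on the classpath; commented out so this compiles stand-alone).
public class UDFUpper /* extends org.apache.hadoop.hive.ql.exec.UDF */ {

    // b) Implement an evaluate method; Hive resolves it by name.
    public String evaluate(String s) {
        return s == null ? null : s.toUpperCase();
    }

    // c) evaluate supports overloading; Hive picks the signature matching the call.
    public String evaluate(String s, String suffix) {
        return s == null ? null : s.toUpperCase() + suffix;
    }

    public static void main(String[] args) {
        UDFUpper u = new UDFUpper();
        System.out.println(u.evaluate("hive"));        // HIVE
        System.out.println(u.evaluate("hive", "_ql")); // HIVE_ql
    }
}
```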
a) Package the program into a jar and copy it to the target machine;
b) Enter the Hive client and add the jar: hive> add jar /run/jar/udf_test.jar;
c) Create a temporary function:
add jar /home/dwapp/pengxuan.lipx/hive_scripts/udfindex.jar;
add jar /dhwdata/hadoop/hadoop-0.19.2-core.jar;
add jar /dhwdata/hive/lib/hive-exec.jar;
CREATE TEMPORARY FUNCTION indexation AS 'com.alibaba.hive.udf.lpxuan.UDFIndexation';
select indexation(500,'3.14, 1.8','0, 50, 100, 1000, 5000','1, 10, 30, 100, 1000') from dual;