Example 1
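In Hive or Spark SQL, adding or subtracting two numbers yields NULL if either operand is NULL. The UDF below works around this by treating a NULL operand as missing, so that it does not null out the whole result.
A custom function for the sum of two numbers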
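def diy_add(a, b):
    # Both operands are NULL: the sum is NULL as well.
    if a is None and b is None:
        return None
    # Exactly one operand is NULL: return the other one unchanged.
    if a is None:
        return b
    if b is None:
        return a
    return a + b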
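# No returnType is passed, so Spark falls back to its default (string) for the result.
spark.udf.register("diy_add", diy_add)
# Both arguments are numeric
spark.sql("select diy_add(1, 4)")
5
# One argument is numeric, the other is NULL (4/0 evaluates to NULL when ANSI mode is off)
spark.sql("select diy_add(1, 4/0)")
1
spark.sql("select diy_add(1/0, 4)")
4
# Both arguments are NULL
spark.sql("select diy_add(1/0, 4/0)")
null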
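The text above mentions differences as well as sums, but only the sum is implemented. A minimal sketch of a matching subtraction helper might look like the following. The name diy_sub is hypothetical, and so is the choice to treat a missing operand as 0 (so that NULL - b gives -b); neither comes from the original post.

```python
def diy_sub(a, b):
    """Null-safe a - b. Assumption: a missing operand counts as 0."""
    # Both operands missing: keep the result missing as well.
    if a is None and b is None:
        return None
    # Only the left operand is missing: 0 - b.
    if a is None:
        return -b
    # Only the right operand is missing: a - 0.
    if b is None:
        return a
    return a - b


# Registration would mirror diy_add, assuming an active SparkSession named spark:
# spark.udf.register("diy_sub", diy_sub)
```

If NULL - b should instead return b or NULL, only the second branch needs to change.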
Example 2
#!/usr/bin/python3.6
# -*- coding: utf-8 -*-
from pyspark.sql import functions as F
from pyspark.sql import types as T
from pyspark import SparkConf
from pyspark.sql import SparkSession
conf = SparkConf()
conf.set("spark.app.name", "lb