08-UDFs

User-Defined Functions

  1. Define a function

  2. Create and apply UDF

  3. Register UDF to use in SQL

  4. Use Decorator Syntax (Python Only)

  5. Use Vectorized UDF (Python Only)

Methods
  • UDF Registration (spark.udf): register

  • Built-In Functions : udf

  • Python UDF Decorator : @udf

  • Pandas UDF Decorator : @pandas_udf

Define a function

Define a function in local Python/Scala to get the first letter of a string from the email field.

def firstLetterFunction(email):
  return email[0]

This local function cannot be applied directly to a Spark DataFrame column.

from pyspark.sql.functions import col
display(salesDF.select(firstLetterFunction(col("email"))))


Once the function is wrapped with udf, it becomes a UDF and can be applied to DataFrame columns.

from pyspark.sql.functions import udf
firstLetterUDF = udf(firstLetterFunction)
display(salesDF.select(firstLetterUDF(col("email"))))
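The function above raises an error if an email value is null or empty, because a Python UDF receives a null as None. A minimal sketch of a more defensive variant follows; the names first_letter_or_none and with_first_letter are hypothetical, and the Spark imports sit inside the helper so the plain function can be tried without a cluster.

```python
from typing import Optional


def first_letter_or_none(email: Optional[str]) -> Optional[str]:
    # Return None for null or empty input instead of raising an error.
    if not email:
        return None
    return email[0]


def with_first_letter(sales_df):
    # Hypothetical helper: wraps the plain function as a UDF and applies it.
    # Assumes sales_df is a Spark DataFrame with a string column "email".
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    first_letter_udf = udf(first_letter_or_none, StringType())
    return sales_df.select(first_letter_udf(col("email")).alias("firstLetter"))
```

In a notebook this would be used as display(with_first_letter(salesDF)).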


Register UDF to use in SQL

Register the function with spark.udf.register to make it available as a UDF in the SQL namespace.

salesDF.createOrReplaceTempView("sales")

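spark.udf.register("sql_udf", firstLetterFunction)
# Run the SQL from Python; in a notebook SQL cell the query can be run on its own
display(spark.sql("SELECT email, sql_udf(email) AS firstLetter FROM sales"))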
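spark.udf.register also returns the UDF it registers, so the same object can be reused in the DataFrame API. The sketch below assumes an active SparkSession and a DataFrame with an email column; first_letter, first_letter_sql, and register_first_letter are hypothetical names chosen for illustration.

```python
def first_letter(email):
    # Plain Python logic; None is passed through for null emails.
    return None if not email else email[0]


def register_first_letter(spark, sales_df):
    # Hypothetical helper: registers the function for SQL and reuses the
    # returned UDF in the DataFrame API.
    from pyspark.sql.functions import col

    sales_df.createOrReplaceTempView("sales")

    # The return type defaults to string when none is given.
    first_letter_udf = spark.udf.register("first_letter_sql", first_letter)

    via_sql = spark.sql(
        "SELECT email, first_letter_sql(email) AS firstLetter FROM sales"
    )
    via_api = sales_df.select(
        "email", first_letter_udf(col("email")).alias("firstLetter")
    )
    return via_sql, via_api
```

Both returned DataFrames should contain the same rows; the difference is only in how the UDF is invoked.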


Use Decorator Syntax (Python Only)

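Alternatively, in Python you can define a UDF with decorator syntax, passing the data type the function returns to the decorator. After decoration the name refers to the UDF, so it can no longer be called as a local Python function.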

from pyspark.sql.functions import col, udf

# Our input/output is a string
@udf("string")
def decoratorUDF(email: str) -> str:
  return email[0]

salesDF = spark.read.parquet("/mnt/dbswarehouse/raw/sales.parquet")
display(salesDF.select(decoratorUDF(col("email"))))
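The decorator argument is the return type, and it does not have to be a string type. As a hedged sketch, the example below declares an integer return type and shows the decorator form next to the equivalent explicit udf call; email_length and make_length_udfs are hypothetical names. Keeping the logic in an undecorated function is one way to leave it callable in local tests.

```python
def email_length(email):
    # Plain Python logic, still callable locally because it is not decorated.
    return None if email is None else len(email)


def make_length_udfs():
    # Hypothetical helper: builds the same UDF in two equivalent ways.
    from pyspark.sql.functions import udf

    # Decorator form: "int" is the declared return type.
    @udf("int")
    def email_length_decorated(email):
        return email_length(email)

    # Explicit form: wrap the plain function and pass the return type.
    email_length_wrapped = udf(email_length, "int")

    return email_length_decorated, email_length_wrapped
```

Either UDF can then be applied with salesDF.select(...(col("email"))).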


Use Vectorized UDF (Python Only)

import pandas as pd
from pyspark.sql.functions import pandas_udf

# We have a string input/output
@pandas_udf("string")
def vectorizedUDF(email: pd.Series) -> pd.Series:
  return email.str[0]

# Alternatively
vectorizedUDF = pandas_udf(lambda s: s.str[0], "string")
display(salesDF.select(vectorizedUDF(col("email"))))
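A vectorized UDF receives a whole batch of values as a pandas Series rather than one value per call, and a Series-to-Series UDF returns a Series of the same length. The body of the UDF above can be tried in plain pandas, as in this small sketch; first_letters and the sample addresses are made up for illustration.

```python
import pandas as pd


def first_letters(emails: pd.Series) -> pd.Series:
    # Same logic as the vectorized UDF body: first character of each string.
    # Missing values stay missing instead of raising an error.
    return emails.str[0]


# Hypothetical sample batch, including a missing email.
sample = pd.Series(["anna@example.com", None, "bob@example.org"])
result = first_letters(sample)
```

Here result has one entry per input row, with a missing value where the email was missing.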


We can also register these Vectorized UDFs to the SQL namespace.

spark.udf.register("sql_vectorized_udf", vectorizedUDF)