PySpark: why don't isupper(), islower(), istitle() work in a UDF?

I am trying to create a UDF that checks whether a name string is all uppercase or all lowercase. Why doesn't it produce what I expect? For example:

def check_case(name):
    if name.isupper():
        check = "yes"
    else:
        check = "no"
    return check

my_udf = udf(lambda x: check_case(name), StringType())

df.withColumn("casecheck", my_udf(col("firstName"))).select("firstName", "casecheck").show()

The output below is clearly wrong. I tried islower() and istitle() as well, with equally wrong results (they return all yes or all no for every record). Any idea why this doesn't work inside a UDF?

Thanks!

+---------+---------+
|firstName|casecheck|
+---------+---------+
| GRETCHEN|       no|
|   IFswkG|       no|
|    April|       no|

I also tried this:

def check_case(name):
    if name.isupper():
        check = "yes"
    else:
        check = "no"
    return check

my_udf = udf(check_case, StringType())

df.withColumn("casecheck", my_udf("firstName")).select("firstName", "casecheck").show()

Now I get this error:

Py4JJavaError: An error occurred while calling o1046.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 385.0 failed 4 times, most recent failure: Lost task 0.3 in stage 385.0 (TID 9580, ip-10-22-10-102.ec2.internal, executor 32): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/mnt/yarn/usercache/zeppelin/appcache/application_1598626762284_0001/container_1598626762284_0001_01_000061/pyspark.zip/pyspark/worker.py", line 377, in main
    process()
  File "/mnt/yarn/usercache/zeppelin/appcache/application_1598626762284_0001/container_1598626762284_0001_01_000061/pyspark.zip/pyspark/worker.py", line 372, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/mnt/yarn/usercache/zeppelin/appcache/application_1598626762284_0001/container_1598626762284_0001_01_000061/pyspark.zip/pyspark/worker.py", line 248, in
    func = lambda _, it: map(mapper, it)
  File "", line 1, in
  File "/mnt/yarn/usercache/zeppelin/appcache/application_1598626762284_0001/container_1598626762284_0001_01_000061/pyspark.zip/pyspark/worker.py", line 85, in
    return lambda *a: f(*a)
  File "/mnt/yarn/usercache/zeppelin/appcache/application_1598626762284_0001/container_1598626762284_0001_01_000061/pyspark.zip/pyspark/util.py", line 113, in wrapper
    return f(*args, **kwargs)
  File "", line 5, in check_case
AttributeError: 'NoneType' object has no attribute 'isupper'

Further edit:

def check_case(name):
    if name != None and name.isupper():
        check = "yes"
    elif name != None and name.islower():
        check = "no"
    else:
        check = None
    return check

my_udf = udf(check_case, StringType())

df.withColumn("casecheck", my_udf("firstName")).select("firstName", "casecheck").show()

The output is:

+---------+---------+
|firstName|casecheck|
+---------+---------+
| GRETCHEN|      yes|
| GRETCHEN|      yes|
| GRETCHEN|      yes|
| Christos|     null|
|   IFswkG|     null|
|    April|     null|
|  MATTHEW|      yes|
|     riUj|     null|
|    HARRY|      yes|

Solution

First, your lambda passes name instead of its own argument x. You don't need the lambda at all; just pass the function to udf directly:

my_udf = udf(check_case, StringType())
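If you do want to keep a lambda, it has to forward the lambda's own argument rather than the outer variable name. A minimal sketch of the two equivalent forms, with the imports shown for completeness:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Pass the function directly (no lambda needed) ...
my_udf = udf(check_case, StringType())

# ... or, if a lambda is used, forward its argument x, not the free variable name
my_udf = udf(lambda x: check_case(x), StringType())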

Then, inside your function, you need to handle None as well as the isupper/islower conditions, e.g.:

def check_case(name):
    if name != None and (name.isupper() or name.islower()):
        check = "yes"
    else:
        check = "no"
    return check
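For reference, here is a minimal self-contained sketch of how this UDF behaves on upper-case, mixed-case, lower-case and null names; the sample data and column name are made up to mirror the question, and check_case is the function defined just above:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Toy data; None simulates the null firstName that broke the original UDF
df = spark.createDataFrame(
    [("GRETCHEN",), ("IFswkG",), ("april",), (None,)], ["firstName"])

my_udf = udf(check_case, StringType())

# Expected: GRETCHEN -> yes, IFswkG -> no, april -> yes, null -> no
df.withColumn("casecheck", my_udf(col("firstName"))).show()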

Alternatively, you can get a simpler and more efficient solution by building the column with native expressions (a UDF can be more expensive):

from pyspark.sql.functions import col, when, upper, lower

# A Spark Column has no isupper()/islower(); comparing the column with its
# upper()/lower() version is the equivalent check, and isNotNull() replaces != None
df.withColumn("casecheck",
              when(col("firstName").isNotNull()
                   & ((upper(col("firstName")) == col("firstName"))
                      | (lower(col("firstName")) == col("firstName"))), "yes")
              .otherwise("no")) \
  .select("firstName", "casecheck").show()
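Keeping the check in built-in Column expressions lets Spark evaluate it entirely in the JVM, whereas a Python UDF has to ship every row to a Python worker and back, which is typically slower. One caveat: unlike str.isupper(), the upper()/lower() comparison also matches strings that contain no letters at all, which should not matter for first names.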
