pyspark - Why don't isupper(), islower() and istitle() work in a UDF?


琦心


Article link: https://blog.csdn.net/weixin_30015835/article/details/111909901


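I am trying to create a UDF that checks whether a name string is entirely uppercase or entirely lowercase. Why doesn't it produce what I expect? For example: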

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

def check_case(name):
    if name.isupper():
        check = "yes"
    else:
        check = "no"
    return check

my_udf = udf(lambda x: check_case(name), StringType())

df.withColumn("casecheck", my_udf(col("firstName"))).select("firstName", "casecheck").show()

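The output below is obviously wrong. I also tried islower() and istitle(), and they gave wrong results too (every record gets "yes", or every record gets "no"). Any idea why this doesn't work inside a UDF?

Thanks!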

+---------+---------+
|firstName|casecheck|
+---------+---------+
| GRETCHEN|       no|
|   IFswkG|       no|
|    April|       no|

I also tried this:

def check_case(name):
    if name.isupper():
        check = "yes"
    else:
        check = "no"
    return check

my_udf = udf(check_case, StringType())

df.withColumn("casecheck", my_udf("firstName")).select("firstName", "casecheck").show()

Now I get an error:

Py4JJavaError: An error occurred while calling o1046.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 385.0 failed 4 times, most recent failure: Lost task 0.3 in stage 385.0 (TID 9580, ip-10-22-10-102.ec2.internal, executor 32): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/mnt/yarn/usercache/zeppelin/appcache/application_1598626762284_0001/container_1598626762284_0001_01_000061/pyspark.zip/pyspark/worker.py", line 377, in main
    process()
  File "/mnt/yarn/usercache/zeppelin/appcache/application_1598626762284_0001/container_1598626762284_0001_01_000061/pyspark.zip/pyspark/worker.py", line 372, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/mnt/yarn/usercache/zeppelin/appcache/application_1598626762284_0001/container_1598626762284_0001_01_000061/pyspark.zip/pyspark/worker.py", line 248, in
    func = lambda _, it: map(mapper, it)
  File "", line 1, in
  File "/mnt/yarn/usercache/zeppelin/appcache/application_1598626762284_0001/container_1598626762284_0001_01_000061/pyspark.zip/pyspark/worker.py", line 85, in
    return lambda *a: f(*a)
  File "/mnt/yarn/usercache/zeppelin/appcache/application_1598626762284_0001/container_1598626762284_0001_01_000061/pyspark.zip/pyspark/util.py", line 113, in wrapper
    return f(*args, **kwargs)
  File "", line 5, in check_case
AttributeError: 'NoneType' object has no attribute 'isupper'
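This error comes from a null firstName: Spark hands a SQL NULL to a Python UDF as None, and None has no isupper() method. The first version never hit it because its lambda never looked at the column value at all. The failure can be reproduced in plain Python, without Spark, using the same function body:

```python
def check_case(name):
    # Same logic as the question's UDF body, run as plain Python
    if name.isupper():
        check = "yes"
    else:
        check = "no"
    return check

try:
    check_case(None)  # a null firstName arrives in the UDF as None
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'NoneType' object has no attribute 'isupper'
```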

Further edit:

def check_case(name):
    if name is not None and name.isupper():
        check = "yes"
    elif name is not None and name.islower():
        check = "no"
    else:
        check = None
    return check

my_udf = udf(check_case, StringType())

df.withColumn("casecheck", my_udf("firstName")).select("firstName", "casecheck").show()

The output is:

+---------+---------+
|firstName|casecheck|
+---------+---------+
| GRETCHEN|      yes|
| Christos|     null|
|   IFswkG|     null|
|    April|     null|
|  MATTHEW|      yes|
|     riUj|     null|
|    HARRY|      yes|
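Those null values are what this version of the function is written to return: Christos and April are title case, and IFswkG and riUj are mixed case, so none of them is entirely upper- or lowercase and they all fall through to the else branch. If each kind of casing should get its own label (the title also mentions istitle()), a three-way helper could look like the sketch below. classify_case and its labels are hypothetical, not part of the original question or answer:

```python
from typing import Optional

def classify_case(name: Optional[str]) -> Optional[str]:
    # Hypothetical helper: label the casing of a name, passing None through as null
    if name is None:
        return None
    if name.isupper():
        return "upper"
    if name.islower():
        return "lower"
    if name.istitle():
        return "title"
    return "mixed"

for n in ["GRETCHEN", "Christos", "IFswkG", "April", "riUj", None]:
    print(n, classify_case(n))
```

In Spark it would be wrapped the same way as before, e.g. udf(classify_case, StringType()).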

Solution

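First, your lambda passes name instead of x. Because name is not the lambda's parameter, Python looks it up as a global variable (presumably a name left over from an earlier cell in the notebook), so every row is checked against that same value, which is why you get all "yes" or all "no". You don't need a lambda at all; just pass the function itself to udf: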

my_udf = udf(check_case, StringType())
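The effect of the lambda bug can be reproduced in plain Python without Spark. This sketch assumes, hypothetically, that a global variable called name holding "Smith" was left over in the notebook session; the real value in the question's session is unknown:

```python
def check_case(name):
    return "yes" if name.isupper() else "no"

# Hypothetical leftover global from an earlier notebook cell
name = "Smith"

buggy = lambda x: check_case(name)  # ignores x and reads the global name
fixed = check_case                  # what udf(check_case, StringType()) wraps

rows = ["GRETCHEN", "IFswkG", "April"]
buggy_results = [buggy(r) for r in rows]
fixed_results = [fixed(r) for r in rows]
print(buggy_results)  # ['no', 'no', 'no']
print(fixed_results)  # ['yes', 'no', 'no']
```

The buggy version returns the same answer for every row regardless of its content, which matches the all-"no" output in the question.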

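Inside the function you also need to handle None (Spark passes null values to a Python UDF as None) along with the isupper/islower conditions, for example: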

def check_case(name):
    if name is not None and (name.isupper() or name.islower()):
        check = "yes"
    else:
        check = "no"
    return check
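Run as plain Python, this function can be checked against a few names before registering it as a UDF. The sample below uses names from the question's output plus two hypothetical inputs ("april" and None) to exercise the lowercase and null branches:

```python
from typing import Optional

def check_case(name: Optional[str]) -> str:
    # "yes" if the name is entirely upper- or lowercase, "no" otherwise (including null)
    if name is not None and (name.isupper() or name.islower()):
        check = "yes"
    else:
        check = "no"
    return check

sample = ["GRETCHEN", "Christos", "IFswkG", "april", None]
results = [check_case(n) for n in sample]
print(results)  # ['yes', 'no', 'no', 'yes', 'no']
```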

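Alternatively, you can get a simpler and more efficient solution by building the column from Spark's built-in functions (a Python UDF is usually more expensive, because every row has to be sent to a Python worker and back):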

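from pyspark.sql.functions import col, lower, upper, when

# Column has no isupper()/islower(), so compare the name with its upper/lowercased form.
# Use isNotNull() rather than != None: comparing with NULL yields NULL in Spark SQL.
(df.withColumn("casecheck",
               when(col("firstName").isNotNull()
                    & ((col("firstName") == upper(col("firstName")))
                       | (col("firstName") == lower(col("firstName")))),
                    "yes")
               .otherwise("no"))
   .select("firstName", "casecheck")
   .show())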
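A column-based check built from upper()/lower() comparisons is not exactly equivalent to str.isupper()/str.islower(): a string with no letters, such as "123" or an empty string, equals its own upper- and lowercased forms, while str.isupper() and str.islower() both return False for it. The plain-Python sketch below mirrors the two checks side by side, assuming ASCII input (Spark's upper()/lower() and Python's string methods are not guaranteed to agree on every non-ASCII character); builtin_style and udf_style are hypothetical names:

```python
def builtin_style(s):
    # Mirrors: name IS NOT NULL AND (name == upper(name) OR name == lower(name))
    if s is None:
        return "no"
    return "yes" if (s == s.upper() or s == s.lower()) else "no"

def udf_style(s):
    # Mirrors the UDF logic based on str.isupper() / str.islower()
    if s is not None and (s.isupper() or s.islower()):
        return "yes"
    return "no"

for s in ["GRETCHEN", "april", "April", "123", ""]:
    print(repr(s), builtin_style(s), udf_style(s))
```

If digit-only or empty values should map to "no" as well, one option is to add a condition such as col("firstName").rlike("[A-Za-z]") to the when() clause.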
