LEFT and RIGHT functions in PySpark SQL

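I am new to PySpark. I pulled in a CSV file with pandas and created a temporary table with the registerTempTable function.

from pyspark.sql import SQLContext
from pyspark.sql import Row
import pandas as pd

sqlc = SQLContext(sc)  # sc is assumed to be the already-running SparkContext
aa1 = pd.read_csv(r"D:\mck1.csv")  # raw string keeps the backslash literal
aa2 = sqlc.createDataFrame(aa1)
aa2.show()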

+--------+-------+----------+------------+---------+------------+-------------------+
|    City|     id|First_Name|Phone_Number| new_date|    new code|           New_date|
+--------+-------+----------+------------+---------+------------+-------------------+
|KOLKATTA|9000007|       AAA|  1111119411| 20080714|          13|2016-08-16 00:00:00|
|KOLKATTA|9000007|       BBB|  1111119421| 20080714|          13|2016-08-06 00:00:00|
|KOLKATTA|9000007|       CCC|  1111119461| 20080714|          13|2016-08-13 00:00:00|
|KOLKATTA|9000007|       DDD|  1111119471| 20080714|          13|2016-08-27 00:00:00|
|KOLKATTA|9000007|       EEE|  1111119491| 20080714|          13|2016-08-15 00:00:00|
|KOLKATTA|9111147|       FFF|  1111119401| 20080714|          13|2016-08-24 00:00:00|
|KOLKATTA|9585458|   FORMULA|  1111110112| 19990930|          13|2016-08-16 00:00:00|
|KOLKATTA|9569878|   APPLEII|  1111110132| 19990930|          13|2016-08-06 00:00:00|

aa2.registerTempTable("mytable1")  # returns None, so there is nothing to assign

sqlc.sql(""" select right(phone_number,4) from mytable1 """).show()

Now I am trying to pull the last four characters of the phone number with right(Phone_Number, 4), and I get the error below:

---------------------------------------------------------------------------

Py4JJavaError Traceback (most recent call last)
in ()
----> 1 sqlc.sql(""" select right(Phone_number,4) from mytable1 """).show()

C:\spark-1.4.1-bin-hadoop2.6\python\pyspark\sql\context.pyc in sql(self, sqlQuery)
    500 [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
    501 """
--> 502 return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
    503
    504 @since(1.0)

C:\spark-1.4.1-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py in __call__(self, *args)
    536 answer = self.gateway_client.send_command(command)
    537 return_value = get_return_value(answer, self.gateway_client,
--> 538 self.target_id, self.name)
    539
    540 for temp_arg in temp_args:

C:\spark-1.4.1-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
    298 raise Py4JJavaError(
    299 'An error occurred while calling {0}{1}{2}.\n'.
--> 300 format(target_id, '.', name), value)
    301 else:
    302 raise Py4JError(

Py4JJavaError: An error occurred while calling o55.sql.
: java.lang.RuntimeException: [1.9] failure: ``union'' expected but `right' found

 select right(Phone_number,4) from mytable1
        ^
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
    at org.apache.spark.sql.catalyst.DefaultParserDialect.parse(ParserDialect.scala:67)
    at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:145)

Why doesn't PySpark support the LEFT and RIGHT functions?

How can I get the rightmost four characters of a column?
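
One possible workaround, sketched under assumptions rather than offered as a definitive answer: since the parser in this Spark version rejects `right`, the same result can usually be expressed with `substring(str, pos, len)`, assuming a negative `pos` counts from the end of the string. The helper names below (`right_expr`, `left_expr`, `select_right`, `model_substring`) are hypothetical and not part of PySpark. The cast to string is also an assumption: pandas may have read `Phone_Number` as an integer column.

```python
def _as_string(column, cast_to_string):
    # Optionally wrap the column in a cast, in case it is numeric (assumption).
    return "cast({0} as string)".format(column) if cast_to_string else column


def right_expr(column, n, cast_to_string=True):
    """SQL text for the last n characters: right(col, n) rewritten as
    substring(col, -n, n). Hypothetical helper, not a PySpark API."""
    return "substring({0}, -{1}, {1})".format(_as_string(column, cast_to_string), n)


def left_expr(column, n, cast_to_string=True):
    """SQL text for the first n characters: substring(col, 1, n) (1-based start)."""
    return "substring({0}, 1, {1})".format(_as_string(column, cast_to_string), n)


def select_right(sqlc, table, column, n):
    """Run the rewritten query on an existing SQLContext such as `sqlc` above."""
    query = "select {0} from {1}".format(right_expr(column, n), table)
    return sqlc.sql(query)


def model_substring(text, pos, length):
    """Simplified pure-Python model of the assumed substring semantics.
    Only intended for checking cases where abs(pos) <= len(text)."""
    if pos > 0:
        start = pos - 1          # SQL positions are 1-based
    elif pos < 0:
        start = len(text) + pos  # negative positions count from the end
    else:
        start = 0
    return text[start:start + length]


# Check the rewrite against the first phone number in the table above.
print(right_expr("Phone_Number", 4))
print(model_substring("1111119411", -4, 4))  # -> 9411
```

With the temporary table from the question registered, `select_right(sqlc, "mytable1", "Phone_Number", 4).show()` would issue the rewritten query. A DataFrame-API alternative that avoids the SQL parser altogether is `aa2.select(aa2.Phone_Number.cast("string").substr(-4, 4))`. More recent Spark releases do accept `left` and `right` in SQL, so upgrading is another option if it is available.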
