PySpark: joining on multiple conditions

I want to ask if you have any idea how I can specify multiple conditions in PySpark when I use .join().

Example:

With Hive:

query = "select a.NUMCNT, b.NUMCNT as RNUMCNT, a.POLE, b.POLE as RPOLE, a.ACTIVITE, b.ACTIVITE as RACTIVITE \
         FROM rapexp201412 b \
         join rapexp201412 a where (a.NUMCNT = b.NUMCNT and a.ACTIVITE = b.ACTIVITE and a.POLE = b.POLE)"

But in PySpark I don't know how to do this, because the following does not work:

df_rapexp201412.join(df_aeveh, df_rapexp201412.ACTIVITE == df_aeveh.ACTIVITE and df_rapexp201412.POLE == df_aeveh.POLE, 'inner')

Solution

Quoting from the Spark docs:

join(other, on=None, how=None)
Joins with another DataFrame, using the given join expression.

Parameters:
other – Right side of the join.
on – a string for join column name, a list of column names, a join expression (Column), or a list of Columns. If on is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an inner equi-join.
how – str, default 'inner'. One of inner, outer, left_outer, right_outer, semijoin.

The following performs a full outer join between df1 and df2:

>>> df.join(df2, df.name == df2.name, 'outer').select(df.name, df2.height).collect()
[Row(name=None, height=80), Row(name=u'Alice', height=None), Row(name=u'Bob', height=85)]

>>> cond = [df.name == df3.name, df.age == df3.age]
>>> df.join(df3, cond, 'outer').select(df.name, df3.age).collect()
[Row(name=u'Bob', age=5), Row(name=u'Alice', age=2)]

So you need to pass the conditions as a list of Columns, as in the last example.
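
Applied to the question, here is a minimal sketch. It reuses the DataFrame names from the failing snippet (df_rapexp201412 and df_aeveh) and the output aliases from the Hive query; adjust them to your actual DataFrames:

# Each equality check is a Column expression; passing them as a list
# makes Spark AND them together.
cond = [df_rapexp201412.NUMCNT == df_aeveh.NUMCNT,
        df_rapexp201412.ACTIVITE == df_aeveh.ACTIVITE,
        df_rapexp201412.POLE == df_aeveh.POLE]

result = df_rapexp201412.join(df_aeveh, cond, 'inner') \
    .select(df_rapexp201412.NUMCNT,
            df_aeveh.NUMCNT.alias('RNUMCNT'),
            df_rapexp201412.POLE,
            df_aeveh.POLE.alias('RPOLE'),
            df_rapexp201412.ACTIVITE,
            df_aeveh.ACTIVITE.alias('RACTIVITE'))

The reason the original attempt fails is that Python's and cannot combine Column objects (it tries to evaluate a Column as a boolean). Besides the list form, you can also combine the conditions with the & operator, wrapping each comparison in parentheses because & binds more tightly than ==, e.g. (a.NUMCNT == b.NUMCNT) & (a.POLE == b.POLE).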
