Apache Spark: How to use Java to insert data into a DataFrame column that has empty values

I need to insert values from DataFrame1 into a column of DataFrame2 that currently holds empty values. In other words, I want to update a column in DataFrame2.

Both DataFrames have 2 common columns.

Is there a way to do this using Java? Or is there a different approach?

Sample Input :

1) File1.csv

BILL_ID,BILL_NBR_TYPE_CD,BILL_NBR,VERSION,PRIM_SW

0501841898,BIN ,404154,1000,Y

0681220958,BIN ,735332,1000,Y

5992410180,BIN ,454680,1000,Y

6995270884,SREBIN ,1000252750295575,1000,Y

Here BILL_ID is the system ID and BILL_NBR is the external ID.

2) File2.csv

TXN_ID,TXN_TYPE,BILL_ID,BILL_NBR_TYPE_CD,BILL_NBR

01234, ABC ," ",BIN ,404154

22365, XYZ ," ",BIN ,735332

45890, LKJ ," ",BIN ,454680

23456, MPK ," ",SREBIN ,1000252750295575

Sample Output

As shown below, the BILL_ID value should be populated in File2.csv:

01234, ABC ,501841898,BIN ,404154

22365, XYZ ,681220958,BIN ,735332

45890, LKJ ,5992410180,BIN ,454680

23456, MPK ,6995270884,SREBIN ,1000252750295575

I have created two DataFrames and loaded each file's data into them, but I am not sure how to proceed.

EDIT

Basically, I want clarity on the following three steps:

1) How do I get the BILL_NBR and BILL_NBR_TYPE_CD values from File2.csv?

For this step I have written: file2Df.select("BILL_NBR_TYPE_CD","BILL_NBR");

2) How do I get the BILL_ID values from File1.csv based on the values retrieved in step 1?

3) How do I update the BILL_ID values accordingly in File2.csv?

I am new to Spark and would appreciate any pointers.

Solution

You need to join the two DataFrames on their common columns, BILL_NBR and BILL_NBR_TYPE_CD.

Assumption: there is a one-to-one relation between BILL_NBR and BILL_ID.

Assuming that your DataFrames for File1.csv and File2.csv are named file1DF and file2DF respectively, the following should work for you:

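import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Keep BILL_ID and the two join columns from File1.csv.
Dataset<Row> ids = file1DF.select("BILL_ID", "BILL_NBR", "BILL_NBR_TYPE_CD");

// Drop the empty BILL_ID column from File2.csv.
Dataset<Row> txns = file2DF.select("TXN_ID", "TXN_TYPE", "BILL_NBR_TYPE_CD", "BILL_NBR");

// Inner join on both common columns, then restore the column order of File2.csv.
Dataset<Row> result = txns.join(ids, txns.col("BILL_NBR").equalTo(ids.col("BILL_NBR")).and(txns.col("BILL_NBR_TYPE_CD").equalTo(ids.col("BILL_NBR_TYPE_CD")))).select(txns.col("TXN_ID"), txns.col("TXN_TYPE"), ids.col("BILL_ID"), txns.col("BILL_NBR_TYPE_CD"), txns.col("BILL_NBR"));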

Note: I have not been able to run the code above. Please let me know if you hit any compile-time or runtime errors.
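To make the join semantics concrete without a Spark installation, here is a minimal plain-Java sketch of the same lookup on the sample rows from the question. It is not Spark code: the class name BillIdLookup, both method names, and the "|" key separator are assumptions made for illustration, and it assumes no key value contains "|". It also enforces the one-to-one assumption stated above by failing when one key maps to two different BILL_ID values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BillIdLookup {

    // Index File1.csv-style rows {BILL_ID, BILL_NBR_TYPE_CD, BILL_NBR, VERSION, PRIM_SW}
    // by the composite key (BILL_NBR_TYPE_CD, BILL_NBR).
    static Map<String, String> indexByKey(List<String[]> file1Rows) {
        Map<String, String> index = new HashMap<>();
        for (String[] row : file1Rows) {
            String key = row[1] + "|" + row[2]; // assumes values never contain '|'
            String previous = index.put(key, row[0]);
            if (previous != null && !previous.equals(row[0])) {
                // The one-to-one assumption does not hold for this key.
                throw new IllegalStateException("Key maps to more than one BILL_ID: " + key);
            }
        }
        return index;
    }

    // Fill BILL_ID (position 2) in File2.csv-style rows
    // {TXN_ID, TXN_TYPE, BILL_ID, BILL_NBR_TYPE_CD, BILL_NBR}.
    static List<String[]> fillBillIds(List<String[]> file2Rows, Map<String, String> index) {
        List<String[]> filled = new ArrayList<>();
        for (String[] row : file2Rows) {
            String[] copy = row.clone();
            String billId = index.get(row[3] + "|" + row[4]);
            if (billId != null) {
                copy[2] = billId;
            }
            // Rows without a match keep their original BILL_ID here;
            // an inner join would drop them instead.
            filled.add(copy);
        }
        return filled;
    }

    public static void main(String[] args) {
        List<String[]> file1 = new ArrayList<>();
        file1.add(new String[] {"0501841898", "BIN ", "404154", "1000", "Y"});
        file1.add(new String[] {"0681220958", "BIN ", "735332", "1000", "Y"});
        file1.add(new String[] {"5992410180", "BIN ", "454680", "1000", "Y"});
        file1.add(new String[] {"6995270884", "SREBIN ", "1000252750295575", "1000", "Y"});

        List<String[]> file2 = new ArrayList<>();
        file2.add(new String[] {"01234", " ABC ", " ", "BIN ", "404154"});
        file2.add(new String[] {"22365", " XYZ ", " ", "BIN ", "735332"});
        file2.add(new String[] {"45890", " LKJ ", " ", "BIN ", "454680"});
        file2.add(new String[] {"23456", " MPK ", " ", "SREBIN ", "1000252750295575"});

        for (String[] row : fillBillIds(file2, indexByKey(file1))) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Because every value is handled as a string, leading zeros in BILL_ID are preserved. Keeping unmatched rows mirrors a left join rather than the inner join used in the answer; which behavior is wanted depends on whether every row in File2.csv is expected to have a match.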
