How do I call Spark from a Java page button, and how do I use analytic/window functions in Spark with Java?

I'm trying to use the analytic/window function last_value in Spark with Java.

Netezza query:

select sno, name, addr1, addr2, run_dt,
       last_value(addr1 ignore nulls) over (partition by sno, name, addr1, addr2, run_dt
           order by beg_ts, end_ts
           rows between unbounded preceding and unbounded following) as last_addr1
from daily

We want to implement this query in Spark with Java (without using HiveContext):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import org.apache.spark.sql.execution.WindowFunctionFrame;

SparkConf conf = new SparkConf().setMaster("local").setAppName("Agg");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);

// Parse the input file into a JavaRDD of Stgdailydtl beans
JavaRDD<Stgdailydtl> daily = sc.textFile("C:\\Testing.txt").map(
    new Function<String, Stgdailydtl>() {
        private static final long serialVersionUID = 1L;

        public Stgdailydtl call(String line) throws Exception {
            String[] parts = line.split(",");
            Stgdailydtl daily = new Stgdailydtl();
            daily.setSno(Integer.parseInt(parts[0].trim()));
            .....
            return daily;
        }
    });

DataFrame schemaDailydtl = sqlContext.createDataFrame(daily, Stgdailydtl.class);
schemaDailydtl.registerTempTable("daily");

// Window spec built with the DataFrame API (not used anywhere below)
WindowSpec ws = Window.partitionBy("sno", "name", "addr1", "addr2", "run_dt")
        .orderBy("beg_ts", "end_ts")
        .rowsBetween(0, 100000);

// Window function embedded in plain SQL: this is the statement that fails
DataFrame df = sqlContext.sql("select sno, name, addr1, addr2, run_dt, "
        + "row_number() over(partition by mach_id, msrmt_gbl_id, msrmt_dsc, elmt_dsc, end_cptr_dt order by beg_cptr_ts, end_cptr_ts) from daily");

Error:

Exception in thread "main" java.lang.RuntimeException: [1.110] failure: ``union'' expected but `(' found

select stg.mach_id, stg.msrmt_gbl_id, stg.msrmt_dsc, stg.elmt_dsc, stg.elmt_dsc_grp_concat, row_number() over(partition by mach_id, msrmt_gbl_id, msrmt_dsc, elmt_dsc, end_cptr_dt order by beg_cptr_ts, end_cptr_ts) from stgdailydtl stg
^
    at scala.sys.package$.error(package.scala:27)

I could not understand how to use the Window/WindowSpec objects. Please advise.

Thanks for your help

Solution

You're mixing DataFrame syntax and SQL syntax: specifically, you created a WindowSpec but then didn't use it.

Import org.apache.spark.sql.functions to get the row_number function, then create the column that you're trying to select:

Column rowNum = functions.row_number().over(ws);

Then select it using the dataframe API:

df.select(each, column, you, want, rowNum)

My syntax may be slightly off since I'm used to Scala or Python, but the gist is something like that.
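Putting that together, here is a minimal Java sketch of the DataFrame-API approach, assuming the schemaDailydtl DataFrame and column names from the question, a Spark version (1.6+) where functions.row_number() is available, and "rn" as a purely illustrative alias:

import org.apache.spark.sql.Column;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import org.apache.spark.sql.functions;

// Same partitioning and ordering as the OVER(...) clause, expressed through the DataFrame API
WindowSpec ws = Window.partitionBy("sno", "name", "addr1", "addr2", "run_dt")
        .orderBy("beg_ts", "end_ts");

// Build the window column once...
Column rowNum = functions.row_number().over(ws);

// ...then select it alongside the ordinary columns; no SQL string with OVER(...) is needed
DataFrame result = schemaDailydtl.select(
        schemaDailydtl.col("sno"),
        schemaDailydtl.col("name"),
        schemaDailydtl.col("addr1"),
        schemaDailydtl.col("addr2"),
        schemaDailydtl.col("run_dt"),
        rowNum.alias("rn"));

On older 1.x releases the method is named functions.rowNumber() rather than functions.row_number(), if I remember correctly.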
