How to run a Scala script with spark-submit (similarly to a Python script)?

This post describes a problem encountered when trying to execute a Scala script with spark-submit, and a way to work around it. Running Scala code the way one runs a Python script fails with the error 'Cannot load main class from JAR file'. The workaround is to load the Scala script in spark-shell with the `:load` command; this is fine for a PoC or for testing, but is not recommended for production use.

I am trying to execute a simple Scala script using Spark, as described in the Spark Quick Start Tutorial. I have no trouble executing the following Python code:

"""SimpleApp.py"""

from pyspark import SparkContext

logFile = "tmp.txt" # Should be some file on your system

sc = SparkContext("local", "Simple App")

logData = sc.textFile(logFile).cache()

numAs = logData.filter(lambda s: 'a' in s).count()

numBs = logData.filter(lambda s: 'b' in s).count()

print "Lines with a: %i, lines with b: %i" % (numAs, numBs)

I execute this code using the following command:

/home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit hello_world.py

However, when I try to do the same using Scala, I run into problems. In more detail, the code I try to execute is:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "tmp.txt" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

I try to execute it in the following way:

/home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit hello_world.scala

As a result, I get the following error message:

Error: Cannot load main class from JAR file

Does anybody know what I am doing wrong?

Solution
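
spark-submit cannot run a bare .scala source file the way it runs a .py script; a Scala application has to be compiled and packaged into a JAR first. As a minimal sketch of that route with sbt, assuming the conventional layout (SimpleApp.scala under src/main/scala/) and a build.sbt along these lines (the project name and version below are arbitrary):

name := "simple-app"
version := "0.1"
scalaVersion := "2.11.8" // the Spark 2.1.0 binaries used above are built against Scala 2.11
// "provided": the Spark distribution supplies this jar at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"

Then package the application and point spark-submit at the resulting JAR, naming the main class explicitly:

sbt package
/home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --class SimpleApp target/scala-2.11/simple-app_2.11-0.1.jar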

I want to add to @JacekLaskowski's answer an alternative solution I sometimes use for PoC or testing purposes.

It is to run script.scala from inside the spark-shell with :load:

:load /path/to/script.scala

You won't need to define a SparkContext/SparkSession in the script, as it can use the sc and spark variables already defined in the scope of the REPL.

You also don't need to wrap the code in a Scala object.
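
For example, the Scala application above reduces to a plain script like the following (a sketch assuming spark-shell's predefined sc and the same local tmp.txt):

// script.scala -- no object wrapper and no SparkContext setup;
// sc is the SparkContext that spark-shell already provides
val logData = sc.textFile("tmp.txt").cache()
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()
println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))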

PS: I consider this more of a hack, not something to use for production purposes.
