Spark notes

SparkContext:

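def createSparkContext(): SparkContext = {
  // Use the explicitly configured master if there is one; otherwise fall back
  // to the MASTER environment variable, and finally to local mode.
  val master = this.master match {
    case Some(m) => m
    case None => {
      val prop = System.getenv("MASTER")
      if (prop != null) prop else "local"
    }
  }
  sparkContext = new SparkContext(master, "Spark shell")
  // Return the new context; the assignment above evaluates to Unit.
  sparkContext
}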

For a client to connect to a Spark cluster, the SparkContext object needs the
following basic information:
master: The master URL, in one of the following formats:
    local[n]: local mode, where n is the number of worker threads
    spark://[sparkip]: a Spark cluster at the given address
    mesos://: a Mesos path, if you are running a Mesos cluster
application name: The human-readable name of the application
sparkHome: The path to Spark on the master and worker machines
jars: The paths of the JAR files required by your job
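To make the master URL formats above concrete, here is a minimal sketch in plain Scala that tells them apart. `MasterUrl` and `describe` are hypothetical names invented for this illustration and are not part of Spark; the host and port in the usage line are made-up values.

```scala
// Hypothetical helper (not part of Spark): classifies a master URL string
// according to the formats listed above.
object MasterUrl {
  // Matches "local[n]" and captures n; the whole string must match.
  private val LocalN = """local\[(\d+)\]""".r

  def describe(master: String): String = master match {
    case "local"                       => "local mode, 1 thread"
    case LocalN(n)                     => s"local mode, $n threads"
    case m if m.startsWith("spark://") => "Spark cluster"
    case m if m.startsWith("mesos://") => "Mesos cluster"
    case other                         => s"unrecognized master URL: $other"
  }
}
```

For example, `MasterUrl.describe("spark://host:7077")` falls into the Spark cluster branch, while a string such as `"local[x]"` is reported as unrecognized.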

Scala
In a Scala program, you can create a SparkContext instance using the following code:
val sparkContext = new SparkContext(master_path, "application name",
  ["optional spark home path"], ["optional list of jars"])
While you can hardcode all of these values, it's better to read them from the
environment with reasonable defaults. This approach provides maximum flexibility
to run the code in a changing environment without having to recompile the code.
Using local as the default value for the master machine makes it easy to launch
your application locally in a test environment. By carefully selecting the defaults,
you can avoid having to over-specify them. An example would be as follows:
import spark.SparkContext
import spark.SparkContext._
import scala.util.Properties
// Fall back to local mode when MASTER is not set
val master = Properties.envOrElse("MASTER", "local")
// null if SPARK_HOME is not set
val sparkHome = System.getenv("SPARK_HOME")
// Empty list if JARS is not set
val myJars = Option(System.getenv("JARS")).toSeq
val sparkContext = new SparkContext(master, "my app", sparkHome, myJars)
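The same idea of environment values with defaults can be pulled into a small helper that is easy to test without touching the real environment. This is only a sketch: `ContextSettings` and `fromEnv` are hypothetical names, and treating JARS as a comma-separated list is an assumption, not something Spark requires.

```scala
// Hypothetical settings holder; the fields mirror the constructor arguments.
case class ContextSettings(master: String,
                           appName: String,
                           sparkHome: Option[String],
                           jars: Seq[String])

object ContextSettings {
  // `env` defaults to the process environment but can be replaced in tests.
  def fromEnv(appName: String,
              env: Map[String, String] = sys.env): ContextSettings =
    ContextSettings(
      master    = env.getOrElse("MASTER", "local"), // local mode by default
      appName   = appName,
      sparkHome = env.get("SPARK_HOME"),            // None when unset
      // Assumption: JARS holds a comma-separated list of paths.
      jars      = env.get("JARS").toSeq
                     .flatMap(_.split(",").toSeq)
                     .map(_.trim)
                     .filter(_.nonEmpty)
    )
}
```

Calling `ContextSettings.fromEnv("my app", Map("JARS" -> "a.jar, b.jar"))` yields `master == "local"` and `jars == Seq("a.jar", "b.jar")`, so the defaults can be checked without setting any real environment variables.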


The collect() function is especially useful for testing, in much the same way as the
parallelize() function is. However, collect() only works if your data fits in memory
on a single host, and even then it creates a bottleneck, because everything has to
come back to a single machine.
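A small round trip through parallelize() and collect() might look like the sketch below. It assumes a newer Spark release on the classpath, where the classes live under org.apache.spark and SparkConf is available; the object name, the application name, and the use of two local threads are illustrative choices only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectExample {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, which is enough for a quick test.
    val conf = new SparkConf().setMaster("local[2]").setAppName("collect example")
    val sc = new SparkContext(conf)
    try {
      // parallelize() turns a local collection into an RDD.
      val rdd = sc.parallelize(Seq(1, 2, 3, 4))
      // collect() brings every element back to the driver as an Array.
      val doubled = rdd.map(_ * 2).collect()
      assert(doubled.sameElements(Array(2, 4, 6, 8)))
      // On a large RDD, take(n) returns only n elements to the driver.
      println(rdd.take(2).mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

Because the whole result of collect() lands in the driver's memory, this pattern suits small test data; for large datasets, take(n) or writing the output to storage avoids the single-machine bottleneck.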
