IntelliJ IDEA: use sbt to manage dependencies.
Build: sbt package
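A minimal build.sbt for this project might look like the following (the name and version are illustrative; the one point that matters, as the notes explain below, is that scalaVersion must match the Scala version the Spark binary was built with — 2.10.x for this spark-1.6.2 distribution):

```scala
// build.sbt -- illustrative sketch; artifact name and version are assumptions
name := "scalahelloword"

version := "0.1"

// Must match the Scala version reported by spark-shell (2.10.5 here);
// a mismatch fails at runtime with binary-incompatibility errors.
scalaVersion := "2.10.5"

// "provided" because the Spark runtime supplies this jar at submit time
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"
```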
Run:
./bin/spark-submit --class "com.dongjia.chapter.helloword.MyHelloWorld" \
--master spark://localhost:7077 \
/Users/zhongling/code/scalahelloword/target/scala-2.10/scalahelloword_2.10-0.1.jar 1
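The submitted class itself is not shown in these notes. A sketch of what com.dongjia.chapter.helloword.MyHelloWorld could look like — the body is an assumption; only the class name and the single numeric argument come from the command above:

```scala
package com.dongjia.chapter.helloword

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver: the real class is not shown in the notes.
object MyHelloWorld {
  def main(args: Array[String]): Unit = {
    // the trailing "1" in the spark-submit command arrives here as args(0)
    val n = if (args.nonEmpty) args(0).toInt else 1
    val conf = new SparkConf().setAppName("MyHelloWorld")
    val sc = new SparkContext(conf)
    // trivial distributed computation so the job exercises the cluster
    val sum = sc.parallelize(1 to n * 100).reduce(_ + _)
    println(s"sum = $sum")
    sc.stop()
  }
}
```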
Pay attention to the Scala version your Spark build ships with; check it via spark-shell:
if the Scala version the jar was compiled with differs from the one Spark's JVM uses, the job simply will not run. This pitfall cost me more than two hours to track down.
The command:
$ ./bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
17/12/26 23:36:06 WARN SparkConf:
SPARK_WORKER_INSTANCES was detected (set to '1').
This is deprecated in Spark 1.0+.
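The transcript's "Using Scala version 2.10.5" line is the value to match in build.sbt. The same check can be done from plain Scala code (no Spark needed) — a small sketch, with a hypothetical object name:

```scala
// Prints the version of the Scala library actually on the classpath;
// it must match the "Using Scala version ..." line spark-shell prints.
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    println(scala.util.Properties.versionNumberString) // e.g. 2.10.5
  }
}
```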
For reference, the Spark standalone service configuration (spark-1.6.2-bin-hadoop2.6/conf):
cat spark-env.sh
SPARK_MASTER_IP=localhost
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8085
SPARK_LOCAL_DIRS=/Users/zhongling/data/spark/data
#SPARK_MASTER_OPTS=
SPARK_LOCAL_IP=localhost
#SPARK_WORKER_CORES=
SPARK_WORKER_MEMORY=512m
SPARK_WORKER_PORT=8087
SPARK_WORKER_WEBUI_PORT=8088
SPARK_WORKER_INSTANCES=1
SPARK_WORKER_DIR=/Users/zhongling/data/spark/worker
#SPARK_WORKER_OPTS=
SPARK_DAEMON_MEMORY=512m
#SPARK_HISTORY_OPTS=
#SPARK_SHUFFLE_OPTS=
#SPARK_DAEMON_JAVA_OPTS=
#SPARK_PUBLIC_DNS=
cat spark-defaults.conf
# spark.master spark://master:7077
spark.eventLog.enabled true
# spark.eventLog.dir hdfs://namenode:8021/directory
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory 5g
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
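One caveat with this spark-defaults.conf: spark.eventLog.enabled is true while spark.eventLog.dir stays commented out, so Spark falls back to its default event-log directory /tmp/spark-events — and the application fails at startup if that directory does not exist. Creating it up front avoids the error (the path is Spark's documented default, not something set in these notes):

```shell
# spark.eventLog.dir defaults to /tmp/spark-events when unset;
# Spark aborts app startup if the directory is missing.
mkdir -p /tmp/spark-events
```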