:Tell spark-shell where the Hadoop configuration files are
#Use YARN_CONF_DIR or HADOOP_CONF_DIR to point at the YARN or Hadoop configuration directory
set HADOOP_HOME=D:\Big-File\Architecture\hadoop\hadoop-2.3.0
set HADOOP_CONF_DIR=D:\Big-File\Architecture\hadoop\hadoop-2.3.0\etc\hadoop
:Specify extra libraries to load when starting spark-shell
bin\spark-shell --jars E:\DM\XXXXXXX-1.0.0.jar
:Specify driver memory
bin\spark-shell --driver-memory 512m --verbose
:Spark UI address
http://192.168.1.5:4040/jobs/
:Run an application via spark-submit
bin\spark-submit --master local[4] --class com.test.mllib.XXXXXX E:\DM\XXXXX-1.0.0.jar 2 3 7 10 1300 1307
:java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
https://issues.apache.org/jira/browse/SPARK-10528
Fix (applies to Spark 1.6.0 and Spark 1.5.2):
1. Open Command Prompt in admin mode
2. Create the directory d:\tmp\hive
3. winutils.exe chmod 777 /tmp/hive
4. Start spark-shell --master local[2]
:Logging configuration in conf/log4j.properties
log4j.rootCategory=INFO, console, FILE
log4j.appender.FILE=org.apache.log4j.DailyRollingFileAppender
log4j.appender.FILE.Threshold=DEBUG
log4j.appender.FILE.file=E:/DM/Spark/spark-1.6.0-bin-hadoop2.6/spark.log
log4j.appender.FILE.DatePattern='.'yyyy-MM-dd
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=[%-5p] [%d{yyyy-MM-dd HH:mm:ss}] [%C{1}:%M:%L] %m%n
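The rootCategory above also references a console appender. Spark's shipped conf/log4j.properties.template defines one, but if you build the file from scratch you need it as well; a minimal sketch (the pattern matches Spark 1.x's template, adjust as needed):

```properties
# console appender referenced by log4j.rootCategory above
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```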
:java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;JZ)V
This error occurs when the spark-1.6-hadoop2.6 build is used to access Hadoop 2.6; switch to the spark-1.6-hadoop2.3 build instead
:"main"java.lang.UnsatisfiedLinkError:org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
直接修改hadoop源码,返回true
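The patch above is normally applied to org.apache.hadoop.io.nativeio.NativeIO in hadoop-common. The class below is a hypothetical stand-in just to illustrate the change (a dev-only workaround: it disables the Windows permission check instead of calling the missing native method):

```java
// Illustrative stand-in for NativeIO.Windows; the real patch edits
// NativeIO.java in hadoop-common and rebuilds the jar.
public class NativeIOPatchSketch {

    // Simplified version of NativeIO.Windows.access(...). The original
    // delegates to the native access0(...) method, which is what throws
    // UnsatisfiedLinkError when hadoop.dll is missing or mismatched.
    public static boolean access(String path, int desiredAccess) {
        // original: return access0(path, desiredAccess.accessRight());
        return true; // patched: skip the native permission check entirely
    }

    public static void main(String[] args) {
        System.out.println(access("D:\\tmp\\hive", 1)); // prints "true"
    }
}
```

Note this makes every access check succeed, so it is only acceptable on a local development machine.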
:Test HDFS access
val textFile = sc.textFile("hdfs://localhost:19000/README.txt")
textFile.count
:Test submitting a job to YARN
set HADOOP_CONF_DIR=D:\Big-File\Architecture\hadoop\hadoop-2.3.0\etc\hadoop
bin\spark-submit --class com.test.mllib.test.WorkCountApp --master yarn --deploy-mode client --executor-memory 256M --num-executors 1 E:\DM\code\projects\ch11-testit\target\ch11-testit-1.0.0.jar hdfs://localhost:19000/README.txt
bin\spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --executor-memory 128M --num-executors 1 E:\DM\Spark\spark-1.6.0-bin-hadoop2.3\lib\spark-examples-1.6.0-hadoop2.3.0.jar 10
:Cache the Spark assembly jar to avoid re-uploading it on every submit
http://blog.csdn.net/amber_amber/article/details/42081045
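The usual approach (a sketch; the HDFS path below is an assumption matching the localhost:19000 setup used above): upload the assembly jar to HDFS once and point spark.yarn.jar at it in conf/spark-defaults.conf, so YARN submissions reference the cached copy instead of re-uploading it each time.

```properties
# upload once:
#   bin\hadoop fs -mkdir -p /spark
#   bin\hadoop fs -put lib\spark-assembly-1.6.0-hadoop2.3.0.jar /spark/
# then in conf/spark-defaults.conf:
spark.yarn.jar hdfs://localhost:19000/spark/spark-assembly-1.6.0-hadoop2.3.0.jar
```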
: Exit status: 1. Diagnostics: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
http://zy19982004.iteye.com/blog/2031172