Version: Spark 1.5.2 built for Hadoop 2.4.0
Today the Spark History Server died on its own. Checking the log:
16/05/13 14:12:30 WARN DFSClient: Failed to connect to /192.168.2.77:50010 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException
java.nio.channels.ClosedByInterruptException
16/05/13 14:12:30 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.nio.channels.ClosedByInterruptException
16/05/13 14:12:30 WARN DFSClient: Failed to connect to /192.168.2.45:50010 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
16/05/13 14:12:30 INFO DFSClient: Could not obtain BP-334845286-192.168.2.4-1418890858930:blk_1166565322_93633541 from any node: java.io.IOException: No live nodes contain current block No live nodes contain current block Block locations: 192.168.2.70:50010 192.168.2.77:50010 192.168.2.45:50010 Dead nodes: 192.168.2.45:50010 192.168.2.70:50010 192.168.2.77:50010. Will get new block locations from namenode and retry...
16/05/13 14:12:30 WARN DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 1999.7114150519237 msec.
16/05/13 14:12:30 WARN QueuedThreadPool: 1 threads could not be stopped
16/05/13 14:12:30 INFO ShutdownHookManager: Shutdown hook called
16/05/13 14:12:30 INFO DFSClient: Successfully connected to /192.168.2.70:50010 for BP-334845286-192.168.2.4-1418890858930:blk_1166565322_93633541
16/05/13 14:12:30 WARN ServletHandler:
javax.servlet.ServletException: java.util.concurrent.ExecutionException: java.io.IOException: Filesystem closed
Was the heap too small, causing the crash? Check the History Server JVM's memory:
#/usr/jdk64/jdk1.7.0_67/bin/jstat -gcutil 93962 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 0.00 100.00 99.97 99.68 1118 19.415 4701 3516.056 3535.471
0.00 0.00 100.00 99.97 99.68 1118 19.415 4702 3517.133 3536.548
The key column is O (old generation utilization): it is pinned above 99%, and the FGC count keeps climbing (4701, then 4702, with over 3500 seconds of accumulated full-GC time), so the JVM is spending almost all its time in full GC.
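This old-gen check can also be scripted. A minimal sketch that parses one line of the `jstat -gcutil` output above with awk (the 95% and 100-FGC thresholds are arbitrary choices for illustration, not Spark or JVM defaults):

```shell
#!/bin/sh
# Flag old-gen exhaustion from one line of `jstat -gcutil` output.
# Columns (JDK 7): S0 S1 E O P YGC YGCT FGC FGCT GCT
# The sample line is copied from the jstat output above.
line="0.00 0.00 100.00 99.97 99.68 1118 19.415 4701 3516.056 3535.471"
result=$(echo "$line" | awk '{
  # $4 = O (old gen utilization %), $8 = FGC (cumulative full GC count)
  if ($4 > 95 && $8 > 100)
    printf "heap exhausted: old=%s%% fgc=%s", $4, $8
}')
echo "$result"
```

In a real deployment the sample line would come from `jstat -gcutil <pid> 1000 1` instead of a hard-coded string.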
#top -p 93962
top - 16:33:17 up 277 days, 6:45, 1 user, load average: 34.29, 27.25, 19.73
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 96.9%us, 1.9%sy, 0.0%ni, 1.1%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 132103952k total, 123735940k used, 8368012k free, 364688k buffers
Swap: 8191992k total, 28440k used, 8163552k free, 61424260k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
93962 root 20 0 6771m 1.2g 22m S 712.4 1.0 925:02.61 java
# jps -v |grep 93962
93962 HistoryServer -Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://cluster:8020/spark-history -Xms1g -Xmx1g -XX:MaxPermSize=256m
Modify spark-env.sh and add -Xms4096m -Xmx4096m. After the change (excerpt):
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Xms4096m -Xmx4096m -Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://cluster:8020/spark-history"
Restart:
#/usr/local/spark/sbin/stop-history-server.sh
#/usr/local/spark/sbin/start-history-server.sh
#/usr/jdk64/jdk1.7.0_67/bin/jstat -gcutil 235969 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 6.25 2.01 99.07 98.55 279 2.039 0 0.000 2.039
12.50 0.00 10.06 99.07 98.55 280 2.043 0 0.000 2.043
#jps -v |grep 235969
235969 HistoryServer -Dspark.history.ui.port=18080 -Xms4g -Xmx4g -Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://cluster:8020/spark-history -Xms1g -Xmx1g -XX:MaxPermSize=256m
#top -p 235969
top - 16:48:38 up 277 days, 7:00, 1 user, load average: 27.02, 31.59, 29.16
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.0%us, 1.9%sy, 0.0%ni, 0.1%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 132103952k total, 123061968k used, 9041984k free, 367248k buffers
Swap: 8191992k total, 28440k used, 8163552k free, 61682036k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
235969 root 20 0 5015m 478m 22m S 94.2 0.4 4:50.71 java
The setting apparently did not take effect: -Xms4g -Xmx4g and -Xms1g -Xmx1g are both present on the command line, and since the JVM honors the last occurrence of a duplicated flag, the 1g values appended by the launch scripts win.
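The ordering can be sketched as follows. This is a hypothetical simplification for illustration only; Spark's real command-line assembly happens in bin/spark-class and sbin/spark-daemon.sh:

```shell
#!/bin/sh
# Hypothetical sketch of how the daemon command line ends up with two
# heap settings (simplified; not Spark's actual launch script).
SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Xms4096m -Xmx4096m"
SPARK_DAEMON_MEMORY="${SPARK_DAEMON_MEMORY:-1g}"   # launcher default: 1g
# The launcher appends its own heap flags AFTER the user-supplied opts,
# and HotSpot honors the last occurrence of a duplicated -Xms/-Xmx:
cmdline="java $SPARK_HISTORY_OPTS -Xms$SPARK_DAEMON_MEMORY -Xmx$SPARK_DAEMON_MEMORY HistoryServer"
echo "$cmdline"
```

This matches the jps -v output above, where the 1g flags appear after the 4g ones.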
Check the official documentation for the relevant options:
Environment Variable Meaning
SPARK_DAEMON_MEMORY Memory to allocate to the history server (default: 1g).
SPARK_DAEMON_JAVA_OPTS JVM options for the history server (default: none).
SPARK_PUBLIC_DNS The public address for the history server. If this is not set, links to application history may use the internal address of the server, resulting in broken links (default: none).
SPARK_HISTORY_OPTS spark.history.* configuration options for the history server (default: none).
So the setting to use is SPARK_DAEMON_MEMORY. Add:
SPARK_DAEMON_MEMORY=4096m
After the change:
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://geocloudcluster:8020/spark-history"
export SPARK_DAEMON_MEMORY=4096m
SPARK_DAEMON_JAVA_OPTS=8128m  # wrong: this variable expects JVM flags, not a bare size; a stray "8128m" is not a valid java option and should be removed
SPARK_WORKER_MEMORY=2048m
SPARK_REPL_OPTS=-XX:MaxPermSize=2048m
Repeating the steps above (restart, then check with jstat), things now look much better, and the heap could be increased further if needed:
[root@dn12 conf]# /usr/jdk64/jdk1.7.0_67/bin/jstat -gcutil 104645 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
6.25 0.00 27.74 67.34 99.48 414 3.196 0 0.000 3.196
6.25 0.00 27.74 67.34 99.48 414 3.196 0 0.000 3.196
# jps -v |grep 104645
104645 HistoryServer -Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://cluster:8020/spark-history -Xms4096m -Xmx4096m -XX:MaxPermSize=256m
The parameters have taken effect.
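To confirm which heap limit actually applies, one can pull the last -Xmx flag out of the jps -v output (last occurrence wins when the flag is duplicated). A sketch using the jps line above as sample input:

```shell
#!/bin/sh
# Extract the effective -Xmx from a `jps -v` line. When the flag is
# duplicated, HotSpot uses the last occurrence, so take the final match.
# The sample line is copied from the jps output above.
jps_line="104645 HistoryServer -Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://cluster:8020/spark-history -Xms4096m -Xmx4096m -XX:MaxPermSize=256m"
xmx=$(echo "$jps_line" | tr ' ' '\n' | grep '^-Xmx' | tail -1)
echo "effective heap limit: $xmx"
```

In practice, replace the hard-coded sample with `jps -v | grep HistoryServer`.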