Scala installation
Download Scala-2.11.8.tgz, then add the following to /etc/profile:
export SCALA_HOME=/usr/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin
Run source to make the change take effect:
source /etc/profile
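If these steps are repeated by hand, /etc/profile tends to collect duplicate export lines. A minimal bash sketch that appends a line only when that exact line is not already present; `append_once` is a hypothetical helper name, not part of Scala:

```shell
#!/usr/bin/env bash
# Append a line to a profile file only if that exact line is not already in it,
# so re-running the setup does not duplicate PATH entries.
append_once() {
  local file="$1" line="$2"
  # -q: quiet, -x: whole-line match, -F: fixed string (no regex);
  # a missing file makes grep fail, so the line is appended and the file created.
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (as root). Single quotes keep $PATH and $SCALA_HOME unexpanded,
# so the file ends up with the same text as the listing above.
#   append_once /etc/profile 'export SCALA_HOME=/usr/scala-2.11.8'
#   append_once /etc/profile 'export PATH=$PATH:$SCALA_HOME/bin'
```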
[root@racnode2 bin]# scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144).
Type in expressions for evaluation. Or try :help.
scala> var a = "hello world!"
a: String = hello world!
scala> :q
[root@racnode2 bin]#
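A mistake in the profile (a wrong version directory, or `bin1` instead of `bin`) only shows up as a missing `scala` command. A quick check, sketched in bash under the assumption that SCALA_HOME is exported; `check_scala_home` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Verify that SCALA_HOME contains an executable bin/scala and that
# $SCALA_HOME/bin is one of the entries on PATH.
check_scala_home() {
  local home="${1:-$SCALA_HOME}"
  if [ ! -x "$home/bin/scala" ]; then
    echo "no executable scala under $home/bin" >&2
    return 1
  fi
  # Surround PATH with colons so the first and last entries match too.
  case ":$PATH:" in
    *":$home/bin:"*) echo "ok: $home/bin is on PATH" ;;
    *) echo "$home/bin is not on PATH" >&2; return 1 ;;
  esac
}

# Usage, after "source /etc/profile":
#   check_scala_home
```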
Spark installation
Download spark-1.6.3-bin-hadoop2.6.tgz from the Apache Spark Downloads page.
Edit the environment variable file /etc/profile and add the following:
export SPARK_HOME=/usr/spark-1.6.3-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin    # if PATH already lists other software, append :$SPARK_HOME/bin at the end
Run source to make the change take effect:
source /etc/profile
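Before configuring Spark, it can help to confirm that SPARK_HOME points at the unpacked distribution. A sketch assuming the standard layout of the prebuilt tarball (`bin/spark-shell`, `sbin/start-master.sh`, `conf/spark-env.sh.template`); `check_spark_home` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Report any expected file that is missing from a Spark binary distribution.
check_spark_home() {
  local home="${1:-$SPARK_HOME}" missing=0 f
  for f in bin/spark-shell sbin/start-master.sh conf/spark-env.sh.template; do
    if [ ! -e "$home/$f" ]; then
      echo "missing: $home/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_spark_home && echo "SPARK_HOME looks fine"
```

Only `bin` is put on PATH here. Spark's `sbin` also contains a `start-all.sh` with the same name as Hadoop's, so running it as `./start-all.sh` from inside `sbin` avoids picking up the wrong script.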
Go to the conf directory of the Spark installation (/usr/spark-1.6.3-bin-hadoop2.6/conf) and copy spark-env.sh.template to spark-env.sh:
cp spark-env.sh.template spark-env.sh
Edit spark-env.sh and add the following configuration:
export SCALA_HOME=/usr/scala-2.11.8
export JAVA_HOME=/usr/jdk1.8.0_144
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_IP=192.168.52.128
export SPARK_LOCAL_DIRS=/usr/spark-1.6.3-bin-hadoop2.6
export SPARK_WORKER_MEMORY=4g
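JAVA_HOME specifies the Java installation directory;
SCALA_HOME specifies the Scala installation directory;
SPARK_MASTER_IP specifies the IP address of the Spark cluster's Master node;
SPARK_WORKER_MEMORY specifies the maximum amount of memory a Worker node can allocate to Executors;
HADOOP_CONF_DIR specifies the directory containing the Hadoop cluster's configuration files.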
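The settings above can be sanity-checked before the cluster is started. A bash sketch that sources spark-env.sh in a subshell and reports any directory variable that does not exist; `check_spark_env` is a hypothetical helper, it assumes SPARK_LOCAL_DIRS names a single directory, and values already exported in the calling environment also count:

```shell
#!/usr/bin/env bash
# Source spark-env.sh and check that the directories it names exist.
check_spark_env() {
  local conf="${1:-$SPARK_HOME/conf/spark-env.sh}"
  (
    # Run in a subshell so the exports do not leak into the calling shell.
    . "$conf" || exit 1
    rc=0
    for var in JAVA_HOME SCALA_HOME HADOOP_CONF_DIR SPARK_LOCAL_DIRS; do
      dir="${!var}"    # bash indirect expansion: the value of the variable named $var
      if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "$var='$dir' is not an existing directory" >&2
        rc=1
      fi
    done
    if [ -z "$SPARK_MASTER_IP" ]; then
      echo "SPARK_MASTER_IP is not set" >&2
      rc=1
    fi
    exit "$rc"
  )
}

# Usage:
#   check_spark_env /usr/spark-1.6.3-bin-hadoop2.6/conf/spark-env.sh
```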
SPARK_WORKER_MEMORY: my machine has 32 GB of RAM, so I set it to 20g; adjust it to suit your own machine.
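Because SPARK_WORKER_MEMORY should stay below the machine's physical memory, the comparison can be scripted. A sketch assuming Linux (`/proc/meminfo`) and bash 4 or later; `to_mb` and `worker_memory_fits` are hypothetical helpers and only understand the `m` and `g` suffixes:

```shell
#!/usr/bin/env bash
# Convert a memory value such as "20g" or "4096m" to megabytes.
to_mb() {
  local v="${1,,}"    # lower-case, so "20G" works too
  if [[ $v =~ ^([0-9]+)([mg])$ ]]; then
    if [ "${BASH_REMATCH[2]}" = g ]; then
      echo $(( 10#${BASH_REMATCH[1]} * 1024 ))    # 10# avoids octal parsing of leading zeros
    else
      echo $(( 10#${BASH_REMATCH[1]} ))
    fi
  else
    echo "unsupported memory value: $1" >&2
    return 1
  fi
}

# Succeed only if the requested worker memory is below total RAM (MemTotal is in kB).
worker_memory_fits() {
  local want_mb total_kb
  want_mb=$(to_mb "$1") || return 1
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
  [ -n "$total_kb" ] && [ "$want_mb" -lt $(( total_kb / 1024 )) ]
}

# Usage:
#   worker_memory_fits 20g && echo "20g fits on this machine"
```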
Tip: the JDK must be version 1.7 or later; otherwise starting the Master fails with the following error:
[root@racnode2 sbin]# ./start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/spark-1.6.3-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.master.Master-1-racnode2.hadoop.out
failed to launch org.apache.spark.deploy.master.Master:
  at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.spark.launcher.Main. Program will exit.
full log in /usr/spark-1.6.3-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.master.Master-1-racnode2.hadoop.out
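This error occurs because an older JDK is being used to run or compile Java code built for a newer version, so the main class cannot be found. To fix it, download JDK 7 or 8, install it, and update the environment variables. They are set in two places: /etc/profile and /usr/spark-1.6.3-bin-hadoop2.6/conf/spark-env.sh.
Then start the cluster and check the running processes with jps: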
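The JDK version can be checked up front instead of waiting for the Master to fail. Spark 1.6 requires Java 7 or later. A minimal bash sketch assuming the usual `java -version` output (`java version "1.8.0_144"`), where releases up to 8 use the `1.x` numbering; `parse_java_major` and `check_java_for_spark` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Turn a Java version string into its major number:
# "1.8.0_144" -> 8, "1.6.0_45" -> 6, "11.0.2" -> 11
parse_java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;    # old scheme: 1.<major>.<minor>_<update>
  esac
  echo "${v%%[._]*}"       # keep everything before the first "." or "_"
}

# Use $JAVA_HOME/bin/java if JAVA_HOME is set, otherwise the java on PATH.
check_java_for_spark() {
  local java="${JAVA_HOME:+$JAVA_HOME/bin/}java" ver major
  # java -version writes to stderr; the version is the first quoted string
  ver=$("$java" -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  major=$(parse_java_major "$ver")
  if [ -z "$major" ] || [ "$major" -lt 7 ]; then
    echo "Java '$ver' is too old for Spark 1.6 (need 7 or later)" >&2
    return 1
  fi
  echo "ok: Java $ver"
}
```

Running `check_java_for_spark` once after `source /etc/profile`, and once with the JAVA_HOME from spark-env.sh, covers both places where the JDK is configured.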
[root@racnode2 conf]# start-all.sh
[root@racnode2 conf]# jps
2457 NameNode
2971 NodeManager
2877 ResourceManager
2549 DataNode
3681 Jps
2689 SecondaryNameNode
[root@racnode2 conf]#
[root@racnode2 sbin]# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [racnode2.hadoop]
racnode2.hadoop: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-racnode2.hadoop.out
racnode2.hadoop: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-racnode2.hadoop.out
racnode1.hadoop: ssh: connect to host racnode1.hadoop port 22: No route to host
Starting secondary namenodes [0.0.0.0]
0.0.0.0: reverse mapping checking getaddrinfo for 5ulkyfcsae9bnha [127.0.0.1] failed - POSSIBLE BREAK-IN ATTEMPT!
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-racnode2.hadoop.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-racnode2.hadoop.out
racnode2.hadoop: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-racnode2.hadoop.out
racnode1.hadoop: ssh: connect to host racnode1.hadoop port 22: No route to host
[root@racnode2 sbin]# jps
4662 NameNode
4934 SecondaryNameNode
5673 Worker
6255 Jps
5170 NodeManager
4753 DataNode
5609 Master
5077 ResourceManager
[root@racnode2 sbin]#
You can see two new processes: Master and Worker.
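The jps check can also be scripted, for example at the end of a startup script. A bash sketch that reads jps output on standard input; `spark_daemons_up` is a hypothetical helper. It assumes both daemons run on this host, as in this setup; on a multi-node cluster the Worker processes run on the worker nodes instead:

```shell
#!/usr/bin/env bash
# Read `jps` output on stdin and succeed only if both Spark standalone
# daemons (Master and Worker) are listed.
spark_daemons_up() {
  local out d missing=0
  out=$(cat)
  for d in Master Worker; do
    # jps prints "<pid> <MainClass>", so compare the second field exactly
    # (this avoids matching names such as SecondaryNameNode).
    if ! printf '%s\n' "$out" | awk -v n="$d" '$2 == n { found = 1 } END { exit !found }'; then
      echo "not running: $d" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   jps | spark_daemons_up && echo "Spark daemons are up"
```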
Open http://masterIP:8080 in a browser to view the Master web UI.
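When no browser is at hand (for example over SSH), the web UI can be probed from a terminal. A sketch assuming curl is installed; `master_ui_up` is a hypothetical helper, and 8080 is the default port of the standalone Master web UI:

```shell
#!/usr/bin/env bash
# Succeed if the standalone Master web UI answers with HTTP 200.
# Arguments: host (required), port (default 8080).
master_ui_up() {
  local host="$1" port="${2:-8080}" code
  # -s: quiet, -o /dev/null: discard the page, -w: print only the status code;
  # curl prints 000 when it cannot connect.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://$host:$port/")
  [ "$code" = "200" ]
}

# Usage:
#   master_ui_up 192.168.52.128 && echo "Master UI is reachable"
```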
Run spark-shell to enter the Spark shell console:
[root@racnode2 bin]# ./spark-shell
17/09/07 19:40:51 INFO spark.SecurityManager: Changing view acls to: root
17/09/07 19:40:51 INFO spark.SecurityManager: Changing modify acls to: root
17/09/07 19:40:51 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/09/07 19:40:51 INFO spark.HttpServer: Starting HTTP Server
17/09/07 19:40:52 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/09/07 19:40:52 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:57628
17/09/07 19:40:52 INFO util.Utils: Successfully started service 'HTTP class server' on port 57628.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.3
      /_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.
17/09/07 19:40:59 INFO spark.SparkContext: Running Spark version 1.6.3
17/09/07 19:40:59 INFO spark.SecurityManager: Changing view acls to: root
17/09/07 19:40:59 INFO spark.SecurityManager: Changing modify acls to: root
17/09/07 19:40:59 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/09/07 19:41:00 INFO util.Utils: Successfully started service 'sparkDriver' on port 55184.
17/09/07 19:41:00 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/09/07 19:41:01 INFO Remoting: Starting remoting
17/09/07 19:41:01 INFO Remoting: Remoting started; listening on addresses:[akka.tcp://sparkDriverActorSystem@192.168.52.128:55089]
17/09/07 19:41:01 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 55089.
17/09/07 19:41:01 INFO spark.SparkEnv: Registering MapOutputTracker
17/09/07 19:41:01 INFO spark.SparkEnv: Registering BlockManagerMaster
17/09/07 19:41:01 INFO storage.DiskBlockManager: Created local directory at /usr/spark-1.6.3-bin-hadoop2.6/blockmgr-b970925a-643c-4f8a-810d-df461eef8db5
17/09/07 19:41:01 INFO storage.MemoryStore: MemoryStore started with capacity 517.4 MB
17/09/07 19:41:01 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/09/07 19:41:02 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/09/07 19:41:02 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
17/09/07 19:41:02 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/09/07 19:41:02 INFO ui.SparkUI: Started SparkUI at http://192.168.52.128:4040
17/09/07 19:41:02 INFO executor.Executor: Starting executor ID driver on host localhost
17/09/07 19:41:02 INFO executor.Executor: Using REPL class URI: http://192.168.52.128:57628
17/09/07 19:41:02 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51665.
17/09/07 19:41:02 INFO netty.NettyBlockTransferService: Server created on 51665
17/09/07 19:41:02 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/09/07 19:41:02 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:51665 with 517.4 MB RAM, BlockManagerId(driver, localhost, 51665)
17/09/07 19:41:02 INFO storage.BlockManagerMaster: Registered BlockManager
17/09/07 19:41:03 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
17/09/07 19:41:05 INFO hive.HiveContext: Initializing execution hive, version 1.2.1
17/09/07 19:41:05 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0
17/09/07 19:41:05 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
17/09/07 19:41:06 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/09/07 19:41:06 INFO metastore.ObjectStore: ObjectStore, initialize called
17/09/07 19:41:06 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
17/09/07 19:41:06 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
17/09/07 19:41:07 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
17/09/07 19:41:07 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
17/09/07 19:41:10 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/09/07 19:41:11 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/09/07 19:41:11 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/09/07 19:41:13 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/09/07 19:41:13 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/09/07 19:41:13 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
17/09/07 19:41:13 INFO metastore.ObjectStore: Initialized ObjectStore
17/09/07 19:41:14 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/09/07 19:41:14 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
17/09/07 19:41:14 INFO metastore.HiveMetaStore: Added admin role in metastore
17/09/07 19:41:14 INFO metastore.HiveMetaStore: Added public role in metastore
17/09/07 19:41:15 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
17/09/07 19:41:15 INFO metastore.HiveMetaStore: 0: get_all_databases
17/09/07 19:41:15 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases
17/09/07 19:41:15 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
17/09/07 19:41:15 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=*
17/09/07 19:41:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
17/09/07 19:41:17 INFO session.SessionState: Created HDFS directory: /tmp/hive/root
17/09/07 19:41:17 INFO session.SessionState: Created local directory: /tmp/root
17/09/07 19:41:17 INFO session.SessionState: Created local directory: /tmp/e25eb236-702d-41b8-b390-45185702d59a_resources
17/09/07 19:41:17 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/e25eb236-702d-41b8-b390-45185702d59a
17/09/07 19:41:17 INFO session.SessionState: Created local directory: /tmp/root/e25eb236-702d-41b8-b390-45185702d59a
17/09/07 19:41:17 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/e25eb236-702d-41b8-b390-45185702d59a/_tmp_space.db
17/09/07 19:41:18 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
17/09/07 19:41:18 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/09/07 19:41:18 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0
17/09/07 19:41:18 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
17/09/07 19:41:19 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/09/07 19:41:19 INFO metastore.ObjectStore: ObjectStore, initialize called
17/09/07 19:41:19 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
17/09/07 19:41:19 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
17/09/07 19:41:19 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
17/09/07 19:41:20 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
17/09/07 19:41:21 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/09/07 19:41:23 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/09/07 19:41:23 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/09/07 19:41:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/09/07 19:41:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/09/07 19:41:24 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
17/09/07 19:41:24 INFO metastore.ObjectStore: Initialized ObjectStore
17/09/07 19:41:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/09/07 19:41:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
17/09/07 19:41:25 INFO metastore.HiveMetaStore: Added admin role in metastore
17/09/07 19:41:25 INFO metastore.HiveMetaStore: Added public role in metastore
17/09/07 19:41:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
17/09/07 19:41:25 INFO metastore.HiveMetaStore: 0: get_all_databases
17/09/07 19:41:25 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases
17/09/07 19:41:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
17/09/07 19:41:25 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=*
17/09/07 19:41:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
17/09/07 19:41:26 INFO session.SessionState: Created local directory: /tmp/51d85aba-c051-42a5-81eb-3e5b5f91ebb6_resources
17/09/07 19:41:26 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/51d85aba-c051-42a5-81eb-3e5b5f91ebb6
17/09/07 19:41:26 INFO session.SessionState: Created local directory: /tmp/root/51d85aba-c051-42a5-81eb-3e5b5f91ebb6
17/09/07 19:41:26 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/51d85aba-c051-42a5-81eb-3e5b5f91ebb6/_tmp_space.db
17/09/07 19:41:26 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
scala> :q
Stopping spark context.
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static/sql,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/execution/json,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/execution,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/json,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
17/09/07 19:56:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
17/09/07 19:56:47 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.52.128:4040
17/09/07 19:56:47 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/09/07 19:56:48 INFO storage.MemoryStore: MemoryStore cleared
17/09/07 19:56:48 INFO storage.BlockManager: BlockManager stopped
17/09/07 19:56:48 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/09/07 19:56:48 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/09/07 19:56:48 INFO spark.SparkContext: Successfully stopped SparkContext
17/09/07 19:56:48 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/09/07 19:56:48 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/09/07 19:56:48 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
17/09/07 19:56:48 INFO util.ShutdownHookManager: Shutdown hook called
17/09/07 19:56:48 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-f13fa31e-cff8-4504-a43f-5a33562917dd
17/09/07 19:56:48 INFO util.ShutdownHookManager: Deleting directory /usr/spark-1.6.3-bin-hadoop2.6/spark-468e95b7-945c-4ecf-ad95-3fff574ea2f5
17/09/07 19:56:48 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-f910557c-1c90-4511-a11e-991499b28bc7
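The log above contains `Starting executor ID driver on host localhost`: started without `--master`, spark-shell runs in local mode rather than on the standalone cluster. To attach it to the Master configured in spark-env.sh, the master URL can be passed explicitly. A bash sketch assuming the default standalone master port 7077; `master_url` and `smoke_test_cluster` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Build the standalone master URL; 7077 is Spark's default master port.
# Arguments: host (default $SPARK_MASTER_IP), port (default 7077).
master_url() {
  echo "spark://${1:-$SPARK_MASTER_IP}:${2:-7077}"
}

# Feed a one-line Scala job to spark-shell on the cluster; the shell
# exits on its own when it reaches the end of its standard input.
smoke_test_cluster() {
  echo 'println(sc.parallelize(1 to 100).sum)' |
    "$SPARK_HOME/bin/spark-shell" --master "$(master_url)"
}

# Usage:
#   SPARK_MASTER_IP=192.168.52.128 smoke_test_cluster
```

If the cluster is working, the job should print 5050.0 among the log lines, and the application should appear in the Master web UI on port 8080.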
This completes the setup of the Spark distributed cluster.