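Spark Task Submission and Cluster Startup Flow
Spark cluster startup flow
1. Run the start-all.sh script, which starts the Master first.
2. Once the Master is up, its preStart method starts a timer that periodically checks for timed-out Workers and removes them.
3. The startup script parses the slaves configuration file to find the nodes on which Workers should run, then starts the Workers.
4. After a Worker starts, its preStart method registers the Worker with all Masters.
5. The Master receives the registration message from the Worker, saves the registration information, and replies to the Worker with its own URL.
6. The Worker receives the Master's URL, updates its record of it, and starts a timer that periodically sends heartbeat messages to the Master.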
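The bookkeeping behind steps 2, 5 and 6 can be sketched without Akka. The following is a minimal model, not Spark's actual implementation: the names `WorkerRecord`, `MasterRegistry`, `register`, `heartbeat` and `removeTimedOut` are assumptions made for illustration, and timestamps are passed in explicitly so the behavior is easy to follow.

```scala
import scala.collection.mutable

// Hypothetical, Akka-free model of the Master's bookkeeping (illustrative names).
final class WorkerRecord(val id: String, var lastHeartbeat: Long)

final class MasterRegistry(timeoutMs: Long) {
  private val idToWorker = mutable.HashMap.empty[String, WorkerRecord]

  // Step 5: store the registration; returns false if the id is already known.
  def register(id: String, now: Long): Boolean =
    if (idToWorker.contains(id)) false
    else {
      idToWorker(id) = new WorkerRecord(id, now)
      true
    }

  // Step 6: refresh the heartbeat timestamp; unknown ids are ignored.
  def heartbeat(id: String, now: Long): Unit =
    idToWorker.get(id).foreach(w => w.lastHeartbeat = now)

  // Step 2: remove every Worker whose last heartbeat is older than timeoutMs
  // and return the ids that were removed.
  def removeTimedOut(now: Long): Set[String] = {
    val dead = idToWorker.values.filter(w => now - w.lastHeartbeat > timeoutMs).map(_.id).toSet
    dead.foreach(id => idToWorker.remove(id))
    dead
  }

  def size: Int = idToWorker.size
}
```

In an actor-based version, a scheduler would call the equivalent of `removeTimedOut` at a fixed interval, and `register` and `heartbeat` would be triggered by incoming messages.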
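Task submission flow
1. The Driver side launches the SparkSubmit process via the spark-submit script. A very important object, SparkContext, is created at this point, and it begins sending messages to the Master.
2. On receiving the message, the Master generates the task information and places it in a queue.
3. The Master selects all valid Workers and sorts them by free resources.
4. The Master notifies the valid Workers to fetch the task information and start the corresponding Executors.
5. Each Worker starts its Executor, which then registers back with the Driver.
6. The Driver sends the generated tasks to the corresponding Executors, which begin executing them.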
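Step 3 (filter the valid Workers, then sort by free resources) can be illustrated with a small sketch. It is not Spark's real scheduling code: the `WorkerSlot` case class, its fields, and the rule "most free cores first, then most free memory" are assumptions chosen to make the idea concrete.

```scala
// Hypothetical resource view of a Worker; field names are illustrative.
final case class WorkerSlot(id: String, alive: Boolean, freeCores: Int, freeMemoryMb: Int)

object WorkerSelection {
  // Keep live Workers that can host one Executor, ordered by most free cores,
  // then by most free memory.
  def candidates(
      workers: Seq[WorkerSlot],
      coresPerExecutor: Int,
      memoryPerExecutorMb: Int): Seq[WorkerSlot] =
    workers
      .filter(w => w.alive && w.freeCores >= coresPerExecutor && w.freeMemoryMb >= memoryPerExecutorMb)
      .sortBy(w => (-w.freeCores, -w.freeMemoryMb))
}
```

For example, given four Workers where one is dead and one has too little memory, `candidates` returns only the two usable ones, with the Worker that has more free cores first.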
Implementing the cluster startup flow
1. First, create the Master class
import akka.actor.{Actor, ActorSystem, Props}
import com.typesafe.config.{Config, ConfigFactory}
import scala.collection.mutable
import scala.concurrent.duration._

class Master(val masterHost: String, val masterPort: Int) extends Actor {
  // Maps each Worker id to its registration info
  val idToWorker = new mutable.HashMap[String, WorkerInfo]()
  // Set of all registered Workers
  val workers = new mutable.HashSet[WorkerInfo]()
  // Worker timeout interval, in milliseconds
  val checkInterval: Long = 15000

  // Lifecycle method: called once, after the constructor and before receive
  override def preStart(): Unit = {
    // Start a timer that periodically checks for timed-out Workers
    import context.dispatcher
    context.system.scheduler.schedule(0.millis, checkInterval.millis, self, CheckTimeOutWorker)
  }

  // Called after preStart, once for every incoming message
  override def receive: Receive = {
    // Worker -> Master
    case RegisterWorker(id, host, port, memory, cores) => {
      if (!idToWorker.contains(id)) {
        val workerInfo = new WorkerInfo(id, host, port, memory, cores)
        idToWorker += (id -> workerInfo)
        workers += workerInfo
        println("a worker registered")
        sender ! RegisteredWorker(s"akka.tcp://${Master.MASTER_SYSTEM}" +
          s"@${masterHost}:${masterPort}/user/${Master.MASTER_ACTOR}")
      }
    }
    case HeartBeat(workerId) => {
      // Look up the WorkerInfo for the given workerId. A heartbeat from a Worker
      // that was already removed is ignored instead of throwing an exception.
      idToWorker.get(workerId).foreach { workerInfo =>
        // Record the current time as the last heartbeat time
        workerInfo.lastHeartbeatTime = System.currentTimeMillis()
      }
    }
    case CheckTimeOutWorker => {
      val currentTime = System.currentTimeMillis()
      val toRemove: mutable.HashSet[WorkerInfo] =
        workers.filter(w => currentTime - w.lastHeartbeatTime > checkInterval)
      // Remove the timed-out Workers from both idToWorker and workers
      toRemove.foreach(deadWorker => {
        idToWorker -= deadWorker.id
        workers -= deadWorker
      })
      println(s"num of workers: ${workers.size}")
    }
  }
}
object Mas…
…$host"
|akka.remote.netty.tcp.port = "$…
…tTermination()
}
}
2. Create RemoteMs…
…${Master.MASTER_SYSTEM}" +
s"@${masterHost}:${masterPort}/user/$…
…memory, cores)
}
override def r…
…$host"
        |akka.remote.netty.tcp.port = "$port"
      """.stripMargin
    // Build the configuration needed to create the Actor
    val config: Config = ConfigFactory.parseString(configStr)
    // Create the ActorSystem
    val actorSystem: ActorSystem = ActorSystem(WORKER_SYSTEM, config)
    // Create the Actor from the actorSystem instance
    val worker: ActorRef = actorSystem.actorOf(
      Props(new Worker(host, port, masterHost, masterPort, memory, cores)), WORKER_ACTOR)
    actorSystem.awaitTermination()
  }
}
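The URL that the Master sends back in RegisteredWorker follows Akka's classic remoting address format, `akka.tcp://<system>@<host>:<port>/user/<actor>`. The helper below is only a sketch of that string construction. `ActorUrls` and `remoteActorUrl` are hypothetical names, and the system and actor names in the usage note are placeholders, since the actual values of the MASTER_SYSTEM and MASTER_ACTOR constants are not shown in this text.

```scala
object ActorUrls {
  // Akka classic remoting address: akka.tcp://<system>@<host>:<port>/user/<actor>
  def remoteActorUrl(system: String, host: String, port: Int, actor: String): String =
    s"akka.tcp://$system@$host:$port/user/$actor"
}
```

With placeholder values, `ActorUrls.remoteActorUrl("MasterSystem", "127.0.0.1", 8888, "MasterActor")` yields `akka.tcp://MasterSystem@127.0.0.1:8888/user/MasterActor`.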
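4. Create the initialization class (WorkerInfo)
class WorkerInfo(val id: String, val host: String, val port: Int,
                 val memory: Int, val cores: Int) {
  // Time of the last heartbeat; stays at the default 0 until the first heartbeat arrives
  var lastHeartbeatTime: Long = _
}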
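The Master's receive method pattern-matches on four messages: RegisterWorker, RegisteredWorker, HeartBeat and CheckTimeOutWorker. A minimal sketch of how they could be defined is shown below. The field lists are inferred from how the messages are used in the Master code, while the `RemoteMsg` trait, the `masterUrl` field name and the `describe` helper are assumptions for illustration rather than the author's original definitions.

```scala
// Hypothetical message definitions inferred from the pattern matches in Master.receive.
sealed trait RemoteMsg extends Serializable

// Worker -> Master: registration request
final case class RegisterWorker(id: String, host: String, port: Int, memory: Int, cores: Int)
  extends RemoteMsg

// Master -> Worker: registration acknowledged, carries the Master's URL
final case class RegisteredWorker(masterUrl: String) extends RemoteMsg

// Worker -> Master: periodic heartbeat
final case class HeartBeat(workerId: String) extends RemoteMsg

// Master -> Master: timer tick that triggers the timeout check
case object CheckTimeOutWorker extends RemoteMsg

object RemoteMsgDemo {
  // Illustrative helper: summarizes a message the way a receive block would match it.
  def describe(msg: RemoteMsg): String = msg match {
    case RegisterWorker(id, _, _, memory, cores) => s"register $id ($memory MB, $cores cores)"
    case RegisteredWorker(url)                   => s"registered at $url"
    case HeartBeat(id)                           => s"heartbeat from $id"
    case CheckTimeOutWorker                      => "check timeouts"
  }
}
```

Messages that cross the network must be serializable; case classes and case objects are serializable by default, which is one reason they are a common choice for actor messages.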
5. Local testing requires passing in arguments: