Hive on Spark: source-level execution walkthrough
When a SQL request reaches HiveStatement, its execute(String sql) method is invoked:
HiveStatement.execute
--> runAsyncOnServer(String sql)
--> ThriftCLIService.ExecuteStatement
--> CLIService.executeStatementAsync
--> HiveSessionImpl.executeStatementAsync
--> executeStatementInternal
--> operation.run()
--> SQLOperation.runInternal()
--> runQuery()
--> driver.run()
--> runInternal
--> ret = execute(true);
--> Driver.execute
--> launchTask
--> tskRun.runSequential()
--> tsk.executeTask()
--> execute(driverContext)
--> SparkTask.execute(driverContext)
--> SparkUtilities.getSparkSession creates the SparkSession; SparkSessionManagerImpl.getSession / setup also starts the RPC server
--> sparkSession.submit submits the job; before submission a RemoteHiveSparkClient is created first, and a SparkJobRef object is returned
--> jobRef.monitorJob() attaches job monitoring
--> finally, the HDFS scratch directories are cleaned up and the SparkSession is returned to the SparkSessionManager
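The overall shape of this chain (an async submit that returns a job reference, which the caller then monitors) can be sketched with plain JDK concurrency primitives. This is a hypothetical, simplified model only: MiniJobRef, MiniOperation-style naming, and executeStatementAsync here are illustrative stand-ins, not the actual Hive classes.

```java
import java.util.concurrent.*;

// Simplified model of the async flow above; none of these are real Hive classes.
public class AsyncQuerySketch {

    // Stands in for the SparkJobRef returned by sparkSession.submit:
    // it wraps the running work and lets the caller monitor it.
    static class MiniJobRef {
        final Future<String> future;
        MiniJobRef(Future<String> future) { this.future = future; }

        // Mirrors jobRef.monitorJob(): poll until the job completes,
        // then return its result.
        String monitorJob() throws Exception {
            while (!future.isDone()) {
                Thread.sleep(10); // poll, as the Spark job monitor loop does
            }
            return future.get();
        }
    }

    // Stands in for HiveSessionImpl.executeStatementAsync: the statement is
    // handed to a background executor and a job reference is returned
    // immediately, without blocking the caller.
    static MiniJobRef executeStatementAsync(ExecutorService pool, String sql) {
        Future<String> f = pool.submit(() -> "result-of:" + sql);
        return new MiniJobRef(f);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        MiniJobRef ref = executeStatementAsync(pool, "select 1");
        System.out.println(ref.monitorJob());
        pool.shutdown();
    }
}
```

The real implementation differs in many ways (Thrift transport, operation state machine, RPC to the remote Spark driver), but the submit-then-monitor structure is the same.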
You can follow along in the source, stepping into each call in this chain for the details.