Java-Spark Series 2: Quick Start


1. Setting up a Maven project in IDEA

An earlier post already covered creating a Maven project in IDEA, so that part is skipped here; this section focuses on the pom.xml configuration. (The coordinates below are taken from the official quick start; the project actually built later in this post uses SparkStudy coordinates, which is why the jar is named SparkStudy-1.0-SNAPSHOT.jar.)


<project>
  <groupId>edu.berkeley</groupId>
  <artifactId>simple-project</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>Simple Project</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>2.4.2</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
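
Spark 2.4 targets Java 8, so it can also help to pin the compiler level in the pom. A minimal sketch of a build section (this part is an addition for illustration, not in the original pom):

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.8.1</version>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
  </plugins>
</build>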

2. Building the Java application with Maven

2.1 Java code

Code:

package org.example;

import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;

public class SparkTest1 {
    public static void main(String[] args) {
        String logFile = "file:///home/pyspark/idcard.txt"; // should be some file on your system
        // Initialize the SparkSession inside the program itself (no Spark shell involved)
        SparkSession spark = SparkSession.builder().appName("Simple Application").getOrCreate();
        // Read the file as a Dataset of lines; cache it because it is scanned twice
        Dataset<String> logData = spark.read().textFile(logFile).cache();

        // The (FilterFunction<String>) cast selects the Java overload of filter(),
        // which is otherwise ambiguous for a lambda
        long numAs = logData.filter((FilterFunction<String>) s -> s.contains("1")).count();
        long numBs = logData.filter((FilterFunction<String>) s -> s.contains("2")).count();

        System.out.println("Lines with 1: " + numAs + ", lines with 2: " + numBs);

        spark.stop();
    }
}

This program simply counts the lines in idcard.txt that contain '1' and the lines that contain '2'. Unlike the earlier examples that used the Spark shell, which initializes its own SparkSession, here we initialize the SparkSession as part of the program.
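
For comparison, the same counts can also be expressed with the untyped Column API. A small sketch, relying on the fact that textFile() returns a Dataset with a single string column named value:

import static org.apache.spark.sql.functions.col;

// Equivalent to the typed lambdas above, written as Column expressions
long numAs = logData.filter(col("value").contains("1")).count();
long numBs = logData.filter(col("value").contains("2")).count();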

3. Packaging the Maven project

The mvn package command builds the project and produces the application jar. Because the Spark dependency is scoped as provided, the jar contains only the application's own classes; spark-submit supplies the Spark libraries at runtime.

C:\Users\Administrator\IdeaProjects\SparkStudy>mvn package
[INFO] Scanning for projects...
[INFO]
......
Downloaded from nexus-aliyun: http://maven.aliyun.com/nexus/content/groups/public/org/apache/maven/maven-archiver/3.1.1/maven-archiver-3.1.1.jar (0 B at 0 B/s)
Downloaded from nexus-aliyun: http://maven.aliyun.com/nexus/content/groups/public/org/iq80/snappy/snappy/0.4/snappy-0.4.jar (0 B at 0 B/s)
[INFO] Building jar: C:\Users\Administrator\IdeaProjects\SparkStudy\target\SparkStudy-1.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  16.025 s
[INFO] Finished at: 2021-08-09T15:04:10+08:00
[INFO] ------------------------------------------------------------------------

C:\Users\Administrator\IdeaProjects\SparkStudy>
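
Before uploading, you can confirm that the compiled class actually landed in the jar under its package path (a quick check with the JDK's jar tool; look for org/example/SparkTest1.class in the listing):

jar tf target\SparkStudy-1.0-SNAPSHOT.jar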

Upload the SparkStudy-1.0-SNAPSHOT.jar file to a server where Spark is installed.
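
One way to do the upload, assuming SSH access to the server (the hp2 host and /home/javaspark directory are the ones that appear in the test run below):

scp target/SparkStudy-1.0-SNAPSHOT.jar root@hp2:/home/javaspark/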

4. Running the jar

The class in the official example has no package declaration; if you submit it that way, spark-submit reports that it cannot find the class. Pass the fully qualified class name to --class (here org.example.SparkTest1).

Command:

spark-submit \
  --class org.example.SparkTest1 \
  --master local[2] \
  /home/javaspark/SparkStudy-1.0-SNAPSHOT.jar
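
The command above runs the job locally with 2 threads. On a cluster (the test environment below is CDH, which runs YARN), the same jar could in principle be submitted with --master yarn instead; note that the file:// input path would then have to exist on every node, or be replaced with an HDFS path:

spark-submit \
  --class org.example.SparkTest1 \
  --master yarn \
  --deploy-mode client \
  /home/javaspark/SparkStudy-1.0-SNAPSHOT.jar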

Test run:

[root@hp2 javaspark]# spark-submit \
>   --class org.example.SparkTest1 \
>   --master local[2] \
>   /home/javaspark/SparkStudy-1.0-SNAPSHOT.jar
21/08/09 15:55:30 INFO spark.SparkContext: Running Spark version 2.4.0-cdh6.3.1
21/08/09 15:55:30 INFO logging.DriverLogger: Added a local log appender at: /tmp/spark-13b50d5e-4efe-42ce-bede-90a470508b7f/__driver_logs__/driver.log
21/08/09 15:55:30 INFO spark.SparkContext: Submitted application: Simple Application
21/08/09 15:55:30 INFO spark.SecurityManager: Changing view acls to: root
21/08/09 15:55:30 INFO spark.SecurityManager: Changing modify acls to: root
21/08/09 15:55:30 INFO spark.SecurityManager: Changing view acls groups to: 
21/08/09 15:55:30 INFO spark.SecurityManager: Changing modify acls groups to: 
21/08/09 15:55:30 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
21/08/09 15:55:30 INFO util.Utils: Successfully started service 'sparkDriver' on port 36387.
21/08/09 15:55:30 INFO spark.SparkEnv: Registering MapOutputTracker
21/08/09 15:55:30 INFO spark.SparkEnv: Registering BlockManagerMaster
21/08/09 15:55:30 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/08/09 15:55:30 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/08/09 15:55:30 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-98e103a5-a565-4313-8c7c-d7edb67c6d39
21/08/09 15:55:30 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
21/08/09 15:55:30 INFO spark.SparkEnv: Registering OutputCommitCoordinator
21/08/09 15:55:30 INFO util.log: Logging initialized @1546ms
21/08/09 15:55:30 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: 2018-09-05T05:11:46+08:00, git hash: 3ce520221d0240229c862b122d2b06c12a625732
21/08/09 15:55:30 INFO server.Server: Started @1622ms
21/08/09 15:55:30 INFO server.AbstractConnector: Started ServerConnector@b93aad{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
21/08/09 15:55:30 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@173b9122{/jobs,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@649f2009{/jobs/json,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@14bb2297{/jobs/job,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1a15b789{/jobs/job/json,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@57f791c6{/stages,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@51650883{/stages/json,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6c4f9535{/stages/stage,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@499b2a5c{/stages/stage/json,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@596df867{/stages/pool,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@c1fca1e{/stages/pool/json,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@241a53ef{/storage,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@344344fa{/storage/json,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2db2cd5{/storage/rdd,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@70e659aa{/storage/rdd/json,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@615f972{/environment,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@285f09de{/environment/json,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73393584{/executors,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@31500940{/executors/json,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1827a871{/executors/threadDump,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@48e64352{/executors/threadDump/json,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7249dadf{/static,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5be82d43{/,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@600b0b7{/api,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@473b3b7a{/jobs/job/kill,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1734f68{/stages/stage/kill,null,AVAILABLE,@Spark}
21/08/09 15:55:30 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://hp2:4040
21/08/09 15:55:30 INFO spark.SparkContext: Added JAR file:/home/javaspark/SparkStudy-1.0-SNAPSHOT.jar at spark://hp2:36387/jars/SparkStudy-1.0-SNAPSHOT.jar with timestamp 1628495730971
21/08/09 15:55:31 INFO executor.Executor: Starting executor ID driver on host localhost
21/08/09 15:55:31 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33128.
21/08/09 15:55:31 INFO netty.NettyBlockTransferService: Server created on hp2:33128
21/08/09 15:55:31 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/08/09 15:55:31 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hp2, 33128, None)
21/08/09 15:55:31 INFO storage.BlockManagerMasterEndpoint: Registering block manager hp2:33128 with 366.3 MB RAM, BlockManagerId(driver, hp2, 33128, None)
21/08/09 15:55:31 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hp2, 33128, None)
21/08/09 15:55:31 INFO storage.BlockManager: external shuffle service port = 7337
21/08/09 15:55:31 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, hp2, 33128, None)
21/08/09 15:55:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@245a060f{/metrics/json,null,AVAILABLE,@Spark}
21/08/09 15:55:32 INFO scheduler.EventLoggingListener: Logging events to hdfs://nameservice1/user/spark/applicationHistory/local-1628495731008
21/08/09 15:55:32 INFO spark.SparkContext: Registered listener com.cloudera.spark.lineage.NavigatorAppListener
21/08/09 15:55:32 INFO logging.DriverLogger$DfsAsyncWriter: Started driver log file sync to: /user/spark/driverLogs/local-1628495731008_driver.log
21/08/09 15:55:32 INFO internal.SharedState: loading hive config file: file:/etc/hive/conf.cloudera.hive/hive-site.xml
21/08/09 15:55:32 INFO internal.SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
21/08/09 15:55:32 INFO internal.SharedState: Warehouse path is '/user/hive/warehouse'.
21/08/09 15:55:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@630b6190{/SQL,null,AVAILABLE,@Spark}
21/08/09 15:55:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@532e27ab{/SQL/json,null,AVAILABLE,@Spark}
21/08/09 15:55:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@459b6c53{/SQL/execution,null,AVAILABLE,@Spark}
21/08/09 15:55:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@39e69ea7{/SQL/execution/json,null,AVAILABLE,@Spark}
21/08/09 15:55:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4779aae6{/static/sql,null,AVAILABLE,@Spark}
21/08/09 15:55:32 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
21/08/09 15:55:34 INFO datasources.FileSourceStrategy: Pruning directories with: 
21/08/09 15:55:34 INFO datasources.FileSourceStrategy: Post-Scan Filters: 
21/08/09 15:55:34 INFO datasources.FileSourceStrategy: Output Data Schema: struct<value: string>
21/08/09 15:55:34 INFO execution.FileSourceScanExec: Pushed Filters: 
21/08/09 15:55:34 INFO spark.ContextCleaner: Cleaned accumulator 1
21/08/09 15:55:34 INFO codegen.CodeGenerator: Code generated in 175.427618 ms
21/08/09 15:55:35 INFO codegen.CodeGenerator: Code generated in 18.850862 ms
21/08/09 15:55:35 INFO codegen.CodeGenerator: Code generated in 5.146707 ms
21/08/09 15:55:35 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 341.8 KB, free 366.0 MB)
21/08/09 15:55:35 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 32.3 KB, free 365.9 MB)
21/08/09 15:55:35 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hp2:33128 (size: 32.3 KB, free: 366.3 MB)
21/08/09 15:55:35 INFO spark.SparkContext: Created broadcast 0 from count at SparkTest1.java:13
21/08/09 15:55:35 INFO execution.FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
21/08/09 15:55:35 INFO spark.SparkContext: Starting job: count at SparkTest1.java:13
21/08/09 15:55:35 INFO scheduler.DAGScheduler: Registering RDD 7 (count at SparkTest1.java:13)
21/08/09 15:55:35 INFO scheduler.DAGScheduler: Got job 0 (count at SparkTest1.java:13) with 1 output partitions
21/08/09 15:55:35 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at SparkTest1.java:13)
21/08/09 15:55:35 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
21/08/09 15:55:35 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
21/08/09 15:55:35 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[7] at count at SparkTest1.java:13), which has no missing parents
21/08/09 15:55:35 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 17.5 KB, free 365.9 MB)
21/08/09 15:55:35 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 8.1 KB, free 365.9 MB)
21/08/09 15:55:35 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hp2:33128 (size: 8.1 KB, free: 366.3 MB)
21/08/09 15:55:35 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1164
21/08/09 15:55:35 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[7] at count at SparkTest1.java:13) (first 15 tasks are for partitions Vector(0))
21/08/09 15:55:35 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
21/08/09 15:55:35 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 8298 bytes)
21/08/09 15:55:35 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
21/08/09 15:55:35 INFO executor.Executor: Fetching spark://hp2:36387/jars/SparkStudy-1.0-SNAPSHOT.jar with timestamp 1628495730971
21/08/09 15:55:35 INFO client.TransportClientFactory: Successfully created connection to hp2/10.31.1.124:36387 after 33 ms (0 ms spent in bootstraps)
21/08/09 15:55:35 INFO util.Utils: Fetching spark://hp2:36387/jars/SparkStudy-1.0-SNAPSHOT.jar to /tmp/spark-13b50d5e-4efe-42ce-bede-90a470508b7f/userFiles-6993a66e-deff-47b3-8a6c-483e72f0d2b0/fetchFileTemp3971439811668465820.tmp
21/08/09 15:55:35 INFO executor.Executor: Adding file:/tmp/spark-13b50d5e-4efe-42ce-bede-90a470508b7f/userFiles-6993a66e-deff-47b3-8a6c-483e72f0d2b0/SparkStudy-1.0-SNAPSHOT.jar to class loader
21/08/09 15:55:35 INFO datasources.FileScanRDD: Reading File path: file:///home/pyspark/idcard.txt, range: 0-209, partition values: [empty row]
21/08/09 15:55:35 INFO codegen.CodeGenerator: Code generated in 9.344517 ms
21/08/09 15:55:35 INFO memory.MemoryStore: Block rdd_2_0 stored as values in memory (estimated size 600.0 B, free 365.9 MB)
21/08/09 15:55:35 INFO storage.BlockManagerInfo: Added rdd_2_0 in memory on hp2:33128 (size: 600.0 B, free: 366.3 MB)
21/08/09 15:55:35 INFO codegen.CodeGenerator: Code generated in 3.731388 ms
21/08/09 15:55:35 INFO codegen.CodeGenerator: Code generated in 36.086228 ms
21/08/09 15:55:35 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1900 bytes result sent to driver
21/08/09 15:55:36 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 389 ms on localhost (executor driver) (1/1)
21/08/09 15:55:36 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
21/08/09 15:55:36 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (count at SparkTest1.java:13) finished in 0.490 s
21/08/09 15:55:36 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/08/09 15:55:36 INFO scheduler.DAGScheduler: running: Set()
21/08/09 15:55:36 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
21/08/09 15:55:36 INFO scheduler.DAGScheduler: failed: Set()
21/08/09 15:55:36 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[10] at count at SparkTest1.java:13), which has no missing parents
21/08/09 15:55:36 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 7.1 KB, free 365.9 MB)
21/08/09 15:55:36 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 3.8 KB, free 365.9 MB)
21/08/09 15:55:36 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on hp2:33128 (size: 3.8 KB, free: 366.3 MB)
21/08/09 15:55:36 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1164
21/08/09 15:55:36 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[10] at count at SparkTest1.java:13) (first 15 tasks are for partitions Vector(0))
21/08/09 15:55:36 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
21/08/09 15:55:36 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, executor driver, partition 0, ANY, 7767 bytes)
21/08/09 15:55:36 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
21/08/09 15:55:36 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks including 1 local blocks and 0 remote blocks
21/08/09 15:55:36 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms
21/08/09 15:55:36 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 1627 bytes result sent to driver
21/08/09 15:55:36 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 44 ms on localhost (executor driver) (1/1)
21/08/09 15:55:36 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
21/08/09 15:55:36 INFO scheduler.DAGScheduler: ResultStage 1 (count at SparkTest1.java:13) finished in 0.067 s
21/08/09 15:55:36 INFO scheduler.DAGScheduler: Job 0 finished: count at SparkTest1.java:13, took 0.639594 s
21/08/09 15:55:36 INFO conf.HiveConf: Found configuration file file:/etc/hive/conf.cloudera.hive/hive-site.xml
21/08/09 15:55:36 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 2.1 using Spark classes.
21/08/09 15:55:36 INFO conf.HiveConf: Found configuration file file:/etc/hive/conf.cloudera.hive/hive-site.xml
21/08/09 15:55:36 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/007fe5f4-8b0d-48d4-bb28-f57e0b952579
21/08/09 15:55:36 INFO session.SessionState: Created local directory: /tmp/root/007fe5f4-8b0d-48d4-bb28-f57e0b952579
21/08/09 15:55:36 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/007fe5f4-8b0d-48d4-bb28-f57e0b952579/_tmp_space.db
21/08/09 15:55:36 INFO client.HiveClientImpl: Warehouse location for Hive client (version 2.1.1) is /user/hive/warehouse
21/08/09 15:55:36 INFO hive.metastore: HMS client filtering is enabled.
21/08/09 15:55:36 INFO hive.metastore: Trying to connect to metastore with URI thrift://hp1:9083
21/08/09 15:55:36 INFO hive.metastore: Opened a connection to metastore, current connections: 1
21/08/09 15:55:36 INFO hive.metastore: Connected to metastore.
21/08/09 15:55:37 INFO metadata.Hive: Registering function getdegree myUdf.getDegree
21/08/09 15:55:37 INFO spark.SparkContext: Starting job: count at SparkTest1.java:14
21/08/09 15:55:37 INFO scheduler.DAGScheduler: Registering RDD 15 (count at SparkTest1.java:14)
21/08/09 15:55:37 INFO scheduler.DAGScheduler: Got job 1 (count at SparkTest1.java:14) with 1 output partitions
21/08/09 15:55:37 INFO scheduler.DAGScheduler: Final stage: ResultStage 3 (count at SparkTest1.java:14)
21/08/09 15:55:37 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 2)
21/08/09 15:55:37 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 2)
21/08/09 15:55:37 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 2 (MapPartitionsRDD[15] at count at SparkTest1.java:14), which has no missing parents
21/08/09 15:55:37 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 17.5 KB, free 365.9 MB)
21/08/09 15:55:37 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 8.1 KB, free 365.9 MB)
21/08/09 15:55:37 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on hp2:33128 (size: 8.1 KB, free: 366.2 MB)
21/08/09 15:55:37 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1164
21/08/09 15:55:37 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[15] at count at SparkTest1.java:14) (first 15 tasks are for partitions Vector(0))
21/08/09 15:55:37 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
21/08/09 15:55:37 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, executor driver, partition 0, PROCESS_LOCAL, 8298 bytes)
21/08/09 15:55:37 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 2)
21/08/09 15:55:37 INFO storage.BlockManager: Found block rdd_2_0 locally
21/08/09 15:55:37 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 2). 1857 bytes result sent to driver
21/08/09 15:55:37 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 19 ms on localhost (executor driver) (1/1)
21/08/09 15:55:37 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
21/08/09 15:55:37 INFO scheduler.DAGScheduler: ShuffleMapStage 2 (count at SparkTest1.java:14) finished in 0.037 s
21/08/09 15:55:37 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/08/09 15:55:37 INFO scheduler.DAGScheduler: running: Set()
21/08/09 15:55:37 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 3)
21/08/09 15:55:37 INFO scheduler.DAGScheduler: failed: Set()
21/08/09 15:55:37 INFO scheduler.DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[18] at count at SparkTest1.java:14), which has no missing parents
21/08/09 15:55:37 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 7.1 KB, free 365.9 MB)
21/08/09 15:55:37 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 3.8 KB, free 365.9 MB)
21/08/09 15:55:37 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on hp2:33128 (size: 3.8 KB, free: 366.2 MB)
21/08/09 15:55:37 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1164
21/08/09 15:55:37 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[18] at count at SparkTest1.java:14) (first 15 tasks are for partitions Vector(0))
21/08/09 15:55:37 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
21/08/09 15:55:37 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3, localhost, executor driver, partition 0, ANY, 7767 bytes)
21/08/09 15:55:37 INFO executor.Executor: Running task 0.0 in stage 3.0 (TID 3)
21/08/09 15:55:37 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks including 1 local blocks and 0 remote blocks
21/08/09 15:55:37 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
21/08/09 15:55:37 INFO executor.Executor: Finished task 0.0 in stage 3.0 (TID 3). 1584 bytes result sent to driver
21/08/09 15:55:37 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 3) in 10 ms on localhost (executor driver) (1/1)
21/08/09 15:55:37 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
21/08/09 15:55:37 INFO scheduler.DAGScheduler: ResultStage 3 (count at SparkTest1.java:14) finished in 0.021 s
21/08/09 15:55:37 INFO scheduler.DAGScheduler: Job 1 finished: count at SparkTest1.java:14, took 0.082477 s
Lines with 1: 11, lines with 2: 10
21/08/09 15:55:37 INFO server.AbstractConnector: Stopped Spark@b93aad{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
21/08/09 15:55:37 INFO ui.SparkUI: Stopped Spark web UI at http://hp2:4040
21/08/09 15:55:37 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/08/09 15:55:37 INFO memory.MemoryStore: MemoryStore cleared
21/08/09 15:55:37 INFO storage.BlockManager: BlockManager stopped
21/08/09 15:55:37 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
21/08/09 15:55:37 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/08/09 15:55:37 INFO spark.SparkContext: Successfully stopped SparkContext
21/08/09 15:55:37 INFO util.ShutdownHookManager: Shutdown hook called
21/08/09 15:55:37 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-f1d84ef3-a294-4488-960d-521fd0c0bbea
21/08/09 15:55:37 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-13b50d5e-4efe-42ce-bede-90a470508b7f
[root@hp2 javaspark]# 

References:

  1. http://spark.apache.org/docs/2.4.2/quick-start.html