When we run spark-shell or develop a Spark project, the output is always flooded with log messages, which makes it hard to see the actual results.
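For example, the output below comes from running a small WordCount program. The following is a minimal sketch reconstructed from the stack trace further down (the exact contents of test1.scala are an assumption; only the app name, input path, and the textFile/reduceByKey calls are taken from the log):

import org.apache.spark.{SparkConf, SparkContext}

object test1 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
    val sc = new SparkContext(conf)

    // read the input file (this path does not exist, which triggers the exception below)
    val lines = sc.textFile("/input/word.txt")

    // classic word count: split lines into words, map to (word, 1), sum the counts per word
    val counts = lines
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    sc.stop()
  }
}

Running it locally produces output like this: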
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/02/23 11:24:49 INFO SparkContext: Running Spark version 2.4.5
22/02/23 11:24:50 INFO SparkContext: Submitted application: WordCount
22/02/23 11:24:50 INFO SecurityManager: Changing view acls to: 15093
22/02/23 11:24:50 INFO SecurityManager: Changing modify acls to: 15093
22/02/23 11:24:50 INFO SecurityManager: Changing view acls groups to:
22/02/23 11:24:50 INFO SecurityManager: Changing modify acls groups to:
22/02/23 11:24:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(15093); groups with view permissions: Set(); users with modify permissions: Set(15093); groups with modify permissions: Set()
22/02/23 11:24:51 INFO Utils: Successfully started service 'sparkDriver' on port 10799.
22/02/23 11:24:51 INFO SparkEnv: Registering MapOutputTracker
22/02/23 11:24:51 INFO SparkEnv: Registering BlockManagerMaster
22/02/23 11:24:51 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/02/23 11:24:51 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/02/23 11:24:51 INFO DiskBlockManager: Created local directory at C:\Users\15093\AppData\Local\Temp\blockmgr-e1280c81-c88f-4de5-b48e-568262d203f7
22/02/23 11:24:51 INFO MemoryStore: MemoryStore started with capacity 1713.6 MB
22/02/23 11:24:51 INFO SparkEnv: Registering OutputCommitCoordinator
22/02/23 11:24:51 INFO Utils: Successfully started service 'SparkUI' on port 4040.
22/02/23 11:24:51 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.21.25.9:4040
22/02/23 11:24:51 INFO Executor: Starting executor ID driver on host localhost
22/02/23 11:24:51 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 10842.
22/02/23 11:24:51 INFO NettyBlockTransferService: Server created on 10.21.25.9:10842
22/02/23 11:24:51 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/02/23 11:24:51 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.21.25.9, 10842, None)
22/02/23 11:24:51 INFO BlockManagerMasterEndpoint: Registering block manager 10.21.25.9:10842 with 1713.6 MB RAM, BlockManagerId(driver, 10.21.25.9, 10842, None)
22/02/23 11:24:51 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.21.25.9, 10842, None)
22/02/23 11:24:51 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.21.25.9, 10842, None)
22/02/23 11:24:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 214.6 KB, free 1713.4 MB)
22/02/23 11:24:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.4 KB, free 1713.4 MB)
22/02/23 11:24:52 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.21.25.9:10842 (size: 20.4 KB, free: 1713.6 MB)
22/02/23 11:24:52 INFO SparkContext: Created broadcast 0 from textFile at test1.scala:13
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/input/word.txt
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:273)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:273)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:273)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:273)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
at org.apache.spark.Partitioner$.$anonfun$defaultPartitioner$4(Partitioner.scala:78)
at org.apache.spark.Partitioner$.$anonfun$defaultPartitioner$4$adapted(Partitioner.scala:78)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:78)
at org.apache.spark.rdd.PairRDDFunctions.$anonfun$reduceByKey$4(PairRDDFunctions.scala:326)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:326)
at test1$.main(test1.scala:22)
at test1.main(test1.scala)
22/02/23 11:24:52 INFO SparkContext: Invoking stop() from shutdown hook
22/02/23 11:24:52 INFO SparkUI: Stopped Spark web UI at http://10.21.25.9:4040
22/02/23 11:24:52 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/02/23 11:24:52 INFO MemoryStore: MemoryStore cleared
22/02/23 11:24:52 INFO BlockManager: BlockManager stopped
22/02/23 11:24:52 INFO BlockManagerMaster: BlockManagerMaster stopped
22/02/23 11:24:52 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/02/23 11:24:52 INFO SparkContext: Successfully stopped SparkContext
22/02/23 11:24:52 INFO ShutdownHookManager: Shutdown hook called
22/02/23 11:24:52 INFO ShutdownHookManager: Deleting directory C:\Users\15093\AppData\Local\Temp\spark-8baf1aca-d357-47e2-93d8-2e033dc36104
When this happens, we need to change the log level that Spark is allowed to print. The configuration file that controls Spark's logging is named log4j.properties
and lives in Spark's conf directory (if only log4j.properties.template is present there, copy it to log4j.properties first).
Find the rootCategory property in that file; its default level should be INFO, and setting it to ERROR is all we need.
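Concretely, the edit in conf/log4j.properties is a one-line change (the INFO default below is what the Spark 2.x log4j.properties.template ships with):

# default shipped in the template
# log4j.rootCategory=INFO, console
# changed so that only errors reach the console
log4j.rootCategory=ERROR, console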
If you are using a Maven project, you can instead create a log4j.properties file under the resources folder (src/main/resources); copying in the following content is enough for it to take effect:
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to ERROR. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=ERROR
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=ERROR
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
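For reference, in a typical Maven layout the file sits next to the sources like this (the directory sketch below is illustrative; only log4j.properties and test1.scala come from the text above):

src/
  main/
    scala/
      test1.scala
    resources/
      log4j.properties   <- the config above goes here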