
A small problem may hide a big one, and I hope someone can help explain it. Single-threading vs. multi-threading in Spark local mode: setMaster("local") runs, but setMaster("local[3]") or setMaster("local[*]") fails

Tags: Spark, threading

A small problem may hide a bigger one, and I hope someone can help me figure this out.

The code is identical in every run: setMaster("local") works, but setMaster("local[3]") or setMaster("local[*]") throws an error.

1. Spark local run modes

Spark has three local run modes:

(1) local: run with a single thread;
(2) local[k]: run with k threads;
(3) local[*]: run with as many threads as there are available cores.
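
For reference, the three variants differ only in the master string passed to SparkConf. A minimal sketch (standard Spark API; the app name is arbitrary):

import org.apache.spark.{SparkConf, SparkContext}

// local    : a single task thread; tasks in a stage run one at a time
val confOne   = new SparkConf().setMaster("local").setAppName("count test")
// local[3] : up to 3 tasks run concurrently in the same JVM
val confThree = new SparkConf().setMaster("local[3]").setAppName("count test")
// local[*] : one task thread per available core
val confAll   = new SparkConf().setMaster("local[*]").setAppName("count test")

val sc = new SparkContext(confThree)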

2. The input data

Judging from the code and the log below, the input is a CSV file (gps201608.csv) whose first three columns are a car ID, a longitude, and a latitude. (The original post showed a screenshot of the data here, which is not reproduced.)
3. The code involved in the array out-of-bounds access

// Strip the quotes, then split on commas once instead of three times.
val pieces = line.replaceAll("\"", "").split(',')
val carid  = pieces(0)  // any of these throws ArrayIndexOutOfBoundsException
val lngstr = pieces(1)  // if the row has fewer than three
val latstr = pieces(2)  // comma-separated fields

var lng = BigDecimal(0)
var lat = BigDecimal(0)

try {
  lng = myround(BigDecimal(lngstr), 3)
  lat = myround(BigDecimal(latstr), 3)
} catch {
  case e: NumberFormatException =>
    println(".....help......" + lngstr + "....." + latstr)
}
If the file contains dirty data, i.e. raw rows that are missing the first three columns, then the comma split yields fewer than three fields and indexing into the array goes out of bounds. But if that were the cause, why does the job succeed when the master is local?
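
Whatever the answer turns out to be, one way to make the parsing robust under any thread count is to check the field count before indexing. A minimal sketch, assuming the enclosing RDD of raw lines is called lines (it is not shown in the excerpt) and reusing the myround helper from above; dropping bad rows is my own policy choice:

// Defensive parse: skip rows with fewer than 3 fields or non-numeric
// coordinates, so dirty data cannot throw inside a task.
val parsed = lines.flatMap { line =>
  val pieces = line.replaceAll("\"", "").split(',')
  if (pieces.length < 3) None
  else
    try {
      Some((pieces(0),                          // carid
            myround(BigDecimal(pieces(1)), 3),  // lng
            myround(BigDecimal(pieces(2)), 3))) // lat
    } catch {
      case _: NumberFormatException => None
    }
}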

4. The local / local[k] / local[*] problem

The master is initialized as follows:

val sparkConf = new SparkConf().setMaster("local").setAppName("count test")

The problem: no other part of the code changes, only the argument to setMaster(), yet the results differ:

(1) with setMaster("local"), the code runs correctly;

(2) with setMaster("local[3]"), it fails;

(3) with setMaster("local[*]"), it fails with exactly the same error as (2).

The error output, identical for setMaster("local[3]") and setMaster("local[*]"), is:

17/07/31 13:39:01 INFO HadoopRDD: Input split: file:/E:/data/gps201608.csv:0+7683460
17/07/31 13:39:02 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost:50541 in memory (size: 1848.0 B, free: 133.6 MB)
17/07/31 13:39:05 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 2)
java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at scala.collection.mutable.ResizableArray$class.ensureSize(ResizableArray.scala:100)
	at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:47)
	at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:83)
	at count$.count$$mystatistics$1(count.scala:76)
	at count$$anonfun$2.apply(count.scala:87)
	at count$$anonfun$2.apply(count.scala:87)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:277)
	at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
17/07/31 13:39:05 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2, localhost): java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at scala.collection.mutable.ResizableArray$class.ensureSize(ResizableArray.scala:100)
	at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:47)
	at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:83)
	at count$.count$$mystatistics$1(count.scala:76)
	at count$$anonfun$2.apply(count.scala:87)
	at count$$anonfun$2.apply(count.scala:87)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:277)
	at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

17/07/31 13:39:05 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
17/07/31 13:39:05 INFO TaskSchedulerImpl: Cancelling stage 1
17/07/31 13:39:05 INFO TaskSchedulerImpl: Stage 1 was cancelled
17/07/31 13:39:05 INFO Executor: Executor is trying to kill task 1.0 in stage 1.0 (TID 3)
17/07/31 13:39:05 INFO DAGScheduler: ResultStage 1 (saveAsTextFile at count.scala:90) failed in 3.252 s
17/07/31 13:39:05 INFO Executor: Executor killed task 1.0 in stage 1.0 (TID 3)
17/07/31 13:39:05 INFO DAGScheduler: Job 1 failed: saveAsTextFile at count.scala:90, took 3.278665 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost): java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at scala.collection.mutable.ResizableArray$class.ensureSize(ResizableArray.scala:100)
	at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:47)
	at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:83)
	at count$.count$$mystatistics$1(count.scala:76)
	at count$$anonfun$2.apply(count.scala:87)
	at count$$anonfun$2.apply(count.scala:87)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:277)
	at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
17/07/31 13:39:05 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 3, localhost): TaskKilled (killed intentionally)
17/07/31 13:39:05 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
17/07/31 13:39:05 INFO SparkContext: Invoking stop() from shutdown hook
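
Before moving on, note what the stack trace actually says: the exception is not thrown by pieces.split(',')(i) at all. It comes from System.arraycopy inside ResizableArray.ensureSize, reached through ArrayBuffer.+= in mystatistics at count.scala:76. That is the classic signature of a mutable ArrayBuffer shared between task threads: growing the buffer is not thread-safe, so under local[3] or local[*] concurrent appends can corrupt its internal state, while single-threaded local never races, which would explain exactly the symptom in the title. A hypothetical sketch of the suspect pattern (count.scala:76 itself is not shown, so the names and structure here are assumptions):

import scala.collection.mutable.ArrayBuffer

object count {
  // Hypothetical shared state: in local mode every task thread runs in the
  // same JVM and sees this one singleton buffer. ArrayBuffer.+= grows its
  // backing array via ensureSize/System.arraycopy, which is not thread-safe.
  val stats = ArrayBuffer[BigDecimal]()

  def mystatistics(x: BigDecimal): BigDecimal = {
    stats += x   // concurrent appends race under local[3]/local[*]
    x
  }
}

If that is what count.scala:76 does, the fix is to stop accumulating into shared mutable state from inside RDD operations; return the values through the RDD itself (or use a Spark Accumulator), and local[3]/local[*] should then behave like local.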

5. A strange follow-up

After a while, without my changing anything, setting the master to local[3] would run to completion and produce 2 result files, but both were empty. (The original screenshots of the empty output files are not reproduced here.)