es + spark: exception when reading custom-format dates from Elasticsearch

Problem: when Spark reads a given Elasticsearch index/type whose documents contain dates in a custom format, reading those date fields throws the exception below.

User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 54, mini3, executor 3): org.elasticsearch.hadoop.rest.EsHadoopParsingException: Cannot parse value [1989-04-16] for field [comBornDate]
	at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:903)
	at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:1047)
	at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:889)
	at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:602)
	at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:426)
	at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:292)
	at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:262)
	at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:331)
	at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:115)
	at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:148)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot invoke method public org.joda.time.DateTime org.joda.time.format.DateTimeFormatter.parseDateTime(java.lang.String)
	at org.elasticsearch.hadoop.util.ReflectionUtils.invoke(ReflectionUtils.java:93)
	at org.elasticsearch.hadoop.util.DateUtils$JodaTime.parseDate(DateUtils.java:105)
	at org.elasticsearch.hadoop.util.DateUtils.parseDate(DateUtils.java:122)
	at org.elasticsearch.spark.serialization.ScalaValueReader.createDate(ScalaValueReader.scala:179)
	at org.elasticsearch.spark.serialization.ScalaValueReader.parseDate(ScalaValueReader.scala:170)
	at org.elasticsearch.spark.serialization.ScalaValueReader$$anonfun$date$1.apply(ScalaValueReader.scala:163)
	at org.elasticsearch.spark.serialization.ScalaValueReader$$anonfun$date$1.apply(ScalaValueReader.scala:163)
	at org.elasticsearch.spark.serialization.ScalaValueReader.checkNull(ScalaValueReader.scala:117)
	at org.elasticsearch.spark.serialization.ScalaValueReader.date(ScalaValueReader.scala:163)
	at org.elasticsearch.spark.serialization.ScalaValueReader.readValue(ScalaValueReader.scala:93)
	at org.elasticsearch.hadoop.serialization.ScrollReader.parseValue(ScrollReader.java:950)
	at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:900)
	... 18 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.elasticsearch.hadoop.util.ReflectionUtils.invoke(ReflectionUtils.java:91)
	... 29 more
Caused by: org.joda.time.IllegalInstantException: Cannot parse "1989-04-16": Illegal instant due to time zone offset transition (Asia/Shanghai)
	at org.joda.time.format.DateTimeParserBucket.computeMillis(DateTimeParserBucket.java:471)
	at org.joda.time.format.DateTimeParserBucket.computeMillis(DateTimeParserBucket.java:411)
	at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:928)
	... 33 more

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$failJobAndIndependentStages(DAGScheduler.scala:1609)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1597)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1596)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1596)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1830)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1779)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1768)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
	at org.elasticsearch.spark.rdd.EsSpark$.doSaveToEs(EsSpark.scala:108)
	at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:79)
	at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:74)
	at com.chuangxin.data.process.crunchbaseFormat.organizationFundingToEs$.main(organizationFundingToEs.scala:95)
	at com.chuangxin.data.process.crunchbaseFormat.organizationFundingToEs.main(organizationFundingToEs.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)

Caused by: org.elasticsearch.hadoop.rest.EsHadoopParsingException: Cannot parse value [1989-04-16] for field [comBornDate]
	... (repeats the same cause chain as above, down to org.joda.time.IllegalInstantException)
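The root cause is therefore a time-zone gap, not a malformed value: to build a rich date, es-hadoop has Joda-Time parse the date-only string "1989-04-16" as a DateTime at local midnight, and in Asia/Shanghai that midnight does not exist. A small check with java.time makes this visible. This is an illustrative sketch (the helper dstGap is not part of es-hadoop, Joda, or the original post), and whether a gap is reported at midnight depends on the tzdata shipped with your JDK:

```scala
import java.time.LocalDateTime
import java.time.ZoneId
import java.time.zone.ZoneOffsetTransition

// Illustrative helper: returns the offset transition if the given local
// date-time falls inside a DST gap (i.e. does not exist) in the zone.
def dstGap(zone: String, ldt: LocalDateTime): Option[ZoneOffsetTransition] =
  Option(ZoneId.of(zone).getRules.getTransition(ldt)).filter(_.isGap)

// Joda parses the date-only string "1989-04-16" as local midnight; if that
// midnight sits inside China's 1989 spring-forward gap, the instant is illegal.
println(dstGap("Asia/Shanghai", LocalDateTime.of(1989, 4, 16, 0, 0)))
```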


Fix:

China observed daylight saving time from 1986 to 1991, and 1989-04-16 was a spring-forward date, so midnight of that day falls inside the DST gap and the instant Joda tries to construct does not exist. The workaround is to disable rich date mapping so es-hadoop skips date parsing and returns the field as a plain string. In my case, setting es.mapping.date.rich to false in SparkConf did not take effect, but passing it at submit time as --conf spark.es.mapping.date.rich=false (Spark only forwards properties with the spark. prefix, which elasticsearch-hadoop then strips) did work.
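Besides the submit-time flag, elasticsearch-hadoop settings can also be supplied per query through the cfg map of esRDD. The sketch below assumes elasticsearch-spark is on the classpath; "company/info" is an illustrative index/type, not the one from the original job:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds esRDD to SparkContext

// es.mapping.date.rich=false tells es-hadoop to skip Joda date parsing and
// hand back date fields as plain strings (e.g. "1989-04-16").
val sc  = new SparkContext(new SparkConf().setAppName("es-date-read"))
val rdd = sc.esRDD("company/info", Map("es.mapping.date.rich" -> "false"))
```

Returning strings avoids the Joda parse entirely, which also sidesteps any future tzdata-dependent failures; convert the strings yourself afterwards if you need real date values.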
