Detailed error log:
es报错org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 55.0 failed 4 times, most recent failure: Lost task 0.3 in stage 55.0 (TID 4643, 192.168.1.203, executor 3): org.elasticsearch.hadoop.EsHadoopException: Could not write all entries for bulk operation [47/10813]. Error sample (first [5] error messages):
failed to parse [edgeEasyPovertyStartDate]
failed to parse [edgeEasyPovertyStartDate]
failed to parse [edgeEasyPovertyStartDate]
failed to parse [edgeEasyPovertyStartDate]
failed to parse [edgeEasyPovertyStartDate]
Bailing out...
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.flush(BulkProcessor.java:475)
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.add(BulkProcessor.java:106)
at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:187)
at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:168)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:67)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Analyzing the log, the key line is: failed to parse [edgeEasyPovertyStartDate]. Spark hit a parse failure on the field "edgeEasyPovertyStartDate" while writing to Elasticsearch. I checked the ES index mapping for this field: it was declared as integer, while the actual data is a date. After recreating the ES index with the edgeEasyPovertyStartDate field type changed from integer to date, the write succeeded.
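The fix above (recreating the index with the correct mapping) can be sketched as follows. Only the field name comes from the error log; the index name and the commented-out client calls are assumptions for illustration:

```python
# Sketch of the corrected mapping. The index name "poverty_index" is a
# hypothetical placeholder; only the field name comes from the error log.
corrected_mapping = {
    "mappings": {
        "properties": {
            # Was "integer" before; the data is actually a date,
            # so declare it as "date" when recreating the index.
            "edgeEasyPovertyStartDate": {"type": "date"},
        }
    }
}

# With the official elasticsearch Python client, applying it would look
# roughly like this (exact signature varies between client versions):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://192.168.1.203:9200")
#   es.indices.delete(index="poverty_index", ignore_unavailable=True)
#   es.indices.create(index="poverty_index", body=corrected_mapping)
```

Note that an existing field's type cannot be changed in place in Elasticsearch; the index must be recreated (or reindexed) with the new mapping, which is why the fix above deletes and recreates it.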
Many errors when writing from Spark to Elasticsearch are caused by a mismatch between the ES index field type and the actual data type.
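One way to catch such mismatches before a job fails is a pre-flight check that compares the types a job intends to write against the types declared in the index mapping. The helper below is a hypothetical sketch (not part of elasticsearch-hadoop), and the Spark-to-ES type table is partial and illustrative only:

```python
# Hypothetical pre-flight check: compare the field types a Spark job intends
# to write against the types declared in the ES index mapping.
# Partial, illustrative mapping from Spark SQL type names to ES field types.
SPARK_TO_ES = {
    "IntegerType": "integer",
    "LongType": "long",
    "StringType": "text",
    "TimestampType": "date",
    "DateType": "date",
}

def find_mismatches(spark_schema, es_properties):
    """Return (field, declared_es_type, expected_es_type) for each conflict.

    spark_schema:  dict of field name -> Spark SQL type name
    es_properties: the "properties" section of the ES index mapping
    """
    mismatches = []
    for field, spark_type in spark_schema.items():
        expected = SPARK_TO_ES.get(spark_type)
        actual = es_properties.get(field, {}).get("type")
        if expected and actual and expected != actual:
            mismatches.append((field, actual, expected))
    return mismatches

# The failing case from the log: the index declared integer, the data is a date.
print(find_mismatches(
    {"edgeEasyPovertyStartDate": "DateType"},
    {"edgeEasyPovertyStartDate": {"type": "integer"}},
))
# → [('edgeEasyPovertyStartDate', 'integer', 'date')]
```

Running a check like this against the index mapping before calling saveToEs would surface the integer/date conflict up front instead of failing the bulk write mid-job.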