20/03/27 10:50:53 WARN scheduler.TaskSetManager: Lost task 27.2 in stage 0.0 (TID 71, xxx11, executor 5): org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [xxx] returned Bad Request(400) - failed to parse; Bailing out..
at org.elasticsearch.hadoop.rest.RestClient.processBulkResponse(RestClient.java:251)
at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:203)
at org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:222)
at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:244)
at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:198)
at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:161)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:67)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:107)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:107)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
The above is the error message. I searched online for quite a while and couldn't find a solution.
Below is the Scala code, which uses Spark to write data from HDFS into ES via the EsSpark class:
val sc = SparkSession.builder().appName("step1")
//.master("local[*]")
.config("es.index.auto.create", "true")
.config("pushdown", "true")
.config("es.nodes", "xxx0,xxx1,xxx2,xxx3")
.config("es.port", "9200")
.config("es.nodes.wan.only", "true")
.config("es.batch.write.retry.wait", "500")
.config("es.batch.write.retry.count", "60") // this key was set twice (50, then 60); the later value wins, so keep only one
.config("es.batch.size.bytes", "300000000")
.config("es.batch.size.entries", "10000")
.config("es.batch.write.refresh", "false")
.config("es.http.timeout", "10m")
.config("es.http.retries", "50")
.config("es.action.heart.beat.lead", "50")
.getOrCreate()
val today = args(0)
val jsonRdd = sc.sparkContext.textFile(s"xxx")
.map(line=>{
val arr = line.replace("\"", "").split(",", -1)
val color = arr(0).split("_")(1) match {
case "0" => "蓝色"      // blue
case "1" => "黄色"      // yellow
case "2" => "黑色"      // black
case "3" => "白色"      // white
case "4" => "渐变绿色"   // gradient green
case "5" => "黄绿双拼色" // yellow-green two-tone
case "6" => "蓝白渐变色" // blue-white gradient
case _ => "未确定"       // undetermined
}
Map(
"vehicle_color"->color,
"vehicle_number"->arr(0).split("_")(0),
"total_count" -> arr(1),
"average_park_time" -> arr(2),
"average_park_fee" -> arr(3),
"park1" -> arr(4),
"park2" -> arr(5),
"park3" -> arr(6),
"pay_way" -> arr(7)
).filter(_._2 != "")
})
EsSpark.saveToEs(jsonRdd, "xxx") // this call works without error
// EsSpark.saveJsonToEs(jsonRdd, "xxx") // this call produces the error above
saveToEs takes Map-structured records and handles the serialization to ES itself.
saveJsonToEs, by contrast, expects each record to already be a JSON string. My RDD holds Maps, and a Map's string representation is not valid JSON, so Elasticsearch rejects the bulk request with Bad Request(400) "failed to parse".
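If you do want to use saveJsonToEs, the records need to be serialized to JSON strings first. A minimal sketch, assuming json4s is available (Spark bundles it); the sample record and its values are made up for illustration and mirror the shape of the Map built above:

```scala
import org.json4s.DefaultFormats
import org.json4s.jackson.Serialization

object MapToJsonDemo {
  def main(args: Array[String]): Unit = {
    implicit val formats: DefaultFormats.type = DefaultFormats

    // One record in the same shape the map step above produces (values are placeholders)
    val record = Map(
      "vehicle_color"  -> "蓝色",
      "vehicle_number" -> "xxx",
      "total_count"    -> "12"
    )

    // record.toString gives "Map(vehicle_color -> 蓝色, ...)" — not JSON,
    // which is exactly what ES rejects with "failed to parse".
    // Serialization.write produces a proper JSON object instead:
    val json = Serialization.write(record)
    println(json) // e.g. {"vehicle_color":"蓝色","vehicle_number":"xxx","total_count":"12"}
  }
}
```

With this, `jsonRdd.map(Serialization.write(_))` would yield an `RDD[String]` that saveJsonToEs can ingest. That said, since the data is already a Map, sticking with saveToEs is the simpler choice here.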