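Error message
The error output is long; the excerpt below is the first error in it, which shows that the failure occurred while writing files.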
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 2.0 failed 4 times, most recent failure: Lost task 9.3 in stage 2.0 (TID 1121, emr-worker-3.cluster-138513, executor 23): org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Solution
The original code was:
create table ccid_registered.temp_ccid_need_clean stored as parquet tblproperties('parquet.compression'='snappy') as
(SQL statement)
The table needs to be created first, and the data inserted afterwards with a separate statement:
insert overwrite table ccid_registered.temp_ccid_need_clean
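
The two-step approach might look like the sketch below. It is only an illustration: the post does not show the table schema or the original query, so the column `ccid` and the table `hypothetical_source_table` are placeholders, not the real definitions.

```sql
-- Step 1: create the empty table on its own.
-- The column list is hypothetical; the original post does not show the schema.
CREATE TABLE IF NOT EXISTS ccid_registered.temp_ccid_need_clean (
  ccid STRING  -- hypothetical column
)
STORED AS PARQUET
TBLPROPERTIES ('parquet.compression' = 'snappy');

-- Step 2: load the data with a separate statement.
-- hypothetical_source_table stands in for the query elided in the post.
INSERT OVERWRITE TABLE ccid_registered.temp_ccid_need_clean
SELECT ccid
FROM hypothetical_source_table;
```

Splitting the single create-table-as-select into a CREATE TABLE followed by an INSERT OVERWRITE keeps the same storage format and compression setting while separating table creation from the write.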