1. Error message:
Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 6, emr-worker-1.cluster-210018, executor 1): java.lang.NoSuchMethodError: org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(Lorg/apache/avro/generic/GenericRecord;Ljava/lang/String;Z)Ljava/lang/Object;
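Analysis: at first glance this looks like a JAR conflict. A NoSuchMethodError at runtime usually means the class that was actually loaded comes from a different build than the one the caller was compiled against, which typically happens when the same class is present in more than one JAR on the classpath.
A handy command for checking for JAR conflicts: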
less /opt/apps/ecm/service/spark/2.4.5-hadoop3.1-1.0.3/package/spark-2.4.5-hadoop3.1-1.0.3/jars/* | grep 'HoodieAvroUtils'
The output shows two copies of org/apache/hudi/avro/HoodieAvroUtils.class, so this is most likely where the problem lies.
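The grep output shows that the class appears more than once, but not which JAR files contain it. The following is a minimal sketch, not part of the original troubleshooting, that scans a directory of JARs and reports every class entry found in more than one of them. The function name find_duplicate_classes and its parameters are hypothetical; it relies only on the Python standard library and assumes the JARs are ordinary zip archives.

```python
import glob
import os
import zipfile
from collections import defaultdict


def find_duplicate_classes(jar_dir, class_suffix):
    """Return {class entry: [jar names]} for entries found in more than one JAR.

    jar_dir      -- directory containing *.jar files (hypothetical parameter)
    class_suffix -- suffix of the entry to look for, e.g. "HoodieAvroUtils.class"
    """
    owners = defaultdict(list)
    for jar_path in sorted(glob.glob(os.path.join(jar_dir, "*.jar"))):
        jar_name = os.path.basename(jar_path)
        try:
            with zipfile.ZipFile(jar_path) as jar:
                for entry in jar.namelist():
                    # Record each JAR at most once per matching entry.
                    if entry.endswith(class_suffix) and jar_name not in owners[entry]:
                        owners[entry].append(jar_name)
        except zipfile.BadZipFile:
            # Skip files that are not valid zip archives.
            continue
    # Keep only the entries that are shipped by two or more JARs.
    return {entry: jars for entry, jars in owners.items() if len(jars) > 1}
```

Calling find_duplicate_classes with the Spark jars directory used in the command above and "HoodieAvroUtils.class" as the suffix should list the JAR files that ship the class, which makes it easier to decide which one to remove.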
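One of the two copies has to be removed. In my case I had placed both hudi-hadoop-mr-bundle-0.9.0-SNAPSHOT.jar and hudi-spark-bundle_2.11-0.9.0-SNAPSHOT.jar in that directory. After deleting hudi-hadoop-mr-bundle-0.9.0-SNAPSHOT.jar, Spark Structured Streaming was able to write to Hudi normally.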