Spark SQL Snappy error when writing data to HDFS

Today Spark threw the following error while writing data to Hive:


Job aborted due to stage failure: Task 7 in stage 5.0 failed 4 times, most recent failure: Lost task 7.3 in stage 5.0 (TID 68,
NM-304-SA5212M4-BIGDATA-2016-356.BIGDATA.CHINATELECOM.CN):
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:133)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
at org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
at org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:98)
at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:136)
at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1191)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1183)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
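The UnsatisfiedLinkError means the executor JVM could not link the native Snappy method from NativeCodeLoader. As a minimal sketch for reproducing the failure in isolation or verifying a fix (the app name and output path here are placeholders, not from the original job), the following Scala program goes through the same code path as the stack trace, TextOutputFormat with SnappyCodec:

import org.apache.hadoop.io.compress.SnappyCodec
import org.apache.hadoop.util.NativeCodeLoader
import org.apache.spark.{SparkConf, SparkContext}

object SnappyWriteCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("snappy-write-check"))

    // Prints whether the native Hadoop library was found on the driver side;
    // the executors must also be able to load it (spark.executor.extraLibraryPath).
    println(s"native hadoop loaded: ${NativeCodeLoader.isNativeCodeLoaded}")

    // saveAsTextFile with a codec goes through TextOutputFormat.getRecordWriter and
    // SnappyCodec.createOutputStream, the same path shown in the stack trace above.
    sc.parallelize(1 to 100)
      .map(i => s"row-$i")
      .saveAsTextFile("/tmp/snappy_write_check", classOf[SnappyCodec])

    sc.stop()
  }
}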

At first glance this looks like a Snappy compression problem. However, the extraClassPath and extraLibraryPath entries were already configured in spark-defaults.conf under /etc/spark/conf:
spark.driver.extraClassPath /usr/lib/hadoop/lib/*:/usr/lib/spark/hlib/*
spark.driver.extraLibraryPath /usr/lib/hadoop/lib/native/
spark.executor.extraClassPath /usr/lib/hadoop/lib/
spark.executor.extraLibraryPath /usr/lib/hadoop/lib/native/*

But that is exactly where the problem is: the last line ends with a *. The glob causes the Snappy jar to be shipped twice. (Screenshot of the job log omitted; it contains a "duplicate" message.) The word "duplicate" shows that the same file was uploaded more than once. After removing the *, the job runs normally and the error no longer appears.
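For reference, this is the corrected spark-defaults.conf block, assuming the same installation paths as above, with only the trailing * removed from the executor library path:

spark.driver.extraClassPath /usr/lib/hadoop/lib/*:/usr/lib/spark/hlib/*
spark.driver.extraLibraryPath /usr/lib/hadoop/lib/native/
spark.executor.extraClassPath /usr/lib/hadoop/lib/
spark.executor.extraLibraryPath /usr/lib/hadoop/lib/native/

Since spark.executor.extraLibraryPath is appended to the native library search path, it should list directories rather than a glob in any case.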
