Notes on resolving a Parquet read/write error on an ARM Kunpeng server cluster

Background:

Recently, at a customer site running a Huawei ARM Kunpeng server cluster with Spark 2.4.0, a job that wrote its output in Parquet format caused downstream jobs reading that output to fail with the following error:

Caused by: java.io.IOException: could not decode the dictionary for [IMSI] optional int64 IMSI
  at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:121)
  at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:312)
  at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:256)
  at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:159)
  at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown Source)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:121)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
  ... 3 more
Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
  at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:98)
  at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
  at org.xerial.snappy.Snappy.uncompress(Snappy.java:547)
  at org.apache.parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:69)
  at org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
  at java.io.DataInputStream.readFully(DataInputStream.java:195)
  at java.io.DataInputStream.readFully(DataInputStream.java:169)
  at org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:263)
  at org.apache.parquet.bytes.BytesInput.toByteBuffer(BytesInput.java:214)
  at org.apache.parquet.bytes.BytesInput.toInputStream(BytesInput.java:223)
  at org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary.<init>(PlainValuesDictionary.java:154)
  at org.apache.parquet.column.Encoding$1.initDictionary(Encoding.java:94)
  at org.apache.parquet.column.Encoding$4.initDictionary(Encoding.java:147)
  at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:118)
  ... 29 more

Resolution:

The log suggests a problem in a system-level library. After some searching, the issue turned out to be in the snappy-java jar bundled with Spark: https://github.com/xerial/snappy-java/issues/209 shows that snappy-java-1.1.7.1.jar has a bug in its AArch64 (ARM64) support, which was fixed in snappy-java-1.1.7.2.jar. So I downloaded snappy-java-1.1.7.2.jar from the Maven central repository and replaced the bundled jar under SPARK_HOME/jars (the swap is sketched below), then re-ran the job, only to hit a new error.
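The jar swap amounted to roughly the following (a sketch only; the Maven Central URL and the name of the bundled jar are assumptions based on a stock Spark 2.4.0 distribution, and the same swap has to be done on every node that runs executors):

# Move the bundled 1.1.7.1 jar aside and drop in 1.1.7.2 (paths assumed)
cd "$SPARK_HOME/jars"
mv snappy-java-1.1.7.1.jar /tmp/snappy-java-1.1.7.1.jar.bak
wget https://repo1.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.7.2/snappy-java-1.1.7.2.jar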


The new error log:

Caused by: org.apache.spark.SparkException: Task failed while writing rows.
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:254)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:168)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:121)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.7-d8bf1808-a287-417f-824b-58a17a512bc9-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /tmp/snappy-1.1.7-d8bf1808-a287-417f-824b-58a17a512bc9-libsnappyjava.so) (Possible cause: architecture word width mismatch)
  at java.lang.ClassLoader$NativeLibrary.load(Native Method)
  at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1934)
  at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1817)
  at java.lang.Runtime.load0(Runtime.java:809)
  at java.lang.System.load(System.java:1086)
  at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:179)
  at org.xerial.snappy.SnappyLoader.loadSnappyApi(SnappyLoader.java:154)
  at org.xerial.snappy.Snappy.<clinit>(Snappy.java:47)
  at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)
  at org.apache.hadoop.io.compress.CompressorStream.compress

This log shows that the snappy native library (libsnappyjava.so) requires GLIBCXX_3.4.21 from /usr/lib64/libstdc++.so.6, but the system's libstdc++ does not provide that version.
So I checked with the following command:

[merce@testarm02 ~]$ strings /usr/lib64/libstdc++.so.6 | grep GLIBC
GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.11
GLIBCXX_3.4.12
GLIBCXX_3.4.13
GLIBCXX_3.4.14
GLIBCXX_3.4.15
GLIBCXX_3.4.16
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.19
GLIBC_2.17
GLIBCXX_DEBUG_MESSAGE_LENGTH

Sure enough, the highest GLIBCXX version on this system is only 3.4.19, so the only option is to upgrade the libstdc++.so.6 library.


First, check which version of libstdc++.so.6 the system is currently using:

[merce@testarm02 ~]$ ll /usr/lib64/libstdc++*
lrwxrwxrwx  1 root root      19 Jul 29 10:06 /usr/lib64/libstdc++.so.6 -> libstdc++.so.6.0.19
-rwxr-xr-x. 1 root root 1057320 Oct 30  2018 /usr/lib64/libstdc++.so.6.0.19

The system is currently on libstdc++.so.6.0.19, so it needs to be upgraded to a newer version. Since the system is ARM-based, an arm64 (aarch64) build of libstdc++ is required; ARM servers are still relatively uncommon, and most packages found online are Red Hat x86 builds, with very few arm64 ones available.
Eventually, via the Huawei forum, I found a download source: http://ftp.de.debian.org/debian/pool/main/g/gcc-6 offers many versions.

# Download:
wget http://ftp.de.debian.org/debian/pool/main/g/gcc-6/libstdc++6_6.3.0-18+deb9u1_arm64.deb
# Extract:
ar -x libstdc++6_6.3.0-18+deb9u1_arm64.deb
tar xvf data.tar.xz

After extraction, the libstdc++.so.6.0.22 library can be found under usr/lib/aarch64-linux-gnu/.
Running strings usr/lib/aarch64-linux-gnu/libstdc++.so.6 | grep GLIBC also shows that GLIBCXX now goes up to 3.4.22, which includes 3.4.21 and therefore meets the requirement:

[merce@testarm02 test]$ ll usr/lib/aarch64-linux-gnu/
total 1528
lrwxrwxrwx 1 merce merce      19 Feb 15  2018 libstdc++.so.6 -> libstdc++.so.6.0.22
-rw-r--r-- 1 merce merce 1562248 Feb 15  2018 libstdc++.so.6.0.22

[merce@testarm02 test]$ strings usr/lib/aarch64-linux-gnu/libstdc++.so.6 |grep GLIBC
GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.11
GLIBCXX_3.4.12
GLIBCXX_3.4.13
GLIBCXX_3.4.14
GLIBCXX_3.4.15
GLIBCXX_3.4.16
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.19
GLIBCXX_3.4.20
GLIBCXX_3.4.21
GLIBCXX_3.4.22
GLIBC_2.17
GLIBC_2.18
GLIBCXX_DEBUG_MESSAGE_LENGTH

Install:

cp usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.22 /usr/lib64
cd /usr/lib64
rm -rf libstdc++.so.6
ln -s libstdc++.so.6.0.22 libstdc++.so.6
# Check that the new symlink is in effect
strings /usr/lib64/libstdc++.so.6 | grep GLIBC

With that, the libstdc++ upgrade is done. I then started a local-mode spark-shell on this machine for a Parquet read/write test (only this one node had been upgraded so far, so the test had to run locally on it). The test passed, and the problem was fully resolved.
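The smoke test was along these lines (a minimal sketch; the output path and column name are hypothetical, and it relies on Spark 2.4 compressing Parquet with snappy by default, which is what exercises the fixed native library):

# Launch a local-mode spark-shell and run a small write/read round trip
spark-shell --master local[2] <<'EOF'
// Write a small snappy-compressed Parquet file, then read it back
val df = spark.range(0, 100000).withColumnRenamed("id", "IMSI")
df.write.mode("overwrite").parquet("/tmp/parquet_snappy_test")
spark.read.parquet("/tmp/parquet_snappy_test").count()
EOF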


References:

https://www.cnblogs.com/yipiaoqingshui/p/12838124.html

https://bbs.huaweicloud.com/forum/thread-46787-1-1.html
