Writing to HDFS succeeds, but reading the data back fails with this error:
Caused by: org.apache.parquet.io.ParquetDecodingException: Can't read value in column [result, label_id] INT64 at value 2678 out of 2678, 2678 out of 2678 in currentPage. repetition level: 1, definition level: 1
along with this one:
java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
Solution: add one configuration option when writing:
conf.set("spark.sql.parquet.writeLegacyFormat", "true");
When set to true, Spark writes Parquet data using the same legacy conventions as Hive, so older Parquet readers (such as Hive) can decode the files correctly.
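As a minimal sketch of applying this fix, the option can also be set when building the SparkSession (Spark 2.x API), so every Parquet write in the job uses the legacy format. The class name and HDFS paths below are placeholders for illustration:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LegacyParquetWriteExample {
    public static void main(String[] args) {
        // Enable the legacy Parquet layout so Hive and other older
        // readers can decode the files without RLE/BitPacking errors.
        SparkSession spark = SparkSession.builder()
                .appName("legacy-parquet-write")
                .config("spark.sql.parquet.writeLegacyFormat", "true")
                .getOrCreate();

        // Placeholder paths: rewrite an existing dataset in legacy format.
        Dataset<Row> df = spark.read().parquet("hdfs:///input/data");
        df.write().parquet("hdfs:///output/data");

        spark.stop();
    }
}
```

Note that the setting only affects how data is *written*; files already written in the new format must be rewritten for old readers to consume them.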