I recently ran into a very odd problem: the RDD was produced successfully (rdd.take(100).mkString("\n") printed fine), but dumping it to HDFS failed with the following error:
org.apache.parquet.io.InvalidRecordException: could not get child 3 from [GroupColumnIO user r:0 d:1 [user], GroupColumnIO items r:0 d:1 [items], PrimitiveColumnIO expId r:0 d:1 [expId]]
at org.apache.parquet.io.GroupColumnIO.getChild(GroupColumnIO.java:114)
at org.apache.parquet.thrift.ParquetWriteProtocol$StructWriteProtocol.<init>(ParquetWriteProtocol.java:322)
at org.apache.parquet.thrift.ParquetWriteProtocol$MessageWriteProtocol.<init>(ParquetWriteProtocol.java:397)
at org.apache.parquet.thrift.ParquetWriteProtocol.<init>(ParquetWriteProtocol.java:431)
at org.apache.parquet.hadoop.thrift.AbstractThriftWriteSupport.prepareForWrite(AbstractThriftWriteSupport.java:121)
at org.apache.parquet.hadoop.t
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at org.apache.parquet.io.GroupColumnIO.getChild(GroupColumnIO.java:112)
... 19 more
The RDD's element type is defined as follows:
struct MCRecommendSamplesBatch {
1: optional MCRankerUserInfo.MCRankerUserInfo user;
2: optional MCRankerUserInfo.RankerContextInfo context; // not used yet
3: optional list<RecommendImpression> items;
4: optional string expId;
}
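Judging from the error, the writer could not find context, which seemed absurd. After a lot of digging I found the cause: the RankerContextInfo struct was defined with no fields at all. The column list in the error contains only user, items and expId, so the empty context group appears to be missing from the Parquet schema while the Thrift struct still has four fields, and looking up the fourth child (index 3) goes out of bounds. Adding one field to the struct made the problem go away.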
struct RankerContextInfo { // originally left empty because the context features were not used yet, which caused the error
}
struct RankerContextInfo { // changing it to this fixed it
1:optional string debug;
}
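
The mismatch behind the exception can be illustrated with a small toy model. This is only a sketch of the suspected failure mode, not the actual parquet-thrift code: the class EmptyStructMismatch, the Field type, and the methods toColumns, canWrite and batch are all hypothetical names made up for illustration, and the rule "a struct with zero fields produces no column" is an assumption inferred from the column list printed in the error. The field count used for user is arbitrary, since the post does not show that struct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical, not parquet-thrift itself) of the index mismatch.
class EmptyStructMismatch {
    // Stand-in for a Thrift field: its name plus the number of fields in its
    // struct type. -1 marks a non-struct field such as a string or a list.
    static final class Field {
        final String name;
        final int structFieldCount;

        Field(String name, int structFieldCount) {
            this.name = name;
            this.structFieldCount = structFieldCount;
        }
    }

    // Assumption: a struct with zero fields is left out of the column list.
    static List<String> toColumns(List<Field> fields) {
        List<String> columns = new ArrayList<>();
        for (Field f : fields) {
            if (f.structFieldCount != 0) {
                columns.add(f.name);
            }
        }
        return columns;
    }

    // Looks up one column per Thrift field by position, like the getChild
    // call in the stack trace. Returns false if a lookup goes out of bounds.
    static boolean canWrite(List<Field> fields) {
        List<String> columns = toColumns(fields);
        try {
            for (int i = 0; i < fields.size(); i++) {
                columns.get(i);
            }
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    // MCRecommendSamplesBatch with a configurable number of context fields.
    static List<Field> batch(int contextFieldCount) {
        return Arrays.asList(
            new Field("user", 1), // arbitrary non-zero count
            new Field("context", contextFieldCount),
            new Field("items", -1),
            new Field("expId", -1));
    }

    public static void main(String[] args) {
        // Empty RankerContextInfo: 4 Thrift fields but only 3 columns.
        System.out.println("empty context writable: " + canWrite(batch(0)));
        // RankerContextInfo with the added debug field: 4 fields, 4 columns.
        System.out.println("one-field context writable: " + canWrite(batch(1)));
    }
}
```

With an empty context the model ends up with three columns for four fields, so the lookup at index 3 fails, which matches "Index: 3, Size: 3" in the trace. With one field in context the counts line up again.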