Please credit the source when reposting: https://blog.csdn.net/LvShuiLanTian/article/details/89308240
- Reading an ordinary Parquet file

  ```scala
  import org.apache.hadoop.fs.Path
  import org.apache.parquet.hadoop.ParquetReader
  import org.apache.parquet.hadoop.example.GroupReadSupport

  // Build a reader that yields records as generic Group objects
  ParquetReader.builder(new GroupReadSupport(), new Path("xxxx")).build()
  ```
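A minimal read loop built on the builder above might look like the following sketch; the file path is illustrative, and `read()` returning `null` signals end of file:

```scala
import org.apache.hadoop.fs.Path
import org.apache.parquet.example.data.Group
import org.apache.parquet.hadoop.ParquetReader
import org.apache.parquet.hadoop.example.GroupReadSupport

// Illustrative path; replace with a real Parquet file location
val reader: ParquetReader[Group] =
  ParquetReader.builder(new GroupReadSupport(), new Path("data.parquet")).build()
try {
  // read() returns null once all records have been consumed
  Iterator.continually(reader.read()).takeWhile(_ != null).foreach(println)
} finally reader.close()
```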
- Reading Thrift objects stored in Parquet format

  ```scala
  import org.apache.hadoop.fs.Path
  import org.apache.thrift.TBase
  import org.apache.parquet.hadoop.ParquetReader
  import org.apache.parquet.hadoop.thrift.ThriftParquetReader

  def createReader[V <: TBase[_, _]](path: String, vClass: Class[V]): ParquetReader[V] =
    ThriftParquetReader.build[V](new Path(path)).withThriftClass(vClass).build()
  ```
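Assuming a Thrift-generated class, here hypothetically named `MyEvent`, the helper above can be used like this (the HDFS path is also illustrative):

```scala
// MyEvent is a hypothetical class generated by the Thrift compiler (extends TBase)
val reader = createReader("hdfs:///data/events.parquet", classOf[MyEvent])
try {
  // Drain the file; read() returns null at end of input
  Iterator.continually(reader.read()).takeWhile(_ != null).foreach(println)
} finally reader.close()
```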
- Reading Parquet-stored Thrift objects with Spark

  ```scala
  import org.apache.spark.SparkContext
  import org.apache.spark.rdd.RDD
  import org.apache.thrift.TBase
  import org.apache.parquet.hadoop.thrift.ParquetThriftInputFormat

  def createRDD[V <: TBase[_, _]](spark: SparkContext, path: String, vClass: Class[V]): RDD[(Void, V)] =
    spark.newAPIHadoopFile(path, classOf[ParquetThriftInputFormat[V]], classOf[Void], vClass)
  ```
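Because the key type is `Void`, callers typically keep only the values. A usage sketch, assuming an existing `SparkContext` named `sc` and the same hypothetical `MyEvent` Thrift class:

```scala
// sc is an already-constructed SparkContext; MyEvent and the path are illustrative
val events = createRDD(sc, "hdfs:///data/events.parquet", classOf[MyEvent])
  .map(_._2) // drop the Void key, keep the deserialized Thrift objects
println(events.count())
```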