First, register the LZO codecs in the Spark configuration:

sparkContextManager.setSparkConf("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
sparkContextManager.setSparkConf("io.compression.codecs", "org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzopCodec");
sparkContextManager.setSparkConf("io.compression.codec.lzo.class", "com.hadoop.compression.lzo.LzoCodec");
sparkContextManager.setSparkConf("lzo.text.input.format.ignore.nonlzo", "false");
Then simply pass the file path to textFile() to read it:
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
JavaRDD<String> strRdd = jsc.textFile(filePath,100);
There is no longer any need to read it the old way, via newAPIHadoopFile:

sc.newAPIHadoopFile(this.inputPath, LzoTextInputFormat.class, LongWritable.class, Text.class, hadoopConfiguration);
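For reference, the steps above can be combined into one minimal, self-contained sketch. It assumes hadoop-lzo (com.hadoop.compression.lzo.*) is on the classpath and uses the standard SparkConf/SparkSession API in place of the sparkContextManager wrapper shown above; with a plain SparkConf, Hadoop I/O properties are passed through using the `spark.hadoop.` prefix. The input path is a hypothetical placeholder.

```java
// Sketch: reading LZO-compressed text with plain textFile() once the codecs are registered.
// Assumptions: hadoop-lzo is on the classpath; path below is a placeholder.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class LzoTextFileRead {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("lzo-textfile")
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                // Hadoop I/O properties are forwarded via the spark.hadoop. prefix
                .set("spark.hadoop.io.compression.codecs",
                        "org.apache.hadoop.io.compress.DefaultCodec,"
                      + "org.apache.hadoop.io.compress.GzipCodec,"
                      + "com.hadoop.compression.lzo.LzopCodec")
                .set("spark.hadoop.io.compression.codec.lzo.class",
                        "com.hadoop.compression.lzo.LzoCodec")
                .set("spark.hadoop.lzo.text.input.format.ignore.nonlzo", "false");

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // textFile() decompresses the .lzo splits transparently via the registered codecs
        JavaRDD<String> lines = jsc.textFile("/path/to/input/*.lzo", 100);
        System.out.println(lines.count());

        spark.stop();
    }
}
```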