Can somebody share an example of reading Avro files in Spark using Java?
I found Scala examples, but no luck with Java.
Here is the relevant snippet; it fails to compile at the call to ctx.newAPIHadoopFile:
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
Configuration hadoopConf = new Configuration();
JavaRDD lines = ctx.newAPIHadoopFile(path, AvroInputFormat.class, AvroKey.class, NullWritable.class, hadoopConf);
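The compile error here comes from mixing Hadoop APIs: AvroInputFormat lives in org.apache.avro.mapred (the old mapred API), while newAPIHadoopFile only accepts input formats from the new org.apache.hadoop.mapreduce API. A minimal sketch of the corrected call, assuming Spark, Avro, and Hadoop jars on the classpath (path and app name are illustrative):

```java
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class AvroRead {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("AvroRead").setMaster("local[*]");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        Configuration hadoopConf = new Configuration();
        String path = "src/test/resources/episodes.avro"; // illustrative path

        // AvroKeyInputFormat comes from org.apache.avro.mapreduce (the new API),
        // which is what newAPIHadoopFile's signature requires; AvroInputFormat
        // (org.apache.avro.mapred) implements the old API and will not compile here.
        JavaPairRDD<AvroKey, NullWritable> records = ctx.newAPIHadoopFile(
            path, AvroKeyInputFormat.class, AvroKey.class, NullWritable.class, hadoopConf);

        // Each key wraps one deserialized Avro record; datum() returns it.
        JavaRDD<String> lines = records.keys().map(key -> key.datum().toString());
        System.out.println(lines.count());
        ctx.stop();
    }
}
```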
Regards
Solution
You can use the spark-avro connector library by Databricks. The recommended way to read or write Avro data with Spark SQL is through Spark's DataFrame API, and the connector supports both reading and writing:
import org.apache.spark.sql.*;
SQLContext sqlContext = new SQLContext(sc);
// Creates a DataFrame from a specified file
DataFrame df = sqlContext.read().format("com.databricks.spark.avro")
    .load("src/test/resources/episodes.avro");
// Saves a filtered subset of the Avro records read in
// (the Scala $"..." column syntax does not exist in Java; use a SQL expression string)
df.filter("age > 5").write()
    .format("com.databricks.spark.avro")
    .save("/tmp/output");
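Once loaded, the DataFrame behaves like any other; for instance, it can be registered as a temporary table and queried with SQL. This fragment continues from the snippet above (the title column is an assumption about the episodes.avro schema):

```java
// Register the Avro-backed DataFrame as a temp table (Spark 1.x API) and query it.
df.registerTempTable("episodes");
DataFrame titles = sqlContext.sql("SELECT title FROM episodes");
titles.show();
```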
Note that this connector has different versions for Spark 1.2, 1.3, and 1.4+:
Spark version    connector version
1.2              0.2.0
1.3              1.0.0
1.4+             2.0.1
Using Maven:
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-avro_2.10</artifactId>
    <version>{AVRO_CONNECTOR_VERSION}</version>
</dependency>
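Alternatively, if you'd rather not bundle the connector yourself, Spark can fetch it at launch time with the --packages flag. The coordinate below is the Spark 1.4+ version from the table above; the jar and class names are placeholders:

```shell
# Pull com.databricks:spark-avro_2.10:2.0.1 from Maven Central at submit time.
spark-submit \
  --packages com.databricks:spark-avro_2.10:2.0.1 \
  --class com.example.AvroApp \
  target/avro-app.jar
```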