I'm trying the MongoDB Hadoop integration with Spark, but I can't figure out how to make the jars accessible to an IPython notebook.
Here is what I'm trying to do:
# set up parameters for reading from MongoDB via the Hadoop input format
config = {"mongo.input.uri": "mongodb://localhost:27017/db.collection"}
inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"
# these key/value classes worked for me, but other classes might work as well
keyClassName = "org.apache.hadoop.io.Text"
valueClassName = "org.apache.hadoop.io.MapWritable"
# read from MongoDB into an RDD
items = sc.newAPIHadoopRDD(inputFormatClassName, keyClassName, valueClassName, conf=config)
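One approach I have seen suggested (sketched below, and I haven't confirmed it is the intended mechanism) is exporting `PYSPARK_SUBMIT_ARGS` before starting the notebook, so the jars are picked up when the kernel creates its SparkContext. The jar paths here are placeholders, not my actual ones:

```shell
# Placeholder jar paths -- substitute the real mongo-hadoop connector and
# MongoDB Java driver jars for your installation.
# In Spark 1.4+, PYSPARK_SUBMIT_ARGS reportedly must end with "pyspark-shell".
export PYSPARK_SUBMIT_ARGS="--jars /path/to/mongo-hadoop-core.jar,/path/to/mongo-java-driver.jar pyspark-shell"

# then launch the notebook against the PySpark driver
ipython notebook
```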
This code works fine when I launch it in pyspark using the following command:
spark-1.4.1/bin/pyspark --jars 'mong