I've been playing with Spark SQL recently and needed to read data from MongoDB. After some searching online, I got Spark SQL reading MongoDB data without trouble. Notes below.
- Add the dependencies
```xml
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.1.3</version>
    </dependency>
    <dependency>
        <groupId>org.mongodb.spark</groupId>
        <artifactId>mongo-spark-connector_2.11</artifactId>
        <version>2.1.3</version>
    </dependency>
</dependencies>
```
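If the project is built with sbt instead of Maven, the same three dependencies can be declared as follows (a sketch assuming the same Scala 2.11 / version 2.1.3 setup as above):

```scala
// build.sbt — equivalent of the Maven dependencies above
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"            % "2.1.3",
  "org.apache.spark"  %% "spark-sql"             % "2.1.3",
  "org.mongodb.spark" %% "mongo-spark-connector" % "2.1.3"
)
```

The `%%` operator appends the Scala binary version (`_2.11`) to each artifact name, matching the artifact IDs in the Maven snippet.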
- The code. This is only a demo, but it reads MongoDB data with Spark successfully:
```scala
package com.demo

import com.mongodb.spark.MongoSpark
import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SparkSession}

object Test {
  def main(args: Array[String]): Unit = {
    val sparkConf: SparkConf = new SparkConf()
    // Connection string: mongodb://user:password@host:port/database.collection
    val uri: String = "mongodb://username:password@mongodb-host:port/database.table"
    sparkConf.setMaster("local[*]").setAppName(this.getClass.getName)
    // The connector picks up its source collection from spark.mongodb.input.uri
    sparkConf.set("spark.mongodb.input.uri", uri)
    val sparkSession: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
    sparkSession.sparkContext.setLogLevel("warn")
    // Load the MongoDB collection as a DataFrame
    val tb_test: DataFrame = MongoSpark.load(sparkSession)
    tb_test.show()
    sparkSession.close()
  }
}
```
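Once loaded, the DataFrame can be registered as a temp view and queried with plain SQL. A minimal sketch below uses an in-memory DataFrame as a stand-in for `MongoSpark.load(sparkSession)`, with hypothetical fields `name` and `age` in place of whatever fields your collection actually holds:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SqlQueryDemo {
  // Returns names of documents with age >= 30, via a SQL query on a temp view.
  def adultNames(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    // Stand-in for MongoSpark.load(spark): an in-memory DataFrame with
    // hypothetical fields name/age mimicking a Mongo collection.
    val tb_test: DataFrame = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")
    // Register as a temp view so it can be queried with SQL.
    tb_test.createOrReplaceTempView("tb_test")
    spark.sql("SELECT name FROM tb_test WHERE age >= 30")
      .collect().map(_.getString(0)).toList
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("SqlQueryDemo").getOrCreate()
    spark.sparkContext.setLogLevel("warn")
    println(adultNames(spark)) // List(alice)
    spark.close()
  }
}
```

With the real connector you would replace the `Seq(...).toDF(...)` line with `MongoSpark.load(spark)`; everything after that stays the same.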
To test it yourself, just replace the MongoDB URI with your own. The read works fine.
References: MongoDB read and write operations on SparkSQL (Scala version)
MongoDB official documentation