Dependencies
<dependency>
<groupId>ru.yandex.clickhouse</groupId>
<artifactId>clickhouse-jdbc</artifactId>
<version>0.3.1</version>
</dependency>
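For an sbt build, the same Maven coordinates could be declared as below. This is a sketch only: the original shows just the Maven form, and using sbt is an assumption here.

```scala
// build.sbt (assumed build tool; coordinates copied from the Maven snippet above)
// A single % is used because clickhouse-jdbc is a plain Java artifact with no Scala version suffix.
libraryDependencies += "ru.yandex.clickhouse" % "clickhouse-jdbc" % "0.3.1"
```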
Pre-filtered load
val tableName = s"(SELECT CAST(longitude AS DOUBLE) longitude, CAST(latitude AS DOUBLE) latitude FROM location_log WHERE acquisition_time BETWEEN '$beginTime' AND '$endTime') tempTable"
val location: DataFrame = spark.read
  .format("jdbc")
  .option("url", "jdbc:clickhouse://xxx.xxx.xxx.xxx:8123")
  .option("fetchsize", "500000")
  .option("driver", "ru.yandex.clickhouse.ClickHouseDriver")
  .option("user", "default")
  .option("password", "default")
  .option("dbtable", tableName)
  .load()
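Because the subquery is assembled by string interpolation, a malformed or quote-containing time value would end up in the SQL text unchanged. A minimal sketch of a helper that builds the same subquery and validates its inputs first is shown here. The object name `ClickHouseSubquery`, the method `preFiltered`, and the `yyyy-MM-dd HH:mm:ss` time format are assumptions for illustration and do not come from the original.

```scala
// Hypothetical helper, not part of the original post.
object ClickHouseSubquery {
  // Assumed literal format: "yyyy-MM-dd HH:mm:ss". Anything else is rejected,
  // so a stray quote cannot change the structure of the SQL text.
  private val TimestampPattern = """\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}""".r

  private def isTimestamp(value: String): Boolean =
    TimestampPattern.pattern.matcher(value).matches()

  // Builds the aliased subquery that is passed to the JDBC "dbtable" option.
  def preFiltered(beginTime: String, endTime: String): String = {
    require(isTimestamp(beginTime), s"unexpected beginTime: $beginTime")
    require(isTimestamp(endTime), s"unexpected endTime: $endTime")
    "(SELECT CAST(longitude AS DOUBLE) longitude, CAST(latitude AS DOUBLE) latitude " +
      s"FROM location_log WHERE acquisition_time BETWEEN '$beginTime' AND '$endTime') tempTable"
  }
}
```

It would be used as `.option("dbtable", ClickHouseSubquery.preFiltered(beginTime, endTime))` in place of the hand-built string.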
Full-table load
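import java.util.Properties

val prop = new Properties()
prop.setProperty("user", "default")
prop.setProperty("password", "default")
prop.setProperty("driver", "ru.yandex.clickhouse.ClickHouseDriver")
// The s interpolator is required, otherwise $beginTime and $endTime are not substituted;
// the time values are quoted so that they are parsed as literals.
val location = spark.read
  .jdbc("jdbc:clickhouse://xxx.xx.xx.xxx:8123", "location_log", prop)
  .where(s"acquisition_time >= '$beginTime' AND acquisition_time <= '$endTime'")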
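The first approach is recommended: the filter and the column selection run inside ClickHouse, so less data is transferred to Spark.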