Spark has quite a few limitations when reading data from ClickHouse, for example:
- Array column types cannot be read
- Some ClickHouse tables need to be queried with the FINAL modifier
- Filtering by ClickHouse partition
- ……

Without reinventing the wheel, we can use a trick of Spark's JDBC source to run native ClickHouse SQL.

Below is the Spark SQL JDBC source code that resolves a table's schema:
```scala
/**
 * Get the SQL query that should be used to find if the given table exists. Dialects can
 * override this method to return a query that works best in a particular database.
 * @param table The name of the table.
 * @return The SQL query to use for checking the table.
 */
def getTableExistsQuery(table: String): String = {
  s"SELECT * FROM $table WHERE 1=0"
}

/**
 * The SQL query that should be used to discover the schema of a table. It only needs to
 * ensure that the result set has the same schema as the table, such as by calling
 * "SELECT * ...". Dialects can override this method to return a query that works best in a
 * particular database.
 * @param table The name of the table.
 * @return The SQL query to use for discovering the schema.
 */
@Since("2.1.0")
def getSchemaQuery(table: String): String = {
  s"SELECT * FROM $table WHERE 1=0"
}
```
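As the comments above note, dialects can override these queries. A minimal sketch of that extension point, assuming the standard `org.apache.spark.sql.jdbc` API (the `LIMIT 0` form is a hypothetical alternative to `WHERE 1=0`; both are valid in ClickHouse):

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Sketch of a custom dialect for ClickHouse JDBC URLs.
object ClickHouseDialect extends JdbcDialect {
  // Claim responsibility for ClickHouse connection strings.
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:clickhouse")

  // Return an empty result set with the table's schema.
  override def getSchemaQuery(table: String): String =
    s"SELECT * FROM $table LIMIT 0"
}

// Register before any spark.read.format("jdbc") call.
JdbcDialects.registerDialect(ClickHouseDialect)
```

Registering a dialect changes behavior globally for matching URLs; the `dbtable` trick below needs no such registration.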
Spark JDBC resolves the schema with a plain SQL query, and `$table` is taken verbatim from the `dbtable` option. So by passing our own SQL as an aliased subquery in `dbtable`, we can make Spark execute arbitrary ClickHouse SQL. For example:
```scala
spark.read
  .format("jdbc")
  .option("driver", "ru.yandex.clickhouse.ClickHouseDriver")
  .option("url", "jdbc:clickhouse://test:8123/")
  .options(Map("user" -> "test", "password" -> "test"))
  .option("dbtable", "(select name from dw.test final) a")
  .load()
  .show()
```
The SQL Spark generates to fetch the table schema is then:

```sql
SELECT * FROM (select name from dw.test final) a
```
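The wrapping can be factored into a small helper so any ClickHouse SQL (FINAL queries, partition filters, etc.) drops into `dbtable` in the aliased-subquery form Spark expects. The helper name and the sample query below are hypothetical, not from the original post:

```scala
// Wrap arbitrary ClickHouse SQL into "(sql) alias" for the dbtable option.
def asDbTable(sql: String, alias: String = "a"): String = s"($sql) $alias"

// Combines two of the limitations listed above: FINAL plus a partition filter.
val dbtable = asDbTable("select name from dw.test final where dt = '2020-01-01'")
// spark.read.format("jdbc")... .option("dbtable", dbtable).load()
```

Because the subquery is sent to ClickHouse as-is, the partition predicate is evaluated server-side, so ClickHouse can prune partitions instead of relying on Spark's filter pushdown.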