Reading ClickHouse from Spark: passing SQL through dbtable

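Reading ClickHouse data from Spark comes with a number of limitations, for example:

  • Array-typed columns cannot be read
  • Some ClickHouse tables have to be queried with the FINAL modifier
  • Filtering by ClickHouse partition
  • …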

Without reinventing the wheel, a few tricks with Spark's JDBC data source are enough to run ClickHouse SQL.

Below is the source code that Spark SQL's JDBC data source uses to discover a table's schema:

  /**
   * Get the SQL query that should be used to find if the given table exists. Dialects can
   * override this method to return a query that works best in a particular database.
   * @param table  The name of the table.
   * @return The SQL query to use for checking the table.
   */
  def getTableExistsQuery(table: String): String = {
    s"SELECT * FROM $table WHERE 1=0"
  }

  /**
   * The SQL query that should be used to discover the schema of a table. It only needs to
   * ensure that the result set has the same schema as the table, such as by calling
   * "SELECT * ...". Dialects can override this method to return a query that works best in a
   * particular database.
   * @param table The name of the table.
   * @return The SQL query to use for discovering the schema.
   */
  @Since("2.1.0")
  def getSchemaQuery(table: String): String = {
    s"SELECT * FROM $table WHERE 1=0"
  }

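Spark's JDBC data source discovers the schema by running a SQL statement, and the table in that statement is taken from the dbtable option. So by setting dbtable to a parenthesised subquery with an alias, we can have Spark run ClickHouse SQL of our own.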
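
To make the substitution concrete, here is a minimal sketch in plain Scala with no Spark dependency. The object name SchemaQueryDemo is made up for illustration; the method body is copied from the default getSchemaQuery quoted above, so it only shows what string that default would produce.

```scala
// Illustration only: SchemaQueryDemo is a hypothetical name, and the method
// simply repeats the default getSchemaQuery body quoted above.
object SchemaQueryDemo {
  def getSchemaQuery(table: String): String =
    s"SELECT * FROM $table WHERE 1=0"

  def main(args: Array[String]): Unit = {
    // A plain table name, the usual way dbtable is used.
    println(getSchemaQuery("dw.test"))
    // prints: SELECT * FROM dw.test WHERE 1=0

    // A parenthesised subquery with an alias is interpolated unchanged.
    println(getSchemaQuery("(select name from dw.test final) a"))
    // prints: SELECT * FROM (select name from dw.test final) a WHERE 1=0
  }
}
```

Because the value is spliced into the FROM clause as-is, anything that is valid there, including a ClickHouse subquery, ends up being executed by ClickHouse.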

For example:

// Assumes an existing SparkSession named spark
spark.read
  .format("jdbc")
  .option("driver", "ru.yandex.clickhouse.ClickHouseDriver")
  .option("url", "jdbc:clickhouse://test:8123/")
  .options(Map("user" -> "test", "password" -> "test"))
  // dbtable holds a subquery plus an alias instead of a table name
  .option("dbtable", "(select name from dw.test final) a")
  .load()
  .show()

The SQL that Spark SQL issues to fetch the table schema then becomes:

SELECT * FROM (select name from dw.test final) a WHERE 1=0
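
The same trick can be sketched for the other two limitations listed at the top. This is an untested sketch under assumptions: the object DbtableSubqueries, the helper asDbtable, and the columns tags (assumed to be Array(String)) and dt (assumed to be the partition key) are all hypothetical. arrayStringConcat is an existing ClickHouse function that joins an array of strings into a single string.

```scala
// Hypothetical helper: builds dbtable values for the remaining cases.
object DbtableSubqueries {
  // Wraps a ClickHouse query in parentheses and adds an alias so that it
  // can be used wherever a table name is expected. The alias t is arbitrary.
  def asDbtable(query: String): String = s"($query) t"

  // Assumed column tags of type Array(String): flatten it to a string in
  // ClickHouse so that Spark only ever sees a String column.
  val arrayAsString: String =
    asDbtable("select name, arrayStringConcat(tags, ',') as tags from dw.test")

  // Assumed partition key dt: the filter is part of the subquery, so it is
  // evaluated by ClickHouse rather than by Spark.
  val onePartition: String =
    asDbtable("select name from dw.test where dt = '2021-01-01'")
}
```

Either value would then be passed as .option("dbtable", DbtableSubqueries.arrayAsString) in a read like the one above.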