Apache Calcite: Adding Dynamic UDF Support

Latest recommended article published on 2024-09-09 15:44:12

NobiGo


Category: Apache Calcite. Tags: big data

Article link: https://blog.csdn.net/it_dx/article/details/117948590


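This article describes the two ways of supporting user-defined functions (UDFs) in Apache Calcite, dynamic and static, and focuses on dynamic registration. Example code shows how to register a UDF in a specific schema or in the root schema, together with a sample UDF implementation. The article also covers case sensitivity in JDBC connections and the function lookup performed while a UDF call is validated, and ends with the connection properties that control case sensitivity so that function names do not fail to match.

Summary generated automatically by CSDN.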

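Introduction

Apache Calcite can support UDFs in two ways:

Dynamic support: the UDF's function name and class name are passed in at execution time. During parsing, the UDF's information can then be found in the schema, so the query passes SQL validation. (This article focuses on how to use this approach and what to watch out for.)
Static support: the function is added in an intrusive way, alongside built-in SQL functions such as sum and abs.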

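Usage

The walkthrough below is based on Calcite's JdbcExample.

Registering in a specific schema

When a UDF is added to a specific schema, every call takes the form

"schemaName".udfName(parameter...)

Code example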

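// Imports this fragment relies on (Hr and Foodmart come from JdbcExample)
import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.calcite.adapter.java.ReflectiveSchema;
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.schema.impl.ScalarFunctionImpl;

Class.forName("org.apache.calcite.jdbc.Driver");
Connection connection = DriverManager.getConnection("jdbc:calcite:");
// unwrap exposes the Calcite-specific connection; a useful way to get past the limits of the standard JDBC interface
CalciteConnection calciteConnection = connection.unwrap(CalciteConnection.class);
SchemaPlus rootSchema = calciteConnection.getRootSchema();
rootSchema.add("hr", new ReflectiveSchema(new Hr()));
// Register the UDF in the foodmart schema
SchemaPlus schemaPlus = rootSchema.add("foodmart", new ReflectiveSchema(new Foodmart()));
schemaPlus.add("DATETOLOCALDATE", ScalarFunctionImpl.create(Udf.class, "dateToLocalDate"));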

Registering in the root schema

Once a UDF is added to the root schema, it can be called directly, without a schema prefix:

udfName(parameter...)

Code example

Class.forName("org.apache.calcite.jdbc.Driver");
Connection connection = DriverManager.getConnection("jdbc:calcite:");
// unwrap exposes the Calcite-specific connection; a useful way to get past the limits of the standard JDBC interface
CalciteConnection calciteConnection = connection.unwrap(CalciteConnection.class);
SchemaPlus rootSchema = calciteConnection.getRootSchema();
rootSchema.add("hr", new ReflectiveSchema(new Hr()));
SchemaPlus schemaPlus = rootSchema.add("foodmart", new ReflectiveSchema(new Foodmart()));
// Register the UDF on the root schema so that it can be called without a schema prefix
rootSchema.add("DATETOLOCALDATE", ScalarFunctionImpl.create(Udf.class, "dateToLocalDate"));

SQL to run against the root-schema registration

"select \"DATETOLOCALDATE\"(121)"

Appendix: UDF code (the implementation of the function)

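import java.time.LocalDate;
import org.apache.calcite.avatica.util.DateTimeUtils;
import org.apache.calcite.avatica.util.TimeUnitRange;

public class Udf {
  // Converts a Unix date (days since 1970-01-01) to a LocalDate
  public static LocalDate dateToLocalDate(int date) {
    int y0 = (int) DateTimeUtils.unixDateExtract(TimeUnitRange.YEAR, date);
    int m0 = (int) DateTimeUtils.unixDateExtract(TimeUnitRange.MONTH, date);
    int d0 = (int) DateTimeUtils.unixDateExtract(TimeUnitRange.DAY, date);
    return LocalDate.of(y0, m0, d0);
  }
}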
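
As a quick sanity check on what the UDF returns: its int argument is treated as a Unix date, that is, a count of days since 1970-01-01. For ordinary modern dates the same result should be obtainable with the JDK alone. The sketch below is not part of the original example; the class name EpochDayDemo is made up for illustration, and it assumes the input is a valid day count.

```java
import java.time.LocalDate;

// Hypothetical helper, JDK only. It mirrors what Udf.dateToLocalDate computes,
// assuming the int argument is a count of days since 1970-01-01.
class EpochDayDemo {
  static LocalDate dateToLocalDate(int date) {
    return LocalDate.ofEpochDay(date);
  }

  public static void main(String[] args) {
    // Day 121 after the epoch falls on 1970-05-02.
    System.out.println(dateToLocalDate(121));
  }
}
```

So the query select "DATETOLOCALDATE"(121) would be expected to produce the date 1970-05-02.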

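Caveats

In the default JdbcExample, a lowercase schema name or UDF name must be wrapped in double quotes (""); otherwise the identifier is converted to upper case and the function name can no longer be matched. The alternative is to set connection properties through a Properties object. All supported properties are defined in CalciteConnectionProperty: either make the connection case-insensitive, or set the casing properties for quoted and unquoted identifiers so that identifiers are not converted to a particular case.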
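
The effect of quoting can be pictured with a small stand-alone model. This is only a sketch of the behavior described above, not Calcite's parser: the class IdentifierCasingDemo and its methods are hypothetical, and it assumes the default behavior in which unquoted identifiers are upper-cased, quoted identifiers keep their case, and the lookup is case-sensitive.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of identifier casing; not Calcite code.
class IdentifierCasingDemo {
  // Registered function names, stored exactly as given at registration time.
  private final Map<String, String> functions = new HashMap<>();

  void register(String name, String description) {
    functions.put(name, description);
  }

  // A quoted identifier keeps its case; an unquoted one is upper-cased.
  static String normalize(String identifier) {
    if (identifier.length() >= 2
        && identifier.startsWith("\"") && identifier.endsWith("\"")) {
      return identifier.substring(1, identifier.length() - 1);
    }
    return identifier.toUpperCase(Locale.ROOT);
  }

  // Case-sensitive lookup of the normalized name.
  boolean resolves(String identifierInSql) {
    return functions.containsKey(normalize(identifierInSql));
  }

  public static void main(String[] args) {
    IdentifierCasingDemo demo = new IdentifierCasingDemo();
    demo.register("dateToLocalDate", "mixed-case registration");
    System.out.println(demo.resolves("dateToLocalDate"));     // false: becomes DATETOLOCALDATE
    System.out.println(demo.resolves("\"dateToLocalDate\"")); // true: case preserved
  }
}
```

In this model, registering the function under an all-upper-case name, as the examples above do with DATETOLOCALDATE, makes the unquoted form resolve as well.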

How it works

Registering a UDF actually goes through CalciteSchema, which stores the function in the schema's functionNames and functionMap.

  private FunctionEntry add(String name, Function function) {
    final FunctionEntryImpl entry =
        new FunctionEntryImpl(this, name, function);
    functionMap.put(name, entry);
    functionNames.add(name);
    // Functions that take no parameters are also indexed separately
    if (function.getParameters().isEmpty()) {
      nullaryFunctionMap.put(name, entry);
    }
    return entry;
  }
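
To make the bookkeeping concrete, here is a simplified model of such a registry. It is a sketch under stated assumptions rather than Calcite's implementation: the class FunctionRegistrySketch is hypothetical, a function is reduced to its parameter count, and plain JDK maps stand in for Calcite's own name-aware collections. The function name NOW_UTC is invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a schema's function registry.
class FunctionRegistrySketch {
  // name -> parameter counts of all overloads registered under that name
  private final Map<String, List<Integer>> functionMap = new LinkedHashMap<>();
  // names of functions that take no parameters
  private final Map<String, Integer> nullaryFunctionMap = new LinkedHashMap<>();

  void add(String name, int parameterCount) {
    // Several overloads may share one name, so each name maps to a list.
    functionMap.computeIfAbsent(name, k -> new ArrayList<>()).add(parameterCount);
    if (parameterCount == 0) {
      nullaryFunctionMap.put(name, parameterCount);
    }
  }

  // Returns the parameter counts of every overload whose name matches.
  List<Integer> getFunctions(String name, boolean caseSensitive) {
    List<Integer> matches = new ArrayList<>();
    for (Map.Entry<String, List<Integer>> entry : functionMap.entrySet()) {
      boolean sameName = caseSensitive
          ? entry.getKey().equals(name)
          : entry.getKey().equalsIgnoreCase(name);
      if (sameName) {
        matches.addAll(entry.getValue());
      }
    }
    return matches;
  }

  boolean hasNullary(String name) {
    return nullaryFunctionMap.containsKey(name);
  }

  public static void main(String[] args) {
    FunctionRegistrySketch registry = new FunctionRegistrySketch();
    registry.add("DATETOLOCALDATE", 1);
    registry.add("NOW_UTC", 0);
    System.out.println(registry.getFunctions("datetolocaldate", true));  // []
    System.out.println(registry.getFunctions("datetolocaldate", false)); // [1]
  }
}
```

The case-sensitive lookup returning an empty list corresponds to the "No match found for function signature" error discussed next.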

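The SQL itself is executed through a Statement, which ultimately calls into the Connection. That path is not covered here and may be written up in a separate article. The focus here is the SQL validation flow, because the properties set when the connection is created can cause a function lookup to fail, with an error such as: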

Caused by: org.apache.calcite.runtime.CalciteContextException: From line 1, column 8 to line 1, column 29: No match found for function signature datetolocaldate(<NUMERIC>)

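Let's look at how the UDF definition is found and validated. Calcite validates SQL using the metadata supplied by CalciteCatalogReader.

Whether the CalciteCatalogReader is case-sensitive is fixed when it is initialized, in its constructor:
[Image: the CalciteCatalogReader constructor]
The value comes from config.caseSensitive. Like the other config methods, such as parserFactory shown here, it reads its setting from the connection properties: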

  @Override public <T> @PolyNull T parserFactory(Class<T> parserFactoryClass,
      @PolyNull T defaultParserFactory) {
    return CalciteConnectionProperty.PARSER_FACTORY.wrap(properties)
        .getPlugin(parserFactoryClass, defaultParserFactory);
  }

The properties above are the parameters passed in when the connection is created, so the relevant setting can be supplied through a Properties object, for example:

    Class.forName("org.apache.calcite.jdbc.Driver");
    Properties properties = new Properties();
    // Tell the connection whether identifier matching is case-sensitive
    properties.setProperty("caseSensitive","false");
    Connection connection =
        DriverManager.getConnection("jdbc:calcite:",properties);
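
Calcite's JDBC driver also accepts properties in the connect string itself, in the form jdbc:calcite:name=value;name2=value2, which can be handy when no Properties object is available. The helper below is a hypothetical, JDK-only sketch of assembling such a string; it assumes the names and values contain no characters that would need escaping (such as ';').

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; not part of Calcite.
class CalciteUrlSketch {
  // Joins the properties as name=value pairs separated by ';'
  // after the jdbc:calcite: prefix.
  static String buildUrl(Map<String, String> properties) {
    StringJoiner joiner = new StringJoiner(";", "jdbc:calcite:", "");
    for (Map.Entry<String, String> entry : properties.entrySet()) {
      joiner.add(entry.getKey() + "=" + entry.getValue());
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    Map<String, String> properties = new LinkedHashMap<>();
    properties.put("caseSensitive", "false");
    System.out.println(buildUrl(properties)); // jdbc:calcite:caseSensitive=false
  }
}
```

The resulting string would be passed to DriverManager.getConnection in place of the bare "jdbc:calcite:" URL.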

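Next, let's look at how the function name is validated.

[Image: the validator's function lookup]
Function validation looks in two places: first Calcite's built-in (static) functions, then the schema metadata (that is, the dynamically registered UDFs). Here we mainly look at the lookupOperatorOverloads implementation in CalciteCatalogReader; the key part is: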

// The schema metadata is fetched mainly through getFunctionsFrom
getFunctionsFrom(opName.names)
        .stream()
        .filter(predicate)
        .map(function -> toOp(opName, function))
        .forEachOrdered(operatorList::add);

The implementation is as follows:

  private Collection<org.apache.calcite.schema.Function> getFunctionsFrom(
      List<String> names) {
    // Collects the matching functions
    final List<org.apache.calcite.schema.Function> functions2 =
        new ArrayList<>();
    final List<List<String>> schemaNameList = new ArrayList<>();
    // Is the function call qualified with a schema name?
    if (names.size() > 1) {
      // Name qualified: ignore path. But we do look in "/catalog" and "/",
      // the last 2 items in the path.
      if (schemaPaths.size() > 1) {
        schemaNameList.addAll(Util.skip(schemaPaths));
      } else {
        schemaNameList.addAll(schemaPaths);
      }
    } else {
      for (List<String> schemaPath : schemaPaths) {
        CalciteSchema schema =
            SqlValidatorUtil.getSchema(rootSchema, schemaPath, nameMatcher);
        if (schema != null) {
          schemaNameList.addAll(schema.getPath());
        }
      }
    }
    // schemaNameList now holds the schemas to search
    for (List<String> schemaNames : schemaNameList) {
      CalciteSchema schema =
          SqlValidatorUtil.getSchema(rootSchema,
              Iterables.concat(schemaNames, Util.skipLast(names)), nameMatcher);
      // Fetch the functions with this name from the schema
      if (schema != null) {
        final String name = Util.last(names);
        boolean caseSensitive = nameMatcher.isCaseSensitive();
        functions2.addAll(schema.getFunctions(name, caseSensitive));
      }
    }
    // If functions2 is empty, no matching function was found and validation fails.
    return functions2;
  }
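
The core idea of the lookup (walk to the schema named by the qualifier, then match the last name part against the functions registered there) can be sketched with plain JDK types. This is a deliberately reduced model, not Calcite's code: SchemaLookupSketch, Schema and LOCAL_ONLY are hypothetical names, only the root schema is used as a starting point (the real method also iterates over the configured schema paths), and matching is always case-sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, reduced model of a qualified function lookup.
class SchemaLookupSketch {
  // A schema node with sub-schemas and the names of functions registered on it.
  static class Schema {
    final Map<String, Schema> subSchemas = new HashMap<>();
    final List<String> functions = new ArrayList<>();

    Schema child(String name) {
      return subSchemas.computeIfAbsent(name, k -> new Schema());
    }
  }

  // Walks from the root along the path; returns null if a step is missing.
  static Schema resolve(Schema root, List<String> path) {
    Schema current = root;
    for (String step : path) {
      current = current.subSchemas.get(step);
      if (current == null) {
        return null;
      }
    }
    return current;
  }

  // names is the possibly qualified function name, e.g. [foodmart, LOCAL_ONLY].
  static List<String> lookup(Schema root, List<String> names) {
    List<String> found = new ArrayList<>();
    // Everything except the last part names the schema to search.
    Schema schema = resolve(root, names.subList(0, names.size() - 1));
    if (schema != null) {
      String name = names.get(names.size() - 1);
      for (String function : schema.functions) {
        if (function.equals(name)) {
          found.add(function);
        }
      }
    }
    return found; // empty means the call would fail validation
  }

  public static void main(String[] args) {
    Schema root = new Schema();
    root.functions.add("DATETOLOCALDATE");
    root.child("foodmart").functions.add("LOCAL_ONLY");
    System.out.println(lookup(root, Arrays.asList("DATETOLOCALDATE")));        // [DATETOLOCALDATE]
    System.out.println(lookup(root, Arrays.asList("foodmart", "LOCAL_ONLY"))); // [LOCAL_ONLY]
    System.out.println(lookup(root, Arrays.asList("LOCAL_ONLY")));             // []
  }
}
```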