Flink SQL 功能介绍


Flink Version : 1.8.2
Kafka Version : 0.10
Scala Version : 2.12

SQL Job Support

  • 注意事项(请务必仔细阅读此注意事项)
  • 单个CSV测试文件不得超过30行
  • 对于聚合和去重操作,本地CSV调试时的Sink端可以支持,但提交YARN运行时Kafka的Sink端并不支持(KafkaSink属于Append)。
  • 使用CSV文件测试时可使用From Field时间属性(ROW_TIME)和PROC_TIME
  • 如果Source端数据的格式为CSV,请按照顺序定义其字段名及类型。
    • 字段定义:
      fieldNames:userId、itemId、categoryId、behavior、timestemps
      fieldTypes:bigint、bigint、bigint、varchar、bigint
    • 数据样版:
      543462,1715,1464116,pv,1511658000
  • 如Source端数据的格式为JSON,定义字段名时请与JSON数据的属性名相对应(顺序可忽略)。否则无法对应的字段将显示为空或null。
    • 字段定义:
      fieldNames:userId、itemId、categoryId、timestemps、behavior
      fieldTypes:bigint、bigint、bigint、bigint、varchar
    • 数据样版:
      {“userId”:”148725”, “itemId”:”1798692”, “categoryId”:”737184”, “timestemps”:”1511667184”, “behavior”:”pv”]
  • 时间特性为ROW_TIME时source端不能使用"rowtime"命名字段(后台会根据时间戳指端自动生成),同理时间特性为PROC_TIME时不可使用proctime。否则会抛出ValidationException:Duplicate field name rowtime
  • 表名、字段名区分大小写
  • 查询(Select)出来的字段在数量和顺序上需要和Sink端的表相对应
    • 例如:
      SQL:Insert into tableName2 select id, age, name from tableName1;
      Sink端Table注册时的字段顺序:{“id”:”bigint”,”age”:”int”,”name”:”varchar”}
  • 如果KafkaSink端的数据格式为JSON,请使用cast(rowtime as varchar) as rowtime或其它方法进行类型转换,否则进入Sink端的数据依然是bigint型数据。
  • 使用From Source或者使用ProcessingTime(处理时间)后,rowTime字段会出现8小时的时间差,此时可以调用TO_LOCAL_DATA / TO_LOCAL_TIMESTAMP(函数)来处理。
  • 表名的组成包括支持数字、字母及下划线。
  • udtf和聚合窗口搭配时下游没有输出,暂时没有找到正确的写法。
  • 时间字段的源字段必须是Long、Timestamp或String类型。
  • 在时间窗口内使用的时间字段必须是程序自动生成的rowtime或proctime字段。
  • 目前使用ES Sink端时group 的字段需显式指定在select后面,源码中会去匹配。
  • 不支持维表JOIN。

Processing Mode :

  • Streaming

Flink Sources :

  • Kafka
  • Csv(本地测试)

Flink Sinks :

  • Kafka(AppendMode)
  • Elasticsearcha(AppendMode、UpsertMode)

本地调试与提交部署的区别

  • 在时间属性上,本地调试时本地调试时支持PROC_TIME、ROW_TIME,,但不包括KAFKA_TIME(数据进入kafka的时间,属于ROW_TIME范畴)。
  • 本地调试时只支持CSV格式的数据文件作为数据源,KAFKA可以支持CSV及JSON。
  • 本地调试时同时支持了AppendMode和UpsertMode两种模式,所以要注意sql语法属于哪一种模式。即使sql语句在调试时通过,提交部署后也可能会出错。
    • 例如GroupBy、 Having、Distinct、UDAF这几个操作都属于属于UpsertMode。
    • UpsertMode操作在单独使用的情况下,Sink端只能是Elasticsearch(目前),但搭配TUMBLE、HOP、SESSION这些窗口函数使用时可以支持AppendMode。

支持的数据格式

CSV、JSON(不包括本地调试)

受支持的Time Attributes(时间属性)

  • rocessing Time : 处理时间
  • Event Time : 数据本身自带的时间
    • From Field : 数据中自带的时间
    • From Source : 数据进入source端时的时间(目前只支持Kafka)

WaterMark

为rowtime属性设置一个内置的WaterMark,数据在一定的时间间隔内是有可能是无序的。这样会导致部分流数据丢失,可以使用水位来设置数据的最大延迟时间来尽可能的防止数据丢失(单位/毫秒)

窗口函数时间单位

YEAR

MONTH

DAY

HOUR

MINUTE

SECOND

目前支持的数据类型

字符型:varchar、boolean(布尔)、tinyint(字节)

数值型:bigint、int、float、smallint 、decimal

日期型:date、time、timestamp

Field Type Mapping

SQLvarcharbooleantinyintsmallintintbigintfloatdoubledecimaldatetimetimestamp
Javalang.Stringlang.Booleanlang.Bytelang.Shortlang.Integerlang.Longlang.Floatlang.Doublemath.BigDecimalsql.Datesql.Timesql.Timestamp

目前不支持的SQL操作(以官网为参考)

  • 流表查询单独使用Order By时必须按照升序时间属性排序;
  • Rowtime属性不能在常规联接的输入行中(Rowtime attributes must not be in the input rows of a regular join. As a workaround you can cast the time attributes of input tables to TIMESTAMP before.);
-Opertation
OrderBy & LimitLimit
Set OperationsIn、Union、Intersect、Except、Exists(不存在)
AggregationsGrouping Sets
JoinsFull Outer、Right、Left、cross join unnest、Time-windowed Join

Reserved Keywords

虽然还没有实现所有的SQL特性,但是一些字符串组合已经被保留为将来使用的关键字。如果想使用下列字符串中的一个作为字段名,确保使用反引号(例如 value, count)

A, ABS, ABSOLUTE, ACTION, ADA, ADD, ADMIN, AFTER, ALL, ALLOCATE, ALLOW, ALTER, ALWAYS, AND, ANY, ARE, ARRAY, AS, ASC, ASENSITIVE, ASSERTION, ASSIGNMENT, ASYMMETRIC, AT, ATOMIC, ATTRIBUTE, ATTRIBUTES, AUTHORIZATION, AVG, BEFORE, BEGIN, BERNOULLI, BETWEEN, BIGINT, BINARY, BIT, BLOB, BOOLEAN, BOTH, BREADTH, BY, C, CALL, CALLED, CARDINALITY, CASCADE, CASCADED, CASE, CAST, CATALOG, CATALOG_NAME, CEIL, CEILING, CENTURY, CHAIN, CHAR, CHARACTER, CHARACTERISTICS, CHARACTERS, CHARACTER_LENGTH, CHARACTER_SET_CATALOG, CHARACTER_SET_NAME, CHARACTER_SET_SCHEMA, CHAR_LENGTH, CHECK, CLASS_ORIGIN, CLOB, CLOSE, COALESCE, COBOL, COLLATE, COLLATION, COLLATION_CATALOG, COLLATION_NAME, COLLATION_SCHEMA, COLLECT, COLUMN, COLUMN_NAME, COMMAND_FUNCTION, COMMAND_FUNCTION_CODE, COMMIT, COMMITTED, CONDITION, CONDITION_NUMBER, CONNECT, CONNECTION, CONNECTION_NAME, CONSTRAINT, CONSTRAINTS, CONSTRAINT_CATALOG, CONSTRAINT_NAME, CONSTRAINT_SCHEMA, CONSTRUCTOR, CONTAINS, CONTINUE, CONVERT, CORR, CORRESPONDING, COUNT, COVAR_POP, COVAR_SAMP, CREATE, CROSS, CUBE, CUME_DIST, CURRENT, CURRENT_CATALOG, CURRENT_DATE, CURRENT_DEFAULT_TRANSFORM_GROUP, CURRENT_PATH, CURRENT_ROLE, CURRENT_SCHEMA, CURRENT_TIME, CURRENT_TIMESTAMP, CURRENT_TRANSFORM_GROUP_FOR_TYPE, CURRENT_USER, CURSOR, CURSOR_NAME, CYCLE, DATA, DATABASE, DATE, DATETIME_INTERVAL_CODE, DATETIME_INTERVAL_PRECISION, DAY, DEALLOCATE, DEC, DECADE, DECIMAL, DECLARE, DEFAULT, DEFAULTS, DEFERRABLE, DEFERRED, DEFINED, DEFINER, DEGREE, DELETE, DENSE_RANK, DEPTH, DEREF, DERIVED, DESC, DESCRIBE, DESCRIPTION, DESCRIPTOR, DETERMINISTIC, DIAGNOSTICS, DISALLOW, DISCONNECT, DISPATCH, DISTINCT, DOMAIN, DOUBLE, DOW, DOY, DROP, DYNAMIC, DYNAMIC_FUNCTION, DYNAMIC_FUNCTION_CODE, EACH, ELEMENT, ELSE, END, END-EXEC, EPOCH, EQUALS, ESCAPE, EVERY, EXCEPT, EXCEPTION, EXCLUDE, EXCLUDING, EXEC, EXECUTE, EXISTS, EXP, EXPLAIN, EXTEND, EXTERNAL, EXTRACT, FALSE, FETCH, FILTER, FINAL, FIRST, FIRST_VALUE, FLOAT, FLOOR, FOLLOWING, FOR, FOREIGN, FORTRAN, FOUND, FRAC_SECOND, FREE, FROM, FULL, FUNCTION, FUSION, G, GENERAL, GENERATED, GET, GLOBAL, GO, GOTO, GRANT, GRANTED, GROUP, GROUPING, HAVING, HIERARCHY, HOLD, HOUR, IDENTITY, IMMEDIATE, IMPLEMENTATION, IMPORT, IN, INCLUDING, INCREMENT, INDICATOR, INITIALLY, INNER, INOUT, INPUT, INSENSITIVE, INSERT, INSTANCE, INSTANTIABLE, INT, INTEGER, INTERSECT, INTERSECTION, INTERVAL, INTO, INVOKER, IS, ISOLATION, JAVA, JOIN, K, KEY, KEY_MEMBER, KEY_TYPE, LABEL, LANGUAGE, LARGE, LAST, LAST_VALUE, LATERAL, LEADING, LEFT, LENGTH, LEVEL, LIBRARY, LIKE, LIMIT, LN, LOCAL, LOCALTIME, LOCALTIMESTAMP, LOCATOR, LOWER, M, MAP, MATCH, MATCHED, MAX, MAXVALUE, MEMBER, MERGE, MESSAGE_LENGTH, MESSAGE_OCTET_LENGTH, MESSAGE_TEXT, METHOD, MICROSECOND, MILLENNIUM, MIN, MINUTE, MINVALUE, MOD, MODIFIES, MODULE, MONTH, MORE, MULTISET, MUMPS, NAME, NAMES, NATIONAL, NATURAL, NCHAR, NCLOB, NESTING, NEW, NEXT, NO, NONE, NORMALIZE, NORMALIZED, NOT, NULL, NULLABLE, NULLIF, NULLS, NUMBER, NUMERIC, OBJECT, OCTETS, OCTET_LENGTH, OF, OFFSET, OLD, ON, ONLY, OPEN, OPTION, OPTIONS, OR, ORDER, ORDERING, ORDINALITY, OTHERS, OUT, OUTER, OUTPUT, OVER, OVERLAPS, OVERLAY, OVERRIDING, PAD, PARAMETER, PARAMETER_MODE, PARAMETER_NAME, PARAMETER_ORDINAL_POSITION, PARAMETER_SPECIFIC_CATALOG, PARAMETER_SPECIFIC_NAME, PARAMETER_SPECIFIC_SCHEMA, PARTIAL, PARTITION, PASCAL, PASSTHROUGH, PATH, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, PLACING, PLAN, PLI, POSITION, POWER, PRECEDING, PRECISION, PREPARE, PRESERVE, PRIMARY, PRIOR, PRIVILEGES, PROCEDURE, PUBLIC, QUARTER, RANGE, RANK, READ, READS, REAL, RECURSIVE, REF, REFERENCES, REFERENCING, REGR_AVGX, REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE, REGR_SXX, REGR_SXY, REGR_SYY, RELATIVE, RELEASE, REPEATABLE, RESET, RESTART, RESTRICT, RESULT, RETURN, RETURNED_CARDINALITY, RETURNED_LENGTH, RETURNED_OCTET_LENGTH, RETURNED_SQLSTATE, RETURNS, REVOKE, RIGHT, ROLE, ROLLBACK, ROLLUP, ROUTINE, ROUTINE_CATALOG, ROUTINE_NAME, ROUTINE_SCHEMA, ROW, ROWS, ROW_COUNT, ROW_NUMBER, SAVEPOINT, SCALE, SCHEMA, SCHEMA_NAME, SCOPE, SCOPE_CATALOGS, SCOPE_NAME, SCOPE_SCHEMA, SCROLL, SEARCH, SECOND, SECTION, SECURITY, SELECT, SELF, SENSITIVE, SEQUENCE, SERIALIZABLE, SERVER, SERVER_NAME, SESSION, SESSION_USER, SET, SETS, SIMILAR, SIMPLE, SIZE, SMALLINT, SOME, SOURCE, SPACE, SPECIFIC, SPECIFICTYPE, SPECIFIC_NAME, SQL, SQLEXCEPTION, SQLSTATE, SQLWARNING, SQL_TSI_DAY, SQL_TSI_FRAC_SECOND, SQL_TSI_HOUR, SQL_TSI_MICROSECOND, SQL_TSI_MINUTE, SQL_TSI_MONTH, SQL_TSI_QUARTER, SQL_TSI_SECOND, SQL_TSI_WEEK, SQL_TSI_YEAR, SQRT, START, STATE, STATEMENT, STATIC, STDDEV_POP, STDDEV_SAMP, STREAM, STRUCTURE, STYLE, SUBCLASS_ORIGIN, SUBMULTISET, SUBSTITUTE, SUBSTRING, SUM, SYMMETRIC, SYSTEM, SYSTEM_USER, TABLE, TABLESAMPLE, TABLE_NAME, TEMPORARY, THEN, TIES, TIME, TIMESTAMP, TIMESTAMPADD, TIMESTAMPDIFF, TIMEZONE_HOUR, TIMEZONE_MINUTE, TINYINT, TO, TOP_LEVEL_COUNT, TRAILING, TRANSACTION, TRANSACTIONS_ACTIVE, TRANSACTIONS_COMMITTED, TRANSACTIONS_ROLLED_BACK, TRANSFORM, TRANSFORMS, TRANSLATE, TRANSLATION, TREAT, TRIGGER, TRIGGER_CATALOG, TRIGGER_NAME, TRIGGER_SCHEMA, TRIM, TRUE, TYPE, UESCAPE, UNBOUNDED, UNCOMMITTED, UNDER, UNION, UNIQUE, UNKNOWN, UNNAMED, UNNEST, UPDATE, UPPER, UPSERT, USAGE, USER, USER_DEFINED_TYPE_CATALOG, USER_DEFINED_TYPE_CODE, USER_DEFINED_TYPE_NAME, USER_DEFINED_TYPE_SCHEMA, USING, VALUE, VALUES, VARBINARY, VARCHAR, VARYING, VAR_POP, VAR_SAMP, VERSION, VIEW, WEEK, WHEN, WHENEVER, WHERE, WIDTH_BUCKET, WINDOW, WITH, WITHIN, WITHOUT, WORK, WRAPPER, WRITE, XML, YEAR, ZONE
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值