Elasticsearch在Basic授权中支持以SQL语句的形式检索文档,SQL语句在执行时会被翻译为DSL执行。从语法的角度来看,Elastisearch中的SQL语句与RDBMS中的SQL语句基本一致,所以对于有数据库编程基础的人来说大大降低了使用 Elaticsearch的学习成本。
Elasticsearch提供了多种执行SQL语句的方法,可使用类似_search样的REST接口执行也可以通过命令行执行。它甚至还提供了JDBC和ODBC驱动来执行SQL语句,但JDBC和ODBC属于Platinum(白金版)授权需要付费,所以这里只介绍_sql接口。
sql接口
在早期版本中,Elasticsearch执行SQL的REST接口为_xpack/sql,但在版本7以后这个接口已经被废止而推荐使用_sql接口。
例如:
POST _sql?format=txt
{
"query": """
select DestCountry, OriginCountry,AvgTicketPrice
from kibana_sample_data_flights
where Carrier = 'Kibana Airlines' order by AvgTicketPrice desc limit 3
"""
}
输出结果:
DestCountry | OriginCountry | AvgTicketPrice
---------------+---------------+------------------
US |PR |1199.109130859375
JP |IN |1196.7706298828125
AR |CO |1195.72509765625
在示例中,_sql接口通过query参数接收SQL语句,而SQL语句也包含有select、from、where、order by、limit等子句。_sql接口的URL请求参数format定义了返回结果格式。比如在示例中定义了返回结果格式为txt。除了txt以外,_sqI接口还支持csv、json、tsv、yaml等等格式。示例中的请求会将所有航空公同为Kibana Airines的航班文档检索出来,并以文本表格的形式返回。
对于总量比较大的SOL查询,sql接口还支持以游标的形式实现分页。当_sql接口的请求参数中添加了fetch_size参数,_sql接口在返回结来时就会根据fetch_size参数设置的大小返回相应的条数,并在返回结果中添加游标标识。具体来说,当请求_sql接口时设置的forma为json时,返回结果中会包含cursor属性;而其他情况下则会在响应中添加Cursor报头。
例如还是执行示例中的SOL ,但是加入分页支持:
POST _sql?format=json
{
"query": """
select DestCountry, OriginCountry,AvgTicketPrice
from kibana_sample_data_flights
where Carrier = 'Kibana Airlines' order by AvgTicketPrice desc
""",
"fetch_size": 3
}
输出结果:
{
"columns" : [
{
"name" : "DestCountry",
"type" : "keyword"
},
{
"name" : "OriginCountry",
"type" : "keyword"
},
{
"name" : "AvgTicketPrice",
"type" : "float"
}
],
"rows" : [
[
"US",
"PR",
1199.109130859375
],
[
"JP",
"IN",
1196.7706298828125
],
[
"AR",
"CO",
1195.72509765625
]
],
"cursor" : "49itAwFaAXNARFhGMVpYSjVRVzVrUm1WMFkyZ0JBQUFBQUFBUl9xNFdlbFYzTWxWaGFXaFRlV1Z6UlhONGFtNXhOVzlWVVE9Pf8PAwFmC0Rlc3RDb3VudHJ5AQtEZXN0Q291bnRyeQEHa2V5d29yZAEAAAFmDU9yaWdpbkNvdW50cnkBDU9yaWdpbkNvdW50cnkBB2tleXdvcmQBAAABZg5BdmdUaWNrZXRQcmljZQEOQXZnVGlja2V0UHJpY2UBBWZsb2F0AAAAAQc="
}
在示例的请求中,将format设置为json,这样在返回结果中就能直接看到cursor值,当需要请求下一页的数据时,只需传递cursor值即可进行查询:
POST _sql?format=json
{
"cursor": "49itAwFaAXNARFhGMVpYSjVRVzVrUm1WMFkyZ0JBQUFBQUFBU0JQNFdlbFYzTWxWaGFXaFRlV1Z6UlhONGFtNXhOVzlWVVE9Pf8PAwFmC0Rlc3RDb3VudHJ5AQtEZXN0Q291bnRyeQEHa2V5d29yZAEAAAFmDU9yaWdpbkNvdW50cnkBDU9yaWdpbkNvdW50cnkBB2tleXdvcmQBAAABZg5BdmdUaWNrZXRQcmljZQEOQXZnVGlja2V0UHJpY2UBBWZsb2F0AAAAAQc="
}
输出结果:
{
"rows" : [
[
"IT",
"CA",
1195.3363037109375
],
[
"KR",
"NL",
1194.945556640625
],
[
"JP",
"CO",
1194.386962890625
]
],
"cursor" : "49itAwFaAXNARFhGMVpYSjVRVzVrUm1WMFkyZ0JBQUFBQUFBU0JQNFdlbFYzTWxWaGFXaFRlV1Z6UlhONGFtNXhOVzlWVVE9Pf8PAwFmC0Rlc3RDb3VudHJ5AQtEZXN0Q291bnRyeQEHa2V5d29yZAEAAAFmDU9yaWdpbkNvdW50cnkBDU9yaWdpbkNvdW50cnkBB2tleXdvcmQBAAABZg5BdmdUaWNrZXRQcmljZQEOQXZnVGlja2V0UHJpY2UBBWZsb2F0AAAAAQc="
}
反复执行请求,Elasticsearch就会将第一次请求的全部内容以每次10个的数量全部迭代出来。在请求完所有数据后,应该使用_sql/close接口将游标关闭以释放资源。
POST _sql/close?format=json
{
"cursor": "49itAwFaAXNARFhGMVpYSjVRVzVrUm1WMFkyZ0JBQUFBQUFBU0JQNFdlbFYzTWxWaGFXaFRlV1Z6UlhONGFtNXhOVzlWVVE9Pf8PAwFmC0Rlc3RDb3VudHJ5AQtEZXN0Q291bnRyeQEHa2V5d29yZAEAAAFmDU9yaWdpbkNvdW50cnkBDU9yaWdpbkNvdW50cnkBB2tleXdvcmQBAAABZg5BdmdUaWNrZXRQcmljZQEOQXZnVGlja2V0UHJpY2UBBWZsb2F0AAAAAQc="
}
除了fetch_size以外还有些可以在_sql接口请求体中使用的参数,如下:
- query:需要执行的SQL语句,必须要设置的参数
- fetch_size:默认1000,每次返回的行数
- filter:默认none,使用DSL设置过滤器
- request_timeout:默认90s,请求超时时间
- page_timeout:默认45s,分页超时时间
- tume_zone:默认Z,时区
- field_multi_value_leniency:默认false,如果一个字段返回多个值时是否忽略
在这些参数中,fiter可以使用DSL对文档做过滤,支持DSL中介绍的所有查询条件。query中的SQL语句在翻译为DSL后,会与filter中的DSL查询语句共同组合到bool查询中。其中SQL语句生成的DSL将出现在must子句,而filter中的DSL则出现在filter子句中。来想要查看SQL语句翻译后的DSL可以使用_sql/translate执行相同的请求,在返回结果中就可以看到翻译后的DSL了。
post _sql/translate
{
"query": """
select DestCountry, OriginCountry,AvgTicketPrice
from kibana_sample_data_flights
where Carrier = 'Kibana Airlines' order by AvgTicketPrice desc
""",
"fetch_size": 3
}
输出结果:
{
"size" : 3,
"query" : {
"term" : {
"Carrier" : {
"value" : "Kibana Airlines",
"boost" : 1.0
}
}
},
"_source" : {
"includes" : [
"AvgTicketPrice"
],
"excludes" : [ ]
},
"docvalue_fields" : [
{
"field" : "DestCountry"
},
{
"field" : "OriginCountry"
}
],
"sort" : [
{
"AvgTicketPrice" : {
"order" : "desc",
"missing" : "_first",
"unmapped_type" : "float"
}
}
]
}
SQL语法
Elasticsearch支持传统关系型数据库SQL语句中的查询语句,但并不支持DML、DCL句。换句话说,它只支持SELECT语句,不支持INSERT、UPDATE、DELETE语句。SELECT语句以外,Elaticsarch还支持DESCRIBE和SHOW语句。
SELECT语句
SELECT语句用于查询文档,基本语法格式如下:
SELECT select_expr,
[ FROM table_name」
[WHERE condtion]
[ GROUP BY grouping_element]
[ HAVING condition J
[ ORDER BY expression [ASC|DESC]
[ LIMIT[ count]]
通过示例可以看出,Elastiesearch的SELECT语句跟普通SQL几乎没有什么区别,支持SELECT、FROM、WHERE、 GROUP BY、HAVING、ORDER BY及LIMTT子句。
SELECT子句中可以使用星号或文档字段名称列表,FROM子句则指定要检索的索引名称,而WHERE子句则设定了检索的条件。一般的SQL查询使用这三个子句就足够了,而GROUP BY和HAVING子句则用于分组,ORDER BY子句用于排
序,而LIMIT一般则可以用于分页。和传统SQL语句非常接近。
DESCRIBE语句
DESCRIBE语句用于查看一个索引的基础信息,在返回结果中一般会包含column、type、mapping三个列,分别对应文档的字段名称、传统数据库类型及文档字段中的类型。
例如要查看索引的基本信息:
POST _sql?format=txt
{
"query": "describe kibana_sample_data_flights"
}
输出结果:
column | type | mapping
------------------+---------------+---------------
AvgTicketPrice |REAL |float
Cancelled |BOOLEAN |boolean
Carrier |VARCHAR |keyword
Dest |VARCHAR |keyword
DestAirportID |VARCHAR |keyword
DestCityName |VARCHAR |keyword
DestCountry |VARCHAR |keyword
DestLocation |GEOMETRY |geo_point
DestRegion |VARCHAR |keyword
DestWeather |VARCHAR |keyword
DistanceKilometers|REAL |float
DistanceMiles |REAL |float
FlightDelay |BOOLEAN |boolean
FlightDelayMin |INTEGER |integer
FlightDelayType |VARCHAR |keyword
FlightNum |VARCHAR |keyword
FlightTimeHour |VARCHAR |keyword
FlightTimeMin |REAL |float
Origin |VARCHAR |keyword
OriginAirportID |VARCHAR |keyword
OriginCityName |VARCHAR |keyword
OriginCountry |VARCHAR |keyword
OriginLocation |GEOMETRY |geo_point
OriginRegion |VARCHAR |keyword
OriginWeather |VARCHAR |keyword
dayOfWeek |INTEGER |integer
timestamp |TIMESTAMP |datetime
SHOW语句
SHOW语句包括三种形式,即SHOW COLUMNS、SHOW FUNCTIONS和SHOW TABLES。
SHOW COLUMNS用于查看一个索引中的字段情况,它的作用与DESCRIBE语句完全一样,甚至连返回结果都是一样的。
POST _sql?format=txt
{
"query": "show columns in kibana_sample_data_flights"
}
输出结果:
column | type | mapping
------------------+---------------+---------------
AvgTicketPrice |REAL |float
Cancelled |BOOLEAN |boolean
Carrier |VARCHAR |keyword
Dest |VARCHAR |keyword
DestAirportID |VARCHAR |keyword
DestCityName |VARCHAR |keyword
DestCountry |VARCHAR |keyword
DestLocation |GEOMETRY |geo_point
DestRegion |VARCHAR |keyword
DestWeather |VARCHAR |keyword
DistanceKilometers|REAL |float
DistanceMiles |REAL |float
FlightDelay |BOOLEAN |boolean
FlightDelayMin |INTEGER |integer
FlightDelayType |VARCHAR |keyword
FlightNum |VARCHAR |keyword
FlightTimeHour |VARCHAR |keyword
FlightTimeMin |REAL |float
Origin |VARCHAR |keyword
OriginAirportID |VARCHAR |keyword
OriginCityName |VARCHAR |keyword
OriginCountry |VARCHAR |keyword
OriginLocation |GEOMETRY |geo_point
OriginRegion |VARCHAR |keyword
OriginWeather |VARCHAR |keyword
dayOfWeek |INTEGER |integer
timestamp |TIMESTAMP |datetime
SHOW FUNCTIONS用于返回在Elastiesearch SQL中支持的所有函数,返回结果中包括MIN、MAX、COUNT等常用的聚集函数。
POST _sql?format=txt
{
"query": "show functions"
}
输出结果:
name | type
-----------------+---------------
AVG |AGGREGATE
COUNT |AGGREGATE
FIRST |AGGREGATE
FIRST_VALUE |AGGREGATE
LAST |AGGREGATE
LAST_VALUE |AGGREGATE
MAX |AGGREGATE
MIN |AGGREGATE
SUM |AGGREGATE
KURTOSIS |AGGREGATE
MAD |AGGREGATE
PERCENTILE |AGGREGATE
PERCENTILE_RANK |AGGREGATE
SKEWNESS |AGGREGATE
STDDEV_POP |AGGREGATE
SUM_OF_SQUARES |AGGREGATE
VAR_POP |AGGREGATE
HISTOGRAM |GROUPING
CASE |CONDITIONAL
COALESCE |CONDITIONAL
GREATEST |CONDITIONAL
IFNULL |CONDITIONAL
IIF |CONDITIONAL
ISNULL |CONDITIONAL
LEAST |CONDITIONAL
NULLIF |CONDITIONAL
NVL |CONDITIONAL
CURDATE |SCALAR
CURRENT_DATE |SCALAR
CURRENT_TIME |SCALAR
CURRENT_TIMESTAMP|SCALAR
CURTIME |SCALAR
DAY |SCALAR
DAYNAME |SCALAR
DAYOFMONTH |SCALAR
DAYOFWEEK |SCALAR
DAYOFYEAR |SCALAR
DAY_NAME |SCALAR
DAY_OF_MONTH |SCALAR
DAY_OF_WEEK |SCALAR
DAY_OF_YEAR |SCALAR
DOM |SCALAR
DOW |SCALAR
DOY |SCALAR
HOUR |SCALAR
HOUR_OF_DAY |SCALAR
IDOW |SCALAR
ISODAYOFWEEK |SCALAR
ISODOW |SCALAR
ISOWEEK |SCALAR
ISOWEEKOFYEAR |SCALAR
ISO_DAY_OF_WEEK |SCALAR
ISO_WEEK_OF_YEAR |SCALAR
IW |SCALAR
IWOY |SCALAR
MINUTE |SCALAR
MINUTE_OF_DAY |SCALAR
MINUTE_OF_HOUR |SCALAR
MONTH |SCALAR
MONTHNAME |SCALAR
MONTH_NAME |SCALAR
MONTH_OF_YEAR |SCALAR
NOW |SCALAR
QUARTER |SCALAR
SECOND |SCALAR
SECOND_OF_MINUTE |SCALAR
TODAY |SCALAR
WEEK |SCALAR
WEEK_OF_YEAR |SCALAR
YEAR |SCALAR
ABS |SCALAR
ACOS |SCALAR
ASIN |SCALAR
ATAN |SCALAR
ATAN2 |SCALAR
CBRT |SCALAR
CEIL |SCALAR
CEILING |SCALAR
COS |SCALAR
COSH |SCALAR
COT |SCALAR
DEGREES |SCALAR
E |SCALAR
EXP |SCALAR
EXPM1 |SCALAR
FLOOR |SCALAR
LOG |SCALAR
LOG10 |SCALAR
MOD |SCALAR
PI |SCALAR
POWER |SCALAR
RADIANS |SCALAR
RAND |SCALAR
RANDOM |SCALAR
ROUND |SCALAR
SIGN |SCALAR
SIGNUM |SCALAR
SIN |SCALAR
SINH |SCALAR
SQRT |SCALAR
TAN |SCALAR
TRUNCATE |SCALAR
ASCII |SCALAR
BIT_LENGTH |SCALAR
CHAR |SCALAR
CHARACTER_LENGTH |SCALAR
CHAR_LENGTH |SCALAR
CONCAT |SCALAR
INSERT |SCALAR
LCASE |SCALAR
LEFT |SCALAR
LENGTH |SCALAR
LOCATE |SCALAR
LTRIM |SCALAR
OCTET_LENGTH |SCALAR
POSITION |SCALAR
REPEAT |SCALAR
REPLACE |SCALAR
RIGHT |SCALAR
RTRIM |SCALAR
SPACE |SCALAR
SUBSTRING |SCALAR
UCASE |SCALAR
CAST |SCALAR
CONVERT |SCALAR
DATABASE |SCALAR
USER |SCALAR
ST_ASTEXT |SCALAR
ST_ASWKT |SCALAR
ST_DISTANCE |SCALAR
ST_GEOMETRYTYPE |SCALAR
ST_GEOMFROMTEXT |SCALAR
ST_WKTTOSQL |SCALAR
ST_X |SCALAR
ST_Y |SCALAR
ST_Z |SCALAR
SCORE |SCORE
最后,SHOW TABLES用看Elaticecearch中所有的索引。
POST _sql?format=txt
{
"query": "show tables"
}
输出结果:
name | type | kind
--------------------------+---------------+---------------
.apm-agent-configuration |BASE TABLE |INDEX
.kibana |VIEW |ALIAS
.kibana_1 |BASE TABLE |INDEX
.kibana_task_manager |VIEW |ALIAS
.kibana_task_manager_1 |BASE TABLE |INDEX
articles |BASE TABLE |INDEX
colleges |BASE TABLE |INDEX
employees |BASE TABLE |INDEX
employees1 |BASE TABLE |INDEX
kibana_sample_data_flights|BASE TABLE |INDEX
kibana_sample_data_logs |BASE TABLE |INDEX
这三种形式都支持使用LIKE子句过滤返回结果,LIKE子句在用法上与SQL语句中的LIKE类似。
例如,show tables like 'a%'
将只返回以a开头的索引。
POST _sql?format=txt
{
"query": "show tables like 'a%'"
}
输出结果:
name | type | kind
---------------+---------------+---------------
articles |BASE TABLE |INDEX
操作符与函数
Elasticsearch SQL中支持的操作符与函数有100多种,这些操作符大多与普通SQL语言一致,所以这里只介绍一些与普通 SQL语句不一样的地方。
先来看一下比较操作符。一般等于比较在SQL中使用等号“=”,这在ElasticsearchSQL中也成立。但是Elasticseareh SQL 还引人了另一个等号比较“< = >”,这种等号可以在左值为null时不出现异常。
LIKE操作符,在LIKE子句中可以使用%代表任意多个字符,而使用下划线_代表单个字符。Elasticsearch SQL不仅支持 LIKE子句,还支持通过RLIKE子句以正则表达式的形式做匹配,这大大扩展了SQL语句模糊匹配的能力。
尽管使用LIKE和RLIKE可以实现模糊匹配,但它离全文检索还差得很远。SQL语句的WHERE子句一般都是使用字段整体值做比较,而没有使用词项做匹配的能力。为此Elasticsearch SQL提供了MATCH和QUERY两个函数,以实现在SQL做全文检索。
例如下面的两个请求分别使用match和query函数,它们的作用都是检索DestCounty字段为CN的文档:
POST _sql?format=txt
{
"query": """
select DestCountry, OriginCountry,AvgTicketPrice,score()
from kibana_sample_data_flights
where match(DestCountry,'CN') limit 3
"""
}
输出结果:
DestCountry | OriginCountry | AvgTicketPrice | SCORE()
---------------+---------------+-----------------+---------------
CN |MX |730.041748046875 |2.4774308
CN |MX |922.4990844726562|2.4774308
CN |RU |277.4297180175781|2.4774308
POST _sql?format=txt
{
"query": """
select DestCountry, OriginCountry,AvgTicketPrice,score()
from kibana_sample_data_flights
where query('DestCountry:CN') limit 3
"""
}
输出结果:
DestCountry | OriginCountry | AvgTicketPrice | SCORE()
---------------+---------------+-----------------+---------------
CN |MX |730.041748046875 |2.4774308
CN |MX |922.4990844726562|2.4774308
CN |RU |277.4297180175781|2.4774308
在示例中的两个请求的selet子句中都使用了SCORE函数,它的作用是获取检索的相关度评分值。
Elasticsearch SQL支持传统SQL中的聚集函数,这包括MAX、MIN、AVG、COUNT、SUM等。同时,它还支持一些 Elasticsearch特有的聚集函数,这些聚集函数与Elasticsearch聚集查询相对应。这包括FIRST/FIRST_ VALUE和 LAST/LAST_VALUE,可用于查看某个字段首个和最后一个非空值;PERCENTILE和PERCENTILE RANK用于百分位聚集,KURTOSIS、SKEWNESS、STDDEV_ POP、SUM_OF SQUARES和VAR_ POP可用于运算其他统计聚集。除了以上这些函数和操作符,Elasticsearch SQL还定义了一组用于日期、数值以及字符串运算的函数。