一次Elasticsearch的问题分析

    在工作中经历过不少使用es的场景。es作为搜索引擎数据库用作全文搜索是不错的,但是真正用到es的不多,不少场景可能用MySQL、PG甚至Oracle都可以很好的完成,只是缺乏专人的设计和把控。不过我的ES水平也有限,只能处理简单的场景。这里要是有不对的,请指出一下。今天就是分享一个故事。

     es数据库又名全文索引,它的索引其实就是传统数据库上的表。至于es的原理今天不说了。就说问题的分析和定位以及重现。

     现象是CPU高负荷。其实任何一个数据库的CPU高消耗,去查查SQL都能差不多定位。ES也是一样,它的命令虽然不能叫做SQL,但是也可以这样叫。因为现在的ES似乎是从6开始是支持的SQL。

     即可以像下面这样查询:

root@xue-test1-148154:[/root]elasticsearch-sql-cli http://10.60.148.154:9200    (使用支持SQL的工具elasticsearch-sql-cli登录)


.sssssss.` .sssssss.
.:sXXXXXXXXXXo` `ohXXXXXXXXXho.
.yXXXXXXXXXXXXXXo` `oXXXXXXXXXXXXXXX-
.XXXXXXXXXXXXXXXXXXo` `oXXXXXXXXXXXXXXXXXX.
.XXXXXXXXXXXXXXXXXXXXo. .oXXXXXXXXXXXXXXXXXXXXh
.XXXXXXXXXXXXXXXXXXXXXXo``oXXXXXXXXXXXXXXXXXXXXXXy
`yXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.
`oXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXo`
`oXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXo`
`oXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXo`
`oXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXo`
`oXXXXXXXXXXXXXXXXXXXXXXXXXXXXo`
.XXXXXXXXXXXXXXXXXXXXXXXXXo`
.oXXXXXXXXXXXXXXXXXXXXXXXXo`
`oXXXXXXXXXXXXXXXXXXXXXXXXo` `odo`
`oXXXXXXXXXXXXXXXXXXXXXXXXo` `oXXXXXo`
`oXXXXXXXXXXXXXXXXXXXXXXXXo` `oXXXXXXXXXo`
`oXXXXXXXXXXXXXXXXXXXXXXXXo` `oXXXXXXXXXXXXXo`
`yXXXXXXXXXXXXXXXXXXXXXXXo` oXXXXXXXXXXXXXXXXX.
.XXXXXXXXXXXXXXXXXXXXXXo` `oXXXXXXXXXXXXXXXXXXXy
.XXXXXXXXXXXXXXXXXXXXo` /XXXXXXXXXXXXXXXXXXXXX
.XXXXXXXXXXXXXXXXXXo` `oXXXXXXXXXXXXXXXXXX-
-XXXXXXXXXXXXXXXo` `oXXXXXXXXXXXXXXXo`
.oXXXXXXXXXXXo` `oXXXXXXXXXXXo.
`.sshXXyso` SQL `.sshXhss.`


进来以后就像MySQL一样的使用。


sql> show tables;
name | type
----------------------+---------------
website |BASE TABLE
.kibana |BASE TABLE
centos7 |BASE TABLE
ceshi |BASE TABLE
ceshi2 |BASE TABLE
demo |BASE TABLE
fulltext001 |BASE TABLE
gg |BASE TABLE
index |BASE TABLE
logstash-2015.05.18 |BASE TABLE
medcl |BASE TABLE
megacorp |BASE TABLE
my_index |BASE TABLE
mysql_13 |BASE TABLE
mysql_kafka |BASE TABLE
mysql_new |BASE TABLE
name |BASE TABLE
new |BASE TABLE
oo |BASE TABLE
ouyeel |BASE TABLE
people |BASE TABLE
personal_info_100000 |BASE TABLE
personal_info_10000000|BASE TABLE
productindex |BASE TABLE
school_index |BASE TABLE
test |BASE TABLE
testindex-slowlogs |BASE TABLE
tt |BASE TABLE
user_web_info |BASE TABLE
w |BASE TABLE
website |BASE TABLE
wo |BASE TABLE
ww |BASE TABLE
www |BASE TABLE
x2 |BASE TABLE
xuexiaogang |BASE TABLE
xxg |BASE TABLE

sql> desc x2;
column | type
----------------+---------------
@timestamp |TIMESTAMP
@version |VARCHAR
@version.keyword|VARCHAR
host |VARCHAR
host.keyword |VARCHAR
message |VARCHAR
message.keyword |VARCHAR
path |VARCHAR
path.keyword |VARCHAR
type |VARCHAR
type.keyword |VARCHAR


SQL查询


sql> select * from x2 limit 5;
@timestamp | @version | host | message | path | type
------------------------+---------------+----------------+-----------------------------------------------+---------------+---------------
2019-10-11T07:41:00.172Z|1 |xue-test1-148154| 'T30072' |/root/slow.log |system-log
2019-10-11T07:40:51.129Z|1 |xue-test1-148154| d.NET_PRICE_AT AS netPriceAt, |/root/slow.log |system-log
2019-10-11T07:41:11.357Z|1 |xue-test1-148154| |/root/slow.log |system-log
2019-10-11T07:41:05.869Z|1 |xue-test1-148154| FROM t_dd_order_event AS e |/root/slow.log |system-log
2019-10-11T07:40:53.356Z|1 |xue-test1-148154| d.PRODUCT_TYPE_NAME AS productTypeName,|/root/slow.log |system-log

sql> select * from x2 where message like '%T30072%' limit 5;
@timestamp | @version | host | message | path | type
------------------------+---------------+----------------+---------------+---------------+---------------
2019-10-11T07:40:57.648Z|1 |xue-test1-148154| 'T30072' |/root/slow.log |system-log
2019-10-11T07:41:00.172Z|1 |xue-test1-148154| 'T30072' |/root/slow.log |system-log
2019-10-11T07:41:00.180Z|1 |xue-test1-148154| 'T30072' |/root/slow.log |system-log
2019-10-11T07:40:57.547Z|1 |xue-test1-148154| 'T30072' |/root/slow.log |system-log
2019-10-11T07:40:59.134Z|1 |xue-test1-148154| 'T30072' |/root/slow.log |system-log

sql> select * from x2 where message like '%T30072%' order by message desc limit 5;
@timestamp | @version | host | message | path | type
------------------------+---------------+----------------+---------------+---------------+---------------
2019-10-11T07:41:00.172Z|1 |xue-test1-148154| 'T30072' |/root/slow.log |system-log
2019-10-11T07:40:57.547Z|1 |xue-test1-148154| 'T30072' |/root/slow.log |system-log
2019-10-11T07:41:00.180Z|1 |xue-test1-148154| 'T30072' |/root/slow.log |system-log
2019-10-11T07:40:57.648Z|1 |xue-test1-148154| 'T30072' |/root/slow.log |system-log
2019-10-11T07:40:59.134Z|1 |xue-test1-148154| 'T30072' |/root/slow.log |system-log

然后在慢日志中看到

[2023-10-07T15:22:39,449][WARN ][index.search.slowlog.query] [xue-es1] [x2][4] took[80.5ms], took_millis[80], total_hits[0], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"size":5,"timeout":"45000ms","quey":{"query_string":{"query":"*T30072*","fields":["message^1.0"],"type":"best_fields","default_operator":"or","max_determinized_states":10000,"enable_position_increments":true,"fuzziness":"AUTO","fuzzy_prefix_length":0,"fuzzy_max_expnsions":50,"phrase_slop":0,"escape":false,"auto_generate_synonyms_phrase_query":true,"fuzzy_transpositions":true,"boost":1.0}},"_source":{"includes":["@version","host","message","path","type"],"excludes":[]},"docvalue_fields":["@timstamp"],"sort":[{"_doc":{"order":"asc"}}]}],

 这里的size 5是返回5行。limit 5的意思。*T30072*  是前后%的意思。不少开发人员使用es觉得快,其实这样写谁都不会快。只是MySQL一个数据库给了8C  24G。 而一套es数据库3个节点起,CPU和内存给的毫不吝啬。甚至是远大于实际数据的。由于都在内存中且比MySQL多几十倍的CPU(N个节点总和)和内存(N个节点总和)感觉就是比关系型数据库快啊。因为条件不对等。

 下面的的慢日志可以看到 order 的desc  ,因为排序了。所以这个慢日志还是很详细的。
[2023-10-07T15:24:33,285][WARN ][index.search.slowlog.query] [xue-es1] [x2][3] took[62.2ms], took_millis[62], total_hits[2], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"size":5,"timeout":"45000ms","quey":{"query_string":{"query":"*T30072*","fields":["message^1.0"],"type":"best_fields","default_operator":"or","max_determinized_states":10000,"enable_position_increments":true,"fuzziness":"AUTO","fuzzy_prefix_length":0,"fuzzy_max_expnsions":50,"phrase_slop":0,"escape":false,"auto_generate_synonyms_phrase_query":true,"fuzzy_transpositions":true,"boost":1.0}},"_source":{"includes":["@version","host","message","path","type"],"excludes":[]},"docvalue_fields":["@timstamp"],"sort":[{"message.keyword":{"order":"desc"}}]}],

[2023-10-07T15:24:33,299][WARN ][index.search.slowlog.fetch] [xue-es1] [x2][3] took[12.1ms], took_millis[12], total_hits[2], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"size":5,"timeout":"45000ms","quey":{"query_string":{"query":"*T30072*","fields":["message^1.0"],"type":"best_fields","default_operator":"or","max_determinized_states":10000,"enable_position_increments":true,"fuzziness":"AUTO","fuzzy_prefix_length":0,"fuzzy_max_expnsions":50,"phrase_slop":0,"escape":false,"auto_generate_synonyms_phrase_query":true,"fuzzy_transpositions":true,"boost":1.0}},"_source":{"includes":["@version","host","message","path","type"],"excludes":[]},"docvalue_fields":["@timstamp"],"sort":[{"message.keyword":{"order":"desc"}}]}],

而打开慢日志是这样的。

      以上交代完毕,那么就看实际的问题。例如这里被我涂掉的是一模一样的。说明有一个精确查询,还带一个模糊查询。这就是典型的前后%在关系型数据库中禁止的,到这里来用硬件资源抗了。

还有这里看到should了吧?


 

es中,# must:如果有多个条件,这些条件都必须满足  and与

# should:如果有多个条件,满足一个或多个即可 or或
# must_not:和must相反,必须都不满足条件才可以匹配到 !非

说到这里就基本知道问题了。

   那么问题就是大量的或运算。即 OR。

    可能有人问,那么OR了大量难道使用者不能发现吗? 别忘了最开始看到的size  即limit 。即使是薛晓刚  or  男,由于只显示前面,后面7亿都看不到。谁知道错了?再加上排序。那么势必就慢了。

   我们来重现一下。第一句是and 没有or。12毫秒。第二句是or,必然返回数据量多。要81毫秒。  第三句,在第二句基础上,增加排序。结果101毫秒。

      违反关系型数据库规范的,在NoSQL数据库上同样会遭到凡是。如果不信,那么将NoSQL数据库缩容到和关系型数据库一样的配置来看看。一样宕机,因为这个NoSQL的配置已经不低了,但是还是出现了问题。如果用MySQL的配置,早就瘫痪了。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值