用 StarRocks on ES 实现 分词

目录

部署ES

下载

Config

系统参数

启动

Verify

常规用法

创建Index

_cat

查看settings

删除index

Bulk导入数据

Search

SR 外表

测试1:分词

ES

SR


部署ES

sr@cs02:~/app/elasticsearch-7.16.2$grep -v ^# config/elasticsearch.yml
node.name: node-2
network.host: 172.26.194.185
cluster.initial_master_nodes: ["node-2"]
  • 系统参数

vm.max_map_count = 655360
sr@cs02:~/app/elasticsearch-7.16.2$sudo vim /etc/sysctl.conf
sr@cs02:~/app/elasticsearch-7.16.2$sudo sysctl -p
vm.swappiness = 0
kernel.sysrq = 1
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
vm.max_map_count = 655360
  • 启动

sr@cs02:~/app/elasticsearch-7.16.2$bin/elasticsearch -d
  • Verify

sr@cs02:~/app/elasticsearch-7.16.2$sudo netstat -lnpt |grep 9[2-3]00
tcp6       0      0 172.26.194.185:9200     :::*                    LISTEN      10442/java
tcp6       0      0 172.26.194.185:9300     :::*                    LISTEN      10442/java

常规用法

  • 创建Index

sr@cs02:~$curl -sH "Content-Type: application/json" -XPUT "cs02:9200/test"    | python -m json.tool
{
    "acknowledged": true,
    "index": "test",
    "shards_acknowledged": true
}
  • _cat

sr@cs02:~$curl -s  "cs02:9200/_cat/indices?v"
health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   test  poJIPAAJQpW4hq8zXGAg3Q   1   1          0            0       226b           226b
  • 查看settings

sr@cs02:~$curl -sH "Content-Type: application/json" -XGET "cs02:9200/test/_settings/" | python -m json.tool
{
    "test": {
        "settings": {
            "index": {
                "creation_date": "1641910831087",
                "number_of_replicas": "1",
                "number_of_shards": "1",
                "provided_name": "test",
                "routing": {
                    "allocation": {
                        "include": {
                            "_tier_preference": "data_content"
                        }
                    }
                },
                "uuid": "JQKoH7sKRmi34chGt1n1jg",
                "version": {
                    "created": "7160299"
                }
            }
        }
    }
}
  • 删除index

sr@cs02:~$curl -s -XDELETE "cs02:9200/test" |  python -m json.tool
{
    "acknowledged": true
}
  • Bulk导入数据

  • Note: 每行要换行

curl -XPOST "http://cs02:9200/_bulk" -H 'Content-Type: application/json' -d'
{"index":{"_index":"test"}}
{ "k1" : 1, "k2": "2022-01-01", "k3": "Trying out Elasticsearch", "k4": "Trying out Elasticsearch", "k5": 10.0}
{"index":{"_index":"test"}}
{ "k1" : 2, "k2": "2022-01-02", "k3": "Trying out StarRocks", "k4": "Trying out StarRocks", "k5": 20.0}
{"index":{"_index":"test"}}
{ "k1" : 3, "k2": "2022-01-03", "k3": "StarRocks On ES", "k4": "StarRocks On ES", "k5": 30.0}
{"index":{"_index":"test"}}
{ "k1" : 4, "k2": "2022-01-04", "k3": "StarRocks", "k4": "StarRocks", "k5": 40.0}
{"index":{"_index":"test"}}
{ "k1" : 5, "k2": "2022-01-05", "k3": "ES", "k4": "ES", "k5": 50.0}
'
sr@cs02:~$curl -s -XGET cs02:9200/test/_search?pretty
{
  "took" : 667,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "uW6rSX4B8p-MWLQevhhA",
        "_score" : 1.0,
        "_source" : {
          "k1" : 1,
          "k2" : "2022-01-01",
          "k3" : "Trying out Elasticsearch",
          "k4" : "Trying out Elasticsearch",
          "k5" : 10.0
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "um6rSX4B8p-MWLQevhhA",
        "_score" : 1.0,
        "_source" : {
          "k1" : 2,
          "k2" : "2022-01-02",
          "k3" : "Trying out StarRocks",
          "k4" : "Trying out StarRocks",
          "k5" : 20.0
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "u26rSX4B8p-MWLQevhhA",
        "_score" : 1.0,
        "_source" : {
          "k1" : 3,
          "k2" : "2022-01-03",
          "k3" : "StarRocks On ES",
          "k4" : "StarRocks On ES",
          "k5" : 30.0
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "vG6rSX4B8p-MWLQevhhA",
        "_score" : 1.0,
        "_source" : {
          "k1" : 4,
          "k2" : "2022-01-04",
          "k3" : "StarRocks",
          "k4" : "StarRocks",
          "k5" : 40.0
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "vW6rSX4B8p-MWLQevhhA",
        "_score" : 1.0,
        "_source" : {
          "k1" : 5,
          "k2" : "2022-01-05",
          "k3" : "ES",
          "k4" : "ES",
          "k5" : 50.0
        }
      }
    ]
  }
}

SR 外表

  • 测试1:分词

  • ES

  • Create index
curl -sH "Content-Type: application/json" -XPUT "cs02:9200/test"  -d'
{
  "mappings": {

      "properties": {
        "k1": {
          "type": "long"
        },
        "k2": {
          "type": "date"
        },
        "k3": {
          "type": "keyword"
        },
        "k4": {
          "analyzer": "standard",
          "type": "text"
        },
        "k5": {
          "type": "float"
        }

    }
  },
  "settings": {
    "index": {
      "number_of_replicas": "0",
      "number_of_shards": "1"
    }
  }
}
'
  • 数据导入ES
curl -XPOST "http://cs02:9200/_bulk" -H 'Content-Type: application/json' -d'
{"index":{"_index":"test"}}
{ "k1" : 1, "k2": "2022-01-01", "k3": "Trying out Elasticsearch", "k4": "Trying out Elasticsearch", "k5": 10.0}
{"index":{"_index":"test"}}
{ "k1" : 2, "k2": "2022-01-02", "k3": "Trying out StarRocks", "k4": "Trying out StarRocks", "k5": 20.0}
{"index":{"_index":"test"}}
{ "k1" : 3, "k2": "2022-01-03", "k3": "StarRocks On ES", "k4": "StarRocks On ES", "k5": 30.0}
{"index":{"_index":"test"}}
{ "k1" : 4, "k2": "2022-01-04", "k3": "StarRocks", "k4": "StarRocks", "k5": 40.0}
{"index":{"_index":"test"}}
{ "k1" : 5, "k2": "2022-01-05", "k3": "ES", "k4": "ES", "k5": 50.0}
'
  • SR

mysql -uroot -hcs01 -P 9013
USE simon;

-- 对ES中字符串类型分词类型(text) fields 进行探测
CREATE EXTERNAL TABLE `soe_t1` (
 `k1` bigint(20) NULL COMMENT "",
 `k2` datetime NULL COMMENT "",
 `k3` varchar(20) NULL COMMENT "",
 `k4` varchar(100) NULL COMMENT "",
 `k5` float NULL COMMENT ""
) ENGINE=ELASTICSEARCH
COMMENT "ELASTICSEARCH"
PROPERTIES (
"hosts" = "cs02:9200",
"index" = "test",
"type" = "_doc",
"transport" = "http",
"enable_docvalue_scan" = "true",
"max_docvalue_fields" = "20",
"enable_keyword_sniff" = "true"
);
  • 需要稍等几秒,同步后再查询
MySQL [simon]> select * from soe_t1;
ERROR 1064 (HY000): EsTable metadata has not been synced, Try it later
MySQL [simon]> desc soe_t1;
+-------+--------------+------+-------+---------+-------+
| Field | Type         | Null | Key   | Default | Extra |
+-------+--------------+------+-------+---------+-------+
| k1    | BIGINT       | Yes  | true  | NULL    |       |
| k2    | DATETIME     | Yes  | true  | NULL    |       |
| k3    | VARCHAR(20)  | Yes  | true  | NULL    |       |
| k4    | VARCHAR(100) | Yes  | false | NULL    | NONE  |
| k5    | FLOAT        | Yes  | false | NULL    | NONE  |
+-------+--------------+------+-------+---------+-------+
5 rows in set (0.00 sec)

mysql> select * from soe_t1;
+------+---------------------+--------------------------+--------------------------+------+
| k1   | k2                  | k3                       | k4                       | k5   |
+------+---------------------+--------------------------+--------------------------+------+
|    1 | 2022-01-01 00:00:00 | Trying out Elasticsearch | Trying out Elasticsearch |   10 |
|    2 | 2022-01-02 00:00:00 | Trying out StarRocks     | Trying out StarRocks     |   20 |
|    3 | 2022-01-03 00:00:00 | StarRocks On ES          | StarRocks On ES          |   30 |
|    4 | 2022-01-04 00:00:00 | StarRocks                | StarRocks                |   40 |
|    5 | 2022-01-05 00:00:00 | ES                       | ES                       |   50 |
+------+---------------------+--------------------------+--------------------------+------+
5 rows in set (0.01 sec)

mysql> select * from soe_t1 where k5 > 30;
+------+---------------------+-----------+-----------+------+
| k1   | k2                  | k3        | k4        | k5   |
+------+---------------------+-----------+-----------+------+
|    4 | 2022-01-04 00:00:00 | StarRocks | StarRocks |   40 |
|    5 | 2022-01-05 00:00:00 | ES        | ES        |   50 |
+------+---------------------+-----------+-----------+------+
2 rows in set (0.01 sec)

-- 非分词列,精确匹配
mysql> select * from soe_t1 where k3 = 'ES';
+------+---------------------+------+------+------+
| k1   | k2                  | k3   | k4   | k5   |
+------+---------------------+------+------+------+
|    5 | 2022-01-05 00:00:00 | ES   | ES   |   50 |
+------+---------------------+------+------+------+
1 row in set (0.38 sec)

-- 分词类型(text),按小写分词
mysql> select * from soe_t1 where k4 = 'es';
+------+---------------------+-----------------+-----------------+------+
| k1   | k2                  | k3              | k4              | k5   |
+------+---------------------+-----------------+-----------------+------+
|    3 | 2022-01-03 00:00:00 | StarRocks On ES | StarRocks On ES |   30 |
|    5 | 2022-01-05 00:00:00 | ES              | ES              |   50 |
+------+---------------------+-----------------+-----------------+------+
2 rows in set (0.01 sec)

mysql> select * from soe_t1 where k4 = 'starrocks';
+------+---------------------+----------------------+----------------------+------+
| k1   | k2                  | k3                   | k4                   | k5   |
+------+---------------------+----------------------+----------------------+------+
|    2 | 2022-01-02 00:00:00 | Trying out StarRocks | Trying out StarRocks |   20 |
|    3 | 2022-01-03 00:00:00 | StarRocks On ES      | StarRocks On ES      |   30 |
|    4 | 2022-01-04 00:00:00 | StarRocks            | StarRocks            |   40 |
+------+---------------------+----------------------+----------------------+------+
3 rows in set (0.01 sec)

-- 标准分词器,按小写分词
mysql> select * from soe_t1 where k4 = 'ES';
Empty set (0.01 sec)

-- esquery
mysql> select * from soe_t1 where esquery(k4, '{
        "match": {
           "k4": "es"
        }
    }');
+------+---------------------+-----------------+-----------------+------+
| k1   | k2                  | k3              | k4              | k5   |
+------+---------------------+-----------------+-----------------+------+
|    3 | 2022-01-03 00:00:00 | StarRocks On ES | StarRocks On ES |   30 |
|    5 | 2022-01-05 00:00:00 | ES              | ES              |   50 |
+------+---------------------+-----------------+-----------------+------+
2 rows in set (0.01 sec)

下一篇:

玩转StarRocks on ES-2-全文检索

StarRocks招聘:

招解决方案,DBA,数据库研发,测试,前后端开发等岗位,

有意者请投递简历到 hr@starrocks.com

  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值