62.修改分词器及手动创建分词器

最新推荐文章于 2022-09-07 14:22:17 发布

anlanmo0960

最新推荐文章于 2022-09-07 14:22:17 发布

阅读量207

点赞数

文章标签：数据库

原文链接：http://www.cnblogs.com/liuqianli/p/8475474.html

版权

主要知识点

修改分词器
手动创建分词器

一、修改分词器

1、默认的分词器standard，主要有以下四个功能

standard tokenizer：以单词边界进行切分
standard token filter：什么都不做
lowercase token filter：将所有字母转换为小写
stop token filer（默认被禁用）：移除停用词，比如a the it等等

2、修改分词器的设置

启用english的停用词token filter

PUT /my_index

{

"settings": {

"analysis": {

"analyzer": {

"es_std": {

"type": "standard",

"stopwords": "_english_"

}

}

}

}

}

测试修改后的分词器

GET /my_index/_analyze

{

"analyzer": "standard",

"text": "a dog is in the house"

}

GET /my_index/_analyze

{

"analyzer": "es_std",

"text":"a dog is in the house"

}

二、定制化自己的分词器

PUT /my_index

{

"settings": {

"analysis": {

"char_filter": {

"&_to_and": {

"type": "mapping",

"mappings": ["&=> and"]

}

},

"filter": {

"my_stopwords": {

"type": "stop",

"stopwords": ["the", "a"]

}

},

"analyzer": {

"my_analyzer": {

"type": "custom",

"char_filter": ["html_strip", "&_to_and"],

"tokenizer": "standard",

"filter": ["lowercase", "my_stopwords"]

}

}

}

}

}

测试手动创建的分词器

GET /my_index/_analyze

{

"text": "tom&jerry are a friend in the house, <a>, HAHA!!",

"analyzer": "my_analyzer"

}

PUT /my_index/_mapping/my_type

{

"properties": {

"content": {

"type": "text",

"analyzer": "my_analyzer"

}

}

}

转载于:https://www.cnblogs.com/liuqianli/p/8475474.html

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
62.修改分词器及手动创建分词器

主要知识点修改分词器手动创建分词器一、修改分词器 1、默认的分词器standard，主要有以下四个功能 standard tokenizer：以单词边界进行切分 standard token filter：什么都不做 lowercase token filter：将所有字母转换为小写 stop token file...
复制链接

扫一扫

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。