I. _termvectors
1. View the tokenization result of a single field in a document
GET /{index}/{type}/{_id}/_termvectors?fields=[field]
2. Example:
Value of text: https://www.b4d99.com/html/202204/45672.html
GET http://IP:PORT/textcontent_2022/textcontent/20220422191235893045256250/_termvectors?fields=text
Result:
"terms": {
  "202204": {
    "term_freq": 1,
    "tokens": [
      {
        "position": 4,
        "start_offset": 27,
        "end_offset": 33
      }
    ]
  },
  "45672": {
    "term_freq": 1,
    "tokens": [
      {
        "position": 5,
        "start_offset": 34,
        "end_offset": 39
      }
    ]
  },
  "com": {
    "term_freq": 1,
    "tokens": [
      {
        "position": 2,
        "start_offset": 18,
        "end_offset": 21
      }
    ]
  },
  "html": {
    "term_freq": 2,
    "tokens": [
      {
        "position": 3,
        "start_offset": 22,
        "end_offset": 26
      },
      {
        "position": 6,
        "start_offset": 40,
        "end_offset": 44
      }
    ]
  },
  "https": {
    "term_freq": 1,
    "tokens": [
      {
        "position": 0,
        "start_offset": 0,
        "end_offset": 5
      }
    ]
  },
  "www.b4d99": {
    "term_freq": 1,
    "tokens": [
      {
        "position": 1,
        "start_offset": 8,
        "end_offset": 17
      }
    ]
  }
}
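The "terms" object is keyed by term, not by position, so the original token order is not obvious at a glance. The following is a minimal Python sketch that flattens it back into position order, assuming the response has already been parsed into a dict. The function name `tokens_in_order` and the variable `sample_terms` are hypothetical, introduced only for illustration.

```python
def tokens_in_order(terms):
    """Flatten a _termvectors "terms" mapping into (position, term, start, end) tuples."""
    flat = []
    for term, info in terms.items():
        for tok in info["tokens"]:
            flat.append((tok["position"], term, tok["start_offset"], tok["end_offset"]))
    # Tuples sort by their first element, so this restores the original token order.
    return sorted(flat)


# The "terms" object from the result above, written compactly.
sample_terms = {
    "202204": {"term_freq": 1, "tokens": [{"position": 4, "start_offset": 27, "end_offset": 33}]},
    "45672": {"term_freq": 1, "tokens": [{"position": 5, "start_offset": 34, "end_offset": 39}]},
    "com": {"term_freq": 1, "tokens": [{"position": 2, "start_offset": 18, "end_offset": 21}]},
    "html": {"term_freq": 2, "tokens": [
        {"position": 3, "start_offset": 22, "end_offset": 26},
        {"position": 6, "start_offset": 40, "end_offset": 44},
    ]},
    "https": {"term_freq": 1, "tokens": [{"position": 0, "start_offset": 0, "end_offset": 5}]},
    "www.b4d99": {"term_freq": 1, "tokens": [{"position": 1, "start_offset": 8, "end_offset": 17}]},
}

for position, term, start, end in tokens_in_order(sample_terms):
    print(position, term, start, end)
```

A term with "term_freq" greater than 1, such as "html" here, contributes one tuple per occurrence.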
II. _analyze
1. Syntax
POST _analyze
{
  "analyzer": "<analyzer name>",
  "text": "<text to analyze>"
}
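The same request can also be issued from a script. Below is a rough Python sketch using only the standard library. The helper names `build_analyze_request` and `analyze` are hypothetical, and the address `http://localhost:9200` is an assumption, not something taken from these notes; substitute the real host and port.

```python
import json
import urllib.request


def build_analyze_request(base_url, analyzer, text):
    """Build (but do not send) a POST request for the _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(base_url, analyzer, text):
    """Send the request and return the parsed JSON response."""
    request = build_analyze_request(base_url, analyzer, text)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Assumed address; replace with the real host and port of the cluster.
req = build_analyze_request("http://localhost:9200", "standard", "text to analyze")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it possible to inspect the URL and body without a running cluster.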
2. Example:
Value of text: https://www.b4d99.com/html/202204/45672.html
POST _analyze
{
  "analyzer": "standard",
  "text": "https://www.b4d99.com/html/202204/45672.html"
}
Result:
{
  "tokens": [
    {
      "token": "https",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "www.b4d99",
      "start_offset": 8,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "com",
      "start_offset": 18,
      "end_offset": 21,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "html",
      "start_offset": 22,
      "end_offset": 26,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "202204",
      "start_offset": 27,
      "end_offset": 33,
      "type": "<NUM>",
      "position": 4
    },
    {
      "token": "45672",
      "start_offset": 34,
      "end_offset": 39,
      "type": "<NUM>",
      "position": 5
    },
    {
      "token": "html",
      "start_offset": 40,
      "end_offset": 44,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
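In the result above, "start_offset" and "end_offset" are character positions in the original text, with the end offset exclusive. A small Python sketch can confirm this by slicing the text with each pair of offsets. The function name `slice_by_offsets` is hypothetical, and the token list is the result above reduced to the fields the check needs.

```python
def slice_by_offsets(text, tokens):
    """Return (token, text[start:end]) pairs for each token of an _analyze result."""
    return [(t["token"], text[t["start_offset"]:t["end_offset"]]) for t in tokens]


text = "https://www.b4d99.com/html/202204/45672.html"
tokens = [
    {"token": "https", "start_offset": 0, "end_offset": 5},
    {"token": "www.b4d99", "start_offset": 8, "end_offset": 17},
    {"token": "com", "start_offset": 18, "end_offset": 21},
    {"token": "html", "start_offset": 22, "end_offset": 26},
    {"token": "202204", "start_offset": 27, "end_offset": 33},
    {"token": "45672", "start_offset": 34, "end_offset": 39},
    {"token": "html", "start_offset": 40, "end_offset": 44},
]

pairs = slice_by_offsets(text, tokens)
for token, piece in pairs:
    # The standard analyzer lowercases tokens, so compare against the lowercased slice.
    print(token, piece, token == piece.lower())
```

The characters not covered by any offset range ("://", "/", and the dots before "com" and the final "html") are the separators the standard analyzer dropped, while the dot inside "www.b4d99" was kept as part of a single token.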