1. Global analysis
2. Index-level analysis
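A minimal sketch of how the two kinds of analysis differ at the request level. It assumes that "global" means calling the `_analyze` endpoint without an index and "index-level" means calling it under a specific index. The helper `build_analyze_request` and the index name `my_index` are hypothetical; the code only builds the request and sends nothing.

```python
import json


def build_analyze_request(text, analyzer="standard", index=None):
    """Return (method, path, body) for an _analyze call. Nothing is sent."""
    # Without an index the request goes to /_analyze (global analysis);
    # with an index it goes to /<index>/_analyze (index-level analysis).
    path = f"/{index}/_analyze" if index else "/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text})
    return "POST", path, body


SAMPLE = "My name is Peter Parker"
print(build_analyze_request(SAMPLE))
print(build_analyze_request(SAMPLE, analyzer="simple", index="my_index"))
```

The returned tuple can be passed to any HTTP client of your choice.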
3. standard: uppercase letters are converted to lowercase in the analysis result.
Result returned:
{
"tokens": [
{
"token": "my",
"start_offset": 0,
"end_offset": 2,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "name",
"start_offset": 3,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "is",
"start_offset": 8,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "peter",
"start_offset": 11,
"end_offset": 16,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "parker",
"start_offset": 17,
"end_offset": 23,
"type": "<ALPHANUM>",
"position": 4
},
{
"token": "i",
"start_offset": 24,
"end_offset": 25,
"type": "<ALPHANUM>",
"position": 5
},
{
"token": "am",
"start_offset": 26,
"end_offset": 28,
"type": "<ALPHANUM>",
"position": 6
},
{
"token": "a",
"start_offset": 29,
"end_offset": 30,
"type": "<ALPHANUM>",
"position": 7
},
{
"token": "super",
"start_offset": 31,
"end_offset": 36,
"type": "<ALPHANUM>",
"position": 8
},
{
"token": "hero.i",
"start_offset": 37,
"end_offset": 43,
"type": "<ALPHANUM>",
"position": 9
},
{
"token": "don't",
"start_offset": 44,
"end_offset": 49,
"type": "<ALPHANUM>",
"position": 10
},
{
"token": "like",
"start_offset": 50,
"end_offset": 54,
"type": "<ALPHANUM>",
"position": 11
},
{
"token": "the",
"start_offset": 55,
"end_offset": 58,
"type": "<ALPHANUM>",
"position": 12
},
{
"token": "criminals",
"start_offset": 59,
"end_offset": 68,
"type": "<ALPHANUM>",
"position": 13
}
]
}
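The `start_offset` and `end_offset` fields can be read as character positions in the original text. The sketch below checks this, assuming plain ASCII input and using only the first four tokens from the result above. The helper name `check_offsets` and the shortened sample text are illustrative assumptions.

```python
import json


def check_offsets(text, response_json):
    """Return tokens whose offsets do not slice back to the lowercased token."""
    mismatches = []
    for tok in json.loads(response_json)["tokens"]:
        piece = text[tok["start_offset"]:tok["end_offset"]]
        # standard lowercases, so compare against the lowercased slice.
        if piece.lower() != tok["token"]:
            mismatches.append(tok["token"])
    return mismatches


TEXT = "My name is Peter"
RESPONSE = """
{"tokens": [
  {"token": "my", "start_offset": 0, "end_offset": 2, "position": 0},
  {"token": "name", "start_offset": 3, "end_offset": 7, "position": 1},
  {"token": "is", "start_offset": 8, "end_offset": 10, "position": 2},
  {"token": "peter", "start_offset": 11, "end_offset": 16, "position": 3}
]}
"""
print(check_offsets(TEXT, RESPONSE))  # prints []
```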
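4. simple: splits on every non-letter character and also converts uppercase to lowercase.
don't is split into don and t.
If digits are added to a word, they are also removed from the result, because simple splits on every non-letter character.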
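The behaviour described for simple can be approximated in a few lines. This is a rough ASCII-only sketch, not the analyzer's actual implementation. The function name `simple_analyze` and the sample input with digits are assumptions for illustration.

```python
import re


def simple_analyze(text):
    """Approximate the simple analyzer: keep runs of ASCII letters, lowercased."""
    # Anything that is not a letter (apostrophes, digits, punctuation, spaces)
    # acts as a separator and is discarded.
    return [run.lower() for run in re.findall(r"[A-Za-z]+", text)]


print(simple_analyze("I don't like the Criminals2024."))
# ['i', 'don', 't', 'like', 'the', 'criminals']
```

Because digits are separators here, a word such as `abc123def` yields two tokens, `abc` and `def`.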
5. whitespace: splits on whitespace only, so Parker,I is treated as a single token. Uppercase letters are not converted to lowercase.
Analysis result:
{
"tokens": [
{
"token": "My",
"start_offset": 0,
"end_offset": 2,
"type": "word",
"position": 0
},
{
"token": "name",
"start_offset": 3,
"end_offset": 7,
"type": "word",
"position": 1
},
{
"token": "is",
"start_offset": 8,
"end_offset": 10,
"type": "word",
"position": 2
},
{
"token": "Peter",
"start_offset": 11,
"end_offset": 16,
"type": "word",
"position": 3
},
{
"token": "Parker,I",
"start_offset": 17,
"end_offset": 25,
"type": "word",
"position": 4
},
{
"token": "am",
"start_offset": 26,
"end_offset": 28,
"type": "word",
"position": 5
},
{
"token": "a",
"start_offset": 29,
"end_offset": 30,
"type": "word",
"position": 6
},
{
"token": "Super",
"start_offset": 31,
"end_offset": 36,
"type": "word",
"position": 7
},
{
"token": "Hero.",
"start_offset": 37,
"end_offset": 42,
"type": "word",
"position": 8
},
{
"token": "I",
"start_offset": 43,
"end_offset": 44,
"type": "word",
"position": 9
},
{
"token": "don't",
"start_offset": 45,
"end_offset": 50,
"type": "word",
"position": 10
},
{
"token": "like",
"start_offset": 51,
"end_offset": 55,
"type": "word",
"position": 11
},
{
"token": "the",
"start_offset": 56,
"end_offset": 59,
"type": "word",
"position": 12
},
{
"token": "Criminals.",
"start_offset": 60,
"end_offset": 70,
"type": "word",
"position": 13
}
]
}
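The result above can be reproduced with a short sketch that splits on whitespace and records offsets. The input sentence is reconstructed from the tokens and offsets in the result, so treat it as an assumption. `whitespace_analyze` is a hypothetical helper, not part of any library.

```python
import re


def whitespace_analyze(text):
    """Approximate the whitespace analyzer: one token per run of non-space characters."""
    return [
        {
            "token": m.group(),          # case and punctuation are left untouched
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": i,
        }
        for i, m in enumerate(re.finditer(r"\S+", text))
    ]


TEXT = "My name is Peter Parker,I am a Super Hero. I don't like the Criminals."
for tok in whitespace_analyze(TEXT):
    print(tok)
```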
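6. stop: tokenizes like simple (splits on non-letters and lowercases) and also removes stop words such as the, a, and is, which carry little meaning.
Result returned: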
{
"tokens": [
{
"token": "my",
"start_offset": 0,
"end_offset": 2,
"type": "word",
"position": 0
},
{
"token": "name",
"start_offset": 3,
"end_offset": 7,
"type": "word",
"position": 1
},
{
"token": "peter",
"start_offset": 11,
"end_offset": 16,
"type": "word",
"position": 3
},
{
"token": "parker",
"start_offset": 17,
"end_offset": 23,
"type": "word",
"position": 4
},
{
"token": "i",
"start_offset": 24,
"end_offset": 25,
"type": "word",
"position": 5
},
{
"token": "am",
"start_offset": 26,
"end_offset": 28,
"type": "word",
"position": 6
},
{
"token": "super",
"start_offset": 31,
"end_offset": 36,
"type": "word",
"position": 8
},
{
"token": "hero",
"start_offset": 37,
"end_offset": 41,
"type": "word",
"position": 9
},
{
"token": "i",
"start_offset": 43,
"end_offset": 44,
"type": "word",
"position": 10
},
{
"token": "don",
"start_offset": 45,
"end_offset": 48,
"type": "word",
"position": 11
},
{
"token": "t",
"start_offset": 49,
"end_offset": 50,
"type": "word",
"position": 12
},
{
"token": "like",
"start_offset": 51,
"end_offset": 55,
"type": "word",
"position": 13
},
{
"token": "criminals",
"start_offset": 60,
"end_offset": 69,
"type": "word",
"position": 15
}
]
}
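A rough sketch of the same idea: tokenize as simple does, then drop stop words while keeping the original position numbers, which is why positions 2, 7, and 14 are missing from the result above. The stop-word set contains only the three words named in these notes and is a deliberately small, hypothetical subset. The input sentence is reconstructed from the offsets, and `stop_analyze` is an illustrative name.

```python
import re

# Only the words mentioned in the notes; the real default list is longer.
STOP_WORDS = {"the", "a", "is"}


def stop_analyze(text):
    """Approximate the stop analyzer: letter runs, lowercased, stop words removed."""
    tokens = []
    for position, m in enumerate(re.finditer(r"[A-Za-z]+", text)):
        term = m.group().lower()
        if term in STOP_WORDS:
            continue  # the position counter still advances, leaving a gap
        tokens.append({
            "token": term,
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": position,
        })
    return tokens


TEXT = "My name is Peter Parker,I am a Super Hero. I don't like the Criminals."
print([(t["token"], t["position"]) for t in stop_analyze(TEXT)])
```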
7. keyword: performs no tokenization. The entire text is emitted as one single keyword.
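As a sketch, the keyword behaviour amounts to returning the input unchanged as one token that spans the whole text. `keyword_analyze` is a hypothetical helper, and the offset fields simply mirror the shape of the results shown earlier.

```python
def keyword_analyze(text):
    """Approximate the keyword analyzer: the whole input becomes a single token."""
    return [{
        "token": text,               # no splitting, no lowercasing
        "start_offset": 0,
        "end_offset": len(text),
        "position": 0,
    }]


print(keyword_analyze("My name is Peter Parker"))
```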