HTML Strip Char Filter

The html_strip character filter strips HTML elements from the text and replaces HTML entities with their decoded value (e.g. replacing &amp; with &).

Example output

POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": [ "html_strip" ],
  "text": "<p>I&apos;m so <b>happy</b>!</p>"
}

The keyword tokenizer returns a single term.

The above example returns the term:

[ \nI'm so happy!\n ]

The same example with the standard tokenizer would return the following terms:

[ I'm, so, happy ]
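
For comparison, the standard-tokenizer variant described above could be requested as in the sketch below. It assumes the sample text is the HTML string `<p>I&apos;m so <b>happy</b>!</p>`; the only change from the keyword example is the tokenizer value.

```console
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [ "html_strip" ],
  "text": "<p>I&apos;m so <b>happy</b>!</p>"
}
```

The character filter runs before the tokenizer, so the tokenizer only ever sees the stripped and decoded text.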

Configuration

The html_strip character filter accepts the following parameter:

escaped_tags

An array of HTML tags which should not be stripped from the original text.

Example configuration

In this example, we configure the html_strip character filter to leave <b> tags in place:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "char_filter": ["my_char_filter"]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "html_strip",
          "escaped_tags": ["b"]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "<p>I&apos;m so <b>happy</b>!</p>"
}

The above example produces the following term:

[ \nI'm so <b>happy</b>!\n ]
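
To try the escaped_tags parameter without creating an index first, the filter can likely be defined inline in the _analyze request. The following is a sketch under two assumptions: that your Elasticsearch version accepts custom char filter definitions in the request body, and that the sample text is the HTML string `<p>I&apos;m so <b>happy</b>!</p>`.

```console
POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "html_strip",
      "escaped_tags": ["b"]
    }
  ],
  "text": "<p>I&apos;m so <b>happy</b>!</p>"
}
```

This form is convenient for experimenting with different tag lists before committing a char filter to the index settings.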
