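This plugin handles document (attachment) parsing in ES. If you are interested in ES, feel free to get in touch and compare notes. (This post assumes you already have the IK analyzer installed for ES.)
Installation: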
elasticsearch-plugin install ingest-attachment
Introduction: a text-extraction plugin for Elasticsearch built on top of Apache Tika, the Apache text extraction library. Before ES 5 the equivalent plugin was mapper-attachments.
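Configuration options:
field : the field that holds the attachment content (the content must be base64-encoded; base64 is an encoding, not encryption)
target_field: the field that will hold the extracted attachment information (author, date, type, and so on); defaults to attachment
indexed_chars : the maximum number of characters extracted from the file, 100000 by default. Set it to -1 for no limit (note that with -1, a very large upload can exhaust memory, and the file may then not be processed completely)
indexed_chars_field: a field in the document whose value overrides indexed_chars, so the limit can be set per document, for example according to file size.
properties: which attachment properties to store. Possible values: content, title, name, author, keywords, date, content_type, content_length, language
ignore_missing: false by default. If set to true and the field named by field does not exist, the attachment is not parsed and the document is still indexed unchanged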
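The options above map directly onto the JSON body of an attachment processor. As a minimal sketch in Python (the helper name `build_attachment_processor` and its validation step are illustrative assumptions, not part of the plugin), the configuration could be assembled and sanity-checked like this:

```python
import json

# Property names the attachment processor is documented to accept
# in its "properties" option.
ALLOWED_PROPERTIES = {
    "content", "title", "name", "author", "keywords",
    "date", "content_type", "content_length", "language",
}


def build_attachment_processor(field, target_field="attachment",
                               indexed_chars=100000, properties=None,
                               ignore_missing=False):
    """Return the dict for one attachment processor (hypothetical helper)."""
    processor = {
        "field": field,
        "target_field": target_field,
        "indexed_chars": indexed_chars,
        "ignore_missing": ignore_missing,
    }
    if properties is not None:
        # Reject misspelled property names before they reach the cluster.
        unknown = set(properties) - ALLOWED_PROPERTIES
        if unknown:
            raise ValueError(f"unknown attachment properties: {sorted(unknown)}")
        processor["properties"] = list(properties)
    return {"attachment": processor}


pipeline_body = {
    "description": "example attachment pipeline",
    "processors": [
        build_attachment_processor(
            "data",
            indexed_chars=-1,
            properties=["content", "title", "content_type"],
            ignore_missing=True,
        )
    ],
}
print(json.dumps(pipeline_body, indent=2))
```

The printed JSON can be used as the body of a `PUT _ingest/pipeline/<name>` request. Checking property names locally is only a convenience; the cluster remains the authority on what it accepts.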
HTTP API usage:
1. Single-file upload
Create the pipeline
PUT _ingest/pipeline/simple_attachment
{
  "description": "lhy-single file pipeline",
  "processors": [
    {
      "attachment": {
        "field": "data",
        "indexed_chars": -1,
        "properties": ["content", "title", "content_type"],
        "ignore_missing": true
      }
    }
  ]
}
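The same request can also be issued from a script. The sketch below uses only Python's standard library and assumes an unsecured node at `http://localhost:9200` (an assumption; adjust the URL and add authentication for your own cluster). `build_put_request` and `send` are hypothetical helpers, and nothing is sent until `send` is called.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unsecured node


def build_put_request(path, body):
    """Build (but do not send) a JSON PUT request for Elasticsearch."""
    return urllib.request.Request(
        url=f"{ES_URL}/{path.lstrip('/')}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send a prepared request and return the decoded response body."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


pipeline = {
    "description": "lhy-single file pipeline",
    "processors": [{
        "attachment": {
            "field": "data",
            "indexed_chars": -1,
            "properties": ["content", "title", "content_type"],
            "ignore_missing": True,
        }
    }],
}

request = build_put_request("_ingest/pipeline/simple_attachment", pipeline)
print(request.get_method(), request.full_url)
```

Calling `send(request)` requires a running cluster with ingest-attachment installed. Separating request construction from sending makes it easy to inspect the request first.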
Create the book index
PUT book/
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "info": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "title": {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "filename": {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "data": {
          "type": "keyword",
          "store": true
        },
        "attachment": {
          "properties": {
            "content": {
              "type": "text",
              "analyzer": "ik_max_word"
            },
            "content_type": {
              "type": "text",
              "analyzer": "ik_max_word"
            },
            "title": {
              "type": "text",
              "analyzer": "ik_max_word"
            }
          }
        }
      }
    }
  }
}
Import data
PUT book/info/1?pipeline=simple_attachment
{
"id": "book001",
"title": "这是一个测试文本",
"filename": "测试附件",
"data": "5paH5pys77yIVGV4dO+8ie+8jOaYr+S5pumdouivreiogOeahOihqOeOsOW9ouW8j++8m+iuoeeul+acuueahOS4gOenjeaWh+aho+exu+Wei++8m+aMh+S7u+S9leaWh+Wtl+adkOaWmeOAguaWh+acrO+8jOaYr+aMh+S5pumdouivreiogOeahOihqOeOsOW9ouW8j++8jOS7juaWh+WtpueahOinkuW6puivtO+8jOmAmuW4uOaYr+WFt+acieWujOaVtOOAgeezu+e7n+WQq+S5ie+8iE1lc3NhZ2XvvInnmoTkuIDkuKrlj6XlrZDmiJblpJrkuKrlj6XlrZDnmoTnu4TlkIjjgILkuIDkuKrmlofmnKzlj6/ku6XmmK/kuIDkuKrlj6XlrZDvvIhTZW50ZW5jZe+8ieOAgeS4gOS4quauteiQve+8iFBhcmFncmFwaO+8ieaIluiAheS4gOS4quevh+eroO+8iERpc2NvdXJzZe+8ieOAgg=="
}
PUT book/info/2?pipeline=simple_attachment
{
"id": "book002",
"title": "这是一个测试文本2",
"filename": "测试附件2",
"data": "SGFkb29w5a6e546w5LqG5LiA5Liq5YiG5biD5byP5paH5Lu257O757uf77yISGFkb29wIERpc3RyaWJ1dGVkIEZpbGUgU3lzdGVt77yJ77yM566A56ewSERGU+OAgkhERlPmnInpq5jlrrnplJnmgKfnmoTnibnngrnvvIzlubbkuJTorr7orqHnlKjmnaXpg6jnvbLlnKjkvY7lu4nnmoTvvIhsb3ctY29zdO+8ieehrOS7tuS4iu+8m+iAjOS4lOWug+aPkOS+m+mrmOWQnuWQkOmHj++8iGhpZ2ggdGhyb3VnaHB1dO+8ieadpeiuv+mXruW6lOeUqOeoi+W6j+eahOaVsOaNru+8jOmAguWQiOmCo+S6m+acieedgOi2heWkp+aVsOaNrumbhu+8iGxhcmdlIGRhdGEgc2V077yJ55qE5bqU55So56iL5bqP44CCSERGU+aUvuWuveS6hu+8iHJlbGF477yJUE9TSVjnmoTopoHmsYLvvIzlj6/ku6Xku6XmtYHnmoTlvaLlvI/orr/pl67vvIhzdHJlYW1pbmcgYWNjZXNz77yJ5paH5Lu257O757uf5Lit55qE5pWw5o2u44CC"
}
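The `data` values above are simply the bytes of the attachment, base64-encoded. A minimal sketch of producing such a document body in Python follows. `build_book_document` is a hypothetical helper. For real files (PDF, Word, and so on) one would pass the raw bytes read from disk rather than encoded text.

```python
import base64
import json


def build_book_document(doc_id, title, filename, raw_bytes):
    """Return a document body whose "data" field is base64 of raw_bytes."""
    return {
        "id": doc_id,
        "title": title,
        "filename": filename,
        # Base64 is an encoding, not encryption; anyone can decode it.
        "data": base64.b64encode(raw_bytes).decode("ascii"),
    }


doc = build_book_document(
    "book001", "这是一个测试文本", "测试附件", "文本".encode("utf-8")
)
print(json.dumps(doc, ensure_ascii=False))

# Decoding the field recovers the original bytes.
decoded = base64.b64decode(doc["data"]).decode("utf-8")
print(decoded)
```

Here `doc["data"]` is `5paH5pys`, which is also the first eight characters of the `data` value in document 1: that attachment is plain text beginning with "文本".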
Query data:
1. Query all
GET book/_search
{
"query": {
"match_all": {}
}
}
Result: