ElasticSearch Server (Extending Your Index Structure): Reading Notes

icool_ali

Article link: https://blog.csdn.net/icool_ali/article/details/81534894

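1 Indexing tree-like structures:

Create a simple index structure with the command below.

As you can see, it defines a single type: category. We will use it to store information about each document's position in the tree structure.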

curl -XPUT 'localhost:9200/path' -d '
{
	"settings": {
		"index": {
			"analysis": {
				"analyzer": {
					"path_analyzer": {
						"tokenizer": "path_hierarchy"
					}
				}
			}
		}
	},
	"mappings": {
		"category": {
			"properties": {
				"category": {
					"type": "string",
					"fields": {
						"name": {
							"type": "string",
							"index": "not_analyzed"
						},
						"path": {
							"type": "string",
							"analyzer": "path_analyzer",
							"store": true
						}
					}
				}
			}
		}
	}
}'

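To see how the path analyzer handles a category path, run:

curl -XGET 'localhost:9200/path/_analyze?field=category.path&pretty' -d'/cars/passenger/sport'

Elasticsearch processes the category path /cars/passenger/sport and splits it into three tokens: /cars, /cars/passenger and /cars/passenger/sport. Thanks to this, a term filter can easily find every document that belongs to a given category or to any of its subcategories. For example: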

{
    "filter" : {
        "term" : { "category.path" : "/cars" }
    }
}
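
To make the tokenization concrete, here is a small Python sketch that imitates what a path_hierarchy-style tokenizer emits. It is only an illustration, not Elasticsearch code: the function names path_hierarchy_tokens and matches_term are made up for this note, and the sketch assumes paths that start with the "/" delimiter.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Return the prefix tokens a path_hierarchy-style tokenizer would emit.

    Assumption: the path starts with the delimiter, as in the notes' example.
    """
    tokens = []
    current = ""
    for part in path.split(delimiter):
        if not part:
            # Skip the empty piece produced by the leading delimiter.
            continue
        current = current + delimiter + part
        tokens.append(current)
    return tokens


def matches_term(indexed_path, term):
    """A term filter matches when the term equals one indexed token exactly."""
    return term in path_hierarchy_tokens(indexed_path)


if __name__ == "__main__":
    print(path_hierarchy_tokens("/cars/passenger/sport"))
    # A document filed under /cars/passenger/sport is found by its ancestor /cars,
    # but not by a bare path fragment such as /passenger.
    print(matches_term("/cars/passenger/sport", "/cars"))
    print(matches_term("/cars/passenger/sport", "/passenger"))
```

Because every ancestor path is indexed as its own token, filtering on a parent category needs no wildcard or prefix query.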

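2 Indexing non-flat data:

Data: as the sample document below shows, the data is not flat; it contains arrays and nested objects. If we built the mapping using only what we have learned so far, we would have to flatten the data. However, Elasticsearch allows documents to carry some degree of structure, so we can create a mapping that handles the example as it is.

Objects: the root object of the sample document is book, which has a few additional simple properties such as englishTitle. Now look at the author object: it nests another object, name, which has two properties, firstName and lastName.

Arrays: we have already used array data without discussing it in detail. By default all fields in Lucene are multi-valued, and the same holds in Elasticsearch, which means a field can store several values. To index such fields we use the JSON array type, enclosed in square brackets []. In the example, characters inside book is an array.

Mapping: to index an array, it is enough to declare the properties of its fields under the array's name; no dedicated array type is needed. The sample document for this section is: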

{
	"book": {
		"author": {
			"name": {
				"firstName": "Fyodor",
				"lastName": "Dostoevsky"
			}
		},
		"isbn": "123456789",
		"englishTitle": "Crime and Punishment",
		"year": 1866,
		"characters": [{
				"name": "Raskolnikov"
			},
			{
				"name": "Sofia"
			}
		],
		"copies": 0
	}
}
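
The mapping for characters is not reproduced in these notes, so the following Python sketch is only a guess at what it could look like in the old "string"-typed mapping syntax used elsewhere in this post. The flatten helper is hypothetical too; it mimics how object fields end up as dotted, multi-valued field names, which is why an array needs no special type.

```python
import json

# Assumed mapping fragment: the array is declared like a plain object,
# by listing the properties of its items under the array's name.
characters_mapping = {
    "characters": {
        "properties": {
            "name": {"type": "string", "store": "yes"}
        }
    }
}


def flatten(value, prefix=""):
    """Collect leaf values under dotted field names, merging array items."""
    fields = {}
    if isinstance(value, dict):
        for key, child in value.items():
            name = prefix + "." + key if prefix else key
            for field, values in flatten(child, name).items():
                fields.setdefault(field, []).extend(values)
    elif isinstance(value, list):
        # Array items share the field name of the array itself.
        for item in value:
            for field, values in flatten(item, prefix).items():
                fields.setdefault(field, []).extend(values)
    else:
        fields.setdefault(prefix, []).append(value)
    return fields


book_doc = {
    "book": {
        "author": {"name": {"firstName": "Fyodor", "lastName": "Dostoevsky"}},
        "englishTitle": "Crime and Punishment",
        "characters": [{"name": "Raskolnikov"}, {"name": "Sofia"}],
    }
}

if __name__ == "__main__":
    print(json.dumps(characters_mapping, indent=2))
    print(flatten(book_doc)["book.characters.name"])
```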

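3 Using nested objects
Basically, nested objects let Elasticsearch connect one main document with multiple subordinate documents. The main document and its nested documents are indexed together and placed in the same segment of the index (in fact in the same block), which gives the best performance for this data structure. The same applies to changes: unless you use the update API, you have to index the parent document and all of its nested documents at the same time. A mapping with a nested variation field looks like this: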

{
	"cloth": {
		"properties": {
			"name": {
				"type": "string",
				"index": "analyzed"
			},
			"variation": {
				"type": "nested",
				"properties": {
					"size": {
						"type": "string",
						"index": "not_analyzed"
					},
					"color": {
						"type": "string",
						"index": "not_analyzed"
					}
				}
			}
		}
	}
}
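
Why bother with the nested type at all? The Python sketch below simulates the difference. With plain object arrays, the sizes and colors of all variations end up in two independent pools of values, so a search for "XXL and black" can match a shirt that only exists in red XXL and black XL. The helper names and the sample shirt are illustrative; nested_query shows the general shape of a nested query body and is not taken from the notes.

```python
shirt = {
    "name": "Test shirt",
    "variation": [
        {"size": "XXL", "color": "red"},
        {"size": "XL", "color": "black"},
    ],
}


def matches_flattened(doc, size, color):
    """Object-style arrays: each field becomes one pool of values."""
    sizes = {v["size"] for v in doc["variation"]}
    colors = {v["color"] for v in doc["variation"]}
    return size in sizes and color in colors


def matches_nested(doc, size, color):
    """Nested-style: both conditions must hold inside the same variation."""
    return any(v["size"] == size and v["color"] == color for v in doc["variation"])


# Shape of a nested query against the mapping above (a sketch).
nested_query = {
    "query": {
        "nested": {
            "path": "variation",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"variation.size": "XXL"}},
                        {"term": {"variation.color": "black"}},
                    ]
                }
            },
        }
    }
}

if __name__ == "__main__":
    print(matches_flattened(shirt, "XXL", "black"))  # cross-matches two variations
    print(matches_nested(shirt, "XXL", "black"))     # no single variation fits
```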

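4 Using parent-child relationships
Index structure and data indexing: we stay with the imaginary clothing store. This time, however, we want to update sizes and colors without reindexing the whole document after every change.
Parent mapping: name is the only field we need in the parent document. So, to create the cloth type in the shop index, run the following commands: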

curl -XPOST 'localhost:9200/shop'
curl -XPUT 'localhost:9200/shop/cloth/_mapping' -d '{
    "cloth" : {
        "properties" : {
            "name" : {"type" : "string"}
        }
    }
}'

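Child mapping: to create the child mapping, add the name of the parent type to the _parent property, which is cloth in our example. The command that creates the variation type is therefore: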

curl -XPUT 'localhost:9200/shop/variation/_mapping' -d '{
    "variation" : {
        "_parent" : { "type" : "cloth" },
        "properties" : {
            "size" : {"type" : "string", "index" : "not_analyzed"},
            "color" : {"type" : "string", "index" : "not_analyzed"}
        }
    }
}'

Parent document: now let's index the parent document. This is simple; just run the index command:

curl -XPOST 'localhost:9200/shop/cloth/1' -d '{
"name" : "Test shirt"
}'

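Child documents: to index a child document, we provide information about its parent through the parent parameter, set to the identifier of the parent document. So, to index two children of our parent document, run the following command for the first one: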

curl -XPOST 'localhost:9200/shop/variation/1000?parent=1' -d '{
"color" : "red",
"size" : "XXL"
}'

Likewise, run the following command to index the second child document:

curl -XPOST 'localhost:9200/shop/variation/1001?parent=1' -d '{
"color" : "black",
"size" : "XL"
}'

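This way we have indexed two additional documents. They belong to the new type, and we have specified the document with identifier 1 as their parent.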

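Queries: in addition to the has_child query, Elasticsearch exposes the top_children query, which runs against child documents but returns parent documents. This query can be limited to a specific number of child documents. If you instead want to return child documents whose parent matches the given data, use has_parent, a query similar to has_child. Here, however, we set the parent_type property to the parent document type rather than using the type property. The query then returns the indexed child documents, not the parents.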
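
The notes do not include the query bodies, so here is a hedged Python sketch of what they could look like for the shop index. The helper functions are hypothetical, and the term values are assumptions (the lowercase "test" assumes the name field goes through the default analyzer); only the type versus parent_type distinction comes from the text above.

```python
import json


def has_child_query(child_type, field, value):
    """Match on children, get parents back: note the 'type' property."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": {"term": {field: value}},
            }
        }
    }


def has_parent_query(parent_type, field, value):
    """Match on parents, get children back: note 'parent_type' instead of 'type'."""
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,
                "query": {"term": {field: value}},
            }
        }
    }


if __name__ == "__main__":
    # Parents (cloth) that have at least one XXL variation.
    print(json.dumps(has_child_query("variation", "size", "XXL"), indent=2))
    # Children (variation) whose parent's name contains the token "test".
    print(json.dumps(has_parent_query("cloth", "name", "test"), indent=2))
```

Either body could be sent to a search endpoint such as localhost:9200/shop/_search with curl -d, in the same way as the other commands in this post.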

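Parent-child relationships and filtering: if you want to use parent-child queries as filters, you can use the has_child and has_parent filters, which offer the same functionality as the has_child and has_parent queries. In fact, Elasticsearch wraps those filters in constant score queries so that they can be used as queries.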

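Performance considerations: when you use the parent-child functionality of Elasticsearch, you have to be aware of its performance impact. The first thing to remember is that a parent and its children must be stored in the same shard for the queries to work. If a single parent has a very large number of children, the shards may end up with uneven numbers of documents; one node then performs worse and slows down the whole query. Also keep in mind that parent-child queries are slower than queries against documents without any relationship. When running queries such as has_child, Elasticsearch needs to preload and cache document identifiers. These identifiers are kept in memory, so you must make sure Elasticsearch has enough of it; otherwise you will get an OutOfMemory exception and the node, or even the whole cluster, will stop working. Finally, as mentioned, the first query takes some time to preload and cache the document identifiers. To improve the performance of that first parent-child query, you can use the warmer API.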
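
The same-shard requirement and the skew it can cause are easy to picture with a toy routing model. In the Python sketch below, a child is routed by its parent's identifier, which is what the parent parameter does when indexing. The shard count and the CRC32 hash are stand-ins chosen for this illustration; Elasticsearch uses its own hash function.

```python
import zlib
from collections import Counter

NUM_SHARDS = 5  # assumed shard count, for illustration only


def shard_for(routing, num_shards=NUM_SHARDS):
    """Pick a shard from a routing value (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(routing).encode("utf-8")) % num_shards


def shard_for_doc(doc_id, parent=None):
    """A child is routed by its parent's ID, a parent by its own ID."""
    return shard_for(parent if parent is not None else doc_id)


parent_shard = shard_for_doc("1")

# One parent with 1000 children: every child follows the parent.
child_ids = [str(1000 + i) for i in range(1000)]
child_load = Counter(shard_for_doc(cid, parent="1") for cid in child_ids)

# The same 1000 documents without a parent spread over the shards.
free_load = Counter(shard_for_doc(cid) for cid in child_ids)

if __name__ == "__main__":
    print("parent shard:", parent_shard)
    print("children by shard:", dict(child_load))
    print("unrelated docs by shard:", dict(free_load))
```

All 1000 children pile up on the parent's shard, which is the imbalance the paragraph above warns about.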

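5 Modifying the index structure with the update API
Adding a new field to the mapping: suppose we want to add a phone number for every stored user. To do that, we send an HTTP PUT command to the /index_name/type_name/_mapping REST endpoint with a suitable body that contains the new field. For example, to add the phone field, run the following command: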

curl -XPUT 'http://localhost:9200/users/user/_mapping' -d '
{
	"user": {
		"properties": {
			"phone": {
				"type": "string",
				"store": "yes",
				"index": "not_analyzed"
			}
		}
	}
}'

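Modifying a field: our index structure now contains two fields, name and phone. We have indexed some data, but later decide that we want to search the phone field and would like to change its index property from not_analyzed to analyzed. The obvious attempt is the following command, which Elasticsearch rejects with a mapping merge error, because the index setting of an existing field cannot be changed in place: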

curl -XPUT 'http://localhost:9200/users/user/_mapping' -d '{
 "user" : {
    "properties" : {
        "phone" : {"type" : "string",
                "store" : "yes",
                "index" : "analyzed"}
            }
    }
}'
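
The difference between the two mapping updates in this section can be sketched with a toy merge check in Python. This is a simplified model written for this note, not Elasticsearch's real merge logic: it assumes that new fields are always accepted and that a changed type or index setting on an existing field counts as a conflict, which is how such a change is, to my knowledge, treated in versions that use this mapping syntax.

```python
def merge_mapping(existing, update):
    """Return (merged_properties, conflicts) for a toy mapping merge."""
    merged = {name: dict(props) for name, props in existing.items()}
    conflicts = []
    for name, props in update.items():
        if name not in merged:
            # A field that does not exist yet is simply added.
            merged[name] = dict(props)
            continue
        for key in ("type", "index"):
            old, new = merged[name].get(key), props.get(key)
            if old != new:
                conflicts.append("%s: %s changes from %r to %r" % (name, key, old, new))
    return merged, conflicts


users = {"name": {"type": "string"}}

# Step 1: adding phone is accepted.
users, add_conflicts = merge_mapping(
    users, {"phone": {"type": "string", "store": "yes", "index": "not_analyzed"}}
)

# Step 2: switching phone to analyzed is reported as a conflict.
_, change_conflicts = merge_mapping(
    users, {"phone": {"type": "string", "store": "yes", "index": "analyzed"}}
)

if __name__ == "__main__":
    print(add_conflicts)
    print(change_conflicts)
```

When a change like the second one is really needed, the usual way out is to create a new index with the desired mapping and index the data again.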
