索引
索引如同字典的目录一样,是用来加速查询的。具有正确索引的查询会比没有索引的查询快几个数量级,当随着数据量级变大时,愈发明显。
索引简介
首先我们准备一个集合author_test_collection
,集合中的文档主要包括了,以下几个字段:name
和age
,其中name
的类型是字符串,而age
是整型。
记住,任何文档都会包含一个
_id
字段
集合中包含了100万个元素。
> db.author_test_collection.count();
1000000
查找name
为Author-3000
的文档,通过explain
命令,可以输出执行计划。
> db.author_test_collection.find({"name":"Author-3000"}).explain("executionStats")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.author_test_collection",
"indexFilterSet" : false,
"parsedQuery" : {
"name" : {
"$eq" : "Author-3000"
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"name" : {
"$eq" : "Author-3000"
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 652,
"totalKeysExamined" : 0,
"totalDocsExamined" : 1000000,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"name" : {
"$eq" : "Author-3000"
}
},
"nReturned" : 1,
"executionTimeMillisEstimate" : 570,
"works" : 1000002,
"advanced" : 1,
"needTime" : 1000000,
"needYield" : 0,
"saveState" : 7812,
"restoreState" : 7812,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 1000000
}
},
"serverInfo" : {
"host" : "c40e51fb2caa",
"port" : 27017,
"version" : "3.6.9",
"gitVersion" : "167861a164723168adfaaa866f310cb94010428f"
},
"ok" : 1
}
我们先不用关注执行计划中输出的内容细节,看几个主要的数据指标。
问题 | 分析 |
---|---|
查询走全表扫描 | stage的描述是COLLSCAN ,这个表示全表扫描 |
执行耗时比较长 | executionTimeMillis 执行耗时在652毫秒,这个已经很长了 |
工作开销很大 | works 是MongoDB将操作分解为更细力度的操作单元,这里要耗费1000002个 |
通过使用ensureIndex
可以创建集合的索引,这里根据name
属性创建对应的索引:
> db.author_test_collection.ensureIndex({"name": 1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
ensureIndex
会在指定的key上建立索引,同时第二个参数表示该索引的方向,其中1
表示升序,而-1
表示降序。对于大多数查询场景来说,区别不大,因为对于索引的执行总是从中间开始,但是升序会保证最开始的数据会在内存中,这是一个细微的差别。
MongoDB的索引和MySQL索引类似,都是左匹配。
使用getIndexes
命令,可以查看当前集合上的索引情况。
> db.author_test_collection.getIndexes()
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.author_test_collection"
},
{
"v" : 2,
"key" : {
"name" : 1
},
"name" : "name_1",
"ns" : "test.author_test_collection"
}
]
可以看到,集合author_test_collection
上有两个索引,其中name_1
是新添加的。新增索引后,再次执行先前的执行计划,看一下效果。
> db.author_test_collection.find({"name":"Author-3000"}).explain("executionStats")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.author_test_collection",
"indexFilterSet" : false,
"parsedQuery" : {
"name" : {
"$eq" : "Author-3000"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"Author-3000\", \"Author-3000\"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 1,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"Author-3000\", \"Author-3000\"]"
]
},
"keysExamined" : 1,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
},
"serverInfo" : {
"host" : "c40e51fb2caa",
"port" : 27017,
"version" : "3.6.9",
"gitVersion" : "167861a164723168adfaaa866f310cb94010428f"
},
"ok" : 1
}
可以看到stage
已经从 COLLSCAN 变为了 IXSCAN,也就是说从全表扫描变为了按照索引扫描。有了索引,就好比查字典时有了字母的顺序,使得不用翻遍整个字典才能找到需要的内容,工作量也会减少。可以看到work
只有4(2 + 2)
,从1000002下降到4,是非常显著的,相差了百万个量级。
但是并不是我们将需要查询的字段都放到索引中就可以,有时我们的查询需要考虑顺序,比如:
db.author_test_collection.find({"name":"Author-2000"}).sort({"age":1})
该查询用来查询名称为Author-2000
的所有作者,同时按照age
正序(小到大)排序返回。可以输出以下执行计划(重点部分)。
"executionStages" : {
"stage" : "SORT",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 5,
"advanced" : 1,
"needTime" : 3,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"sortPattern" : {
"age" : 1
},
"memUsage" : 118,
"memLimit" : 33554432,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 3,
"advanced" : 1,
"needTime" : 1,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"inputStage" : {
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"Author-2000\", \"Author-2000\"]"
]
},
"keysExamined" : 1,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
}
},
"allPlansExecution" : [ ]
可以看到首先查询的数据会消耗2个work
通过IXSCAN
索引扫描命中数据,然后经过FETCH
阶段,消耗2个work
将数据取出,取出的数据需要经历SORT_KEY_GENERATOR
消耗1个work
进行排序器的生成(虽然只有一个数据),最终通过SORT
阶段消耗5个work
将数据完成排序后返回。
总计开销是10个
work
对于数据的查询输入,有name
和age
两个参数,在age
上有排序,下面在name
和age
上建立索引。
db.author_test_collection.ensureIndex({"name":1, "age":1})
然后再次执行,可以看到执行计划有所变化。
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 3,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"name" : 1,
"age" : 1
},
"indexName" : "name_1_age_1",
"isMultiKey" : false,