周末打扰下哈: 在做导出时遇到 mongo读取蛮大数据量性能问题想请教下; 问题描述: 操作是提前查出数据总量计算分批,再利用mongodb的limit(M)和skip(N) 分页功能 来并发地去取出 dataList,由于(部分线程)skip的值N越大,mongo性能反应越来越慢(类似地mysql这种操作也会很低),导出性能也随之降低 换了一种思路是 利用mongodb本身对于where查询和limit()的效率高些,但不得不多传递上次查询(单线程)的那个“边界”条件,增加了工作量,也不够优雅, 借机会请教下接触大数据量的大牛哥们,你们的大数据量导出方案,能不能指点一二,谢谢
官网也是第二种思路:详情见: https://docs.mongodb.com/manual/reference/method/cursor.skip/
The cursor.skip() method requires the server to scan from the beginning of the input results set before beginning to return results. As the offset increases, cursor.skip() will become slower.
Using Range Queries
Range queries can use indexes to avoid scanning unwanted documents, typically yielding better performance as the offset grows compared to using cursor.skip() for pagination.
Descending Order
Use this procedure to implement pagination with range queries:
Choose a field such as _id which generally changes in a consistent direction over time and has a unique index to prevent duplicate values,
Query for documents whose field is less than the start value using the $lt and cursor.sort() operators, and
Store the last-seen field value for the next query.
For example, the following function uses the above procedure to print pages of student names from a collection, sorted approximately in order of newest documents first using the _id field (that is, in descending order):
function printStudents(startValue, nPerPage) {
let endValue = null;
db.students.find( { _id: { $lt: startValue } } )
.sort( { _id: -1 } )
.limit( nPerPage )
.forEach( student => {
print( student.name );
endValue = student._id;
} );
return endValue;
}
You may then use the following code to print all student names using this pagination function, using MaxKey to start from the largest possible key:
let currentKey = MaxKey;
while (currentKey !== null) {
currentKey = printStudents(currentKey, 10);
}
NOTE
While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:
Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
Are generated by clients, which may have differing system clocks.
Ascending Order
Returning paginated results in ascending order is similar to the previous, but uses $gt with an ascending sort order:
function printStudents(startValue, nPerPage) {
let endValue = null;
db.students.find( { _id: { $gt: startValue } } )
.sort( { _id: 1 } )
.limit( nPerPage )
.forEach( student => {
print( student.name );
endValue = student._id;
} );
return endValue;
}
Using this function is likewise similar, but with MinKey as the starting key:
let currentKey = MinKey;
while (currentKey !== null) {
currentKey = printStudents(currentKey, 10);
}
待完善..............
总之:好的设计胜于调优