mongodb大数据查询通过skip会非常慢的问题(未解决)

周末打扰下哈: 在做导出时遇到 mongo读取蛮大数据量性能问题想请教下; 问题描述: 操作是提前查出数据总量计算分批,再利用mongodb的limit(M)和skip(N) 分页功能 来并发地去取出 dataList,由于(部分线程)skip的值N越大,mongo性能反应越来越慢(类似地mysql这种操作也会很低),导出性能也随之降低 换了一种思路是 利用mongodb本身对于where查询和limit()的效率高些,但不得不多传递上次查询(单线程)的那个“边界”条件,增加了工作量,也不够优雅, 借机会请教下接触大数据量的大牛哥们,你们的大数据量导出方案,能不能指点一二,谢谢

官网也是第二种思路:详情见: https://docs.mongodb.com/manual/reference/method/cursor.skip/

 

The cursor.skip() method requires the server to scan from the beginning of the input results set before beginning to return results. As the offset increases, cursor.skip() will become slower.

Using Range Queries
Range queries can use indexes to avoid scanning unwanted documents, typically yielding better performance as the offset grows compared to using cursor.skip() for pagination.

Descending Order
Use this procedure to implement pagination with range queries:

Choose a field such as _id which generally changes in a consistent direction over time and has a unique index to prevent duplicate values,
Query for documents whose field is less than the start value using the $lt and cursor.sort() operators, and
Store the last-seen field value for the next query.
For example, the following function uses the above procedure to print pages of student names from a collection, sorted approximately in order of newest documents first using the _id field (that is, in descending order):

function printStudents(startValue, nPerPage) {
  let endValue = null;
  db.students.find( { _id: { $lt: startValue } } )
             .sort( { _id: -1 } )
             .limit( nPerPage )
             .forEach( student => {
               print( student.name );
               endValue = student._id;
             } );

  return endValue;
}
You may then use the following code to print all student names using this pagination function, using MaxKey to start from the largest possible key:

let currentKey = MaxKey;
while (currentKey !== null) {
  currentKey = printStudents(currentKey, 10);
}
NOTE

While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:

Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
Are generated by clients, which may have differing system clocks.
Ascending Order
Returning paginated results in ascending order is similar to the previous, but uses $gt with an ascending sort order:

function printStudents(startValue, nPerPage) {
  let endValue = null;
  db.students.find( { _id: { $gt: startValue } } )
             .sort( { _id: 1 } )
             .limit( nPerPage )
             .forEach( student => {
               print( student.name );
               endValue = student._id;
             } );

  return endValue;
}
Using this function is likewise similar, but with MinKey as the starting key:

let currentKey = MinKey;
while (currentKey !== null) {
  currentKey = printStudents(currentKey, 10);
}

待完善..............

总之:好的设计胜于调优

  • 0
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 3
    评论
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值