ElasticSearch 48: Getting Started with Search Engines: Hands-On Scrolling Through Large Result Sets with scroll


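1. Why use scroll search
Problem: fetching 100,000 documents in a single request performs very poorly. The usual answer is a scroll query, which retrieves the data batch by batch until every document has been returned.

With scroll, you search for one batch, then the next batch, and so on, until the full result set has been retrieved.
On the first search, scroll saves a snapshot of the index view at that moment. All later batches are served from that old snapshot, so any changes made to the data in the meantime are not visible to the user.
Sorting by _doc gives better performance, so use it when the order of the results does not matter.
Every scroll request must also carry a scroll parameter that specifies a time window. The window does not have to cover the whole job; it only has to be long enough for the next request to arrive, because each request renews it.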
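
The snapshot behaviour described above can be pictured with a toy in-memory model. This is only an illustration, not how Elasticsearch implements search contexts: the class name `ToySnapshotScroll` and the `live_index` dictionary are made up for the sketch.

```python
from typing import Dict, List, Tuple


class ToySnapshotScroll:
    """In-memory stand-in for a scroll context (hypothetical, for illustration only)."""

    def __init__(self, docs: Dict[str, str], size: int) -> None:
        # Freeze the view at the time of the first search.
        self._snapshot: List[Tuple[str, str]] = list(docs.items())
        self._size = size
        self._pos = 0

    def next_batch(self) -> List[Tuple[str, str]]:
        # Serve the next slice of the frozen view; an empty list means "done".
        batch = self._snapshot[self._pos:self._pos + self._size]
        self._pos += len(batch)
        return batch


live_index = {"1": "test hello001", "2": "world", "3": "test world"}
scroll = ToySnapshotScroll(live_index, size=2)

first = scroll.next_batch()
live_index["4"] = "hello hello"  # written after the snapshot was taken
del live_index["3"]              # deleted after the snapshot was taken
second = scroll.next_batch()     # still contains document 3, never document 4
third = scroll.next_batch()      # empty: the snapshot is exhausted
```

The second batch is served from the frozen copy, so the write and the delete that happened in between do not affect it.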



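2. Scroll looks a lot like pagination, but the use cases differ. Pagination fetches one page at a time to display to a user, whereas scroll retrieves data batch by batch so that a system can process it.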

3. Example:

First query: fetch 3 documents, with the scroll window set to 1 minute

GET /test_index/test_type/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_doc": {
        "order": "desc"
      }
    }
  ],
  "size":3
}

Result:

{
  "_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlRvOFRNR0FjQlI1cHNXS21n",
  "took": 40,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": null,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": null,
        "_source": {
          "test_field": "jiaxing"
        },
        "sort": [
          0
        ]
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": null,
        "_source": {
          "test_field": "world xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
        },
        "sort": [
          0
        ]
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "11",
        "_score": null,
        "_source": {
          "test_field": "shaoxing"
        },
        "sort": [
          0
        ]
      }
    ]
  }
}


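Second query: the first response returned a _scroll_id. Copy it and send it with the second request.
The _scroll_id identifies the snapshot. It records the details of the previous query, such as the query itself, the sort order and the size, so the follow-up request does not repeat them.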

GET /_search/scroll
{
  "scroll":"1m",
  "scroll_id":"DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlRvOFRNR0FjQlI1cHNXS21n"
}

Result:

{
  "_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlRvOFRNR0FjQlI1cHNXS21n",
  "took": 15,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": null,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "9",
        "_score": null,
        "_source": {
          "test_field": "zhoushan"
        },
        "sort": [
          1
        ]
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": null,
        "_source": {
          "test_field": "hello hello"
        },
        "sort": [
          1
        ]
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": null,
        "_source": {
          "test_field": "huzhou"
        },
        "sort": [
          1
        ]
      }
    ]
  }
}
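
Since scroll is meant for batch processing by a program, each response has to be unpacked in code. A minimal sketch, assuming the response has already been parsed into a Python dict: the helper name `extract_batch` is hypothetical, and the sample dict is a trimmed copy of the second response above with the scroll id replaced by a placeholder.

```python
from typing import List, Tuple


def extract_batch(response: dict) -> Tuple[str, List[Tuple[str, dict]]]:
    """Return the scroll id for the next request and the (_id, _source) pairs of this batch."""
    docs = [(hit["_id"], hit["_source"]) for hit in response["hits"]["hits"]]
    return response["_scroll_id"], docs


# Trimmed copy of the second response; the scroll id is a placeholder, not a real id.
second_response = {
    "_scroll_id": "<scroll id from the response>",
    "hits": {
        "total": 11,
        "hits": [
            {"_id": "9", "_source": {"test_field": "zhoushan"}},
            {"_id": "4", "_source": {"test_field": "hello hello"}},
            {"_id": "7", "_source": {"test_field": "huzhou"}},
        ],
    },
}

scroll_id, docs = extract_batch(second_response)
ids = [doc_id for doc_id, _ in docs]  # ["9", "4", "7"]
```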

Third query: repeat the second one. Every subsequent query simply repeats the same request.


GET /_search/scroll
{
  "scroll":"1m",
  "scroll_id":"DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlRvOFRNR0FjQlI1cHNXS21n"
}

Result:

{
  "_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlRvOFRNR0FjQlI1cHNXS21n",
  "took": 4,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": null,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "3",
        "_score": null,
        "_source": {
          "test_field": "test world"
        },
        "sort": [
          1
        ]
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "10",
        "_score": null,
        "_source": {
          "test_field": "lishui"
        },
        "sort": [
          2
        ]
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": null,
        "_source": {
          "test_field": "test hello001"
        },
        "sort": [
          2
        ]
      }
    ]
  }
}

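Fourth query: returns the last two remaining documents. All 11 documents have now been retrieved (3 + 3 + 3 + 2), so the scroll is complete; a further request would return an empty hits array.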
GET /_search/scroll
{
  "scroll":"1m",
  "scroll_id":"DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlRvOFRNR0FjQlI1cHNXS21n"
}

Result:

{
  "_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAxqbFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManBY1SmNaRlRvOFRNR0FjQlI1cHNXS21nAAAAAAADGp4WNUpjWkZUbzhUTUdBY0JSNXBzV0ttZwAAAAAAAxqdFjVKY1pGVG84VE1HQWNCUjVwc1dLbWcAAAAAAAManxY1SmNaRlRvOFRNR0FjQlI1cHNXS21n",
  "took": 10,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": null,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "12",
        "_score": null,
        "_source": {
          "test_field": "jinhua"
        },
        "sort": [
          3
        ]
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "5",
        "_score": null,
        "_source": {
          "test_field": "test 001"
        },
        "sort": [
          4
        ]
      }
    ]
  }
}
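
The four manual requests above can be automated with a loop that keeps scrolling until a round comes back empty. The following is a sketch under stated assumptions rather than a definitive client: the function names `http_send` and `scroll_batches` are made up here, the node address `http://localhost:9200` is assumed, the requests are sent as POST (which Elasticsearch accepts for both endpoints) instead of GET, and `_doc` is sorted in its default order.

```python
import json
import urllib.request
from typing import Callable, Iterator, List

# A transport takes (path, body) and returns the parsed JSON response.
Send = Callable[[str, dict], dict]


def http_send(path: str, body: dict, base_url: str = "http://localhost:9200") -> dict:
    """Assumed transport: POST a JSON body to a local node (the address is hypothetical)."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def scroll_batches(send: Send, search_path: str, size: int = 3,
                   keep_alive: str = "1m") -> Iterator[List[dict]]:
    """Yield one list of hits per scroll round until a round comes back empty."""
    # First request: opens the scroll and returns the first batch.
    response = send(
        f"{search_path}?scroll={keep_alive}",
        {"query": {"match_all": {}}, "sort": ["_doc"], "size": size},
    )
    while response["hits"]["hits"]:
        yield response["hits"]["hits"]
        # Follow-up request: pass the scroll id from the most recent response.
        response = send(
            "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": response["_scroll_id"]},
        )
```

Against a running node it could be used as `for batch in scroll_batches(http_send, "/test_index/test_type/_search"): ...`. Passing the transport in as a parameter keeps the loop itself free of network code, so it can be exercised with a stub.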

























