2021SC@SDUSC
So what exactly does _process_queue do?
class Downloader(object):
    # ...
    def _process_queue(self, spider, slot):
        if slot.latercall and slot.latercall.active():
            return
        # Delay queue processing if a download_delay is configured
        now = time()
        delay = slot.download_delay()
        if delay:
            penalty = delay - now + slot.lastseen
            if penalty > 0:
                slot.latercall = reactor.callLater(penalty, self._process_queue, spider, slot)
                return
        # Process enqueued requests if there are free slots to transfer for this slot
        while slot.queue and slot.free_transfer_slots() > 0:
            slot.lastseen = now
            request, deferred = slot.queue.popleft()
            dfd = self._download(slot, request, spider)
            dfd.chainDeferred(deferred)
            # prevent burst if inter-request delays were configured
            if delay:
                self._process_queue(spider, slot)
                break
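- If the slot already has a later run scheduled (its latercall is still active), return immediately.
- If the slot has a download delay configured and that delay has not yet elapsed, use callLater to put _process_queue back on the event loop after the remaining wait time, and store the returned call object in the slot's latercall.
- While the slot's download queue still holds unprocessed requests and the concurrency limit has not been reached, call the download method. When a delay is configured, only one request is started per pass, which prevents a burst of requests.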
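
The steps above can be sketched without Twisted. This is a simplified illustration, not Scrapy's implementation: `ToySlot`, `compute_penalty`, and `process_queue` are hypothetical names, the current time is passed in explicitly, and instead of scheduling a `callLater` the function only reports how long the caller would have to wait.

```python
from collections import deque


def compute_penalty(delay, now, lastseen):
    """Seconds still to wait before the next request may start (<= 0 means go now)."""
    return delay - now + lastseen


class ToySlot:
    """Hypothetical stand-in for a downloader slot (not Scrapy's Slot class)."""

    def __init__(self, concurrency, delay):
        self.concurrency = concurrency
        self.delay = delay
        self.queue = deque()
        self.transferring = set()
        self.lastseen = 0.0

    def free_transfer_slots(self):
        return self.concurrency - len(self.transferring)


def process_queue(slot, now):
    """Return ("wait", seconds) or ("started", [requests])."""
    if slot.delay:
        penalty = compute_penalty(slot.delay, now, slot.lastseen)
        if penalty > 0:
            # The real code schedules reactor.callLater(penalty, ...) here.
            return ("wait", penalty)
    started = []
    while slot.queue and slot.free_transfer_slots() > 0:
        slot.lastseen = now
        request = slot.queue.popleft()
        slot.transferring.add(request)
        started.append(request)
        if slot.delay:
            # With a delay configured, start only one request per pass.
            break
    return ("started", started)
```

With no delay, a pass drains the queue until the concurrency limit is hit. With a delay, each pass starts at most one request, and a pass that comes too early only reports the remaining wait.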
_download
class Downloader(object):
    # ...
    def _download(self, slot, request, spider):
        dfd = mustbe_deferred(self.handlers.download_request, request, spider)
        def _downloaded(response):
            self.signals.send_catch_log(signal=signals.response_downloaded,
                                        response=response,
                                        request=request,
                                        spider=spider)
            return response
        dfd.addCallback(_downloaded)
        slot.transferring.add(request)
        def finish_transferring(_):
            slot.transferring.remove(request)
            self._process_queue(spider, slot)
            return _
        return dfd.addBoth(finish_transferring)
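- Call a different handler to download the request depending on the protocol; there are handlers for HTTP, FTP, and so on. If you are interested, you can keep following download_request; it is not covered here.
- Register a success callback that sends the response_downloaded signal once the download completes.
- Add the request to transferring, the slot's set of requests currently in flight.
- Register a callback for both success and failure that removes the request from the slot's transferring set when it finishes and triggers _process_queue again.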
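
As a rough analogy for the callback chain described above, here is a synchronous sketch that does not use Twisted. The function `download` and its parameters (`handlers` as a dict keyed by URL scheme, `signals` as a plain list, `process_queue` as a zero-argument callable) are assumptions made for illustration. The success-only step plays the role of `addCallback`, and the `finally` block plays the role of `addBoth`.

```python
from urllib.parse import urlparse


def download(url, handlers, transferring, signals, process_queue):
    """Synchronous stand-in for the callback chain built in _download."""
    # Pick a handler by URL scheme (hypothetical dict, e.g. {"http": fn, "ftp": fn}).
    handler = handlers[urlparse(url).scheme]
    transferring.add(url)
    try:
        response = handler(url)
        # Success path only: mirrors the addCallback(_downloaded) step.
        signals.append(("response_downloaded", url))
        return response
    finally:
        # Runs on success and on failure: mirrors addBoth(finish_transferring).
        transferring.remove(url)
        process_queue()
```

Whether the handler returns or raises, the request leaves the in-flight set and the queue gets another chance to run, which is what keeps a failed download from permanently occupying a transfer slot.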