pyspider Crawler Study Notes: Documentation Translation (About-Tasks.md)

About Tasks
===========
Tasks are the basic unit of scheduling.

Basis
-----
* A task is identified by its `taskid`. (Default: `md5(url)`; this can be changed by overriding the `def get_taskid(self, task)` method.)
* Tasks are isolated between different projects.
* A task has one of 4 statuses:
    - active
    - failed
    - success
    - bad (not used)
* Only tasks in the active status will be scheduled.
* Tasks are served in order of `priority`.
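
As a rough illustration of the `taskid` rule above, the sketch below computes the default `md5(url)` identifier with the standard library and shows one way a custom identifier could also take request data into account. The helper names `default_taskid` and `taskid_with_data` and the shape of the `task` dict are assumptions made for this example; in a real handler the logic would live in an overridden `get_taskid(self, task)` method.

```python
import hashlib
import json


def default_taskid(url):
    """Hex MD5 digest of the URL, mirroring the documented default md5(url)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def taskid_with_data(task):
    """Hypothetical custom id: hash the URL together with the request data.

    Assumes `task` is a dict with a 'url' key and an optional
    'fetch' -> 'data' entry; this shape is an assumption for the sketch.
    """
    data = task.get("fetch", {}).get("data", "")
    payload = task["url"] + json.dumps(data, sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Two POST requests to the same URL with different form data.
post_a = {"url": "http://example.com/search", "fetch": {"data": {"q": "a"}}}
post_b = {"url": "http://example.com/search", "fetch": {"data": {"q": "b"}}}
```

With the default rule both requests share one id, so the second would be treated as the same task; the custom rule keeps them apart.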

Schedule
--------

#### new task
When a new task (never seen before) comes in:

* If `exetime` is set but has not yet arrived, the task is put into a time-based queue to wait.
* Otherwise it is accepted.

When the task is already in the queue:

* It is ignored unless `force_update` is set.

When a task that has already completed comes in again:

* If `age` is set and `last_crawl_time + age < now`, it is accepted. Otherwise it is discarded.
* If `itag` is set and not equal to its previous value, it is accepted. Otherwise it is discarded.
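
The rules above can be read as a small decision function. The following is a minimal sketch, not pyspider's scheduler code: the function name `should_accept`, the return values, and the dict field layout are assumptions for illustration, and it assumes that either the `age` condition or the `itag` condition alone is enough to accept a completed task.

```python
import time


def should_accept(new_task, old_task=None, now=None):
    """Return 'accept', 'wait', or 'ignore' for an incoming task.

    `old_task` is the stored record for the same taskid, or None if the
    task has never been seen. All field names are assumptions.
    """
    now = time.time() if now is None else now

    if old_task is None:
        # Never seen before: wait if exetime is still in the future.
        if new_task.get("exetime", 0) > now:
            return "wait"
        return "accept"

    if old_task["status"] == "active":
        # Already in the queue: ignored unless force_update is set.
        return "accept" if new_task.get("force_update") else "ignore"

    # Already completed: accept on an expired age or a changed itag.
    age = new_task.get("age")
    if age is not None and old_task["last_crawl_time"] + age < now:
        return "accept"
    itag = new_task.get("itag")
    if itag is not None and itag != old_task.get("itag"):
        return "accept"
    return "ignore"
```

For example, a task last crawled at time 50 and resubmitted at time 100 with `age=10` would be accepted under this sketch, since `50 + 10 < 100`.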


#### task retry

When a fetch error or script error happens, the task will be retried 3 times by default.

The first retry runs after 30 seconds, and the following ones after 1 hour, 6 hours, and 12 hours; any further retry is postponed by 24 hours.

If `age` is specified, the retry delay will not be larger than `age`.

You can configure the retry delay by adding a variable named `retry_delay` to the handler. `retry_delay` is a dict that specifies retry intervals. Its items have the form `{retried: seconds}`, and the special key `''` (empty string) sets the default delay for any retry count that is not listed.

e.g. the default `retry_delay` is declared like this:
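```
from pyspider.libs.base_handler import BaseHandler

class MyHandler(BaseHandler):
    # Maps the number of retries already made to the delay in seconds.
    retry_delay = {
        0: 30,            # 30 seconds
        1: 1*60*60,       # 1 hour
        2: 6*60*60,       # 6 hours
        3: 12*60*60,      # 12 hours
        '': 24*60*60      # default for any other retry count: 24 hours
    }
```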
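
To make the lookup concrete, here is a minimal sketch of how a delay could be chosen from such a dict, including the cap at `age`. The function name `next_retry_delay` and its parameters are hypothetical, and the code only covers the delay, not the limit on the number of retries; it is not taken from pyspider's scheduler.

```python
DEFAULT_RETRY_DELAY = {
    0: 30,
    1: 1 * 60 * 60,
    2: 6 * 60 * 60,
    3: 12 * 60 * 60,
    "": 24 * 60 * 60,
}


def next_retry_delay(retried, retry_delay=DEFAULT_RETRY_DELAY, age=None):
    """Seconds to wait before the next retry.

    `retried` is how many retries have already been made. The '' key is
    the fallback for counts without an explicit entry.
    """
    fallback = retry_delay.get("", DEFAULT_RETRY_DELAY[""])
    delay = retry_delay.get(retried, fallback)
    if age is not None:
        # The retry delay is capped at `age` when one is specified.
        delay = min(delay, age)
    return delay
```

Under these assumptions, `next_retry_delay(2, age=600)` yields 600 rather than the 6 hours in the table, because the delay is capped at `age`.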

Reposted from: https://my.oschina.net/sijinge/blog/1526543
