About Tasks
===========

Tasks are the basic unit to be scheduled.

Basis
-----

* A task is identified by its `taskid`. (Default: `md5(url)`; this can be changed by overriding the `def get_taskid(self, task)` method.)
* Tasks are isolated between different projects.
* A task has 4 statuses:
    - active
    - failed
    - success
    - bad (not used)
* Only tasks in the active status will be scheduled.
* Tasks are served in order of `priority`.

Schedule
--------

#### new task

When a new task (never seen before) comes in:

* If `exetime` is set but has not yet arrived, the task is put into a time-based queue to wait.
* Otherwise it is accepted.

When the task is already in the queue:

* It is ignored unless `force_update` is set.

When the task has already completed:

* If `age` is set and `last_crawl_time + age < now`, it is accepted. Otherwise it is discarded.
* If `itag` is set and not equal to its previous value, it is accepted. Otherwise it is discarded.

#### task retry

When a fetch error or script error happens, the task is retried 3 times by default.

Successive retries are delayed by 30 seconds, 1 hour, 6 hours, and 12 hours in turn; any further retries are postponed by 24 hours. If `age` is specified, the retry delay will not be larger than `age`.

You can configure the retry delay by adding a variable named `retry_delay` to the handler. `retry_delay` is a dict that specifies the retry intervals.
The items in the dict are `{retried: seconds}`, and the special key `''` (empty string) specifies the default delay for any retry count that is not listed.

For example, the default `retry_delay` is declared like this:

```
from pyspider.libs.base_handler import BaseHandler


class MyHandler(BaseHandler):
    # Maps the number of retries so far to the delay in seconds.
    retry_delay = {
        0: 30,
        1: 1*60*60,
        2: 6*60*60,
        3: 12*60*60,
        '': 24*60*60,  # default for any other retry count
    }
```
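
To make the lookup rule concrete, here is a minimal sketch of how a delay could be picked from such a dict. It is not the scheduler's actual code: the helper `lookup_retry_delay` and the constant `DEFAULT_RETRY_DELAY` are hypothetical names introduced for illustration. It also assumes that `retried` counts from 0 and that `age` is given in seconds.

```python
# Copy of the default mapping shown above: {retried: seconds}.
DEFAULT_RETRY_DELAY = {
    0: 30,
    1: 1 * 60 * 60,
    2: 6 * 60 * 60,
    3: 12 * 60 * 60,
    '': 24 * 60 * 60,
}


def lookup_retry_delay(retried, retry_delay=DEFAULT_RETRY_DELAY, age=None):
    """Return the delay in seconds before the next retry (hypothetical helper)."""
    # Fall back to the '' entry when this retry count is not listed.
    fallback = retry_delay.get('', DEFAULT_RETRY_DELAY[''])
    delay = retry_delay.get(retried, fallback)
    # Per the text, the delay is never larger than `age` when `age` is set.
    if age is not None:
        delay = min(delay, age)
    return delay
```

For instance, `lookup_retry_delay(7)` falls through to the `''` entry, while `lookup_retry_delay(3, age=600)` is capped at 600 seconds because `age` is smaller than the 12-hour entry.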
Reposted from: https://my.oschina.net/sijinge/blog/1526543