9. Celery Tasks
9.1 Installation
pip install requests --default-timeout=600
pip install xmltodict --default-timeout=600
pip install celery
ln -sv /usr/local/python3.7/bin/celery /usr/bin/celery
pip install redis
wget http://download.redis.io/releases/redis-4.0.6.tar.gz
tar zxvf redis-4.0.6.tar.gz
cd redis-4.0.6
make
make install PREFIX=/usr/local/redis
cp redis.conf /usr/local/redis/bin/redis.conf
Edit redis.conf so that Redis runs as a background daemon (set daemonize yes), then start the server with that configuration file:
./redis-server redis.conf
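With the redis client installed via pip above, it is easy to confirm the server is reachable before wiring it into Celery. A minimal sketch, assuming the default host/port and database 1 (the broker database configured later):
# coding=utf-8
import redis

# Connect to the local Redis instance; db 1 is used as the Celery broker below
conn = redis.Redis(host='localhost', port=6379, db=1)
print(conn.ping())  # prints True when the server is running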
9.2 Periodic Tasks
The directory structure is as follows:
celery_app/
├── __init__.py
├── celeryconfig.py
└── task.py
__init__.py is where the configuration file is selected when the project starts.
celeryconfig.py holds the Celery configuration.
task.py contains the task functions.
9.2.1 Test Example
The example below gives a quick feel for how Celery periodic tasks are used; after that we add the task logic functions we actually need.
__init__.py:
# coding=utf-8
from celery import Celery

# Create the Celery application and load its settings from celery_app/celeryconfig.py
app = Celery('Crawler')
app.config_from_object('celery_app.celeryconfig')
celeryconfig.py:
# coding=utf-8
from datetime import timedelta
from celery.schedules import crontab

# Use redis as the broker
BROKER_URL = 'redis://localhost:6379/1'
# Store task results in redis
CELERY_RESULT_BACKEND = 'redis://localhost:6379/2'
# Time zone
CELERY_TIMEZONE = 'Asia/Shanghai'
# Modules to import so the worker can find the tasks
CELERY_IMPORTS = (
    'celery_app.task',
)
# Periodic task schedule
CELERYBEAT_SCHEDULE = {
    # Task name
    'crawler_task': {
        # The task function to run
        'task': 'celery_app.task.testf',
        # Run every 2 seconds
        'schedule': timedelta(seconds=2),
        # Arguments passed to the task
        'args': ()
    }
}
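celeryconfig.py already imports crontab, which can be used in place of timedelta when a task should run at specific times rather than at a fixed interval. A minimal sketch (the entry name daily_crawler_task is made up for illustration):
# Run the same task every day at 02:00 instead of every 2 seconds
CELERYBEAT_SCHEDULE['daily_crawler_task'] = {
    'task': 'celery_app.task.testf',
    'schedule': crontab(hour=2, minute=0),
    'args': ()
}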
task.py:
# coding=utf-8
import time
from celery_app import app


@app.task
def testf():
    '''
    Test task: just prints a message so we can see the schedule firing.
    :return:
    '''
    print('test crawler function....')
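The task can also be triggered by hand, which is a quick way to check that the broker and result backend settings work, independent of the beat schedule. A minimal sketch, run from a Python shell while a worker is running:
from celery_app.task import testf

# Send the task to the broker; a running worker picks it up and executes it
result = testf.delay()
# Wait for completion (testf returns None, so this mainly confirms the round trip)
print(result.get(timeout=10))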
9.2.2 Starting the Tasks
Start the worker and the beat scheduler (each in its own terminal):
celery worker -A celery_app --pool=solo -l INFO
celery beat -A celery_app -l INFO
As the example above shows, adding a new task only requires adding a schedule entry in celeryconfig.py and implementing the corresponding task function in task.py, as sketched below.
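For example, to run the crawler from the next section on a schedule, the wiring could look roughly like this. This is only a sketch: the module path celery_app.crawler.main, the wrapper name run_crawler, and the one-hour interval are assumptions, since the original does not show how crawler_main is registered.
# In celery_app/task.py: wrap the crawler entry point as a Celery task
from celery_app import app
# NOTE: the module path below is an assumption; import crawler_main from wherever it lives
from celery_app.crawler.main import crawler_main

@app.task
def run_crawler():
    '''Periodic crawler task.'''
    crawler_main()

# In celery_app/celeryconfig.py: add a schedule entry pointing at the new task
CELERYBEAT_SCHEDULE['crawler_job'] = {
    'task': 'celery_app.task.run_crawler',
    'schedule': timedelta(hours=1),   # interval is an assumption
    'args': ()
}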
9.3 Crawler Task Implementation
The crawler was largely implemented in the previous project; here the main body is carried over and turned into a Celery periodic task.
Key code:
# coding=utf-8
# Crawler
from celery_app.crawler import const
from celery_app.crawler import info_ini
from celery_app.crawler import baidu_wd_dns
from celery_app.crawler import dns_ip
from celery_app.crawler import comm
# from celery_app.crawler.sql_opr import SqlOpr
from celery_app.crawler import sql_opr


def get_dns_detail(dns_list):
    '''
    Resolve each domain name and collect details about the IP it points to.
    :param dns_list: list of domain names
    :return: dict keyed by IP address
    '''
    dns_detail_info = {}
    cur_index = 0
    total = len(dns_list)
    for dns_name in dns_list:
        cur_index += 1
        web_ip = dns_ip.get_dns_ip(dns_name)
        if not web_ip:
            continue
        if web_ip not in dns_detail_info.keys():
            dns_detail_info[web_ip] = {}
            dns_detail_info[web_ip]['dns_info'] = {}
            dns_detail_info[web_ip]['belong'] = dns_ip.get_ip_belong(web_ip)
        dns_detail_info[web_ip]['dns_info'][dns_name] = {}
        dns_detail_info[web_ip]['dns_info'][dns_name]['link_dns'] = \
            dns_ip.get_dns_from_ip(web_ip)
        comm.log_print("DNS resolution ({}/{}): IP address for {} is {}".format(
            cur_index, total, dns_name, web_ip))
    return dns_detail_info


def crawler_main():
    '''
    Main entry point of the crawler.
    :return:
    '''
    info_obj = sql_opr.get_all_keywords()
    baidu_wd_dns.get_baidu_keyword_dns(info_obj.key_list)
    for key_obj in info_obj.key_list:
        dns_detail_info = get_dns_detail(key_obj.dns_list)
        sql_opr.del_crawler_res(key_obj.craw_id)
        sql_opr.add_crawler_res(dns_detail_info, key_obj.craw_id)
Result: