0X-1 Pitfall notes
A pitfall from 2019-7-6
Scenario: two tasks are dispatched to update different fields of the same database row. The symptom: either task A's update or task B's update is lost; some field always fails to persist. The problem only shows up when the number of workers is > 1, because with a single worker all tasks run sequentially, whereas multiple workers execute tasks in parallel.
After some digging, this turned out to be caused by Django ORM's save() method: when you call save() after an update, Django writes back all fields of the record, not just the one you changed. If another field was updated elsewhere before this transaction commits, the commit overwrites it with stale data.
Fix: tell save() which fields changed, so only those fields are written back:
obj.save(update_fields=['name'])
Reference: https://blog.csdn.net/yongche_shi/article/details/49096043
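A minimal sketch of the fix in a task context, assuming a hypothetical Record model with independent name and status fields; each task writes back only its own field:
from celery import shared_task
from main.models import Record  # hypothetical model with `name` and `status` fields

@shared_task
def update_name(pk, name):
    obj = Record.objects.get(pk=pk)
    obj.name = name
    obj.save(update_fields=['name'])    # only `name` is written back, not the whole row

@shared_task
def update_status(pk, status):
    obj = Record.objects.get(pk=pk)
    obj.status = status
    obj.save(update_fields=['status'])  # safe to run concurrently with update_name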
0X00 What is Celery
A task queue is a mechanism for distributing work across threads or machines.
The input to a task queue is a unit of work, called a task; dedicated worker processes constantly monitor the queue for new tasks to process.
Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task, the client adds a message to the queue, and the broker then delivers that message to a worker.
A Celery system can consist of multiple workers and brokers, giving way to high availability and horizontal scaling.
Celery is written in Python, but the protocol can be implemented in any language. So far there is RCelery for Ruby, node-celery for Node.js, and a PHP client; language interoperability can also be achieved using webhooks.
0X01 DEMO
1. Create an application, tasks.py:
from celery import Celery
app = Celery('tasks', broker='amqp://guest@localhost//')
# 'tasks' is the name of the current module; broker specifies the message broker to use

@app.task
def add(x, y):
    return x + y
2. Run the worker:
celery worker -A tasks -l info
3. With the worker running, the task can be called from another Python shell; tasks.py just needs to be importable (e.g. start the shell in the same directory):
from tasks import add
result = add.delay(4, 4)
"""
这个任务已经由之前启动的职程执行,并且你可以查看职程的控制台输出来验证。
调用任务会返回一个 AsyncResult 实例,可用于检查任务的状态,等待任务完成或获取返回值(如果任务失败,则为异常和回溯)。 但这个功能默认是不开启的,你需要设置一个 Celery 的结果后端,下一节将会详细介绍。
"""
result.ready() # 查看任务是否完成
result.result # 获取任务执行结果
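For ready()/result to actually return anything, the app needs a result backend; a minimal sketch using the built-in rpc backend (the configuration-file approach in the next step works just as well):
app = Celery('tasks', broker='amqp://guest@localhost//', backend='rpc://')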
4. Load the Celery configuration from a Python module:
app.config_from_object('celeryconfig')
The configuration file, celeryconfig.py:
BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'amqp://'
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT=['json']
CELERY_ENABLE_UTC = True
To verify that the configuration file is valid Python and free of syntax errors:
python -m celeryconfig
0X02 Using django-celery
1. Install:
apt-get install rabbitmq-server
pip install celery
pip install django-celery
2. Add it to INSTALLED_APPS:
INSTALLED_APPS = [
    'djcelery',
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
]
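Beyond INSTALLED_APPS, django-celery's usual setup also calls setup_loader() in settings and creates its database tables; a minimal sketch (the broker URL is assumed to be the local RabbitMQ default):
# settings.py
import djcelery
djcelery.setup_loader()
BROKER_URL = 'amqp://guest@localhost//'
# then create django-celery's tables:
# python manage.py migrate djcelery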
0X03 Using django-celery-beat
A demo: https://github.com/celery/celery/tree/master/examples/django/
Versions used: celery 4.3.0, django-celery-beat 1.5.0
1. Migrate the database:
python3 manage.py migrate
2. Run the web server
3. Run the celery worker and the beat scheduler:
python3 -m celery worker -A celery_test -l info --uid=113 --gid=116
celery -A celery_test beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler --detach # --detach runs it in the background
Too many bugs showed up in practice, so I'm not using it for now; periodic jobs can be handled with crontab at the OS level instead. For reference, scheduling with django-celery-beat is driven by its database models, as sketched below.
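A minimal sketch of defining a periodic task through django_celery_beat's models (the task path main.tasks.send_mail refers to the demo project below; the 10-second interval is arbitrary):
import json
from django_celery_beat.models import IntervalSchedule, PeriodicTask

schedule, _ = IntervalSchedule.objects.get_or_create(
    every=10,
    period=IntervalSchedule.SECONDS,
)
PeriodicTask.objects.create(
    interval=schedule,
    name='send_mail every 10 seconds',  # must be unique
    task='main.tasks.send_mail',
    args=json.dumps(['a']),
)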
0X04 Current setup
Use celery only:
pip install celery==4.3.0
# both the broker and the result backend can be redis or RabbitMQ
yum install rabbitmq-server
# yum install redis
Why not django-celery:
Installing django-celery pulls in celery 3.1 by default, while the latest celery release is 4.3; sticking with the latest release means fewer bugs. Plain celery already covers the need to dispatch tasks to a queue, so the Django-level wrapper isn't really necessary. django-celery-beat, on the other hand, is genuinely useful: it supports periodic task dispatch and lets you change the schedule at runtime.
Directory layout of the Django project:
celery_test/
├── celery_test
│ ├── celery_config.py # celery configuration file
│ ├── celery.py # the celery app instance
│ ├── __init__.py # celery is loaded here so it starts together with Django
│ ├── settings.py
│ ├── urls.py
│ └── wsgi.py
├── main
│ ├── admin.py
│ ├── apps.py
│ ├── __init__.py
│ ├── migrations
│ │ └── __init__.py
│ ├── models.py
│ ├── tasks.py # tasks live in a tasks.py file under each app
│ ├── tests.py
│ └── views.py
├── manage.py
1. celery_config.py
# -*- coding: utf-8 -*-
# @Time : 2019/7/3 17:17
# @Author : Zcs
# @File : celery_config.py
# `config` is assumed to be loaded elsewhere in the project and to hold the RabbitMQ password
CELERY_BROKER_URL = 'amqp://guest:%s@localhost//' % config['RABBIT_MQ']['PASSWD']      # RabbitMQ as broker
CELERY_RESULT_BACKEND = 'amqp://guest:%s@localhost//' % config['RABBIT_MQ']['PASSWD']  # RabbitMQ as result backend
CELERY_TIMEZONE = 'Asia/Shanghai'    # timezone
CELERY_TASK_SERIALIZER = 'pickle'    # task serializer
CELERY_RESULT_SERIALIZER = 'pickle'  # result serializer
CELERY_ACCEPT_CONTENT = ['json', 'pickle']
CELERY_RESULT_EXPIRES = 3600
# CELERY_WORKER_LOG_FORMAT = '%(asctime)s [%(module)s %(levelname)s] %(message)s'
# CELERY_WORKER_TASK_LOG_FORMAT = '%(task_id)s %(task_name)s %(message)s'
CELERY_WORKER_TASK_LOG_FORMAT = '%(message)s'
CELERY_WORKER_LOG_FORMAT = '%(message)s'
CELERY_TASK_EAGER_PROPAGATES = True
CELERY_WORKER_REDIRECT_STDOUTS = True
CELERY_WORKER_REDIRECT_STDOUTS_LEVEL = "INFO"
# CELERY_WORKER_HIJACK_ROOT_LOGGER = True
CELERY_WORKER_MAX_TASKS_PER_CHILD = 40
CELERY_TASK_SOFT_TIME_LIMIT = 3600
Full configuration reference: http://docs.celeryproject.org/en/latest/userguide/configuration.html?highlight=CELERYD_CONCURRENCY
2. celery.py
# -*- coding: utf-8 -*-
# @Time : 2019/6/28 9:54
# @Author : Zcs
# @File : celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery, platforms
from celery_test import settings
# allow the worker to be started as the root user
platforms.C_FORCE_ROOT = True
# this setting works around a bug on 64-bit Windows
os.environ.setdefault('FORKED_BY_MULTIPROCESSING', '1')
# set the default Django settings module for the celery command line; without it celery cannot find the apps' tasks
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'celery_test.settings')
# create the celery app instance
app = Celery('celery_test')
# load the configuration from celery_config
app.config_from_object('celery_test.celery_config')
# configuration keys use the CELERY_ prefix
app.namespace = 'CELERY'
# auto-discover tasks.py in every installed app
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

# (the @shared_task decorator lets you define tasks without a concrete Celery instance)
@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
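Since the comment above mentions @shared_task: it is the standard celery decorator for app-level code, so a reusable app does not have to import the project's Celery instance directly. A minimal sketch, equivalent to the tasks.py below, just in a different style:
# main/tasks.py, @shared_task style
from celery import shared_task

@shared_task
def send_mail(arg):
    return arg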
3. __init__.py
from __future__ import absolute_import, unicode_literals
# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app
__all__ = ('celery_app',)
4. tasks.py
# -*- coding: utf-8 -*-
# @Time : 2019/6/27 14:44
# @Author : Zcs
# @File : tasks.py
from celery_test.celery import app
import time
# the decorator below turns the function into a task; calling it with .delay() returns the
# task's unique id, and the function's return value is not handed back to the caller directly
# but stored in the result backend once the task finishes
@app.task
def send_mail(arg):
    # time.sleep(20)
    return arg
5. Run the project:
python3 manage.py runserver 0.0.0.0:8000
celery worker -A project_name -l info --uid=993 --gid=989 # start the worker; -A points at the celery app package, which in this example is "celery_test"
flower --port=5555 --broker='redis://127.0.0.1:6379/2' # run the flower monitor for celery, then open port 5555 in a browser
6. Call the task from a view:
from django.views import View
from .tasks import send_mail
from django.http import HttpResponse
class main_view(View):
    def get(self, request):
        r = send_mail.delay('a')  # r is an AsyncResult, not the task's return value; rendering it shows the task's unique id
        return HttpResponse(r)
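Because the view above only returns the task id, a second (hypothetical) view can look the task up later through celery's AsyncResult; a minimal sketch, assuming the URLconf passes task_id in:
from django.views import View
from django.http import HttpResponse
from celery.result import AsyncResult
from celery_test.celery import app

class result_view(View):
    def get(self, request, task_id):
        res = AsyncResult(task_id, app=app)       # look the task up by its unique id
        if res.ready():
            return HttpResponse(str(res.result))  # the return value stored in the result backend
        return HttpResponse('pending')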
Some of the celery worker command-line options:
Examples:
$ celery worker --app=proj -l info
$ celery worker -A proj -l info -Q hipri,lopri
$ celery worker -A proj --concurrency=4
$ celery worker -A proj --concurrency=1000 -P eventlet
$ celery worker --autoscale=10,0
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
Global Options:
-A APP, --app APP
-b BROKER, --broker BROKER
--result-backend RESULT_BACKEND
--loader LOADER
--config CONFIG
--workdir WORKDIR Optional directory to change to after detaching.
--no-color, -C
--quiet, -q
Worker Options:
-n HOSTNAME, --hostname HOSTNAME
Set custom hostname (e.g., 'w1@%h'). Expands: %h
(hostname), %n (name) and %d, (domain).
-D, --detach Start worker as a background process.
-S STATEDB, --statedb STATEDB
Path to the state database. The extension '.db' may be
appended to the filename. Default: None
-l LOGLEVEL, --loglevel LOGLEVEL
Logging level, choose between DEBUG, INFO, WARNING,
ERROR, CRITICAL, or FATAL.
-O OPTIMIZATION       Apply optimization profile. Supported: default, fair.
--prefetch-multiplier PREFETCH_MULTIPLIER
Set custom prefetch multiplier value for this worker
instance.
Pool Options:
-c CONCURRENCY, --concurrency CONCURRENCY
Number of child processes processing the queue. The
default is the number of CPUs available on your
system.
-P POOL, --pool POOL Pool implementation: prefork (default), eventlet,
gevent or solo.
-E, --task-events, --events
Send task-related events that can be captured by
monitors like celery events, celerymon, and others.
--time-limit TIME_LIMIT
Enables a hard time limit (in seconds int/float) for
tasks.
--soft-time-limit SOFT_TIME_LIMIT
Enables a soft time limit (in seconds int/float) for
tasks.
--max-tasks-per-child MAX_TASKS_PER_CHILD, --maxtasksperchild MAX_TASKS_PER_CHILD
Maximum number of tasks a pool worker can execute
before it's terminated and replaced by a new worker.
--max-memory-per-child MAX_MEMORY_PER_CHILD, --maxmemperchild MAX_MEMORY_PER_CHILD
Maximum amount of resident memory, in KiB, that may be
consumed by a child process before it will be replaced
by a new one. If a single task causes a child process
to exceed this limit, the task will be completed and
the child process will be replaced afterwards.
Default: no limit.
Queue Options:
--purge, --discard Purges all waiting tasks before the daemon is started.
**WARNING**: This is unrecoverable, and the tasks will
be deleted from the messaging server.
--queues QUEUES, -Q QUEUES
List of queues to enable for this worker, separated by
comma. By default all configured queues are enabled.
Example: -Q video,image
--exclude-queues EXCLUDE_QUEUES, -X EXCLUDE_QUEUES
List of queues to disable for this worker, separated
by comma. By default all configured queues are
enabled. Example: -X video,image.
--include INCLUDE, -I INCLUDE
Comma separated list of additional modules to import.
Example: -I foo.tasks,bar.tasks
Features:
--without-gossip Don't subscribe to other workers events.
--without-mingle Don't synchronize with other workers at start-up.
--without-heartbeat Don't send event heartbeats.
--heartbeat-interval HEARTBEAT_INTERVAL
Interval in seconds at which to send worker heartbeat
--autoscale AUTOSCALE
Enable autoscaling by providing max_concurrency,
min_concurrency. Example:: --autoscale=10,3 (always
keep 3 processes, but grow to 10 if necessary)
Daemonization Options:
-f LOGFILE, --logfile LOGFILE
Path to log file. If no logfile is specified, stderr
is used.
--pidfile PIDFILE Optional file used to store the process pid. The
program won't start if this file already exists and
the pid is still alive.
--uid UID User id, or user name of the user to run as after
detaching.
--gid GID Group id, or group name of the main group to change to
after detaching.
--umask UMASK Effective umask(1) (in octal) of the process after
detaching. Inherits the umask(1) of the parent process
by default.
--executable EXECUTABLE
Executable to use for the detached process.
Embedded Beat Options:
-B, --beat Also run the celery beat periodic task scheduler.
Please note that there must only be one instance of
this service. .. note:: -B is meant to be used for
development purposes. For production environment, you
need to start celery beat separately.
-s SCHEDULE_FILENAME, --schedule-filename SCHEDULE_FILENAME, --schedule SCHEDULE_FILENAME
Path to the schedule database if running with the -B
option. Defaults to celerybeat-schedule. The extension
".db" may be appended to the filename.
--scheduler SCHEDULER
Scheduler class to use. Default is
celery.beat.PersistentScheduler
7. You can require password login for redis or rabbitmq; just adjust the configuration.
Change the password of rabbitmq's default guest user (the password defaults to guest, and guest may only connect from localhost):
rabbitmqctl change_password guest your_password
Then update the corresponding settings in celery_config.py:
CELERY_BROKER_URL= 'amqp://guest:your_password@localhost//'
CELERY_RESULT_BACKEND = 'amqp://guest:your_password@localhost//'
redis:
CELERY_BROKER_URL= 'redis://:your_password@127.0.0.1:6379/2'
CELERY_RESULT_BACKEND = 'redis://:your_password@127.0.0.1:6379/2'
8. Further needs
1) Run multiple tasks as a chain:
https://celery.readthedocs.io/en/latest/userguide/canvas.html#chains
>>> from celery import chain
>>> from proj.tasks import add, mul
>>> # (4 + 4) * 8 * 10 = 640
>>> res = chain(add.s(4, 4), mul.s(8), mul.s(10))
>>> res
proj.tasks.add(4, 4) | proj.tasks.mul(8) | proj.tasks.mul(10)
>>> res().get()  # calling the chain dispatches it; get() waits for the final result
640
https://www.cnblogs.com/wdliu/p/9517535.html
2) Run multiple tasks as a chain, but without passing each result to the next task
chain, pipelines (|) and chord all pass the previous task's result to the next task by default. For tasks that only need to run in sequence there is no need to forward the result, and the default forwarding makes the signatures awkward to configure, so use si() (immutable signatures) instead of s() for the asynchronous calls. This doesn't stop you from passing whatever arguments you actually want. Example:
task = chain(stop.si(),
             init.si(args),
             start.si())
task()
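A minimal sketch contrasting s() and si(), reusing the add task from the 0X01 demo (nothing beyond that demo is assumed):
from celery import chain
from tasks import add

chain(add.s(2, 2), add.s(4))()      # partial signature: the second task receives 4 from the first, i.e. it runs add(4, 4)
chain(add.s(2, 2), add.si(4, 4))()  # immutable signature: the second task ignores the first task's result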