Using Celery with Django

0X-1 Pitfalls

Pitfall hit on 2019-7-6

Scenario: two tasks are dispatched that update different fields of the same row in a table. The problem: either task A's update or task B's update was lost, and some field always failed to persist. The issue only shows up when there is more than one worker, because with a single worker all tasks run sequentially, while with multiple workers the two tasks run in parallel.

It turned out to be caused by the Django ORM's save() method: after an object is modified, save() writes back every field of the row, not just the field that was changed. If another field was updated (by the other task) after the object was loaded but before this save() commits, the commit overwrites that field with stale data.

Fix: pass update_fields to save() so that only the changed field is written:

obj.save(update_fields=['name'])

Reference: https://blog.csdn.net/yongche_shi/article/details/49096043
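
To make the fix concrete, here is a minimal sketch of the pattern, assuming a hypothetical Report model with name and status fields (model and task names are made up): each task writes back only the column it changed, so two workers updating different fields of the same row no longer clobber each other.

from celery import shared_task

from myapp.models import Report  # hypothetical model with `name` and `status` fields


@shared_task
def update_name(report_id, new_name):
    report = Report.objects.get(pk=report_id)
    report.name = new_name
    # Only the `name` column is written; `status` stays untouched even if
    # another worker changed it after this row was loaded.
    report.save(update_fields=['name'])


@shared_task
def update_status(report_id, new_status):
    report = Report.objects.get(pk=report_id)
    report.status = new_status
    report.save(update_fields=['status'])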

0X00 What is Celery

A task queue is a mechanism for distributing work across threads or machines.

The input to a task queue is a unit of work called a task; dedicated worker processes continuously monitor the queue for new tasks to process.

Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task the client adds a message to the queue, and the broker then delivers that message to a worker.

A Celery system can consist of multiple workers and brokers, giving it high availability and horizontal scalability.

Celery is written in Python, but the protocol can be implemented in any language. So far there is RCelery implemented in Ruby, node-celery for Node.js, and a PHP client; language interoperability can also be achieved using webhooks.

0X01 DEMO

1. Write an application, tasks.py

from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')
#  'tasks' is the name of the current module; broker specifies the message broker to use

@app.task
def add(x, y):
    return x + y

2. Run the worker

celery worker -A tasks -l info

3. With the worker running, open another Python shell in the same directory (so that tasks.py is importable) and call the task directly:

from tasks import add

result = add.delay(4, 4)
"""
The task has now been executed by the worker you started earlier; you can verify this by looking at the worker's console output.
Calling a task returns an AsyncResult instance, which can be used to check the task's state, wait for it to finish, or get its return value (or, if the task failed, the exception and traceback). This is not enabled by default, though: you need to configure a Celery result backend, which the next section covers in detail.
"""

result.ready()  # check whether the task has finished
result.result  # get the task's return value
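
As noted above, ready() and result only work once a result backend is configured. A minimal sketch, assuming the RPC backend over the same RabbitMQ broker (a Redis URL works the same way):

from celery import Celery

# backend tells Celery where to store task state and return values;
# without it AsyncResult cannot be queried
app = Celery('tasks',
             broker='amqp://guest@localhost//',
             backend='rpc://')


@app.task
def add(x, y):
    return x + y

With this in place, add.delay(4, 4).get(timeout=10) returns 8 instead of failing because no backend is configured.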

4. Load the Celery configuration from a .py module

app.config_from_object('celeryconfig')

The configuration file, celeryconfig.py:

BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'amqp://'

CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT=['json']
CELERY_ENABLE_UTC = True

Verify that the configuration module is syntactically valid:

python -m celeryconfig

0X02 Using django-celery

1. Install

apt-get install rabbitmq-server
pip install celery
pip install django-celery

2. Add it to INSTALLED_APPS

INSTALLED_APPS = [
    'djcelery',
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
]
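
For reference, a minimal sketch of the remaining django-celery (3.x) settings; setup_loader() is the call that hooks the old celery 3.1 loader into Django, and the database result backend shown here is one of the backends django-celery ships with (assumed defaults, adjust to your broker).

# settings.py
import djcelery

djcelery.setup_loader()

BROKER_URL = 'amqp://guest@localhost//'
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'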

0X03 Using django-celery-beat

A demo: https://github.com/celery/celery/tree/master/examples/django/

celery-4.3.0
django-celery-beat-1.5.0

1. Migrate the database

python3 manage.py migrate

2. Run the web server (python3 manage.py runserver)

3. Run the celery worker and the beat scheduler

python3 -m celery worker -A celery_test -l info --uid=113 --gid=116

celery -A celery_test beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler --detach  # --detach runs beat in the background
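
With the DatabaseScheduler, periodic tasks are stored as rows in the database and can be created or re-scheduled at runtime, through the admin or the ORM. A minimal sketch, assuming the send_mail task defined later in this post:

import json

from django_celery_beat.models import IntervalSchedule, PeriodicTask

# an interval of 30 seconds, created once and reused
schedule, _ = IntervalSchedule.objects.get_or_create(
    every=30,
    period=IntervalSchedule.SECONDS,
)

PeriodicTask.objects.get_or_create(
    interval=schedule,
    name='send a test mail every 30s',  # human-readable unique name
    task='main.tasks.send_mail',        # dotted path of the registered task
    args=json.dumps(['a']),             # JSON-encoded positional arguments
)

Beat picks the new row up on its next tick; editing the schedule or disabling the row in the Django admin takes effect without restarting the worker.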

Too many bugs turned up in practice, so I am shelving it for now. Periodic jobs can instead be scheduled with crontab at the Linux level.

0X04 Latest setup

Using celery only:

pip install celery==4.3.0
# both redis and RabbitMQ work as BROKER and BACKEND
yum install rabbitmq-server
# yum install redis

Why not django-celery:

Installing django-celery pulls in celery 3.1 by default, while the latest celery is 4.3; to minimise bugs it is safer to stay on the latest version. Plain celery already covers the need to dispatch tasks to a queue, so the extra Django-level wrapper is not really necessary. django-celery-beat, on the other hand, is quite useful: it can dispatch periodic tasks and lets you change their schedule.

Directory layout of the Django project:

celery_test/
├── celery_test
│   ├── celery_config.py # Celery configuration
│   ├── celery.py # the Celery app instance
│   ├── __init__.py # celery is loaded here so it starts together with Django
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── main
│   ├── admin.py
│   ├── apps.py
│   ├── __init__.py
│   ├── migrations
│   │   └── __init__.py
│   ├── models.py
│   ├── tasks.py # tasks live in a tasks.py file inside each app
│   ├── tests.py
│   └── views.py
├── manage.py

1. celery_config.py

# -*- coding: utf-8 -*-
# @Time    : 2019/7/3 17:17
# @Author  : Zcs
# @File    : celery_config.py

# NOTE: `config` is assumed to be a dict loaded elsewhere (e.g. from a secrets file)
# that holds the RabbitMQ password.
CELERY_BROKER_URL = 'amqp://guest:%s@localhost//' % config['RABBIT_MQ']['PASSWD']  # RabbitMQ as broker
CELERY_RESULT_BACKEND = 'amqp://guest:%s@localhost//' % config['RABBIT_MQ']['PASSWD']  # RabbitMQ as result backend
CELERY_TIMEZONE = 'Asia/Shanghai'  # timezone
CELERY_TASK_SERIALIZER = 'pickle'  # task serializer
CELERY_RESULT_SERIALIZER = 'pickle'  # result serializer
CELERY_ACCEPT_CONTENT = ['json', 'pickle']
CELERY_RESULT_EXPIRES = 3600
# CELERY_WORKER_LOG_FORMAT = '%(asctime)s [%(module)s %(levelname)s] %(message)s'
# CELERY_WORKER_TASK_LOG_FORMAT = '%(task_id)s %(task_name)s %(message)s'
CELERY_WORKER_TASK_LOG_FORMAT = '%(message)s'
CELERY_WORKER_LOG_FORMAT = '%(message)s'
CELERY_TASK_EAGER_PROPAGATES = True
CELERY_WORKER_REDIRECT_STDOUTS = True
CELERY_WORKER_REDIRECT_STDOUTS_LEVEL = "INFO"
# CELERY_WORKER_HIJACK_ROOT_LOGGER = True
CELERY_WORKER_MAX_TASKS_PER_CHILD = 40
CELERY_TASK_SOFT_TIME_LIMIT = 3600

Full configuration reference: http://docs.celeryproject.org/en/latest/userguide/configuration.html?highlight=CELERYD_CONCURRENCY

2. celery.py

# -*- coding: utf-8 -*-
# @Time    : 2019/6/28 9:54
# @Author  : Zcs
# @File    : celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery, platforms
from celery_test import settings

# allow the worker to be started as root
platforms.C_FORCE_ROOT = True

# needed to work around a multiprocessing bug on win64
os.environ.setdefault('FORKED_BY_MULTIPROCESSING', '1')

# set the default Django settings module for the celery command line program; without it celery cannot find each app's tasks
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'celery_test.settings')

# create the Celery app instance
app = Celery('celery_test')

# load the settings from celery_config; namespace='CELERY' makes Celery read
# every key that starts with CELERY_
app.config_from_object('celery_test.celery_config', namespace='CELERY')

# auto-discover tasks.py in every installed app
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

# the @shared_task decorator lets you define tasks without a concrete Celery app instance

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))

3. __init__.py

from __future__ import absolute_import, unicode_literals

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app

__all__ = ('celery_app',)

4. tasks.py

# -*- coding: utf-8 -*-
# @Time    : 2019/6/27 14:44
# @Author  : Zcs
# @File    : tasks.py
from celery_test.celery import app
import time


# The decorator turns the function into a task: calling it asynchronously yields the
# task's unique id, and the function's return value is not handed back to the caller
# directly but stored in the result backend once the task finishes.
@app.task
def send_mail(arg):
    #time.sleep(20)
    return arg
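
Besides delay(), apply_async() accepts execution options such as a countdown or an explicit queue. A small sketch using the task above (the hipri queue name is made up; a worker would have to consume it via -Q hipri):

from main.tasks import send_mail

# delay() is shorthand for apply_async() with just the task arguments
send_mail.delay('a')

# apply_async() exposes execution options
send_mail.apply_async(args=['a'], countdown=10, queue='hipri')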

5. Run the project

python3 manage.py runserver 0.0.0.0:8000
celery worker -A project_name -l info --uid=993 --gid=989  # start the worker; -A points at the celery app package, "celery_test" in this example
flower --port=5555 --broker='redis://127.0.0.1:6379/2'  # start the flower monitor, then open port 5555 in a browser

6. Call the task

from django.views import View
from .tasks import send_mail
from django.http import HttpResponse


class main_view(View):

    def get(self, request):
        r = send_mail.delay('a')  # r is an AsyncResult; rendered as a string it is the task's unique id, not the return value
        return HttpResponse(r)
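
Because delay() hands back only an AsyncResult (whose string form is the task id), the actual return value has to be fetched from the result backend later. A minimal sketch of looking a task up again by its id; the check_task helper is hypothetical:

from celery.result import AsyncResult

from celery_test.celery import app


def check_task(task_id):
    # look the task up in the configured result backend by its id
    res = AsyncResult(task_id, app=app)
    if res.ready():
        return res.result  # return value of send_mail, or the exception if it failed
    return 'still running, state=%s' % res.state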

Some options for running the celery worker:

Examples:

        $ celery worker --app=proj -l info
        $ celery worker -A proj -l info -Q hipri,lopri

        $ celery worker -A proj --concurrency=4
        $ celery worker -A proj --concurrency=1000 -P eventlet
        $ celery worker --autoscale=10,0

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

Global Options:
  -A APP, --app APP
  -b BROKER, --broker BROKER
  --result-backend RESULT_BACKEND
  --loader LOADER
  --config CONFIG
  --workdir WORKDIR     Optional directory to change to after detaching.
  --no-color, -C
  --quiet, -q

Worker Options:
  -n HOSTNAME, --hostname HOSTNAME
                        Set custom hostname (e.g., 'w1@%h'). Expands: %h
                        (hostname), %n (name) and %d, (domain).
  -D, --detach          Start worker as a background process.
  -S STATEDB, --statedb STATEDB
                        Path to the state database. The extension '.db' may be
                        appended to the filename. Default: None
  -l LOGLEVEL, --loglevel LOGLEVEL
                        Logging level, choose between DEBUG, INFO, WARNING,
                        ERROR, CRITICAL, or FATAL.
  -O OPTIMIZATION       Apply optimization profile. Supported: default, fair
  --prefetch-multiplier PREFETCH_MULTIPLIER
                        Set custom prefetch multiplier value for this worker
                        instance.

Pool Options:
  -c CONCURRENCY, --concurrency CONCURRENCY
                        Number of child processes processing the queue. The
                        default is the number of CPUs available on your
                        system.
  -P POOL, --pool POOL  Pool implementation: prefork (default), eventlet,
                        gevent or solo.
  -E, --task-events, --events
                        Send task-related events that can be captured by
                        monitors like celery events, celerymon, and others.
  --time-limit TIME_LIMIT
                        Enables a hard time limit (in seconds int/float) for
                        tasks.
  --soft-time-limit SOFT_TIME_LIMIT
                        Enables a soft time limit (in seconds int/float) for
                        tasks.
  --max-tasks-per-child MAX_TASKS_PER_CHILD, --maxtasksperchild MAX_TASKS_PER_CHILD
                        Maximum number of tasks a pool worker can execute
                        before it's terminated and replaced by a new worker.
  --max-memory-per-child MAX_MEMORY_PER_CHILD, --maxmemperchild MAX_MEMORY_PER_CHILD
                        Maximum amount of resident memory, in KiB, that may be
                        consumed by a child process before it will be replaced
                        by a new one. If a single task causes a child process
                        to exceed this limit, the task will be completed and
                        the child process will be replaced afterwards.
                        Default: no limit.

Queue Options:
  --purge, --discard    Purges all waiting tasks before the daemon is started.
                        **WARNING**: This is unrecoverable, and the tasks will
                        be deleted from the messaging server.
  --queues QUEUES, -Q QUEUES
                        List of queues to enable for this worker, separated by
                        comma. By default all configured queues are enabled.
                        Example: -Q video,image
  --exclude-queues EXCLUDE_QUEUES, -X EXCLUDE_QUEUES
                        List of queues to disable for this worker, separated
                        by comma. By default all configured queues are
                        enabled. Example: -X video,image.
  --include INCLUDE, -I INCLUDE
                        Comma separated list of additional modules to import.
                        Example: -I foo.tasks,bar.tasks

Features:
  --without-gossip      Don't subscribe to other workers events.
  --without-mingle      Don't synchronize with other workers at start-up.
  --without-heartbeat   Don't send event heartbeats.
  --heartbeat-interval HEARTBEAT_INTERVAL
                        Interval in seconds at which to send worker heartbeat
  --autoscale AUTOSCALE
                        Enable autoscaling by providing max_concurrency,
                        min_concurrency. Example:: --autoscale=10,3 (always
                        keep 3 processes, but grow to 10 if necessary)

Daemonization Options:
  -f LOGFILE, --logfile LOGFILE
                        Path to log file. If no logfile is specified, stderr
                        is used.
  --pidfile PIDFILE     Optional file used to store the process pid. The
                        program won't start if this file already exists and
                        the pid is still alive.
  --uid UID             User id, or user name of the user to run as after
                        detaching.
  --gid GID             Group id, or group name of the main group to change to
                        after detaching.
  --umask UMASK         Effective umask(1) (in octal) of the process after
                        detaching. Inherits the umask(1) of the parent process
                        by default.
  --executable EXECUTABLE
                        Executable to use for the detached process.

Embedded Beat Options:
  -B, --beat            Also run the celery beat periodic task scheduler.
                        Please note that there must only be one instance of
                        this service. .. note:: -B is meant to be used for
                        development purposes. For production environment, you
                        need to start celery beat separately.
  -s SCHEDULE_FILENAME, --schedule-filename SCHEDULE_FILENAME, --schedule SCHEDULE_FILENAME
                        Path to the schedule database if running with the -B
                        option. Defaults to celerybeat-schedule. The extension
                        ".db" may be appended to the filename.
  --scheduler SCHEDULER
                        Scheduler class to use. Default is
                        celery.beat.PersistentScheduler

7. Password-protect redis or rabbitmq; only a small configuration change is needed

Change the password of rabbitmq's default guest user (the default guest/guest account only allows local connections):

rabbitmqctl change_password guest your_password

After changing it, update the settings in celery_config.py:

CELERY_BROKER_URL= 'amqp://guest:your_password@localhost//'
CELERY_RESULT_BACKEND = 'amqp://guest:your_password@localhost//'

For redis:

CELERY_BROKER_URL= 'redis://:your_password@127.0.0.1:6379/2'
CELERY_RESULT_BACKEND = 'redis://:your_password@127.0.0.1:6379/2'

8. Further needs

1) Running several tasks as a chain:

https://celery.readthedocs.io/en/latest/userguide/canvas.html#chains

>>> from celery import chain
>>> from proj.tasks import add, mul

>>> # (4 + 4) * 8 * 10
>>> res = chain(add.s(4, 4), mul.s(8), mul.s(10))
proj.tasks.add(4, 4) | proj.tasks.mul(8) | proj.tasks.mul(10)

https://www.cnblogs.com/wdliu/p/9517535.html
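
The example above only builds the chain; to actually run it, call the chain object and use get() on the AsyncResult it returns. A minimal sketch with the same add/mul tasks:

from celery import chain

from proj.tasks import add, mul

# (4 + 4) * 8 * 10; each task receives the previous task's return value
# as its first argument
res = chain(add.s(4, 4), mul.s(8), mul.s(10))()
print(res.get(timeout=30))  # 640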

2) Running several tasks as a chain without passing each result to the next task

chain, pipes, and chord all pass the previous task's result to the next task by default. For tasks that only need to run one after another there is no point in passing results along, and the implicit result argument also makes your own parameters harder to arrange. Use si() (an immutable signature) instead of s() for the asynchronous calls; this does not stop you from passing whatever arguments you actually want, for example:

task = chain(stop.si(),
             init.si(args),
             start.si())
task()

 
