0X-1 Pitfall notes
A pitfall from 2019-7-6
Scenario: two tasks are dispatched to update different fields of the same database row. The symptom: either task A's update or task B's update is lost; some field always fails to persist. The problem only shows up when the number of workers is > 1, because with a single worker all tasks run sequentially, whereas multiple workers execute tasks in parallel.
After some digging, this turned out to be caused by Django ORM's save() method: when you call save() after an update, Django writes back all fields of the record, not just the one you changed. If another field was updated elsewhere before this transaction commits, the commit overwrites it with stale data.
Fix: tell save() which fields changed, so only those fields are written back:
obj.save(update_fields=['name'])
Reference: https://blog.csdn.net/yongche_shi/article/details/49096043
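A minimal sketch of the fix in a task context, assuming a hypothetical Record model with independent name and status fields; each task writes back only its own field:
from celery import shared_task
from main.models import Record  # hypothetical model with `name` and `status` fields

@shared_task
def update_name(pk, name):
    obj = Record.objects.get(pk=pk)
    obj.name = name
    obj.save(update_fields=['name'])    # only `name` is written back, not the whole row

@shared_task
def update_status(pk, status):
    obj = Record.objects.get(pk=pk)
    obj.status = status
    obj.save(update_fields=['status'])  # safe to run concurrently with update_name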
0X00 What is Celery
A task queue is a mechanism for distributing work across threads or machines.
The input to a task queue is a unit of work, called a task; dedicated worker processes constantly monitor the queue for new tasks to process.
Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task, the client adds a message to the queue, and the broker then delivers that message to a worker.
A Celery system can consist of multiple workers and brokers, giving way to high availability and horizontal scaling.
Celery is written in Python, but the protocol can be implemented in any language. So far there is RCelery for Ruby, node-celery for Node.js, and a PHP client; language interoperability can also be achieved using webhooks.
0X01 DEMO
1. Create an application, tasks.py:
from celery import Celery
app = Celery('tasks', broker='amqp://guest@localhost//')
# 'tasks' is the name of the current module; broker specifies the message broker to use

@app.task
def add(x, y):
    return x + y
2. Run the worker:
celery worker -A tasks -l info
3. With the worker running, the task can be called from another Python shell; tasks.py just needs to be importable (e.g. start the shell in the same directory):
from tasks import add
result = add.delay(4, 4)
"""
这个任务已经由之前启动的职程执行,并且你可以查看职程的控制台输出来验证。
调用任务会返回一个 AsyncResult 实例,可用于检查任务的状态,等待任务完成或获取返回值(如果任务失败,则为异常和回溯)。 但这个功能默认是不开启的,你需要设置一个 Celery 的结果后端,下一节将会详细介绍。
"""
result.ready() # 查看任务是否完成
result.result # 获取任务执行结果
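For ready()/result to actually return anything, the app needs a result backend; a minimal sketch using the built-in rpc backend (the configuration-file approach in the next step works just as well):
app = Celery('tasks', broker='amqp://guest@localhost//', backend='rpc://')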
4. Load the Celery configuration from a Python module:
app.config_from_object('celeryconfig')
The configuration file, celeryconfig.py:
BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'amqp://'
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT=['json']
CELERY_ENABLE_UTC = True
To verify that the configuration file is valid Python and free of syntax errors:
python -m celeryconfig
0X02 Using django-celery
1. Install:
apt-get install rabbitmq-server
pip install celery
pip install django-celery
2. Add it to INSTALLED_APPS:
INSTALLED_APPS = [
    'djcelery',
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
]
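Beyond INSTALLED_APPS, django-celery's usual setup also calls setup_loader() in settings and creates its database tables; a minimal sketch (the broker URL is assumed to be the local RabbitMQ default):
# settings.py
import djcelery
djcelery.setup_loader()
BROKER_URL = 'amqp://guest@localhost//'
# then create django-celery's tables:
# python manage.py migrate djcelery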
0X03 Using django-celery-beat
A demo: https://github.com/celery/celery/tree/master/examples/django/
Versions used: celery 4.3.0, django-celery-beat 1.5.0
1. Migrate the database:
python3 manage.py migrate
2. Run the web server
3. Run the celery worker and the beat scheduler:
python3 -m celery worker -A celery_test -l info --uid=113 --gid=116
celery -A celery_test beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler --detach # --detach runs it in the background
Too many bugs showed up in practice, so I'm not using it for now; periodic jobs can be handled with crontab at the OS level instead. For reference, scheduling with django-celery-beat is driven by its database models, as sketched below.
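A minimal sketch of defining a periodic task through django_celery_beat's models (the task path main.tasks.send_mail refers to the demo project below; the 10-second interval is arbitrary):
import json
from django_celery_beat.models import IntervalSchedule, PeriodicTask

schedule, _ = IntervalSchedule.objects.get_or_create(
    every=10,
    period=IntervalSchedule.SECONDS,
)
PeriodicTask.objects.create(
    interval=schedule,
    name='send_mail every 10 seconds',  # must be unique
    task='main.tasks.send_mail',
    args=json.dumps(['a']),
)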
0X04 Current setup
Use celery only:
pip install celery==4.3.0
# both the broker and the result backend can be redis or RabbitMQ
yum install rabbitmq-server
# yum install redis
Why not django-celery:
Installing django-celery pulls in celery 3.1 by default, while the latest celery release is 4.3; sticking with the latest release means fewer bugs. Plain celery already covers the need to dispatch tasks to a queue, so the Django-level wrapper isn't really necessary. django-celery-beat, on the other hand, is genuinely useful: it supports periodic task dispatch and lets you change the schedule at runtime.
Directory layout of the Django project:
celery_test/
├── celery_test
│ ├── celery_config.py # celery configuration file
│ ├── celery.py # the celery app instance
│ ├── __init__.py # celery is loaded here so it starts together with Django
│ ├── settings.py
│ ├── urls.py
│ └── wsgi.py
├── main
│ ├── admin.py
│ ├── apps.py
│ ├── __init__.py
│ ├── migrations
│ │ └── __init__.py
│ ├── models.py
│ ├── tasks.py # tasks live in a tasks.py file under each app
│ ├── tests.py
│ └── views.py
├── manage.py
1. celery_config.py
# -*- coding: utf-8 -*-
# @Time : 2019/7/3 17:17
# @Author : Zcs
# @File : celery_config.py
# `config` is assumed to be loaded elsewhere in the project and to hold the RabbitMQ password
CELERY_BROKER_URL = 'amqp://guest:%s@localhost//' % config['RABBIT_MQ']['PASSWD']      # RabbitMQ as broker
CELERY_RESULT_BACKEND = 'amqp://guest:%s@localhost//' % config['RABBIT_MQ']['PASSWD']  # RabbitMQ as result backend
CELERY_TIMEZONE = 'Asia/Shanghai'    # timezone
CELERY_TASK_SERIALIZER = 'pickle'    # task serializer
CELERY_RESULT_SERIALIZER = 'pickle'  # result serializer
CELERY_ACCEPT_CONTENT = ['json', 'pickle']
CELERY_RESULT_EXPIRES = 3600
# CELERY_WORKER_LOG_FORMAT = '%(asctime)s [%(module)s %(levelname)s] %(message)s'
# CELERY_WORKER_TASK_LOG_FORMAT = '%(task_id)s %(task_name)s %(message)s'
CELERY_WORKER_TASK_LOG_FORMAT = '%(message)s'
CELERY_WORKER_LOG_FORMAT = '%(message)s'
CELERY_TASK_EAGER_PROPAGATES = True
CELERY_WORKER_REDIRECT_STDOUTS = True
CELERY_WORKER_REDIRECT_STDOUTS_LEVEL = "INFO"
# CELERY_WORKER_HIJACK_ROOT_LOGGER = True
CELERY_WORKER_MAX_TASKS_PER_CHILD = 40
CELERY_TASK_SOFT_TIME_LIMIT = 3600
Full configuration reference: http://docs.celeryproject.org/en/latest/userguide/configuration.html?highlight=CELERYD_CONCURRENCY
2. celery.py
# -*- coding: utf-8 -*-
# @Time : 2019/6/28 9:54
# @Author : Zcs
# @File : celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery, platforms
from celery_test import settings
# allow the worker to be started as the root user
platforms.C_FORCE_ROOT = True
# this setting works around a bug on 64-bit Windows
os.environ.setdefault('FORKED_BY_MULTIPROCESSING', '1')
# set the default Django settings module for the celery command line; without it celery cannot find the apps' tasks
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'celery_test.settings')
# create the celery app instance
app = Celery('celery_test')
# load the configuration from celery_config
app.config_from_object('celery_test.celery_config')
# configuration keys use the CELERY_ prefix
app.namespace = 'CELERY'
# auto-discover tasks.py in every installed app
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

# (the @shared_task decorator lets you define tasks without a concrete Celery instance)
@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
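Since the comment above mentions @shared_task: it is the standard celery decorator for app-level code, so a reusable app does not have to import the project's Celery instance directly. A minimal sketch, equivalent to the tasks.py below, just in a different style:
# main/tasks.py, @shared_task style
from celery import shared_task

@shared_task
def send_mail(arg):
    return arg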
3. __init__.py
from __future__ import absolute_import, unicode_literals
# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app
__all__ = ('celery_app',)
4. tasks.py
# -*- coding: utf-8 -*-
# @Time : 2019/6/27 14:44
# @Author : Zcs
# @File : tasks.py
from celery_test.celery import app
import time
# the decorator below turns the function into a task; calling it with .delay() returns the
# task's unique id, and the function's return value is not handed back to the caller directly
# but stored in the result backend once the task finishes
@app.task
def send_mail(arg):
    # time.sleep(20)
    return arg
5. Run the project:
python3 manage.py runserver 0.0.0.0:8000
celery worker -A project_name -l info --uid=993 --gid=989 # start the worker; -A points at the celery app package, which in this example is "celery_test"
flower --port=5555 --broker='redis://127.0.0.1:6379/2' # run the flower monitor for celery, then open port 5555 in a browser
6. Call the task from a view:
from django.views import View
from .tasks import send_mail
from django.http import HttpResponse
class main_view(View):
    def get(self, request):
        r = send_mail.delay('a')  # r is an AsyncResult, not the task's return value; rendering it shows the task's unique id
        return HttpResponse(r)
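Because the view above only returns the task id, a second (hypothetical) view can look the task up later through celery's AsyncResult; a minimal sketch, assuming the URLconf passes task_id in:
from django.views import View
from django.http import HttpResponse
from celery.result import AsyncResult
from celery_test.celery import app

class result_view(View):
    def get(self, request, task_id):
        res = AsyncResult(task_id, app=app)       # look the task up by its unique id
        if res.ready():
            return HttpResponse(str(res.result))  # the return value stored in the result backend
        return HttpResponse('pending')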
Some of the celery worker command-line options:
Examples:
$ celery worker --app=proj -l info
$ celery worker -A proj -l info -Q hipri,lopri
$ celery worker -A proj --concurrency=4
$ celery worker -A proj --concurrency=1000 -P eventlet
$ celery worker --autoscale=10,0
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
Global Options:
-A APP, --app APP
-b BROKER, --broker BROKER
--result-backend RESULT_BACKEND
--loader LOADER
--config CONFIG
--workdir WORKDIR Optional directory to change to after detaching.
--no-color, -C
--quiet, -q
Worker Options:
-n HOSTNAME, --hostname HOSTNAME
Set custom hostname (e.g., 'w1@%h'). Expands: %h
(hostname), %n (name) and %d, (domain).
-D, --detach Start worker as a background process.
-S STATEDB, --statedb STATEDB
Path to the state database. The extension '.db' may be
appended to the filename. Default: None
-l LOGLEVEL, --loglevel LOGLEVEL
Logging level, choose between DEBUG, INFO, WARNING,
ERROR, CRITICAL, or FATAL.
-O OPTIMIZATION       Apply optimization profile. Supported: default, fair.
--prefetch-multiplier PREFETCH_MULTIPLIER
Set custom prefetch multiplier value for this worker
instance.
Pool Options:
-c CONCURRENCY, --concurrency CONCURRENCY
Number of child processes processing the queue. The
default is the number of CPUs available on your
system.
-P POOL, --pool POOL Pool implementation: prefork (default), eventlet,
gevent or solo.
-E, --task-events, --events
Send task-related events that can be captured by
monitors like celery events, celerymon, and others.
--time-limit TIME_LIMIT
Enables a hard time limit (in seconds int/float) for
tasks.
--soft-time-limit SOFT_TIME_LIMIT
Enables a soft time limit (in seconds int/float) for
tasks.
--max-tasks-per-child MAX_TASKS_PER_CHILD, --maxtasksperchild MAX_TASKS_PER_CHILD
Maximum number of tasks a pool worker can execute
before it's terminated and replaced by a new worker.
--max-memory-per-child MAX_MEMORY_PER_CHILD, --maxmemperchild MAX_MEMORY_PER_CHILD
Maximum amount of resident memory, in KiB, that may be
consumed by a child process before it will be replaced
by a new one. If a single task causes a child process
to exceed this limit, the task will be completed and
the child process will be replaced afterwards.
Default: no limit.
Queue Options:
--purge, --discard Purges all waiting tasks before the daemon is started.
**WARNING**: This is unrecoverable, and the tasks will
be deleted from the messaging server.
--queues QUEUES, -Q QUEUES
List of queues to enable for this worker, separated by
comma. By default all configured queues are enabled.
Example: -Q video,image
--exclude-queues EXCLUDE_QUEUES, -X EXCLUDE_QUEUES
List of queues to disable for this worker, separated
by comma. By default all configured queues are
enabled. Example: -X video,image.
--include INCLUDE, -I INCLUDE
Comma separated list of additional modules to import.
Example: -I foo.tasks,bar.tasks
Features:
--without-gossip Don't subscribe to other workers events.
--without-mingle Don't synchronize with other workers at start-up.
--without-heartbeat Don't send event heartbeats.
--heartbeat-interval HEARTBEAT_INTERVAL
Interval in seconds at which to send worker heartbeat
--autoscale AUTOSCALE
Enable autoscaling by providing max_concurrency,
min_concurrency. Example:: --autoscale=10,3 (always
keep 3 processes, but grow to 10 if necessary)
Daemonization Options:
-f LOGFILE, --logfile LOGFILE
Path to log file. If no logfile is specified, stderr
is used.
--pidfile PIDFILE Optional file used to store the process pid. The
program won't start if this file already exists and
the pid is still alive.
--uid UID User id, or user name of the user to run as after
detaching.
--gid GID Group id, or group name of the main group to change to
after detaching.
--umask UMASK Effective umask(1) (in octal) of the process after
detaching. Inherits the umask(1) of the parent process
by default.
--executable EXECUTABLE
Executable to use for the detached process.
Embedded Beat Options:
-B, --beat Also run the celery beat periodic task scheduler.
Please note that there must only be one instance of
this service. .. note:: -B is meant to be used for
development purposes. For production environment, you
need to start celery beat separately.
-s SCHEDULE_FILENAME, --schedule-filename SCHEDULE_FILENAME, --schedule SCHEDULE_FILENAME
Path to the schedule database if running with the -B
option. Defaults to celerybeat-schedule. The extension
".db" may be appended to the filename.
--scheduler SCHEDULER
Scheduler class to use. Default is
celery.beat.PersistentScheduler
7. You can require password login for redis or rabbitmq; just adjust the configuration.
Change the password of rabbitmq's default guest user (the password defaults to guest, and guest may only connect from localhost):
rabbitmqctl change_password guest your_password
Then update the corresponding settings in celery_config.py:
CELERY_BROKER_URL= 'amqp://guest:your_password@localhost//'
CELERY_RESULT_BACKEND = 'amqp://guest:your_password@localhost//'
redis:
CELERY_BROKER_URL= 'redis://:your_password@127.0.0.1:6379/2'
CELERY_RESULT_BACKEND = 'redis://:your_password@127.0.0.1:6379/2'
8. Further needs
1) Run multiple tasks as a chain:
https://celery.readthedocs.io/en/latest/userguide/canvas.html#chains
>>> from celery import chain
>>> from proj.tasks import add, mul
>>> # (4 + 4) * 8 * 10 = 640
>>> res = chain(add.s(4, 4), mul.s(8), mul.s(10))
>>> res
proj.tasks.add(4, 4) | proj.tasks.mul(8) | proj.tasks.mul(10)
>>> res().get()  # calling the chain dispatches it; get() waits for the final result
640
https://www.cnblogs.com/wdliu/p/9517535.html
2) Run multiple tasks as a chain, but without passing each result to the next task
chain, pipelines (|) and chord all pass the previous task's result to the next task by default. For tasks that only need to run in sequence there is no need to forward the result, and the default forwarding makes the signatures awkward to configure, so use si() (immutable signatures) instead of s() for the asynchronous calls. This doesn't stop you from passing whatever arguments you actually want. Example:
task = chain(stop.si(),
             init.si(args),
             start.si())
task()
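A minimal sketch contrasting s() and si(), reusing the add task from the 0X01 demo (nothing beyond that demo is assumed):
from celery import chain
from tasks import add

chain(add.s(2, 2), add.s(4))()      # partial signature: the second task receives 4 from the first, i.e. it runs add(4, 4)
chain(add.s(2, 2), add.si(4, 4))()  # immutable signature: the second task ignores the first task's result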