Django + Celery: Asynchronous Tasks and Asynchronous Storage

Environment: Windows 11, Python 3.9.2, Django 4.2.11, Celery 4.4.7, MySQL 8.1, Redis 3.0

Background: running a large number of tasks in a Django project, with the results saved to the database

Date: 2024-04-09

Description: scrape a novel asynchronously and save it to the database

1. Create the Django project and an app, and verify that it works

# Create the GetFiction directory first
pip install django==4.2.11 pymysql==1.1.0
django-admin startproject getfiction .
django-admin startapp getsection

Configure the MySQL connection, app registration, logging, and other settings

# getfiction/__init__.py

import pymysql

pymysql.install_as_MySQLdb()

# getfiction/settings.py

# Database connection

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'fictions',
        'USER': '****',
        'PASSWORD': '****',
        'HOST': '127.0.0.1',
        'PORT': '3306',
    }
}

# App registration

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'getsection',
]

# Other settings

LANGUAGE_CODE = 'zh-Hans'

TIME_ZONE = 'Asia/Shanghai'

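# Logging configuration

import logging

# Create the logger
logger = logging.getLogger("test")
# Set the logger's level. If it is not set here, the level is inherited from the parent logger;
# here the parent is the root logger, whose default level is logging.WARNING
logger.setLevel(logging.INFO)

# Create a file handler
fh = logging.FileHandler(filename="./test.log", encoding="utf8")
# Create a stream (console) handler
sh = logging.StreamHandler()

# Set the file handler's level: only ERROR and above are written to test.log
fh.setLevel(logging.ERROR)
# Set the stream handler's level (records below INFO are already dropped by the logger)
sh.setLevel(logging.DEBUG)

# Create the formatter for the file handler
ffmt = logging.Formatter(
    fmt="%(asctime)s - %(levelname)s - %(name)s - %(filename)s:%(lineno)d - %(message)s",
    datefmt="%Y/%m/%d %H:%M:%S"
)
# Create the formatter for the stream handler
sfmt = logging.Formatter(
    fmt="%(asctime)s - %(levelname)s - %(name)s - %(filename)s:%(lineno)d - %(message)s",
)

# Apply the ffmt formatter to the file handler fh
fh.setFormatter(ffmt)
# Apply the sfmt formatter to the stream handler sh
sh.setFormatter(sfmt)

# Attach the file handler to the logger
logger.addHandler(fh)
# Attach the stream handler to the logger
logger.addHandler(sh)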
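The same handler setup can also be written declaratively. Below is a sketch of an assumed equivalent using the standard library's `logging.config.dictConfig`. The keys `file_fmt`, `stream_fmt`, `file` and `console` are illustrative names, not from the original. `"delay": True` is an added assumption that postpones creating `test.log` until the first ERROR record. In a Django settings module the same dictionary could be assigned to `LOGGING`, which Django passes to `dictConfig` at startup. Use one approach or the other, not both, or the handlers will be attached twice.

```python
import logging
import logging.config

FMT = "%(asctime)s - %(levelname)s - %(name)s - %(filename)s:%(lineno)d - %(message)s"

# Assumed dictConfig equivalent of the imperative setup above
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "file_fmt": {"format": FMT, "datefmt": "%Y/%m/%d %H:%M:%S"},
        "stream_fmt": {"format": FMT},
    },
    "handlers": {
        "file": {
            "class": "logging.FileHandler",
            "filename": "./test.log",
            "encoding": "utf8",
            "delay": True,  # do not create the file until the first record is written
            "level": "ERROR",
            "formatter": "file_fmt",
        },
        "console": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
            "formatter": "stream_fmt",
        },
    },
    "loggers": {
        # the logger's own level (INFO) filters before the handlers see the record
        "test": {"handlers": ["file", "console"], "level": "INFO"},
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
logger = logging.getLogger("test")
```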

Configure the home page route and view function

# getfiction/urls.py

from django.contrib import admin
from django.urls import path
from getsection.views import index

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', index),
]

# getsection/views.py

from django.shortcuts import HttpResponse

def index(request):
    # Test home page
    return HttpResponse("hello")

Test it (Postman or a browser both work):

python manage.py runserver 0.0.0.0:8000

In Postman (or a browser), the response body should be hello.

2. Integrate Celery into Django to process tasks asynchronously and store the task results in MySQL

Celery configuration: use Redis as the broker and the Django ORM (via django-celery-results) as the result backend, and register the Celery-related apps

# getfiction/celery.py

from __future__ import absolute_import, unicode_literals
from celery import Celery
import os

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'getfiction.settings')  # point Celery at the Django settings
app = Celery('djcelery', broker='redis://127.0.0.1:6379/0', backend='django-db')  # Redis broker, django-celery-results backend
app.autodiscover_tasks()  # discover the tasks.py module in each installed app

# getfiction/__init__.py

from __future__ import absolute_import, unicode_literals
from .celery import app as celery_app
import pymysql

__all__ = ['celery_app']
pymysql.install_as_MySQLdb()

# getfiction/settings.py

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'django_celery_results',
    'getsection',
    'djcelery',
]

Install and start Redis (Windows build): Release 3.0.504 · microsoftarchive/redis (github.com)

Install the redis and eventlet Python packages: pip install redis eventlet
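Before you start the worker, it can help to confirm that Redis is actually listening. The sketch below assumes the default address 127.0.0.1:6379 used in the broker URL. `redis_ping` is a hypothetical helper. It sends a raw RESP `PING` over a plain socket, so it needs no extra packages.

```python
import socket


def redis_ping(host: str = "127.0.0.1", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a Redis server at host:port answers PING with +PONG."""
    # PING encoded as a RESP array containing one bulk string
    request = b"*1\r\n$4\r\nPING\r\n"
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            reply = sock.recv(64)
    except OSError:
        # connection refused, timeout, unreachable host, ...
        return False
    return reply.startswith(b"+PONG")
```

If `redis_ping()` returns `False`, the Redis service is not running or not reachable, and the Celery worker will not be able to reach the broker either.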

Install the Celery-related packages:

pip install celery django-celery django-celery-results

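Errors encountered when starting up for testing:

Error 1: cannot import name 'ugettext_lazy' from 'django.utils.translation'

ugettext_lazy was deprecated in Django 3.0 and removed in Django 4.0. The fix is therefore to patch the installed third-party package that still imports it (not Django itself).

Replace ugettext_lazy with gettext_lazy: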

# from django.utils.translation import ugettext_lazy as _
from django.utils.translation import gettext_lazy as _

Restart the project and test again.

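Error 2: ModuleNotFoundError: No module named 'celery.five'

This is a version incompatibility: celery.five was removed in Celery 5, so downgrade Celery to 4.4.7:

pip install celery==4.4.7

Restart the project and test again.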

Error 3: cannot import name 'force_unicode' from 'django.utils.encoding'

Fix it the same way as error 1, replacing force_unicode with force_str:

# from django.utils.encoding import force_unicode as force_text  # noqa
from django.utils.encoding import force_str as force_text  # noqa

Restart and continue.

Error 4: except self._encode_error, exc:

anyjson is not compatible with Python 3 (this is Python 2 syntax), so several places in its source need to be patched:

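# GetFiction\venv\lib\site-packages\anyjson\__init__.py
# (line numbers refer to the installed anyjson file)

# line 67: basestring -> str
if isinstance(modinfo["encerror"], str):
# line 69: basestring -> str
if isinstance(modinfo["decerror"], str):

# lines 88-89: Python 3 "except ... as" syntax; re-raise with the original traceback
except self._encode_error as exc:
    raise TypeError(*exc.args).with_traceback(sys.exc_info()[2])

# lines 100-101
except self._decode_error as exc:
    raise ValueError(*exc.args).with_traceback(sys.exc_info()[2])

# line 120: print is a function in Python 3
print("Running anyjson as a stand alone script is not supported")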

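Error 5: from django.utils.translation import ugettext_lazy as _ (same as error 1: use gettext_lazy instead)

Error 6: from django.utils.translation import ungettext, ugettext as _ (similar to error 1). Note that ungettext, the plural form, maps to ngettext rather than gettext. Aliasing it keeps the rest of the file working:

from django.utils.translation import ngettext as ungettext, gettext as _
# from django.utils.translation import ungettext, ugettext as _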

Error 7: cannot import name 'force_unicode' from 'django.utils.encoding' (same fix as error 3)

# venv/Lib/site-packages/djcelery/admin.py
from django.utils.encoding import force_str as force_text  # noqa

This completes the source patching.
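Instead of discovering each removed import through a new startup error, you can scan the installed packages up front. This is a rough sketch that assumes the packages live under the virtual environment's `site-packages`. `find_legacy_imports` and `LEGACY_NAMES` are hypothetical. The mapping covers the helpers that appear in this article plus a few related ones that no longer exist in Django 4.x. Only single-line `from django.utils.translation/encoding import ...` statements are detected, so parenthesised multi-line imports are missed.

```python
import re
from pathlib import Path

# Legacy Django helpers that no longer exist in Django 4.x, mapped to their replacements
LEGACY_NAMES = {
    "ugettext": "gettext",
    "ugettext_lazy": "gettext_lazy",
    "ungettext": "ngettext",
    "ungettext_lazy": "ngettext_lazy",
    "force_unicode": "force_str",
    "force_text": "force_str",
    "smart_text": "smart_str",
}

# Single-line "from django.utils.translation/encoding import ..." statements;
# commented-out lines do not match because the line must start with "from"
IMPORT_RE = re.compile(r"^\s*from\s+django\.utils\.(?:translation|encoding)\s+import\s+(.+)$")


def find_legacy_imports(root):
    """Yield (path, line_no, old_name, new_name) for each legacy name imported under root."""
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            match = IMPORT_RE.match(line)
            if not match:
                continue
            # drop a trailing comment and any surrounding parentheses
            names = match.group(1).split("#", 1)[0].strip().strip("()")
            for item in names.split(","):
                # keep the imported name, not the "as" alias
                imported = re.split(r"\s+as\s+", item.strip())[0]
                if imported in LEGACY_NAMES:
                    yield path, line_no, imported, LEGACY_NAMES[imported]
```

For example, iterating over `find_legacy_imports("venv/Lib/site-packages/djcelery")` and printing each tuple lists the file, line, and replacement for every import still to patch. Already-patched lines such as `force_str as force_text` are not reported, because only the name before `as` is checked.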

3. Add tasks, run them, and write the results to the database

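Database migration

python manage.py makemigrations   # generate migration files
python manage.py migrate          # apply migrations and create the tables (including django_celery_results_taskresult)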

Create the view function that dispatches the tasks:

# getsection/views.py

from django.shortcuts import HttpResponse
from getfiction.settings import logger
from getsection.tasks import getfictioninfo
from playwright.sync_api import sync_playwright


def index(request):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        base_url = "https://www.83ks.org"
        # https://www.83ks.org/read/196719/2535054.html 某一章的内容
        page.goto(f"{base_url}/book/196719/")
        element_href = page.query_selector_all("#list dl a")
        novel_href_dic = {}
        if element_href:
            for i in element_href[:5]:
                c = [i.get_attribute('href')] + i.get_attribute('title').split(" ")[:0:-1]
                if len(c) < 3:
                    logger.error(str(c) + "该章节存在错误")
                elif len(c) == 4:
                    c.remove("lwxs.com")
                    novel_href_dic[c[-1]] = c[:2]
                else:
                    novel_href_dic[c[-1]] = c[:2]
        browser.close()
    for secindex, url_and_secname in novel_href_dic.items():
        getfictioninfo.delay(secindex, url_and_secname)
    return HttpResponse("OK")

Install Playwright: pip install playwright, then download the browser binaries with playwright install chromium

# getsection/tasks.py

from __future__ import absolute_import
from celery import shared_task
from playwright.sync_api import sync_playwright
from getfiction.settings import logger



@shared_task
def getfictioninfo(secindex, url_and_secname):
    # e.g. secindex = "第68章", url_and_secname = ['/read/196719/1660838.html', '势不可挡']
    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            section_page = browser.new_page()
            section_url = "https://www.83ks.org" + url_and_secname[0]
            try:
                section_page.goto(section_url)
            except Exception as e:
                logger.error(str(secindex) + " " + str(e))
                # re-raise so the task is recorded as FAILURE instead of scraping an unloaded page
                raise
            section_page_element = section_page.query_selector_all("#content p")
            # chapter heading line, then the paragraphs joined by spaces
            section_name = secindex + " " + url_and_secname[1] + "\n"
            logger.info(section_name)
            for i in section_page_element:
                section_name += i.inner_text() + " "
            # the return value is stored by the django-db result backend
            return section_name
        finally:
            browser.close()

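Start the Celery worker: celery -A getfiction worker -l info -P eventlet (the eventlet pool is used because Celery 4.x no longer officially supports Windows and the default prefork pool does not work reliably there)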

Start Django: python manage.py runserver 0.0.0.0:8000

Visit the home page to trigger the tasks.

The task results are stored in the django_celery_results_taskresult table.

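Remaining issue: the package versions are not fully compatible (several installed packages had to be patched by hand), so this setup still needs improvement.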

### Configuring Celery

Before a Django project can run asynchronous tasks with Celery, the environment has to be set up. This means installing the required packages and adding the corresponding configuration.

#### Installing dependencies

Use pip to install `celery` and a client for its message broker (such as Redis):

```bash
pip install celery redis
```

#### Setting up the Celery instance for the Django project

Create a new Python file that initializes the Celery app. Here it is named `celery.py` and placed inside the project package, next to `settings.py`, so that the relative import `from .celery import app` in `__init__.py` works[^2].

```python
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'your_project_name.settings')

app = Celery('your_project_name')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
```

Next, edit the package's `__init__.py` so that the Celery app is always loaded when Django starts:

```python
# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app

__all__ = ('celery_app',)
```

#### Updating the settings

Edit `settings.py` to add the Celery parameters, such as the broker URL and the result backend[^1].

```python
CELERY_BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'

# Optional configurations, see http://celery.readthedocs.org/en/latest/configuration.html
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'UTC'
```

After these steps you can write asynchronous task functions in any Django app.

### Writing asynchronous tasks

In the app that needs asynchronous operations, create a `tasks` module and write task functions for the business logic. The following simple example sends email as a background job:

```python
from celery import shared_task
from django.core.mail import send_mail


@shared_task
def send_email(subject, message, recipient_list):
    """Send an email asynchronously."""
    try:
        result = send_mail(
            subject=subject,
            message=message,
            from_email=None,  # Use DEFAULT_FROM_EMAIL setting
            recipient_list=recipient_list,
            fail_silently=False,
        )
        return f'Successfully sent {result} emails.'
    except Exception as e:
        raise ValueError(f"Failed to send mail due to error: {str(e)}") from e
```

The task can then be invoked in two ways: triggered directly from a view or elsewhere, or run periodically by a scheduler.
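Passing `namespace='CELERY'` to `config_from_object` means that only Django settings starting with `CELERY_` are read. The prefix is removed and the rest is lowercased, so `CELERY_BROKER_URL` becomes `broker_url`. The function below is a simplified, hypothetical illustration of that mapping, not Celery's actual implementation.

```python
def celery_settings_from_django(settings, namespace="CELERY"):
    """Simplified illustration: keep NAMESPACE_* keys, strip the prefix, lowercase the rest."""
    prefix = namespace + "_"
    return {
        key[len(prefix):].lower(): value
        for key, value in settings.items()
        if key.startswith(prefix)
    }


# A few of the settings shown above, plus a non-Celery Django setting
DJANGO_SETTINGS = {
    "TIME_ZONE": "Asia/Shanghai",
    "CELERY_BROKER_URL": "redis://localhost:6379/0",
    "CELERY_RESULT_BACKEND": "redis://localhost:6379/0",
    "CELERY_TIMEZONE": "UTC",
}

celery_conf = celery_settings_from_django(DJANGO_SETTINGS)
print(celery_conf)
# {'broker_url': 'redis://localhost:6379/0', 'result_backend': 'redis://localhost:6379/0', 'timezone': 'UTC'}
```

This is also why the prefix matters: a setting such as `TIME_ZONE` is left to Django, while Celery's own time zone comes from `CELERY_TIMEZONE`.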