Last time I got stuck on how the buffered data in Redis gets synced to MySQL. It turns out you don't need to write any extra program or command to trigger the sync yourself: the beauty is that once Celery is configured and its services are running, it does this work automatically.
A half-translated Chinese version of the Celery docs (reading the full English docs is tough going, with so much jargon):
http://docs.jinkan.org/docs/celery/
Here is the introduction:
Celery - Distributed Task Queue
Celery is a simple, flexible, and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system.
It's a task queue with focus on real-time processing, while also supporting task scheduling.
Celery has a large and diverse community of users and contributors; you can join us on IRC or the mailing-list.
Celery is Open Source and licensed under the BSD License.
The relevant file structure in this project:
ichnaea.async (note: not under the egg; I don't know why it isn't the copy under the egg that runs)
- app.py
- config.py
- settings.py
- task.py
It looks a lot like the contents of the webapp folder.
A closer read of the docs confirms that many parts are indeed cut from the same cloth.
Start a worker with the following command:
ICHNAEA_CFG=location.ini bin/celery -A ichnaea.async.app:celery_app worker \
-Ofair --no-execv --without-mingle --without-gossip
The meaning of each option is explained in detail in the docs.
start...
redis_uri is: redis://localhost:6379/0
-------------- celery@sa-VirtualBox v3.1.23 (Cipater)
---- **** -----
--- * *** * -- Linux-4.4.0-31-generic-x86_64-with-Ubuntu-16.04-xenial
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: ichnaea.async.app:0x7fdce337aa10
- ** ---------- .> transport: redis://localhost:6379/0
- ** ---------- .> results: redis://localhost:6379/0
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ----
--- ***** ----- [queues]
-------------- .> celery_blue exchange=celery(direct) key=celery_blue
.> celery_cell exchange=celery(direct) key=celery_cell
.> celery_content exchange=celery(direct) key=celery_content
.> celery_default exchange=celery(direct) key=celery_default
.> celery_export exchange=celery(direct) key=celery_export
.> celery_incoming exchange=celery(direct) key=celery_incoming
.> celery_monitor exchange=celery(direct) key=celery_monitor
.> celery_ocid exchange=celery(direct) key=celery_ocid
.> celery_reports exchange=celery(direct) key=celery_reports
.> celery_wifi exchange=celery(direct) key=celery_wifi
[2016-08-19 11:53:23,870: WARNING/MainProcess] celery@sa-VirtualBox ready.
Seeing this block of output means the worker started successfully.
However, while it ran there was no sign of the fabled 'periodic actions' being configured or executed.
The database still showed no activity whatsoever.
Then I finally discovered that Celery has a magical beat:
http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html
celery beat is a scheduler. It kicks off tasks at regular intervals, which are then executed by the worker nodes available in the cluster.
By default the entries are taken from the CELERYBEAT_SCHEDULE setting, but custom stores can also be used, like storing the entries in an SQL database.
You have to ensure only a single scheduler is running for a schedule at a time, otherwise you would end up with duplicate tasks. Using a centralized approach means the schedule does not have to be synchronized, and the service can operate without using locks.
To make the add task run every 30 seconds:
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    # Executes every 30 seconds
    'add-every-30-seconds': {
        'task': 'tasks.add',
        'schedule': timedelta(seconds=30),
        'args': (16, 16),
    },
}
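What beat does with each entry is conceptually simple: remember when the task was last sent and fire it again once the interval has elapsed. A minimal pure-Python sketch of that idea (this Entry class is made up for illustration, it is not Celery's actual implementation):

```python
from datetime import datetime, timedelta

# Hypothetical miniature of the bookkeeping a beat-style scheduler keeps
# per schedule entry: the interval and the last time the task was sent.
class Entry:
    def __init__(self, name, interval):
        self.name = name
        self.interval = interval  # a timedelta
        self.last_run = None      # never sent yet

    def is_due(self, now):
        """Due immediately on first check, then once per interval."""
        if self.last_run is None:
            return True
        return now - self.last_run >= self.interval

entry = Entry('tasks.add', timedelta(seconds=30))
t0 = datetime(2016, 8, 19, 12, 0, 0)
assert entry.is_due(t0)       # first check: due right away
entry.last_run = t0
assert not entry.is_due(t0 + timedelta(seconds=10))
assert entry.is_due(t0 + timedelta(seconds=30))
```

The real scheduler also persists last_run to disk (the celerybeat-schedule file mentioned below), so restarts don't re-fire everything.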
So where are the tasks in our project, and where is each one's period configured?
/ProgFile/ichnaea-for-liuqiao/ichnaea/ichnaea/async/task.py:
if enabled and cls._schedule:
    app.conf.CELERYBEAT_SCHEDULE.update(cls.beat_config())
I found those lines by searching along the lines of the doc example.
beat_config looks like the prime suspect:
@classmethod
def beat_config(cls):
    """
    Returns the beat schedule for this task, taking into account
    the optional shard_model to create multiple schedule entries.
    """
    if cls._shard_model is None:
        return {cls.shortname(): {
            'task': cls.name,
            'schedule': cls._schedule,
        }}
    result = {}
    for shard_id in cls._shard_model.shards().keys():
        result[cls.shortname() + '_' + shard_id] = {
            'task': cls.name,
            'schedule': cls._schedule,
            'kwargs': {'shard_id': shard_id},
        }
    return result
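To see the shape of the dict this feeds into CELERYBEAT_SCHEDULE, here is a self-contained imitation; FakeTask, FakeShardModel, and the shard ids 'ne' and 'sw' are all made-up stand-ins for the project's real task class and shard model:

```python
from datetime import timedelta

# Toy shard model: two hypothetical shards keyed by id.
class FakeShardModel:
    @staticmethod
    def shards():
        return {'ne': object(), 'sw': object()}

class FakeTask:
    name = 'data.tasks.update_wifi'
    _schedule = timedelta(seconds=32)
    _shard_model = FakeShardModel

    @classmethod
    def shortname(cls):
        return cls.name.split('.')[-1]

    @classmethod
    def beat_config(cls):
        # Same shape as the method quoted above: one entry per shard,
        # each passing its shard_id to the task as a kwarg.
        if cls._shard_model is None:
            return {cls.shortname(): {
                'task': cls.name,
                'schedule': cls._schedule,
            }}
        result = {}
        for shard_id in cls._shard_model.shards().keys():
            result[cls.shortname() + '_' + shard_id] = {
                'task': cls.name,
                'schedule': cls._schedule,
                'kwargs': {'shard_id': shard_id},
            }
        return result

config = FakeTask.beat_config()
assert sorted(config) == ['update_wifi_ne', 'update_wifi_sw']
assert config['update_wifi_ne']['kwargs'] == {'shard_id': 'ne'}
```

This would also explain why one task name shows up many times in a row in the worker log further down: each shard gets its own beat entry.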
Now to track down the task and its schedule.
The truth: the real tasks live under data/tasks. Each one inherits from this base class via the decorator, which also sets its period. For example:
@celery_app.task(base=BaseTask, bind=True, queue='celery_reports',
                 _countdown=2, expires=20, _schedule=timedelta(seconds=32))
def update_incoming(self):
    print 'update_incoming'
    export.IncomingQueue(self)(export_reports)
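The registration mechanism can be imitated in a few lines: a decorator records the _schedule argument and adds a beat entry keyed by the task name, which is roughly what BaseTask plus the update code in task.py do together. Everything below (periodic_task and this standalone schedule dict) is an illustrative sketch, not the project's actual API:

```python
from datetime import timedelta

CELERYBEAT_SCHEDULE = {}

# Hypothetical decorator: stores the period passed at decoration time
# and registers a beat entry, mimicking BaseTask + CELERYBEAT_SCHEDULE.update.
def periodic_task(name, _schedule):
    def decorator(func):
        CELERYBEAT_SCHEDULE[name] = {
            'task': name,
            'schedule': _schedule,
        }
        return func
    return decorator

@periodic_task('data.tasks.update_incoming', _schedule=timedelta(seconds=32))
def update_incoming():
    return 'update_incoming'

# Decorating the function was enough to register the schedule entry.
assert 'data.tasks.update_incoming' in CELERYBEAT_SCHEDULE
```

So nothing ever calls these functions directly; importing the module registers them, and beat does the rest.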
It's still unclear to me what this task actually accomplishes.
Never mind; let's see how to start beat.
Starting the Scheduler
To start the celery beat service:
$ celery -A proj beat
Here proj is ichnaea.async.app:celery_app; without the app nothing works.
You can also embed beat inside the worker by enabling the workers -B option, this is convenient if you will never run more than one worker node, but it's not commonly used and for that reason is not recommended for production use:
$ celery -A proj worker -B    (not recommended, per the docs)
Beat needs to store the last run times of the tasks in a local database file (named celerybeat-schedule by default), so it needs access to write in the current directory, or alternatively you can specify a custom location for this file:
$ celery -A proj beat -s /home/celery/var/run/celerybeat-schedule    (this variant raised an error for me)
Starting it the first way produces:
a@sa-VirtualBox:/ProgFile/ichnaea-for-liuqiao/ichnaea$ ICHNAEA_CFG=location.ini bin/celery -A ichnaea.async.app:celery_app beat
start...
redis_uri is: redis://localhost:6379/0
celery beat v3.1.23 (Cipater) is starting.
__ - ... __ - _
Configuration ->
. broker -> redis://localhost:6379/0
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]@%WARNING
. maxinterval -> now (0s)
That started successfully. Once beat is running, every task with a schedule springs into action as if on command.
At that point, row after row of yellow text starts appearing automatically in the terminal running the worker, like:
[2016-08-19 14:03:31,521: WARNING/Worker-1] **
I added a print at the start of each task to show the task's name:
[2016-08-19 14:05:37,375: WARNING/Worker-1] update_incoming
[2016-08-19 14:05:37,377: WARNING/Worker-1] query in session.py entities:
[2016-08-19 14:05:37,378: WARNING/Worker-1] (<class 'ichnaea.models.config.ExportConfig'>,)
[2016-08-19 14:05:37,380: WARNING/Worker-1] sqlalchemy.orm.query
[2016-08-19 14:05:40,971: WARNING/Worker-1] update_cellarea
[2016-08-19 14:05:49,969: WARNING/Worker-1] update_datamap
[2016-08-19 14:05:50,091: WARNING/Worker-1] update_datamap
[2016-08-19 14:05:50,142: WARNING/Worker-1] update_datamap
[2016-08-19 14:05:50,168: WARNING/Worker-1] update_datamap
[2016-08-19 14:05:52,997: WARNING/Worker-1] update_blue
[2016-08-19 14:05:53,021: WARNING/Worker-1] update_blue
[2016-08-19 14:05:53,058: WARNING/Worker-1] update_blue
[2016-08-19 14:05:53,118: WARNING/Worker-1] update_blue
[2016-08-19 14:05:53,162: WARNING/Worker-1] update_blue
[2016-08-19 14:05:53,205: WARNING/Worker-1] update_blue
[2016-08-19 14:05:53,233: WARNING/Worker-1] update_blue
[2016-08-19 14:05:53,248: WARNING/Worker-1] update_blue
[2016-08-19 14:05:53,252: WARNING/Worker-1] update_blue
[2016-08-19 14:05:53,275: WARNING/Worker-1] update_blue
[2016-08-19 14:06:09,111: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,136: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,175: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,202: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,239: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,256: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,294: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,307: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,330: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,336: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,355: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,369: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,373: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,392: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,413: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,436: WARNING/Worker-1] update_wifi
[2016-08-19 14:06:09,573: WARNING/Worker-1] update_incoming
[2016-08-19 14:06:09,575: WARNING/Worker-1] query in session.py entities:
I assumed that at this point the data in Redis would finally land in MySQL. Nope.
All I found was:
1. The stat table now has some rows:
mysql> select * from stat;
+-----+------------+-------+
| key | time | value |
+-----+------------+-------+
| 1 | 2016-08-18 | 0 |
| 1 | 2016-08-19 | 0 |
| 2 | 2016-08-18 | 0 |
| 2 | 2016-08-19 | 0 |
| 3 | 2016-08-18 | 0 |
| 3 | 2016-08-19 | 0 |
| 4 | 2016-08-18 | 0 |
| 4 | 2016-08-19 | 0 |
| 7 | 2016-08-18 | 0 |
| 7 | 2016-08-19 | 0 |
| 8 | 2016-08-18 | 0 |
| 8 | 2016-08-19 | 0 |
| 9 | 2016-08-18 | 0 |
| 9 | 2016-08-19 | 0 |
+-----+------------+-------+
14 rows in set (0.00 sec)
The second finding: damn, the keys in Redis are completely different from before!
127.0.0.1:6379> keys *
1) "statcounter_unique_wifi_20160819"
2) "statcounter_unique_wifi_20160818"
3) "statcounter_unique_blue_20160819"
4) "statcounter_blue_20160818"
5) "statcounter_unique_cell_20160818"
6) "statcounter_unique_cell_ocid_20160818"
7) "statcounter_unique_cell_20160819"
8) "statcounter_wifi_20160818"
9) "statcounter_unique_blue_20160818"
10) "_kombu.binding.celeryev"
11) "_kombu.binding.celery.pidbox"
12) "statcounter_blue_20160819"
13) "statcounter_unique_cell_ocid_20160819"
14) "_kombu.binding.celery"
15) "statcounter_cell_20160818"
16) "statcounter_wifi_20160819"
17) "statcounter_cell_20160819"
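Those statcounter keys appear to follow a "statcounter_<kind>_<YYYYMMDD>" naming pattern. Assuming that convention holds (it is inferred only from the listing above, not from the source code), a small parser:

```python
from datetime import datetime

def parse_statcounter_key(key):
    """Split a statcounter key into (kind, date); return None for
    non-statcounter keys such as the _kombu.binding.* queue bindings."""
    parts = key.split('_')
    if parts[0] != 'statcounter':
        return None
    day = datetime.strptime(parts[-1], '%Y%m%d').date()
    kind = '_'.join(parts[1:-1])  # e.g. 'unique_cell_ocid'
    return kind, day

assert parse_statcounter_key('statcounter_unique_wifi_20160819') == \
    ('unique_wifi', datetime(2016, 8, 19).date())
assert parse_statcounter_key('_kombu.binding.celery') is None
```

Grouping the listed keys this way shows one counter per kind per day, which matches the (key, time) pairs that showed up in the stat table above.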
Also, the terminal running the worker occasionally spits out errors I can't make sense of yet.