RabbitMQ provides four exchange types: fanout, direct, topic, and headers. The commonly used ones are fanout, direct, and topic.
Direct
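- Each message carries a "routing_key"; a direct exchange delivers the message to every queue whose binding key equals it exactly. In the simplest case the routing key can be thought of as the name of the destination queue.
- With the default (nameless) exchange no binding step is needed, because every queue is automatically bound to it under its own name. With a named direct exchange, the queue must be bound with each routing key it should receive.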
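The exact-match rule described above can be illustrated with a toy in-memory model. This is only a sketch for reasoning about routing, not how RabbitMQ is implemented; the class `DirectExchange` and its methods are hypothetical names invented for this example.

```python
from collections import defaultdict


class DirectExchange:
    """Toy in-memory model of direct routing (not a real broker)."""

    def __init__(self):
        # binding key -> names of the queues bound with that key
        self.bindings = defaultdict(set)
        # queue name -> message bodies delivered to that queue
        self.queues = defaultdict(list)

    def bind(self, queue, binding_key):
        self.bindings[binding_key].add(queue)

    def publish(self, routing_key, body):
        # Exact string comparison; .get avoids creating an empty
        # binding entry as a side effect of the lookup.
        targets = self.bindings.get(routing_key, ())
        for queue in targets:
            self.queues[queue].append(body)
        return len(targets)


exchange = DirectExchange()
exchange.bind("direct_key", "k1")
delivered_k1 = exchange.publish("k1", "22222222")  # one queue is bound with k1
delivered_k2 = exchange.publish("k2", "22222222")  # nothing is bound with k2
```

In this model the message published with "k2" reaches no queue, which mirrors what happens on a named direct exchange when no binding matches the routing key.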
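Receiver
# -*- coding: utf-8 -*-
# Written for Python 3 and pika 1.x (exchange_type=, on_message_callback=).
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost", virtual_host="/"))
channel = connection.channel()
channel.exchange_declare(exchange='direct_logs', exchange_type='direct')
result = channel.queue_declare(queue="direct_key", durable=True)
# A named direct exchange only delivers to queues bound with a matching key.
for key in ('k1', 'k2'):
    channel.queue_bind(exchange='direct_logs', queue=result.method.queue, routing_key=key)

def callback(ch, method, properties, body):
    print(" [x] Received %s routing_key %s" % (body.decode(), method.routing_key))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue=result.method.queue, on_message_callback=callback)
channel.start_consuming()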
Sender
# -*- coding: utf-8 -*-
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost", virtual_host="/"))
channel = connection.channel()
channel.exchange_declare(exchange='direct_logs', exchange_type='direct')
# delivery_mode=2 marks each message as persistent.
for key in ('k1', 'k2'):
    channel.basic_publish(exchange='direct_logs',
                          routing_key=key,
                          body="22222222",
                          properties=pika.BasicProperties(delivery_mode=2))
connection.close()
Fanout
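- This mode does not need a routing_key; any key supplied by the publisher is ignored for routing.
- The Exchange must be bound to the Queue in advance. One Exchange can be bound to multiple Queues, and one Queue can be bound to multiple Exchanges.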
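As a rough illustration of the two points above, the following toy model copies every published message to all bound queues and never inspects the routing key. It is a sketch under simplifying assumptions, not RabbitMQ's implementation; `FanoutExchange` is a hypothetical name used only here.

```python
class FanoutExchange:
    """Toy in-memory model of fanout routing (not a real broker)."""

    def __init__(self):
        # queue name -> list of (routing_key, body) pairs it received
        self.queues = {}

    def bind(self, queue):
        # Binding takes no key: every bound queue receives everything.
        self.queues.setdefault(queue, [])

    def publish(self, routing_key, body):
        # The routing key travels with the message but plays no part
        # in deciding which queues receive it.
        for messages in self.queues.values():
            messages.append((routing_key, body))


exchange = FanoutExchange()
exchange.bind("queue_a")
exchange.bind("queue_b")
for key in ("k1", "k2", "k3"):
    exchange.publish(key, "22222222")

received_a = exchange.queues["queue_a"]
received_b = exchange.queues["queue_b"]
```

Both queues end up with the same three messages, each still carrying the key it was published with.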
Receiver
# -*- coding: utf-8 -*-
# Written for Python 3 and pika 1.x (exchange_type=, on_message_callback=).
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost", virtual_host="/"))
channel = connection.channel()
channel.exchange_declare(exchange='fanout_logs', exchange_type='fanout')
# An empty queue name asks the broker to generate one.
result = channel.queue_declare(queue='', durable=True)
channel.queue_bind(exchange='fanout_logs', queue=result.method.queue)

def callback(ch, method, properties, body):
    print(" [x] Received %s routing_key %s" % (body.decode(), method.routing_key))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue=result.method.queue, on_message_callback=callback)
channel.start_consuming()
Sender
# -*- coding: utf-8 -*-
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost", virtual_host="/"))
channel = connection.channel()
channel.exchange_declare(exchange='fanout_logs', exchange_type='fanout')
# The routing keys are ignored by a fanout exchange; they are set here
# only so the receiver can print them.
for key in ('k1', 'k2', 'k3'):
    channel.basic_publish(exchange='fanout_logs',
                          routing_key=key,
                          body="22222222",
                          properties=pika.BasicProperties(delivery_mode=2))
connection.close()
Result
[x] Received 22222222 routing_key k1
[x] Received 22222222 routing_key k2
[x] Received 22222222 routing_key k3
Topic
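- This mode requires a routing key, and the Exchange must also be bound to the Queue in advance.
- When binding, supply the topic pattern the queue is interested in. In a pattern, "*" matches exactly one dot-separated word and "#" matches zero or more words. For example, "*.log.*" means the queue receives three-word keys whose middle word is "log", so a message with routing_key "a.log.error" is forwarded to it.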
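To make the pattern rules concrete, here is a minimal sketch of a matcher that follows the usual AMQP topic semantics ("*" for exactly one word, "#" for zero or more words). The function `topic_matches` is a hypothetical helper written for this example; it is not part of pika or RabbitMQ.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if routing_key matches an AMQP-style topic pattern."""
    p_words = pattern.split(".")
    k_words = routing_key.split(".")

    def match(i: int, j: int) -> bool:
        # i indexes the pattern words, j the routing-key words.
        if i == len(p_words):
            return j == len(k_words)
        if p_words[i] == "#":
            # '#' may absorb zero or more words of the key.
            return any(match(i + 1, n) for n in range(j, len(k_words) + 1))
        if j == len(k_words):
            return False
        if p_words[i] == "*" or p_words[i] == k_words[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


# Keys used in this section, checked against the two bindings.
log_error = topic_matches("*.log.*", "user.log.error")
log_success = topic_matches("*.log.*", "user.log.success")
db_cc = topic_matches("*.db.cc", "ad.db.cc")
too_short = topic_matches("*.log.*", "log.error")
```

Note that "*.log.*" does not match a two-word key such as "log.error"; a pattern like "#.log.#" would be needed to accept keys of any length that contain "log".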
Receiver
# -*- coding: utf-8 -*-
# Written for Python 3 and pika 1.x (exchange_type=, on_message_callback=).
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost", virtual_host="/"))
channel = connection.channel()
channel.exchange_declare(exchange="topic_logs", exchange_type='topic')
# An empty queue name asks the broker to generate one.
result = channel.queue_declare(queue='', durable=True)
# One queue may be bound with several patterns.
channel.queue_bind(exchange="topic_logs", queue=result.method.queue, routing_key="*.log.*")
channel.queue_bind(exchange="topic_logs", queue=result.method.queue, routing_key="*.db.cc")

def callback(ch, method, properties, body):
    print(" [x] Received %s routing_key %s" % (body.decode(), method.routing_key))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue=result.method.queue, on_message_callback=callback)
channel.start_consuming()
Sender
# -*- coding: utf-8 -*-
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost", virtual_host="/"))
channel = connection.channel()
channel.exchange_declare(exchange='topic_logs', exchange_type='topic')
# The first two keys match "*.log.*"; the third matches "*.db.cc".
for key in ('user.log.error', 'user.log.success', 'ad.db.cc'):
    channel.basic_publish(exchange='topic_logs',
                          routing_key=key,
                          body="22222222",
                          properties=pika.BasicProperties(delivery_mode=2))
connection.close()
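A simple distributed crawler based on RabbitMQ
Architecture
- The Download process is responsible for downloading pages.
- ParseBase listens for the "download finished" messages sent by Download and parses each page (URL, EMAIL, ...).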
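The hand-off between the two processes might look roughly like the sketch below. Everything in it is an assumption made for illustration: the JSON message layout, the regular expressions, and the function names are invented here and are not taken from the linked repository, and a `deque` stands in for the RabbitMQ queue so the example runs without a broker.

```python
import json
import re
from collections import deque

# Deliberately simple patterns; real pages need a proper HTML parser.
URL_RE = re.compile(r'href="(https?://[^"]+)"')
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def make_downloaded_message(url, html):
    """What a downloader could publish once a page has been fetched."""
    return json.dumps({"url": url, "html": html})


def parse_message(body):
    """What a parser could do with one message taken from the queue."""
    page = json.loads(body)
    return {
        "source": page["url"],
        "urls": URL_RE.findall(page["html"]),
        "emails": EMAIL_RE.findall(page["html"]),
    }


# A deque plays the role of the queue between the two processes.
queue = deque()
html = '<a href="http://example.com/about">About</a> contact: admin@example.com'
queue.append(make_downloaded_message("http://example.com/", html))

result = parse_message(queue.popleft())
```

In a real deployment the `append` would be a publish on the downloader side and the `popleft` a consumer callback on the parser side, so the two steps can run in separate processes or on separate machines.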
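Processes are managed with supervisor.
Code is deployed with a fabfile.
Simple version of the code
https://github.com/neo-hu/rabbitmq-crawler
Full version
Download: adjustable request rate, proxy settings (for bypassing network restrictions)
Page parsing: keywords, word-segmentation statistics, etc.
Web admin page and other features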