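1. Approach
We use RabbitMQ's Mirrored Queue scheme (http://www.rabbitmq.com/ha.html), which is an Active/Active scheme. It first requires building a RabbitMQ cluster (http://www.rabbitmq.com/clustering.html).
Suppose we have two machines with IPs 10.10.126.36 (M1) and 10.10.126.71 (M2). We need to build a cluster with M1 as master and M2 as slave.
The scheme has the following characteristics:
(1) The data on the two machines is kept in sync;
(2) If the slave goes down, the master keeps serving. If the master goes down, the slave is promoted to master, and any messages it had not yet synchronized from the old master are lost.
(3) If all machines go down and the data is persisted, the machine that restarts first becomes master. The machine that restarts later discards its previous contents and becomes a slave. It does not synchronize the messages the master already holds; only messages that arrive afterwards are mirrored.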
2. Setup
2.0 Uninstall the existing MQ (2.8.1)
cd /opt/apps/rabbitmq-server/
./rabbitmqctl stop_app
./rabbitmq-server stop
ps aux|grep rabbitmq
kill -9 pid
yum remove rabbitmq*
rm -rf /var/lib/rabbitmq
rm -rf /var/log/rabbitmq
rm -rf /etc/rabbitmq
rm -rf ~/.erlang.cookie
exchanges:
Name | Type | Parameters
---|---|---
ReputationExchange | direct |
direct_logs | direct | D
sendcloud | direct | D
myChannel | fanout |
message queues:
Name | Exclusive | Parameters | Status | Ready | Unacked | Total | Message rates (non-empty values among incoming, deliver / get, ack)
---|---|---|---|---|---|---|---
email_queue | | D | Idle | 0 | 0 | 0 |
queue_data_analysis | | D | Active | 0 | 0 | 0 | 1.7/s
queue_logs2 | | D | Active | 0 | 0 | 0 | 1.7/s, 1.7/s
webhooks_queue | | D | Active | 0 | 0 | 0 | 7.8/s, 7.8/s
webhooks_queue2 | | D | Active | 22 | 0 | 22 | 0.00/s
webhooks_queue3 | | D | Idle | 73 | 0 | 73 |
webhooks_queue4 | | D | Active | 171 | 4 | 175 | 1.3/s, 1.3/s
webhooks_queue5 | | D | Active | 289 | 2 | 291 | 1.3/s, 0.67/s, 0.67/s
webhooks_queue6 | | D | Active | 675 | 0 | 675 | 0.67/s
bindings of queue_data_analysis:
From | Routing key | Arguments
---|---|---
(AMQP default) | queue_data_analysis |
direct_logs | route_mail |
bindings of queue_logs2:
From | Routing key | Arguments
---|---|---
(AMQP default) | queue_logs2 |
direct_logs | route_mail |
2.1 Install RabbitMQ (3.0.1-1)
http://www.rabbitmq.com/install-rpm.html
wget -O /etc/yum.repos.d/epel-erlang.repo http://repos.fedorapeople.org/repos/peter/erlang/epel-erlang.repo
yum install erlang
rpm --import http://www.rabbitmq.com/rabbitmq-signing-key-public.asc
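wget -c http://www.rabbitmq.com/releases/rabbitmq-server/v3.0.1/rabbitmq-server-3.0.1-1.noarch.rpm
yum install rabbitmq-server-3.0.1-1.noarch.rpm
#chkconfig rabbitmq-server on    (deliberately left disabled: starting the service by hand is safer, because if the slave happens to restart first the result is a disaster!)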
rabbitmq-plugins enable rabbitmq_management
/sbin/service rabbitmq-server start
Log directory:
cd /var/log/rabbitmq
Stop:
rabbitmqctl stop
Check status:
rabbitmqctl status
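2.2 Build the RabbitMQ cluster
RabbitMQ nodes communicate with each other through Erlang, so they must all use the same cookie. The cookie is stored at /var/lib/rabbitmq/.erlang.cookie.
We need to make sure every RabbitMQ node has an identical cookie; the simplest way is to copy it over with scp.
(1) In our setup, first copy the cookie to the jump host.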
scp /var/lib/rabbitmq/.erlang.cookie root@10.10.70.126:/opt/
(2) Stop RabbitMQ on 10.10.126.71
rabbitmqctl stop
(3) Copy the cookie from the jump host
scp /opt/.erlang.cookie root@10.10.126.71:/var/lib/rabbitmq/
(4) On 10.10.126.71, change the cookie's owner and group (both must be rabbitmq), then start the service
chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
/sbin/service rabbitmq-server start
=================================================
(5) Build the cluster
Hostname mapping in /etc/hosts:
10.10.126.71 zw_126_71
10.10.126.36 zw_126_36
===
zw_126_71(slave)$ rabbitmqctl stop_app
zw_126_71(slave)$ rabbitmqctl join_cluster rabbit@zw_126_36
zw_126_71(slave)$ rabbitmqctl start_app
===
Check status:
rabbitmqctl cluster_status
(6) To leave the cluster:
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
(7) Configure mirrored queues (http://www.rabbitmq.com/ha.html)
Mirror all queues:
rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all"}'
To mirror only some queues, for example only those whose names start with webhooks:
rabbitmqctl set_policy ha-webhooks "^webhooks" '{"ha-mode":"all"}'
This mirrors webhooks_queue, webhooks_queue2, and so on.
Policies can also be configured (as admin) in the management UI at http://10.10.126.36:55672/#/users.
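Before applying a policy, it can help to preview which queue names a pattern would select. The following is a minimal sketch, not part of RabbitMQ: the class `PolicyPatternPreview` and its method are hypothetical, and it assumes that for a simple pattern such as `^webhooks` Java's regex engine behaves like the one RabbitMQ uses (a match anywhere in the name, unless anchored). The queue names are taken from the listing in section 2.0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for previewing a policy pattern; it does not talk to RabbitMQ.
class PolicyPatternPreview {

    /** Returns the queue names in which the pattern matches somewhere (find, not a full match). */
    static List<String> matching(String policyPattern, List<String> queueNames) {
        Pattern pattern = Pattern.compile(policyPattern);
        List<String> selected = new ArrayList<>();
        for (String name : queueNames) {
            if (pattern.matcher(name).find()) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> queues = Arrays.asList(
                "email_queue", "queue_data_analysis", "queue_logs2",
                "webhooks_queue", "webhooks_queue2", "webhooks_queue3");

        // Pattern used by the ha-webhooks policy
        System.out.println(matching("^webhooks", queues));
        // prints [webhooks_queue, webhooks_queue2, webhooks_queue3]

        // Pattern used by the ha-all policy: every name is selected
        System.out.println(matching(".*", queues).size() == queues.size());
    }
}
```

The authoritative check is still the policy column shown for each queue in the management UI after the policy is set.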
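3. Access
With the cluster in place, we need to decide how clients access RabbitMQ and how they should react when a node goes down.
3.1 Access methods
The official guidance at http://www.rabbitmq.com/clustering.html is as follows: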
Connecting to Clusters from Clients
A client can connect as normal to any node within a cluster. If that node should fail, and the rest of the cluster survives, then the client should notice the closed connection, and should be able to reconnect to some surviving member of the cluster. Generally, it's not advisable to bake in node hostnames or IP addresses into client applications: this introduces inflexibility and will require client applications to be edited, recompiled and redeployed should the configuration of the cluster change or the number of nodes in the cluster change. Instead, we recommend a more abstracted approach: this could be a dynamic DNS service which has a very short TTL configuration, or a plain TCP load balancer, or some sort of mobile IP achieved with pacemaker or similar technologies. In general, this aspect of managing the connection to nodes within a cluster is beyond the scope of RabbitMQ itself, and we recommend the use of other technologies designed specifically to solve these problems.
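In summary, the options are:
(1) The client handles cluster access itself, which means writing code to deal with node failures;
(2) A dynamic DNS service;
(3) A TCP load balancer, such as Nginx or HAProxy;
(4) A mobile IP managed with Pacemaker or similar.
Since we only have two nodes, handling it in our own code is feasible and fairly simple. As traffic grows, we can move to TCP load balancing or one of the other options.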
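Option (1) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the attachment: `FailoverConnector`, `Connector`, and `connectToAny` are hypothetical names, and the `Connector` interface stands in for whatever actually opens the connection (for example a call into the RabbitMQ Java client). The idea is simply to try each node in turn, and to wait for an interval before starting another round when every node refused.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: tries each node in order and returns the first successful connection.
class FailoverConnector {

    /** Stand-in for the real connection factory; any exception means "this node is unavailable". */
    interface Connector<T> {
        T connect(String address) throws Exception;
    }

    static <T> T connectToAny(List<String> addresses, Connector<T> connector,
                              int maxRounds, long retryIntervalMillis) throws Exception {
        Exception last = null;
        for (int round = 0; round < maxRounds; round++) {
            for (String address : addresses) {
                try {
                    return connector.connect(address);
                } catch (Exception e) {
                    last = e; // this node failed, try the next one
                }
            }
            if (round < maxRounds - 1) {
                Thread.sleep(retryIntervalMillis); // all nodes failed, wait before the next round
            }
        }
        throw new Exception("no node reachable: " + addresses, last);
    }

    public static void main(String[] args) throws Exception {
        List<String> nodes = Arrays.asList("10.10.126.36:5672", "10.10.126.71:5672");

        // Simulate M1 being down with a fake connector that refuses it.
        String result = connectToAny(nodes, address -> {
            if (address.startsWith("10.10.126.36")) {
                throw new java.net.ConnectException("Connection refused");
            }
            return "connected to " + address;
        }, 3, 1000L);

        System.out.println(result); // prints "connected to 10.10.126.71:5672"
    }
}
```

Keeping the node list in one place means that a later move to a load balancer only changes the address list, not the calling code.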
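3.2 Access implementation
Producer model: create connection => create channel => publish messages
Consumer model: create connection => create channel => consume to fetch messages => send ack
3.2.1 Exceptions to handle
Testing shows that when MQ goes down, each stage raises the following exceptions (using Java as the example):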
1. connection
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
2. channel
com.rabbitmq.client.AlreadyClosedException: clean connection shutdown; reason: Attempt to use closed connection
at com.rabbitmq.client.impl.AMQConnection.ensureIsOpen(AMQConnection.java:154)
3. consume (fetching messages)
channel.basicConsume("webhooks_queue", false, consumer); // throws an exception
ShutdownSignalException is the parent class of AlreadyClosedException
com.rabbitmq.client.AlreadyClosedException: clean connection shutdown; reason: Attempt to use closed channel
at com.rabbitmq.client.impl.AMQChannel.ensureIsOpen(AMQChannel.java:190)
----------------------------------------------------------------------
QueueingConsumer.Delivery delivery = consumer.nextDelivery(); // ShutdownSignalException
String message = new String(delivery.getBody());
com.rabbitmq.client.ShutdownSignalException: connection error; reason: {#method<connection.close>(reply-code=320, reply-text=CONNECTION_FORCED - broker forced connection closure with reason 'shutdown', class-id=0, method-id=0), null, ""}
4. ack
com.rabbitmq.client.AlreadyClosedException: clean connection shutdown; reason: Attempt to use closed channel
at com.rabbitmq.client.impl.AMQChannel.ensureIsOpen(AMQChannel.java:190)
5. Publishing messages
com.rabbitmq.client.AlreadyClosedException: clean connection shutdown; reason: Attempt to use closed channel
at com.rabbitmq.client.impl.AMQChannel.ensureIsOpen(AMQChannel.java:190)
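In other words, every one of these failures surfaces as an exception that the code can catch and handle.
3.2.2 Handling
See the attached RabbitMqUtils.java.
(1) Producing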
Address[] RABBIT_ADDRS = Address.parseAddresses("10.10.126.36:5672,10.10.126.71:5672");
long TRY_INTERVAL = 60000L; // 1 minute
Connection connection = RabbitMqUtils.createConnection(MqHandlerImpl.RABBIT_ADDRS, TRY_INTERVAL);
Channel channel = RabbitMqUtils.createChannel(connection, MqHandlerImpl.RABBIT_ADDRS, TRY_INTERVAL);
// Obtain the message; catch any exception raised while preparing it.
// String message = getMsg();
try {
    channel.basicPublish("direct_logs", "route_mail", MessageProperties.PERSISTENT_BASIC, message.getBytes());
    RabbitMqUtils.closeChannel(channel);
    RabbitMqUtils.closeConnection(connection);
} catch (Exception e) { // At this point it is almost certain that MQ is down.
    // Reconnect to MQ
}
(2) Consuming
Address[] RABBIT_ADDRS = Address.parseAddresses("10.10.126.36:5672,10.10.126.71:5672");
long TRY_INTERVAL = 60000L; // 1 minute
Connection connection = RabbitMqUtils.createConnection(MqHandlerImpl.RABBIT_ADDRS, TRY_INTERVAL);
Channel channel = RabbitMqUtils.createChannel(connection, MqHandlerImpl.RABBIT_ADDRS, TRY_INTERVAL);
QueueingConsumer consumer = new QueueingConsumer(channel);
channel.basicConsume(queuename, false, consumer);
while (true) {
    try {
        QueueingConsumer.Delivery delivery = consumer.nextDelivery();
        String message = new String(delivery.getBody());
        // Process the message
        // ........................
        // ack
        channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
    } catch (Exception e) { // At this point it is almost certain that MQ is down.
        // Wait 10 s, then rebuild the MQ connection
        Thread.sleep(10000);
        try {
            RabbitMqUtils.closeChannel(channel);
            RabbitMqUtils.closeConnection(connection);
            connection = RabbitMqUtils.createConnection(RABBIT_ADDRS, TRY_INTERVAL);
            channel = RabbitMqUtils.createChannel(connection, RABBIT_ADDRS, TRY_INTERVAL);
            // Reconnect to MQ
        }