Akka Programming (20): Fault Tolerance (Part 1)


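When we introduced the Actor system earlier, we said that every Actor is the supervisor of its child Actors, and that every Actor defines a supervision strategy to apply when a failure occurs. Once a strategy has been defined it cannot be changed afterwards; it is an integral part of the Actor system.
Fault Handling in Practice
Let us start with an example that shows one way of handling data-store failures, a typical error that a real-world application may run into. A real application may of course react differently when the data source is unavailable; here we handle it by reconnecting.
The source code of the example follows. It is fairly long and needs careful reading; ideally, run it and use the log output to follow what happens: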

import akka.actor._
import akka.actor.SupervisorStrategy._
import scala.concurrent.duration._
import scala.language.postfixOps // enables postfix syntax such as `15 seconds`
import akka.util.Timeout
import akka.event.LoggingReceive
import akka.pattern.{ask, pipe}
import com.typesafe.config.ConfigFactory

/**
 * Runs the sample
 */
object FaultHandlingDocSample extends App {

  import Worker._

  val config = ConfigFactory.parseString("""
      akka.loglevel = "DEBUG"
      akka.actor.debug {
        receive = on
        lifecycle = on
      }
      """)

  val system = ActorSystem("FaultToleranceSample", config)
  val worker = system.actorOf(Props[Worker], name = "worker")
  val listener = system.actorOf(Props[Listener], name = "listener")
  // start the work and listen on progress
  // note that the listener is used as sender of the tell,
  // i.e. it will receive replies from the worker
  worker.tell(Start, sender = listener)
}

/**
 * Listens on progress from the worker and shuts down the system when enough
 * work has been done.
 */
class Listener extends Actor with ActorLogging {

  import Worker._

  // If we don’t get any progress within 15 seconds then the service is unavailable
  context.setReceiveTimeout(15 seconds)

  def receive = {
    case Progress(percent) =>
      log.info("Current progress: {} %", percent)
      if (percent >= 100.0) {
        log.info("That’s all, shutting down")
        context.system.shutdown()
      }
    case ReceiveTimeout =>
      // No progress within 15 seconds, ServiceUnavailable
      log.error("Shutting down due to unavailable service")
      context.system.shutdown()
  }
}

object Worker {

  case object Start

  case object Do

  final case class Progress(percent: Double)

}

/**
 * Worker performs some work when it receives the `Start` message.
 * It will continuously notify the sender of the `Start` message
 * of current `Progress`. The `Worker` supervises the `CounterService`.
 */
class Worker extends Actor with ActorLogging {

  import Worker._
  import CounterService._

  implicit val askTimeout: Timeout = Timeout(5 seconds)

  // Stop the CounterService child if it throws ServiceUnavailable
  override val supervisorStrategy = OneForOneStrategy() {
    case _: CounterService.ServiceUnavailable => Stop
  }

  // The sender of the initial Start message will continuously be notified
  // about progress
  var progressListener: Option[ActorRef] = None
  val counterService = context.actorOf(Props[CounterService], name = "counter")
  val totalCount = 51

  // Use this Actor's Dispatcher as ExecutionContext
  import context.dispatcher

  def receive = LoggingReceive {
    case Start if progressListener.isEmpty =>
      progressListener = Some(sender())
      context.system.scheduler.schedule(Duration.Zero, 1 second, self, Do)
    case Do =>
      counterService ! Increment(1)
      counterService ! Increment(1)
      counterService ! Increment(1)
      // Send current progress to the initial sender
      counterService ? GetCurrentCount map {
        case CurrentCount(_, count) => Progress(100.0 * count / totalCount)
      } pipeTo progressListener.get
  }
}

object CounterService {

  final case class Increment(n: Int)

  case object GetCurrentCount

  final case class CurrentCount(key: String, count: Long)

  class ServiceUnavailable(msg: String) extends RuntimeException(msg)

  private case object Reconnect

}

/**
 * Adds the value received in `Increment` message to a persistent
 * counter. Replies with `CurrentCount` when it is asked for `CurrentCount`.
 * `CounterService` supervises `Storage` and `Counter`.
 */
class CounterService extends Actor {

  import CounterService._
  import Counter._
  import Storage._

  // Restart the storage child when StorageException is thrown.
  // After 3 restarts within 5 seconds it will be stopped.
  override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 3,
    withinTimeRange = 5 seconds) {
    case _: Storage.StorageException => Restart
  }

  val key = self.path.name
  var storage: Option[ActorRef] = None
  var counter: Option[ActorRef] = None
  var backlog = IndexedSeq.empty[(ActorRef, Any)]
  val MaxBacklog = 10000

  // Use this Actor's Dispatcher as ExecutionContext
  import context.dispatcher

  override def preStart(): Unit = {
    initStorage()
  }

  /**
   * The child storage is restarted in case of failure, but after 3 restarts,
   * and still failing it will be stopped. Better to back-off than continuously
   * failing. When it has been stopped we will schedule a Reconnect after a delay.
   * Watch the child so we receive Terminated message when it has been terminated.
   */
  def initStorage(): Unit = {
    storage = Some(context.watch(context.actorOf(Props[Storage], name = "storage")))
    // Tell the counter, if any, to use the new storage
    counter foreach {
      _ ! UseStorage(storage)
    }
    // We need the initial value to be able to operate
    storage.get ! Get(key)
  }

  def receive = LoggingReceive {
    case Entry(k, v) if k == key && counter.isEmpty =>
      // Reply from Storage of the initial value, now we can create the Counter
      val c = context.actorOf(Props(classOf[Counter], key, v))
      counter = Some(c)
      // Tell the counter to use current storage
      c ! UseStorage(storage)
      // and send the buffered backlog to the counter
      for ((replyTo, msg) <- backlog) c.tell(msg, sender = replyTo)
      backlog = IndexedSeq.empty

    case msg @ Increment(n) => forwardOrPlaceInBacklog(msg)

    case msg @ GetCurrentCount => forwardOrPlaceInBacklog(msg)

    case Terminated(actorRef) if Some(actorRef) == storage =>
      // After 3 restarts the storage child is stopped.
      // We receive Terminated because we watch the child, see initStorage.
      storage = None
      // Tell the counter that there is no storage for the moment
      counter foreach {
        _ ! UseStorage(None)
      }
      // Try to re-establish storage after a while
      context.system.scheduler.scheduleOnce(10 seconds, self, Reconnect)

    case Reconnect =>
      // Re-establish storage after the scheduled delay
      initStorage()
  }

  def forwardOrPlaceInBacklog(msg: Any): Unit = {
    // We need the initial value from storage before we can start delegating to
    // the counter. Before that we place the messages in a backlog, to be sent
    // to the counter when it is initialized.
    counter match {
      case Some(c) => c forward msg
      case None =>
        if (backlog.size >= MaxBacklog)
          throw new ServiceUnavailable(
            "CounterService not available, lack of initial value")
        backlog :+= (sender() -> msg)
    }
  }
}

object Counter {

  final case class UseStorage(storage: Option[ActorRef])

}

/**
 * The in memory count variable that will send current
 * value to the `Storage`, if there is any storage
 * available at the moment.
 */
class Counter(key: String, initialValue: Long) extends Actor {

  import Counter._
  import CounterService._
  import Storage._

  var count = initialValue
  var storage: Option[ActorRef] = None

  def receive = LoggingReceive {
    case UseStorage(s) =>
      storage = s
      storeCount()
    case Increment(n) =>
      count += n
      storeCount()
    case GetCurrentCount =>
      sender() ! CurrentCount(key, count)
  }

  def storeCount(): Unit = {
    // Delegate dangerous work, to protect our valuable state.
    // We can continue without storage.
    storage foreach {
      _ ! Store(Entry(key, count))
    }
  }
}

object DummyDB {

  import Storage.StorageException

  private var db = Map[String, Long]()

  @throws(classOf[StorageException])
  def save(key: String, value: Long): Unit = synchronized {
    if (11 <= value && value <= 14)
      throw new StorageException("Simulated store failure " + value)
    db += (key -> value)
  }

  @throws(classOf[StorageException])
  def load(key: String): Option[Long] = synchronized {
    db.get(key)
  }
}

object Storage {

  final case class Store(entry: Entry)

  final case class Get(key: String)

  final case class Entry(key: String, value: Long)

  class StorageException(msg: String) extends RuntimeException(msg)

}

/**
 * Saves key/value pairs to persistent storage when receiving `Store` message.
 * Replies with current value when receiving `Get` message.
 * Will throw StorageException if the underlying data store is out of order.
 */
class Storage extends Actor {

  import Storage._

  val db = DummyDB

  def receive = LoggingReceive {
    case Store(Entry(key, count)) => db.save(key, count)
    case Get(key) => sender() ! Entry(key, db.load(key).getOrElse(0L))
  }
}
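The backlog handling in `forwardOrPlaceInBacklog` is easier to see without actors. Below is a minimal, stdlib-only Scala sketch under simplifying assumptions. `BacklogModel`, `Backlog`, and `BacklogFull` are hypothetical names, and a plain callback stands in for the Counter actor. The reply-to `ActorRef` that CounterService stores with each message is omitted.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, actor-free model of CounterService's backlog handling.
object BacklogModel {

  // Stands in for CounterService.ServiceUnavailable in the sample.
  final class BacklogFull(msg: String) extends RuntimeException(msg)

  final class Backlog[A](maxBacklog: Int) {
    private var pending = Vector.empty[A]
    // None until the "Counter" exists, like `counter: Option[ActorRef]`.
    private var target: Option[A => Unit] = None

    // Deliver directly once a target exists; otherwise buffer, failing when full.
    def send(msg: A): Unit = target match {
      case Some(deliver) => deliver(msg)
      case None =>
        if (pending.size >= maxBacklog)
          throw new BacklogFull("not available, lack of initial value")
        pending :+= msg
    }

    // Called when the initial value has arrived: flush buffered messages in order.
    def ready(deliver: A => Unit): Unit = {
      target = Some(deliver)
      pending.foreach(deliver)
      pending = Vector.empty
    }

    def bufferedCount: Int = pending.size
  }

  def main(args: Array[String]): Unit = {
    val received = ArrayBuffer.empty[String]
    val backlog = new Backlog[String](maxBacklog = 2)
    backlog.send("Increment(1)")
    backlog.send("GetCurrentCount")
    backlog.ready(m => received += m)
    backlog.send("Increment(1)")
    println(received.mkString(", "))
  }
}
```

As in CounterService, messages are buffered until the target exists and then delivered in arrival order. A full backlog is treated as a failure rather than a reason to drop messages silently.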

The example defines five Actors: Worker, Listener, CounterService, Counter, and Storage. The figure below shows the message flow when the system runs normally (no errors):
[Figure: message flow during normal operation]

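Worker is the parent Actor (supervisor) of CounterService, and CounterService is the parent Actor (supervisor) of Counter and Storage. In the figure, supervision relationships are drawn in light red and references in white. Worker holds a reference to Listener and Listener holds a reference to Worker, but there is no parent/child relationship between them. Likewise, Counter holds a reference to Storage, but Counter is not Storage's supervisor.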
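One way to keep supervision and plain references apart is to write them down as data. The sketch below is illustrative only. `HierarchyModel` and its members are hypothetical names, and actors are identified by class name rather than by their real actor paths. The Listener-to-Worker reference comes from the description above, not from the code.

```scala
// Hypothetical data model of the sample's actor relationships (not Akka API).
object HierarchyModel {

  // child -> supervisor (parent), as created with context.actorOf in the code
  val supervisorOf: Map[String, String] = Map(
    "CounterService" -> "Worker",
    "Counter"        -> "CounterService",
    "Storage"        -> "CounterService"
  )

  // holder -> referenced actor: plain references, no supervision implied
  val references: Set[(String, String)] = Set(
    "Worker"   -> "Listener", // progressListener in Worker
    "Listener" -> "Worker",   // as described in the text
    "Counter"  -> "Storage"   // received via UseStorage
  )

  // Chain of supervisors from the top-level actor down to `actor`.
  def supervisionChain(actor: String): List[String] =
    supervisorOf.get(actor) match {
      case Some(parent) => supervisionChain(parent) :+ actor
      case None         => List(actor)
    }

  def main(args: Array[String]): Unit = {
    println(supervisionChain("Storage").mkString(" -> "))
    println(references.contains("Counter" -> "Storage"))
    println(supervisorOf.get("Storage"))
  }
}
```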

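The normal flow is as follows:

Step 1: The progress Listener tells the Worker to start working.
Step 2: The Worker does its work by periodically sending itself Do messages.
Steps 3, 4, 5: On each Do message, the Worker tells its child CounterService to increment the counter three times. CounterService forwards each Increment message to Counter, which increments its count variable and sends the current value to Storage to be saved.
Steps 6, 7: The Worker asks CounterService for the current count and pipes the result to the Listener.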
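The progress values the Listener logs follow from simple arithmetic. This stdlib-only sketch uses a hypothetical `ProgressMath` object and reuses the Worker's formula `100.0 * count / totalCount`. It assumes that the three increments of each Do tick are counted before the matching `GetCurrentCount` is answered, which is what the sequential sends in the Worker's `receive` lead to in practice.

```scala
// Stdlib-only sketch of the Worker's progress arithmetic (no actors involved).
object ProgressMath {
  val totalCount = 51       // Worker.totalCount in the sample
  val incrementsPerTick = 3 // three Increment(1) messages per Do

  // Same formula the Worker pipes to the Listener.
  def progress(count: Long): Double = 100.0 * count / totalCount

  // Assumption: all Increments of a tick are counted before GetCurrentCount
  // is answered, so the count after k ticks is 3 * k.
  def countAfterTicks(ticks: Int): Long = ticks.toLong * incrementsPerTick

  // Smallest number of Do ticks whose count reaches totalCount (ceiling division).
  val ticksUntilDone: Int = (totalCount + incrementsPerTick - 1) / incrementsPerTick

  def main(args: Array[String]): Unit = {
    for (k <- Seq(1, 8, ticksUntilDone))
      println(f"after $k%2d ticks: ${progress(countAfterTicks(k))}%.1f %%")
  }
}
```

Under that assumption the count reaches 51 on the 17th Do tick. The scheduler fires the first tick immediately (`Duration.Zero`) and then every second, so this happens roughly 16 seconds after start. Storage failures do not change the result because Counter keeps the count in memory.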

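The next figure shows what happens when an error occurs. In the example, Worker and CounterService act as supervisors and each define a supervision strategy. Worker stops CounterService when CounterService throws ServiceUnavailable, and CounterService restarts Storage when Storage throws StorageException.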
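The two strategies can be modeled as partial functions from exceptions to directives. In Akka, the block passed to `OneForOneStrategy` has the type `SupervisorStrategy.Decider`, which is `PartialFunction[Throwable, Directive]`. The sketch below is stdlib-only and uses hypothetical stand-in types (`DeciderModel`, its own `Directive` objects, and its own exception classes) instead of Akka's. It also assumes, as Akka does, that a failure the decider does not match is escalated to the supervisor's own parent.

```scala
// Stdlib-only model of supervision directives and deciders.
// All names are hypothetical stand-ins, not Akka's own types.
object DeciderModel {
  sealed trait Directive
  case object Resume extends Directive
  case object Restart extends Directive
  case object Stop extends Directive
  case object Escalate extends Directive

  class ServiceUnavailable(msg: String) extends RuntimeException(msg)
  class StorageException(msg: String) extends RuntimeException(msg)

  type Decider = PartialFunction[Throwable, Directive]

  // Worker's strategy: stop the CounterService child on ServiceUnavailable.
  val workerDecider: Decider = { case _: ServiceUnavailable => Stop }

  // CounterService's strategy: restart the Storage child on StorageException.
  val counterServiceDecider: Decider = { case _: StorageException => Restart }

  // Failures not covered by the decider are escalated (assumed default).
  def decide(decider: Decider, cause: Throwable): Directive =
    decider.applyOrElse(cause, (_: Throwable) => Escalate)

  def main(args: Array[String]): Unit = {
    println(decide(workerDecider, new ServiceUnavailable("no initial value")))
    println(decide(counterServiceDecider, new StorageException("store failed")))
    println(decide(counterServiceDecider, new IllegalStateException("unexpected")))
  }
}
```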

[Figure: message flow when a failure occurs]

Flow when an error occurs

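Step 1: Storage throws a StorageException.
Step 2: Storage's supervisor, CounterService, restarts Storage, as its strategy prescribes for StorageException.
Steps 3, 4, 5, 6: Storage keeps failing and is restarted again.
Step 7: Once Storage has been restarted 3 times within 5 seconds and fails again, its supervisor (CounterService) stops it.
Step 8: CounterService also watches Storage, so it receives a Terminated message after Storage has stopped.
Steps 9, 10, 11: CounterService tells Counter that there is no Storage for the moment.
Step 12: After a delay (10 seconds in the code), CounterService sends itself a Reconnect message.
Steps 13, 14: When it receives the Reconnect message, it creates a new Storage.
Steps 15, 16: It then tells Counter to use the new Storage.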
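Steps 1 to 7 can be traced from DummyDB's failure rule (storing 11 through 14 fails) together with the CounterService strategy (3 retries within 5 seconds). The sketch below is a simplified, stdlib-only model built on several assumptions:

- `RestartBudget` and `FailureTrace` are hypothetical names.
- The window logic approximates Akka's retry bookkeeping; it is not Akka's code.
- Do ticks are exactly one second apart.
- All three stores of a tick happen at the same instant.

```scala
// Simplified, actor-free model of the restart budget configured by
// OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 5 seconds).
// Hypothetical names; this is not Akka's implementation.
final class RestartBudget(maxRetries: Int, windowMillis: Long) {
  private var windowStart: Option[Long] = None
  private var retriesInWindow = 0

  // true = restart the child, false = budget exhausted, stop the child
  def requestRestart(nowMillis: Long): Boolean = windowStart match {
    case Some(start) if nowMillis - start <= windowMillis =>
      retriesInWindow += 1
      retriesInWindow <= maxRetries
    case _ =>
      // first failure, or the previous window has expired: open a new window
      windowStart = Some(nowMillis)
      retriesInWindow = 1
      true
  }
}

object FailureTrace {
  // Same rule as DummyDB.save: storing 11..14 fails.
  def failsToStore(value: Long): Boolean = 11 <= value && value <= 14

  // Idealized timing (assumption): 3 increments per 1-second Do tick, first tick at t = 0.
  def storeTimeMillis(value: Long): Long = ((value - 1) / 3) * 1000

  // Supervisor decision for every failing store among `values`.
  def decisions(values: Seq[Long]): Seq[(Long, String)] = {
    val budget = new RestartBudget(maxRetries = 3, windowMillis = 5000)
    values.filter(failsToStore).map { v =>
      v -> (if (budget.requestRestart(storeTimeMillis(v))) "Restart" else "Stop")
    }
  }

  def main(args: Array[String]): Unit =
    decisions(1L to 20L).foreach { case (v, d) => println(s"store $v failed -> $d") }
}
```

The model reproduces steps 1 to 7. The stores of 11, 12, and 13 lead to restarts, and the failed store of 14 exhausts the budget, so Storage is stopped. Akka's own per-child retry bookkeeping may differ in detail, so treat the window logic as an approximation.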

A log from one run is provided here for reference.
