Concurrency in F# – Part III – Erlang Style Message Passing

See: http://strangelights.com/blog/archive/2007/10/24/1601.aspx

Why is the introduction of Erlang-style message passing into F# interesting? You may never have heard of Erlang, but if you’ve ever used a cell phone you’ve probably used an Erlang system. Erlang was originally built by Ericsson and released as open source in 1998; it was designed to support the highly distributed and fault-tolerant systems required by mobile phone networks. Many people consider Erlang to be the language that gets concurrency right.

So if you want to do concurrency, why not just use Erlang? The fact that Erlang is built on its own custom runtime means there are few libraries or tools available for it, at least compared to the number available for .NET. Tim Bray has a series of posts on Erlang where he criticizes its slow file I/O and regular expressions; again, I would put these problems down to the custom runtime. So taking what’s good from Erlang and putting it into a language built on a platform that has lots of libraries and tools, good file I/O, and fast regular expressions would seem like a very good idea.

Erlang programs are typically composed of agents that pass messages to each other, the messages travelling between agents via a message queue. In F# we create an agent using the MailboxProcessor.Start function. This function takes a function as a parameter; that function is passed the agent’s mailbox and must return an asynchronous workflow. In this workflow you will typically read messages from the mailbox and process them. Below is an example of a word-counting agent, that is, an agent that counts the number of times it has been passed each word. It is the sort of thing we might use to perform statistical analysis on a text:

/// The internal type of messages for the agent
type Message = Word of string | Fetch of IChannel<Map<string,int>> | Stop

type WordCountingAgent() =
    let counter = MailboxProcessor.Start(fun inbox ->
             // The states of the message processing state machine...
             let rec loop(words : Map<string,int>) =
                async { let! msg = inbox.Receive()
                        match msg with
                        | Word word ->
                            if words.ContainsKey word then
                                let count = words.[word]
                                let words = words.Remove word
                                return! loop(words.Add (word, (count + 1)) )
                            else
                                // do printfn "New word: %s" word
                                return! loop(words.Add (word, 1) )

                        | Stop ->
                            // exit
                            return ()
                        | Fetch replyChannel ->
                            // post response to reply channel and continue
                            do replyChannel.Post(words)
                            return! loop(words) }

             // The initial state of the message processing state machine...
             loop(Map.empty))

    member a.AddWord(n) = counter.Post(Word(n))
    member a.Stop() = counter.Post(Stop)
    member a.Fetch() = counter.PostSync(fun replyChannel -> Fetch(replyChannel))

 

There are two things worth noting about the overall design. First, we use a sum type to represent all the possible kinds of message; this is a very common pattern for this style of programming. Secondly, we wrap our agent in a class to provide a friendlier interface to the outside world, as sketched below. A happy side effect of this is that other .NET languages would find this class really easy to use too.
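
As a quick usage sketch of that friendlier interface (the agent name and the sample words here are mine, purely illustrative, and not part of the original sample):

let agent = WordCountingAgent()

// AddWord posts fire-and-forget messages, so this loop never blocks.
for w in [ "the"; "quick"; "the" ] do
    agent.AddWord w

// Fetch blocks until the agent replies with its current (immutable) map.
let counts = agent.Fetch()
printfn "'the' seen %d times" counts.["the"]

agent.Stop()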

Now if we look more closely at the implementation details, we see that all the work is done in the function we pass to MailboxProcessor.Start. Here we read the messages we are posted and perform the relevant actions. The actual work of counting words is done in the “Word” action; here we use a Map, a functional data structure similar to a dictionary but immutable, to store the words along with the number of times each has been seen. We use the Post function of the MailboxProcessor to post the Word message to the message queue. In the “Fetch” action we return the current Map containing all the words found to date; notice how this is implemented using the special PostSync function provided by the MailboxProcessor. And the “Stop” action stops the agent.
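
For readers on a current F# release: the IChannel/PostSync pair above comes from the F# library this post was written against; today’s FSharp.Core spells the same idea AsyncReplyChannel and PostAndReply. Here is a minimal sketch of the modern equivalent of the Fetch round trip (everything except those library names is my own, assumed example code):

type FetchMsg = Fetch of AsyncReplyChannel<Map<string,int>>

let fetcher = MailboxProcessor.Start(fun inbox -> async {
    let counts = Map.ofList [ "the", 2; "quick", 1 ]
    while true do
        // Wait for a Fetch request and reply with the current map.
        let! (Fetch reply) = inbox.Receive()
        reply.Reply counts })

// The caller blocks until the agent replies, just as PostSync does above.
let snapshot = fetcher.PostAndReply(fun ch -> Fetch ch)
printfn "Fetched %d distinct words" snapshot.Count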

The advantage of implementing the word-counting agent in this way is that the agent is now thread safe and can be freely shared between threads working on related texts. Also, because we use an immutable map to store the state, we can pass it out to the outside world and carry on processing without having to worry about it becoming inconsistent or corrupted.
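
Because the Map is immutable, the snapshot the agent hands back can never change underneath the caller, even while the agent carries on counting. A tiny sketch of that property, reusing the hypothetical agent value from the usage sketch above:

let before = agent.Fetch()   // snapshot of the counts at this moment
agent.AddWord "another"      // the agent keeps updating its own state...
let after = agent.Fetch()
// ...but the first snapshot is untouched: Add on a Map returns a new Map.
printfn "before: %d distinct words, after: %d" before.Count after.Count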

To demonstrate this I wrote some code to read from text files and analyze the number of occurrences of each word:

open System.IO
open System.Threading
open System.ComponentModel

let counter = new WordCountingAgent()

// Lazily read a file line by line
let readLines file =
  seq { use r = new StreamReader( File.OpenRead file )
        while not r.EndOfStream do yield r.ReadLine() }

// Split each line into words and post each word to the counting agent
let processFile file =
    let lines = readLines file
    for line in lines do
        let punctuation = [| ' '; '.'; '"'; '\'';
          ','; ';'; ':'; '!'; '?'; '-'; '('; ')'; |]
        let words = line.Split(punctuation)
        for word in words do
            if word.Length > 0 then
                counter.AddWord word

let printWords = false

let main() =
    let autoResetEvent = new AutoResetEvent(false)
    let files = Directory.GetFiles(@"C:/Users/robert/Documents/Fielding")
    let i = ref 0
    for file in files do
        use readfile = new BackgroundWorker()
        readfile.DoWork.Add(fun _ ->
            printfn "Starting '%s'" (Path.GetFileNameWithoutExtension file)
            processFile file |> ignore )
        readfile.RunWorkerCompleted.Add(fun _ ->
            printfn "Finished '%s'" (Path.GetFileNameWithoutExtension file)
            incr i
            if !i = files.Length then
                autoResetEvent.Set() |> ignore)
        readfile.RunWorkerAsync()
    // Poll for an intermediate count roughly every 100 ms until all workers finish
    while not (autoResetEvent.WaitOne(100, false)) do
        let words = counter.Fetch()
        printfn "Words: %i" words.Count
    let res = counter.Fetch()

    printfn "Finished Words: %i" res.Count
    if printWords then
        res.Iterate (fun k v -> printfn "%s : %i" k v)
    counter.Stop()
    read_line()

main()

 

You’ll see that in the implementation we use a BackgroundWorker thread to process the texts; this is because we need notification of when the processing has finished, and asynchronous workflows do not yet offer this. We could very well have used the “Task Parallel Library” instead, but I didn’t, simply because I thought avoiding it would make it easier for people to test the sample for themselves. More on the Task Parallel Library later.
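
On current F# releases the completion problem has an easier answer, since asynchronous workflows can be composed and waited on directly. Here is a minimal alternative sketch under that assumption (processAll is my name; it reuses processFile from above and gives up the intermediate polling loop):

let processAll (files : string[]) =
    files
    |> Array.map (fun file -> async { do processFile file })
    |> Async.Parallel             // run one workflow per file
    |> Async.Ignore
    |> Async.RunSynchronously     // block until every file has been processed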

I’m not sure this is a very realistic approach to the problem: if we were doing it properly we’d probably analyze each text separately and then merge the results (because knowing the results for each individual text could also be interesting). Also, if we were doing it properly, we’d probably pay much more attention to how we split up the words. But at the end of the day it is a good way to provide work to test our agent.

I chose to analyze the works of Henry Fielding, which I downloaded from http://www.gutenberg.org. I chose Fielding because I have a soft spot for Tom Jones, and also because the number of his works available on Project Gutenberg was small enough that I could download them all, yet large enough to provide a decent amount of work.

Running on my dual-core machine gives the following output (it varies slightly from run to run):

Starting 'From This World to the Next'
Starting 'Amelia'
Starting 'History of Tom Jones, a Foundling'
Starting 'Joseph Andrews Volume 1'
Starting 'Joseph Andrews Volume 2'
Starting 'Journal of a Voyage to Lisbon - Volume 1'
Starting 'The History of the Life of the Late Mr Jonathan Wild the Great'
Starting 'The Works of Henry Fielding'
Words: 1236
Finished 'From This World to the Next'
Finished 'Joseph Andrews Volume 1'
Finished 'Journal of a Voyage to Lisbon - Volume 1'
Finished 'The Works of Henry Fielding'
Finished 'Joseph Andrews Volume 2'
Finished 'The History of the Life of the Late Mr Jonathan Wild the Great'
Finished 'Amelia'
Finished 'History of Tom Jones, a Foundling'
Words: 16610
Finished Words: 24469

 

There are a couple of things worth noting about the output. First, for quite a while it keeps both processors working at full speed. This would seem to suggest that we’re getting the number of threads right – there are threads ready to carry on working while other threads are blocked doing I/O. Here the Task Parallel Library’s TaskManager class might have helped us limit the number of threads active at any one time and reduce context-switching overhead. But after the last book is processed and we see Finished 'History of Tom Jones, a Foundling', one processor carries on working on its own while the other idles, and it takes a while before we see Finished Words: 24469. This effectively means we’ve overloaded our agent and it has not been able to process all the words we put into it as they arrived, but thanks to the message queue it eventually caught up once more work stopped being added. This is why this style of programming is a good choice when the workload is varied: even if an agent cannot process all its work at peak times, it can store the work up until the system has some idle time and finish processing then.

It’s also worth noting that we only ever see two intermediate word counts (Words: 1236 and Words: 16610). The code that produces these word counts is as follows:

    while not (autoResetEvent.WaitOne(100, false)) do
        let words = counter.Fetch()
        printfn "Words: %i" words.Count

 

So given that the sample runs for considerably longer than 200 ms, we might expect to see a lot more than two intermediate word counts. We don’t, because fetching the dictionary has to be done synchronously, so our “Fetch” message has to wait in the queue to be processed. This means the thread is blocked at the counter.Fetch() call for considerably longer than it is blocked at autoResetEvent.WaitOne(100, false), and thus we only see two intermediate counts.
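
If we wanted the polling loop to stay responsive even when the agent is saturated, current F# releases let us bound the wait. A sketch under the assumption that the agent uses the modern AsyncReplyChannel message shown earlier (tryFetch is my name):

// Returns None if the agent does not reply within 50 ms, so the
// polling thread is never blocked for long behind a busy agent.
let tryFetch (agent : MailboxProcessor<FetchMsg>) =
    agent.TryPostAndReply((fun ch -> Fetch ch), timeout = 50)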

 

Wrapping it up, I think we’ve seen that Erlang-style message passing provides an interesting model for creating concurrent applications. Even if a more realistic application would be composed of many agents working together, passing messages to each other, we have seen the core of an agent’s job: its ability to keep a consistent data structure while being called from many different threads. Writing this post made me reflect on the difference between this style of programming and the style offered by the TaskManager in the Task Parallel Library. It seems to me that with the Task Parallel Library you only ever control putting messages/actions into the queue, whereas using a MailboxProcessor in F# allows you to control both ends of the queue. The TaskManager has the advantage that it can use all sorts of clever heuristics to decide when to start new threads/actions, as it controls the execution of new tasks. However, using a MailboxProcessor in F# would seem to offer some interesting possibilities not yet possible with the Task Parallel Library.

Anyway, this series will continue!
