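In Java, Code Replacement (so-called hot deployment) is usually implemented with the ClassLoader mechanism. Not long ago I read a paper from Google describing hot deployment of C++ code, which is more complicated still.
In Erlang, Code Replacement is actually simple. The most convenient approach is the one in section 12.3 of the Erlang Reference Manual: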
-module(m).
-export([loop/0]).

loop() ->
    receive
        code_switch ->
            m:loop();
        Msg ->
            ...
            loop()
    end.
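A simple HelloWorld-style example like this does not answer my question. In a more complex application with several processes, what happens to the processes that were not updated after the code of some processes has been replaced? Let's run an experiment.
Example code: codereload.erl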
-module(codereload).
-export([main/0, master_loop/2, worker_loop/0]).
-define(VERSION, "0.1").

%% Start two workers and a master registered under the name 'master'.
main() ->
    process_flag(trap_exit, true),
    Pid1 = spawn(?MODULE, worker_loop, []),
    Pid2 = spawn(?MODULE, worker_loop, []),
    spawn(fun() -> register(master, self()), master_loop(Pid1, Pid2) end).

master_loop(Pid1, Pid2) ->
    io:format("Pid1, Pid2 is alive ~p ~p~n", [is_process_alive(Pid1), is_process_alive(Pid2)]),
    receive
        refresh ->
            io:format("Master code reload~n"),
            Pid1 ! refresh,
            Pid2 ! refresh,
            %% Fully qualified call: continue in the current version of the module.
            codereload:master_loop(Pid1, Pid2);
        Any ->
            io:format("Master ~p receive message: ~p~n", [?VERSION, Any]),
            Pid1 ! Any,
            Pid2 ! Any,
            %% Local call: stay in the version this process is already running.
            master_loop(Pid1, Pid2)
    end.

worker_loop() ->
    receive
        refresh ->
            io:format("Worker code ~p reload~n", [self()]),
            %% Fully qualified call: continue in the current version of the module.
            codereload:worker_loop();
        Any ->
            io:format("Worker ~p at ~p also receive message: ~p~n", [?VERSION, self(), Any]),
            worker_loop()
    end.
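The program has three processes, one master and two workers. Below, distributed process communication is used to drive the Code Replacement: judging by the prompts in the transcripts, the program runs on node foo@localhost and the messages are sent from node bar@localhost.
Start an Erlang shell with
erl -sname bar@localhost
(and, presumably, another one in the same way for foo@localhost).
Start codereload: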
Pid1, Pid2 is alive true true
<0.39.0>
Send a "HI" over and see:
"HI"
Yes, it arrived:
(foo@localhost)2> Pid1, Pid2 is alive true true
(foo@localhost)2> Worker "0.1" at <0.37.0> also receive message: "HI"
(foo@localhost)2> Worker "0.1" at <0.38.0> also receive message: "HI"
Now modify the source:
-define(VERSION, "0.1"). --> -define(VERSION, "0.2").
Pid2 ! refresh, --> %Pid2 ! refresh,
This bumps the version number and comments out the refresh notification sent to Pid2.
Compile the new version first:
{ok,codereload}
Send the refresh notification:
refresh
It arrived:
(foo@localhost)3> Worker code <0.37.0> reload
(foo@localhost)3> Worker code <0.38.0> reload
(foo@localhost)3> Pid1, Pid2 is alive true true
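At this point master, Pid1 and Pid2 should all be on v0.2. Both workers were notified because the master was still executing the v0.1 loop, which does send to Pid2, when it handled refresh. Let's check: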
"HI"
(foo@localhost)3> Pid1, Pid2 is alive true true
(foo@localhost)3> Worker "0.2" at <0.37.0> also receive message: "HI"
(foo@localhost)3> Worker "0.2" at <0.38.0> also receive message: "HI"
OK, the version now reads "0.2", so the code replacement worked. We are not done yet, though. Remember that Pid2 ! refresh was commented out above? What happens on the next update? First modify the program:
-define(VERSION, "0.2"). --> -define(VERSION, "0.3").
%Pid2 ! refresh, --> Pid2 ! refresh,
Compile again:
{ok,codereload}
Send the refresh notification:
refresh
(foo@localhost)4> Worker code <0.37.0> reload
(foo@localhost)4> Pid1, Pid2 is alive true true
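As you can see, this time only Pid1 received the refresh notification and updated, because the master handled the message in its v0.2 loop, where the send to Pid2 is commented out. What is the result?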
"HI"
(foo@localhost)4> Pid1, Pid2 is alive true true
(foo@localhost)4> Worker "0.3" at <0.37.0> also receive message: "HI"
(foo@localhost)4> Worker "0.2" at <0.38.0> also receive message: "HI"
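Oh, Pid2 is still on the v0.2 code. A process keeps executing the version it is in until it makes a fully qualified call, so each process switches versions independently.
Send the refresh notification again so that Pid2 loads the v0.3 code: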
refresh
(foo@localhost)4> Worker code <0.37.0> reload
(foo@localhost)4> Worker code <0.38.0> reload
(foo@localhost)4> Pid1, Pid2 is alive true true
This time Pid2 is updated to the new version:
(bar@localhost)9> {master, 'foo@localhost'} ! "HI".
"HI"
(foo@localhost)4> Pid1, Pid2 is alive true true
(foo@localhost)4> Worker "0.3" at <0.37.0> also receive message: "HI"
(foo@localhost)4> Worker "0.3" at <0.38.0> also receive message: "HI"
Dizzy yet? So am I. There is one more case, though. Go back to this point above:
(foo@localhost)4> Pid1, Pid2 is alive true true
(foo@localhost)4> Worker "0.3" at <0.37.0> also receive message: "HI"
(foo@localhost)4> Worker "0.2" at <0.38.0> also receive message: "HI"
Suppose that at this step, instead of sending refresh so that Pid2 catches up,
we compile once more:
{ok,codereload}
Send a message over and see:
(bar@localhost)9> {master, 'foo@localhost'} ! "HI".
"HI"
(foo@localhost)5> Pid1, Pid2 is alive true false
(foo@localhost)5> Worker "0.3" at <0.37.0> also receive message: "HI"
Oh, what happened? is_process_alive(Pid2) now returns false. Why did the process die?
Section 12.3 of the Erlang Reference Manual says:
Both old and current code is valid, and may be evaluated concurrently. Fully qualified function calls always refer to current code. Old code may still be evaluated because of processes lingering in the old code.
If a third instance of the module is loaded, the code server will remove (purge) the old code and any processes lingering in it will be terminated. Then the third instance becomes 'current' and the previously current code becomes 'old'.
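So in Erlang a module has only two versions of code, old and current. At that moment Pid1 (and the master) were on v0.3 (current) and Pid2 was on v0.2 (old). Once we compiled again, the v0.3 code became old, the v0.2 code was purged, and Pid2, still lingering in it, was forcibly terminated.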
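To avoid killing a lingering process by accident, one option is to check for old code before loading a third version. The following is only a minimal sketch under my own assumptions: the module name safe_reload and the functions lingering/1 and reload/1 are made up for illustration, while code:soft_purge/1, code:load_file/1 and erlang:check_process_code/2 are standard calls. It also assumes the new .beam file has already been compiled into the code path.

```erlang
-module(safe_reload).
-export([lingering/1, reload/1]).

%% Hypothetical helper: local processes that still execute old code of Module.
lingering(Module) ->
    [Pid || Pid <- erlang:processes(),
            erlang:check_process_code(Pid, Module)].

%% Hypothetical helper: load a new version only if no process would be killed.
%% code:soft_purge/1 removes the old code only when nothing lingers in it
%% and returns false otherwise.
reload(Module) ->
    case code:soft_purge(Module) of
        true  -> code:load_file(Module);
        false -> {error, {lingering, lingering(Module)}}
    end.
```

In the experiment above, safe_reload:reload(codereload) at the point where Pid2 was still on v0.2 should refuse to load and report Pid2 instead of terminating it. After sending refresh so that every process reaches the current code, the same call should go through.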
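An Erlang node started as a daemon has no shell in which to compile directly, and if the code is compiled externally the running node does not pick it up by itself. One approach I came up with is to invoke the compilation through rpc: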
rpc:call('foo@localhost', shell_default, c, [codereload]).
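If the module has already been compiled externally, a variant is to ask the remote node only to load it. This is a rough sketch, not a tested deployment procedure: the module name remote_reload and the function reload/2 are hypothetical, and it assumes the new .beam file is already on the remote node's code path. The calls rpc:call/4, code:soft_purge/1 and code:load_file/1 are standard.

```erlang
-module(remote_reload).
-export([reload/2]).

%% Hypothetical helper: load the current .beam of Module on Node,
%% refusing to do so while processes there still linger in old code.
reload(Node, Module) ->
    case rpc:call(Node, code, soft_purge, [Module]) of
        true ->
            rpc:call(Node, code, load_file, [Module]);
        false ->
            {error, lingering_processes};
        {badrpc, Reason} ->
            {error, Reason}
    end.
```

For example, remote_reload:reload('foo@localhost', codereload) could be called from the bar@localhost shell. The running processes still have to receive refresh afterwards, since loading alone does not move them to the new code.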
For hot deployment Erlang also has a more powerful mechanism, OTP Release Handling. I don't know it yet and will study it later.