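I recently built a monitoring service for WCF, and every so often it catches the target service failing with a "target machine actively refused" error. At first I assumed the service had stopped, but when I logged on to the server the target service was alive and well. So I started looking for the cause.
Generally, an actively-refused error (TCP 10061) has two main possible causes:
1: The server is powered off or the service is shut down
2: The client is calling the wrong port, or the server firewall has not opened that port
But our service can be called normally and only reports this error occasionally, so neither of these is the cause. After more googling, I came across this answer on Stack Overflow: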
If this happens always, it literally means that the machine exists but that it has no services listening on the specified port, or there is a firewall stopping you.

If it happens occasionally - you used the word "sometimes" - and retrying succeeds, it is likely because the server has a full 'backlog'.

When you are waiting to be accepted on a listening socket, you are placed in a backlog. This backlog is finite and quite short - values of 1, 2 or 3 are not unusual - and so the OS might be unable to queue your request for the 'accept' to consume.

The backlog is a parameter on the listen function - all languages and platforms have basically the same API in this regard, even the C# one. This parameter is often configurable if you control the server, and is likely read from some settings file or the registry. Investigate how to configure your server.

If you wrote the server, you might have heavy processing in the accept of your socket, and this can be better moved to a separate worker-thread so your accept is always ready to receive connections. There are various architecture choices you can explore that mitigate queuing up clients and processing them sequentially.

Regardless of whether you can increase the server backlog, you do need retry logic in your client code to cope with this issue - as even with a long backlog the server might be receiving lots of other requests on that port at that time.

There is a rare possibility where a NAT router would give this error should it's ports for mappings be exhausted. I think we can discard this possibility as too much of a long shot though, since the router has 64K simultaneous connections to the same destination address/port before exhaustion.

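The gist: if the error happens all the time, the server or a firewall is probably to blame; if it only happens "sometimes", the backlog is the likely culprit. The backlog is the TCP-level queue of pending connection requests: when clients connect, the server lines the incoming requests up in a queue until they are accepted. Under high concurrency the server cannot accept them fast enough, the queue fills up, and further requests are rejected outright, which the client sees as the actively-refused TCP 10061 error.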
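To make the backlog idea concrete, here is a minimal sketch in Python rather than WCF (the socket API is essentially the same everywhere, as the quoted answer points out). The names `serve` and `handle_client` and the echo-in-uppercase behaviour are made up for illustration. It shows the two server-side points from the answer: the backlog is just the argument to `listen`, and the accept loop hands each connection to a worker thread so it gets back to `accept` quickly.

```python
import socket
import threading


def handle_client(conn: socket.socket) -> None:
    """Per-connection work, kept off the accept thread (hypothetical handler)."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def serve(host: str = "127.0.0.1", port: int = 0, backlog: int = 100) -> socket.socket:
    """Start a listener and return it; port 0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    # The backlog argument is the size of the pending-connection queue
    # requested from the OS (the OS may cap it at its own limit).
    listener.listen(backlog)

    def accept_loop() -> None:
        while True:
            try:
                conn, _addr = listener.accept()
            except OSError:
                break  # accept failed, e.g. because the listener was closed
            # Hand off immediately so accept() keeps draining the backlog.
            threading.Thread(target=handle_client, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```

Calling `serve(backlog=100)` and then connecting to `listener.getsockname()` exercises it. A slow `handle_client` no longer delays the next `accept`, which is the architectural point the answer makes.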
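With "backlog" as a lead, I searched for "WCF backlog" and found that the WCF binding configuration does have a listenBacklog setting, with a default value of 10. I changed the service's listenBacklog to 100, and the problem went away.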
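A bigger backlog makes refusals rarer but does not rule them out, and the quoted answer insists that the client needs retry logic either way. A rough sketch of what that could look like, again in Python rather than WCF: `connect_with_retry` and its parameters are my own invention, and the exponential backoff is just one reasonable choice. Python raises `ConnectionRefusedError` for this condition, which on Windows corresponds to error 10061.

```python
import socket
import time


def connect_with_retry(host: str, port: int, attempts: int = 5,
                       base_delay: float = 0.1, timeout: float = 3.0) -> socket.socket:
    """Connect to host:port, retrying with exponential backoff when refused."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except ConnectionRefusedError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the refusal to the caller
            # Wait longer after each refusal so a busy server can catch up.
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Only the refusal is retried here; timeouts and other errors propagate immediately, since those usually point to a different problem than a full backlog.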
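One thing to watch out for when adding the listenBacklog attribute: you must remove the default endpoint <endpoint address="mex" binding="mexTcpBinding" bindingConfiguration="" contract="IMetadataExchange" />. This endpoint lets tools such as Visual Studio discover the service metadata; if it is left in place, the service fails at startup with an error saying the port is already being listened on.
References: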
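For illustration, the resulting configuration might look roughly like the fragment below. Treat it as a sketch: the binding name `largeBacklogTcp`, the service and contract names, and the address are placeholders I made up, not the real service's values. Only `listenBacklog="100"` and the absence of the mex endpoint reflect the steps described above.

```xml
<system.serviceModel>
  <bindings>
    <netTcpBinding>
      <!-- Placeholder binding name; listenBacklog raised from the default to 100 -->
      <binding name="largeBacklogTcp" listenBacklog="100" />
    </netTcpBinding>
  </bindings>
  <services>
    <!-- Placeholder service and contract names -->
    <service name="MyCompany.MonitoredService">
      <endpoint address="net.tcp://localhost:8000/MonitoredService"
                binding="netTcpBinding"
                bindingConfiguration="largeBacklogTcp"
                contract="MyCompany.IMonitoredService" />
      <!-- The default mex endpoint (mexTcpBinding) has been removed, as noted above -->
    </service>
  </services>
</system.serviceModel>
```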
https://msdn.microsoft.com/en-us/library/ee377061(v=bts.10).aspx