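Problem description:
Ribbon's weighted response time rule (WeightedResponseTimeRule) assigns each server a weight based on its response time: the longer the response time, the smaller the weight and the less likely the server is to be chosen.
The key code of WeightedResponseTimeRule's choose method is as follows: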
// last one in the list is the sum of all weights
double maxTotalWeight = currentWeights.size() == 0 ? 0 : currentWeights.get(currentWeights.size() - 1);
// No server has been hit yet and total weight is not initialized
// fallback to use round robin
if (maxTotalWeight < 0.001d) {
    server = super.choose(getLoadBalancer(), key);
    if (server == null) {
        return server;
    }
} else {
    // generate a random weight between 0 (inclusive) to maxTotalWeight (exclusive)
    double randomWeight = random.nextDouble() * maxTotalWeight;
    // pick the server index based on the randomIndex
    int n = 0;
    for (Double d : currentWeights) {
        if (d >= randomWeight) {
            serverIndex = n;
            break;
        } else {
            n++;
        }
    }
    server = allList.get(serverIndex);
}
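As long as the response times have not been updated, the rule falls back to round robin; once they are updated, servers are picked by weight. What I am seeing is that even with the weighted response time rule configured, Ribbon always picks servers by round robin.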
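To make the selection step easier to follow, here is a minimal standalone sketch of picking an index from a list of cumulative weights, including the near-zero fallback check. The class name, the method name, the -1 return value used to signal "fall back to round robin", and the weights in main are all made up for illustration; this is not Ribbon's code.

```java
import java.util.Arrays;
import java.util.List;

public class CumulativeWeightPick {

    /**
     * Returns the index of the first cumulative weight that is >= randomWeight.
     * Returns -1 (a hypothetical marker) when the total weight is effectively
     * zero, which is the case where the real rule falls back to round robin.
     */
    public static int pick(List<Double> cumulativeWeights, double randomWeight) {
        double maxTotalWeight = cumulativeWeights.isEmpty()
                ? 0 : cumulativeWeights.get(cumulativeWeights.size() - 1);
        if (maxTotalWeight < 0.001d) {
            return -1; // weights not initialized yet
        }
        int n = 0;
        for (double d : cumulativeWeights) {
            if (d >= randomWeight) {
                return n;
            }
            n++;
        }
        // Not reached as long as randomWeight < maxTotalWeight.
        return cumulativeWeights.size() - 1;
    }

    public static void main(String[] args) {
        // Hypothetical cumulative weights: the servers own the ranges
        // [0, 10], (10, 40] and (40, 100].
        List<Double> weights = Arrays.asList(10.0, 40.0, 100.0);
        System.out.println(pick(weights, 5.0));   // 0
        System.out.println(pick(weights, 39.9));  // 1
        System.out.println(pick(weights, 75.0));  // 2
        // All-zero weights: nothing to pick from, so the caller would use round robin.
        System.out.println(pick(Arrays.asList(0.0, 0.0, 0.0), 0.0)); // -1
    }
}
```

A server whose slice of the cumulative range is wider is hit by the random number more often, which is how a larger weight turns into a higher chance of being chosen.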
Analysis:
Stepping through with a debugger showed that the rxjava version is the cause. I will use one of Ribbon's examples to illustrate:
package com.netflix.ribbon.examples.loadbalancer;

import com.google.common.collect.Lists;
import com.netflix.client.DefaultLoadBalancerRetryHandler;
import com.netflix.client.RetryHandler;
import com.netflix.loadbalancer.*;
import com.netflix.loadbalancer.reactive.LoadBalancerCommand;
import com.netflix.loadbalancer.reactive.ServerOperation;
import rx.Observable;

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

public class URLConnectionLoadBalancer {
    private final ILoadBalancer loadBalancer;
    // retry handler that does not retry on same server, but on a different server
    private final RetryHandler retryHandler = new DefaultLoadBalancerRetryHandler(0, 1, true);

    public URLConnectionLoadBalancer(List<Server> serverList) {
        loadBalancer = LoadBalancerBuilder.newBuilder().buildFixedServerListLoadBalancer(serverList);
    }

    public String call(final String path) throws Exception {
        return LoadBalancerCommand.<String>builder()
                .withLoadBalancer(loadBalancer)
                .build()
                .submit(new ServerOperation<String>() {
                    public Observable<String> call(Server server) {
                        URL url;
                        try {
                            url = new URL("http://" + server.getHost() + ":" + server.getPort() + path);
                            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                            return Observable.just(conn.getResponseMessage());
                        } catch (Exception e) {
                            return Observable.error(e);
                        }
                    }
                }).toBlocking().first();
    }

    public LoadBalancerStats getLoadBalancerStats() {
        return ((BaseLoadBalancer) loadBalancer).getLoadBalancerStats();
    }

    public static void main(String[] args) throws Exception {
        URLConnectionLoadBalancer urlLoadBalancer = new URLConnectionLoadBalancer(Lists.newArrayList(
                new Server("www.baidu.com", 80),
                new Server("github.com", 80),
                new Server("blog.csdn.net", 80)));
        for (int i = 0; i < 6; i++) {
            System.out.println(urlLoadBalancer.call("/"));
        }
        System.out.println("=== Load balancer stats ===");
        System.out.println(urlLoadBalancer.getLoadBalancerStats());
    }
}
1.
rxjava version 1.0.10
Log output:
=== Load balancer stats ===
Zone stats: {},Server stats: [[Server:github.com:80; Zone:UNKNOWN; Total Requests:2; Successive connection failure:0; Total blackout seconds:0; Last connection made:Tue Aug 30 22:15:32 CST 2016; First connection made: Tue Aug 30 22:15:31 CST 2016; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:895.0; 90 percentile resp time:979.0; 95 percentile resp time:979.0; min resp time:811.0; max resp time:979.0; stddev resp time:84.0]
, [Server:www.baidu.com:80; Zone:UNKNOWN; Total Requests:2; Successive connection failure:0; Total blackout seconds:0; Last connection made:Tue Aug 30 22:15:33 CST 2016; First connection made: Tue Aug 30 22:15:32 CST 2016; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:162.0; 90 percentile resp time:164.0; 95 percentile resp time:164.0; min resp time:160.0; max resp time:164.0; stddev resp time:2.0]
, [Server:blog.csdn.net:80; Zone:UNKNOWN; Total Requests:2; Successive connection failure:0; Total blackout seconds:0; Last connection made:Tue Aug 30 22:15:33 CST 2016; First connection made: Tue Aug 30 22:15:32 CST 2016; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:90.0; 90 percentile resp time:99.0; 95 percentile resp time:99.0; min resp time:81.0; max resp time:99.0; stddev resp time:9.0]
]
2.
Ribbon version 2.2.0, rxjava version 1.1.5
Log output:
=== Load balancer stats ===
Zone stats: {},Server stats: [[Server:github.com:80; Zone:UNKNOWN; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Tue Aug 30 22:27:20 CST 2016; First connection made: Tue Aug 30 22:27:19 CST 2016; Active Connections:2; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]
, [Server:www.baidu.com:80; Zone:UNKNOWN; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Tue Aug 30 22:27:21 CST 2016; First connection made: Tue Aug 30 22:27:20 CST 2016; Active Connections:2; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]
, [Server:blog.csdn.net:80; Zone:UNKNOWN; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Tue Aug 30 22:27:21 CST 2016; First connection made: Tue Aug 30 22:27:20 CST 2016; Active Connections:2; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]
]
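As you can see, the stats are not being updated correctly: Total Requests and the average response time stay at 0, and Active Connections stays at 2 for every server.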
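The sketch below shows why all-zero averages force the round robin fallback. It assumes the weight of each server is the sum of all average response times minus that server's own average, accumulated into a running total; that formula is my reading of how the rule derives its weights and is not shown in the code quoted in this post, so treat it as an assumption. The class and method names are hypothetical, and the inputs are the average response times from the two logs above.

```java
import java.util.ArrayList;
import java.util.List;

public class ResponseTimeWeights {

    /**
     * Builds cumulative weights from average response times.
     * Assumed formula: weight(i) = sum of all averages - average(i).
     */
    public static List<Double> cumulativeWeights(double[] avgResponseTimes) {
        double total = 0;
        for (double t : avgResponseTimes) {
            total += t;
        }
        List<Double> result = new ArrayList<>();
        double weightSoFar = 0;
        for (double t : avgResponseTimes) {
            weightSoFar += total - t; // slower server => smaller share
            result.add(weightSoFar);
        }
        return result;
    }

    public static void main(String[] args) {
        // Averages from the first log: github.com, www.baidu.com, blog.csdn.net
        System.out.println(cumulativeWeights(new double[] {895.0, 162.0, 90.0}));
        // prints [252.0, 1237.0, 2294.0]

        // Averages from the second log: never updated, all 0.0
        System.out.println(cumulativeWeights(new double[] {0.0, 0.0, 0.0}));
        // prints [0.0, 0.0, 0.0]
    }
}
```

With the second set of numbers the last cumulative weight is 0, which is below the 0.001 threshold in choose, so every call takes the round robin branch.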
return operation.call(server).doOnEach(new Observer<T>() {
    private T entity;

    @Override
    public void onCompleted() {
        recordStats(tracer, stats, entity, null);
        // TODO: What to do if onNext or onError are never called?
    }

    @Override
    public void onError(Throwable e) {
        recordStats(tracer, stats, null, e);
        logger.debug("Got error {} when executed on server {}", e, server);
        if (listenerInvoker != null) {
            listenerInvoker.onExceptionWithServer(e, context.toExecutionInfo());
        }
    }

    @Override
    public void onNext(T entity) {
        this.entity = entity;
        if (listenerInvoker != null) {
            listenerInvoker.onExecutionSuccess(entity, context.toExecutionInfo());
        }
    }

    private void recordStats(Stopwatch tracer, ServerStats stats, Object entity, Throwable exception) {
        tracer.stop();
        loadBalancerContext.noteRequestCompletion(stats, entity, exception, tracer.getDuration(TimeUnit.MILLISECONDS), retryHandler);
    }
});
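In the observer above, which LoadBalancerCommand attaches via doOnEach, onCompleted records the stats while onNext does nothing to them. With rxjava 1.0.10 a normal run ends in onCompleted, so the stats are updated; with 1.1.5 onCompleted is never reached.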
public void request(long n) {
    if(!this.once) {
        if(n < 0L) {
            throw new IllegalStateException("n >= required but it was " + n);
        } else if(n != 0L) {
            this.once = true;
            Subscriber a = this.actual;
            if(!a.isUnsubscribed()) {
                Object v = this.value;
                try {
                    a.onNext(v);
                } catch (Throwable var6) {
                    Exceptions.throwOrReport(var6, a, v);
                    return;
                }
                if(!a.isUnsubscribed()) {
                    a.onCompleted();
                }
            }
        }
    }
}
Before onCompleted is called, there is a check on whether the subscriber has already unsubscribed.
public void onNext(T i) {
    if(!this.isUnsubscribed() && this.count++ < OperatorTake.this.limit) {
        boolean stop = this.count == OperatorTake.this.limit;
        child.onNext(i);
        if(stop && !this.completed) {
            this.completed = true;
            try {
                child.onCompleted();
            } finally {
                this.unsubscribe();
            }
        }
    }
}
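The first method goes through OperatorTake's call method, and the unsubscribe happens before the upstream onNext call has returned. By the time the upstream checks isUnsubscribed, the subscriber is already unsubscribed, so onCompleted is skipped and the stats are never recorded.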
At this point the problem can be solved.
Two possible fixes: 1. depend on rxjava 1.0.10; 2. call the last method or the forEach method instead of first.
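The mechanism and the second fix can be illustrated with a small model that does not use rxjava at all. The Sink interface and every name below are hypothetical stand-ins, not rxjava types; the only thing taken from the code quoted above is the order of operations: call onNext, then call onCompleted only if the consumer is still subscribed. A consumer that unsubscribes inside onNext stands in for first on 1.1.5, and one that stays subscribed until completion stands in for last or forEach.

```java
import java.util.ArrayList;
import java.util.List;

public class UnsubscribeBeforeCompleteDemo {

    /** Hypothetical stand-in for a subscriber; not the real rxjava type. */
    interface Sink {
        void onNext(String value);
        void onCompleted();
        boolean isUnsubscribed();
    }

    /** Emits one value, then completes only if the sink is still subscribed. */
    static void emitSingle(String value, Sink sink) {
        if (!sink.isUnsubscribed()) {
            sink.onNext(value);
            if (!sink.isUnsubscribed()) {
                sink.onCompleted();
            }
        }
    }

    /**
     * Runs one emission and returns the callbacks the sink received.
     * unsubscribeInOnNext = true models a consumer that stops after the first item.
     */
    public static List<String> run(final boolean unsubscribeInOnNext) {
        final List<String> events = new ArrayList<>();
        final boolean[] unsubscribed = {false};
        emitSingle("OK", new Sink() {
            public void onNext(String value) {
                events.add("onNext");
                if (unsubscribeInOnNext) {
                    unsubscribed[0] = true; // consumer got what it wanted
                }
            }

            public void onCompleted() {
                events.add("onCompleted"); // this is where stats would be recorded
            }

            public boolean isUnsubscribed() {
                return unsubscribed[0];
            }
        });
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // prints [onNext]
        System.out.println(run(false)); // prints [onNext, onCompleted]
    }
}
```

In the model, only the consumer that waits for completion ever sees onCompleted, which matches the observation that switching away from first lets the stats be recorded again.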