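Thread pools are the tool we reach for most often in parallel programming, and multithreading is the most common performance optimization technique in the multi-core era. A thread pool is a standard way to handle concurrent requests and tasks: it reduces the time and system resources spent on creating and destroying threads, and it helps put under-used system resources to work. Creating a thread pool to run tasks concurrently looks simple, but choosing its parameters takes real care.
Taking Java as an example, a standard thread pool is created with the following constructor: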
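/** Thread Pool Executor */
public ThreadPoolExecutor(
int corePoolSize, // core number of threads
int maxPoolSize, // maximum number of threads
long keepAliveTime, // keep-alive time: idle threads beyond corePoolSize are reclaimed after this time
TimeUnit unit, // unit of the keep-alive time
BlockingQueue<Runnable> workQueue, // blocking queue that holds waiting tasks
RejectedExecutionHandler handler // policy applied when the queue is full and the thread count has reached maxPoolSize
) {...}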
The JDK does provide some ready-made factory methods, for example:
- static ExecutorService newCachedThreadPool()
- static ExecutorService newFixedThreadPool(int nThreads)
- static ScheduledExecutorService newScheduledThreadPool(int corePoolSize)
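However, these thread pools cannot satisfy every business scenario. To meet an application's performance requirements, we need to give ThreadPoolExecutor more carefully chosen parameters.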
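To see why the factory methods may not fit every case, it helps to look at the parameters behind them. The sketch below builds pools with the same settings that, as far as I know from the JDK sources, Executors.newFixedThreadPool and Executors.newCachedThreadPool use. The class name FactoryEquivalents is made up for illustration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Illustrative only: explicit constructors mirroring two JDK factory methods. */
final class FactoryEquivalents {

    /** Like Executors.newFixedThreadPool(n): fixed thread count, but an unbounded queue. */
    static ThreadPoolExecutor fixed(int nThreads) {
        return new ThreadPoolExecutor(nThreads, nThreads,
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    /** Like Executors.newCachedThreadPool(): no queueing, but an unbounded thread count. */
    static ThreadPoolExecutor cached() {
        return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                60L, TimeUnit.SECONDS,
                new SynchronousQueue<>());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor fixed = fixed(4);
        ThreadPoolExecutor cached = cached();
        // An unbounded queue can grow without limit when tasks arrive faster than they finish.
        System.out.println("fixed queue capacity: " + fixed.getQueue().remainingCapacity());
        // An unbounded thread count can exhaust memory under a burst of slow tasks.
        System.out.println("cached max threads:   " + cached.getMaximumPoolSize());
        fixed.shutdown();
        cached.shutdown();
    }
}
```

Neither pool bounds both the queue and the thread count, which is the main reason to configure ThreadPoolExecutor by hand for a production workload.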
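1. Set reasonably good parameters based on experience and a general formula
Take the number of threads. How many threads is appropriate depends on many factors:
- The server's CPU resources.
- The type of task and the resources it consumes.
If the task reads and writes a database, the thread count depends on the size of the database connection pool and on the database's load and performance. If the task calls a third-party service over the network, it depends on the network load and on that service's load and performance.
Generally speaking, CPU-intensive tasks occupy the CPU for a long time, so the thread count can be set lower; I/O-intensive tasks occupy the CPU only briefly, so the thread count can be set higher.
Our goal is to make full use of the CPU resources given to us. If a thread's task spends much of its time waiting, for example on disk or network I/O, use more threads. If the task itself is computationally expensive and spends a long time on the CPU, use fewer threads.
This is captured by the following formula:
Number of threads = number of CPU cores * target CPU utilization * (1 + wait time / compute time)
Suppose our server has a 4-core CPU and we want a thread pool that sends metrics data to a remote Kafka cluster. The network latency is about 50 ms, parsing, encoding and compressing the data takes about 5 ms, and we want CPU utilization to stay within 10%. The calculation below gives 4.4, so we need about 5 threads: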
4 * 0.1 * (1 + 50 / 5) = 4.4
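The same calculation can be written as a small helper. This is only a sketch of the formula; the class and method names are illustrative, and rounding up (rather than to the nearest integer) is my own choice so that the pool is never smaller than the estimate.

```java
/** Sketch of the sizing formula; the class and method names are illustrative only. */
final class ThreadCountEstimator {

    /**
     * threads = cores * targetUtilization * (1 + waitTime / computeTime),
     * rounded up, with a floor of one thread.
     */
    static int estimate(int cores, double targetUtilization, double waitMillis, double computeMillis) {
        double raw = cores * targetUtilization * (1 + waitMillis / computeMillis);
        return Math.max(1, (int) Math.ceil(raw));
    }

    public static void main(String[] args) {
        // Values from the example: 4 cores, 10% CPU, 50 ms wait, 5 ms compute.
        System.out.println(estimate(4, 0.1, 50, 5)); // prints 5

        // On a real host the core count can be read at runtime instead of being hard-coded.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(estimate(cores, 0.1, 50, 5));
    }
}
```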
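So we set the parameters as follows:
Parameter | Value | Description |
---|---|---|
int corePoolSize | 5 | Core number of threads |
int maxPoolSize | 10 | Maximum number of threads |
long keepAliveTime | 5000 | Keep-alive time: idle threads beyond the core size are reclaimed after this time. A longer value helps avoid frequently creating and destroying threads |
TimeUnit unit | TimeUnit.MILLISECONDS | Unit of the keep-alive time, milliseconds here |
BlockingQueue<Runnable> workQueue | new LinkedBlockingQueue<>(500) | Blocking queue that holds waiting tasks. For first-in, first-out ordering, LinkedBlockingQueue is a fine choice |
ThreadFactory threadFactory | Executors.defaultThreadFactory() | Factory that creates the threads |
RejectedExecutionHandler handler | new ThreadPoolExecutor.AbortPolicy() | Policy for rejecting new tasks when both the queue and the thread count are at their limits, or when the pool has been shut down. There are 4 built-in policies: 1) AbortPolicy (the JDK default), 2) CallerRunsPolicy, 3) DiscardPolicy, 4) DiscardOldestPolicy |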
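The values in the table can be passed to the constructor as in the following sketch. It uses the JDK's own Executors.defaultThreadFactory() and ThreadPoolExecutor.AbortPolicy for the thread factory and the rejection handler, which is an assumption on my part; the class name MetricsSenderPool is hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical holder for the pool described in the table. */
final class MetricsSenderPool {

    static ThreadPoolExecutor create() {
        return new ThreadPoolExecutor(
                5,                                     // corePoolSize
                10,                                    // maximum pool size
                5000, TimeUnit.MILLISECONDS,           // keep-alive for threads above the core size
                new LinkedBlockingQueue<>(500),        // bounded FIFO task queue
                Executors.defaultThreadFactory(),
                new ThreadPoolExecutor.AbortPolicy()); // throws RejectedExecutionException when saturated
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create();
        for (int i = 0; i < 20; i++) {
            pool.execute(() -> { /* encode and send one batch of metrics here */ });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("completed tasks: " + pool.getCompletedTaskCount());
    }
}
```

One detail worth keeping in mind: ThreadPoolExecutor only starts threads beyond the core size once the queue is full, so with a queue of 500 the pool stays at 5 threads until 500 tasks are waiting.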
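2. Tune the parameters based on metrics
To measure properly, we must record and display the thread pool's metrics.
First, a quick look at some metrics terminology; for details see https://metrics.dropwizard.io/4.1.2/manual/core.html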
MetricRegistry
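A container for all kinds of metrics, similar to the Windows system registry: every metric can be registered in it.
Metric types
Gauge: an instantaneous reading of a value, such as the number of connections or the number of threads
Counter: a value that changes step by step through increments and decrements rather than jumping, such as the length of the task queue
Meter: the rate of events per unit of time, such as TPS (transactions per second)
Timer: the statistical distribution of elapsed time, such as how busy or idle the threads are, or the average response time
Thread-related metrics
- Thread count: maximum, minimum and current number of threads
- Queue length: the maximum allowed length and the current length
- Task throughput: the rates at which tasks are submitted and completed
- Number of running tasks
- Busy/idle ratio of the threads
- Number of rejected tasks
- Time tasks spend waiting in the queue: maximum and current wait time
- Number of tasks that exceeded the maximum wait time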
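Several of these values can be read directly from ThreadPoolExecutor without any metrics library. The sketch below collects them into one line of text; the class name PoolSnapshot and the output format are my own. Rejections and queue wait times are not exposed by the JDK getters and need extra instrumentation.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Illustrative helper that reads the counters ThreadPoolExecutor exposes. */
final class PoolSnapshot {

    static String describe(ThreadPoolExecutor pool) {
        return String.format(
                "core=%d max=%d current=%d largest=%d active=%d queued=%d queueRemaining=%d submitted=%d completed=%d",
                pool.getCorePoolSize(), pool.getMaximumPoolSize(),
                pool.getPoolSize(), pool.getLargestPoolSize(),
                pool.getActiveCount(),                    // approximate number of running tasks
                pool.getQueue().size(), pool.getQueue().remainingCapacity(),
                pool.getTaskCount(), pool.getCompletedTaskCount());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 1, TimeUnit.SECONDS, new LinkedBlockingQueue<>(10));
        for (int i = 0; i < 6; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(50); // stand-in for real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        System.out.println("while busy: " + describe(pool));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("after done: " + describe(pool));
    }
}
```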
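How to measure and monitor threads
- Create the thread pool and register its metrics
- Run the thread pool and collect the metrics
- Observe the metrics and adjust the parameters accordingly
An example of measuring and monitoring threads
We can use the InstrumentedExecutorService class from the Dropwizard Metrics library (https://metrics.dropwizard.io/) to collect the metrics above. Some of its key code is shown below: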
InstrumentedExecutorService
public class InstrumentedExecutorService implements ExecutorService {
private static final AtomicLong NAME_COUNTER = new AtomicLong();
private final ExecutorService delegate;
private final Meter submitted;
private final Counter running;
private final Meter completed;
private final Timer idle;
private final Timer duration;
public InstrumentedExecutorService(ExecutorService delegate, MetricRegistry registry) {
this(delegate, registry, "instrumented-delegate-" + NAME_COUNTER.incrementAndGet());
}
public InstrumentedExecutorService(ExecutorService delegate, MetricRegistry registry, String name) {
this.delegate = delegate;
this.submitted = registry.meter(MetricRegistry.name(name, new String[]{"submitted"}));
this.running = registry.counter(MetricRegistry.name(name, new String[]{"running"}));
this.completed = registry.meter(MetricRegistry.name(name, new String[]{"completed"}));
this.idle = registry.timer(MetricRegistry.name(name, new String[]{"idle"}));
this.duration = registry.timer(MetricRegistry.name(name, new String[]{"duration"}));
}
//...
private class InstrumentedRunnable implements Runnable {
private final Runnable task;
private final Timer.Context idleContext;
InstrumentedRunnable(Runnable task) {
this.task = task;
this.idleContext = idle.time();
}
@Override
public void run() {
idleContext.stop();
running.inc();
try (Timer.Context durationContext = duration.time()) {
task.run();
} finally {
running.dec();
completed.mark();
}
}
}
}
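It wraps the original ExecutorService using the decorator pattern and records the submitted, running, completed, idle and duration metrics. Here idle is the time between a task being wrapped at submission and the moment it starts to run. We can record some additional metrics ourselves; part of the code is as follows: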
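The decorator idea itself needs very little code. The sketch below is not the Dropwizard implementation: it is a simplified, JDK-only stand-in that wraps a Runnable to accumulate queue wait time and run time in AtomicLong counters. The class name TimingDecorator and its fields are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Simplified decorator that times how long tasks wait and how long they run. */
final class TimingDecorator {
    final AtomicLong waitNanos = new AtomicLong();
    final AtomicLong runNanos = new AtomicLong();
    final AtomicLong completed = new AtomicLong();

    /** Wraps a task; the wait clock starts when wrap() is called, i.e. at submission. */
    Runnable wrap(Runnable task) {
        final long submittedAt = System.nanoTime();
        return () -> {
            long startedAt = System.nanoTime();
            waitNanos.addAndGet(startedAt - submittedAt);
            try {
                task.run();
            } finally {
                runNanos.addAndGet(System.nanoTime() - startedAt);
                completed.incrementAndGet();
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        TimingDecorator timing = new TimingDecorator();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 6; i++) {
            pool.execute(timing.wrap(() -> {
                try {
                    Thread.sleep(20); // stand-in for real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        long n = timing.completed.get();
        System.out.println("completed=" + n
                + " avgWaitMs=" + TimeUnit.NANOSECONDS.toMillis(timing.waitNanos.get() / n)
                + " avgRunMs=" + TimeUnit.NANOSECONDS.toMillis(timing.runNanos.get() / n));
    }
}
```

With 6 tasks and only 2 threads, the later tasks have to queue, so the average wait grows as the pool gets smaller. That is exactly the signal we want when deciding whether to add threads.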
1) First, define a thread pool parameter object
package com.github.walterfan.helloconcurrency;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;
import lombok.Builder;
import lombok.Getter;
import lombok.Setter;
import java.time.Duration;
/**
* @Author: Walter Fan
**/
@Getter
@Setter
@Builder
public class ThreadPoolParam {
private int minPoolSize;
private int maxPoolSize;
private Duration keepAliveTime;
private int queueSize;
private String threadPrefix;
private boolean daemon;
private MetricRegistry metricRegistry;
}
2) Then write a utility class that creates the thread pool:
package com.github.walterfan.helloconcurrency;
import com.codahale.metrics.Counter;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.InstrumentedExecutorService;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;
import com.google.common.util.concurrent.ThreadFactoryBuilder;
import lombok.extern.slf4j.Slf4j;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import static com.codahale.metrics.MetricRegistry.name;
/**
* @Author: Walter Fan
**/
@Slf4j
public class ThreadPoolUtil {
public static class DiscardAndLogPolicy implements RejectedExecutionHandler {
final MetricRegistry metricRegistry;
final Meter rejectedMeter;
final Counter rejectedCounter;
public DiscardAndLogPolicy(String threadPrefix, MetricRegistry metricRegistry) {
this.metricRegistry = metricRegistry;
this.rejectedMeter = metricRegistry.meter(threadPrefix + ".rejected-meter");
this.rejectedCounter = metricRegistry.counter(threadPrefix + ".rejected-counter");
}
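@Override
public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
// Record every rejection so that it shows up in the reported metrics
rejectedMeter.mark();
rejectedCounter.inc();
log.warn("Task rejected by {}, dropping the oldest queued task and retrying", e);
if (!e.isShutdown()) {
// Same behavior as DiscardOldestPolicy: drop the oldest queued task, then retry the new one
e.getQueue().poll();
e.execute(r);
}
}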
}
public static ThreadPoolExecutor createThreadExecutor(ThreadPoolParam threadPoolParam) {
MetricRegistry metricRegistry = threadPoolParam.getMetricRegistry();
metricRegistry.register(threadPoolParam.getThreadPrefix() + ".min", createIntGauge(() -> threadPoolParam.getMinPoolSize()));
metricRegistry.register(threadPoolParam.getThreadPrefix() + ".max", createIntGauge(() -> threadPoolParam.getMaxPoolSize()));
metricRegistry.register(threadPoolParam.getThreadPrefix() + ".queue_limitation", createIntGauge(() -> threadPoolParam.getQueueSize()));
ThreadPoolExecutor executor = new ThreadPoolExecutor(threadPoolParam.getMinPoolSize(),
threadPoolParam.getMaxPoolSize(),
threadPoolParam.getKeepAliveTime().toMillis(),
TimeUnit.MILLISECONDS,
new LinkedBlockingQueue<Runnable>(threadPoolParam.getQueueSize()),
createThreadFactory(threadPoolParam),
createRejectedExecutionHandler(threadPoolParam));
metricRegistry.register(threadPoolParam.getThreadPrefix() + ".pool_size", createIntGauge(() -> executor.getPoolSize()));
metricRegistry.register(threadPoolParam.getThreadPrefix() + ".queue_size", createIntGauge(() -> executor.getQueue().size()));
return executor;
}
public static ExecutorService createExecutorService(ThreadPoolParam threadPoolParam) {
ThreadPoolExecutor executor = createThreadExecutor(threadPoolParam);
return new InstrumentedExecutorService(executor,
threadPoolParam.getMetricRegistry(),
threadPoolParam.getThreadPrefix());
}
private static Gauge<Integer> createIntGauge(Supplier<Integer> supplier) {
return () -> supplier.get();
}
public static ThreadFactory createThreadFactory(ThreadPoolParam threadPoolParam) {
return new ThreadFactoryBuilder()
.setDaemon(threadPoolParam.isDaemon())
.setNameFormat(threadPoolParam.getThreadPrefix() + "-%d")
.build();
}
public static RejectedExecutionHandler createRejectedExecutionHandler(ThreadPoolParam threadPoolParam) {
return new DiscardAndLogPolicy(threadPoolParam.getThreadPrefix(), threadPoolParam.getMetricRegistry());
}
}
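A rejection handler is only invoked once both the worker threads and the queue are exhausted. The JDK-only sketch below makes that visible with a deliberately tiny pool (core 1, max 2, queue capacity 1) and tasks that block until released. The class name RejectionDemo is hypothetical, and the handler here simply counts and drops the new task.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Shows when the rejection handler fires on a saturated pool. */
final class RejectionDemo {

    /** Submits taskCount blocking tasks and returns how many of them were rejected. */
    static int countRejections(int taskCount) throws InterruptedException {
        AtomicInteger rejected = new AtomicInteger();
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 1, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                (task, executor) -> rejected.incrementAndGet()); // count and drop the new task
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // hold the worker so that the pool stays saturated
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        release.countDown(); // let the accepted tasks finish
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return rejected.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Capacity is 2 threads + 1 queue slot = 3 tasks, so 2 of the 5 are rejected.
        System.out.println("rejected: " + countRejections(5)); // prints "rejected: 2"
    }
}
```

The order in which capacity is used is worth noting: the first task starts the core thread, the second goes into the queue, and only the third, which finds the queue full, starts the extra thread.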
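3) As an example, take the familiar job of sorting playing cards: we sort several decks with bubble sort, insertion sort and the JDK's built-in TimSort. All of these tasks are submitted to the thread pool, and the results differ considerably depending on the thread pool parameters we choose.
3.1) The playing card class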
package com.github.walterfan.helloconcurrency;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* @Author: Walter Fan
* @Date: 21/2/2020, Fri
**/
public class Poker {
public static class Card {
enum Suite {
Spades(4), Hearts(3), Clubs(2), Diamonds(1);
int value;
Suite(int value) {
this.value = value;
}
private static Map<Integer, Suite> valueMap = new HashMap<>();
static {
for (Suite suite : Suite.values()) {
valueMap.put(suite.value, suite);
}
}
public static Suite valueOf(int pageType) {
return valueMap.get(pageType);
}
}
Suite suite;
//1~13
int point;
public Card(int suiteValue, int point) {
this.suite = Suite.valueOf(suiteValue);
this.point = point;
}
public String toString() {
String strPoint = Integer.toString(point);
if (point > 10) {
switch (point) {
case 11:
strPoint = "J";
break;
case 12:
strPoint = "Q";
break;
case 13:
strPoint = "K";
break;
}
}
return suite.name() + ":" + strPoint;
}
public int getScore() {
return suite.value * 100 + point;
}
}
public static List<Card> createCardList(int suiteCount) {
List<Card> cards = new ArrayList<>(52);
for(int i = 1; i < 5; i++) {
for(int j = 1; j < 14 ;++j) {
cards.add(new Card(i, j));
}
}
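// Note: each pass below appends a copy of the whole list, so the list doubles every time.
// suiteCount = n therefore yields 52 * 2^(n-1) cards: 104, 416 and 6656 for n = 2, 4 and 8.
if(suiteCount > 1) {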
for(int j = 0; j < suiteCount - 1; j++) {
cards.addAll(new ArrayList<>(cards));
}
}
Collections.shuffle(cards);
return cards;
}
public static class CardComparator implements Comparator<Card> {
@Override
public int compare(Card o1, Card o2) {
return o1.getScore() - o2.getScore();
}
}
}
3.2) The sorting task class
package com.github.walterfan.helloconcurrency;
import com.google.common.base.Stopwatch;
import lombok.extern.slf4j.Slf4j;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.stream.Collectors;
import static java.util.concurrent.TimeUnit.MILLISECONDS;
/**
* @Author: Walter Fan
* @Date: 27/3/2020, Fri
**/
@Slf4j
public class SortCardTask implements Callable<Long> {
public enum SortMethod { BUBBLE_SORT, INSERT_SORT, TIM_SORT}
private final List<Poker.Card> cards;
private SortMethod sortMethod;
public SortCardTask(List<Poker.Card> cards, SortMethod method) {
this.cards = cards;
this.sortMethod = method;
}
@Override
public Long call() {
Stopwatch stopwatch = Stopwatch.createStarted();
switch(sortMethod) {
case BUBBLE_SORT:
bubbleSort(cards, new Poker.CardComparator());
break;
case INSERT_SORT:
insertSort(cards, new Poker.CardComparator());
break;
case TIM_SORT:
timSort(cards, new Poker.CardComparator());
break;
}
stopwatch.stop();
long millis = stopwatch.elapsed(MILLISECONDS);
log.info("{} cards sort by {} spend {} milliseconds - {}" , cards.size(), sortMethod, millis, stopwatch); // formatted string like "12.3 ms"
return millis;
}
public static <T> void bubbleSort(List<T> aList, Comparator<T> comparator) {
boolean sorted = false;
int loopCount = aList.size() - 1;
while (!sorted) {
sorted = true;
for (int i = 0; i < loopCount; i++) {
if (comparator.compare(aList.get(i), aList.get(i + 1)) > 0) {
Collections.swap(aList, i, i + 1);
sorted = false;
}
}
}
}
public static <T> void insertSort(List<T> aList, Comparator<T> comparator) {
int size = aList.size();
for (int i = 1; i < size; ++i) {
T selected = aList.get(i);
if (size < 10) {
log.info("{} insert to {}", selected, aList.subList(0, i).stream().map(String::valueOf).collect(Collectors.joining(", ")));
}
int j = i - 1;
// find the insert position for the selected element within the sorted left part
while (j >= 0 && comparator.compare(selected, aList.get(j)) < 0) {
// shifting right loses nothing: the first slot overwritten is j + 1 == i, whose element is already saved in selected
aList.set(j + 1, aList.get(j));
j--;
}
aList.set(j + 1, selected);
}
}
public static <T> void timSort(List<T> aList, Comparator<T> comparator) {
// List.sort sorts the list in place; for object lists the JDK uses TimSort
aList.sort(comparator);
}
}
3.3) The thread pool demo class
package com.github.walterfan.helloconcurrency;
import com.codahale.metrics.Slf4jReporter;
import com.codahale.metrics.MetricRegistry;
import com.google.common.base.Stopwatch;
import lombok.extern.slf4j.Slf4j;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
/**
* @Author: Walter Fan
* @Date: 23/3/2020, Mon
**/
@Slf4j
public class ThreadPoolDemo {
private ExecutorService executorService;
public ThreadPoolDemo(ThreadPoolParam threadPoolParam) {
executorService = ThreadPoolUtil.createExecutorService(threadPoolParam);
}
public Callable<Long> createTask(int cardSuiteCount, SortCardTask.SortMethod method) {
List<Poker.Card> cards = Poker.createCardList(cardSuiteCount);
return new SortCardTask(cards, method);
}
public List<Future<Long>> exeucteTasks(List<Callable<Long>> tasks) {
try {
return this.executorService.invokeAll(tasks);
} catch (InterruptedException e) {
log.warn("invokeAll interrupt", e);
return Collections.emptyList();
}
}
public void waitUntil(long ms) {
executorService.shutdown();
try {
if (!executorService.awaitTermination(ms, TimeUnit.MILLISECONDS)) {
executorService.shutdownNow();
}
} catch (InterruptedException ex) {
executorService.shutdownNow();
Thread.currentThread().interrupt();
}
}
public static void main(String[] args) {
Stopwatch stopwatch = Stopwatch.createStarted();
log.info("--- start ---");
MetricRegistry metricRegistry = new MetricRegistry();
final Slf4jReporter logReporter = Slf4jReporter.forRegistry(metricRegistry)
.outputTo(log)
.convertRatesTo(TimeUnit.SECONDS)
.convertDurationsTo(TimeUnit.MILLISECONDS)
.build();
logReporter.start(1, TimeUnit.MINUTES);
ThreadPoolParam threadPoolParam = ThreadPoolParam.builder()
.minPoolSize(4)
.maxPoolSize(8)
.daemon(true)
.keepAliveTime(Duration.ofSeconds(1))
.queueSize(2)
.threadPrefix("cards-thread-pool")
.metricRegistry(metricRegistry)
.build();
ThreadPoolDemo demo = new ThreadPoolDemo(threadPoolParam);
List<Callable<Long>> tasks = new ArrayList<>();
tasks.add(demo.createTask(2, SortCardTask.SortMethod.BUBBLE_SORT));
tasks.add(demo.createTask(2, SortCardTask.SortMethod.INSERT_SORT));
tasks.add(demo.createTask(2, SortCardTask.SortMethod.TIM_SORT));
tasks.add(demo.createTask(4, SortCardTask.SortMethod.BUBBLE_SORT));
tasks.add(demo.createTask(4, SortCardTask.SortMethod.INSERT_SORT));
tasks.add(demo.createTask(4, SortCardTask.SortMethod.TIM_SORT));
tasks.add(demo.createTask(8, SortCardTask.SortMethod.BUBBLE_SORT));
tasks.add(demo.createTask(8, SortCardTask.SortMethod.INSERT_SORT));
tasks.add(demo.createTask(8, SortCardTask.SortMethod.TIM_SORT));
List<Future<Long>> results = demo.exeucteTasks(tasks);
logReporter.report();
stopwatch.stop();
log.info("--- end {} ---", stopwatch);
}
}
The output is as follows; I use Slf4jReporter to print the metrics to the log:
22:12:31.553 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - --- start ---
22:12:31.706 [cards-thread-pool-1] INFO com.github.walterfan.helloconcurrency.SortCardTask - 104 cards sort by INSERT_SORT spend 2 milliseconds - 2.577 ms
22:12:31.707 [cards-thread-pool-0] INFO com.github.walterfan.helloconcurrency.SortCardTask - 104 cards sort by BUBBLE_SORT spend 4 milliseconds - 4.045 ms
22:12:31.713 [cards-thread-pool-2] INFO com.github.walterfan.helloconcurrency.SortCardTask - 104 cards sort by TIM_SORT spend 9 milliseconds - 9.640 ms
22:12:31.713 [cards-thread-pool-1] INFO com.github.walterfan.helloconcurrency.SortCardTask - 416 cards sort by TIM_SORT spend 1 milliseconds - 1.005 ms
22:12:31.719 [cards-thread-pool-0] INFO com.github.walterfan.helloconcurrency.SortCardTask - 416 cards sort by INSERT_SORT spend 6 milliseconds - 6.890 ms
22:12:31.721 [cards-thread-pool-6] INFO com.github.walterfan.helloconcurrency.SortCardTask - 6656 cards sort by TIM_SORT spend 18 milliseconds - 18.15 ms
22:12:31.722 [cards-thread-pool-3] INFO com.github.walterfan.helloconcurrency.SortCardTask - 416 cards sort by BUBBLE_SORT spend 18 milliseconds - 18.82 ms
22:12:31.768 [cards-thread-pool-5] INFO com.github.walterfan.helloconcurrency.SortCardTask - 6656 cards sort by INSERT_SORT spend 64 milliseconds - 64.71 ms
22:12:32.086 [cards-thread-pool-4] INFO com.github.walterfan.helloconcurrency.SortCardTask - 6656 cards sort by BUBBLE_SORT spend 383 milliseconds - 383.1 ms
22:12:32.088 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - type=GAUGE, name=cards-thread-pool.max, value=8
22:12:32.089 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - type=GAUGE, name=cards-thread-pool.min, value=4
22:12:32.089 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - type=GAUGE, name=cards-thread-pool.pool_size, value=7
22:12:32.089 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - type=GAUGE, name=cards-thread-pool.queue_limitation, value=2
22:12:32.089 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - type=GAUGE, name=cards-thread-pool.queue_size, value=0
22:12:32.090 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - type=COUNTER, name=cards-thread-pool.rejected-counter, count=0
22:12:32.090 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - type=COUNTER, name=cards-thread-pool.running, count=0
22:12:32.093 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - type=METER, name=cards-thread-pool.completed, count=9, m1_rate=0.0, m5_rate=0.0, m15_rate=0.0, mean_rate=22.244898287056326, rate_unit=events/second
22:12:32.093 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - type=METER, name=cards-thread-pool.rejected-meter, count=0, m1_rate=0.0, m5_rate=0.0, m15_rate=0.0, mean_rate=0.0, rate_unit=events/second
22:12:32.093 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - type=METER, name=cards-thread-pool.submitted, count=9, m1_rate=0.0, m5_rate=0.0, m15_rate=0.0, mean_rate=22.225106382509, rate_unit=events/second
22:12:32.104 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - type=TIMER, name=cards-thread-pool.duration, count=9, min=1.198973, max=383.426986, mean=58.572064601980294, stddev=117.07645143134481, p50=9.994637, p75=19.103525, p95=383.426986, p98=383.426986, p99=383.426986, p999=383.426986, m1_rate=0.0, m5_rate=0.0, m15_rate=0.0, mean_rate=21.905169236599374, rate_unit=events/second, duration_unit=milliseconds
22:12:32.104 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - type=TIMER, name=cards-thread-pool.idle, count=9, min=0.481746, max=11.00177, mean=3.218059222222222, stddev=4.171958329347576, p50=1.105362, p75=1.84808, p95=11.00177, p98=11.00177, p99=11.00177, p999=11.00177, m1_rate=0.0, m5_rate=0.0, m15_rate=0.0, mean_rate=21.86126775357198, rate_unit=events/second, duration_unit=milliseconds
22:12:32.105 [main] INFO com.github.walterfan.helloconcurrency.ThreadPoolDemo - --- end 554.3 ms ---
Try adjusting the thread pool parameters: you will find that the behavior changes noticeably and the total elapsed time varies a lot.
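A quick way to run such an experiment without the rest of the demo is sketched below. It is a JDK-only stand-in that sorts random int arrays instead of cards, and the class name PoolSizeExperiment and the chosen sizes are arbitrary. The timings depend entirely on the machine, so no expected numbers are given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Runs the same CPU-bound batch on pools of different sizes. */
final class PoolSizeExperiment {

    /** Runs taskCount sort tasks on a fixed pool and returns the elapsed milliseconds. */
    static long runBatch(int poolSize, int taskCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final long seed = i; // fixed seeds make every run sort the same data
                tasks.add(() -> {
                    int[] data = new Random(seed).ints(200_000).toArray();
                    Arrays.sort(data);
                    return data.length;
                });
            }
            long start = System.nanoTime();
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                f.get(); // propagate any task failure
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        for (int size : new int[] {1, 2, 4, 8}) {
            System.out.println("poolSize=" + size + " elapsed=" + runBatch(size, 16) + " ms");
        }
    }
}
```

For CPU-bound work like this, the elapsed time usually stops improving once the pool size reaches the number of cores, which is consistent with the sizing formula in section 1.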
When time permits, we will look at how to monitor the thread pool with the JMX Reporter and observe it under I/O-intensive tasks.