Threads and memory climbing steadily when writing to InfluxDB 2 from Java: the problem and a fix

Author: 氵我是大明星

Article link: https://blog.csdn.net/qq_30488445/article/details/130706004

InfluxDB concepts

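InfluxDB 2.x is an open-source time series database for storing, querying, and visualizing time series data. A time series database is a specialized database built for data ordered by time, such as industrial sensor readings or application logs.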

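The InfluxDB 2.x data model is built from four concepts: bucket, measurement, tag, and field. A bucket is the top-level container for data and plays the role of a database in a relational system. A measurement is comparable to a table and holds one kind of related data. Tags are metadata used to classify and filter data; they are indexed, much like indexed columns in a relational database. Fields hold the actual data values. Together, these concepts make it straightforward to store and query time series data.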
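
To make the roles of measurement, tag, and field concrete, here is a minimal sketch that assembles one record in InfluxDB line protocol by hand, using only the standard library. The class name, method name, and sample values are made up for illustration, and escaping of special characters is deliberately left out; in real code the client's Point class does this work. The bucket does not appear in the record itself, because it is chosen when the record is written.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative only: shows where measurement, tags, fields and the timestamp
// end up in a line-protocol record. No escaping of spaces, commas or quotes.
class LineProtocolSketch {

    static String toLine(String measurement, Map<String, String> tags,
                         Map<String, Double> fields, long epochNanos) {
        // Tags follow the measurement, each introduced by a comma (sorted by key here).
        String tagPart = new TreeMap<String, String>(tags).entrySet().stream()
                .map(e -> "," + e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining());
        // Fields come after a space, separated from each other by commas.
        String fieldPart = new TreeMap<String, Double>(fields).entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
        // The timestamp comes last, after another space.
        return measurement + tagPart + " " + fieldPart + " " + epochNanos;
    }

    public static void main(String[] args) {
        String line = toLine("temperature", Map.of("device", "sensor-1"),
                Map.of("value", 21.5), 1700000000000000000L);
        // prints: temperature,device=sensor-1 value=21.5 1700000000000000000
        System.out.println(line);
    }
}
```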

Using InfluxDB

Add the dependency

<dependency>
  <groupId>com.influxdb</groupId>
  <artifactId>influxdb-client-java</artifactId>
  <version>3.0.1</version>
  <scope>compile</scope>
</dependency>

The configuration file influx2.properties (the same settings can also be placed directly in a *.yml file)

influx2.url= http://localhost:8086
influx2.org= my_influxdb
influx2.token= BqtcXzzG1SzFzkR9w0bPAsjVKQ8pSxb-szoCFREtY8zPCi2FoNPMqqofTqsv4VDXWwYUgGb4PtHXT73AwymZlQ==
influx2.bucket= iot

Read the configuration file

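import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.PropertySource;
import org.springframework.stereotype.Component;

@Component
@ConfigurationProperties(prefix = "influx2")
@PropertySource(value = "classpath:influx2.properties")
public class InfluxDBProperties {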

    private String url;

    private String token;

    private String org;

    private String bucket;

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }

    public String getToken() {
        return token;
    }

    public void setToken(String token) {
        this.token = token;
    }

    public String getOrg() {
        return org;
    }

    public void setOrg(String org) {
        this.org = org;
    }

    public String getBucket() {
        return bucket;
    }

    public void setBucket(String bucket) {
        this.bucket = bucket;
    }

}

InfluxDB configuration

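import com.influxdb.client.InfluxDBClient;
import com.influxdb.client.InfluxDBClientFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class InfluxDBConfig {

    @Bean
    public InfluxDBClient influxDBClient() {
        // create() with no arguments reads influx2.properties from the classpath
        return InfluxDBClientFactory.create();
    }
}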

Working with InfluxDB

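You can use the influxdb-client-java package directly. Its InfluxDBClient exposes several APIs for working with InfluxDB, each implementing part of the HTTP API defined by the InfluxDB API service, for example:

WriteApi: an asynchronous, non-blocking API for writing time series data to InfluxDB 2.0

WriteApiBlocking: a synchronous, blocking API for writing time series data to InfluxDB 2.0

QueryApi: implements the query HTTP API endpoint

DeleteApi: an API for deleting time series data from InfluxDB 2.0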

Usage

@Autowired
InfluxDBClient influxDBClient;
// Alternatively:
// private InfluxDBClient influxDBClient = SpringUtils.getBean(InfluxDBClient.class);
@Autowired
private InfluxDBProperties properties;

// Query
StringBuilder flux = new StringBuilder();
flux.append("from(bucket: \"").append(bucketName).append("\") ");
flux.append("|> range(start: ").append(start).append(", stop: ").append(stop).append(") ");
flux.append("|> filter(fn: (r) => r._measurement == \"").append(tableName).append("\") ");
influxDBClient.getQueryApi().query(flux.toString());

// Delete the last five days of data (timestamps taken in UTC)
OffsetDateTime now = OffsetDateTime.now(ZoneOffset.UTC);
DeletePredicateRequest deletePredicateRequest = new DeletePredicateRequest();
deletePredicateRequest.start(now.minusDays(5));
deletePredicateRequest.stop(now);
influxDBClient.getDeleteApi().delete(deletePredicateRequest, properties.getBucket(), properties.getOrg());

// Write a single point
Point point = Point.measurement(measurement).addFields(fields).time(Instant.now(), WritePrecision.NS);
influxDBClient.getWriteApi().writePoint(point);

// Write a batch of points
List<Point> points = new ArrayList<>();
for (int i = 0; i < 10000; i++) {
    points.add(Point.measurement(measurement).addFields(fields).time(Instant.now(), WritePrecision.NS));
}
influxDBClient.getWriteApi().writePoints(points);

These are only the basics; see the interfaces exposed by InfluxDBClient for more.
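
For reference, this is roughly what the assembled Flux query text looks like for concrete inputs. The sketch below only builds the string and does not talk to InfluxDB; the class name, method name, and sample values (bucket "iot", a one-hour range, measurement "temperature") are assumptions for illustration. It does no escaping, so it should only be fed trusted values.

```java
// Illustrative only: builds the same kind of Flux query string as the usage
// snippet, so the final text sent to the query API is easy to inspect.
class FluxQuerySketch {

    static String buildQuery(String bucket, String start, String stop, String measurement) {
        return String.format(
                "from(bucket: \"%s\") |> range(start: %s, stop: %s) "
                        + "|> filter(fn: (r) => r._measurement == \"%s\")",
                bucket, start, stop, measurement);
    }

    public static void main(String[] args) {
        // prints: from(bucket: "iot") |> range(start: -1h, stop: now()) |> filter(fn: (r) => r._measurement == "temperature")
        System.out.println(buildQuery("iot", "-1h", "now()", "temperature"));
    }
}
```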

The problem: threads and memory climbing steadily during writes

Locating the source of the threads

Start with the code:

    private InfluxDBClient influxDBClient = SpringUtils.getBean(InfluxDBClient.class);

    @Override
    public void run() {
        List<Point> list = new ArrayList<>();
        Set<String> devices = service.getCacheSet(host);
        int total = 0;
        for (String device : devices) {
            if (device.indexOf(UserConstants.REDIS_KEY_SEPARATOR) != -1) {
                Map<String, String> data = service.getCacheMap(device);

                // Drop internal entries whose keys start with "_"
                Map<String, String> filteredMap = data.entrySet().stream()
                        .filter(x -> !x.getKey().startsWith("_"))
                        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

                Point point = InfluxUtils.convertPoint(device, filteredMap);
                total += filteredMap.size();
                list.add(point);
            }
        }
        // Write to InfluxDB (asynchronous write API, obtained anew on every run)
        influxDBClient.getWriteApi().writePoints(list);
    }

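When a scheduled task repeatedly wrote to InfluxDB, the JVM thread dump showed a large number of threads in the RxNewThreadScheduler group. Tracing through the code showed that each time the job runs, the InfluxDB client sends the collected data to the InfluxDB database over HTTP, and that it performs those HTTP requests with OkHttp and RxJava. The key part of that write path is analyzed below, starting from this call: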

influxDBClient.getWriteApi().writePoints(list);

Step into the source:

@Nonnull
@Override
public WriteApi getWriteApi() {
	return getWriteApi(WriteOptions.DEFAULTS);
}

This obtains the InfluxDB write API, through which the data is sent. The important part is the configuration of that write API, WriteOptions.DEFAULTS. Its definition:

/**
* Default configuration with values that are consistent with Telegraf.
*/
public static final WriteOptions DEFAULTS = WriteOptions.builder().build();

// ... other code omitted
public static class Builder {

        private int batchSize = DEFAULT_BATCH_SIZE;
        private int flushInterval = DEFAULT_FLUSH_INTERVAL;
        private int jitterInterval = DEFAULT_JITTER_INTERVAL;
        private int retryInterval = DEFAULT_RETRY_INTERVAL;
        private int maxRetries = DEFAULT_MAX_RETRIES;
        private int maxRetryDelay = DEFAULT_MAX_RETRY_DELAY;
        private int maxRetryTime = DEFAULT_MAX_RETRY_TIME;
        private int exponentialBase = DEFAULT_EXPONENTIAL_BASE;
        private int bufferLimit = DEFAULT_BUFFER_LIMIT;
        private Scheduler writeScheduler = Schedulers.newThread();
        private BackpressureOverflowStrategy backpressureStrategy = BackpressureOverflowStrategy.DROP_OLDEST;

}
// ... other code omitted

The key setting is the I/O thread scheduler, writeScheduler. Scheduler is an RxJava type, and the default here is Schedulers.newThread(). Stepping into Schedulers.newThread():

@NonNull
static final Scheduler NEW_THREAD;
// ... other code omitted
static {
  // ...
  NEW_THREAD = RxJavaPlugins.initNewThreadScheduler(new NewThreadTask());
}
// ... other code omitted
@NonNull
public static Scheduler newThread() {
	return RxJavaPlugins.onNewThreadScheduler(NEW_THREAD);
}
// ... other code omitted (the method below is defined in RxJavaPlugins)
public static Scheduler onNewThreadScheduler(@NonNull Scheduler defaultScheduler) {
        Function<? super Scheduler, ? extends Scheduler> f = onNewThreadHandler;
        if (f == null) {
            return defaultScheduler;
        }
        return apply(f, defaultScheduler);
    }

As the source shows, the real work is delegated to the new-thread scheduler (NEW_THREAD). Its initialization creates a NewThreadTask, which supplies the scheduler that actually handles threading.

static final class NewThreadTask implements Callable<Scheduler> {
  @Override
  public Scheduler call() throws Exception {
  	return NewThreadHolder.DEFAULT;
  }
}

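NewThreadTask implements Callable and overrides call(), so when it runs, call() returns the scheduler held in NewThreadHolder.DEFAULT, which is a NewThreadScheduler. This is the class behind the RxNewThreadScheduler threads that showed up in large numbers in the JVM at the start of the investigation. When it does actual work it creates a NewThreadWorker, and each NewThreadWorker is backed by its own thread; the scheduler places no practical upper bound on how many of these threads can exist. As the job writes repeatedly, the volume of InfluxDB writes climbs, and because the RxJava scheduler used for those writes creates threads with almost no upper limit, more and more threads are created and the thread count soars.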
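
To make the effect concrete, here is a minimal sketch that uses only the standard library. It does not use RxJava or the InfluxDB client; it only imitates the pattern described above, where every unit of work gets a fresh single-threaded executor, and compares it with reusing one shared executor. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: counts how many distinct threads end up doing the work.
class ThreadGrowthSketch {

    // Imitates "a new thread per worker": each write gets its own executor.
    static int threadsWithExecutorPerWrite(int writes) {
        Set<Thread> seen = ConcurrentHashMap.newKeySet();
        List<ExecutorService> executors = new ArrayList<>();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < writes; i++) {
                ExecutorService executor = Executors.newSingleThreadExecutor();
                executors.add(executor);
                futures.add(executor.submit(() -> { seen.add(Thread.currentThread()); }));
            }
            await(futures);
        } finally {
            // Without this shutdown the threads would stay alive, as in the problem case.
            executors.forEach(ExecutorService::shutdown);
        }
        return seen.size();
    }

    // Reuses one executor for every write, so a single thread does all the work.
    static int threadsWithSharedExecutor(int writes) {
        Set<Thread> seen = ConcurrentHashMap.newKeySet();
        ExecutorService shared = Executors.newSingleThreadExecutor();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < writes; i++) {
                futures.add(shared.submit(() -> { seen.add(Thread.currentThread()); }));
            }
            await(futures);
        } finally {
            shared.shutdown();
        }
        return seen.size();
    }

    private static void await(List<Future<?>> futures) {
        for (Future<?> future : futures) {
            try {
                future.get();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(threadsWithExecutorPerWrite(50)); // prints 50
        System.out.println(threadsWithSharedExecutor(50));   // prints 1
    }
}
```

The first number grows with the number of writes, which mirrors the steadily rising thread count seen in the JVM; the second stays constant no matter how many writes are submitted.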

Solution

Use synchronous writes

influxDBClient.getWriteApiBlocking().writePoints(list);
// or
WriteApiBlocking writeApi = influxDBClient.getWriteApiBlocking();
writeApi.writePoints(list);
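
Because a synchronous write runs on the calling thread, a very large list is handed over in a single call. If that becomes a concern, one option is to split the list into chunks before writing. The sketch below is a standard-library-only illustration under that assumption: ChunkedWriteSketch, writeInChunks, and the chunk size are hypothetical names and values, and the writer callback stands in for a call such as chunk -> writeApi.writePoints(chunk).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative only: hands a list to a writer in fixed-size chunks,
// one chunk after another on the calling thread.
class ChunkedWriteSketch {

    // Returns the number of writer calls that were made.
    static <T> int writeInChunks(List<T> points, int chunkSize, Consumer<List<T>> writer) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int calls = 0;
        for (int from = 0; from < points.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, points.size());
            writer.accept(points.subList(from, to)); // synchronous: returns when this chunk is done
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        List<Integer> sizes = new ArrayList<>();
        writeInChunks(data, 4, chunk -> sizes.add(chunk.size()));
        System.out.println(sizes); // prints [4, 4, 2]
    }
}
```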
