Testcontainers' MongoDB Module and Spring Data MongoDB in Action

1. Introduction

How can I easily test my MongoDB multi-document transaction code without setting up MongoDB on my machine? One might argue that a setup is unavoidable, because such a transaction needs a session, which in turn requires a replica set. Thankfully, there is no need to create a 3-node replica set: we can run these transactions against a single database instance.

To achieve this, we may do the following:

  • Run a MongoDB container of version 4 or higher and specify the --replSet option;
  • Initialize a single-node replica set by executing the proper command;
  • Wait for the initialization to complete;
  • Connect to the instance as a standalone, without specifying a replica set, so that we do not have to modify the OS hosts file (a rough manual sketch of these steps follows the list).
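
Purely for illustration, the steps above could be scripted by hand with a plain Testcontainers GenericContainer, roughly like this; the image tag and replica set name are arbitrary, and error handling as well as the wait for the PRIMARY state are omitted:

GenericContainer<?> mongo = new GenericContainer<>("mongo:4.2.8")
  .withExposedPorts(27017)
  .withCommand("--replSet", "docker-rs");
mongo.start();
// initialize a single-node replica set inside the container
mongo.execInContainer("mongo", "--eval", "rs.initiate();");
// ...then poll rs.status() until the member becomes PRIMARY and connect to
// mongodb://<host>:<mappedPort> without a replicaSet query parameter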

It is worth mentioning that a replica set is not the only option here because MongoDB version 4.2 introduces distributed transactions in sharded clusters, which is beyond the scope of this article.

There are a lot of ways to initialize a replica set, including Docker Compose, bash scripts, services in a CI/CD pipeline, etc. However, all of them take extra work in terms of scripting, handling random ports, and wiring it into the CI/CD process. Fortunately, starting from Testcontainers version 1.14.2 we are able to delegate all the heavy lifting to the MongoDB Module.

Let us try it out on a small warehouse management system based on Spring Boot 2.3. In the recent past one had to use ReactiveMongoOperations and its inTransaction method, but since Spring Data MongoDB 2.2 M4 we have been able to leverage the good old @Transactional annotation or the more advanced TransactionalOperator.

Our application should expose a REST API that reports on successfully processed files, including the number of documents modified. Files that cause errors along the way should be skipped, so that all the remaining files still get processed.

It is worth noting that even though duplicated article/size pairs within a single file are a rare case, the possibility is quite realistic and therefore should be handled as well.

As per the business requirements of our system, we already have some products in our database, and we upload a bunch of Excel (xlsx) files to update certain fields of the matching documents in our storage. Data is expected to be only on the first sheet of any workbook. Each file is processed in a separate multi-document transaction to prevent simultaneous modifications of the same documents. Figure 1 shows the collision cases and how a transaction ends up in each of them, leaving aside the trivial scenario in which transactions are executed sequentially (the JSON representation is shortened here for the sake of simplicity). Transactional behavior helps us avoid clashing data and guarantees consistency.

Figure 1 Transaction sequence diagram: collision cases

As for the product collection, the article field serves as a unique index. At the same time, each article is bound to a concrete size. Therefore, it is important for our application to verify that both of them are in the database before updating. Figure 2 gives an insight into this collection.

Figure 2 Product collection details

2. Business logic implementation

Let us elaborate on the major points of the above-mentioned business logic, starting with ProductController as the entry point of the processing. You can find the complete project on GitHub. The prerequisites are Java 8+ and Docker.

@PatchMapping(
  consumes = MediaType.MULTIPART_FORM_DATA_VALUE,
  produces = MediaType.APPLICATION_STREAM_JSON_VALUE
)
public ResponseEntity<Flux<FileUploadDto>> patchProductQuantity(
  @RequestPart("file") Flux<FilePart> files,
  @AuthenticationPrincipal Principal principal
) {
  log.debug("shouldPatchProductQuantity");
  return ResponseEntity.accepted().body(
    uploadProductService.patchProductQuantity(files, principal.getName())
  );
}

1) Wrap the response in a ResponseEntity and return a flux of FileUploadDto; 2) Get the current authentication principal, which comes in handy later on; 3) Pass the flux of FilePart on for processing.

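For reference, here is a minimal sketch of what FileUploadDto might look like, assuming Lombok is in use and inferring the field names from the builder calls later in this article; the actual class in the project may differ:

@Value
@Builder
public class FileUploadDto {
  String fileName;
  int matchedCount;
  int modifiedCount;
}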

Here is the patchProductQuantity method of the UploadProductServiceImpl:

public Flux<FileUploadDto> patchProductQuantity(
  final Flux<FilePart> files,
  final String userName
) {
  return Mono.fromRunnable(() -> initRootDirectory(userName))
    .publishOn(Schedulers.newBoundedElastic(1, 1, "initRootDirectory"))
    .log(String.format("cleaning-up directory: %s", userName))
    .thenMany(files.flatMap(f ->
        saveFileToDiskAndUpdate(f, userName)
          .subscribeOn(Schedulers.boundedElastic())
      )
    );
}

1) Use the name of the user as the root directory name; 2) Do the blocking initialization of the root directory on a separate elastic thread (a hypothetical sketch follows); 3) For each Excel file: 3.1) save it on disk; 3.2) then update the quantity of the products on a separate elastic thread, as processing the file is blocking.

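A hypothetical sketch of the blocking initRootDirectory call mentioned in step 2; the exact path layout and error handling in the project may differ:

private void initRootDirectory(final String userName) {
  try {
    final Path root = Paths.get(pathToStorage, userName);
    // re-create the user's root directory before saving new files into it
    FileSystemUtils.deleteRecursively(root);
    Files.createDirectories(root);
  } catch (IOException e) {
    throw new UploadProductException(
      String.format("Cannot initialize a directory for user: %s", userName), e);
  }
}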

The saveFileToDiskAndUpdate method does the following logic:

private Mono<FileUploadDto> saveFileToDiskAndUpdate(
  final FilePart file,
  final String userName
) {
  final String fileName = file.filename();
  final Path path = Paths.get(pathToStorage, userName, fileName);
  return Mono.just(path)
    .log(String.format("A file: %s has been uploaded", fileName))
    .flatMap(file::transferTo)
    .log(String.format("A file: %s has been saved", fileName))
    .then(processExcelFile(fileName, userName, path));
}
  1. Copy the content of the file to the user's directory;
  2. After the copy stage is completed, call the processExcelFile method.

At this point, we are going to split the logic according to the size of the file:

private Mono<FileUploadDto> processExcelFile(
  final String fileName,
  final String userName,
  final Path path
) {
  return Mono.fromCallable(() -> Files.size(path))
    .flatMap(size -> {
      if (size >= bigFileSizeThreshold) {
        return processBigExcelFile(fileName, userName);
      } else {
        return processSmallExcelFile(fileName, userName);
      }
    });
}
  1. Wrap the blocking Files.size(path) call in Mono.fromCallable;
  2. bigFileSizeThreshold is injected from the application.yml file via @Value("${upload-file.bigFileSizeThreshold}").

Before going into detail on processing Excel files depending on their size, we should take a look at the getProducts method of the ExcelFileDaoImpl:

@Override
public Flux<Product> getProducts(
  final String pathToStorage,
  final String fileName,
  final String userName
) {
  return Flux.defer(() -> {
    FileInputStream is;
    Workbook workbook;
    try {
      final File file = Paths.get(pathToStorage, userName, fileName).toFile();
      verifyFileAttributes(file);
      is = new FileInputStream(file);
      workbook = StreamingReader.builder()
        .rowCacheSize(ROW_CACHE_SIZE)
        .bufferSize(BUFFER_SIZE)
        .open(is);
    } catch (IOException e) {
      return Mono.error(new UploadProductException(
        String.format("An exception has been occurred while parsing a file: %s " +
          "has been saved", fileName), e));
    }

    try {
      final Sheet datatypeSheet = workbook.getSheetAt(0);
      final Iterator<Row> iterator = datatypeSheet.iterator();

      final AtomicInteger rowCounter = new AtomicInteger();
      if (iterator.hasNext()) {
        final Row currentRow = iterator.next();
        rowCounter.incrementAndGet();
        verifyExcelFileHeader(fileName, currentRow);
      }
      return Flux.<Product>create(fluxSink -> fluxSink.onRequest(value -> {
        try {
          for (int i = 0; i < value; i++) {
            if (!iterator.hasNext()) {
              fluxSink.complete();
              return;
            }

            final Row currentRow = iterator.next();
            final Product product = Objects.requireNonNull(getProduct(
              FileRow.builder()
                .fileName(fileName)
                .currentRow(currentRow)
                .rowCounter(rowCounter.incrementAndGet())
                .build()
            ), "product is not supposed to be null");
            fluxSink.next(product);
          }
        } catch (Exception e1) {
          fluxSink.error(e1);
        }
      })).doFinally(signalType -> {
        try {
          is.close();
          workbook.close();
        } catch (IOException e1) {
          log.error("Error has occurred while releasing {} resources: {}", fileName, e1);
        }
      });
    } catch (Exception e) {
      return Mono.error(e);
    }
  });
}
  1. Defer the whole logic until there is a new subscriber;
  2. Verify the Excel file header;
  3. Create a flux that provides the requested number of products;
  4. Convert an Excel row into a Product domain object (a hypothetical sketch follows the list);
  5. Finally, close all of the opened resources.
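
A hypothetical sketch of the row-to-domain conversion behind getProduct; the column order, cell types, and the absence of per-cell validation here are assumptions, not the project's exact code:

private Product getProduct(final FileRow fileRow) {
  final Row row = fileRow.getCurrentRow();
  // assumed column order on the sheet: article | size | quantity
  return Product.builder()
    .article((long) row.getCell(0).getNumericCellValue())
    .size(Size.valueOf(row.getCell(1).getStringCellValue().toUpperCase()))
    .quantity(BigInteger.valueOf((long) row.getCell(2).getNumericCellValue()))
    .build();
}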

Getting back to the processing of the Excel files in the UploadProductServiceImpl, we are going to use MongoDB's bulkWrite method on a collection to update products in bulk, which requires an eagerly evaluated list of UpdateOneModel. In practice, collecting such a list is a memory-consuming operation, especially for big files.

Regarding small Excel files, we provide a more detailed log and do an additional validation check:

private Mono<FileUploadDto> processSmallExcelFile(
  final String fileName,
  final String userName
) {
  log.debug("processSmallExcelFile: {}", fileName);
  return excelFileDao.getProducts(pathToStorage, fileName, userName)
    .reduce(new ConcurrentHashMap<ProductArticleSizeDto, Tuple2<UpdateOneModel<Document>, BigInteger>>(),
      (indexMap, product) -> {
        final BigInteger quantity = product.getQuantity();
        indexMap.merge(
          new ProductArticleSizeDto(product.getArticle(), product.getSize()),
          Tuples.of(
            updateOneModelConverter.convert(Tuples.of(product, quantity, userName)),
            quantity
          ),
          (oldValue, newValue) -> {
            final BigInteger mergedQuantity = oldValue.getT2().add(newValue.getT2());
            return Tuples.of(
              updateOneModelConverter.convert(Tuples.of(product, mergedQuantity, userName)),
              mergedQuantity
            );
          }

        );
        return indexMap;
      })
    .filterWhen(productIndexFile ->
      productDao.findByArticleIn(extractArticles(productIndexFile.keySet()))
        .<ProductArticleSizeDto>handle(
          (productArticleSizeDto, synchronousSink) -> {
            if (productIndexFile.containsKey(productArticleSizeDto)) {
              synchronousSink.next(productArticleSizeDto);
            } else {
              synchronousSink.error(new UploadProductException(
                String.format(
                  "A file %s does not have an article: %d with size: %s",
                  fileName,
                  productArticleSizeDto.getArticle(),
                  productArticleSizeDto.getSize()
                )
              ));
            }
          })
        .count()
        .handle((sizeDb, synchronousSink) -> {
          final int sizeFile = productIndexFile.size();
          if (sizeDb == sizeFile) {
            synchronousSink.next(Boolean.TRUE);
          } else {
            synchronousSink.error(new UploadProductException(
              String.format(
                "Inconsistency between total element size in MongoDB: %d and a file %s: %d",
                sizeDb,
                fileName,
                sizeFile
              )
            ));
          }
        })
    ).onErrorResume(e -> {
      log.debug("Exception while processExcelFile fileName: {}: {}", fileName, e);
      return Mono.empty();
    }).flatMap(productIndexFile ->
      productPatcherService.incrementProductQuantity(
        fileName,
        productIndexFile.values().stream().map(Tuple2::getT1).collect(Collectors.toList()),
        userName
      )
    ).map(bulkWriteResult -> FileUploadDto.builder()
      .fileName(fileName)
      .matchedCount(bulkWriteResult.getMatchedCount())
      .modifiedCount(bulkWriteResult.getModifiedCount())
      .build()
    );
}
  1. reduce helps us handle duplicate products, whose quantities should be summed up;
  2. Collect a map from ProductArticleSizeDto to a pair of the UpdateOneModel and the total quantity for a product. The key is used to match an article and its size from the file with those in the database via the ProductArticleSizeDto projection (a hypothetical sketch of this DTO follows the list);
  3. Use the atomic merge method of the ConcurrentMap to sum up the quantity of the same products and create a new UpdateOneModel;
  4. Filter the products from the file by the articles that are present in the database;
  5. Each ProductArticleSizeDto found in the storage should match a ProductArticleSizeDto from the file summed up by quantity;
  6. Then count the result after filtration, which should be equal to the number of distinct products in the file;
  7. Use the onErrorResume method to continue when any error occurs, because we need to process all files as stated in the requirements;
  8. Extract the list of UpdateOneModel from the map collected earlier to be used further in the incrementProductQuantity method;
  9. Then run the incrementProductQuantity method as a sub-process within flatMap and map its result into the FileUploadDto that our business users need.
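
A hypothetical sketch of the ProductArticleSizeDto projection assumed above (field types are guesses); the important part is that equals and hashCode are based on article and size, which is what makes the containsKey lookup in the filterWhen step work:

@Value
public class ProductArticleSizeDto {
  Long article;
  Size size;
}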

Even though the filterWhen and the subsequent productDao.findByArticleIn allow us to do some additional validation at an early stage, they come at a price, which is especially noticeable when processing big files in practice. However, the incrementProductQuantity method can compare the number of modified documents against the number of distinct products in the file. Knowing that, we can implement a more lightweight option for processing big files:

private Mono<FileUploadDto> processBigExcelFile(
  final String fileName,
  final String userName
) {
  log.debug("processBigExcelFile: {}", fileName);
  return excelFileDao.getProducts(pathToStorage, fileName, userName)
    .reduce(new ConcurrentHashMap<Product, Tuple2<UpdateOneModel<Document>, BigInteger>>(),
      (indexMap, product) -> {
        final BigInteger quantity = product.getQuantity();
        indexMap.merge(
          product,
          Tuples.of(
            updateOneModelConverter.convert(Tuples.of(product, quantity, userName)),
            quantity
          ),
          (oldValue, newValue) -> {
            final BigInteger mergedQuantity = oldValue.getT2().add(newValue.getT2());
            return Tuples.of(
              updateOneModelConverter.convert(Tuples.of(product, mergedQuantity, userName)),
              mergedQuantity
            );
          }

        );
        return indexMap;
      })
    .map(indexMap -> indexMap.values().stream().map(Tuple2::getT1).collect(Collectors.toList()))
    .onErrorResume(e -> {
      log.debug("Exception while processExcelFile: {}: {}", fileName, e);
      return Mono.empty();
    }).flatMap(dtoList ->
      productPatcherService.incrementProductQuantity(
        fileName,
        dtoList,
        userName
      )
    ).map(bulkWriteResult -> FileUploadDto.builder()
      .fileName(fileName)
      .matchedCount(bulkWriteResult.getMatchedCount())
      .modifiedCount(bulkWriteResult.getModifiedCount())
      .build()
    );
}

Here is the ProductAndUserNameToUpdateOneModelConverter that we use to create an UpdateOneModel:

@Component
public class ProductAndUserNameToUpdateOneModelConverter implements
  Converter<Tuple3<Product, BigInteger, String>, UpdateOneModel<Document>> {

  @Override
  @NonNull
  public UpdateOneModel<Document> convert(@NonNull Tuple3<Product, BigInteger, String> source) {
    Objects.requireNonNull(source);
    final Product product = source.getT1();
    final BigInteger quantity = source.getT2();
    final String userName = source.getT3();

    return new UpdateOneModel<>(
      Filters.and(
        Filters.eq(Product.SIZE_DB_FIELD, product.getSize().name()),
        Filters.eq(Product.ARTICLE_DB_FIELD, product.getArticle())
      ),
      Document.parse(
        String.format(
          "{ $inc: { %s: %d } }",
          Product.QUANTITY_DB_FIELD,
          quantity
        )
      ).append(
        "$set",
        new Document(
          Product.LAST_MODIFIED_BY_DB_FIELD,
          userName
        )
      ),
      new UpdateOptions().upsert(false)
    );
  }
}
  1. Firstly, find a document by article and size. Figure 2 shows that we have a compound index on the size and article fields of the product collection to facilitate such a search;
  2. Increment the quantity of the found document and set the name of the user in the lastModifiedBy field;
  3. It is also possible to upsert a document here, but we are interested only in the modification of the existing documents in the storage.
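
To make the converter's output more tangible, here is roughly what it builds for a hypothetical product (article 120589, size S, quantity 10, user admin), assuming the Product constants map to fields named size, article, quantity, and lastModifiedBy:

// filter: match the document by size and article
Bson filter = Filters.and(Filters.eq("size", "S"), Filters.eq("article", 120589L));
// update: increment the quantity and record who modified the document
Document update = Document.parse("{ $inc: { quantity: 10 } }")
  .append("$set", new Document("lastModifiedBy", "admin"));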

Now we are ready to implement the central part of our processing, the incrementProductQuantity method of the ProductPatcherDaoImpl:

@Override
public Mono<BulkWriteResult> incrementProductQuantity(
  final String fileName,
  final List<UpdateOneModel<Document>> models,
  final String userName
) {
  return transactionalOperator.execute(
    action -> reactiveMongoOperations.getCollection(Product.COLLECTION_NAME)
      .flatMap(collection ->
        Mono.from(collection.bulkWrite(models, new BulkWriteOptions().ordered(true)))

      ).<BulkWriteResult>handle((bulkWriteResult, synchronousSink) -> {
        final int fileCount = models.size();
        if (Objects.equals(bulkWriteResult.getModifiedCount(), fileCount)) {
          synchronousSink.next(bulkWriteResult);
        } else {
          synchronousSink.error(
            new IllegalStateException(
              String.format(
                "Inconsistency between modified doc count: %d and file doc count: %d. Please, check file: %s",
                bulkWriteResult.getModifiedCount(), fileCount, fileName
              )
            )
          );
        }

      }).onErrorResume(
        e -> Mono.fromRunnable(action::setRollbackOnly)
          .log("Exception while incrementProductQuantity: " + fileName + ": " + e)
          .then(Mono.empty())
      )
  ).singleOrEmpty();
}
  1. Use a transactionalOperator to roll back a transaction manually. As mentioned before, our goal is to process all files while skipping those causing exceptions (a possible configuration sketch for the operator follows the list);
  2. Run a single sub-process to bulk-write the modifications to the database sequentially, for fail-fast and less resource-intensive behavior. The word "single" is of paramount importance here, because it is how we avoid the dangerous "N+1 query problem" of spawning a lot of sub-processes on a flux within flatMap;
  3. Handle the situation when the number of documents processed does not match the number of distinct products in the file;
  4. The onErrorResume method handles the rollback of the transaction and then returns Mono.empty() to skip the current processing;
  5. Expect either a single item or an empty Mono as the result of the transactionalOperator.execute method.
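
The article does not show how the transactionalOperator is wired, so here is a minimal configuration sketch under the assumption that it is built on top of a ReactiveMongoTransactionManager (bean and class names are arbitrary):

@Configuration
public class TransactionConfig {

  @Bean
  public ReactiveTransactionManager reactiveTransactionManager(
    final ReactiveMongoDatabaseFactory factory
  ) {
    // multi-document transactions need a Mongo-backed reactive transaction manager
    return new ReactiveMongoTransactionManager(factory);
  }

  @Bean
  public TransactionalOperator transactionalOperator(final ReactiveTransactionManager manager) {
    return TransactionalOperator.create(manager);
  }
}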

One might say: "You called collection.bulkWrite(models, new BulkWriteOptions().ordered(true)); what about setting a session?". The thing is that the SessionAwareMethodInterceptor of Spring Data MongoDB does it via reflection:

ReflectionUtils.invokeMethod(targetMethod.get(), target,
        prependSessionToArguments(session, methodInvocation));

Here is the prependSessionToArguments method:

private static Object[] prependSessionToArguments(ClientSession session, MethodInvocation invocation) {

  Object[] args = new Object[invocation.getArguments().length + 1];

  args[0] = session;
  System.arraycopy(invocation.getArguments(), 0, args, 1, invocation.getArguments().length);

  return args;
}

1) Get the arguments of the MethodInvocation; 2) Add the session as the first element of the args array.

In fact, the following method of the MongoCollectionImpl is called:

@Override
public Publisher<BulkWriteResult> bulkWrite(final ClientSession clientSession,
                                            final List<? extends WriteModel<? extends TDocument>> requests,
                                            final BulkWriteOptions options) {
  return Publishers.publish(
    callback -> wrapped.bulkWrite(clientSession.getWrapped(), requests, options, callback));
}

3. Test implementation

So far so good; now we can create integration tests to cover our logic.

To begin with, we create ProductControllerITTest to test our public API via Spring's WebTestClient and initialize a MongoDB instance to run the tests against:

private static final MongoDBContainer MONGO_DB_CONTAINER =
  new MongoDBContainer("mongo:4.2.8");

1) Use a static field to share a single Testcontainers MongoDBContainer across all test methods in ProductControllerITTest; 2) We use the 4.2.8 MongoDB container version from Docker Hub as it is the latest stable one; otherwise MongoDBContainer defaults to 4.0.10.

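For context, the surrounding test class might be declared roughly like this; the exact annotations in the project may differ, and the Initializer referenced here is shown a bit further below:

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@AutoConfigureWebTestClient
@ContextConfiguration(initializers = ProductControllerITTest.Initializer.class)
class ProductControllerITTest {

  private static final MongoDBContainer MONGO_DB_CONTAINER =
    new MongoDBContainer("mongo:4.2.8");

  @Autowired
  private WebTestClient webClient;
  // ...
}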

Then, in the static methods setUpAll and tearDownAll, we start and stop the MongoDBContainer respectively. Even though we do not use Testcontainers' reusable containers feature here, we leave open the possibility of enabling it. That is why we call MONGO_DB_CONTAINER.stop() only if the reusable feature is turned off.

@BeforeAll
static void setUpAll() {
    MONGO_DB_CONTAINER.start();
}

@AfterAll
static void tearDownAll() {
  if (!MONGO_DB_CONTAINER.isShouldBeReused()) {
    MONGO_DB_CONTAINER.stop();
  }
}

Next we set spring.data.mongodb.uri by executing MONGO_DB_CONTAINER.getReplicaSetUrl() in an ApplicationContextInitializer:

static class Initializer implements ApplicationContextInitializer<ConfigurableApplicationContext> {
  @Override
  public void initialize(@NotNull ConfigurableApplicationContext configurableApplicationContext) {
    TestPropertyValues.of(
      String.format("spring.data.mongodb.uri: %s", MONGO_DB_CONTAINER.getReplicaSetUrl())
    ).applyTo(configurableApplicationContext);
  }
}

Now we are ready to write the first test without any transaction collision, because our test files (see Figure 3) contain products whose articles do not clash with one another.

Figure 3 Excel files causing no collision in the articles of the products

@WithMockUser(
  username = SecurityConfig.ADMIN_NAME,
  password = SecurityConfig.ADMIN_PAS,
  authorities = SecurityConfig.WRITE_PRIVILEGE
)
@Test
void shouldPatchProductQuantity() {
  //GIVEN
  insertMockProductsIntoDb(Flux.just(product1, product2, product3));
  final BigInteger expected1 = BigInteger.valueOf(16);
  final BigInteger expected2 = BigInteger.valueOf(27);
  final BigInteger expected3 = BigInteger.valueOf(88);
  final String fileName1 = "products1.xlsx";
  final String fileName3 = "products3.xlsx";
  final String[] fileNames = {fileName1, fileName3};
  final FileUploadDto fileUploadDto1 = ProductTestUtil.mockFileUploadDto(fileName1, 2);
  final FileUploadDto fileUploadDto3 = ProductTestUtil.mockFileUploadDto(fileName3, 1);

  //WHEN
  final WebTestClient.ResponseSpec exchange = webClient
    .patch()
    .uri(BASE_URL)
    .contentType(MediaType.MULTIPART_FORM_DATA)
    .body(BodyInserters.fromMultipartData(ProductTestUtil.getMultiPartFormData(fileNames)))
    .exchange();

  //THEN
  exchange.expectStatus().isAccepted();

  exchange.expectBodyList(FileUploadDto.class)
    .hasSize(2)
    .contains(fileUploadDto1, fileUploadDto3);

  StepVerifier.create(productDao.findAllByOrderByQuantityAsc())
    .assertNext(product -> assertEquals(expected1, product.getQuantity()))
    .assertNext(product -> assertEquals(expected2, product.getQuantity()))
    .assertNext(product -> assertEquals(expected3, product.getQuantity()))
    .verifyComplete();
}

Finally, let us test a transaction collision in action, keeping in mind Figure 1 and Figure 4 showing such files:

Figure 4 Excel files causing a collision in the articles of the products

@WithMockUser(
  username = SecurityConfig.ADMIN_NAME,
  password = SecurityConfig.ADMIN_PAS,
  authorities = SecurityConfig.WRITE_PRIVILEGE
)
@Test
void shouldPatchProductQuantityConcurrently() {
  //GIVEN
  TransactionUtil.setMaxTransactionLockRequestTimeoutMillis(
    20,
    MONGO_DB_CONTAINER.getReplicaSetUrl()
  );
  insertMockProductsIntoDb(Flux.just(product1, product2));
  final String fileName1 = "products1.xlsx";
  final String fileName2 = "products2.xlsx";
  final String[] fileNames = {fileName1, fileName2};
  final BigInteger expected120589Sum = BigInteger.valueOf(19);
  final BigInteger expected120590Sum = BigInteger.valueOf(32);
  final BigInteger expected120589T1 = BigInteger.valueOf(16);
  final BigInteger expected120589T2 = BigInteger.valueOf(12);
  final BigInteger expected120590T1 = BigInteger.valueOf(27);
  final BigInteger expected120590T2 = BigInteger.valueOf(11);
  final FileUploadDto fileUploadDto1 = ProductTestUtil.mockFileUploadDto(fileName1, 2);
  final FileUploadDto fileUploadDto2 = ProductTestUtil.mockFileUploadDto(fileName2, 2);

  //WHEN
  final WebTestClient.ResponseSpec exchange = webClient
    .patch()
    .uri(BASE_URL)
    .contentType(MediaType.MULTIPART_FORM_DATA)
    .accept(MediaType.APPLICATION_STREAM_JSON)
    .body(BodyInserters.fromMultipartData(ProductTestUtil.getMultiPartFormData(fileNames)))
    .exchange();

  //THEN
  exchange.expectStatus().isAccepted();
  assertThat(
    extractBodyArray(exchange),
    either(arrayContaining(fileUploadDto1))
      .or(arrayContaining(fileUploadDto2))
      .or(arrayContainingInAnyOrder(fileUploadDto1, fileUploadDto2))
  );

  final List<Product> list = productDao.findAll(Sort.by(Sort.Direction.ASC, "article"))
    .toStream().collect(Collectors.toList());
  assertThat(list.size(), is(2));

  assertThat(
    list.stream().map(Product::getQuantity).toArray(BigInteger[]::new),
    either(arrayContaining(expected120589T1, expected120590T1))
      .or(arrayContaining(expected120589T2, expected120590T2))
      .or(arrayContaining(expected120589Sum, expected120590Sum))
  );
  TransactionUtil.setMaxTransactionLockRequestTimeoutMillis(
    5,
    MONGO_DB_CONTAINER.getReplicaSetUrl()
  );
}
  1. We can specify the maximum amount of time in milliseconds that multi-document transactions should wait to acquire the locks required by the operations in the transaction (by default, multi-document transactions wait 5 milliseconds);
  2. As an example, we use a helper method here to change the default 5 ms to 20 ms (see the implementation details below).

Note that the maxTransactionLockRequestTimeoutMillis setting makes no real difference for this particular test case and serves only as an example. After running this test class 120 times via the script ./load_test.sh 120 ProductControllerITTest.shouldPatchProductQuantityConcurrently from the tools directory of the project, I got the following figures:

indicator             20 ms, times    5 ms (default), times
T1 successes          61              56
T2 successes          57              63
T1 and T2 successes   2               1

Figure 5 Running the shouldPatchProductQuantityConcurrently test 120 times with 20 and 5 ms maxTransactionLockRequestTimeoutMillis respectively

While going through logs, we may come across something like:

Initiating transaction rollback…

Initiating transaction commit…

About to abort transaction for session…

About to commit transaction for session...

Then, let us test the processing of the big file containing 1 million products in a separate PatchProductLoadITTest:

@WithMockUser(
  username = SecurityConfig.ADMIN_NAME,
  password = SecurityConfig.ADMIN_PAS,
  authorities = SecurityConfig.WRITE_PRIVILEGE
)
@Test
void shouldPatchProductQuantityBigFile() {
  //GIVEN
  unzipClassPathFile("products_1M.zip");

  final String fileName = "products_1M.xlsx";
  final int count = 1000000;
  final long totalQuantity = 500472368779L;
  final List<Document> products = getDocuments(count);

  TransactionUtil.setTransactionLifetimeLimitSeconds(
    900,
    MONGO_DB_CONTAINER.getReplicaSetUrl()
  );

  StepVerifier.create(
    reactiveMongoTemplate.remove(new Query(), Product.COLLECTION_NAME)
      .then(reactiveMongoTemplate.getCollection(Product.COLLECTION_NAME))
      .flatMapMany(c -> c.insertMany(products))
      .switchIfEmpty(Mono.error(new RuntimeException("Cannot insertMany")))
      .then(getTotalQuantity())
  ).assertNext(t -> assertEquals(totalQuantity, t)).verifyComplete();

  //WHEN
  final Instant start = Instant.now();
  final WebTestClient.ResponseSpec exchange = webClient
    .patch()
    .uri(BASE_URL)
    .contentType(MediaType.MULTIPART_FORM_DATA)
    .accept(MediaType.APPLICATION_STREAM_JSON)
    .body(BodyInserters.fromMultipartData(ProductTestUtil.getMultiPartFormData("products_1M.xlsx")))
    .exchange();

  //THEN
  exchange
    .expectStatus()
    .isAccepted()
    .expectBodyList(FileUploadDto.class)
    .contains(ProductTestUtil.mockFileUploadDto(fileName, count));
  StepVerifier.create(getTotalQuantity())
    .assertNext(t -> assertEquals(totalQuantity * 2, t))
    .verifyComplete();
  log.debug(
    "============= shouldPatchProductQuantityBigFile elapsed {}minutes =============",
    Duration.between(start, Instant.now()).toMinutes()
  );
}
  1. The general setup is similar to the ProductControllerITTest;
  2. Unzip a json file containing 1 million products, which requires about 254 MB on disk;
  3. Transactions have a lifetime limit specified by transactionLifetimeLimitSeconds, which is 60 seconds by default. We need to increase it here, because it generally takes more than 60 seconds to process such a file. For this, we use a helper method to change this lifespan to 900 seconds (see the implementation details below). For reference, the REST call with this file takes about 9-12 minutes on GitHub Actions;
  4. Before processing, we clean up the product collection, insert 1 million products from the json file, and then get the total quantity;
  5. Given that the products in the json file and in the big Excel file are equal, we assert that the total quantity of the products after processing should double (a hypothetical sketch of getTotalQuantity follows the list).
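
A hypothetical sketch of the getTotalQuantity helper used above; the aggregation pipeline and the numeric conversion are assumptions about the project's code:

private Mono<Long> getTotalQuantity() {
  return reactiveMongoTemplate.getCollection(Product.COLLECTION_NAME)
    .flatMapMany(collection -> collection.aggregate(Collections.singletonList(
      // group the whole collection and sum up the quantity field
      new Document("$group", new Document("_id", null)
        .append("total", new Document("$sum", "$quantity")))
    )))
    .next()
    .map(document -> ((Number) document.get("total")).longValue());
}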

Such a test requires a relatively big heap of about 4GB (see Figure 6):

Figure 6 VisualVM Monitor Heap while uploading a 1-million-product file

As we can see, it is sensible to configure the maximum amount of disk space allowed for file parts and the maximum number of parts allowed in a given multipart request. That is why I added such properties to a proper application.yml file and then set them in the configureHttpMessageCodecs method of the implemented WebFluxConfigurer (a rough sketch follows). However, adding a rate limiter and configuring Schedulers might be a better solution in a production environment. Note that we use Schedulers.boundedElastic() here, which by default has a pool of 10 * Runtime.getRuntime().availableProcessors() threads.

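Here is a rough sketch of such a configuration for Spring Boot 2.3 / WebFlux; the property names and their values are assumptions, and the Synchronoss-based reader is simply the default multipart reader of that Spring generation:

@Configuration
public class WebConfig implements WebFluxConfigurer {

  @Value("${upload-file.maxDiskUsagePerPart}")
  private long maxDiskUsagePerPart;

  @Value("${upload-file.maxParts}")
  private int maxParts;

  @Override
  public void configureHttpMessageCodecs(final ServerCodecConfigurer configurer) {
    final SynchronossPartHttpMessageReader partReader = new SynchronossPartHttpMessageReader();
    partReader.setMaxDiskUsagePerPart(maxDiskUsagePerPart); // max disk space per file part
    partReader.setMaxParts(maxParts);                       // max parts per multipart request
    configurer.defaultCodecs().multipartReader(new MultipartHttpMessageReader(partReader));
  }
}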

Here is TransactionUtil containing the above-mentioned helper methods:

public class TransactionUtil {
  private TransactionUtil() {
  }

  public static void setTransactionLifetimeLimitSeconds(
    final int duration,
    final String replicaSetUrl
  ) {
    setMongoParameter("transactionLifetimeLimitSeconds", duration, replicaSetUrl);
  }

  public static void setMaxTransactionLockRequestTimeoutMillis(
    final int duration,
    final String replicaSetUrl
  ) {
    setMongoParameter("maxTransactionLockRequestTimeoutMillis", duration, replicaSetUrl);
  }

  private static void setMongoParameter(
    final String param,
    final int duration,
    final String replicaSetUrl
  ) {
    try (final MongoClient mongoReactiveClient = MongoClients.create(
      ConnectionUtil.getMongoClientSettingsWithTimeout(replicaSetUrl)
    )) {

      StepVerifier.create(mongoReactiveClient.getDatabase("admin").runCommand(
        new Document("setParameter", 1).append(param, duration)
      )).expectNextCount(1)
        .verifyComplete();
    }
  }
}

4. How can I play with the code?

Small WMS (warehouse management system) on GitHub.

5. What's in it for me?

  1. The MongoDBContainer takes care of the complexity of the MongoDB replica set initialization, allowing the developer to focus on testing. Now we can simply make MongoDB transaction testing part of our CI/CD process;
  2. While processing data, it is sensible to favor MongoDB's bulk methods, reducing the number of sub-processes within the flatMap method of the Flux and thus avoiding the "N+1 query problem". However, this also comes at a price, because we need to collect a list of UpdateOneModel and keep it in memory, losing some reactive flexibility;
  3. When it comes to skipping processing, one might employ onErrorResume instead of the dangerous onErrorContinue;
  4. Even though we are allowed to set maxTransactionLockRequestTimeoutMillis and transactionLifetimeLimitSeconds as start-up parameters to mongod, we may achieve the same effect by calling MongoDB's adminCommand via helper methods;
  5. Processing big files is resource-consuming and is thus better limited.

6. Want to go deeper?

To construct a multi-node MongoDB replica set for testing complicated failover cases, consider the mongodb-replica-set project.

  1. Reactive Transactions Masterclass by Michael Simons & Mark Paluch
  2. Spring Data MongoDB — Reference Documentation
  3. MongoDB Transactions
  4. MongoDB Collection Methods

Translated from: https://habr.com/en/post/513026/
