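Requirement:
Given a collection of documentIds, find out which of them have metadata in the Document collection.
Because the data volume is large, the query returns only the documentId field, to reduce memory pressure.
At first I used the mongoTemplateCDM.find() method from the org.springframework.data.mongodb.core package. Querying 100,000 records took more than three minutes, which is very inefficient.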
public List<Long> queryExistingDocumentIds(List<Long> documentIds) {
    long startTime = System.currentTimeMillis();
    Query query = new Query();
    query.fields().include(Constants.DOCUMENT_ID);
    query.addCriteria(Criteria.where(Constants.DOCUMENT_ID).in(documentIds));
    List<MongoDocumentImpl> mongoDocumentList = mongoTemplate.find(query, MongoDocumentImpl.class, Constants.COLLECTION_DOCUMENTIMPL);
    List<Long> existingDocumentIdList = mongoDocumentList.stream().map(MongoDocumentImpl::getDocumentId).collect(Collectors.toList());
    logger.info("queryExistingDocumentIds end, spent time (ms): {}", (System.currentTimeMillis() - startTime));
    return existingDocumentIdList;
}
After switching to the native collection.find() method from the com.mongodb.client package, querying 100,000 records took just over six seconds, a large improvement.
public List<Long> queryExistingDocumentIds(List<Long> documentIds) {
    long startTime = System.currentTimeMillis();
    MongoCollection<Document> collection = mongoTemplate.getCollection(Constants.COLLECTION_DOCUMENTIMPL);
    Query query = new Query();
    // Return only the field we need
    Bson projection = Projections.fields(Projections.include(Constants.DOCUMENT_ID));
    query.addCriteria(Criteria.where(Constants.DOCUMENT_ID).in(documentIds));
    List<Long> validDocumentIdList = new ArrayList<>();
    try (MongoCursor<Document> cursor = collection.find(query.getQueryObject()).projection(projection).cursor()) {
        while (cursor.hasNext()) {
            Long next = cursor.next().getLong(Constants.DOCUMENT_ID);
            validDocumentIdList.add(next);
        }
    } catch (Exception e) {
        logger.warn("queryExistingDocumentIds search metadata from mongo (DOCUMENTIMPL) error ", e);
    }
    logger.info("queryExistingDocumentIds end, spent time (ms): {}", (System.currentTimeMillis() - startTime));
    return validDocumentIdList;
}
After deployment, the following error occurred:
java.lang.ClassCastException: class java.lang.Integer cannot be cast to class java.lang.Long (java.lang.Integer and java.lang.Long are in module java.base of loader 'bootstrap')
    at org.bson.Document.getLong(Document.java:284) ~[bson-4.6.1.jar!/:na]
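The stack trace suggests that some documentId values are stored as 32-bit integers, so the driver hands them back as Integer, and a getter that casts straight to Long fails. The sketch below reproduces this without a MongoDB dependency. A plain Map stands in for org.bson.Document, and getLongStrict and failsWithClassCast are hypothetical helpers written only for illustration; treating the driver's getter as a direct cast is an assumption based on the trace.

```java
import java.util.HashMap;
import java.util.Map;

class GetLongCastDemo {
    // Hypothetical stand-in for a typed getter that casts the stored value
    // directly to Long (assumed to resemble what Document.getLong does).
    static Long getLongStrict(Map<String, Object> doc, String key) {
        return (Long) doc.get(key);
    }

    // Returns true when reading the stored value as Long throws ClassCastException.
    static boolean failsWithClassCast(Object stored) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("documentId", stored);
        try {
            getLongStrict(doc, "documentId");
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsWithClassCast(42));   // value boxed as Integer: prints true
        System.out.println(failsWithClassCast(42L));  // value boxed as Long: prints false
    }
}
```

An Integer and a Long are unrelated classes even when they hold the same number, so the cast fails at runtime rather than widening the value.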
The code was then fixed as follows:
public List<Long> queryExistingDocumentIds(List<Long> documentIds) {
    long startTime = System.currentTimeMillis();
    MongoCollection<Document> collection = mongoTemplate.getCollection(Constants.COLLECTION_DOCUMENTIMPL);
    Query query = new Query();
    Bson projection = Projections.fields(Projections.include(Constants.DOCUMENT_ID));
    query.addCriteria(Criteria.where(Constants.DOCUMENT_ID).in(documentIds));
    List<Long> validDocumentIdList = new ArrayList<>();
    try (MongoCursor<Document> cursor = collection.find(query.getQueryObject()).projection(projection).cursor()) {
        while (cursor.hasNext()) {
            // Calling getLong() directly may throw a ClassCastException when the value is stored as an Integer
            final Object next = cursor.next().get(Constants.DOCUMENT_ID);
            if (next instanceof Integer) {
                validDocumentIdList.add(((Integer) next).longValue());
            } else {
                validDocumentIdList.add((Long) next);
            }
        }
    } catch (Exception e) {
        logger.warn("queryExistingDocumentIds search metadata from mongo (DOCUMENTIMPL) error ", e);
    }
    logger.info("queryExistingDocumentIds end, spent time (ms): {}", (System.currentTimeMillis() - startTime));
    return validDocumentIdList;
}
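As a possible variation on the Integer/Long branching above, the conversion can go through java.lang.Number, which covers both boxed types in a single check. This is only a sketch: DocumentIdConverter, toLong and toLongList are hypothetical names, and skipping null or non-numeric values (instead of adding them to the list) is an assumption about the desired behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DocumentIdConverter {
    // Converts any boxed numeric value (Integer, Long, ...) to Long.
    // Returns null for null or non-numeric input (assumed policy).
    static Long toLong(Object value) {
        return (value instanceof Number) ? ((Number) value).longValue() : null;
    }

    // Converts a list of raw field values, dropping entries that are not numeric.
    static List<Long> toLongList(List<?> rawValues) {
        List<Long> result = new ArrayList<>();
        for (Object raw : rawValues) {
            Long id = toLong(raw);
            if (id != null) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> raw = Arrays.asList(1, 2L, null, "x");
        System.out.println(toLongList(raw)); // prints [1, 2]
    }
}
```

Inside the cursor loop, the value returned by get(Constants.DOCUMENT_ID) could be passed to a helper like toLong. Note that Number.longValue() truncates fractional values, so this only suits fields that are expected to hold whole numbers.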