Background
Today I ran into two scenarios where API queries were noticeably slow. Let's analyze them and see how to optimize.
Scenario 1: import-file preview. The goal is to take the enterprises and tags from an uploaded file, check whether each enterprise and tag exists, build the associations, and display the result so the operator can review and adjust before committing the update. For this preview endpoint, importing 1700+ rows with an average of 3 tags per row took nearly 7 minutes.
Scenario 2: enterprise list query joining the enterprise table, the tag-relation table, and the tag table. With 220k+ enterprise rows, the test environment (29 tag-relation rows) responded within 3s, but production (9000 tag-relation rows) simply never responded.
Scenario 1
Rough logic:
1. Parse the file into a map of enterprise → tag list.
2. Iterate over the enterprises; for each one, query the enterprise info and the tag info (tag plus category), filtering out enterprises that don't exist.
With 1700+ enterprise rows, this approach took nearly 7 minutes.
The logic itself is fine, and every lookup hits an index, but the problem is the row-by-row loop: each record issues 3 kinds of queries (enterprise, tag, tag category), and with ~3 tags per record that works out to roughly 1700 × 3 × 3 = 15,300 database round trips.
A hundred-odd rows would be tolerable, but it falls apart as soon as the volume grows.
Optimization 1:
Idea: since every lookup is indexed, batch the queries to cut the number of round trips, and with them the elapsed time.
In step 2, replace the per-enterprise loop with a chunked loop of 500 enterprises at a time, querying tags and categories together in one joined query. The loop now runs 1700 / 500 rounded up = 4 times, each iteration doing 1 enterprise query plus 1 combined tag/category query, for 4 × 2 = 8 round trips.
Round trips drop from 15,300 to 8, a reduction of roughly 1900×.
Result: 2–3s, a 100–200× speedup. The gain will shrink as the data volume grows, but the enterprise data set will never be particularly large, so this is good enough.
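The round-trip arithmetic above can be checked with a quick standalone sketch (the method names are mine, not project code):

```java
public class BatchMath {
    // Original loop: roughly (query kinds) x (avg tags per record) lookups per record.
    static int perRecordTrips(int records, int queryKinds, int avgTags) {
        return records * queryKinds * avgTags;
    }

    // Batched version: each chunk costs 1 enterprise query + 1 tag/category query.
    static int batchedTrips(int records, int batchSize) {
        int batches = (records + batchSize - 1) / batchSize; // ceiling division
        return batches * 2;
    }

    public static void main(String[] args) {
        System.out.println(perRecordTrips(1700, 3, 3)); // 15300
        System.out.println(batchedTrips(1700, 500));    // 8
    }
}
```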
Optimization 2:
Idea: multithreading + caching. The tag lookup can sit behind a cache, since tags rarely change; if more speed is needed, hand the chunks to multiple threads and merge the partial results afterwards.
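A minimal sketch of what optimization 2 could look like, assuming a ConcurrentHashMap as the tag cache and a fixed thread pool for the chunks. TagInfo, loadTagFromDb, and the pool size of 4 are my stand-ins, not the project's actual classes:

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

public class TagLookupSketch {
    // Stand-in for the real tag VO.
    static class TagInfo {
        final String name;
        TagInfo(String name) { this.name = name; }
    }

    private final ConcurrentMap<String, TagInfo> tagCache = new ConcurrentHashMap<>();

    // Tags rarely change, so caching the lookup is safe; the database
    // (here: loadTagFromDb) is only hit on the first miss per tag name.
    TagInfo findTag(String tagName) {
        return tagCache.computeIfAbsent(tagName, this::loadTagFromDb);
    }

    TagInfo loadTagFromDb(String tagName) {
        // placeholder for bizKeyEnterpriseTagService.getOne(...)
        return new TagInfo(tagName);
    }

    // Hand each chunk to the pool, then merge the partial results in order.
    static <T, R> List<R> processChunks(List<List<T>> chunks,
                                        Function<List<T>, List<R>> worker)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (List<T> chunk : chunks) {
                futures.add(pool.submit(() -> worker.apply(chunk)));
            }
            List<R> merged = new ArrayList<>();
            for (Future<List<R>> f : futures) {
                merged.addAll(f.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```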
Code before optimization:
public List<KeyEnterpriseRelationTagImportPreviewVO> importRelationTagPreview(MultipartFile file) {
//... parse the file into List<Object> objects
//enterprise -> tag list
Map<String, List<String>> creditCodeAndTagMap = objects.stream().map(object -> (KeyEnterpriseRelationTagImportExcel) object)
.filter(object -> StringUtils.isNotBlank(object.getCreditCode()))
.collect(Collectors.toMap(KeyEnterpriseRelationTagImportExcel::getCreditCode, value -> {
List<String> tags = Lists.newArrayList();
if (StringUtils.isNotEmpty(value.getTagName())) {
tags.addAll(Lists.newArrayList(value.getTagName().split(SPLIT_CHART)));
}
if (StringUtils.isNotEmpty(value.getCompanyTypeName())) {
tags.addAll(Lists.newArrayList(value.getCompanyTypeName().split(SPLIT_CHART)));
}
return tags;
}));
//iterate per enterprise, querying the enterprise info and its tag list
List<KeyEnterpriseRelationTagImportPreviewVO> voList = Lists.newArrayList();
for (Map.Entry<String, List<String>> entry : creditCodeAndTagMap.entrySet()) {
KeyEnterpriseRelationTagImportPreviewVO previewVO = new KeyEnterpriseRelationTagImportPreviewVO();
String creditNo = entry.getKey();
BizKeyEnterprise bizKeyEnterprise = bizKeyEnterpriseService.getOne(new LambdaQueryWrapper<BizKeyEnterprise>()
.select(BizKeyEnterprise::getCreditNo,
BizKeyEnterprise::getCompanyName)
.eq(BizKeyEnterprise::getCreditNo, creditNo));
//enterprises that don't exist are skipped (flag them here instead if needed)
if (bizKeyEnterprise == null) {
continue;
}
voList.add(previewVO);
EnterpriseDTO enterpriseDTO = new EnterpriseDTO();
enterpriseDTO.setCreditNo(bizKeyEnterprise.getCreditNo());
enterpriseDTO.setName(bizKeyEnterprise.getCompanyName());
previewVO.setInfo(enterpriseDTO);
List<String> value = entry.getValue();
if (CollectionUtils.isEmpty(value)) {
continue;
}
List<KeyEnterpriseTagAndCategoryVO> tagList = value.stream().distinct().map(tagName -> {
if (StringUtils.isBlank(tagName)) {
return null;
}
BizKeyEnterpriseTag tag = bizKeyEnterpriseTagService.getOne(new LambdaQueryWrapper<BizKeyEnterpriseTag>()
.eq(BizKeyEnterpriseTag::getName, tagName)
.eq(BizKeyEnterpriseTag::getDisabled, false)
);
if (tag != null) {
KeyEnterpriseTagAndCategoryVO keyEnterpriseTagAndCategoryVO = new KeyEnterpriseTagAndCategoryVO();
keyEnterpriseTagAndCategoryVO.setCreditNo(bizKeyEnterprise.getCreditNo());
keyEnterpriseTagAndCategoryVO.setTagId(tag.getId());
keyEnterpriseTagAndCategoryVO.setTagName(tag.getName());
BizKeyEnterpriseTagCategory category = bizKeyEnterpriseTagCategoryService.getOne(new LambdaQueryWrapper<BizKeyEnterpriseTagCategory>()
.eq(BizKeyEnterpriseTagCategory::getId, tag.getCategoryId()));
if (category != null) {
keyEnterpriseTagAndCategoryVO.setCategoryId(category.getId());
keyEnterpriseTagAndCategoryVO.setCategoryName(category.getName());
}
return keyEnterpriseTagAndCategoryVO;
}
return null;
}).filter(Objects::nonNull).collect(Collectors.toList());
previewVO.setTagList(tagList);
}
return voList;
}
After optimization:
public List<KeyEnterpriseRelationTagImportPreviewVO> importRelationTagPreview(MultipartFile file) {
//... parse the file into List<Object> objects
//enterprise -> tag list
Map<String, List<String>> creditCodeAndTagMap = objects.stream().map(object -> (KeyEnterpriseRelationTagImportExcel) object)
//creditCode is the map key below, so it must be non-blank
.filter(object -> StringUtils.isNotBlank(object.getCreditCode()))
.collect(Collectors.toMap(KeyEnterpriseRelationTagImportExcel::getCreditCode, value -> {
List<String> tags = Lists.newArrayList();
if (StringUtils.isNotEmpty(value.getTagName())) {
tags.addAll(Lists.newArrayList(value.getTagName().split(SPLIT_CHART)));
}
if (StringUtils.isNotEmpty(value.getCompanyTypeName())) {
tags.addAll(Lists.newArrayList(value.getCompanyTypeName().split(SPLIT_CHART)));
}
return tags;
}));
List<KeyEnterpriseRelationTagImportPreviewVO> voList = Lists.newArrayList();
Set<String> creditNos = creditCodeAndTagMap.keySet();
//optimization: iterate in chunks instead of row by row; query enterprise info and tags per chunk
int forCount = creditNos.size() % LIMIT == 0 ? creditNos.size() / LIMIT : creditNos.size() / LIMIT + 1;
for (int i = 0; i < forCount; i++) {
List<String> limitCreditNos = creditNos.stream().skip((long) i * LIMIT).limit(LIMIT).collect(Collectors.toList());
List<BizKeyEnterprise> bizKeyEnterprises = bizKeyEnterpriseService.list(new LambdaQueryWrapper<BizKeyEnterprise>()
.select(BizKeyEnterprise::getCreditNo,
BizKeyEnterprise::getCompanyName)
.in(BizKeyEnterprise::getCreditNo, limitCreditNos)
.groupBy(BizKeyEnterprise::getCreditNo)
);
if (CollectionUtils.isEmpty(bizKeyEnterprises)) {
continue;
}
//dedupe tag names across the chunk
Set<String> tagNames = Sets.newHashSet();
bizKeyEnterprises.forEach(bizKeyEnterprise -> {
tagNames.addAll(creditCodeAndTagMap.get(bizKeyEnterprise.getCreditNo()));
});
List<KeyEnterpriseTagAndCategoryVO> tagList = bizKeyEnterpriseTagService.listTagName(tagNames, null);
Map<String, KeyEnterpriseTagAndCategoryVO> tagNameAndTagVO = tagList.stream().collect(Collectors.toMap(KeyEnterpriseTagAndCategoryVO::getTagName, Function.identity()));
for (BizKeyEnterprise bizKeyEnterprise : bizKeyEnterprises) {
KeyEnterpriseRelationTagImportPreviewVO previewVO = new KeyEnterpriseRelationTagImportPreviewVO();
voList.add(previewVO);
EnterpriseDTO enterpriseDTO = new EnterpriseDTO();
enterpriseDTO.setName(bizKeyEnterprise.getCompanyName());
enterpriseDTO.setCreditNo(bizKeyEnterprise.getCreditNo());
previewVO.setInfo(enterpriseDTO);
List<String> tagNameList = creditCodeAndTagMap.get(bizKeyEnterprise.getCreditNo());
if (CollectionUtils.isEmpty(tagNameList)) {
continue;
}
previewVO.setTagList(tagNameList.stream().map(tagName -> {
KeyEnterpriseTagAndCategoryVO vo = tagNameAndTagVO.get(tagName);
if (vo == null) {
return null;
}
//copy before setting creditNo: the map's VO instances are shared across enterprises,
//so mutating them directly would leave every enterprise with the last creditNo written
KeyEnterpriseTagAndCategoryVO copy = new KeyEnterpriseTagAndCategoryVO();
copy.setTagId(vo.getTagId());
copy.setTagName(vo.getTagName());
copy.setCategoryId(vo.getCategoryId());
copy.setCategoryName(vo.getCategoryName());
copy.setCreditNo(bizKeyEnterprise.getCreditNo());
return copy;
}).filter(Objects::nonNull).collect(Collectors.toList()));
}
}
return voList;
}
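One note on the slicing above: `creditNos.stream().skip(i * LIMIT).limit(LIMIT)` re-streams the whole key set on every iteration. Since the code already uses Guava's Lists/Sets, `Lists.partition` would do the same job; a plain-JDK equivalent, as a sketch:

```java
import java.util.*;

public class ChunkDemo {
    // Split a collection into fixed-size chunks; the last chunk may be smaller.
    static <T> List<List<T>> chunk(Collection<T> items, int size) {
        List<T> snapshot = new ArrayList<>(items);
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < snapshot.size(); from += size) {
            chunks.add(snapshot.subList(from, Math.min(from + size, snapshot.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1700; i++) data.add(i);
        List<List<Integer>> chunks = chunk(data, 500);
        System.out.println(chunks.size());        // 4 chunks: 500 + 500 + 500 + 200
        System.out.println(chunks.get(3).size()); // 200
    }
}
```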
Scenario 2
Rough logic: the enterprise list endpoint filters enterprises by some enterprise fields, tag, and tag category, and returns the selected enterprise fields together with their tags and tag categories.
Before optimization, the SQL always left-joined the tag tables, whether or not any tag filter was used. At production volumes the worst case is 220,000 × 9,000 = 1,980,000,000, nearly 2 billion rows scanned 😂. No wonder it never responded.
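The worst-case arithmetic above can be verified in a line (long arithmetic, to stay safe for bigger tables):

```java
public class JoinCost {
    // Worst case when the join has no usable index: every enterprise row
    // is checked against every relation row.
    static long worstCaseRows(long enterprises, long relations) {
        return enterprises * relations;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseRows(220_000L, 9_000L)); // 1980000000
    }
}
```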
Someone first suggested adding an index on the relation table. That doesn't actually help here: the selectivity is too low, with only 71 distinct tags across the 9,000 relation rows. A colleague built the index anyway to demonstrate; EXPLAIN with the index:
+----+-------------+-------+------------+--------+--------------------------------+---------+---------+----------------------------------+--------+----------+------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+--------------------------------+---------+---------+----------------------------------+--------+----------+------------------------------------------------+
| 1 | SIMPLE | dci | NULL | ALL | credit_no_IDX | NULL | NULL | NULL | 204510 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | bketr | NULL | ALL | credit_no_idx | NULL | NULL | NULL | 8514 | 100.00 | Range checked for each record (index map: 0x2) |
| 1 | SIMPLE | bket | NULL | eq_ref | PRIMARY | PRIMARY | 8 | bketr.tag_id | 1 | 100.00 | Using index |
+----+-------------+-------+------------+--------+--------------------------------+---------+---------+----------------------------------+--------+----------+------------------------------------------------+
EXPLAIN without the index:
+----+-------------+-------+------------+--------+--------------------------------+---------+---------+----------------------------------+--------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+--------------------------------+---------+---------+----------------------------------+--------+----------+----------------------------------------------------+
| 1 | SIMPLE | dci | NULL | ALL | credit_no_IDX | NULL | NULL | NULL | 204510 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | bketr | NULL | ALL | NULL | NULL | NULL | NULL | 8514 | 100.00 | Using where; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | bket | NULL | eq_ref | PRIMARY | PRIMARY | 8 | bketr.tag_id | 1 | 100.00 | Using index |
+----+-------------+-------+------------+--------+--------------------------------+---------+---------+----------------------------------+--------+----------+----------------------------------------------------+
Comparing the two plans: after adding the index it shows up in possible_keys, but it is still not used; with selectivity this low, MySQL prefers a full table scan over the index.
Looking at the SQL itself, the query result doesn't actually need to return any tag data. (Returning it is a separate optimization: since enterprise to tag is 1:n, it could still be one SQL by aggregating each enterprise's tags into a single column, but that is expensive and makes the SQL hard to read, so instead we query the enterprise rows first and fetch their tags in a second query afterwards.) So when no tag condition is present, the join isn't needed at all, and MyBatis's <if> tag can splice the joined tables in dynamically. Without a tag condition we then scan only the 220k+ enterprise rows, thousands of times faster than 2 billion.
mapper.xml
Before:
select dci.credit_no, dci.company_name, dci.company_type_name, dci.company_sub_type, dci.street_name, dci.industry_platform_name, dci.industry_type_name
from dwd_company_info dci
left join biz_key_enterprise_tag_relation bketr on dci.credit_no =bketr.credit_no and bketr.source_type=0
left join biz_key_enterprise_tag bket on bketr.tag_id = bket.id
<include refid="pageKeyEnterpriseBaseDataWhere"/>
group by dci.credit_no
After:
select dci.credit_no, dci.company_name, dci.company_type_name, dci.company_sub_type, dci.street_name, dci.industry_platform_name, dci.industry_type_name
from dwd_company_info dci
<if test="(req.tagIds!= null and req.tagIds.size() > 0) or (req.categoryId!= null and req.categoryId!= '')">
left join biz_key_enterprise_tag_relation bketr on dci.credit_no =bketr.credit_no and bketr.source_type=0
left join biz_key_enterprise_tag bket on bketr.tag_id = bket.id
</if>
<include refid="pageKeyEnterpriseBaseDataWhere"/>
group by dci.credit_no
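The two-step approach mentioned earlier (query the enterprise page first, then fetch tags for just that page) might be sketched like this; Enterprise, TagVO, and the tagFetcher function are my stand-ins for the real mappers:

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TwoStepQuery {
    static class Enterprise {
        final String creditNo;
        Enterprise(String creditNo) { this.creditNo = creditNo; }
    }

    static class TagVO {
        final String creditNo, tagName;
        TagVO(String creditNo, String tagName) {
            this.creditNo = creditNo;
            this.tagName = tagName;
        }
    }

    // Step 2: one IN (...) query covering the whole page instead of a join
    // against 220k x 9k rows; results are grouped back per enterprise.
    static Map<String, List<TagVO>> tagsByEnterprise(
            List<Enterprise> page,
            Function<Set<String>, List<TagVO>> tagFetcher) {
        if (page.isEmpty()) {
            return Map.of();
        }
        Set<String> creditNos = page.stream()
                .map(e -> e.creditNo)
                .collect(Collectors.toSet());
        return tagFetcher.apply(creditNos).stream()
                .collect(Collectors.groupingBy(t -> t.creditNo));
    }
}
```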
At this point someone asked: with a tag condition on a non-indexed column, don't we still scan 2 billion rows? (He forgot the MySQL optimizer exists! Time to walk him through it, and review it for myself 😄)
Example: filter by tag category id = 4; EXPLAIN output:
explain select dci.credit_no, dci.company_name, dci.company_type_name, dci.company_sub_type, dci.street_name, dci.industry_platform_name, dci.industry_type_name from dwd_company_info dci left join biz_key_enterprise_tag_relation bketr on dci.credit_no =bketr.credit_no and bketr.source_type=0 left join biz_key_enterprise_tag bket on bketr.tag_id = bket.id WHERE bket.category_id = 4 group by dci.credit_no;
+----+-------------+-------+------------+--------+--------------------------------+--------------------------------+---------+-------------------------------------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+--------------------------------+--------------------------------+---------+-------------------------------------+------+----------+------------------------------+
| 1 | SIMPLE | bketr | NULL | ALL | NULL | NULL | NULL | NULL |90000 | 10.00 | Using where; Using temporary |
| 1 | SIMPLE | bket | NULL | eq_ref | PRIMARY | PRIMARY | 8 | enterprise-workflow.bketr.tag_id | 1 | 10.00 | Using where |
| 1 | SIMPLE | dci | NULL | ref | dwd_company_info_credit_no_IDX | dwd_company_info_credit_no_IDX | 1023 | enterprise-workflow.bketr.credit_no | 1 | 100.00 | Using index condition |
+----+-------------+-------+------------+--------+--------------------------------+--------------------------------+---------+-------------------------------------+------+----------+------------------------------+
The plan shows the MySQL optimizer is fairly smart: it filters the relation table first, joins that result to the tag table by id (hitting the primary key), then joins the result to the enterprise table (hitting the ordinary credit_no index), scanning only about 9,000 rows in total.
It does not join everything first and filter afterwards, as one might imagine; the optimizer worked out which order is cheaper (though it does not always find the optimal plan).
🤗 Summary
Neither scenario was fixed by tuning the SQL itself; both were fixed at the code level. Scenario 1: reduce the number of database round trips. Scenario 2: drop the unnecessary join and lean on MySQL's own optimizer.
Scenario 2 also shows the value of verifying an optimization with the actual execution plan via EXPLAIN.