Preface
A recent project involved some performance-related optimization work; this post summarizes the experience.
The Optimization Case
Basic requirements
- Import data from an external system and display it on a map
- Filter out data that falls outside a given region
- The region is further divided into a dozen or so sub-regions, and each lightning record must be associated with its sub-region
Further details
- Data is imported in CSV format
- Up to 500,000 records per import
- No hard latency requirement, but a user may well be waiting at the screen for the result
Basic implementation approach
With this much data to show on the map, pulling it from the backend and drawing it manually on the front end is impractical. Instead, the front end loads a WMS service published by GeoServer; the WMS layer is backed by a database table, so all we need to do is import our data into that table.
The main factor under our control is therefore the time it takes to import the data into the WMS-backed table, and most of what follows revolves around that.
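Once the layer is published, the front end simply requests rendered map images from GeoServer. For illustration, a typical WMS 1.1.0 GetMap request looks roughly like the following; the host, workspace/layer name `forest:thunderbolt`, and bounding box are placeholders, not the project's actual values:

```
http://<geoserver-host>/geoserver/wms?service=WMS&version=1.1.0&request=GetMap
  &layers=forest:thunderbolt&styles=
  &bbox=121.0,43.0,135.0,53.0&width=768&height=550
  &srs=EPSG:4326&format=image/png&transparent=true
```

Because GeoServer reads the backing table on each render, newly imported rows show up on the map without any extra front-end work.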
Original approach
The basic flow is shown in the figure below:
- Save the raw data
- Filter by the outer region
- Compute each record's sub-region and generate the display records
- Save the display records
@PostMapping("/import")
public ResponseEntity<String> importOriginalThunderbolt(HttpServletRequest httpServletRequest) {
    ResponseEntity<String> uploadResponseEntity = uploadOriginalThunderbolt(httpServletRequest);
    if (uploadResponseEntity.getStatusCode() != HttpStatus.OK) {
        return ResponseEntity.badRequest().body(uploadResponseEntity.getBody());
    }
    ResponseEntity<String> transformResponseEntity = transformThunderbolt();
    if (transformResponseEntity.getStatusCode() != HttpStatus.OK) {
        return ResponseEntity.badRequest().body("Failed to transform the data!");
    }
    return ResponseEntity.ok().body("Data imported successfully!");
}
public ResponseEntity<String> transformThunderbolt() {
    OriginalThunderbolt originalThunderboltCondition = new OriginalThunderbolt();
    // "未同步" ("not yet synced") is a status value stored verbatim in the DB
    originalThunderboltCondition.setStatus("未同步");
    thunderboltService.thunderBoltDataFilter(originalThunderboltService.findAllByCondition(originalThunderboltCondition));
    return ResponseEntity.ok().body("Data transformed successfully!");
}
@Override
public void thunderBoltDataFilter(List<OriginalThunderbolt> originalThunderbolts) {
    // Delete the previous lightning data, one row at a time
    List<Thunderbolt> deleteThunderboltList = findAll(false);
    for (Thunderbolt thunderbolt : deleteThunderboltList) {
        thunderboltMapper.deleteById(thunderbolt.getId());
    }
    List<Point> points = new ArrayList<>();
    List<OriginalThunderbolt> dmzThunderboltList = new ArrayList<>();
    originalThunderbolts.forEach(th -> {
        // Keep only the strikes that fall inside the region
        if (SpatialUtils.isInPolygon(th.getLongitude(), th.getLatitude(), findParasFromResource())) {
            Point point = new Point(String.valueOf(th.getLongitude()), String.valueOf(th.getLatitude()));
            dmzThunderboltList.add(th);
            points.add(point);
        }
    });
    // Call the sim service to match each strike with its forest farm
    List<OrganizationVO> matchedData = simOrgClient.matchAlarmPointWithOrgRegion(points);
    // Convert the models and insert them row by row
    for (int i = 0; i < dmzThunderboltList.size(); i++) {
        Thunderbolt thunderbolt = new Thunderbolt();
        thunderbolt.setId(SnowFlakeIdGenerator.getInstance().nextId());
        thunderbolt.setCode(dmzThunderboltList.get(i).getCode());
        if (matchedData.get(i) != null) {
            thunderbolt.setOrgId(matchedData.get(i).getOrgId());
            thunderbolt.setOrgName(matchedData.get(i).getOrgName());
        }
        thunderbolt.setCreateTime(System.currentTimeMillis());
        thunderbolt.setUpdateTime(System.currentTimeMillis());
        thunderbolt.setDeviation(dmzThunderboltList.get(i).getDeviation());
        thunderbolt.setHeight(dmzThunderboltList.get(i).getHeight());
        thunderbolt.setLatitude(dmzThunderboltList.get(i).getLatitude());
        thunderbolt.setLongitude(dmzThunderboltList.get(i).getLongitude());
        String geoString = thunderbolt.getGeometryText();
        thunderbolt.setElementLocation(geoString);
        thunderbolt.setLocationMode(dmzThunderboltList.get(i).getLocationMode());
        thunderbolt.setLocatorNumber(dmzThunderboltList.get(i).getLocatorNumber());
        thunderbolt.setStatus(dmzThunderboltList.get(i).getStatus());
        thunderbolt.setSteepness(dmzThunderboltList.get(i).getSteepness());
        thunderbolt.setStrength(dmzThunderboltList.get(i).getStrength());
        thunderbolt.setTime(dmzThunderboltList.get(i).getTime());
        thunderbolt.setType(dmzThunderboltList.get(i).getType());
        thunderbolt.setDeleted(false);
        thunderboltMapper.insertThunderbolt(thunderbolt);
    }
    // Mark the raw data as synced ("同步"), one row at a time
    for (OriginalThunderbolt originalThunderbolt : originalThunderbolts) {
        originalThunderbolt.setStatus("同步");
        originalThunderboltService.updateOriginalThunderbolt(originalThunderbolt);
    }
}
Batch-insert optimization
Profiling showed that most of the time went into executing SQL inserts and updates, and a look at the source revealed that rows were being handled one at a time. At the database level, batching beats row-by-row processing by at least an order of magnitude, so batching it is.
We also spotted a potential OOM risk here, so this step uses small batches to rule OOM out entirely.
The basic flow is shown in the figure below:
@Override
public void thunderBoltDataFilter(List<OriginalThunderbolt> originalThunderbolts) {
    // Delete the previous lightning data in a single statement
    remove(new QueryWrapper<>());
    List<Point> points = new ArrayList<>();
    List<OriginalThunderbolt> dmzThunderboltList = new ArrayList<>();
    originalThunderbolts.forEach(th -> {
        // Keep only the strikes that fall inside the region
        if (SpatialUtils.isInPolygon(th.getLongitude(), th.getLatitude(), findParasFromResource())) {
            Point point = new Point(String.valueOf(th.getLongitude()), String.valueOf(th.getLatitude()));
            dmzThunderboltList.add(th);
            points.add(point);
        }
    });
    // Call the sim service to match each strike with its forest farm
    List<OrganizationVO> matchedData = simOrgClient.matchAlarmPointWithOrgRegion(points);
    List<Thunderbolt> list = new ArrayList<>();
    int batchSize = 500;
    // Convert the models and insert them in batches
    for (int i = 0; i < dmzThunderboltList.size(); i++) {
        Thunderbolt thunderbolt = new Thunderbolt();
        thunderbolt.setId(SnowFlakeIdGenerator.getInstance().nextId());
        thunderbolt.setCode(dmzThunderboltList.get(i).getCode());
        if (matchedData.get(i) != null) {
            thunderbolt.setForestFarmId(matchedData.get(i).getOrgId());
            thunderbolt.setForestFarmName(matchedData.get(i).getOrgName());
            thunderbolt.setForestryBureauId(matchedData.get(i).getParentOrgId());
            thunderbolt.setForestryBureauName(matchedData.get(i).getParentOrgName());
        }
        thunderbolt.setProvinces(dmzThunderboltList.get(i).getProvinces());
        thunderbolt.setCities(dmzThunderboltList.get(i).getCities());
        thunderbolt.setCounties(dmzThunderboltList.get(i).getCounties());
        thunderbolt.setCreateTime(System.currentTimeMillis());
        thunderbolt.setUpdateTime(System.currentTimeMillis());
        thunderbolt.setDeviation(dmzThunderboltList.get(i).getDeviation());
        thunderbolt.setHeight(dmzThunderboltList.get(i).getHeight());
        thunderbolt.setLatitude(dmzThunderboltList.get(i).getLatitude());
        thunderbolt.setLongitude(dmzThunderboltList.get(i).getLongitude());
        String geoString = thunderbolt.getGeometryText();
        thunderbolt.setElementLocation(geoString);
        thunderbolt.setLocationMode(dmzThunderboltList.get(i).getLocationMode());
        thunderbolt.setLocatorNumber(dmzThunderboltList.get(i).getLocatorNumber());
        thunderbolt.setStatus(dmzThunderboltList.get(i).getStatus());
        thunderbolt.setSteepness(dmzThunderboltList.get(i).getSteepness());
        thunderbolt.setStrength(dmzThunderboltList.get(i).getStrength());
        thunderbolt.setTime(dmzThunderboltList.get(i).getTime());
        thunderbolt.setType(dmzThunderboltList.get(i).getType());
        thunderbolt.setDeleted(false);
        list.add(thunderbolt);
        // Flush each full batch; the trailing partial batch is flushed after the loop
        if (list.size() >= batchSize) {
            insertBatchThunderbolt(list);
            list.clear();
        }
    }
    if (!list.isEmpty()) {
        insertBatchThunderbolt(list);
    }
    // Mark the raw data as synced ("同步"), also in batches
    List<OriginalThunderbolt> originalThunderboltList = new ArrayList<>();
    for (OriginalThunderbolt originalThunderbolt : originalThunderbolts) {
        originalThunderbolt.setStatus("同步");
        originalThunderboltList.add(originalThunderbolt);
        if (originalThunderboltList.size() >= batchSize) {
            originalThunderboltService.updateBatch(originalThunderboltList);
            originalThunderboltList.clear();
        }
    }
    if (!originalThunderboltList.isEmpty()) {
        originalThunderboltService.updateBatch(originalThunderboltList);
    }
}
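The chunked-write pattern used above can be factored into a small reusable helper, which makes the two invariants easy to test: every item is written exactly once, and the trailing partial batch is flushed. A minimal, dependency-free sketch (the `sink` consumer is a stand-in for the real mapper call, not the project's actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchWriter {
    /** Splits items into chunks of at most batchSize, hands each chunk to the sink, returns the flush count. */
    public static <T> int writeInBatches(List<T> items, int batchSize, Consumer<List<T>> sink) {
        List<T> buffer = new ArrayList<>(batchSize);
        int flushes = 0;
        for (T item : items) {
            buffer.add(item);
            if (buffer.size() == batchSize) {   // flush a full batch
                sink.accept(new ArrayList<>(buffer));
                buffer.clear();
                flushes++;
            }
        }
        if (!buffer.isEmpty()) {                // flush the trailing partial batch
            sink.accept(new ArrayList<>(buffer));
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1201; i++) data.add(i);
        List<Integer> written = new ArrayList<>();
        int flushes = writeInBatches(data, 500, written::addAll);
        System.out.println(flushes + " flushes, " + written.size() + " rows"); // 3 flushes, 1201 rows
    }
}
```

Keeping the buffer bounded at `batchSize` is also what removes the OOM risk mentioned above: memory usage is independent of the total row count.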
Cutting unnecessary work
After the changes above, importing 20k records dropped from 10 minutes to about 1 minute, but that was still too slow. Another pass over the source revealed that, since the CSV file itself naturally serves as a backup, persisting the raw data is entirely unnecessary; dropping the raw-data inserts also eliminates the subsequent raw-data updates and deletes.
The basic flow is shown in the figure below:
@Transactional
@Override
public void transformCSV(Path filePath) {
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(filePath.toFile())));
         CSVParser parser = CSVFormat.DEFAULT.parse(reader)) {
        Iterator<CSVRecord> iterator = parser.iterator();
        // Skip the header row
        if (iterator.hasNext()) {
            iterator.next();
        }
        List<List<String>> csvList = new ArrayList<>();
        long count = 0;
        while (iterator.hasNext()) {
            CSVRecord record = iterator.next();
            // Collect the fields of the current row
            List<String> value = new ArrayList<>();
            for (int j = 0; j < record.size(); j++) {
                value.add(record.get(j));
            }
            csvList.add(value);
            if (csvList.size() >= BATCH_SIZE) {
                count += thunderBoltDataFilter(csvList);
                csvList.clear();
            }
        }
        // Process the trailing partial batch
        count += thunderBoltDataFilter(csvList);
        // Integrity check: every data row (all records minus the header) must have been processed
        if (count != parser.getRecordNumber() - 1) {
            throw new ThunderTransformException("Imported row count does not match the uploaded file");
        }
    } catch (TransactionException e) {
        flushDb();
        log.error("Failed to process the data", e);
    } catch (FileNotFoundException e) {
        log.error("Temporary file not found", e);
    } catch (Exception e) {
        log.error("Unexpected exception", e);
    }
}
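The streaming-plus-chunking core of `transformCSV` can be exercised in isolation. The sketch below uses hypothetical names and `String.split` in place of a real CSV parser (so it does not handle quoted commas); the point is the pattern of skipping the header, processing in chunks, and verifying the processed count at the end:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvChunkDemo {
    static final int BATCH_SIZE = 2;

    /** Parses lines (header first), processes data rows in chunks, returns the number of rows processed. */
    static long importCsv(List<String> lines) {
        long processed = 0;
        List<List<String>> chunk = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {          // index 0 is the header row
            chunk.add(Arrays.asList(lines.get(i).split(",")));
            if (chunk.size() >= BATCH_SIZE) {
                processed += process(chunk);
                chunk.clear();
            }
        }
        processed += process(chunk);                       // trailing partial chunk
        return processed;
    }

    /** Stand-in for the real filter-and-insert step; just reports the chunk size. */
    static int process(List<List<String>> rows) {
        return rows.size();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("code,lat,lon", "A,45.1,127.9", "B,45.2,128.0", "C,45.3,128.1");
        long processed = importCsv(lines);
        // integrity check: every data row must have been processed exactly once
        if (processed != lines.size() - 1) {
            throw new IllegalStateException("row count mismatch");
        }
        System.out.println(processed); // 3
    }
}
```

Because rows are handed off as soon as each chunk fills, peak memory stays at one chunk regardless of file size, which is what makes the 500k-row ceiling safe to stream.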
private void flushDb() {
    remove(new QueryWrapper<>());
}

private List<OriginalThunderbolt> loadDataFromCSV(List<List<String>> csvList) {
    List<OriginalThunderbolt> originalThunderboltList = new ArrayList<>();
    for (List<String> csvString : csvList) {
        OriginalThunderbolt originalThunderbolt = new OriginalThunderbolt();
        originalThunderbolt.setId(SnowFlakeIdGenerator.getInstance().nextId());
        originalThunderbolt.setCode(csvString.get(0));
        if (StringUtils.isNotEmpty(csvString.get(1))) {
            Long time = TimeUtils.stringToDateLong(csvString.get(1));
            originalThunderbolt.setTime(time);
        }
        originalThunderbolt.setType(csvString.get(2));
        originalThunderbolt.setHeight(Double.valueOf(csvString.get(3)));
        originalThunderbolt.setStrength(Double.valueOf(csvString.get(4)));
        originalThunderbolt.setLatitude(Double.valueOf(csvString.get(5)));
        originalThunderbolt.setLongitude(Double.valueOf(csvString.get(6)));
        originalThunderbolt.setProvinces(csvString.get(7));
        originalThunderbolt.setCities(csvString.get(8));
        originalThunderbolt.setCounties(csvString.get(9));
        originalThunderbolt.setLocationMode(csvString.get(10));
        originalThunderbolt.setSteepness(csvString.get(11));
        originalThunderbolt.setDeviation(csvString.get(12));
        originalThunderbolt.setLocatorNumber(csvString.get(13));
        originalThunderbolt.setStatus("未同步");
        originalThunderbolt.setCreateTime(System.currentTimeMillis());
        originalThunderbolt.setUpdateTime(System.currentTimeMillis());
        originalThunderbolt.setDeleted(false);
        originalThunderboltList.add(originalThunderbolt);
    }
    return originalThunderboltList;
}
/**
 * Import endpoint
 * @param httpServletRequest httpServletRequest
 * @return status
 */
@PostMapping("/import")
public ResponseEntity<String> importOriginalThunderbolt(HttpServletRequest httpServletRequest) {
    ResponseEntity<String> uploadResponseEntity = uploadOriginalThunderbolt(httpServletRequest);
    if (uploadResponseEntity.getStatusCode() != HttpStatus.OK) {
        return ResponseEntity.badRequest().body(uploadResponseEntity.getBody());
    }
    ResponseEntity<String> transformResponseEntity = transformThunderbolt(uploadResponseEntity.getBody());
    if (transformResponseEntity.getStatusCode() != HttpStatus.OK) {
        return ResponseEntity.badRequest().body("Failed to transform the data!");
    }
    return ResponseEntity.ok().body("Data imported successfully!");
}
Remaining options
If performance still falls short of user needs, the following directions are worth considering:
1. Use an appropriate data structure to do a first-pass filter in memory
2. Drop the database and GeoServer entirely: use GeoTools or a C library for the spatial analysis and optimize the front-end display logic, improving the experience through aggregation plus zoom-level limits, or even asynchronous loading
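For the first option, the region test itself needs no database at all. The sketch below is a standard ray-casting point-in-polygon check in plain Java; it is an illustration of the idea, not the project's `SpatialUtils` implementation, and it assumes the region is a simple polygon given as [lon, lat] vertex pairs:

```java
public class PointInPolygon {
    /** Ray-casting test: true if (lon, lat) lies inside the polygon given as [lon, lat] vertex pairs. */
    static boolean contains(double[][] polygon, double lon, double lat) {
        boolean inside = false;
        // Walk each edge (j is the previous vertex); toggle on every edge the horizontal ray crosses
        for (int i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
            double xi = polygon[i][0], yi = polygon[i][1];
            double xj = polygon[j][0], yj = polygon[j][1];
            boolean crosses = (yi > lat) != (yj > lat)
                    && lon < (xj - xi) * (lat - yi) / (yj - yi) + xi;
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        double[][] region = { {0, 0}, {10, 0}, {10, 10}, {0, 10} }; // a simple square
        System.out.println(contains(region, 5, 5));   // true
        System.out.println(contains(region, 15, 5));  // false
    }
}
```

Run over 500k points against a region of a few hundred vertices, this kind of in-memory pre-filter is typically far cheaper than a round trip to the database per point; for production use, a tested library such as JTS would be the safer choice.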
Basic route
- Clarify the requirements and available resources
- Establish a measurement baseline
- Locate the bottleneck
- Prioritize the tuning techniques
Prefer general engineering techniques first, then algorithms and data structures, then the processing flow, and only as a last resort extreme parameter tuning
General principles of performance optimization
- Don't optimize prematurely
- Identify the key business metrics and the factors that affect them
- Establish an effective measurement baseline and methodology
- Apply the 80/20 rule and attack the dominant bottleneck
- Understand the limits of each approach
- Stay business-driven, and tolerate a reasonable amount of technical debt
PS: some personal takeaways
1. Algorithm-level optimization yields order-of-magnitude gains, e.g. replacing bubble sort with quicksort
2. Parameter tuning mostly yields same-order-of-magnitude gains, e.g. database parameter tuning
3. Engineering techniques can yield order-of-magnitude gains, e.g. switching from single-row to batch operations, or from HDD to SSD
4. Reworking the processing flow has a good chance of producing order-of-magnitude gains
5. Keep broadening your tech stack; when all you have is a hammer, everything looks like a nail. The spatial analysis in this article relied on PostGIS at the database level, but it could equally be handled by GeoServer, or even by calling a more efficient C library through JNI/JNA.