Problem Scenario
A requirement came in: provide the business lines with a slow-SQL query tool, with customizable slow-query rules, SQL clustering, and metric-based ranking. This article focuses on how the SQL clustering is implemented.
Development Environment
IDEA + Java 8 + Spring Boot 2 + Perl + MySQL
percona-toolkit.tar.gz
Analysis
Ops stores the slow-SQL log files in an ES cluster, so we need a scheduled job that pulls the slow SQL every day and clusters it. Research turned up two options:
- HanLP, a production-grade natural language processing toolkit: call its repeatedBisection() algorithm to cluster the statements
- percona-toolkit's pt-fingerprint command: compute each statement's SQL fingerprint and GROUP BY it; since fingerprints are themselves fairly long in practice, it is more efficient to MD5 the fingerprint and GROUP BY the MD5 value instead
Experiments showed the first option reaching roughly 80% accuracy with very fast runtimes, while the second reached 99.99% accuracy with acceptable runtimes.
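The second option boils down to mapping each statement to a normalized fingerprint and grouping on it. A minimal sketch of the grouping step, using a toy stand-in for pt-fingerprint (the hypothetical fingerprint() below only strips numeric literals; the real tool normalizes far more):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FingerprintGrouping {

    // Toy stand-in for pt-fingerprint (an assumption for illustration only):
    // lowercase the SQL and replace numeric literals with '?'.
    static String fingerprint(String sql) {
        return sql.toLowerCase().replaceAll("\\b\\d+\\b", "?");
    }

    // Group raw statements by their fingerprint; each map entry is one cluster.
    static Map<String, List<String>> cluster(List<String> sqls) {
        return sqls.stream().collect(Collectors.groupingBy(FingerprintGrouping::fingerprint));
    }

    public static void main(String[] args) {
        List<String> slowSql = Arrays.asList(
                "select id from t where id = 1",
                "select id from t where id = 99",
                "select name from user where age > 30");
        Map<String, List<String>> clusters = cluster(slowSql);
        // the first two statements collapse into one cluster
        System.out.println(clusters.size()); // prints 2
    }
}
```

Statements that differ only in literal values collapse to the same key, which is exactly why grouping by fingerprint clusters slow SQL so accurately.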
Option 1 Implementation
Add the Maven dependency
<!-- https://mvnrepository.com/artifact/com.hankcs/hanlp -->
<dependency>
    <groupId>com.hankcs</groupId>
    <artifactId>hanlp</artifactId>
    <version>portable-1.7.6</version>
</dependency>
Example
import com.hankcs.hanlp.mining.cluster.ClusterAnalyzer;
import java.util.List;
import java.util.Set;

String[] array = {"select id from table where id = 1", "select id from table where id = 99"};
ClusterAnalyzer<String> analyzer = new ClusterAnalyzer<>();
for (int i = 0; i < array.length; i++) {
    // the key is the array index, the document body is the SQL text
    analyzer.addDocument(String.valueOf(i), array[i]);
}
// repeated-bisection clustering; with the 1.0 threshold the cluster count is chosen automatically
List<Set<String>> re = analyzer.repeatedBisection(1.0);
System.out.println(re.size());
for (Set<String> set1 : re) {
    System.out.println(set1);
    // print one representative statement from each cluster
    System.out.println(array[Integer.parseInt(set1.iterator().next())]);
}
Option 2 Implementation
Tool installation
# Install perl-Data-Dumper first if there is no Perl environment
- yum install -y perl-Data-Dumper.x86_64
- tar -zxvf percona-toolkit.tar.gz -C /usr/local/
# Create a symlink (optional; skip if you don't need it)
- ln -s /usr/local/percona-toolkit-3.1.0/bin/pt-fingerprint /usr/local/bin/pt-fingerprint
Example:
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.io.FileDeleteStrategy;
import org.apache.commons.lang3.StringUtils;
import java.io.*;
import java.nio.charset.StandardCharsets;
/**
 * Generate fingerprint information for a SQL statement.
 * https://www.percona.com/doc/percona-toolkit/LATEST/index.html
 * @author yangyouxing
 * @create 2020-02-26
 * @since 1.0.0
 */
@Slf4j
public final class PtFingerUtil {

    private PtFingerUtil() {}

    /**
     * Get the fingerprint of a SQL statement.
     * @param sql the SQL text to fingerprint
     * @return the fingerprint, or null on failure
     */
    public static String getFinger(String sql) {
        String res = null;
        if (StringUtils.isNotBlank(sql)) {
            String fileName = null;
            long maxLength = 30000L;
            if (sql.length() > maxLength) {
                // Very long SQL would overflow the command line, so pass it via a file instead
                fileName = "/tmp/random.txt";
                // Write the SQL to the file; implement this helper yourself
                FileUtil.writeData2File(sql, fileName);
            }
            String[] cmdArray = {"/usr/local/bin/pt-fingerprint", "--query", sql};
            Process pr = null;
            InputStream is = null;
            BufferedReader in = null;
            try {
                boolean exist = StringUtils.isNotBlank(fileName) && new File(fileName).exists();
                if (exist) {
                    // pt-fingerprint reads queries from the file when given a file argument
                    pr = Runtime.getRuntime().exec("/usr/local/bin/pt-fingerprint " + fileName);
                } else {
                    pr = Runtime.getRuntime().exec(cmdArray);
                }
                is = pr.getInputStream();
                in = new BufferedReader(new InputStreamReader(new BufferedInputStream(is), StandardCharsets.UTF_8));
                StringBuilder result = new StringBuilder();
                String line;
                while ((line = in.readLine()) != null) {
                    result.append(line);
                }
                res = result.toString();
                if (exist) {
                    FileDeleteStrategy.FORCE.delete(new File(fileName));
                }
            } catch (Exception e) {
                log.error("failed to generate SQL fingerprint, sql: {}, msg: {}", sql, e.getMessage());
            } finally {
                try {
                    if (pr != null) {
                        pr.waitFor();
                    }
                } catch (InterruptedException e) {
                    // restore the interrupt flag so callers can observe it
                    Thread.currentThread().interrupt();
                    log.error("failed to generate SQL fingerprint - command interrupted, sql: {}, msg: {}", sql, e.getMessage());
                }
                try {
                    if (in != null) {
                        in.close();
                    }
                    if (is != null) {
                        is.close();
                    }
                } catch (IOException e) {
                    log.error("failed to generate SQL fingerprint - stream error, sql: {}, msg: {}", sql, e.getMessage());
                }
                if (pr != null) {
                    pr.destroy();
                }
            }
        }
        return res;
    }
}
Compute the MD5 yourself, store it, and then a simple GROUP BY sql_md5 in the database does the clustering.
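For the MD5 step, a minimal self-contained helper might look like the sketch below; commons-codec's DigestUtils.md5Hex would be an equally good choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class Md5Util {

    private Md5Util() {}

    // Returns the lowercase hex MD5 digest of the input string.
    public static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(32);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is guaranteed to be present in every standard JDK
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The 32-character hex digest makes a compact, fixed-length grouping key, which is why grouping on the MD5 beats grouping on the raw fingerprint for long statements.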