Problem Scenario
A requirement came in: provide the business lines with a slow-SQL query tool, with customizable slow-query rules, SQL clustering, and metric-based ranking. This article focuses on how the SQL clustering is implemented.
Development Environment
IDEA + Java 8 + Spring Boot 2 + Perl + MySQL
percona-toolkit.tar.gz
Analysis
Ops stores the slow-SQL log files in an ES cluster, so we need a scheduled job that pulls the slow SQL every day and clusters it. Research turned up two options:
- HanLP, a production-grade natural language processing toolkit: call its repeatedBisection() algorithm to cluster the statements
- percona-toolkit's pt-fingerprint command: compute each statement's SQL fingerprint and GROUP BY it; since fingerprints are themselves fairly long in practice, it is more efficient to MD5 the fingerprint and GROUP BY the MD5 value instead
Experiments showed the first option reaching roughly 80% accuracy with very fast runtimes, while the second reached 99.99% accuracy with acceptable runtimes.
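The second option boils down to mapping each statement to a normalized fingerprint and grouping on it. A minimal sketch of the grouping step, using a toy stand-in for pt-fingerprint (the hypothetical fingerprint() below only strips numeric literals; the real tool normalizes far more):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FingerprintGrouping {

    // Toy stand-in for pt-fingerprint (an assumption for illustration only):
    // lowercase the SQL and replace numeric literals with '?'.
    static String fingerprint(String sql) {
        return sql.toLowerCase().replaceAll("\\b\\d+\\b", "?");
    }

    // Group raw statements by their fingerprint; each map entry is one cluster.
    static Map<String, List<String>> cluster(List<String> sqls) {
        return sqls.stream().collect(Collectors.groupingBy(FingerprintGrouping::fingerprint));
    }

    public static void main(String[] args) {
        List<String> slowSql = Arrays.asList(
                "select id from t where id = 1",
                "select id from t where id = 99",
                "select name from user where age > 30");
        Map<String, List<String>> clusters = cluster(slowSql);
        // the first two statements collapse into one cluster
        System.out.println(clusters.size()); // prints 2
    }
}
```

Statements that differ only in literal values collapse to the same key, which is exactly why grouping by fingerprint clusters slow SQL so accurately.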
Option 1 Implementation
Add the Maven dependency
<!-- https://mvnrepository.com/artifact/com.hankcs/hanlp -->
<dependency>
    <groupId>com.hankcs</groupId>
    <artifactId>hanlp</artifactId>
    <version>portable-1.7.6</version>
</dependency>
Example
import com.hankcs.hanlp.mining.cluster.ClusterAnalyzer;
import java.util.List;
import java.util.Set;

String[] array = {"select id from table where id = 1", "select id from table where id = 99"};
ClusterAnalyzer<String> analyzer = new ClusterAnalyzer<>();
for (int i = 0; i < array.length; i++) {
    // the key is the array index, the document body is the SQL text
    analyzer.addDocument(String.valueOf(i), array[i]);
}
// repeated-bisection clustering; with the 1.0 threshold the cluster count is chosen automatically
List<Set<String>> re = analyzer.repeatedBisection(1.0);
System.out.println(re.size());
for (Set<String> set1 : re) {
    System.out.println(set1);
    // print one representative statement from each cluster
    System.out.println(array[Integer.parseInt(set1.iterator().next())]);
}
Option 2 Implementation
Tool installation
# Install perl-Data-Dumper first if there is no Perl environment
- yum install -y perl-Data-Dumper.x86_64
- tar -zxvf percona-toolkit.tar.gz -C /usr/local/
# Create a symlink (optional; skip if you don't need it)
- ln -s /usr/local/percona-toolkit-3.1.0/bin/pt-fingerprint /usr/local/bin/pt-fingerprint
Example:
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.io.FileDeleteStrategy;
import org.apache.commons.lang3.StringUtils;
import java.io.*;
import java.nio.charset.StandardCharsets;
/**
 * Generate fingerprint information for a SQL statement.
 * https://www.percona.com/doc/percona-toolkit/LATEST/index.html
 * @author yangyouxing
 * @create 2020-02-26
 * @since 1.0.0
 */
@Slf4j
public final class PtFingerUtil {

    private PtFingerUtil() {}

    /**
     * Get the fingerprint of a SQL statement.
     * @param sql the SQL text to fingerprint
     * @return the fingerprint, or null on failure
     */
    public static String getFinger(String sql) {
        String res = null;
        if (StringUtils.isNotBlank(sql)) {
            String fileName = null;
            long maxLength = 30000L;
            if (sql.length() > maxLength) {
                // Very long SQL would overflow the command line, so pass it via a file instead
                fileName = "/tmp/random.txt";
                // Write the SQL to the file; implement this helper yourself
                FileUtil.writeData2File(sql, fileName);
            }
            String[] cmdArray = {"/usr/local/bin/pt-fingerprint", "--query", sql};
            Process pr = null;
            InputStream is = null;
            BufferedReader in = null;
            try {
                boolean exist = StringUtils.isNotBlank(fileName) && new File(fileName).exists();
                if (exist) {
                    // pt-fingerprint reads queries from the file when given a file argument
                    pr = Runtime.getRuntime().exec("/usr/local/bin/pt-fingerprint " + fileName);
                } else {
                    pr = Runtime.getRuntime().exec(cmdArray);
                }
                is = pr.getInputStream();
                in = new BufferedReader(new InputStreamReader(new BufferedInputStream(is), StandardCharsets.UTF_8));
                StringBuilder result = new StringBuilder();
                String line;
                while ((line = in.readLine()) != null) {
                    result.append(line);
                }
                res = result.toString();
                if (exist) {
                    FileDeleteStrategy.FORCE.delete(new File(fileName));
                }
            } catch (Exception e) {
                log.error("failed to generate SQL fingerprint, sql: {}, msg: {}", sql, e.getMessage());
            } finally {
                try {
                    if (pr != null) {
                        pr.waitFor();
                    }
                } catch (InterruptedException e) {
                    // restore the interrupt flag so callers can observe it
                    Thread.currentThread().interrupt();
                    log.error("failed to generate SQL fingerprint - command interrupted, sql: {}, msg: {}", sql, e.getMessage());
                }
                try {
                    if (in != null) {
                        in.close();
                    }
                    if (is != null) {
                        is.close();
                    }
                } catch (IOException e) {
                    log.error("failed to generate SQL fingerprint - stream error, sql: {}, msg: {}", sql, e.getMessage());
                }
                if (pr != null) {
                    pr.destroy();
                }
            }
        }
        return res;
    }
}
Compute the MD5 yourself, store it, and then a simple GROUP BY sql_md5 in the database does the clustering.
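For the MD5 step, a minimal self-contained helper might look like the sketch below; commons-codec's DigestUtils.md5Hex would be an equally good choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class Md5Util {

    private Md5Util() {}

    // Returns the lowercase hex MD5 digest of the input string.
    public static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(32);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is guaranteed to be present in every standard JDK
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The 32-character hex digest makes a compact, fixed-length grouping key, which is why grouping on the MD5 beats grouping on the raw fingerprint for long statements.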