jdbc的batch模式

最新推荐文章于 2023-09-02 07:24:27 发布

lang20150928

最新推荐文章于 2023-09-02 07:24:27 发布

阅读量1.8k

点赞数 4

分类专栏： mysql 文章标签： java mysql 数据库

本文链接：https://blog.csdn.net/m0_37607945/article/details/106091115

版权

mysql 专栏收录该内容

61 篇文章 2 订阅

订阅专栏

当需要批量向数据库插入数据的时候，如果语句是一样的，那么采用批量操作的模式会让速度得到很大的提升，那么其中的原理是什么样的呢？
本文针对的是有一定Java基础的程序猿，需要了解的知识如下：

maven
JDBC基础
首先准备一个mysql数据库，执行如下脚本

SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for user
-- ----------------------------
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Id',
  `username` varchar(45) NOT NULL COMMENT '账号',
  `password` varchar(45) NOT NULL COMMENT '密码',
  `money` decimal(10,0) NOT NULL COMMENT '余额',
  `role` varchar(45) NOT NULL COMMENT '角色',
  `reg_time` datetime DEFAULT NULL COMMENT '注册时间',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 COMMENT='用户信息表';

下载抓包工具wireshark,网址如下：
https://www.wireshark.org/download.html

考虑到wireshark抓包工具默认情况下不支持抓取本地访问本地数据库的包，需要将mysql数据库安装到虚拟机或远程服务器上才，否则抓包工具查看TCP信息比较麻烦。

可以参考博客：https://blog.csdn.net/weixin_30460489/article/details/99850381

然后搭建一个maven项目，此处不详细叙述，引入maven依赖

<dependency>
	<groupId>mysql</groupId>
	<artifactId>mysql-connector-java</artifactId>
	<version>5.1.42</version>
</dependency>
<dependency>
	<groupId>com.alibaba</groupId>
	<artifactId>druid</artifactId>
	<version>1.0.5</version>
</dependency>

创建一个java类

package com.example.durid.demo;

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import com.alibaba.druid.pool.DruidDataSource;

public class RawJdbcTestMain {

	private static final String cleanSql = "delete from user";

	private static final String insertSql = "insert into user (username, password, money, role, reg_time) values (?,?,?,?,?)";

	public static void main(String[] args) throws SQLException {
		DruidDataSource dataSource = new DruidDataSource();
		dataSource.setUrl("jdbc:mysql://192.168.99.100:3306/ms_user?useSSL=false");
		dataSource.setDriverClassName("com.mysql.jdbc.Driver");
		dataSource.setUsername("xir_trdj");
		dataSource.setPassword("xpar");
		Connection connection = dataSource.getConnection();
		try {
			// 执行清库
			PreparedStatement statement = connection.prepareStatement(cleanSql);
			statement.execute();
			// 批次执行
			PreparedStatement preparedStatement = connection.prepareStatement(insertSql);
			for (int i = 0; i < 10; i++) {
				preparedStatement.setString(1, "username-" + i);
				preparedStatement.setString(2, "password-" + i);
				preparedStatement.setBigDecimal(3, BigDecimal.valueOf(i));
				preparedStatement.setString(4, "single-" + i);
				preparedStatement.setDate(5, new Date(new java.util.Date().getTime()));
				preparedStatement.executeUpdate();
			}
			// 批次执行
			for (int i = 0; i < 10; i++) {
				preparedStatement.setString(1, "username-" + i);
				preparedStatement.setString(2, "password-" + i);
				preparedStatement.setBigDecimal(3, BigDecimal.valueOf(i));
				preparedStatement.setString(4, "batch-" + i);
				preparedStatement.setDate(5, new Date(new java.util.Date().getTime()));
				preparedStatement.addBatch();
			}
			preparedStatement.executeBatch();
			// 进行等待
			try {
				System.in.read();
			} catch (IOException e) {
				e.printStackTrace();
			}
		} finally {
			if (dataSource != null) {
				dataSource.close();
			}
		}
	}
}

打开wireshark抓包工具

在这里插入图片描述

进行接口配置

在软件启动的过程中，会查找所有有效的接口，选取对应的接口并配置
在这里插入图片描述

开始监控

在这里插入图片描述
点击开始之后的页面如下：

在应用显示过滤器中输入如下过滤语句,其中192.168.99.100为本机虚拟机的地址，而3306是mysql服务器的端口

ip.addr == 192.168.99.100 and tcp.port == 3306

在这里插入图片描述
运行上面的RawJdbcTestMain类，查看wireshark可以查看到TCP信息

分析一下抓到的数据

首先是TCP三次握手信息
数据库连接的认证和初始化

为了方便以后学习，将此过程中的执行语句汇总到下面

/* mysql-connector-java-5.1.42 ( Revision: 1f61b0b0270d9844b006572ba4e77f19c0f230d4 ) */SELECT  @@session.auto_increment_increment AS auto_increment_increment, @@character_set_client AS character_set_client, @@character_set_connection AS character_set_connection, @@character_set_results AS character_set_results, @@character_set_server AS character_set_server, @@init_connect AS init_connect, @@interactive_timeout AS interactive_timeout, @@license AS license, @@lower_case_table_names AS lower_case_table_names, @@max_allowed_packet AS max_allowed_packet, @@net_buffer_length AS net_buffer_length, @@net_write_timeout AS net_write_timeout, @@query_cache_size AS query_cache_size, @@query_cache_type AS query_cache_type, @@sql_mode AS sql_mode, @@system_time_zone AS system_time_zone, @@time_zone AS time_zone, @@tx_isolation AS tx_isolation, @@wait_timeout AS wait_timeout;
SET NAMES latin1;
SET character_set_results = NULL;
SET autocommit=1;
SET sql_mode='NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES';
SELECT @@session.tx_isolation;

以上语句最后的;号在原监控语句中不存在，为了sql脚本正确，本人添加

执行客户端请求

由于之前的测试存在问题，从这里开始博客是修改的内容。在对于mysql数据库要开启批量模式必须设置参数rewriteBatchedStatements。这次使用了另一台服务器进行测试，服务器的地址为191.168.1.14，数据库端口为3306.所以过滤参数为：ip.addr == 191.168.1.14 and tcp.port == 3306
在这里插入图片描述
所以修改了数据源的信息如下

dataSource.setUrl("jdbc:mysql://191.168.1.14:3306/ms_user?useSSL=false&rewriteBatchedStatements=true");
dataSource.setDriverClassName("com.mysql.jdbc.Driver");
dataSource.setUsername("xir_trdj");
dataSource.setPassword("xpar");

对应的mysql驱动版本为5.1.42。在executeUpdate（单次发送执行）和executeBatch（批次执行）处打上断点。
在这里插入图片描述
还有mysql驱动源码com.mysql.jdbc.PreparedStatement#executeBatchInternal打上断点

通过debug运行以上程序，在第一个断点处，此时会监控到一些TCP连接初始化信息以及删除信息。

继续debug，插入第一条数据。结果如下
在这里插入图片描述
接下来的多次单条执行结果都差不多，就是每次执行executeUpdate就会往数据库发送一条协议包。

接下来就是批次执行了。debug到相应的断点处，此时可以在属性batchedArgs中包含了10条信息，在批量模式下，不会直接将结果发送出去，而是先放到本地一个列表当中，具体参考源码com.mysql.jdbc.PreparedStatement#addBatch(java.lang.String)。
在这里插入图片描述
从这些列表对象中可以看出它们是公用的编译语句对象，另外还包含了参数。

此时还可以查看wireshark中的监控信息，说明此时没有往服务器发送任何信息。

继续debug项目，因为rewriteBatchedStatements设置为true，所以最后进入了executeBatchedInserts方法。
在这里插入图片描述
执行这个方法之后，就会进入finally块做一些善后工作，查看监控数据

从以上的监控结果，完全可以看出这批数据都是在一个MySQL协议包中发出的。

如果我们将rewriteBatchedStatements参数修改为false或者不设置（默认为false）会如何呢？首先单次执行还是与以前是一样的，其次在执行executeBatchInternal之前也是一样的，会将需要发送的语句都统一存放在本地的com.mysql.jdbc.StatementImpl#batchedArgs属性当中。此时会进入到executeBatchSerially方法当中。
在这里插入图片描述
在executeBatchSerially方法当中并不会一次性将缓存结果发送到服务器，而是一条一条的发送。

所以最后的批量模式也就变成了假的批量模式。

总结一下，对于MySQL，如果采用的是非批量模式，那么每次执行executeUpdate都会将数据包发送到服务器，而在批量模式下，要参考多个参数，在这里我们重点谈到了rewriteBatchedStatements参数。

批量模式下，rewriteBatchedStatements为false或者未设置，那么通过java.sql.PreparedStatement#addBatch方法会将执行的语句以及参数存储在本地的缓存列表当中，执行java.sql.Statement#executeBatch方法的时候，会一条一条的发送到服务器。
批量模式下，rewriteBatchedStatements为true，那么通过java.sql.PreparedStatement#addBatch方法会将执行的语句以及参数存储在本地的缓存列表当中，执行java.sql.Statement#executeBatch方法的时候，会将缓存中的数据一次性打包发送到服务器。

主要参考的源码如下

@Override
protected long[] executeBatchInternal() throws SQLException {
    synchronized (checkClosed().getConnectionMutex()) {

        if (this.connection.isReadOnly()) {
            throw new SQLException(Messages.getString("PreparedStatement.25") + Messages.getString("PreparedStatement.26"),
                    SQLError.SQL_STATE_ILLEGAL_ARGUMENT);
        }

        if (this.batchedArgs == null || this.batchedArgs.size() == 0) {
            return new long[0];
        }

        // we timeout the entire batch, not individual statements
        int batchTimeout = this.timeoutInMillis;
        this.timeoutInMillis = 0;

        resetCancelledState();

        try {
            statementBegins();

            clearWarnings();
			
			1. 参数batchHasPlainStatements=false与rewriteBatchedStatements=true
            if (!this.batchHasPlainStatements && this.connection.getRewriteBatchedStatements()) {

                2. com.mysql.jdbc.PreparedStatement.ParseInfo#canRewriteAsMultiValueInsert 
                if (canRewriteAsMultiValueInsertAtSqlLevel()) {
                    return executeBatchedInserts(batchTimeout);
                }
				
				3. 数据库版本 缓存数据条数
                if (this.connection.versionMeetsMinimum(4, 1, 0) && !this.batchHasPlainStatements && this.batchedArgs != null
                        && this.batchedArgs.size() > 3 /* cost of option setting rt-wise */) {
                    return executePreparedBatchAsMultiStatement(batchTimeout);
                }
            }

            return executeBatchSerially(batchTimeout);
        } finally {
            this.statementExecuting.set(false);

            clearBatch();
        }
    }
}

从这里还可以看出几个参数batchHasPlainStatements

/**
 * Does the batch (if any) contain "plain" statements added by
 * Statement.addBatch(String)?
 * 
 * If so, we can't re-write it to use multi-value or multi-queries.
 */
protected boolean batchHasPlainStatements = false;

这个参数默认为false，但是调用com.mysql.jdbc.PreparedStatement#addBatch(java.lang.String)方法会变成true，而我们前面使用的是java.sql.PreparedStatement#addBatch方法。

@Override
public void addBatch(String sql) throws SQLException {
    synchronized (checkClosed().getConnectionMutex()) {
        this.batchHasPlainStatements = true;

        super.addBatch(sql);
    }
}

当然了这个参数是可以重置为false的，通过方法com.mysql.jdbc.PreparedStatement#clearBatch，在上面的executeBatchInternal当中最后都会进行重置的。

@Override
public void clearBatch() throws SQLException {
    synchronized (checkClosed().getConnectionMutex()) {
        this.batchHasPlainStatements = false;

        super.clearBatch();
    }
}

即使以上两个条件为true，接下来还要看canRewriteAsMultiValueInsert参数，这个参数取决于

this.canRewriteAsMultiValueInsert = PreparedStatement.canRewrite(sql, this.isOnDuplicateKeyUpdate, this.locationOfOnDuplicateKeyUpdate,
                        this.statementStartPos) && !this.parametersInDuplicateKeyClause;

protected static boolean canRewrite(String sql, boolean isOnDuplicateKeyUpdate, int locationOfOnDuplicateKeyUpdate, int statementStartPos) {
    // Needs to be INSERT or REPLACE.
    // Can't have INSERT ... SELECT or INSERT ... ON DUPLICATE KEY UPDATE with an id=LAST_INSERT_ID(...).

    if (StringUtils.startsWithIgnoreCaseAndWs(sql, "INSERT", statementStartPos)) {
        if (StringUtils.indexOfIgnoreCase(statementStartPos, sql, "SELECT", "\"'`", "\"'`", StringUtils.SEARCH_MODE__MRK_COM_WS) != -1) {
            return false;
        }
        if (isOnDuplicateKeyUpdate) {
            int updateClausePos = StringUtils.indexOfIgnoreCase(locationOfOnDuplicateKeyUpdate, sql, " UPDATE ");
            if (updateClausePos != -1) {
                return StringUtils.indexOfIgnoreCase(updateClausePos, sql, "LAST_INSERT_ID", "\"'`", "\"'`", StringUtils.SEARCH_MODE__MRK_COM_WS) == -1;
            }
        }
        return true;
    }

    return StringUtils.startsWithIgnoreCaseAndWs(sql, "REPLACE", statementStartPos)
            && StringUtils.indexOfIgnoreCase(statementStartPos, sql, "SELECT", "\"'`", "\"'`", StringUtils.SEARCH_MODE__MRK_COM_WS) == -1;
}

如果是INSERT ... SELECT ...或者INSERT ... ON DUPLICATE KEY UPDATE并包含LAST_INSERT_ID的语句这个参数都会返回false。此时也不会进入到上面的executeBatchedInserts方法了，接下来会判断数据库的版本是不是高于4.1并且缓存的数据条数大于3，才会进入executePreparedBatchAsMultiStatement方法（Rewrites the already prepared statement into a multi-statement query of 'statementsPerBatch' values and executes the entire batch using this new statement），否则就会降级为一个一个的发送（与上面rewriteBatchedStatements不设置一样的情形）。

lang20150928

关注

4
点赞
踩
1

收藏

觉得还不错? 一键收藏
打赏
3
评论
jdbc的batch模式

当需要批量向数据库插入数据的时候，如果语句是一样的，那么采用批量操作的模式会让速度得到非常的提升，那么其中的原理是什么样的呢？本人针对的是有一定Java基础的程序猿，需要了解的知识如下：mavenJDBC基础首先准备一个mysql数据库，执行如下脚本SET FOREIGN_KEY_CHECKS=0;-- ------------------------------ Table structure for user-- ----------------------------DROP
复制链接

扫一扫