Mysql批量插入分析

前言
最近发现几个项目中都有批次插入数据库的功能,每个项目中批次插入的写法有一些差别,所以本文打算对Mysql的批次插入做一个详细的分析。

准备
1.jdk1.7,mysql5.6.38
2.准备库和表

create database db3;
 
CREATE TABLE `travelrecord` (
  `id` int(11) NOT NULL,
  `name` varchar(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

测试与分析
下面准备几种插入的方式来分析优劣:
1.Statement插入方式

public class JDBCBatch2 {
 
    public static void main(String[] args) {
        String url = "jdbc:mysql://192.168.237.128:3306/db3";
        String username = "root";
        String password = "root";
        Connection con = null;
        try {
            con = DriverManager.getConnection(url, username, password);
            Statement statemenet = (Statement) con.createStatement();
            con.setAutoCommit(false);
            List<Travelrecord> list = new ArrayList<Travelrecord>();
            for (int j = 0; j < 100; j++) {
                list.add(new Travelrecord(j + 1, "hsfhsdhfsdhfhsdhfhsdfhsdhfhsdhfhsdhfhsd"));
            }
            long startTime = System.currentTimeMillis();
            for (Travelrecord t : list) {
                statemenet.execute("insert into travelrecord (id,name) values (" + t.getId() + ",\"" + t.getName() + "\")");
            }
            long endTime = System.currentTimeMillis();
            con.commit();
            System.out.println("times=" + (endTime - startTime));
        } catch (Exception se) {
            if (con != null) {
                try {
                    System.out.println("rollback");
                    con.rollback();
                } catch (SQLException e) {
                    System.err.println("rollback error");
                }
            }
            se.printStackTrace();
        }
    }
}

准备数据,然后通过Statement方式插入数据,插入10000条数据大概在6秒多左右,同时可以监控服务器数据包;
监控命令:

tcpdump -i any port 3306

日志如下:

23:51:06.590979 IP 192.168.237.1.54031 > 192.168.237.128.mysql: Flags [P.], seq 13601:13635, ack 9361, win 255, length 34
23:51:06.591056 IP 192.168.237.128.mysql > 192.168.237.1.54031: Flags [P.], seq 9361:9438, ack 13635, win 266, length 77
23:51:06.591238 IP 192.168.237.1.54031 > 192.168.237.128.mysql: Flags [P.], seq 13635:13728, ack 9438, win 254, length 92
23:51:06.591342 IP 192.168.237.128.mysql > 192.168.237.1.54031: Flags [P.], seq 9438:9449, ack 13728, win 266, length 11

以上截取了其中一条插入语句的数据包日志,详细的数据包可以通过如下命令监控:

tcpdump -i any port 3306 -X

详细日志:

23:55:36.791580 IP 192.168.237.1.54315 > 192.168.237.128.mysql: Flags [P.], seq 1418:1452, ack 913, win 253, length 34
    0x0000:  4500 004a 7d67 4000 8006 2173 c0a8 ed01  E..J}g@...!s....
    0x0010:  c0a8 ed80 d42b 0cea f108 592e 9348 0672  .....+....Y..H.r
    0x0020:  5018 00fd b524 0000 1e00 0000 0373 656c  P....$.......sel
    0x0030:  6563 7420 4040 7365 7373 696f 6e2e 7478  ect.@@session.tx
    0x0040:  5f72 6561 645f 6f6e 6c79 0000 0000 0000  _read_only......
    0x0050:  0000 0000 0000 0000 0000                 ..........
23:55:36.791897 IP 192.168.237.128.mysql > 192.168.237.1.54315: Flags [P.], seq 913:990, ack 1452, win 266, length 77
......
23:55:36.795176 IP 192.168.237.1.54315 > 192.168.237.128.mysql: Flags [P.], seq 1452:1544, ack 990, win 252, length 92
    0x0000:  4500 0084 7d69 4000 8006 2137 c0a8 ed01  E...}i@...!7....
    0x0010:  c0a8 ed80 d42b 0cea f108 5950 9348 06bf  .....+....YP.H..
    0x0020:  5018 00fc 1daa 0000 5800 0000 0369 6e73  P.......X....ins
    0x0030:  6572 7420 696e 746f 2074 7261 7665 6c72  ert.into.travelr
    0x0040:  6563 6f72 6420 2869 642c 6e61 6d65 2920  ecord.(id,name).
    0x0050:  7661 6c75 6573 2028 312c 2268 7366 6873  values.(1,"hsfhs
    0x0060:  6468 6673 6468 6668 7364 6866 6873 6466  dhfsdhfhsdhfhsdf
    0x0070:  6873 6468 6668 7364 6866 6873 6468 6668  hsdhfhsdhfhsdhfh
    0x0080:  7364 2229 0000 0000 0000 0000 0000 0000  sd")............
    0x0090:  0000 0000                           
23:55:36.796093 IP 192.168.237.128.mysql > 192.168.237.1.54315: Flags [P.], seq 990:1001, ack 1544, win 266, length 11

可以发现每个sql语句包前面都有一个select.@@session.tx_read_only包,这是因为mysql jdbc驱动设置useLocalSessionState=false,每一次都需要检测目标数据库isReadOnly的状态,
所以每次都发送select.@@session.tx_read_only包,可以设置useLocalSessionState=true使用连接对象本地的状态,可以修改url如下:

jdbc:mysql://192.168.237.128:3306/db3?useLocalSessionState=true

再次运行,观察日志:

00:11:32.342852 IP 192.168.237.1.54949 > 192.168.237.128.mysql: Flags [P.], seq 1857:1949, ack 957, win 252, length 92
00:11:32.343076 IP 192.168.237.128.mysql > 192.168.237.1.54949: Flags [P.], seq 957:968, ack 1949, win 266, length 11

日志中省掉了select.@@session.tx_read_only的过程,提升插入的性能,具体代码可以参考ConnectionImpl的isReadOnly方法:

public boolean isReadOnly(boolean useSessionStatus) throws SQLException {
        if (useSessionStatus && !this.isClosed && versionMeetsMinimum(5, 6, 5) && !getUseLocalSessionState()) {
            java.sql.Statement stmt = null;
            java.sql.ResultSet rs = null;
 
            try {
                try {
                    stmt = getMetadataSafeStatement();
 
                    rs = stmt.executeQuery("select @@session.tx_read_only");
                    if (rs.next()) {
                        return rs.getInt(1) != 0; // mysql has a habit of tri+ state booleans
                    }
                    ......

2.PreparedStatement方式

public class JDBCBatch {
 
    public static void main(String[] args) {
        String url = "jdbc:mysql://192.168.237.128:3306/db3";
        String username = "root";
        String password = "root";
        String sql = "insert into travelrecord (id,name) values (?,?)";
        Connection con = null;
        try {
            con = DriverManager.getConnection(url, username, password);
            PreparedStatement pstmt = con.prepareStatement(sql);
            con.setAutoCommit(false);
            List<Travelrecord> list = new ArrayList<Travelrecord>();
            for (int j = 0; j < 100; j++) {
                list.add(new Travelrecord(j + 1, "hsfhsdhfsdhfhsdhfhsdfhsdhfhsdhfhsdhfhsd"));
            }
 
            long startTime = System.currentTimeMillis();
            for (Travelrecord t : list) {
                pstmt.setInt(1, t.getId());
                pstmt.setString(2, t.getName());
                pstmt.addBatch();
            }
            pstmt.executeBatch();
            long endTime = System.currentTimeMillis();
            con.commit();
            System.out.println("times=" + (endTime - startTime));
        } catch (Exception se) {
            if (con != null) {
                try {
                    System.out.println("rollback");
                    con.rollback();
                } catch (SQLException e) {
                    System.err.println("rollback error");
                }
            }
            se.printStackTrace();
        }
    }
}

PreparedStatement比起Statement有很多优势,其中一条就是PreparedStatement比Statement更快,SQL语句会预编译在数据库系统中,执行计划同样会被缓存起来,它允许数据库做参数化查询。同样插入10000条数据,时间大概在5秒多左右,比起Statement有一定优势,但是不明显;PreparedStatement使用的是批次提交,速度不应该这么查,同样观察日志:

00:23:57.679444 IP 192.168.237.1.62510 > 192.168.237.128.mysql: Flags [P.], seq 2460:2494, ack 1694, win 255, length 34
00:23:57.679617 IP 192.168.237.128.mysql > 192.168.237.1.62510: Flags [P.], seq 1694:1771, ack 2494, win 266, length 77
00:23:57.680139 IP 192.168.237.1.62510 > 192.168.237.128.mysql: Flags [P.], seq 2494:2586, ack 1771, win 255, length 92
00:23:57.680349 IP 192.168.237.128.mysql > 192.168.237.1.62510: Flags [P.], seq 1771:1782, ack 2586, win 266, length 11

发现和Statement没有区别,一条语句对应了一个包,没有批次的效果,查看PreparedStatement的executeBatch方法,部分代码如下:

try {
    statementBegins();
    clearWarnings();
    if (!this.batchHasPlainStatements && this.connection.getRewriteBatchedStatements()) {           
        if (canRewriteAsMultiValueInsertAtSqlLevel()) {
            return executeBatchedInserts(batchTimeout);
        }   
        if (this.connection.versionMeetsMinimum(4, 1, 0) 
                    && !this.batchHasPlainStatements
                    && this.batchedArgs != null
                    && this.batchedArgs.size() > 3 /* cost of option setting rt-wise */) {
            return executePreparedBatchAsMultiStatement(batchTimeout);
        }
    }
    return executeBatchSerially(batchTimeout);
} finally {
    this.statementExecuting.set(false);         
    clearBatch();
}

其中大致逻辑就是如果canRewriteAsMultiValueInsertAtSqlLevel()为true,那么执行批次插入(executeBatchedInserts),否则执行串联插入(executeBatchSerially);具体可以通过url上添加参数rewriteBatchedStatements

jdbc:mysql://192.168.237.128:3306/db3?rewriteBatchedStatements=true

再次运行,插入10000条数据只需要100ms左右,观察日志:

00:37:35.489633 IP 192.168.237.1.63528 > 192.168.237.128.mysql: Flags [P.], seq 212694:263794, ack 1069, win 252, length 51100
00:37:35.489670 IP 192.168.237.128.mysql > 192.168.237.1.63528: Flags [.], ack 263794, win 2730, length 0
00:37:35.489716 IP 192.168.237.1.63528 > 192.168.237.128.mysql: Flags [.], seq 263794:279854, ack 1069, win 252, length 16060
00:37:35.489849 IP 192.168.237.128.mysql > 192.168.237.1.63528: Flags [.], ack 279854, win 2777, length 0

可以发现数据包不是原来的92个字节了,每个包的大小大幅度提升,具体分多少次提交,每次提交多少数据量,可以查看PreparedStatement的computeBatchSize方法:

protected int computeBatchSize(int numBatchedArgs) throws SQLException {
    synchronized (checkClosed().getConnectionMutex()) {
        long[] combinedValues = computeMaxParameterSetSizeAndBatchSize(numBatchedArgs);
         
        long maxSizeOfParameterSet = combinedValues[0];
        long sizeOfEntireBatch = combinedValues[1];
         
        int maxAllowedPacket = this.connection.getMaxAllowedPacket();
         
        if (sizeOfEntireBatch < maxAllowedPacket - this.originalSql.length()) {
            return numBatchedArgs;
        }
         
        return (int)Math.max(1, (maxAllowedPacket - this.originalSql.length()) / maxSizeOfParameterSet);
    }
}

此方法计算每次提交批量数据中的多少条数据,其中一个maxAllowedPacket参数,此参数在服务器端配置用来限制客户端每个包的最大字节数;
查询maxAllowedPacket:

mysql> show VARIABLES like '%max_allowed_packet%';
+--------------------------+------------+
| Variable_name            | Value      |
+--------------------------+------------+
| max_allowed_packet       | 4194304    |
| slave_max_allowed_packet | 1073741824 |
+--------------------------+------------+

设置maxAllowedPacket:

mysql> set global max_allowed_packet =1024*10;
Query OK, 0 rows affected, 1 warning (0.00 sec)

此方式可以很好的执行批量数据的插入,但是如果数据量很大,一下执行所有数据的批次插入,很容易造成客户端内存的溢出,所以也可以使用第三种方式;

3.PreparedStatement分批次方式
部分代码如下:

for (int i = 0; i < 10; i++) {
    for (int k = 0; k < 1000; k++) {
        pstmt.setInt(1, list.get(i * 1000 + k).getId());
        pstmt.setString(2, list.get(i * 1000 + k).getName());
        pstmt.addBatch();
    }
    pstmt.executeBatch();
}

同样是插入10000条数据,但是这种方式是,分10次批次插入数据,有效的控制了内存的消耗,可以做一个简单的实验;

设置启动参数

-Xms5000k -Xmx5000k

然后分别使用第二种方式和第三种方式插入10w条数据,第二种方式直接内存溢出,而第三种方式可以完整的将数据插入;当然分批次插入肯定比一次性插入速度慢,所以可以在内存和速度方面做一个简单的权衡。

总结
本文通过三种方式来插入数据,从而了解Mysql批次插入的过程,了解到useLocalSessionState和rewriteBatchedStatements参数对性能的影响,以及maxAllowedPacket对数据包的大小限制;最后建议要在内存和速度方面做一个权衡。

 

转载于:https://my.oschina.net/OutOfMemory/blog/1635273

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值