mysql流式读取大数据量与批量插入数据分析

最新推荐文章于 2024-07-02 23:45:54 发布

置顶 eff666

最新推荐文章于 2024-07-02 23:45:54 发布

阅读量6.5k

点赞数 1

分类专栏：数据库文章标签： mysql 大数据 java

本文链接：https://blog.csdn.net/eff666/article/details/52432289

版权

数据库专栏收录该内容

26 篇文章 4 订阅

订阅专栏

1、流式读取
java从mysql读取大量数据，当结果从myql服务端返回后立即对其进行处理，这样应用就不需要大量内存来存储这个结果集。此时应该用流式读取。

PreparedStatement ps = connection.prepareStatement("select .. from ..", 
            ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY); 

/*
TYPE_FORWARD_ONLY和CONCUR_READ_ONLY是mysql 驱动的默认值，所以不指定也是可以的 比如： PreparedStatement ps = connection.prepareStatement("select .. from .."); 
*/

//可以修改jdbc url通过defaultFetchSize参数来设置，这样默认所以的返回结果都是通过流方式读取
ps.setFetchSize(Integer.MIN_VALUE); 
ResultSet rs = ps.executeQuery(); 

while (rs.next()) { 
　　System.out.println(rs.getString("fieldName")); 
}

/*
mysql判断是否开启流式读取结果的方法，有三个条件forward-only，read-only和fatch size是Integer.MIN_VALUE。我们可以看看它的源码：
/**
 * We only stream result sets when they are forward-only, read-only, and the
 * fetch size has been set to Integer.MIN_VALUE
 *
 * @return true if this result set should be streamed row at-a-time, rather
 * than read all at once.
 */
protected boolean createStreamingResultSet() {
    try {
        synchronized(checkClosed().getConnectionMutex()) {
            return ((this.resultSetType == java.sql.ResultSet.TYPE_FORWARD_ONLY)
                 && (this.resultSetConcurrency == java.sql.ResultSet.CONCUR_READ_ONLY) 
                 && (this.fetchSize == Integer.MIN_VALUE));
        }
    } catch (SQLException e) {
        // we can't break the interface, having this be no-op in case of error is ok

        return false;
    }
}
*/

2、批量写入
如果应用程序是一条一条的执行insert来写入数据，写入是很慢的。主要原因是单条写入时候需要应用于db之间大量的请求响应交互。每个请求都是一个独立的事务提交。这样网络延迟大的情况下多次请求会有大量的时间消耗的网络延迟上。第二个是由于每个事务db都会有刷新磁盘操作写事务日志，保证事务的持久性。由于每个事务只是写入一条数据所以磁盘io利用率不高，因为对于磁盘io是按块来的，所以连续写入大量数据效率更好。所以必须改成批量写入的方式，减少请求数与事务数。下面是批量插入的例子：

int batchSize = 1000;
PreparedStatement ps = connection.prepareStatement("insert into tb1 (c1,c2,c3...) values (?,?,?...)");

for (int i = 0; i < list.size(); i++) {

    ps.setXXX(list.get(i).getC1());
    ps.setYYY(list.get(i).getC2());
    ps.setZZZ(list.get(i).getC3());

    ps.addBatch();

    if ((i + 1) % batchSize == 0) {
        ps.executeBatch();
    }
}

if (list.size() % batchSize != 0) {
    ps.executeBatch();
}
//注意：jdbc连接串须加：rewriteBatchedStatements=true

上面代码示例是每1000条数据发送一次请求。mysql驱动内部在应用端会把多次addBatch()的参数合并成一条multi value的insert语句发送给db去执行。比如insert into tb1(c1,c2,c3) values (v1,v2,v3),(v4,v5,v6),(v7,v8,v9)…，这样可以比每条一个insert 明显少很多请求。减少了网络延迟消耗时间与磁盘io时间，从而提高了tps。

3、代码展示

public class TestInsert {

    public static void main(String[] args) throws SQLException {

        int batchSize = 1000;
        int insertCount = 1000;

        testDefault(batchSize, insertCount);

        testRewriteBatchedStatements(batchSize,insertCount);

    }

    //默认方式插入
    private static void testDefault(int batchSize, int insertCount) throws SQLException{  

        long start = System.currentTimeMillis();

        doBatchedInsert(batchSize, insertCount,"");

        long end = System.currentTimeMillis();
        System.out.println("default:" + (end -start) + "ms");
    }


    //批量插入
    private static void testRewriteBatchedStatements(int batchSize, int insertCount) throws SQLException {

        long start = System.currentTimeMillis();

        doBatchedInsert(batchSize, insertCount, "rewriteBatchedStatements=true");

        long end = System.currentTimeMillis();
        System.out.println("rewriteBatchedStatements:" + (end -start) + "ms");
    }


    private static void doBatchedInsert(int batchSize, int insertCount, String mysqlProperties) throws SQLException {
        DruidDataSource dataSource = new DruidDataSource();
        dataSource.setUrl("jdbc:mysql://ip:3306/test?" + mysqlProperties);
        dataSource.setUsername("name");
        dataSource.setPassword("password");

        dataSource.init();

        Connection connection = dataSource.getConnection();

        PreparedStatement preparedStatement = connection.prepareStatement("insert into Test (name,gmt_created,gmt_modified) values (?,now(),now())");

        for (int i = 0; i < insertCount; i++) {
            preparedStatement.setString(1, i+" ");
            preparedStatement.addBatch();
            if((i+1) % batchSize == 0) {
                preparedStatement.executeBatch();
            }
        }
        preparedStatement.executeBatch();

        connection.close();   
        dataSource.close();
    }

}