Suppose we have a text file, delimited by some character, that needs to be imported into a database. Below is a fairly general way to do it.
public static void batchInsert(Connection connection, File file, String splitStr, int batchSize) throws Exception {
    long start = System.currentTimeMillis();
    String fileName = file.getName();
    BufferedReader reader = new BufferedReader(new FileReader(file));
    // "map" holds the mapping from file name to the table's column count
    Integer numOfQuote = map.get(fileName);
    String[] quotes = new String[numOfQuote];
    Arrays.fill(quotes, "?");
    String sql = "INSERT INTO `" + fileName.replace(".csv", "") + "` VALUES (" + Joiner.on(",").join(quotes) + ")";
    System.out.println("sql:" + sql);
    connection.setAutoCommit(false);
    PreparedStatement cmd = connection.prepareStatement(sql);
    int count = 0;
    try {
        String line;
        // Read to end of file; skip blank lines instead of stopping at the
        // first one (looping on StrUtil.isNotEmpty would cut the import short)
        while ((line = reader.readLine()) != null) {
            if (StrUtil.isEmpty(line)) {
                continue;
            }
            // Pass -1 as split's second argument so trailing empty strings are
            // kept; otherwise rows ending in empty fields come up short.
            // https://blog.csdn.net/wx1528159409/article/details/92796234
            String[] split = line.split(splitStr, -1);
            // Discard rows whose field count does not match the table
            if (split.length != numOfQuote) {
                continue;
            }
            count++;
            for (int j = 0; j < numOfQuote; j++) {
                cmd.setString(j + 1, split[j]);
            }
            cmd.addBatch();
            if (count % batchSize == 0) {
                cmd.executeBatch();
                cmd.clearBatch();
            }
        }
        cmd.executeBatch();
        cmd.clearBatch();
        connection.commit();
    } finally {
        IoUtil.close(cmd);
        IoUtil.close(reader);
    }
    long end = System.currentTimeMillis();
    System.out.println("Batch insert took " + (end - start) + " ms");
}
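The SQL-building step is worth seeing in isolation: the method derives the table name from the file name and emits one `?` placeholder per column. Here is a minimal, self-contained sketch of that logic (the helper name and the example table/column count are illustrative, not part of the original code):

```java
import java.util.Arrays;

public class SqlBuilder {
    // Build "INSERT INTO `table` VALUES (?,?,...,?)" with one
    // placeholder per column, mirroring the method above
    static String buildInsertSql(String tableName, int columnCount) {
        String[] placeholders = new String[columnCount];
        Arrays.fill(placeholders, "?");
        return "INSERT INTO `" + tableName + "` VALUES ("
                + String.join(",", placeholders) + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildInsertSql("users", 3));
        // prints: INSERT INTO `users` VALUES (?,?,?)
    }
}
```

Using a `PreparedStatement` built this way (rather than concatenating values into the SQL) both avoids SQL injection and lets the driver reuse one parsed statement for every batched row.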
One more parameter must be added to the JDBC connection URL: rewriteBatchedStatements=true, and the driver must be version 5.1.13 or later, for the batch insert to actually be fast. By default the MySQL JDBC driver ignores the intent of executeBatch(): it breaks the batch apart and sends the statements to MySQL one by one, so the "batch" insert is really a series of single inserts, which performs poorly.
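A sketch of what the connection URL looks like with the flag enabled; the host, port, database name, and credentials below are placeholders for your own environment, and only the rewriteBatchedStatements parameter is the point:

```java
public class BatchUrlDemo {
    // Assemble a MySQL JDBC URL with batch rewriting enabled
    // (host/db are hypothetical example values)
    static String buildUrl(String host, String db) {
        return "jdbc:mysql://" + host + ":3306/" + db
                + "?rewriteBatchedStatements=true"
                + "&useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        String url = buildUrl("localhost", "test");
        System.out.println(url);
        // With a real database available, you would then do something like:
        // Connection connection = DriverManager.getConnection(url, "user", "password");
        // batchInsert(connection, new File("users.csv"), ",", 1000);
    }
}
```

With the flag set, the driver rewrites the batched inserts into multi-row INSERT statements before sending them, which is where the speedup comes from.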