前言
最近项目中有一个需求,需要从一千多万的原始数据中,每条数据都是用户的办事数据,处理出事项与事项之间的关系
思路一
程序一
把数据表增加索引,提高查询效率
把数据排好序,测试查询速度
使用临时表保存排序好的数据
int size = groupByList.size();
int time = size/100000;//提交次数
// 开启事务
String sql111 = "START TRANSACTION;";
runner.execute(sql111);
for (int i = 1; i <= time; i++) {
StringBuilder insertSql1 = new StringBuilder("insert into jzts_item_temp (SPSXBH,HZSXBH,SBSJ,SQRZJHM) values ");
for (int j = (i-1)*100000; j < i * 100000; j++) {
insertSql1.append("(");
insertSql1.append("'").append((String) groupByList.get(j).get("SPSXBH")).append("'");
insertSql1.append(",");
insertSql1.append("'").append((String) groupByList.get(j).get("HZSXBH")).append("'");
insertSql1.append(",");
insertSql1.append("'").append((Timestamp) groupByList.get(j).get("SBSJ")).append("'");
insertSql1.append(",");
insertSql1.append("'").append((String) groupByList.get(j).get("SQRZJHM")).append("'");
insertSql1.append(")");
if(j != i * 100000 -1){
insertSql1.append(",");
}
}
runner.update(insertSql1.toString());
long end1 = System.currentTimeMillis();
logger.info("已处理{}0w条数据,花费 {} ms", i, (end1 - start));
}
// 提交事务
String commitSql = "commit;";
runner.execute(commitSql);
经过测试,处理插入100w条数据大概413秒,7分钟
理想情况下,推测1000w条数据大概要1个多小时能跑完。
程序二
4.在临时表的“事项编号”建索引,遍历临时表,根据事项分类,一个事项一个表,大概2000+个表,分别把包含后置事项的记录插入2000+个表中(看到这2000+个表也许解决崩溃了)
跑100w条数据大概210秒
QueryRunner runner = QueryRunnerUtils.getRunner("basicdb");
QueryRunner runnerB = QueryRunnerUtils.getRunner("thing2thing");
String sql = "select DISTINCT SPSXBH from jzts_item_temp";
Set<String> set = new HashSet<>(1000);
runner.query(sql,rs -> {
while (rs.next()){
set.add(rs.getString("SPSXBH"));
}
return null;
});
logger.info("所有事项为:{}",set);
// set.forEach(s -> {
for (String s : set) {
if(!"1100236002".equals(s)){
continue;
}
s = s.replaceAll(" ","");
StringBuilder checkSql = new StringBuilder();
checkSql.append("DROP TABLE IF EXISTS ").append("item_").append(s.replaceAll("-","_")).append(";"