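In an earlier project, I pulled a little over 400,000 rows from the database and roughly the same number from a CSV file, then compared the two sets. The original code looked like this: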
// Collect a composite key for every row that came from the database.
List<String> keys = new ArrayList<String>();
int isize = msTaiyousr.size();
for (int i=0;i<isize;i++) {
    Map<String, Object> msyaiyousr = msTaiyousr.get(i);
    String id = (String) msyaiyousr.get("taiyousrid");
    String usrtorokukbn = (String) msyaiyousr.get("usrtorokukbn");
    keys.add(usrtorokukbn+":"+id);
}
// Rows from the CSV: a known key means update, an unknown key means insert.
int jsize = wkTaiyousr.size();
for (int j=0;j<jsize;j++) {
    Map<String, Object> wktaiyousr = wkTaiyousr.get(j);
    String id = (String) wktaiyousr.get("taiyousrid");
    String usrtorokukbn = (String) wktaiyousr.get("usrtorokukbn");
    // ArrayList.contains scans the list from the start on every call.
    if (keys.contains(usrtorokukbn+":"+id)) {
        updateList.add(wktaiyousr);
    } else {
        insertList.add(wktaiyousr);
    }
}
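Because the second for loop calls ArrayList's contains method, it took about 12 minutes to finish (good grief), while the first loop took less than a second. contains on an ArrayList walks the list element by element, so 400,000 lookups against 400,000 keys can add up to something on the order of 10^11 string comparisons. I then replaced the ArrayList with a HashSet: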
Set<String> keys = new HashSet<String>();
int isize = msTaiyousr.size();
for (int i=0;i<isize;i++) {
    Map<String, Object> msyaiyousr = msTaiyousr.get(i);
    String id = (String) msyaiyousr.get("taiyousrid");
    String usrtorokukbn = (String) msyaiyousr.get("usrtorokukbn");
    keys.add(usrtorokukbn+":"+id);
}
int jsize = wkTaiyousr.size();
for (int j=0;j<jsize;j++) {
    Map<String, Object> wktaiyousr = wkTaiyousr.get(j);
    String id = (String) wktaiyousr.get("taiyousrid");
    String usrtorokukbn = (String) wktaiyousr.get("usrtorokukbn");
    if (keys.contains(usrtorokukbn+":"+id)) {
        updateList.add(wktaiyousr);
    } else {
        insertList.add(wktaiyousr);
    }
}
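The result: under a second, with both for loops finishing almost instantly. So with large data sets, avoid ArrayList's contains method for membership checks and use a HashSet instead, whose contains is a hash lookup that takes roughly constant time on average.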
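
To try the comparison without the real data, here is a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the class name ContainsDemo, the helpers keyOf, makeRows and countMatches, and the generated rows are all hypothetical, and the row count is kept far below 400,000 so the list version finishes quickly. Timings will vary by machine, so none are quoted here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class ContainsDemo {

    // Composite key in the same "usrtorokukbn:taiyousrid" form as the post.
    static String keyOf(Map<String, Object> row) {
        return row.get("usrtorokukbn") + ":" + row.get("taiyousrid");
    }

    // Hypothetical test data: n rows with ids offset .. offset+n-1.
    static List<Map<String, Object>> makeRows(int n, int offset) {
        List<Map<String, Object>> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            Map<String, Object> row = new HashMap<>();
            row.put("taiyousrid", String.valueOf(offset + i));
            row.put("usrtorokukbn", "1");
            rows.add(row);
        }
        return rows;
    }

    // Counts the work rows whose key is already in `keys` (the "update" rows).
    // The loop is identical for both collections; only contains() differs.
    static int countMatches(Collection<String> keys, List<Map<String, Object>> work) {
        int matches = 0;
        for (Map<String, Object> row : work) {
            if (keys.contains(keyOf(row))) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int n = 20_000; // assumption: small enough for the list version to stay fast
        List<Map<String, Object>> master = makeRows(n, 0);
        List<Map<String, Object>> work = makeRows(n, n / 2); // half the keys overlap

        List<String> listKeys = new ArrayList<>();
        for (Map<String, Object> row : master) {
            listKeys.add(keyOf(row));
        }
        Collection<String> setKeys = new HashSet<>(listKeys);

        long t0 = System.nanoTime();
        int viaList = countMatches(listKeys, work); // linear scan per lookup
        long t1 = System.nanoTime();
        int viaSet = countMatches(setKeys, work);   // hash lookup per lookup
        long t2 = System.nanoTime();

        System.out.printf("ArrayList: %d matches in %d ms%n", viaList, (t1 - t0) / 1_000_000);
        System.out.printf("HashSet:   %d matches in %d ms%n", viaSet, (t2 - t1) / 1_000_000);
    }
}
```

Both runs should report the same number of matches; only the elapsed time differs. This is a rough wall-clock comparison rather than a careful benchmark (no JIT warm-up, single run), but the gap grows quickly as n increases because the list version does work proportional to n for every lookup.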