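The previous post covered how to parse a large XML dataset (about 50 GB) and import it into MongoDB. The next question is how to match fields against that much data quickly and efficiently.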
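Method 1:
First, read every value of the field to be matched from MongoDB database A; MongoDB database B holds the raw data to match against. Both sets of data are pulled out of MongoDB and compared in the application, so the matching runs on a third party (the client program) rather than in the database. That is acceptable for a small database, but with a large one it becomes much harder and very slow. The code: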
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

// The original snippet omits its enclosing class and method header;
// the names FieldMatch and matchInApplication are placeholders.
public class FieldMatch {
    @SuppressWarnings("deprecation")
    public static void matchInApplication(int port) {
        Logger monglogger = Logger.getLogger("org.mongodb.driver");
        monglogger.setLevel(Level.SEVERE);
        Mongo mongo = new Mongo("localhost", port);
        DB db = mongo.getDB("databaseName");

        // Read the field to match (NAME_1) from the source collection
        DBCollection dbCollection = db.getCollection("collectionName");
        DBCursor dbCursor = dbCollection.find();
        dbCursor.addOption(com.mongodb.Bytes.QUERYOPTION_NOTIMEOUT);
        List<String> list = new ArrayList<String>();
        while (dbCursor.hasNext()) {
            DBObject baDBObject = dbCursor.next();
            String cityName = baDBObject.get("NAME_1").toString();
            // Normalize (lowercase, no spaces) and keep every NAME_1 value in the list
            String trimcityName = cityName.toLowerCase().replaceAll(" ", "");
            list.add(trimcityName);
            System.out.println(trimcityName);
        }
        dbCursor.close(); // cursors opened with NOTIMEOUT should be closed explicitly

        DBCollection dbCollection1 = db.getCollection("testdata1");
        DBCollection dbCollection2 = db.getCollection("resultdata");
        DBCursor dbCursor1 = dbCollection1.find();
        dbCursor1.addOption(com.mongodb.Bytes.QUERYOPTION_NOTIMEOUT);
        System.out.println("Field matching started");
        while (dbCursor1.hasNext()) {
            DBObject baDbObject1 = dbCursor1.next();
            String titlecityName = baDbObject1.get("title").toString();
            String lowerTitlcityName = titlecityName.replaceAll(" ", "").toLowerCase();
            for (int i = 0; i < list.size(); i++) {
                if (lowerTitlcityName.equals(list.get(i))) {
                    // Insert each matching document exactly once. The earlier version
                    // inserted an accumulating list and left the loop after the first
                    // comparison, so most names were never checked.
                    dbCollection2.insert(baDbObject1);
                    break;
                }
            }
        }
        dbCursor1.close();
        System.out.println("Field matching finished");
        mongo.close();
    }
}
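The loop above compares every title against every name, so the work grows with the product of the two sizes. If the matching does have to stay in the application, one common alternative is to put the normalized names in a hash set and do a single lookup per title. The following is only a minimal sketch of that idea: the class `SetMatcher` and its methods are hypothetical, and plain strings stand in for the `NAME_1` and `title` values that would come from MongoDB.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not from the original post: plain strings stand in for
// the NAME_1 values and the title values read from MongoDB.
final class SetMatcher {

    // Same idea as the normalization above: lowercase and drop spaces.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replace(" ", "");
    }

    // Returns the titles whose normalized form equals some normalized name.
    static List<String> match(List<String> names, List<String> titles) {
        Set<String> wanted = new HashSet<>();
        for (String name : names) {
            wanted.add(normalize(name));
        }
        List<String> matched = new ArrayList<>();
        for (String title : titles) {
            if (wanted.contains(normalize(title))) { // one hash lookup per title
                matched.add(title);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("New York", "Paris");
        List<String> titles = Arrays.asList("newyork", "Berlin", "PARIS", "Par is");
        System.out.println(match(names, titles));
    }
}
```

This removes the inner loop, but every document of the large collection still has to travel to the client, which is the cost the next approach avoids.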
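The approach above runs slowly. An improvement: build an index on the field to be matched in MongoDB database B. Here I indexed the title field, because title is exactly the field I want to match.
Then, instead of pulling database B out, send each value of the corresponding field from MongoDB database A to database B as a query and let MongoDB do the matching. That alone is dramatically faster; the difference is not a matter of degree.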
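<pre name="code" class="java">import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

// The original snippet omits its enclosing class and method header;
// the names IndexedFieldMatch and matchInDatabase are placeholders.
public class IndexedFieldMatch {
    @SuppressWarnings("deprecation")
    public static void matchInDatabase(int port) {
        Logger monglogger = Logger.getLogger("org.mongodb.driver");
        monglogger.setLevel(Level.SEVERE);
        Mongo mongo = new Mongo("localhost", port);
        DB db = mongo.getDB("databaseName");

        // Read the NAME_1 values that will be looked up
        DBCollection dbCollection = db.getCollection("collectionName");
        DBCursor dbCursor = dbCollection.find();
        dbCursor.addOption(com.mongodb.Bytes.QUERYOPTION_NOTIMEOUT);
        List<String> list = new ArrayList<String>();
        while (dbCursor.hasNext()) {
            String cityName = dbCursor.next().get("NAME_1").toString();
            list.add(cityName);
            System.out.println(cityName);
        }
        dbCursor.close();

        DBCollection dbCollection2 = db.getCollection("testdata1");
        DBCollection dbCollection3 = db.getCollection("resultdata");
        // Build the index on title described in the text; this has no effect
        // if the same index already exists.
        dbCollection2.createIndex(new BasicDBObject("title", 1));

        System.out.println("Field matching started");
        for (int i = 0; i < list.size(); i++) {
            // Match the cityName in list.get(i) against title in testdata1
            BasicDBObject query = new BasicDBObject("title", list.get(i));
            DBCursor dbCursor2 = dbCollection2.find(query);
            // Avoids MongoCursorNotFoundException on long-running cursors
            dbCursor2.addOption(com.mongodb.Bytes.QUERYOPTION_NOTIMEOUT);
            List<DBObject> list2 = new ArrayList<DBObject>();
            while (dbCursor2.hasNext()) {
                list2.add(dbCursor2.next());
            }
            dbCursor2.close();
            // Insert the matches for this name once, after the cursor is drained.
            // Inserting the growing list inside the loop would re-insert earlier documents.
            if (!list2.isEmpty()) {
                dbCollection3.insert(list2);
            }
        }
        System.out.println("Field matching finished");
        mongo.close();
        System.out.println("MongoDB connection closed and field matching finished; add other cityName fields if needed");
    }
}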
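To illustrate why the indexed version is so much faster, here is a small in-memory model. It is not MongoDB code: a sorted map loosely plays the role of the index on title, and list positions play the role of documents. The class `TitleIndexSketch` and its methods are hypothetical. The sketch also indexes a normalized form of the title, because the query above matches title exactly, whereas Method 1 lowercased the values and removed spaces; storing a normalized copy of the title is an assumption, not something the original does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model, not MongoDB code: a sorted map plays the role
// of the index on title, and list positions play the role of documents.
final class TitleIndexSketch {

    // Lowercase and drop spaces, as in Method 1.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replace(" ", "");
    }

    // Build the "index" once: normalized title -> positions of matching documents.
    static Map<String, List<Integer>> buildIndex(List<String> titles) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int i = 0; i < titles.size(); i++) {
            index.computeIfAbsent(normalize(titles.get(i)), k -> new ArrayList<>()).add(i);
        }
        return index;
    }

    // One index lookup per name instead of one scan of all titles per name.
    static List<Integer> lookup(Map<String, List<Integer>> index, String name) {
        return index.getOrDefault(normalize(name), Collections.emptyList());
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("New York", "Berlin", "newyork", "Paris");
        Map<String, List<Integer>> index = buildIndex(titles);
        System.out.println(lookup(index, "NEW YORK"));
        System.out.println(lookup(index, "Rome"));
    }
}
```

The index is built once over the large side, and each name then costs a single lookup, which mirrors the idea of letting the database do the matching instead of the client.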