I. Preface
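A previous project of mine involved a data migration from one MongoDB instance to another, where the source and target data structures differ. It required both a full and an incremental migration, with a total volume of about 500 million. This post covers both the theory and the practice, so read on!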
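II. Migration Approach
Full and incremental migrations usually follow the same basic approach:
- When the full sync starts, start the incremental listener as well and record the primary key id of every changed record.
- When the full sync finishes, run through the recorded ids again: look up the latest version of each record with a get-by-id query and upsert it into the new database.
That is the whole idea. Because only ids are recorded and the latest document is re-read at replay time, replaying the same id more than once is harmless.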
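The two phases can be illustrated with a small in-memory sketch. It is not the project's code: the maps stand in for the old and new databases, and the class and method names (`MigrationFlowSketch`, `writeToSource`, `replayIncrement`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** In-memory stand-in for the two-phase migration; all names here are hypothetical. */
final class MigrationFlowSketch {
    final Map<Integer, String> source = new TreeMap<>();   // old store: id -> value
    final Map<Integer, String> target = new TreeMap<>();   // new store: id -> value
    final Set<Integer> changedIds = new LinkedHashSet<>(); // ids recorded by the change listener

    /** A write against the old store; the listener records only the primary key. */
    void writeToSource(int id, String value) {
        source.put(id, value);
        changedIds.add(id);
    }

    /** Phase 1: copy a snapshot that was taken when the full sync started. */
    void fullSync(Map<Integer, String> snapshot) {
        target.putAll(snapshot);
    }

    /** Phase 2: re-read every recorded id from the source and upsert the latest value. */
    void replayIncrement() {
        List<Integer> ids = new ArrayList<>(changedIds);
        for (Integer id : ids) {
            String latest = source.get(id); // the "get by id" step
            if (latest != null) {
                target.put(id, latest);     // upsert: insert or overwrite
            }
        }
        changedIds.removeAll(ids);
    }

    public static void main(String[] args) {
        MigrationFlowSketch m = new MigrationFlowSketch();
        m.source.put(1, "a");
        m.source.put(2, "b");
        Map<Integer, String> snapshot = new TreeMap<>(m.source); // full sync starts here
        m.writeToSource(2, "b2"); // update that arrives while the full sync is running
        m.writeToSource(3, "c");  // insert that arrives while the full sync is running
        m.fullSync(snapshot);
        m.replayIncrement();
        System.out.println(m.target); // prints {1=a, 2=b2, 3=c}
    }
}
```

The replay never applies the recorded change itself, only the current state of the record, which is why duplicated or out-of-order ids do no harm.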
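III. Sync in Practice
Full sync
The full sync is fairly simple. We pick a sortable key and keep pulling batches from the old database in that order, transform each record into the new structure, and write it to the new collection. My primary key is auto-incrementing, so I loop over the primary key id. Before the loop starts I also record the current maximum primary key; once the loop reaches it, the full sync is done.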
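As a minimal sketch of that loop, assuming integer ids that may contain gaps: a `TreeMap` stands in for the old collection, and `fetchPage` mimics a query of the form "`_id >= beginId`, ascending, limit `step`". The names are hypothetical and not taken from the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Paging by ascending primary key; names are hypothetical. */
final class KeysetPagingSketch {
    /** Returns up to {@code limit} ids that are >= fromId, in ascending order. */
    static List<Integer> fetchPage(NavigableMap<Integer, String> store, int fromId, int limit) {
        List<Integer> page = new ArrayList<>();
        for (Integer id : store.tailMap(fromId, true).keySet()) {
            if (page.size() == limit) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    /** Walks the store page by page and returns the ids in the order they were visited. */
    static List<Integer> scanAll(NavigableMap<Integer, String> store, int step) {
        List<Integer> visited = new ArrayList<>();
        if (store.isEmpty()) {
            return visited;
        }
        int maxId = store.lastKey();    // captured once, before the loop starts
        int beginId = store.firstKey();
        while (beginId <= maxId) {
            List<Integer> page = fetchPage(store, beginId, step);
            if (page.isEmpty()) {
                break;                  // nothing left at or after beginId
            }
            for (Integer id : page) {
                if (id <= maxId) {
                    visited.add(id);    // ids created after the start belong to the incremental phase
                }
            }
            beginId = page.get(page.size() - 1) + 1; // continue after the last id seen
        }
        return visited;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> store = new TreeMap<>();
        for (int id : new int[] {1, 2, 5, 9, 10}) {
            store.put(id, "doc-" + id);
        }
        System.out.println(scanAll(store, 2)); // prints [1, 2, 5, 9, 10]
    }
}
```

Continuing from "last id + 1" rather than using skip/offset keeps every page query an index range scan, which matters at this data volume.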
I wrote an HTTP endpoint that is called manually to trigger the full sync.
/**
* Full sync
*
* @return whether the sync succeeded
*/
@RequestMapping("/fullData")
public Boolean fullData() {
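// Capture the current maximum primary key; the loop stops once it is reached
Query query = new Query();
query.with(new Sort(Sort.Direction.DESC, "_id"));
query.limit(1);
UserCompleteCountDTO max = firstMongoTemplate.findOne(query, UserCompleteCountDTO.class);
if (max == null) {
// Empty source collection: nothing to migrate
return true;
}
Integer maxUserId = max.getUserId();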
Integer step = 100;
Integer beginId = 1;
Integer totalCount = 0;
while (true) {
logger.info("beginId:{}", beginId);
Criteria criteria = Criteria.where("_id").gte(beginId);
Query queryStep = new Query(criteria);
queryStep.limit(step).with(new Sort(new Sort.Order(Sort.Direction.ASC, "_id")));
List<UserCompleteCountDTO> historyDTOS = new ArrayList<>();
try {
historyDTOS = firstMongoTemplate.find(queryStep, UserCompleteCountDTO.class);
} catch (Exception e) {
// Reading the whole page failed (for example, one document could not be mapped); re-read ids only and fetch the documents one by one
List<UserCompleteCountIdDTO> historyIdDTOS = firstMongoTemplate.find(queryStep, UserCompleteCountIdDTO.class);
if (!CollectionUtils.isEmpty(historyIdDTOS)) {
for (UserCompleteCountIdDTO idDTO : historyIdDTOS) {
int userId = idDTO.getUserId();
try {
Criteria criteriaE = Criteria.where("_id").is(userId);
Query queryE = new Query(criteriaE);
UserCompleteCountDTO one = firstMongoTemplate.findOne(queryE, UserCompleteCountDTO.class);
if (null != one) {
historyDTOS.add(one);
}
} catch (Exception e1) {
logger.error("Full-sync query failed, id:{}", userId, e1);
errorIdMapper.insert(userId);
}
}
}
}
totalCount = fullSync(historyDTOS, totalCount);
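// Check whether the full sync is finished
if ((CollectionUtils.isEmpty(historyDTOS) || historyDTOS.size() < step) && (beginId + step) >= maxUserId) {
logger.info("Full sync finished!");
break;
}
if (CollectionUtils.isEmpty(historyDTOS)) {
// Nothing could be read at or after beginId; stop instead of indexing into an empty list below
logger.warn("No readable documents from beginId:{}, stopping full sync early", beginId);
break;
}
UserCompleteCountDTO last = historyDTOS.get(historyDTOS.size() - 1);
beginId = last.getUserId() + 1;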
try {
Thread.sleep(5);
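} catch (InterruptedException e) {
// Restore the interrupt flag and stop the loop instead of swallowing the interruption
Thread.currentThread().interrupt();
break;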
}
}
return true;
}
private Integer fullSync(List<UserCompleteCountDTO> list, Integer totalCount) {
if (CollectionUtils.isEmpty(list)) {
return totalCount;
}
// Convert the old records into documents for the new database
List<DataDTO> insertDataList = new ArrayList<>();
for (UserCompleteCountDTO old : list) {
List<DataDTO> dataDTOS = coverDataDTOList(old);
// Collect the converted documents
insertDataList.addAll(dataDTOS);
}
ExecutorService executor = Executors.newFixedThreadPool(20);
try {
if (!CollectionUtils.isEmpty(insertDataList)) {
List<List<DataDTO>> partition = Lists.partition(insertDataList, 100);
CountDownLatch countDownLatch = new CountDownLatch(partition.size());
for (List<DataDTO> partList : partition) {
ImportTask task = new ImportTask(partList, countDownLatch);
executor.execute(task);
}
countDownLatch.await();
totalCount = totalCount + list.size();
}
logger.info("totalCount:{}", totalCount);
} catch (Exception e) {
logger.error("Bulk insert failed", e);
} finally {
// Shut down the thread pool to release resources
executor.shutdown();
}
return totalCount;
}
class ImportTask implements Runnable {
private final List<DataDTO> list;
private final CountDownLatch countDownLatch;
public ImportTask(List<DataDTO> data, CountDownLatch countDownLatch) {
this.list = data;
this.countDownLatch = countDownLatch;
}
@Override
public void run() {
try {
if (!CollectionUtils.isEmpty(list)) {
// Business logic, e.g. bulk insert or update
BulkOperations operations = secondMongoTemplate.bulkOps(BulkOperations.BulkMode.UNORDERED, "xxxxx");
operations.insert(list);
operations.execute();
}
} catch (Exception e) {
logger.error("Bulk insert of {} documents failed", list == null ? 0 : list.size(), e);
} finally {
// Always signal completion, otherwise countDownLatch.await() blocks forever after a failure
countDownLatch.countDown();
}
}
}
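The fan-out and wait pattern used by `fullSync` and `ImportTask` can be sketched on its own. This is a simplified stand-in, assuming plain `Runnable` tasks instead of bulk writes; `LatchBatchSketch` and `runAll` are hypothetical names. The points it illustrates are that the latch is counted down in a `finally` block and that the wait has a timeout, so one failing task cannot stall the whole migration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Fan out tasks to a pool and wait for all of them; names are hypothetical. */
final class LatchBatchSketch {
    /** Runs every task on a small pool and returns how many of them threw. */
    static int runAll(List<Runnable> tasks, long timeoutSeconds) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch latch = new CountDownLatch(tasks.size());
        AtomicInteger failures = new AtomicInteger();
        try {
            for (Runnable task : tasks) {
                pool.execute(() -> {
                    try {
                        task.run();
                    } catch (RuntimeException e) {
                        failures.incrementAndGet(); // record the failure, keep the batch going
                    } finally {
                        latch.countDown();          // always signal, even when the task failed
                    }
                });
            }
            if (!latch.await(timeoutSeconds, TimeUnit.SECONDS)) {
                throw new IllegalStateException("batch did not finish within " + timeoutSeconds + "s");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();     // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while waiting for the batch", e);
        } finally {
            pool.shutdown();
        }
        return failures.get();
    }

    public static void main(String[] args) {
        List<Runnable> tasks = Arrays.<Runnable>asList(
                () -> System.out.println("chunk 1 written"),
                () -> { throw new IllegalStateException("chunk 2 failed"); },
                () -> System.out.println("chunk 3 written"));
        System.out.println("failed tasks: " + runAll(tasks, 5));
    }
}
```

In a long-running migration it is usually cheaper to create the pool once and reuse it across batches than to build a new one per batch; the sketch creates it per call only to stay self-contained.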
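Incremental sync
For the incremental sync we need to listen to MongoDB's log: the oplog.
What is the oplog?
The oplog (operations log) is a capped collection that records MongoDB's inserts, updates, deletes, and some system commands; queries are not recorded. It is similar to MySQL's binlog.
MongoDB's replication is built on the oplog. The primary node receives write operations and records them in the oplog, and the secondary nodes copy and apply those operations asynchronously.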
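To make the listener logic easier to follow, here is a hedged sketch of how the changed `_id` can be read from an oplog entry. Plain maps stand in for the driver's `Document` so the example runs without MongoDB; the field names `op`, `ns`, `o`, and `o2` are the ones the oplog uses, while the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Reads the changed _id out of an oplog-shaped map; class and method names are hypothetical. */
final class OplogEntrySketch {
    /**
     * Returns the _id touched by the entry, or null when the entry belongs to
     * another namespace or is not a document write.
     */
    static Object changedId(Map<String, Object> entry, String watchedNs) {
        if (!watchedNs.equals(entry.get("ns"))) {
            return null;                   // "ns" is "<database>.<collection>"
        }
        String op = (String) entry.get("op");
        if (op == null) {
            return null;
        }
        Object holder;
        switch (op) {
            case "i":                      // insert: "o" is the full new document
            case "d":                      // delete: "o" holds the _id of the removed document
                holder = entry.get("o");
                break;
            case "u":                      // update: "o2" holds the _id, "o" describes the change
                holder = entry.get("o2");
                break;
            default:                       // "c" (command), "n" (no-op), ...
                return null;
        }
        return holder instanceof Map ? ((Map<?, ?>) holder).get("_id") : null;
    }

    public static void main(String[] args) {
        Map<String, Object> update = new HashMap<>();
        update.put("op", "u");
        update.put("ns", "mydb.mycoll"); // hypothetical namespace
        update.put("o", Collections.singletonMap("$set", Collections.singletonMap("count", 3)));
        update.put("o2", Collections.singletonMap("_id", 42));
        System.out.println(changedId(update, "mydb.mycoll")); // prints 42
    }
}
```

Recording only this `_id` is enough for the approach in this post, because the replay step re-reads the current document from the old database.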
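Where is the oplog stored?
The oplog lives in the local database:
- Master/slave deployments (legacy, removed in MongoDB 4.0): local.oplog.$main
- Replica sets: local.oplog.rs
- Sharded clusters: the oplog cannot be viewed through mongos; look at it on each shard instead.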
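MongoDB listener code:
Here I record the primary key ids captured by the listener in MySQL. Once the full sync has finished, I start the incremental sync, which reads the ids from MySQL and syncs the corresponding records to the new collection.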
package com.soybean.data.service;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.mongodb.BasicDBObject;
import com.mongodb.CursorType;
import com.mongodb.MongoClient;
import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;
import com.soybean.data.mapper.IdMapper;
import com.soybean.data.util.MongoDBUtil;
import org.bson.BsonTimestamp;
import org.bson.Document;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;
import org.springframework.util.CollectionUtils;
import org.springframework.util.StringUtils;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
@Component
public class MongoDBOpLogService implements CommandLineRunner {
private static final Logger logger = LoggerFactory.getLogger(MongoDBOpLogService.class);
private static MongoClient mongoClient;
@Autowired
private IdMapper idMapper;
/**
* On service startup, start recording incremental changes to MySQL
* @param strings
* @throws Exception
*/
@Override
public void run(String... strings) throws Exception {
initMongoClient();
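// Get the local database
MongoDatabase database = getDatabase("local");
// Tail oplog.rs (replica set); under master/slave the collection would be oplog.$main
MongoCollection<Document> runoob = getCollection(database, "oplog.rs");
try {
// Process the oplog stream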
dataProcessing(runoob);
} catch (Exception e) {
logger.error("error:", e);
}
}
private static void initMongoClient() {
try {
mongoClient = MongoDBUtil.initMongoHasUser();
} catch (IOException e) {
e.printStackTrace();
}
}
public static MongoDatabase getDatabase(String dataBase) {
if (!mongoClient.getDatabaseNames().contains(dataBase)) {
throw new RuntimeException(dataBase + " no exist !");
}
MongoDatabase mongoDatabase = mongoClient.getDatabase(dataBase);
return mongoDatabase;
}
/**
* Get the collection object
*
* @param mongoDatabase
* @param testCollection
* @return
*/
public static MongoCollection<Document> getCollection(MongoDatabase mongoDatabase, String testCollection) {
MongoCollection<Document> collection = null;
try {
// Get the collection from the database; MongoDB creates it lazily on first write if it does not exist
collection = mongoDatabase.getCollection(testCollection);
} catch (Exception e) {
throw new RuntimeException("Failed to get collection " + testCollection + " from database " + mongoDatabase.getName(), e);
}
return collection;
}
/**
* Tail the oplog stream and normalize each entry into a changed document id
*
* @param collection
* @throws InterruptedException
*/
public void dataProcessing(MongoCollection<Document> collection) throws InterruptedException {
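// Sort by $natural descending (-1) to read the newest oplog entry and start tailing from its ts.
// Ideally the last processed ts is persisted to a file or database when the service stops, so a restart can resume from it.
FindIterable<Document> tsCursor = collection.find().sort(new BasicDBObject("$natural", -1)).limit(1);
Document tsDoc = tsCursor.first();
if (tsDoc == null) {
throw new IllegalStateException("The oplog is empty; nothing to tail");
}
BsonTimestamp queryTs = (BsonTimestamp) tsDoc.get("ts");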
try {
Integer index = 1;
List<Integer> batchIds = new ArrayList<>();
while (true) {
BasicDBObject query = new BasicDBObject("ts", new BasicDBObject("$gt", queryTs));
MongoCursor docCursor = collection.find(query)
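.cursorType(CursorType.TailableAwait) // block and wait when there is no new data
.noCursorTimeout(true) // keep the server from timing out the idle cursor (default: 10 minutes of inactivity)
.oplogReplay(true) // used together with the ts query to read incremental data; this flag is hard to grasp, see: https://docs.mongodb.com/manual/reference/command/find/index.html
.maxAwaitTime(1, TimeUnit.SECONDS) // maximum time the server waits for new documents on this tailable cursor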
.iterator();
while (docCursor.hasNext()) {
Document document = (Document) docCursor.next();
// Advance the query timestamp
queryTs = (BsonTimestamp) document.get("ts");
String op = document.getString("op");
String database = document.getString("ns");
if (!"resourcebehaviorsystem.playCompleted".equalsIgnoreCase(database)) {
continue;
}
Document context = (Document) document.get("o");
Document where = null;
Integer id = null;
if (op.equals("u")) {
where = (Document) document.get("o2");
id = Integer.valueOf(String.valueOf(where.get("_id")));
if (context != null) {
context = (Document) context.get("$set");
}
}
if (op.equals("i")) {
if (context != null) {
id = Integer.valueOf(String.valueOf(context.get("_id")));
context = (Document) context.get("$set");
}
}
logger.info("Operation timestamp: " + queryTs.getTime());
logger.info("Operation type: " + op);
logger.info("Namespace (database.collection): " + database);
logger.info("Update condition: " + JSON.toJSONString(where));
//logger.info("Document content: " + JSON.toJSONString(context));
logger.info("Document _id: " + JSON.toJSONString(id));
// Buffer the changed id; entries without an id are skipped
if (id == null) {
continue;
}
if (!batchIds.contains(id)) {
batchIds.add(id);
}
index = index + 1;
if (index <= 10) {
continue;
}
// Flush after every 10 recorded events, including the current one
syncData(batchIds);
index = 1;
batchIds = new ArrayList<>();
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
public void syncData(List<Integer> ids) {
idMapper.batchInsert(ids);
}
/**
* Parse the operation type
*
* @param op
* @return
*/
private static String getEventType(String op) {
switch (op) {
case "i":
return "insert";
case "u":
return "update";
case "d":
return "delete";
default:
return "other";
}
}
/**
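* Parses and wraps the data: returns the new data for inserts and updates and the old data for deletes; deletes are output as logical deletes with an empty condition field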
*
* @return JSONObject
*/
private static JSONObject resultRow(Document document, JSONObject result, String eventType) {
JSONObject columns = new JSONObject(); // fields after the change
result.put("columns", columns);
result.put("condition", new JSONObject()); // condition
for (Map.Entry<String, Object> entry : document.entrySet()) {
if (entry.getKey().equalsIgnoreCase("_id")) {
columns.put(entry.getKey(), (entry.getValue()).toString());
continue;
}
columns.put(entry.getKey(), entry.getValue());
}
return result;
}
}
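The buffering inside `dataProcessing` (collect ids, drop duplicates, hand a batch to `syncData`) can also be written as a small helper. This is a sketch under assumptions, not the original code: `IdBatchBuffer` is a hypothetical class, and the `Consumer` stands in for `idMapper.batchInsert`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Collects changed ids, removes duplicates, and hands them to a sink in batches. */
final class IdBatchBuffer {
    private final int batchSize;
    private final Consumer<List<Integer>> sink;                  // e.g. a batch insert into MySQL
    private final Set<Integer> pending = new LinkedHashSet<>();  // keeps arrival order, no duplicates

    IdBatchBuffer(int batchSize, Consumer<List<Integer>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one id; null ids are ignored. Flushes once batchSize distinct ids are pending. */
    void add(Integer id) {
        if (id == null) {
            return;
        }
        pending.add(id);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is pending to the sink; safe to call when nothing is pending. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        IdBatchBuffer buffer = new IdBatchBuffer(3, batch -> System.out.println("persist " + batch));
        for (Integer id : new Integer[] {1, 1, 2, null, 3, 4}) {
            buffer.add(id);
        }
        buffer.flush(); // write the remainder, for example on shutdown or on a timer
    }
}
```

An explicit `flush()` matters on a quiet collection: without it, the last few ids can sit in memory until enough new changes arrive.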
The MongoClient helper class:
package com.soybean.data.util;
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.MongoCredential;
import com.mongodb.ServerAddress;
import com.mongodb.WriteConcern;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
public class MongoDBUtil {
private static MongoClient mongoClient;
private static Properties properties;
private static WriteConcern concern;
static {
try {
InputStream inputStream = MongoDBUtil.class.getClassLoader().getResourceAsStream("mongo-config.properties");
properties = new Properties();
properties.load(inputStream);
concern = new WriteConcern(Integer.parseInt(properties.getProperty("write")),
Integer.parseInt(properties.getProperty("writeTimeout")));
concern = concern.withJournal(Boolean.valueOf(properties.getProperty("journal"))); // read the journal setting; withJournal returns a new WriteConcern, so keep the result
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* Initialize and return the client
*/
public static MongoClient initMongoHasUser() throws IOException {
List<ServerAddress> adds = new ArrayList<>();
String[] address = properties.getProperty("hostConfString").split(":"); // read the server IP address and port, expected as host:port
ServerAddress serverAddress = new ServerAddress(address[0], Integer.valueOf(address[1]));
adds.add(serverAddress);
List<MongoCredential> credentials = new ArrayList<>();
MongoCredential mongoCredential = MongoCredential.createScramSha1Credential(
properties.getProperty("userName"),
properties.getProperty("useCollection"), // the authentication database, despite the property name
properties.getProperty("passWord").toCharArray());
credentials.add(mongoCredential);
MongoClientOptions options = MongoClientOptions.builder()
.connectionsPerHost(Integer.parseInt(properties.getProperty("connectionsPerHost")))
.connectTimeout(Integer.parseInt(properties.getProperty("connectTimeout")))
.cursorFinalizerEnabled(Boolean.valueOf(properties.getProperty("cursorFinalizerEnabled")))
.maxWaitTime(Integer.parseInt(properties.getProperty("maxWaitTime")))
.threadsAllowedToBlockForConnectionMultiplier(Integer.parseInt(properties
.getProperty("threadsAllowedToBlockForConnectionMultiplier")))
.socketTimeout(Integer.valueOf(properties.getProperty("socketTimeout")))
.socketKeepAlive(Boolean.valueOf(properties.getProperty("socketKeepAlive")))
.writeConcern(concern)
.build();
if (adds.size() > 1){
mongoClient = new MongoClient(adds, credentials, options);
}else {
mongoClient = new MongoClient(adds.get(0), credentials, options);
}
return mongoClient;
}
}
Configuration file (mongo-config.properties):
connectionsPerHost=10
connectTimeout=10000
cursorFinalizerEnabled=true
maxWaitTime=120000
threadsAllowedToBlockForConnectionMultiplier=5
readSecondary=false
socketTimeout=0
socketKeepAlive=false
write=0
writeTimeout=0
journal=false
hostConfString=2222222.2222.2222.222.222
userName=sssssssss
useCollection=admin
passWord=xxxxxxxxxxxx
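`initMongoHasUser` splits `hostConfString` on `:` and therefore expects exactly one `host:port` value. As a hedged sketch of a more forgiving parser, the helper below accepts a comma-separated list and falls back to a default port when none is given. The class, the method, and the comma-separated format are assumptions made for illustration; they are not part of the original configuration, and IPv6 literals are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Parses "host:port,host:port" style settings; the format itself is an assumption. */
final class HostConfigSketch {
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    /** Splits on commas; entries without an explicit port get defaultPort. */
    static List<HostPort> parse(String conf, int defaultPort) {
        List<HostPort> result = new ArrayList<>();
        for (String raw : conf.split(",")) {
            String part = raw.trim();
            if (part.isEmpty()) {
                continue;                            // tolerate trailing commas
            }
            int colon = part.lastIndexOf(':');
            if (colon < 0) {
                result.add(new HostPort(part, defaultPort));
            } else {
                String host = part.substring(0, colon);
                int port = Integer.parseInt(part.substring(colon + 1)); // throws on a malformed port
                result.add(new HostPort(host, port));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 27017 is MongoDB's default port; the host names are made up.
        System.out.println(parse("10.0.0.1:27018, db2.example.internal", 27017));
    }
}
```

Each parsed pair could then be turned into a `ServerAddress`, which would also give the `adds.size() > 1` branch in `initMongoHasUser` something to do.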
Running the incremental sync:
/**
* Incremental sync with deduplication
*
* @return whether the sync succeeded
*/
@GetMapping("/increData")
public Boolean increData(@RequestParam(name = "id", required = true) Integer id,
@RequestParam(name = "endId", required = true) Integer endId) {
Integer step = 100;
Integer beginId = id;
while (true) {
logger.info("incre beginId:{}", beginId);
List<IdDTO> dtos = idMapper.getBatchIdsByIdAndSort(beginId, step);
if (!CollectionUtils.isEmpty(dtos)) {
IdDTO last = dtos.get(dtos.size() - 1);
beginId = last.getId() + 1;
List<Integer> tidList = dtos.stream().map(IdDTO::getTid).distinct().collect(Collectors.toList());
increSync(tidList);
if (last.getId() >= endId) {
logger.info("last.getId:{},endId:{}", last.getId(), endId);
logger.info("Incremental sync finished! beginId:{},endId:{}", id, endId);
break;
}
} else {
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
return true;
}
private void increSync(List<Integer> tidList) {
if (CollectionUtils.isEmpty(tidList)) {
return;
}
List<UserCompleteCountDTO> historyDTOS = new ArrayList<>();
// Query the latest versions from the old database
try {
Criteria criteria = Criteria.where("_id").in(tidList);
Query queryStep = new Query(criteria);
historyDTOS = firstMongoTemplate.find(queryStep, UserCompleteCountDTO.class);
} catch (Exception e) {
for (Integer eId : tidList) {
try {
Criteria criteriaE = Criteria.where("_id").is(eId);
Query queryE = new Query(criteriaE);
UserCompleteCountDTO one = firstMongoTemplate.findOne(queryE, UserCompleteCountDTO.class);
if (null != one) {
historyDTOS.add(one);
}
} catch (Exception e1) {
logger.error("Incremental query failed, id:{}", eId, e1);
errorIdMapper.insert(eId);
}
}
}
// Build the upsert operations for the new database
List<Pair<Query, Update>> upsertCondition = new ArrayList<>();
for (UserCompleteCountDTO old : historyDTOS) {
Date baseCreateTime = old.getCreateTime() == null ? new Date() : old.getCreateTime();
Date baseUpdateTime = old.getUpdateTime() == null ? new Date() : old.getUpdateTime();
List<DataDTO> dataDTOS = coverDataDTOList(old);
if (!CollectionUtils.isEmpty(dataDTOS)) {
for (DataDTO dto : dataDTOS) {
String upsertKey = dto.getUpsertKey();
// Assign values
Query query = new Query();
query.addCriteria(Criteria.where("upsertKey").is(upsertKey));
Update update = new Update();
update.set("upsertKey", dto.getUpsertKey());
Pair<Query, Update> pair = Pair.of(query, update);
upsertCondition.add(pair);
//WriteResult coreDataSync = secondMongoTemplate.upsert(query, update, "coreDataSync");
}
}
}
if (!CollectionUtils.isEmpty(upsertCondition)) {
List<List<Pair<Query, Update>>> upsertConditionSub = Lists.partition(upsertCondition, 5000);
for (int i = 0; i < upsertConditionSub.size(); i++) {
List<Pair<Query, Update>> pairs = upsertConditionSub.get(i);
BulkOperations operations = secondMongoTemplate.bulkOps(BulkOperations.BulkMode.UNORDERED, "xxxxxx");
operations.upsert(pairs);
BulkWriteResult result = operations.execute();
}
}
}
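The last step of `increSync`, splitting the upserts into chunks and applying each chunk as one bulk operation, can be sketched without Guava or MongoDB. The map below stands in for the new collection keyed by `upsertKey`; `ChunkedUpsertSketch`, `partition`, and `upsertInChunks` are hypothetical names.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Chunked upsert by key into an in-memory map; names are hypothetical. */
final class ChunkedUpsertSketch {
    /** Splits items into consecutive chunks of at most {@code size} elements. */
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            int to = Math.min(from + size, items.size());
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }

    /** Applies every (key, value) as an upsert, one chunk at a time; returns the chunk count. */
    static int upsertInChunks(Map<String, String> target,
                              List<Map.Entry<String, String>> ops,
                              int chunkSize) {
        int executed = 0;
        for (List<Map.Entry<String, String>> chunk : partition(ops, chunkSize)) {
            for (Map.Entry<String, String> op : chunk) {
                target.put(op.getKey(), op.getValue()); // insert if absent, overwrite otherwise
            }
            executed++;                                 // one "bulk execute" per chunk
        }
        return executed;
    }

    public static void main(String[] args) {
        Map<String, String> target = new TreeMap<>();
        List<Map.Entry<String, String>> ops = Arrays.<Map.Entry<String, String>>asList(
                new SimpleEntry<>("k1", "v1"),
                new SimpleEntry<>("k2", "v2"),
                new SimpleEntry<>("k1", "v1b"));
        System.out.println(upsertInChunks(target, ops, 2)); // prints 2
        System.out.println(target);                         // prints {k1=v1b, k2=v2}
    }
}
```

Because an upsert by key is idempotent, running the same batch twice leaves the target unchanged, which is what makes re-running the incremental endpoint safe.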
IV. Summary
The idea is simple; putting it into practice is what matters. Try it hands-on, then let's discuss in the comments.