MongoDB: Tailing the oplog for Full + Incremental Sync

1. Introduction

In a recent project I had to migrate data from one MongoDB instance to another, where the two sides used different schemas. The migration covered both a full copy and incremental changes, about 500 million documents in total. This post covers both the theory and the hands-on practice, so read on!

2. Migration Approach

Full-plus-incremental migrations usually follow the same basic recipe:

  1. When the full sync starts, also start the incremental listener and record the primary-key id of every change it sees.
  2. When the full sync finishes, replay the recorded ids: fetch the latest version of each record by id (a getById lookup) and upsert it into the new database.

That is the whole idea.
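As a minimal sketch, the overall flow looks like this; every method name here is an assumption to show the shape, not the project code:

    // Hedged sketch of the overall orchestration (assumed method names)
    public void migrate() throws Exception {
        startOplogListener();   // 1. tail the oplog and record each changed _id
        runFullSync();          // 2. page through the old collection, write to the new one
        replayRecordedIds();    // 3. re-fetch every recorded _id and upsert the latest version
    }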

3. Sync in Practice

Full sync

The full sync itself is straightforward. Pick a sortable key and keep pulling batches from the old database, transforming each batch before writing it into the new collection. My primary key is a monotonically increasing integer, so I page through in ascending _id order; at the start I also record the current maximum _id, and once the scan reaches that maximum the full sync is done.

I exposed an HTTP endpoint and call it manually to run the full sync.

    /**
     * Full sync
     *
     * @return whether it succeeded
     */
    @RequestMapping("/fullData")
    public Boolean fullData() {
        // find the current maximum _id; the scan stops once it reaches it
        Query query = new Query();
        query.with(new Sort(Sort.Direction.DESC, "_id"));
        query.limit(1);
        UserCompleteCountDTO max = firstMongoTemplate.findOne(query, UserCompleteCountDTO.class);
        Integer maxUserId = max.getUserId();

        Integer step = 100;
        Integer beginId = 1;
        Integer totalCount = 0;
        while (true) {
            logger.info("beginId:{}", beginId);
            Criteria criteria = Criteria.where("_id").gte(beginId);
            Query queryStep = new Query(criteria);
            queryStep.limit(step).with(new Sort(new Sort.Order(Sort.Direction.ASC, "_id")));
            List<UserCompleteCountDTO> historyDTOS = new ArrayList<>();
            try {
                historyDTOS = firstMongoTemplate.find(queryStep, UserCompleteCountDTO.class);
            } catch (Exception e) {
                // the batch read failed (e.g. one unmappable document): fall back to
                // reading ids only, then loading the documents one by one
                List<UserCompleteCountIdDTO> historyIdDTOS = firstMongoTemplate.find(queryStep, UserCompleteCountIdDTO.class);
                if (!CollectionUtils.isEmpty(historyIdDTOS)) {
                    for (UserCompleteCountIdDTO idDTO : historyIdDTOS) {
                        int userId = idDTO.getUserId();
                        try {
                            Criteria criteriaE = Criteria.where("_id").is(userId);
                            Query queryE = new Query(criteriaE);
                            UserCompleteCountDTO one = firstMongoTemplate.findOne(queryE, UserCompleteCountDTO.class);
                            if (null != one) {
                                historyDTOS.add(one);
                            }
                        } catch (Exception e1) {
                            // record the unreadable id so it can be retried later
                            logger.error("full-sync query failed, id:{}", userId);
                            errorIdMapper.insert(userId);
                        }
                    }
                }
            }
            totalCount = fullSync(historyDTOS, totalCount);
            // check whether the full sync is finished
            if ((CollectionUtils.isEmpty(historyDTOS) || historyDTOS.size() < step) && (beginId + step) >= maxUserId) {
                logger.info("full sync finished!");
                break;
            }
            if (CollectionUtils.isEmpty(historyDTOS)) {
                // an empty page in the middle of the range (a gap in _id): jump ahead,
                // otherwise historyDTOS.get(size - 1) below would throw
                beginId = beginId + step;
            } else {
                UserCompleteCountDTO last = historyDTOS.get(historyDTOS.size() - 1);
                beginId = last.getUserId() + 1;
            }

            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        return true;
    }
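With the service up, the sync is triggered manually (host and port here are placeholders):

    curl http://localhost:8080/fullData

The fullSync helper that does the actual writing: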

    private Integer fullSync(List<UserCompleteCountDTO> list, Integer totalCount) {
        if (CollectionUtils.isEmpty(list)) {
            return totalCount;
        }

        // convert the old-schema documents into the new-schema rows
        List<DataDTO> insertDataList = new ArrayList<>();
        for (UserCompleteCountDTO old : list) {
            List<DataDTO> dataDTOS = coverDataDTOList(old);
            insertDataList.addAll(dataDTOS);
        }
        ExecutorService executor = Executors.newFixedThreadPool(20);

        try {
            if (!CollectionUtils.isEmpty(insertDataList)) {
                List<List<DataDTO>> partition = Lists.partition(insertDataList, 100);
                CountDownLatch countDownLatch = new CountDownLatch(partition.size());
                for (List<DataDTO> partList : partition) {
                    ImportTask task = new ImportTask(partList, countDownLatch);
                    executor.execute(task);
                }
                countDownLatch.await();
                totalCount = totalCount + list.size();
            }
            logger.info("totalCount:{}", totalCount);
        } catch (Exception e) {
            logger.error("bulk insert failed", e);
        } finally {
            // shut the pool down and release its threads
            executor.shutdown();
        }

        return totalCount;
    }
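Creating and tearing down a 20-thread pool for every page of 100 documents adds avoidable overhead; the more usual shape is a single shared pool, created once as a field and shut down when the whole migration ends. A sketch, assuming the surrounding controller class:

    // Hedged: one pool reused across all fullSync calls instead of a pool per batch
    private final ExecutorService importPool = Executors.newFixedThreadPool(20);

The import task itself: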


    class ImportTask implements Runnable {
        private List<DataDTO> list;
        private CountDownLatch countDownLatch;

        public ImportTask(List<DataDTO> data, CountDownLatch countDownLatch) {
            this.list = data;
            this.countDownLatch = countDownLatch;
        }

        @Override
        public void run() {
            try {
                if (null != list) {
                    // the actual write: bulk insert (or update) into the target collection
                    BulkOperations operations = secondMongoTemplate.bulkOps(BulkOperations.BulkMode.UNORDERED, "xxxxx");
                    operations.insert(list);
                    BulkWriteResult result = operations.execute();
                }
            } finally {
                // count down in finally; otherwise a failed batch would leave
                // countDownLatch.await() in fullSync blocked forever
                countDownLatch.countDown();
            }
        }
    }
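One caveat for reruns: a plain insert hits duplicate-key errors if a range is copied twice. In UNORDERED mode the rest of the batch is still written, and in the Spring Data versions I have used the failure surfaces as a BulkOperationException, so a restart-tolerant variant can log it instead of failing, roughly:

    try {
        operations.execute();
    } catch (org.springframework.data.mongodb.BulkOperationException e) {
        // duplicate keys from a rerun; UNORDERED mode already wrote the other documents
        logger.warn("bulk insert partially failed, {} write errors", e.getErrors().size());
    }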

Incremental sync

Incremental sync requires tailing MongoDB's operation log: the oplog.

What is the oplog?
The oplog records MongoDB's inserts, updates, deletes, and some system commands; reads are not recorded. It is similar to MySQL's binlog.
MongoDB replica sets use the oplog for replication: the primary applies each write and records it in the oplog, and the secondaries replay those operations asynchronously.


Where does the oplog live?

The oplog sits in the local database:
  • master/slave architecture: local.oplog.$main
  • replica set architecture: local.oplog.rs
  • sharded cluster: the oplog is not visible through mongos; check each shard individually.
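For reference, entries in local.oplog.rs look roughly like this (the older replica-set format, which is what the parsing code below assumes; the values are made up):

    // insert: the full new document arrives in "o"
    { "ts" : Timestamp(1566205200, 1), "op" : "i", "ns" : "mydb.user",
      "o" : { "_id" : 1001, "name" : "test" } }
    // update: "o2" holds the match condition, "o.$set" the changed fields
    { "ts" : Timestamp(1566205201, 1), "op" : "u", "ns" : "mydb.user",
      "o2" : { "_id" : 1001 }, "o" : { "$set" : { "name" : "test2" } } }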

The listener code is below. I write each _id picked up from the oplog into MySQL; once the full sync completes, the incremental phase reads those ids back out of MySQL and syncs the corresponding records into the new collection. A possible shape for that MySQL id table, matching how IdDTO (getId/getTid) is used further down (the names are my assumption):
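    -- Hedged sketch: table and column names are assumptions
    CREATE TABLE incr_oplog_id (
        id  INT AUTO_INCREMENT PRIMARY KEY, -- local cursor used for paged reads
        tid INT NOT NULL                    -- the MongoDB _id captured from the oplog
    );

The listener implementation: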

package com.soybean.data.service;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.mongodb.BasicDBObject;
import com.mongodb.CursorType;
import com.mongodb.MongoClient;
import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;
import com.soybean.data.mapper.IdMapper;
import com.soybean.data.util.MongoDBUtil;
import org.bson.BsonTimestamp;
import org.bson.Document;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;
import org.springframework.util.CollectionUtils;
import org.springframework.util.StringUtils;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

@Component
public class MongoDBOpLogService implements CommandLineRunner {

    private static final Logger logger = LoggerFactory.getLogger(MongoDBOpLogService.class);

    private static MongoClient mongoClient;

    @Autowired
    private IdMapper idMapper;

    /**
     * On service startup, start recording incremental changes into MySQL
     * @param strings
     * @throws Exception
     */
    @Override
    public void run(String... strings) throws Exception {
        initMongoClient();
        // get the local database
        MongoDatabase database = getDatabase("local");
        // tail the replica-set oplog collection: oplog.rs
        MongoCollection<Document> runoob = getCollection(database, "oplog.rs");
        try {
            // process the stream
            dataProcessing(runoob);
        } catch (Exception e) {
            logger.error("error:", e);
        }

    }


    private static void initMongoClient() {
        try {
            mongoClient = MongoDBUtil.initMongoHasUser();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static MongoDatabase getDatabase(String dataBase) {
        if (!mongoClient.getDatabaseNames().contains(dataBase)) {
            throw new RuntimeException(dataBase + " does not exist!");
        }
        MongoDatabase mongoDatabase = mongoClient.getDatabase(dataBase);
        return mongoDatabase;
    }

    /**
     * Get a collection handle
     *
     * @param mongoDatabase
     * @param testCollection
     * @return
     */
    public static MongoCollection<Document> getCollection(MongoDatabase mongoDatabase, String testCollection) {
        MongoCollection<Document> collection = null;
        try {
            // get collection testCollection from the database; it is created automatically if absent
            collection = mongoDatabase.getCollection(testCollection);
        } catch (Exception e) {
            throw new RuntimeException("failed to get collection " + testCollection + " from database " + mongoDatabase.getName() + "!" + e);
        }
        return collection;
    }


    /**
     * Tail the oplog and process each entry
     *
     * @param collection
     * @throws InterruptedException
     */
    public void dataProcessing(MongoCollection<Document> collection) throws InterruptedException {
        // sort by $natural descending to grab the newest ts on startup, then tail the oplog
        // filtering on ts > that value. Ideally this ts is persisted to a file or table on
        // shutdown so a restart can resume from where it left off (see the sketch after this class).
        FindIterable<Document> tsCursor = collection.find().sort(new BasicDBObject("$natural", -1)).limit(1);
        Document tsDoc = tsCursor.first();
        BsonTimestamp queryTs = (BsonTimestamp) tsDoc.get("ts");
        try {
            List<Integer> batchIds = new ArrayList<>();
            while (true) {
                BasicDBObject query = new BasicDBObject("ts", new BasicDBObject("$gt", queryTs));
                MongoCursor<Document> docCursor = collection.find(query)
                        .cursorType(CursorType.TailableAwait) // block and wait when no new data is available
                        .noCursorTimeout(true) // keep the server from closing the cursor after its idle timeout (10 minutes)
                        .oplogReplay(true) // combined with the ts filter, lets the server skip ahead efficiently; see https://docs.mongodb.com/manual/reference/command/find/index.html
                        .maxAwaitTime(1, TimeUnit.SECONDS) // maximum time the server waits for new data
                        .iterator();

                while (docCursor.hasNext()) {
                    Document document = docCursor.next();
                    // advance the query timestamp
                    queryTs = (BsonTimestamp) document.get("ts");
                    String op = document.getString("op");
                    String ns = document.getString("ns");
                    if (!"resourcebehaviorsystem.playCompleted".equalsIgnoreCase(ns)) {
                        continue;
                    }

                    Document context = (Document) document.get("o");
                    Document where = null;
                    Integer id = null;
                    if (op.equals("u")) {
                        // an update carries the match condition in o2 and the changes in o.$set
                        where = (Document) document.get("o2");
                        id = Integer.valueOf(String.valueOf(where.get("_id")));
                        if (context != null) {
                            context = (Document) context.get("$set");
                        }
                    }

                    if (op.equals("i")) {
                        // an insert carries the full new document in o
                        if (context != null) {
                            id = Integer.valueOf(String.valueOf(context.get("_id")));
                        }
                    }
                    logger.info("ts: " + queryTs.getTime());
                    logger.info("op type: " + op);
                    logger.info("namespace (db.collection): " + ns);
                    logger.info("update condition: " + JSON.toJSONString(where));
                    //logger.info("document body: " + JSON.toJSONString(context));
                    logger.info("document _id: " + JSON.toJSONString(id));
                    // deletes and entries without a usable _id are skipped here
                    if (null == id) {
                        continue;
                    }
                    // collect ids and flush a de-duplicated batch to MySQL every 10 entries
                    batchIds.add(id);
                    if (batchIds.size() < 10) {
                        continue;
                    }
                    syncData(batchIds.stream().distinct().collect(Collectors.toList()));
                    batchIds = new ArrayList<>();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }


    public void syncData(List<Integer> ids) {
        idMapper.batchInsert(ids);
    }


    /**
     * Map an oplog op code to a readable event type
     *
     * @param op
     * @return
     */
    private static String getEventType(String op) {
        switch (op) {
            case "i":
                return "insert";
            case "u":
                return "update";
            case "d":
                return "delete";
            default:
                return "other";
        }
    }

    /**
     * Parse and package a document: returns the new data for inserts/updates and the old
     * data for deletes (treated as a logical delete); the condition field is left empty
     *
     * @return JSONObject
     */
    private static JSONObject resultRow(Document document, JSONObject result, String eventType) {
        JSONObject columns = new JSONObject(); // holds the changed fields
        result.put("columns", columns);
        result.put("condition", new JSONObject()); // the match condition
        for (Map.Entry<String, Object> entry : document.entrySet()) {
            if (entry.getKey().equalsIgnoreCase("_id")) {
                columns.put(entry.getKey(), (entry.getValue()).toString());
                continue;
            }
            columns.put(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
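As the comment at the top of dataProcessing says, the last processed ts should ideally be persisted on shutdown, so that a restart resumes from the checkpoint instead of from "now" (otherwise any writes during the downtime are lost). A minimal sketch using a local file; the file name and format are my invention:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import org.bson.BsonTimestamp;

    class OplogCheckpoint {
        private static final Path CHECKPOINT = Paths.get("oplog.checkpoint");

        // persist the last processed oplog timestamp (seconds + increment)
        static void save(BsonTimestamp ts) throws IOException {
            Files.write(CHECKPOINT, (ts.getTime() + ":" + ts.getInc()).getBytes(StandardCharsets.UTF_8));
        }

        // restore on startup; callers fall back to "newest entry" if the file is absent
        static BsonTimestamp load() throws IOException {
            String[] parts = new String(Files.readAllBytes(CHECKPOINT), StandardCharsets.UTF_8).split(":");
            return new BsonTimestamp(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
        }
    }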

The MongoClient utility class:

package com.soybean.data.util;


import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.MongoCredential;
import com.mongodb.ServerAddress;
import com.mongodb.WriteConcern;

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;


public class MongoDBUtil {
    private static MongoClient mongoClient;
    private static Properties properties;
    private static WriteConcern concern;
    static {
        try {
            InputStream inputStream = MongoDBUtil.class.getClassLoader().getResourceAsStream("mongo-config.properties");
            properties = new Properties();
            properties.load(inputStream);
            concern = new WriteConcern(Integer.parseInt(properties.getProperty("write")),
                    Integer.parseInt(properties.getProperty("writeTimeout")));
            // withJournal returns a new WriteConcern, so the result must be reassigned
            concern = concern.withJournal(Boolean.valueOf(properties.getProperty("journal")));

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    /**
     * Initialize and return the client
     */
    public static MongoClient initMongoHasUser() throws IOException {
        List<ServerAddress> adds = new ArrayList<>();
        String[] address = properties.getProperty("hostConfString").split(":"); // read the server host and port
        ServerAddress serverAddress = new ServerAddress(address[0], Integer.valueOf(address[1]));
        adds.add(serverAddress);
        List<MongoCredential> credentials = new ArrayList<>();
        MongoCredential mongoCredential = MongoCredential.createScramSha1Credential(
                properties.getProperty("userName"),
                properties.getProperty("useCollection"),
                properties.getProperty("passWord").toCharArray());
        credentials.add(mongoCredential);
        MongoClientOptions options = MongoClientOptions.builder()
                .connectionsPerHost(Integer.parseInt(properties.getProperty("connectionsPerHost")))
                .connectTimeout(Integer.parseInt(properties.getProperty("connectTimeout")))
                .cursorFinalizerEnabled(Boolean.valueOf(properties.getProperty("cursorFinalizerEnabled")))
                .maxWaitTime(Integer.parseInt(properties.getProperty("maxWaitTime")))
                .threadsAllowedToBlockForConnectionMultiplier(Integer.parseInt(properties
                        .getProperty("threadsAllowedToBlockForConnectionMultiplier")))
                .socketTimeout(Integer.valueOf(properties.getProperty("socketTimeout")))
                .socketKeepAlive(Boolean.valueOf(properties.getProperty("socketKeepAlive")))
                .writeConcern(concern)
                .build();
        if (adds.size() > 1){
            mongoClient = new MongoClient(adds, credentials, options);
        }else {
            mongoClient = new MongoClient(adds.get(0), credentials, options);
        }
        return mongoClient;
    }
}

The properties file:

connectionsPerHost=10
connectTimeout=10000
cursorFinalizerEnabled=true
maxWaitTime=120000
threadsAllowedToBlockForConnectionMultiplier=5
readSecondary=false
socketTimeout=0
socketKeepAlive=false
write=0
writeTimeout=0
journal=false
hostConfString=2222222.2222.2222.222.222
userName=sssssssss
useCollection=admin
passWord=xxxxxxxxxxxx

Kicking off the incremental sync:

    /**
     * Incremental sync, with de-duplication
     *
     * @return whether it succeeded
     */
    @GetMapping("/increData")
    public Boolean increData(@RequestParam(name = "id", required = true) Integer id,
                             @RequestParam(name = "endId", required = true) Integer endId) {
        Integer step = 100;
        Integer beginId = id;

        while (true) {
            logger.info("incre beginId:{}", beginId);
            List<IdDTO> dtos = idMapper.getBatchIdsByIdAndSort(beginId, step);
            if (!CollectionUtils.isEmpty(dtos)) {
                IdDTO last = dtos.get(dtos.size() - 1);
                beginId = last.getId() + 1;
                List<Integer> tidList = dtos.stream().map(IdDTO::getTid).distinct().collect(Collectors.toList());
                increSync(tidList);
                if (last.getId() >= endId) {
                    logger.info("last.getId:{},endId:{}", last.getId(), endId);
                    logger.info("增量结束!beginId:{},endId:{}", id, endId);
                    break;
                }
            } else {
                // no new ids recorded yet; wait for the listener to catch up
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }

        return true;
    }
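Once the listener has recorded ids and the full sync is done, replay a chosen id range (host, port, and the range values are placeholders):

    curl "http://localhost:8080/increData?id=1&endId=500000"

The increSync helper that re-reads from the old MongoDB and upserts into the new one: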

    private void increSync(List<Integer> tidList) {
        if (CollectionUtils.isEmpty(tidList)) {
            return;
        }
        List<UserCompleteCountDTO> historyDTOS = new ArrayList<>();
        // look up the latest version of each record
        try {
            Criteria criteria = new Criteria().where("_id").in(tidList);
            Query queryStep = new Query(criteria);
            historyDTOS = firstMongoTemplate.find(queryStep, UserCompleteCountDTO.class);
        } catch (Exception e) {
            for (Integer eId : tidList) {
                try {
                    Criteria criteriaE = new Criteria().where("_id").is(eId);
                    Query queryE = new Query(criteriaE);
                    UserCompleteCountDTO one = firstMongoTemplate.findOne(queryE, UserCompleteCountDTO.class);
                    if (null != one) {
                        historyDTOS.add(one);
                    }
                } catch (Exception e1) {
                    logger.error("增量查询失败:id:{}", eId);
                    errorIdMapper.insert(eId);
                }
            }
        }

        // write to the target database
        List<Pair<Query, Update>> upsertCondition = new ArrayList<>();

        for (UserCompleteCountDTO old : historyDTOS) {
            Date baseCreateTime = StringUtils.isEmpty(old.getCreateTime()) ? new Date() : old.getCreateTime();
            Date baseUpdateTime = StringUtils.isEmpty(old.getUpdateTime()) ? new Date() : old.getUpdateTime();
            List<DataDTO> dataDTOS = coverDataDTOList(old);

            if (!CollectionUtils.isEmpty(dataDTOS)) {

                for (DataDTO dto : dataDTOS) {
                    String upsertKey = dto.getUpsertKey();
                    // build the upsert condition and update
                    Query query = new Query();
                    query.addCriteria(Criteria.where("upsertKey").is(upsertKey));
                    Update update = new Update();

                    update.set("upsertKey", dto.getUpsertKey());
                   
                    Pair<Query, Update> pair = Pair.of(query, update);
                    upsertCondition.add(pair);
                    //WriteResult coreDataSync = secondMongoTemplate.upsert(query, update, "coreDataSync");
                }

            }
        }


        if (!CollectionUtils.isEmpty(upsertCondition)) {

            List<List<Pair<Query, Update>>> upsertConditionSub = Lists.partition(upsertCondition, 5000);
            for (int i = 0; i < upsertConditionSub.size(); i++) {
                List<Pair<Query, Update>> pairs = upsertConditionSub.get(i);
                BulkOperations operations = secondMongoTemplate.bulkOps(BulkOperations.BulkMode.UNORDERED, "xxxxxx");
                operations.upsert(pairs);
                BulkWriteResult result = operations.execute();
            }

        }
    }

4. Summary

The idea is simple; the practice is what counts. Give it a try yourself, then let's discuss in the comments~~

Reference:
MongoDB series: Multi-datasource configuration for MongoDB in Spring Boot
