如何批量处理大数据量数据库历史数据

最新推荐文章于 2024-07-30 08:02:53 发布

huiye_2017

最新推荐文章于 2024-07-30 08:02:53 发布

阅读量1.8k

点赞数 3

分类专栏：数据处理文章标签： java 队列 queue mysql

本文链接：https://blog.csdn.net/huiye_2017/article/details/106587724

版权

数据处理专栏收录该内容

1 篇文章 0 订阅

订阅专栏

如何批量处理大数据量数据库历史数据

如何批量处理大数据量数据库历史数据，多线程生产者消费者模式，毒丸法关闭消费者线程

最近，在写项目的时候，遇到这样一个需求，即对有大数据量表的历史数据替换数据需求，这张表有790万条数据，大小有80个G 有一个字段是text类型的json字符串其结构是这样的：

[
	{
		"note": "",
		"material": "201CRC",
		"port": "九江港，三水港",
		"price": "1000",
		"selectNote": [],
		"raise": "-",
		"place": "宏旺",
		"breed": "板卷",
		"spec": "2.0*1240*C",
		"mcodes": [
			{
				"code": "CN440601TG50444812316500",
				"value": "1000",
				"key": "",
				"status": 1
			}
		],
		"rowId": "68742083badb4c1e73b49cabcce9dbd2",
		"status": 1
	},
	{
		"note": "",
		"material": "304CRC",
		"port": "九江港，三水港",
		"price": "1500",
		"selectNote": [],
		"raise": "-",
		"place": "宏旺",
		"breed": "板卷",
		"spec": "2.0*1240*C",
		"mcodes": [
			{
				"code": "CN440601TG50444812416600",
				"value": "1500",
				"key": "",
				"status": 1
			}
		],
		"rowId": "941afe24b8ccf673d21382ddac801053",
		"status": 1
	},
	{
		"note": "今日封盘，暂无报价",
		"material": "430CRC",
		"port": "九江港，三水港",
		"price": "1100.27",
		"selectNote": [],
		"raise": "-",
		"place": "酒钢",
		"breed": "板卷",
		"spec": "2.0*1240*C",
		"mcodes": [
			{
				"code": "CN440601TG50424812516700",
				"value": "1100.27",
				"key": "",
				"status": 1
			}
		],
		"rowId": "87e4d7a5239b142dae89f7c63f0b4ef8",
		"status": 1
	}
]

我的任务是替换与另一种表只相同的 code 对应的rowId，业务很简单，但是要处理这么大量的数据，对于查询和更新数据库都是一个大的挑战。

需要用到的技术点

分页查询，数据量太多，不可以一次性出处理，这样不仅查询非常慢，还会撑爆内存，造成oom
多线程操作，既然单个查询太慢，就多开几个线程同时操作了。这里要注意一些多线程操作，注意线程安全，已经数据库锁表，锁行
数据库批处理，因为要对分页查询出的数据就行替换并更新，一条条更新的话太慢，批量提交更新，能大大提高效率
消费者，生产者设计模式
连接池、线程池思想

解决思路

先来看看我前期的一些想法：

开多线程分页查询需要替换表数据，每次查询10000条数据后处理逻辑，这种查询方式，越查到后面查询越慢，所以我放弃了。

        //查询MRK_CONTENT数据总数
        int maxNum = getMaxNum();
        int num = 5000;
        //需要查询次数
        int times = maxNum / num;
        if (maxNum % num != 0) {
            times = times + 1;
        }
        //开始查询行数
        int startIndex = 3580000;
        //使一个线程等待其他线程各自执行完毕后再执行
        CountDownLatch end = new CountDownLatch(times);
        //定义固定长度的线程池  防止线程过多，这个线程池使用的是无界队列，阿里是不推荐使用的，这里我是知道我的总查询次数
        ExecutorService executor = Executors.newFixedThreadPool(15);
        for (int i=0; i < times; i++) {
           Runnable updateContentThread = new UpdateContentThread(hashMap, end, startIndex, num);
           executor.execute(updateContentThread);
           startIndex = startIndex + num;
        }
        logger.info("任务已下发！");
        end.await();
        //执行完关闭线程池
        executor.shutdown();

线程类

/**
 * 更新数据具体操作线程
 */
public class UpdateContentThread implements Runnable {

    private static final Logger logger = LoggerFactory.getLogger(UpdateContentThread.class);
    private Map<Long, List<MrkRowId>> hashMap;
    private CountDownLatch end;
    private int startIndex;
    int num;


    public UpdateContentThread(Map<Long, List<MrkRowId>> hashMap, CountDownLatch end, int startIndex, int num) {
        this.hashMap = hashMap;
        this.end = end;
        this.startIndex = startIndex;
        this.num = num;
    }
    @Override
    public void run() {
        long startTime = System.currentTimeMillis();
        List<MrkContent> mrkContents = updateContentReadFile.queryMarketContentByPage(startIndex, num);
        long endTime = System.currentTimeMillis();
        logger.info("分页查询完成：" + startIndex/10000 + "万条" + "，花费时间" + (endTime - startTime) + "毫秒");
        Connection connection = null;
        PreparedStatement pst = null;
        try {
            connection = flashRowId.getConnection();
            connection.setAutoCommit(false);
            //这里用插入代替更新用于测试
            //这里使用jdbc 批处理方便测试，使用mybatis也可以
            pst = connection.prepareStatement("INSERT INTO MRK_CONTENT_copy values (?,?,?,?,?,?,?,?,?,?,?,?)");
            int task = 0;
            for (MrkContent mrkContent : mrkContents) {
                //提前生成map数据，用于匹配数据，与业务相关，无需关注
                List<MrkRowId> mrkRowIds = hashMap.get(mrkContent.getTableId());
                if (!CollectionUtils.isEmpty(mrkRowIds)) {
                    String conttent = mrkContent.getContent();
                    JSONArray conttentJson = JSONObject.parseArray(conttent);
                    String content = replaceContent(conttentJson, mrkRowIds);
                    //未更改则无需更新
                    if ("NO".equals(content)) {
                        continue;
                    }
                    pst.setLong(1, mrkContent.getMarketId());
                    pst.setString(2, "");
                    pst.setString(3, "");
                    pst.setString(4, "");
                    pst.setString(5, content);
                    pst.setString(6, "");
                    pst.setString(7, "");
                    pst.setString(8, "");
                    pst.setString(9, "");
                    pst.setString(10, "");
                    pst.setString(11, "");
                    pst.setString(12, "");
                    pst.addBatch();
                    task++;
                }
                if(task !=0 && task%1000 ==0) {
                    logger.info("正在拼装更新语句：" + task);
                }
            }
            pst.executeBatch();
            connection.commit();
            long endTime2 = System.currentTimeMillis();
            logger.info("已完成跟新任务条数：" + task + ",线程名：" + Thread.currentThread().getName() + ",更新花费时间：" + (endTime2 - endTime) + "毫秒");
        } catch (SQLException e) {
            try {
                connection.rollback();
            } catch (SQLException ex) {
                ex.printStackTrace();
            }
            logger.error("批量更新MRK_CONTENT sql报错！", e);
        }finally {
            if (pst != null) {
                try {
                    pst.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
            if(connection != null) {
                try {
                    connection.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
            end.countDown();
        }

    }

    private static String replaceContent(JSONArray conttentJson, List<MrkRowId> mrkRowIds) {
        boolean updated = false;
        for (int i = 0; i < conttentJson.size(); i++) {
            JSONObject jsonObject = conttentJson.getJSONObject(i);
            JSONArray mcodes = jsonObject.getJSONArray("mcodes");
            if (mcodes != null) {
                String code = mcodes.getJSONObject(0).getString("code");
                if (StringUtils.isNotEmpty(code)) {
                    for (MrkRowId mrkRowId : mrkRowIds) {
                        if (mrkRowId.getCode().equals(code)) {
                            jsonObject.put("rowId", mrkRowId.getNewRowId());
                            updated = true;
                        }
                    }
                }
            }
        }
        if (updated) {
            return conttentJson.toString();
        }
        return "NO";
    }

}

分页查询方法

    public static List<MrkContent> queryMarketContentByPage(int startIndex, int num) {
        Connection connection = flashRowId.getConnection();
        PreparedStatement stmt = null;
        ResultSet resultSet = null;
        List<MrkContent> mrkContents = new ArrayList<>();
        try {
        //这个查询语句是为了优化大数据查询所写的，比普通limit方式快一些，但是越查到后面查询越慢所以limit查询方式不适合大数据量查询
            String sql = "select marketId,tableId,content from MRK_CONTENT mc LEFT JOIN MRK_MARKET mm ON mc.marketId = mm.id where mm.id>(select id from MRK_MARKET order by id limit ?,1) ORDER BY marketId limit ?";
            stmt = connection.prepareStatement(sql);
            //stmt.setInt(1, startIndex);
            //stmt.setInt(2, num);
            stmt.setLong(1,11537860);
            resultSet = stmt.executeQuery();
            while(resultSet.next()){
                Long marketId = resultSet.getLong(1);
                long tableId = resultSet.getLong(2);
                String content = resultSet.getString(3);

                MrkContent mrkContent = new MrkContent();
                mrkContent.setMarketId(marketId);
                mrkContent.setContent(content);
                mrkContent.setTableId(tableId);
                mrkContents.add(mrkContent);
            }
        } catch (SQLException e) {
            logger.error("分页查询mrkContent sql查询报错！", e);
        }finally {
            if (resultSet != null) {
                try {
                    resultSet.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
            if (connection != null) {
                try {
                    connection.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
        }
        return mrkContents;
    }

按主键查询，分批处理，由于我只到该表的同一tableId下带数据量最大有4000条左右，所以我按tableId分批查询，这样不会查询越来越慢。

public class UpdateMarketContentThread implements Callable<Integer> {
    private static final Logger logger = LoggerFactory.getLogger(UpdateMarketContentThread.class);
    private List<Long> tableIds;
    private CountDownLatch end;
    private Map<Long, List<MrkRowId>> hashMap;

    public UpdateMarketContentThread(List<Long> tableIds, CountDownLatch end, Map<Long, List<MrkRowId>> hashMap) {
        this.tableIds = tableIds;
        this.end = end;
        this.hashMap = hashMap;
    }

    @Override
    public Integer call() throws Exception {
        int taskCount = 0;
        int result = 0;
        int count = 0;
        int size = tableIds.size();
        //pst = connection.prepareStatement("UPDATE MRK_CONTENT_copy SET content = ? WHERE marketId = ?");
        for (Long tableId : tableIds) {
            long startTime = System.currentTimeMillis();
            List<MrkRowId> mrkRowIds = hashMap.get(tableId);
            if (CollectionUtils.isEmpty(mrkRowIds)) {
                continue;
            }
            List<MrkContent> mrkContents = flashRowId.queryMrkContentByTableId(tableId);
            long endTime = System.currentTimeMillis();
            logger.info("线程：" + Thread.currentThread().getName() + "，查询到MRK_CONTENT数为：" + mrkContents.size() + ",处理时间为：" + (endTime - startTime) + "毫秒");
            if (!CollectionUtils.isEmpty(mrkContents)) {
                Connection connection = null;
                PreparedStatement pst = null;
                try {
                    connection = flashRowId.getConnection();
                    connection.setAutoCommit(false);
                    pst = connection.prepareStatement("INSERT INTO MRK_CONTENT_copy values (?,?,?,?,?,?,?,?,?,?,?,?)");
                    boolean hadReplace = false;
                    for (MrkContent mrkContent : mrkContents) {
                        String conttent = mrkContent.getContent();
                        JSONArray conttentJson = JSONObject.parseArray(conttent);
                        String content = replaceContent(conttentJson, mrkRowIds);
                        //未更改则无需更新
                        if ("NO".equals(content)) {
                            continue;
                        }
                        pst.setLong(1, mrkContent.getMarketId());
                        pst.setString(2, "");
                        pst.setString(3, "");
                        pst.setString(4, "");
                        pst.setString(5, content);
                        pst.setString(6, "");
                        pst.setString(7, "");
                        pst.setString(8, "");
                        pst.setString(9, "");
                        pst.setString(10, "");
                        pst.setString(11, "");
                        pst.setString(12, "");
                        pst.addBatch();
                        taskCount++;
                        hadReplace = true;
                        //logger.info("线程：" + Thread.currentThread().getName() + "，已完成替换任务数：" + taskCount);
                    }
                    if (hadReplace) {
                        int[] ints = pst.executeBatch();
                        for (int i = 0; i < ints.length; i++) {
                            if (ints[i] > 0) {
                                result = result + ints[i];
                            } else {
                                //logger.error("线程：" + Thread.currentThread().getName() + "，更新返回结果为：" + ints[i]);
                            }
                        }
                        long endTime2 = System.currentTimeMillis();
                        logger.info("线程：" + Thread.currentThread().getName() + "，已完成替换任务数：" + taskCount + "插入时间为：" + (endTime2 - endTime)/1000 + "秒");
                    }
                    connection.commit();
                } catch (SQLException e) {
                    try {
                        connection.rollback();
                    } catch (SQLException ex) {
                        ex.printStackTrace();
                    }
                    logger.error("更新MRK_CONTENT sql 执行错误！" + e);
                } finally {
                    if (pst != null) {
                        try {
                            pst.close();
                        } catch (SQLException e) {
                            e.printStackTrace();
                        }
                    }
                    if (connection != null) {
                        try {
                            connection.close();
                        } catch (SQLException e) {
                            e.printStackTrace();
                        }
                    }
                }
            }
            count++;
            if (count % 10 == 0) {
                logger.info("线程：" + Thread.currentThread().getName() + ",总tableId数：" + size + ",已完成tableId数：" + count);
            }

        }
        end.countDown();
        logger.info("插入数据执行完成!" + Thread.currentThread().getName() + "更新MRK_CONTENT数据条数为：" + result);
        return result;
    }

    private String replaceContent(JSONArray conttentJson, List<MrkRowId> mrkRowIds) {
        boolean updated = false;
        for (int i = 0; i < conttentJson.size(); i++) {
            JSONObject jsonObject = conttentJson.getJSONObject(i);
            JSONArray mcodes = jsonObject.getJSONArray("mcodes");
            if (mcodes != null) {
                String code = mcodes.getJSONObject(0).getString("code");
                if (StringUtils.isNotEmpty(code)) {
                    for (MrkRowId mrkRowId : mrkRowIds) {
                        if (mrkRowId.getCode().equals(code)) {
                            jsonObject.put("rowId", mrkRowId.getNewRowId());
                            updated = true;
                        }
                    }
                }
            }
        }
        if (updated) {
            return conttentJson.toString();
        }
        return "NO";
    }

}

这样查询结果还是太慢，因为查询操作，匹配逻辑以及更新新插入处理都在同一线程中处理，其中一项处理慢，则拖慢这一整条线路，所以最好是将这几个逻辑分开处理，在操作慢的地方，多开几个线程处理。

最后的解决思路

使用生产者消费者模式，将逻辑分开操作，我这里使用多生产者，多消费者模式：

首先是创建连接池，我这里使用的是Druid，Java语言中最好的数据库连接池，以properties方式创建，具体可以参照其他教程，这里不详述：

/**
 * 从连接池中获取连接
 */
public class DataSoureUtil {
    public static DataSource ds;
    public static final ThreadLocal<Connection> tl = new ThreadLocal<Connection>();

    // 把流操作提取到静态代码块里面。
    static {
        InputStream in = null;
        // 创建连接池对象
        try {
            // 创建出Properties对象
            Properties prop = new Properties();
            in = DataSoureUtil.class.getResourceAsStream("/webProperties/marketContentDatasource.properties");
            // 读取文件中的数据
            prop.load(in);
            // 创建连接池对象
            ds = DruidDataSourceFactory.createDataSource(prop);
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
            throw new RuntimeException(e);
        } catch (ClassNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
            throw new RuntimeException(e);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } finally {
            if (in != null) {
                try {
                    in.close();
                } catch (IOException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }
        }
    }

    /**
     * @return
     */
    public static Connection getConnection() {
        //优先从线程中获取连接
        Connection conn = tl.get();
        // 如果没有拿到这个线程，就说明是业务类在调用
        //如果从线程中获取失败，则再新建和数据库的连接
        if (conn == null) {
            try {
                // 用数据库连接池来获取数据库连接
                conn = ds.getConnection();
                tl.set(conn);
            } catch (SQLException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
                throw new RuntimeException(e);
            }
        }

        return conn;
    }

    /**
     * @param conn
     * @param ps
     * @param rs
     */
    public static void toClose(Connection conn, PreparedStatement ps, ResultSet rs) {
        try {
            if (rs != null) {
                rs.close();
            }
        } catch (SQLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        // 释放ps对象，关闭资源的时候需要注意顺序，后创建的对象，要先关闭。
        try {
            if (ps != null) {
                ps.close();
            }
        } catch (SQLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        try {
            if (conn != null) {
                conn.close();
                //关闭连接，要从线程中移除
                tl.remove();
            }
        } catch (SQLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public static void toClose(Connection conn, PreparedStatement ps) {
        toClose(conn, ps, null);
    }
}

配置文件：
注意：需要开启rewriteBatchedStatements=true 才能开启批量提交

url=jdbc:mysql://##.##.##.##:3306/#####?useUnicode=true&characterEncoding=gb2312&allowMultiQueries=true&rewriteBatchedStatements=true
username=######
password=#########
#初始连接数
initialSize=16
#最大连接数
maxActive=100
#最小空闲链接数
minIdle=8
#检测连接池是否有效的sql 
validationQuery=select 1
#申请连接时执行validationQery检测连接是否有效,配置true会降低
testOnBorrow=false
#归还连接时执行validationQuery检测连接是否有效，配置为true会降低性能
testOnReturn=false
#申请连接的时候检测
testWhileIdle=true
#配置间隔多久才进行一次检测,检测需要关闭的空闲链接,单位是毫秒:
timeBetweenEvictionRunsMillis = 60000

@Controller
public class FlashMrkContentController {
    private static final Logger logger = LoggerFactory.getLogger(FlashMrkContentController.class);

    //毒丸对象 用于杀死消费者。用于生成者生成完成后，关闭消费者线程
    public static final Object POISON_PILL = new Object();
    @Autowired
    private MrkMarketMapper mrkMarketMapper;
    //获取当前系统核心数
    //private static int runSize = Runtime.getRuntime().availableProcessors();
    private static int runSize = 16;
    private static ExecutorService executor = Executors.newFixedThreadPool(runSize);
    private int productCount = 10;
    private int consumerCount = runSize - productCount;


    public String flashContent() throws InterruptedException {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<Object>(5);
        //线程计数器，使一个线程等待其他线程各自执行完毕后再执行
        CountDownLatch end = new CountDownLatch(productCount);
        long startTime = System.currentTimeMillis();
        //查询所有tableId,分配给生产者
        List<Long> tableIds = mrkMarketMapper.queryAllMarketTableId();
        int listSize = tableIds.size();
        logger.info("获取tableIds数量为：" + listSize);
        //给tableIds 分段
        int taskNum = listSize/productCount;
        Map<Long, List<MrkRowId>> marketRowIdMap = getMarketRowIdMap();
        //均分list,使生成者任务数相差不大
        List<List<Long>> lists = averageAssign(tableIds, productCount);
        //开启生产者
        for (List<Long> ids : lists) {
            Producter producter = new Producter(ids, queue, end);
            executor.execute(producter);
        }
        //开启消费者
        for(int i = 0; i < consumerCount; i++){
            Consumer consumer = new Consumer(queue, marketRowIdMap);
            executor.execute(consumer);
        }
        logger.info("任务已下发！");
        //等待所有生产者生产完成！
        end.await();
        long endTime = System.currentTimeMillis();
        logger.info("所有生产者已完成生产，开始放入毒丸，杀死消费者,花费时间为：" + (endTime - startTime)/1000 + "秒");
        queue.put(POISON_PILL);
        return "任务已完成！";
    }

    /**
     * 将一个list均分成n个list,主要通过偏移量来实现的
     *
     * @param source
     * @return
     */
    public static <T> List<List<T>> averageAssign(List<T> source, int n) {
        List<List<T>> result = new ArrayList<List<T>>();
        int remaider = source.size() % n;  //(先计算出余数)
        int number = source.size() / n;  //然后是商
        int offset = 0;//偏移量
        for (int i = 0; i < n; i++) {
            List<T> value = null;
            if (remaider > 0) {
                value = source.subList(i * number + offset, (i + 1) * number + offset + 1);
                remaider--;
                offset++;
            } else {
                value = source.subList(i * number + offset, (i + 1) * number + offset);
            }
            result.add(value);
        }
        return result;
    }

/**
*获取code映射tableId,业务相关
*
**/
    public Map<Long, List<MrkRowId>> getMarketRowIdMap() {
        List<MrkRowId> mrkRowIds = MrkMarketMapper.queryAllMrkRow();
        Map<Long, List<MrkRowId>> hashMap = new HashMap<>();

        for (int i=0; i<mrkRowIds.size(); i++) {
            MrkRowId mrkRowId = mrkRowIds.get(i);
            Long tableId = mrkRowId.getTableId();
            if (hashMap.containsKey(tableId)) {
                List<MrkRowId> mrkRowIds1 = hashMap.get(tableId);
                mrkRowIds1.add(mrkRowId);
                hashMap.put(tableId, mrkRowIds1);
            }else {
                List<MrkRowId> mrkRowIds2 = new ArrayList<>();
                mrkRowIds2.add(mrkRowId);
                hashMap.put(tableId, mrkRowIds2);
            }
        }
        logger.info("获取hashMap大小：" + hashMap.size());
        return hashMap;
    }
}

生产者：

public class Producter implements Runnable {
    private static final Logger logger = LoggerFactory.getLogger(Producter.class);

    private List<Long> tableIds;
    //线程安全阻塞队列，详细用法可自行查询
    private BlockingQueue<Object> queue;
    private CountDownLatch end;
    private Map<Long, List<MrkRowId>> mrkRowIdMap;


    public Producter(List<Long> tableIds, BlockingQueue<Object> queue, CountDownLatch end) {
        this.tableIds = tableIds;
        this.queue = queue;
        this.end = end;
        this.mrkRowIdMap = mrkRowIdMap;
    }
    @Override
    public void run() {
        for (Long tableId : tableIds) {
            long startTime = System.currentTimeMillis();
            int pageNum = 0;
            int pageSzie = 1000;
            List<MrkContent> mrkContentTotal = new ArrayList<>();
            while (true) {
                List<MrkContent> mrkContents = MrkMarketMapper.queryMrkContentByTableId(tableId, pageNum*pageSzie, pageSzie);
                if(CollectionUtils.isEmpty(mrkContents)) {
                    break;
                }
                mrkContentTotal.addAll(mrkContents);
                pageNum++;
            }
            long endTime = System.currentTimeMillis();
            logger.info("生产者：" + Thread.currentThread().getName() + "tableId:" + tableId + "，查询到MRK_CONTENT数为：" + mrkContentTotal.size() + ",处理时间为：" + (endTime - startTime) + "毫秒");
            if(!CollectionUtils.isEmpty(mrkContentTotal)) {
                try {
                	//队列满时则线程阻塞
                    queue.put(mrkContentTotal);
                    logger.info("生产者：" + Thread.currentThread().getName() + ",当前队列剩余可存储元素数量为：" + queue.remainingCapacity());
                    //更新MRK_ROWID表，该tableId 状态变为已更新状态
                    MrkMarketMapper.updateMrkTableRowStatus(tableId);
                } catch (Exception e) {
                    logger.error("放入队列异常！tableId为：" + tableId, e);
                }
            }
        }
        logger.info("生产者：" + Thread.currentThread().getName() + "生产者任务执行完毕！");
        end.countDown();
    }
}

消费者：

public class Consumer implements Runnable {
    private static final Logger logger = LoggerFactory.getLogger(Consumer.class);

    private BlockingQueue<Object> queue;
    private Map<Long, List<MrkRowId>> mrkRowIdMap;

    public Consumer(BlockingQueue<Object> queue, Map<Long, List<MrkRowId>> mrkRowIdMap) {
        this.queue = queue;
        this.mrkRowIdMap = mrkRowIdMap;
    }

    @Override
    public void run() {
        logger.info("线程" + Thread.currentThread().getName() + ",消费者线程开启！");
        int taskCount = 0;
        try {
            while (true) {
                Object item = queue.take();
                //如果对象为毒丸则跳出循环，在放毒丸回quene
                if (item == FlashMrkContentController.POISON_PILL) {
                    queue.put(item);
                    logger.info("消费者:" + Thread.currentThread().getName() + ",消费者线程结束！");
                    break;
                }
                List<MrkContent> mrkContents = (List<MrkContent>) item;
                List<MrkRowId> mrkRowIds = mrkRowIdMap.get(mrkContents.get(0).getTableId());
                if (CollectionUtils.isEmpty(mrkRowIds)) {
                    continue;
                }
                Connection connection = null;
                PreparedStatement pst = null;
                long startTime = System.currentTimeMillis();
                try {
                    connection = DataSoureUtil.getConnection();
                    connection.setAutoCommit(false);
                    //测试用，最终需要变为更新
                    pst = connection.prepareStatement("INSERT INTO MRK_CONTENT_copy1 values (?,?,?,?,?,?,?,?,?,?,?,?)");
                    boolean hadReplace = false;
                    int taskSingle = 0;
                    for (MrkContent mrkContent : mrkContents) {
                        String conttent = mrkContent.getContent();
                        JSONArray conttentJson = JSONObject.parseArray(conttent);
                        String content = replaceContent(conttentJson, mrkRowIds);
                        //未更改则无需更新
                        if ("NO".equals(content)) {
                            continue;
                        }
//                            pst.setString(1, content);
//                            pst.setLong(2, mrkContent.getMarketId());
                        pst.setLong(1, mrkContent.getMarketId());
                        pst.setString(2, "");
                        pst.setString(3, "");
                        pst.setString(4, "");
                        pst.setString(5, content);
                        pst.setString(6, "");
                        pst.setString(7, "");
                        pst.setString(8, "");
                        pst.setString(9, "");
                        pst.setString(10, "");
                        pst.setString(11, "");
                        pst.setString(12, "");
                        pst.addBatch();
                        taskCount++;
                        taskSingle++;
                        hadReplace = true;
                    }
                    if (hadReplace) {
                        pst.executeBatch();
                    }
                    connection.commit();
                    long endTime = System.currentTimeMillis();
                    logger.info("消费者：" + Thread.currentThread().getName() + "，已完成替换任务总数：" + taskCount + "单次插入时间为：" + (startTime - endTime)/1000 + "秒，" +
                            "单次任务数" + mrkContents.size() + "单次完成任务数：" + taskSingle);
                } catch (SQLException e) {
                    try {
                        connection.rollback();
                    } catch (SQLException ex) {
                        ex.printStackTrace();
                    }
                    logger.error("更新MRK_CONTENT sql 执行错误！", e);
                } finally {
                    DataSoureUtil.toClose(connection, pst);
                }

            }
        } catch (InterruptedException e) {
            logger.error("消费者" + Thread.currentThread().getName() + "消费任务报错！", e);
        }
    }


    private String replaceContent(JSONArray conttentJson, List<MrkRowId> mrkRowIds) {
        boolean updated = false;
        for (int i = 0; i < conttentJson.size(); i++) {
            JSONObject jsonObject = conttentJson.getJSONObject(i);
            JSONArray mcodes = jsonObject.getJSONArray("mcodes");
            if (mcodes != null) {
                String code = mcodes.getJSONObject(0).getString("code");
                if (StringUtils.isNotEmpty(code)) {
                    for (MrkRowId mrkRowId : mrkRowIds) {
                        if (mrkRowId.getCode().equals(code)) {
                            jsonObject.put("rowId", mrkRowId.getNewRowId());
                            updated = true;
                        }
                    }
                }
            }
        }
        if (updated) {
            return conttentJson.toString();
        }
        return "NO";
    }

}

查询更新类：

@Service
public class MrkMarketMapper {
    private static final Logger logger = LoggerFactory.getLogger(MrkMarketMapper.class);

    private static final String QUERY_ALL_MARKETTABLEID = "select DISTINCT tableId from MRK_ROWID WHERE updated = 0 ORDER BY tableId ASC";

    private static final String QUERY_MRKCONTENT_BY_TABLEID_PAGE = "select marketId, content FROM MRK_CONTENT mc LEFT JOIN MRK_MARKET mm ON mc.marketId = mm.id where mm.tableId = ? ORDER BY marketId LIMIT ?,?";

    private static final String UPDATE_MRKTABLEROW_STATUS = "UPDATE MRK_ROWID SET updated = 1 WHERE tableId = ?";

    public List<Long> queryAllMarketTableId() {
        Connection connection = DataSoureUtil.getConnection();
        List<Long> tableIds = new ArrayList<>();
        PreparedStatement stmt = null;
        ResultSet resultSet = null;
        try {
            connection.setAutoCommit(true);
            stmt = connection.prepareStatement(QUERY_ALL_MARKETTABLEID);
            resultSet = stmt.executeQuery();
            while(resultSet.next()){
                Long tableId = resultSet.getLong(1);
                tableIds.add(tableId);
            }
        } catch (SQLException e) {
            logger.error("queryAllMarketTableId sql查询报错！", e);
        } finally {
            DataSoureUtil.toClose(connection, stmt, resultSet);
        }
        return tableIds;
    }

    /**
     * 根据TableId 分页查询MrkContent
     * @param tableId
     * @return
     */
    public static List<MrkContent> queryMrkContentByTableId(Long tableId, int startIndex, int pageSize) {
        Connection connection = DataSoureUtil.getConnection();
        List<MrkContent> mrkContents = new ArrayList<>();
        PreparedStatement stmt = null;
        ResultSet resultSet = null;
        try {
            connection.setAutoCommit(true);
            stmt = connection.prepareStatement(QUERY_MRKCONTENT_BY_TABLEID_PAGE);
            stmt.setLong(1, tableId);
            stmt.setInt(2, startIndex);
            stmt.setInt(3, pageSize);
            resultSet = stmt.executeQuery();
            while(resultSet.next()){
                Long marketId = resultSet.getLong(1);
                String content = resultSet.getString(2);
                MrkContent mrkContent = new MrkContent();
                mrkContent.setMarketId(marketId);
                mrkContent.setContent(content);
                mrkContent.setTableId(tableId);
                mrkContents.add(mrkContent);
            }
        } catch (SQLException e) {
            logger.error("查询 MrkContentsql报错！tableId = " + tableId, e);
        }finally {
            DataSoureUtil.toClose(connection, stmt, resultSet);
        }
        return mrkContents;
    }

    public static int updateMrkTableRowStatus(Long tableId) {
        Connection connection = DataSoureUtil.getConnection();
        PreparedStatement stmt = null;
        ResultSet resultSet = null;
        try {
            connection.setAutoCommit(true);
            stmt = connection.prepareStatement(UPDATE_MRKTABLEROW_STATUS);
            stmt.setLong(1, tableId);
            int i = stmt.executeUpdate();
            return i;
        } catch (SQLException e) {
            logger.error("更新 MRK_ROWID 更新状态 sql报错！tableId = " + tableId, e);
        }finally {
            DataSoureUtil.toClose(connection, stmt, resultSet);
        }
        return 0;
    }

    public static List<MrkRowId> queryAllMrkRow() {
        Connection connection = DataSoureUtil.getConnection();
        List<MrkRowId> mrkRowIds = new ArrayList<>();
        PreparedStatement stmt = null;
        ResultSet resultSet = null;
        try {
            connection.setAutoCommit(true);
            stmt = connection.prepareStatement("select * from MRK_ROWID");
            resultSet = stmt.executeQuery();
            while(resultSet.next()){
                Long tableId = resultSet.getLong(1);
                String oldRowId = resultSet.getString(2);
                String newRowId = resultSet.getString(3);
                String code = resultSet.getString(4);
                Integer updated = resultSet.getInt(5);
                MrkRowId mrkRowId = new MrkRowId();
                mrkRowId.setTableId(tableId);
                mrkRowId.setOldRowId(oldRowId.trim());
                mrkRowId.setNewRowId(newRowId);
                mrkRowId.setCode(code.trim());
                mrkRowId.setUpdated(updated);
                mrkRowIds.add(mrkRowId);
            }
        } catch (SQLException e) {
            logger.error("queryAllMrkRow sql查询报错！", e);
        }finally {
            DataSoureUtil.toClose(connection, stmt, resultSet);
        }
        logger.info("mrkRowIds大小：" + mrkRowIds.size());
        return mrkRowIds;
    }
}

这里我用jdbc,只为测试操作，使用框架也是一个道理！第一次写文章，粗陋之处请多多包涵！谢谢！

huiye_2017

关注

3
点赞
踩
7

收藏

觉得还不错? 一键收藏
1
评论
如何批量处理大数据量数据库历史数据

如何批量处理大数据量数据库历史数据如何批量处理大数据量数据库历史数据，多线程生产者消费者模式，毒丸法关闭消费者线程最近，在写项目的时候，遇到这样一个需求，即对有大数据量表的历史数据替换数据需求，这张表有790万条数据，大小有80个G 有一个字段是text类型的json字符串其结构是这样的：[ { "note": "", "material": "201CRC", "port": "九江港，三水港", "price": "1000", "selectNote": [], "ra
复制链接

扫一扫