（六）使用多线程批量数据库千万级别业务数据与mysql数据量大时使用limit分页

最新推荐文章于 2024-08-23 09:24:56 发布

mtsbv110

最新推荐文章于 2024-08-23 09:24:56 发布

阅读量1.1k

点赞数 1

分类专栏：多线程文章标签： java

本文链接：https://blog.csdn.net/mtsbv110/article/details/105855954

版权

多线程专栏收录该内容

8 篇文章 0 订阅

订阅专栏

在实际业务中，需要从数据库中（千万级别）的数据，批量进行处理，将数据同步到索引服务器中。

核心问题：

1、使用多线程查询数据库中的数据，例如我们有10个线程，每一个线程每次分页步长为10，数据一个10000条。代码如下：

public class Demo18Sequence {

	private static int nThreads = 10;
	private static int pageSize = 10;
	private static int totalNum = 10000;
	
	public static void main(String[] args) {
		try {
			CountDownLatch latch = new CountDownLatch(nThreads);
			ExecutorService newFixedThreadPool = Executors.newFixedThreadPool(nThreads);
			long startTime = System.currentTimeMillis();
			TaskRunnable.start = -pageSize;
			for (int i = 0; i < nThreads; i++) {
				TaskRunnable taski = new TaskRunnable(i+1,pageSize,totalNum,latch);
				newFixedThreadPool.execute(taski);
			}
			latch.await();
			System.out.println(String.format("任务执行成功,耗时{%s}毫秒", System.currentTimeMillis() - startTime));
			newFixedThreadPool.shutdown();
		} catch (InterruptedException e) {
			e.printStackTrace();
		}
	}

}

class TaskRunnable implements Runnable {
	private  int pageSize;
	private  int totalNum;
	private  int startNum;
	private CountDownLatch latch;
	public static Integer start;
	private int threadi;
	public TaskRunnable(int threadi,int pageSize, int totalNum,CountDownLatch latch) {
		this.latch = latch;
		this.pageSize = pageSize;
		this.totalNum =totalNum;
		this.threadi = threadi;
	}

	public void getNext(){
		while (start+this.pageSize < totalNum) {
			synchronized(start) {
				start += this.pageSize;
				startNum = start;
			}
			int result = start+pageSize;
			System.out.println(Thread.currentThread().getName()+"["+threadi+"] "+"startNum="+startNum+" pageSize="+result);
		}
	}
	
	@Override
	public void run() {
		try {
			getNext();
		}catch(Exception ex) {
			ex.printStackTrace();
		}finally {
			latch.countDown();
		}
	}
}

2、mysql数据量大时使用limit分页，随着页码的增大，查询效率越低下。本文给大家分享的是作者在使用mysql进行千万级数据量分页查询的时候进行性能优化的方法。

传统分页查询：SELECT c1,c2,cn… FROM table LIMIT n,m

MySQL的limit工作原理就是先读取前面n条记录，然后抛弃前n条，读后面m条想要的，所以n越大，偏移量越大，性能就越差。

如果我们也要查询所有列，有两种方法，一种是id>=的形式，另一种就是利用join，看下实际情况

<!-- 表数据行数较多自增ID分页 -->
	<select id="selectBigDataPageForIncrementId" resultType="map">
		SELECT
        <include refid="answerSheetColumns"/>
        FROM answer_sheet a
        <where>
        	id >= (SELECT id FROM answer_sheet ORDER BY id LIMIT #{pageNo}, 1)
        </where>
        ORDER BY a.id
        LIMIT #{size}
	</select>
	
<!-- 表数据行数较多分页 -->
	<select id="selectBigDataPage" resultType="map">
		SELECT
		<include refid="answerSheetColumns"/>
		FROM
			answer_sheet a
		JOIN (
			SELECT
				id
			FROM
				answer_sheet
			ORDER BY
				id
			LIMIT #{pageNo}, #{size}
		) b ON a.id = b.id
	</select>