在实际业务中,需要从数据库中(千万级别)的数据,批量进行处理,将数据同步到索引服务器中。
核心问题:
1、使用多线程查询数据库中的数据,例如我们有10个线程,每一个线程每次分页步长为10,数据一个10000条。代码如下:
public class Demo18Sequence {
private static int nThreads = 10;
private static int pageSize = 10;
private static int totalNum = 10000;
public static void main(String[] args) {
try {
CountDownLatch latch = new CountDownLatch(nThreads);
ExecutorService newFixedThreadPool = Executors.newFixedThreadPool(nThreads);
long startTime = System.currentTimeMillis();
TaskRunnable.start = -pageSize;
for (int i = 0; i < nThreads; i++) {
TaskRunnable taski = new TaskRunnable(i+1,pageSize,totalNum,latch);
newFixedThreadPool.execute(taski);
}
latch.await();
System.out.println(String.format("任务执行成功,耗时{%s}毫秒", System.currentTimeMillis() - startTime));
newFixedThreadPool.shutdown();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
class TaskRunnable implements Runnable {
private int pageSize;
private int totalNum;
private int startNum;
private CountDownLatch latch;
public static Integer start;
private int threadi;
public TaskRunnable(int threadi,int pageSize, int totalNum,CountDownLatch latch) {
this.latch = latch;
this.pageSize = pageSize;
this.totalNum =totalNum;
this.threadi = threadi;
}
public void getNext(){
while (start+this.pageSize < totalNum) {
synchronized(start) {
start += this.pageSize;
startNum = start;
}
int result = start+pageSize;
System.out.println(Thread.currentThread().getName()+"["+threadi+"] "+"startNum="+startNum+" pageSize="+result);
}
}
@Override
public void run() {
try {
getNext();
}catch(Exception ex) {
ex.printStackTrace();
}finally {
latch.countDown();
}
}
}
2、mysql数据量大时使用limit分页,随着页码的增大,查询效率越低下。本文给大家分享的是作者在使用mysql进行千万级数据量分页查询的时候进行性能优化的方法。
传统分页查询:SELECT c1,c2,cn… FROM table LIMIT n,m
MySQL的limit工作原理就是先读取前面n条记录,然后抛弃前n条,读后面m条想要的,所以n越大,偏移量越大,性能就越差。
如果我们也要查询所有列,有两种方法,一种是id>=的形式,另一种就是利用join,看下实际情况
<!-- 表数据行数较多自增ID分页 -->
<select id="selectBigDataPageForIncrementId" resultType="map">
SELECT
<include refid="answerSheetColumns"/>
FROM answer_sheet a
<where>
id >= (SELECT id FROM answer_sheet ORDER BY id LIMIT #{pageNo}, 1)
</where>
ORDER BY a.id
LIMIT #{size}
</select>
<!-- 表数据行数较多分页 -->
<select id="selectBigDataPage" resultType="map">
SELECT
<include refid="answerSheetColumns"/>
FROM
answer_sheet a
JOIN (
SELECT
id
FROM
answer_sheet
ORDER BY
id
LIMIT #{pageNo}, #{size}
) b ON a.id = b.id
</select>