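I. Background:
Of the more than a million product records pulled from the JD.com (京东) API, some have incompletely downloaded images. These products need to be identified and their images fetched again.
II. Implementation steps:
- In the Spring Boot project, define a custom thread pool by adding a configuration class: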
```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

import java.util.concurrent.Executor;

@Configuration
@EnableAsync // enable asynchronous (@Async) task execution
public class AsyncConfiguration {

    private static final Logger logger = LoggerFactory.getLogger(AsyncConfiguration.class);

    // Declare a thread pool and give the bean a name
    @Bean("taskExecutor")
    public Executor asyncExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        // Core pool size 5: threads kept in the pool even when idle (created as tasks arrive)
        executor.setCorePoolSize(5);
        // Max pool size 5: threads beyond the core size are only created once the queue is full;
        // since max == core here, the pool never grows beyond 5 threads
        executor.setMaxPoolSize(5);
        // Queue capacity 500: tasks waiting for a free thread
        executor.setQueueCapacity(500);
        // Keep-alive 60 s: idle threads above the core size are destroyed after this time
        executor.setKeepAliveSeconds(60);
        // Thread name prefix: makes it easy to tell which pool a thread belongs to
        executor.setThreadNamePrefix("DailyAsync-");
        // Initialize the executor
        executor.initialize();
        logger.info("Thread pool initialized");
        return executor;
    }
}
```
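The following is a rough plain-JDK equivalent of these settings. It is an illustration only: it does not reproduce everything `ThreadPoolTaskExecutor` does, and the class name `DailyPool` is hypothetical. With core = max = 5 and a bounded queue of 500, at most 5 tasks run at once and up to 500 more wait. Further submissions are rejected by the JDK's default `AbortPolicy`.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: a plain JDK pool using the same numbers as AsyncConfiguration
class DailyPool {
    static ThreadPoolExecutor create() {
        AtomicInteger seq = new AtomicInteger(1);
        // Threads are named DailyAsync-1, DailyAsync-2, ... (similar to setThreadNamePrefix)
        ThreadFactory factory = r -> new Thread(r, "DailyAsync-" + seq.getAndIncrement());
        return new ThreadPoolExecutor(
                5,                               // core pool size
                5,                               // max pool size: equal to core, so the pool never grows
                60, TimeUnit.SECONDS,            // keep-alive for threads above the core size
                new LinkedBlockingQueue<>(500),  // bounded queue holding up to 500 waiting tasks
                factory);                        // no handler given, so the default AbortPolicy applies
    }
}
```

The driver submits only 10 tasks per batch, so a queue of 500 leaves ample headroom.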
- In the service layer, define the check method and annotate it with @Async("taskExecutor"):
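```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.util.StringUtils;

@Service
public class ImgCheckService { // class name is illustrative

    private static final Logger logger = LoggerFactory.getLogger(ImgCheckService.class);
    // Captures the src attribute of every <img> tag
    private static final String URL_REG = "<img[^<>]*?\\ssrc=['\"]?(.*?)['\"]";
    private static final Pattern ATTR_PATTERN = Pattern.compile(URL_REG, Pattern.CASE_INSENSITIVE);

    @Async("taskExecutor")
    public void checkPart(List<Product> list, CountDownLatch countDL) {
        try {
            Out:
            for (Product product : list) {
                // Image URLs found in this product's detail HTML (fresh list per product)
                List<String> urlList = new ArrayList<>();
                // Product detail HTML
                String str = product.getIntroduction();
                if (StringUtils.hasText(str)) {
                    // Match every img src with the regex and collect it in urlList
                    Matcher matcher = ATTR_PATTERN.matcher(str);
                    while (matcher.find()) {
                        urlList.add(matcher.group(1));
                    }
                }
                for (String item : urlList) {
                    // The src value is checked as a local file path
                    File file = new File(item);
                    if (!file.exists()) {
                        logger.info(product.getSku() + " is missing images");
                        // ... save the product with incomplete images to the database
                        continue Out;
                    }
                }
            }
        } finally {
            // Count down when this task ends, even on an exception, so await() cannot hang
            countDL.countDown();
        }
    }
}
```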
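The regex and file-existence steps can be tried in isolation. The sketch below reuses the article's pattern in a standalone, hypothetical `ImgSrcExtractor` class, with no Spring and no `Product` type. Like the service, it treats each src value as a local file path, which is an assumption about how the images were stored. Note that the pattern expects the src value to be followed by a quote.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone version of the extraction and existence checks
class ImgSrcExtractor {
    // Same regex as the service: group(1) is the src value of an <img> tag
    private static final Pattern ATTR_PATTERN =
            Pattern.compile("<img[^<>]*?\\ssrc=['\"]?(.*?)['\"]", Pattern.CASE_INSENSITIVE);

    // Returns every img src found in the HTML, in document order
    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        if (html == null || html.trim().isEmpty()) {
            return urls;
        }
        Matcher matcher = ATTR_PATTERN.matcher(html);
        while (matcher.find()) {
            urls.add(matcher.group(1));
        }
        return urls;
    }

    // True if at least one src does not exist as a local file
    static boolean hasMissingImage(List<String> paths) {
        for (String p : paths) {
            if (!new File(p).exists()) {
                return true;
            }
        }
        return false;
    }
}
```

For example, `extract("<p><img class=\"a\" src=\"/data/img/1.jpg\"/><IMG SRC='/data/img/2.png'></p>")` returns `[/data/img/1.jpg, /data/img/2.png]`. Case-insensitive matching picks up the uppercase tag too.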
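- Call the asynchronous method. @Async only takes effect when checkPart is invoked through the Spring proxy, i.e. from another bean. Calling it on `this` inside the same class runs it synchronously: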
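```java
// Checkpoint file holding the last id that has been checked
private static final String LAST_VALUE_FILE = "d:/last_value.txt";

// ImgCheckService/imgCheckService: illustrative names for the bean that declares checkPart.
// It must be a different bean: calling this.checkPart() bypasses the @Async proxy.
@Autowired
private ImgCheckService imgCheckService;

public void checkImg() throws Exception {
    File file = new File(LAST_VALUE_FILE);
    int maxId = 3000000; // should be fetched with a SQL query for the max (auto-increment) id
    // Loop instead of recursing into checkImg(): no deep call stack, and finished batches are not kept alive
    while (true) {
        // Read the id where the previous batch stopped
        int lastValue;
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            lastValue = Integer.parseInt(reader.readLine().trim());
        }
        // Stop once past the max id
        if (lastValue > maxId) {
            return;
        }
        // Query the next batch (where id > lastValue and id <= lastValue + 1000)
        List<Product> checkList = productMapper.query(lastValue);
        if (checkList != null && !checkList.isEmpty()) {
            // splitList divides checkList into 10 roughly equal parts (not shown here)
            List<List<Product>> splitProduct = splitList(checkList);
            // Size the latch from the parts actually returned instead of assuming exactly 10
            CountDownLatch countDL = new CountDownLatch(splitProduct.size());
            for (List<Product> part : splitProduct) {
                imgCheckService.checkPart(part, countDL);
            }
            // Continue only after every part has finished
            countDL.await();
        }
        // Store the new last_value
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(file))) {
            writer.write(String.valueOf(lastValue + 1000));
        }
        // Brief pause between batches (the author's precaution against memory pressure)
        Thread.sleep(500);
    }
}
```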
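The original does not show splitList. One possible version is sketched below in plain JDK code, together with the split/submit/await pattern the driver relies on. `BatchRunner`, this splitList signature and `runBatch` are hypothetical. A fixed pool of 5 threads stands in for the Spring `taskExecutor`, and the work inside each task is a placeholder for the real image check. Empty chunks are never returned, so a latch sized from the result always reaches zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helpers illustrating the split / submit / await pattern
class BatchRunner {
    // Divides list into at most `parts` chunks whose sizes differ by at most 1;
    // empty chunks are never returned, so the result may have fewer than `parts` entries
    static <T> List<List<T>> splitList(List<T> list, int parts) {
        List<List<T>> result = new ArrayList<>();
        int base = list.size() / parts;
        int extra = list.size() % parts;
        int from = 0;
        for (int i = 0; i < parts && from < list.size(); i++) {
            int len = base + (i < extra ? 1 : 0);
            result.add(new ArrayList<>(list.subList(from, from + len)));
            from += len;
        }
        return result;
    }

    // Runs one task per chunk, waits for all of them with a CountDownLatch,
    // and returns how many items were processed
    static int runBatch(List<Integer> batch, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(5);
        AtomicInteger processed = new AtomicInteger();
        List<List<Integer>> chunks = splitList(batch, parts);
        // One count per submitted chunk, not a fixed 10
        CountDownLatch latch = new CountDownLatch(chunks.size());
        try {
            for (List<Integer> chunk : chunks) {
                pool.execute(() -> {
                    try {
                        processed.addAndGet(chunk.size()); // placeholder for the real image check
                    } finally {
                        latch.countDown(); // always count down, even if the work throws
                    }
                });
            }
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return processed.get();
    }
}
```

For 25 items and 10 parts, this splitList yields chunk sizes 3, 3, 3, 3, 3, 2, 2, 2, 2, 2. For 3 items, it yields three single-item chunks instead of ten.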
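Note: the source code lives on an internal network and could not be tested locally, since the images are not available there. The code above was therefore typed out by hand, so please excuse any mistakes; the overall approach should be clear.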