批量添加测试数据
网页生成
这个链接一次可以生成1000条数据
(blank是空白数据的占比,100%就是全null;往下滑format可以选数据格式我选的sql)
for循环
一开始想的是批量生成测试数据后,直接粘贴到控制台执行就可以了。但是50w条数据得点500次。因为对测试数据得内容没有要求,所以就想着弄个for循环来插入数据。但是等了十分钟都没搞好。。
线程池
找了个以前的作业demo来配置好线程池
yml文件
# springboot异步线程配置
async:
executor:
thread:
core_pool_size: 5 #配置核心线程数
keep_alive_seconds: 60 #设置线程活跃时间(秒)
max_pool_size: 10 #配置最大线程数
name_prefix: async-service- #配置线程池中的线程的名称前缀
queue_capacity: 100 #配置队列大小
线程池配置类
@Configuration
@EnableAsync
public class ExecutorConfig {
private static final Logger logger = LoggerFactory.getLogger(ExecutorConfig.class);
@Value("${async.executor.thread.core_pool_size}")
private int corePoolSize;
@Value("${async.executor.thread.max_pool_size}")
private int maxPoolSize;
@Value("${async.executor.thread.queue_capacity}")
private int queueCapacity;
@Value("${async.executor.thread.name_prefix}")
private String namePrefix;
@Value("${async.executor.thread.keep_alive_seconds}")
private Integer keepAliveSeconds;
@Bean(name = "asyncServiceExecutor")
public ThreadPoolTaskExecutor asyncServiceExecutor() {
logger.info("start asyncServiceExecutor");
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
// 配置核心线程数
executor.setCorePoolSize(corePoolSize);
// 配置最大线程数
executor.setMaxPoolSize(maxPoolSize);
// 设置线程活跃时间(秒)
executor.setKeepAliveSeconds(keepAliveSeconds);
// 配置队列大小
executor.setQueueCapacity(queueCapacity);
// 配置线程池中的线程的名称前缀
executor.setThreadNamePrefix(namePrefix);
// 等待所有任务结果候再关闭线程池
executor.setWaitForTasksToCompleteOnShutdown(true);
// 任务的等待时间 如果超过这个时间还没有销毁就 强制销毁,以确保应用最后能够被关闭,而不是阻塞住
executor.setAwaitTerminationSeconds(60);
// 设置拒绝策略
// rejection-policy:当pool已经达到max size的时候,如何处理新任务
// CALLER_RUNS:不在新线程中执行任务,而是有调用者所在的线程来执行
executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
// 执行初始化,初始化 core 线程
executor.initialize();
return executor;
}
}
自动导包要注意别导错了
这里自动导包导错了
应该是这俩
控制层
@Controller
public class TestController {
private static final Logger logger= LoggerFactory.getLogger(TestController.class);
@Autowired
AsyncService asyncService;
@LogAround("测试插入数据")
@PostMapping("/test")
public void test(){
CountDownLatch countDownLatch=new CountDownLatch(10000);
LocalDateTime start = LocalDateTime.now();
for(int i=0;i<10000;i++){
asyncService.executeAsyncInsertData(countDownLatch);
}
try{
countDownLatch.await();
}catch (Exception e){
logger.info("线程异常:{}",e.getMessage());
}
}
接口
public interface AsyncService {
public void executeAsyncInsertData(CountDownLatch countDownLatch);
}
实现
@Service
public class AsyncServiceImpl implements AsyncService{
@Autowired
TestDaoImpl testDao; //Mapper来的,这个demo没有就用Dao
private static final Logger logger = LoggerFactory.getLogger(AsyncServiceImpl.class);
@Async("asyncServiceExecutor")
@Override
public void executeAsyncInsertData(CountDownLatch countDownLatch) {
try{
logger.info("异步线程开始");
testDao.addTestData();
logger.info("异步线程结束");
}finally {
countDownLatch.countDown();
}
}
}
这里是用靠Spring异步注解@Async实现的,配置类Bean名字要和异步方法上的@Async名字一致
运行
。。。。上面省略。。。。。。
2024-07-07 15:03:19.087 [async-service-2] INFO [4b7c79c074bd4fc3b7d7b56d483f2a37] c.b.s.m.t.s.AsyncServiceImpl - [executeAsyncInsertData,32] - 异步线程结束
2024-07-07 15:03:19.087 [async-service-9] INFO [c291e06257c84ca9ab408f495e1f2016] c.b.s.m.t.s.AsyncServiceImpl - [executeAsyncInsertData,32] - 异步线程结束
2024-07-07 15:03:19.087 [async-service-4] INFO [75bd66d059154c068a46aea05f34d1a3] c.b.s.m.t.s.AsyncServiceImpl - [executeAsyncInsertData,32] - 异步线程结束
2024-07-07 15:03:19.087 [http-nio-8076-exec-1] INFO [] c.b.s.m.t.c.TestController - [logAround,40] - 方法结束:测试插入数据----test, 用时:12102毫秒
相比于之前for循环是快多了,数据库插入数据成功
Faker造假数据
引入Faker依赖
<dependency>
<groupId>com.github.javafaker</groupId>
<artifactId>javafaker</artifactId>
<version>1.0.2</version>
</dependency>
引入雪花漂移算法依赖(因为项目中基本用的雪花算法/UUID生成id,这里造假数据也用这个了)
<dependency>
<groupId>com.github.yitter</groupId>
<artifactId>yitter-idgenerator</artifactId>
<version>1.0.6</version>
</dependency>
控制层
@Controller
public class TestController {
@Autowired
AsyncService asyncService;
@GetMapping("/testCreateData")
public void testCreateData() throws InterruptedException {
short workId = 1;
IdGeneratorOptions options = new IdGeneratorOptions(workId);
YitIdHelper.setIdGenerator(options);
Faker faker = new Faker(); //创建JavaFaker对象
ArrayList<User> users = new ArrayList<>();
for (int i = 0; i < 500000; i++) {
Long id = YitIdHelper.nextId();
String name = faker.name().fullName();
Integer gender = (int) (10 * Math.random() % 2);
String streetAddress = faker.address().streetAddress();
String paragraph = faker.lorem().paragraph(50);
users.add(new User(id, name, gender, streetAddress, paragraph));
}
// 异步多线程 插入数据库
int batchSize = 1000;
int number = 50000;
CountDownLatch countDownLatch = new CountDownLatch(number);
for(int i = 0;i<number;i++){
List<User> batchList;
batchList = users.subList(i*batchSize,(i+1)*batchSize);
asyncService.asyncBatchAddUser(countDownLatch,batchList);
}
countDownLatch.await();
System.out.println("true")
}
}
看一下整体用时
2024-07-09 22:51:42.481 INFO 10796 --- [nio-8060-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring DispatcherServlet 'dispatcherServlet'
2024-07-09 22:51:42.482 INFO 10796 --- [nio-8060-exec-1] o.s.web.servlet.DispatcherServlet : Initializing Servlet 'dispatcherServlet'
2024-07-09 22:51:42.483 INFO 10796 --- [nio-8060-exec-1] o.s.web.servlet.DispatcherServlet : Completed initialization in 1 ms
================================省略===================
2024-07-09 22:59:37.883 DEBUG 10796 --- [async-service-8] t.n.d.m.TestDataMapper.addUserBatch : <== Updates: 1000
2024-07-09 22:59:37.935 DEBUG 10796 --- [async-service-5] t.n.d.m.TestDataMapper.addUserBatch : <== Updates: 1000
2024-07-09 22:59:37.935 DEBUG 10796 --- [async-service-2] t.n.d.m.TestDataMapper.addUserBatch : <== Updates: 1000
================================省略===================
造数据用时
2024-07-09 23:09:30.481 INFO 4264 --- [nio-8060-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring DispatcherServlet 'dispatcherServlet'
2024-07-09 23:09:30.481 INFO 4264 --- [nio-8060-exec-1] o.s.web.servlet.DispatcherServlet : Initializing Servlet 'dispatcherServlet'
2024-07-09 23:09:30.482 INFO 4264 --- [nio-8060-exec-1] o.s.web.servlet.DispatcherServlet : Completed initialization in 1 ms
================================省略===================
2024-07-09 23:15:58.487 ERROR 4264 --- [nio-8060-exec-1] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet
这里因为System.out.println()失灵了,所以抛出异常来记录用时。整体用了八分钟,其中造50w条数据竟然用了六分钟。。后面试试用python来造测试数据吧
发现了一个事情,数据早就插入数据库成功了,但是控制台只有Mapper插入相关的日志,没有System.out.println()的输出。
去查了相关信息,意外发现System.out.println()内部用了synchronized
后面想想可能是因为输出顺序是乱的被刷上去了/在日志缓冲区里还没输出?
关于ArrayList.subList()
我一开始分割用户列表的时候,用的ArrayList去接收,但是发现会报错。后面点进去看了一下,发现它内部不是建了一个ArrayList返回,而是它自己的内部类SubList,所以它虽然返回了个List但是要清除实际上是SubList,如果对分割后的列表进行改动的话,会影响到原列表。
private static class SubList<E> extends AbstractList<E> implements RandomAccess {
private final ArrayList<E> root;
private final SubList<E> parent;
private final int offset;
private int size;
/**
* Constructs a sublist of an arbitrary ArrayList.
*/
public SubList(ArrayList<E> root, int fromIndex, int toIndex) {
this.root = root;
this.parent = null;
this.offset = fromIndex;
this.size = toIndex - fromIndex;
this.modCount = root.modCount;
}
使用EasyExcel导出数据
分页查询:
SQL优化
User类
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class User {
@ExcelProperty(index = 0, converter = UserNoConverter.class)
private Integer id;
@ExcelProperty(value = "名")
private String firstName;
@ExcelProperty(value = "姓")
private String lastName;
@ExcelProperty(value = "邮箱")
private String email;
@ExcelProperty(index = 4, converter = GenderConverter.class)
private String gender;
@ExcelProperty(index = 5)
private String ipAddress;
@ExcelIgnore
private String contract;
@ExcelProperty(index = 7)
private String product;
}
``
UserNoConverter转换器
/**
* @Description 从excel-》数据库在id前加上uno: 从数据库(含uno:)-》excel不做操作(也可以拆字符串不要uno:)
* @Author supeki
* @Date 2024/7/10 23:07
* @Version 1.0
*/
public class UserNoConverter implements Converter<Long> {
/**
* 支持的java类型
* @return
*/
@Override
public Class<?> supportJavaTypeKey() {
return Long.class;
}
/**
* 支持的excel值类型
* @return
*/
@Override
public CellDataTypeEnum supportExcelTypeKey() {
return CellDataTypeEnum.STRING;
}
/**
* 这里读的时候会调用
*
* @param context
* @return
*/
@Override
public Long convertToJavaData(ReadConverterContext<?> context) {
return Long.valueOf(context.getReadCellData().getStringValue());
}
/**
* 这里是写的时候会调用(数据库啥样,excel啥样)
*
* @return
*/
@Override
public WriteCellData<?> convertToExcelData(WriteConverterContext<Long> context) {
return new WriteCellData<>(String.valueOf(context.getValue()));
}
}
GenderConverter性别转换器
/**
* @Description 数据库存的0-》excel存男;如果要换成male/female来表示男女的话,记得接口泛型要改成String;
* @Author supeki
* @Date 2024/7/10 23:05
* @Version 1.0
*/
public class GenderConverter implements Converter<Integer> { //java里什么类型这里就填什么类型
@Override
public Class<?> supportJavaTypeKey() {
return Integer.class;
}
@Override
public CellDataTypeEnum supportExcelTypeKey() {
return CellDataTypeEnum.STRING;
}
/**
* 从excel读数据时候调用
* @param cellData
* @param contentProperty
* @param globalConfiguration
* @return
*/
@Override
public Integer convertToJavaData(ReadCellData<?> cellData, ExcelContentProperty contentProperty,
GlobalConfiguration globalConfiguration) {
String value = cellData.getStringValue();
if (StringUtils.isBlank(value)){
// 未知
return -1;
}
if (value.indexOf(“男”) != -1) {
return 0;
}
if (value.indexOf(”女“) != -1) {
return 1;
}
return -1;
}
/**
* 写数据到excel里面
* @param context
* @return
*/
@Override
public WriteCellData<?> convertToExcelData(WriteConverterContext<Integer> context) {
Integer value = context.getValue();
if (Objects.equals(value, 0)) {
return new WriteCellData<>("男");
}
if (Objects.equals(value, 1)) {
return new WriteCellData<>("女");
}
return new WriteCellData<>("未知");
}
}
SQL
@Mapper
@Repository
public interface TestDataMapper {
@Insert({
"<script>",
"insert into test_data(id,name,gender,street_address,paragraph) values ",
"<foreach collection='item' item='item' index='index' separator=','>",
"(#{item.id}, #{item.name},#{item.gender},#{item.streetAddress},#{item.paragraph})",
"</foreach>",
"</script>"
})
int addUserBatch(@Param(value="item") List<User> userList);
@Select("select * from test_data where id >= (select id from test_data limit #{start},1) limit #{end}")
List<User> getDataByPage(int start,int end);
@Select("select * from test_data")
List<User> getAllData();
@Select("select count(*) from test_data")
Integer getAllDataSize();
}
控制层
@GetMapping("/testExcelWrite")
public void testExcelWrite(){
String fileName = "D:\\export.xlsx";
long l1 = System.currentTimeMillis();
List<User> users = asyncService.getAllData();
int size = users.size();
try(ExcelWriter excelWriter= EasyExcel.write(fileName,User.class).build()){
WriteSheet writeSheet=EasyExcel.writerSheet("用户信息").build();
for(int i=0;i<size;i+=1000){
List<User> users1 = users.subList(i, i + 1000);
excelWriter.write(users1,writeSheet);
}
}
logger.info("总耗时:{}",System.currentTimeMillis()-l1);
}
这里是将所有数据一次性查出,但是数据量大的话可能会导致OOM
总耗时
2024-07-10 23:41:24.348 INFO 29116 --- [nio-8060-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring DispatcherServlet 'dispatcherServlet'
2024-07-10 23:41:24.348 INFO 29116 --- [nio-8060-exec-1] o.s.web.servlet.DispatcherServlet : Initializing Servlet 'dispatcherServlet'
2024-07-10 23:41:24.350 INFO 29116 --- [nio-8060-exec-1] o.s.web.servlet.DispatcherServlet : Completed initialization in 1 ms
====================省略===================
2024-07-10 23:43:47.803 INFO 29116 --- [nio-8060-exec-1] top.naccl.dwz.controller.TestController : 总耗时:143429
控制层(分页查询)
@GetMapping("/testExcelWrite")
public void testExcelWrite(){
String fileName = "D:\\export.xlsx";
long l1 = System.currentTimeMillis();
Integer size = asyncService.getAllDataSize();
try(ExcelWriter excelWriter= EasyExcel.write(fileName,User.class).build()){
WriteSheet writeSheet=EasyExcel.writerSheet("用户信息").build();
for(int i=0;i<size;i+=1000){
List<User> users1 = asyncService.getDataByPage(i, 1000);
excelWriter.write(users1,writeSheet);
}
}
logger.info("总耗时:{}",System.currentTimeMillis()-l1);
}
总耗时
2024-07-10 23:55:00.212 INFO 10696 --- [nio-8060-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring DispatcherServlet 'dispatcherServlet'
2024-07-10 23:55:00.213 INFO 10696 --- [nio-8060-exec-1] o.s.web.servlet.DispatcherServlet : Initializing Servlet 'dispatcherServlet'
2024-07-10 23:55:00.214 INFO 10696 --- [nio-8060-exec-1] o.s.web.servlet.DispatcherServlet : Completed initialization in 1 ms
====================省略===================
2024-07-11 00:12:48.008 INFO 10696 --- [nio-8060-exec-1] top.naccl.dwz.controller.TestController : 总耗时:1067759
使用EasyExcel导出数据
监听器
package top.naccl.dwz.config;
import com.alibaba.excel.context.AnalysisContext;
import com.alibaba.excel.event.AnalysisEventListener;
import com.alibaba.excel.util.ListUtils;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import top.naccl.dwz.entity.User;
import top.naccl.dwz.mapper.TestDataMapper;
import top.naccl.dwz.service.AsyncService;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
/**
* @Description TODO
* @Author supeki
* @Date 2024/7/11 0:00
* @Version 1.0
*/
@Slf4j
public class UserExcelListener extends AnalysisEventListener<User> {
/**
* 不能交给spring管理,即不能加@Autowired那些注解
*/
AsyncService asyncService;
public UserExcelListener(AsyncService asyncService){
this.asyncService = asyncService;
}
public UserExcelListener(){} //不能省略无参构造
/**
* 每隔1000条处理下,然后清理list ,方便内存回收
*/
private static final int BATCH_COUNT = 1000;
/**
* 缓存的数据
*/
private List<User> cachedDataList = ListUtils.newArrayListWithExpectedSize(BATCH_COUNT);
/**
*
* @param exception
* @param context
* @throws Exception
*/
@Override
public void onException(Exception exception, AnalysisContext context) throws Exception {
log.error("======>>>解析异常:", exception);
throw exception;
}
/**
* 当读取到一行数据时,会调用这个方法,并将读取到的数据以及上下文信息作为参数传入
* 可以在这个方法中对读取到的数据进行处理和操作,处理数据时要注意异常错误,保证读取数据的稳定性
* @param user
* @param context
*/
@Override
public void invoke(User user, AnalysisContext context) {
log.info("解析到一条数据:{}", user);
cachedDataList.add(user);
if (cachedDataList.size() == BATCH_COUNT) {
// 处理缓存的数据,比如说入库。。。
asyncService.addUserBatchWrite(cachedDataList);
// 然后清空
cachedDataList.clear();
}
}
/**
* 当每个sheet所有数据读取完毕后,会调用这个方法,可以在这个方法中进行一些收尾工作,如资源释放、数据汇总等。
* @param context
*/
@Override
public void doAfterAllAnalysed(AnalysisContext context) {
// 收尾工作,处理剩下的缓存数据。。。
try {
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
log.info("sheet={} 所有数据解析完成!", context.readSheetHolder().getSheetName());
}
}
控制层
@GetMapping("/TestExcelRead")
public void testExcelRead() {
String fileName = "D:\\export.xlsx";
long l1 = System.currentTimeMillis();
EasyExcel.read(fileName, User.class, new UserExcelListener(asyncService)).sheet().doRead();
logger.info("总耗时:{}",System.currentTimeMillis()-l1);
}
总耗时
2024-07-11 22:56:52.700 INFO 18992 --- [nio-8060-exec-1] top.naccl.dwz.config.UserExcelListener : sheet=用户信息 所有数据解析完成!
2024-07-11 22:56:52.701 INFO 18992 --- [nio-8060-exec-1] top.naccl.dwz.controller.TestController : 总耗时:134088