背景:将一个excel文档中几万条业务数据填充完整,所缺失的数据来自线上mysql,而我们恰好有线上备库相关表的查询权限。这并不是业务需求,只是帮业务人员一个小忙。
方法很简单,先从excel中读数据,再从mysql中读数据,再写入excel文档,没有技术难点。带着急速完成显摆一下的目的,搞清excel文档所缺失的数据来自哪张表哪个字段之后,立马打开intellij。
关键是完成的速度嘛,不管美观。读写xls,以前用过jxl,于是首先拿过来jxl用。
<dependency> <groupId>net.sourceforge.jexcelapi</groupId> <artifactId>jxl</artifactId> <version>2.6.12</version> </dependency> <dependency> <groupId>mysql</groupId> <artifactId>mysql-connector-java</artifactId> <version>5.1.27</version> <scope>compile</scope> </dependency>
try { Workbook book = Workbook.getWorkbook(FetchChannelData2XLSXDocument.class.getResourceAsStream(XLS)); Sheet sheet = book.getSheet(0); for (int i = 1; i < sheet.getRows(); i++) { Cell cell = sheet.getCell(0, i); String contents = cell.getContents(); if (contents != null && contents.length() > 0) { keys.add(contents); } } book.close(); System.out.println("读取xls成功:数据行数:" + (sheet.getRows() - 1) + ";map size:" + result.size()); System.out.println("第一条:" + sheet.getCell(0, 1).getContents()); System.out.println("第末条:" + sheet.getCell(0, sheet.getRows() - 1)); } catch (Exception e) { throw new RuntimeException("读取xls失败", e); }
private static Connection getConnection() { try { Connection connection = DriverManager.getConnection(URL, USER, PASSWORD); log("***************************获取连接成功***********"); return connection; } catch (SQLException e) { log("***************************获取连接失败:", e.getMessage()); try { Thread.sleep(2000); } catch (InterruptedException e1) { } return getConnection(); } }
private static String query(Connection connection, String param) throws SQLException { PreparedStatement preparedStatement = connection.prepareStatement(SQL); preparedStatement.setString(1, param); ResultSet resultSet = preparedStatement.executeQuery(); if (resultSet.next()) { return resultSet.getString(1); } else { return ""; } }
Connection connection = getConnection(); int nullCount = 0; int emptyCount = 0; for (int i = 0; i < keys.size(); i++) { String key = keys.get(i); String value = "null"; try { value = query(connection, key); } catch (SQLException e) { log("查询失败", key, e.getMessage()); connection = getConnection(); } if ("null".equals(value)) { nullCount++; } if ("".equals(value)) { emptyCount++; } result.put(key, value); int count = QUERY_COUNT.incrementAndGet(); if ((count % 1000) == 0) { log("正常加载过程", count); } }
就这样写了起来,写完一运行,有问题。
抛错
Caused by: jxl.read.biff.BiffException: Unable to recognize OLE stream
at jxl.read.biff.CompoundFile.<init>(CompoundFile.java:116)
at jxl.read.biff.File.<init>(File.java:127)
at jxl.Workbook.getWorkbook(Workbook.java:268)
at jxl.Workbook.getWorkbook(Workbook.java:253)
原来是jxl这个版本支持xls 2003,一看没有与时俱进果断放弃了,网上找了apache poi,原来这个是专业的,不仅xls,ms-office的很多文档都支持,后边文档介绍下这个。
首先依赖:
<dependency> <groupId>org.apache.poi</groupId> <artifactId>poi</artifactId> <version>3.10-FINAL</version> </dependency> <dependency> <groupId>org.apache.poi</groupId> <artifactId>poi-ooxml</artifactId> <version>3.10-FINAL</version> </dependency>读xls
try { XSSFWorkbook workbook2007 = new XSSFWorkbook(FetchChannelData2XLSXDocument.class.getResourceAsStream(XLS)); XSSFSheet sheet = workbook2007.getSheetAt(0);// 取出第一个工作表,索引是0 // 开始循环遍历行,表头不处理,从1开始 for (int i = 1; i <= sheet.getLastRowNum(); i++) { XSSFCell cell = sheet.getRow(i).getCell(0);// 获取单元格对象 String contents = cell.getStringCellValue(); if (contents != null && contents.length() > 0) { keys.add(contents); } } log("读取xls成功:数据行数:" + (sheet.getLastRowNum()) + ";map size:" + result.size()); log("第一条:" + sheet.getRow(1).getCell(0).getStringCellValue()); log("第末条:" + sheet.getRow(sheet.getLastRowNum()).getCell(0).getStringCellValue()); } catch (Exception e) { throw new RuntimeException("读取xls失败", e); }写xls
try { XSSFSheet sheet = workbook2007.getSheetAt(0);// 取出第一个工作表,索引是0 int count = 0; int writeCount = 0; // 开始循环遍历行,表头不处理,从1开始 for (int i = 1; i <= sheet.getLastRowNum(); i++) { XSSFCell write2cell = sheet.getRow(i).createCell(17);// 获取单元格对象 String key = sheet.getRow(i).getCell(0).getStringCellValue(); if (key != null && key.length() > 0) { count++; String value = result.get(key); if (value == null || "null".equals(value) || "".equals(value)) { continue; } if (CHANNEL_NAMES.containsKey(value)) { value = CHANNEL_NAMES.get(value); } write2cell.setCellValue(value); writeCount++; } } log("写入xls" + "总条数" + count + "有值写入" + writeCount); File file = new File(NEW_XLS_FILE_NAME); workbook2007.write(new FileOutputStream(file)); log("保存成功:文件路径:", file.getAbsolutePath()); } catch (Exception e) { throw new RuntimeException("写入xls失败", e); }
ok了,跑起来了。另一个问题,10万条左右数据,一条一条查询也太慢了,等得受不了,加个多线程吧。
int times = keys.size() / 10000 + 1; final CountDownLatch latch = new CountDownLatch(times); for (int i = 0; i < times; i++) { final int finalI = i; executor.execute(new Runnable() { @Override public void run() { Connection connection = getConnection(); long startTime = System.currentTimeMillis(); loadDataFromDB(connection, finalI * 10000, 10000); log(" cost", (System.currentTimeMillis() - startTime)); latch.countDown(); } }); } try { latch.await(); } catch (InterruptedException e) { e.printStackTrace(); }这样快多了,同事的需求完成了,前后也花了两三个小时,关键是apache poi的入手使用碰到了问题。
1.依赖jar包,不仅仅是poi还有,poi-ooxml。
poi-version-yyyymmdd.jar
用于操作.xls文件;依赖于commons-logging, commons-codec, log4j;
poi-scratchpad-version-yyyymmdd.jar
用于操作.ppt、.doc、.vsd、.pub、.msg文件;依赖于poi;
poi-ooxml-version-yyyymmdd.jar、poi-ooxml-schemas-version-yyyymmdd.jar
用于操作.xlsx、.pptx、docx文件;依赖于poi, dom4j,xmlbeans, stax-api-1.0.1;操作Excel主要是指ss包、xssf包;
2.改完xls的内容,要写回文档,才能有效。
workbook2007.write(new FileOutputStream(file));
apache poi是什么?怎么用?有现成的文档,贴出来,自己就不写了。
1. http://blog.sina.com.cn/s/blog_667ac0360102eaf8.html
2. http://poi.apache.org/
3. http://www.yiibai.com/apache_poi/