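Exploring how to write large-data Excel files in Java (Part 1)
Requirement: each Excel file must be able to hold up to 10 million rows of data; this is the upper limit.
Candidate approaches:
1. JXL can only handle the Excel 2003 format, which is limited to 65,536 rows per sheet; rejected.
2. Writing with POI: with basic styles, the JVM runs out of memory at around 150,000 rows; without styles, at a little over 250,000 rows. Watching the resource monitor confirmed that most of the memory goes to creating cell objects, so simply increasing the JVM heap size does not fix the root cause. Rejected, with the style-free variant kept as a fallback if nothing else works.
3. Excel can also read and write its data as XML, so start directly from plain XML reading and writing. This article explores, step by step, how to write a large-data Excel file as XML. The result is not perfect, but it meets the requirement.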
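Exploration steps:
The steps below draw on many demos found through Baidu searches and borrow a lot of their ideas.
Exploring the principle:
Flow: excel ---> xml ---> excel
excel to xml:
① First create an Excel 2003 xls file named test.xls;
② Enter some data in each of the three default sheets, so that the effect is visible in the XML;
③ Save it as an XML Spreadsheet 2003 file;
④ Open the XML in Notepad++; the content looks like this: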
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Created>2006-09-16T00:00:00Z</Created>
<LastSaved>2006-09-16T00:00:00Z</LastSaved>
<Version>14.00</Version>
</DocumentProperties>
<OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
<AllowPNG/>
<RemovePersonalInformation/>
</OfficeDocumentSettings>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>8010</WindowHeight>
<WindowWidth>14805</WindowWidth>
<WindowTopX>240</WindowTopX>
<WindowTopY>105</WindowTopY>
<ActiveSheet>2</ActiveSheet>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font ss:FontName="宋体" x:CharSet="134" ss:Size="11" ss:Color="#000000"/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table ss:ExpandedColumnCount="5" ss:ExpandedRowCount="2" x:FullColumns="1"
x:FullRows="1" ss:DefaultColumnWidth="54" ss:DefaultRowHeight="13.5">
<Row>
<Cell><Data ss:Type="Number">1</Data></Cell>
<Cell><Data ss:Type="Number">1</Data></Cell>
<Cell><Data ss:Type="Number">1</Data></Cell>
<Cell><Data ss:Type="Number">1</Data></Cell>
<Cell><Data ss:Type="Number">1</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="Number">2</Data></Cell>
<Cell><Data ss:Type="Number">2</Data></Cell>
<Cell><Data ss:Type="Number">2</Data></Cell>
<Cell><Data ss:Type="Number">2</Data></Cell>
<Cell><Data ss:Type="Number">2</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<PageSetup>
<Header x:Margin="0.3"/>
<Footer x:Margin="0.3"/>
<PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
</PageSetup>
<Print>
<ValidPrinterInfo/>
<PaperSizeIndex>9</PaperSizeIndex>
<HorizontalResolution>600</HorizontalResolution>
<VerticalResolution>600</VerticalResolution>
</Print>
<Panes>
<Pane>
<Number>3</Number>
<ActiveRow>27</ActiveRow>
<ActiveCol>1</ActiveCol>
</Pane>
</Panes>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
<Worksheet ss:Name="Sheet2">
<Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="2" x:FullColumns="1"
x:FullRows="1" ss:DefaultColumnWidth="54" ss:DefaultRowHeight="13.5">
<Row>
<Cell><Data ss:Type="Number">3</Data></Cell>
<Cell><Data ss:Type="Number">3</Data></Cell>
<Cell><Data ss:Type="Number">3</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="Number">4</Data></Cell>
<Cell><Data ss:Type="Number">4</Data></Cell>
<Cell><Data ss:Type="Number">4</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<PageSetup>
<Header x:Margin="0.3"/>
<Footer x:Margin="0.3"/>
<PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
</PageSetup>
<Panes>
<Pane>
<Number>3</Number>
<ActiveRow>1</ActiveRow>
<ActiveCol>2</ActiveCol>
</Pane>
</Panes>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
<Worksheet ss:Name="Sheet3">
<Table ss:ExpandedColumnCount="4" ss:ExpandedRowCount="2" x:FullColumns="1"
x:FullRows="1" ss:DefaultColumnWidth="54" ss:DefaultRowHeight="13.5">
<Row>
<Cell><Data ss:Type="Number">5</Data></Cell>
<Cell><Data ss:Type="Number">5</Data></Cell>
<Cell><Data ss:Type="Number">5</Data></Cell>
<Cell><Data ss:Type="Number">5</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="Number">6</Data></Cell>
<Cell><Data ss:Type="Number">6</Data></Cell>
<Cell><Data ss:Type="Number">6</Data></Cell>
<Cell><Data ss:Type="Number">6</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<PageSetup>
<Header x:Margin="0.3"/>
<Footer x:Margin="0.3"/>
<PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
</PageSetup>
<Selected/>
<Panes>
<Pane>
<Number>3</Number>
<ActiveRow>5</ActiveRow>
<ActiveCol>7</ActiveCol>
</Pane>
</Panes>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>
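Only a small part of the XML above matters here. Every sheet has the same structure, so a reasonable guess is that the file can be written out row by row and assembled into this format.
⑤ This leads to the second step, converting the XML back to Excel;
Read the file with a plain buffered byte stream, which is simple and efficient for a straight copy;
A simple demo follows: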
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class WriteBigDataToExccel {
    // Working directory plus "config/"; test.xml must already exist there.
    private static String path = System.getProperty("user.dir") + File.separator + "config/";

    public static void main(String[] args) {
        File fi = new File(path + "test.xml");
        File fo = new File(path + "t1.xls");
        // try-with-resources (Java 7+) closes both streams, and the file
        // streams they wrap, even if the copy fails part-way through.
        try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(fi));
             BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(fo))) {
            byte[] b = new byte[1024];
            int len;
            // Copy the XML byte for byte; closing the stream flushes the
            // buffer, so no flush is needed inside the loop.
            while (-1 != (len = bis.read(b))) {
                bos.write(b, 0, len);
            }
        } catch (IOException e) {
            // Also covers FileNotFoundException, which is a subclass of IOException.
            e.printStackTrace();
        }
    }
}
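Running it produces an Excel file with exactly the same content as the original. For now only the xls format can be generated; if the output is given the xlsx extension, it cannot be opened normally.
The above is the principle, and also the template, for the large-data writing that follows.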
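A likely explanation for the xlsx failure is that an xlsx file is a ZIP package, while the copied file is still plain XML text. The sketch below is not part of the original demo; the class name `ExcelFormatSniffer` and the method `sniff` are made up for illustration. It guesses the container format from the first bytes of a file, which is a rough heuristic rather than a full format check.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not from the original demo): guesses a file's container format. */
class ExcelFormatSniffer {

    /** Returns "zip", "ole2", "xml" or "unknown" based on the leading bytes. */
    static String sniff(byte[] head) {
        // ZIP archives (and therefore xlsx packages) start with the bytes "PK".
        if (head.length >= 2 && head[0] == 'P' && head[1] == 'K') {
            return "zip";
        }
        // Binary xls files are OLE2 compound documents starting with D0 CF 11 E0.
        if (head.length >= 4 && head[0] == (byte) 0xD0 && head[1] == (byte) 0xCF
                && head[2] == 0x11 && head[3] == (byte) 0xE0) {
            return "ole2";
        }
        // Otherwise treat the bytes as text and look for an XML declaration.
        String text = new String(head, StandardCharsets.UTF_8);
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1); // skip a UTF-8 byte order mark if present
        }
        if (text.trim().startsWith("<?xml")) {
            return "xml";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        byte[] xmlHead = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.UTF_8);
        byte[] zipHead = {'P', 'K', 3, 4};
        System.out.println(sniff(xmlHead));
        System.out.println(sniff(zipHead));
    }
}
```

Passing the first few bytes of t1.xls to `sniff` should report XML, which is consistent with the file opening under the xls extension but not under xlsx.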
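Building on the observation that every sheet shares the same structure, the row-by-row idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the later parts of this series. The class and method names are hypothetical. The sketch also assumes that Excel accepts a `Table` element without the `ss:ExpandedRowCount` and `ss:ExpandedColumnCount` attributes, and a workbook without the `Styles` and `WorksheetOptions` blocks; both assumptions should be verified by opening the output in Excel.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: emits XML Spreadsheet 2003 markup one row at a time. */
class SpreadsheetXmlSketch {

    // Prolog and opening Workbook tag, reduced to the namespaces used below.
    static final String WORKBOOK_OPEN =
            "<?xml version=\"1.0\"?>\n"
            + "<?mso-application progid=\"Excel.Sheet\"?>\n"
            + "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"\n"
            + " xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\">\n";
    static final String SHEET_CLOSE = "</Table>\n</Worksheet>\n";
    static final String WORKBOOK_CLOSE = "</Workbook>\n";

    /** Opening tags for one sheet; the name is escaped for use in an attribute. */
    static String sheetOpen(String name) {
        return "<Worksheet ss:Name=\"" + escape(name) + "\">\n<Table>\n";
    }

    /** One Row element holding a numeric cell per value. */
    static String row(long... values) {
        StringBuilder sb = new StringBuilder("<Row>");
        for (long v : values) {
            sb.append("<Cell><Data ss:Type=\"Number\">").append(v).append("</Data></Cell>");
        }
        return sb.append("</Row>\n").toString();
    }

    /** Escapes the characters that are special in XML attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Streams a one-sheet workbook; only a single row is held in memory at a time. */
    static void writeWorkbook(Writer out, String sheetName, int rowCount, int columnCount)
            throws IOException {
        out.write(WORKBOOK_OPEN);
        out.write(sheetOpen(sheetName));
        long[] values = new long[columnCount];
        for (int r = 1; r <= rowCount; r++) {
            Arrays.fill(values, r); // demo data: the row number in every cell
            out.write(row(values));
        }
        out.write(SHEET_CLOSE);
        out.write(WORKBOOK_CLOSE);
    }

    public static void main(String[] args) throws IOException {
        // Prints a tiny two-row workbook to stdout; a real run would wrap a FileOutputStream.
        Writer out = new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
        writeWorkbook(out, "Sheet1", 2, 5);
        out.flush();
    }
}
```

Because each row is written and then discarded, memory use stays flat no matter how many rows are produced, which is the property the POI attempt lacked.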