Java对Excel(03,07)进行上传、解析、验证、入库

本文主要记录最近的工作内容,使用java实现对Excel(03,07)的上传、解析、验证和入库(PostgreSQL)。

 

一。上传

 实际就是实现文件上传至服务器即可,但是方法有很多,首先要考虑采用何种方法实现上传:

 

1.参考文章:

Java开发过程中文件上传的各种方式全面总结  http://javacrazyer.iteye.com/blog/707705

主要有以下几种方法:

JSP+Servlet(或纯JSP);Struts2;Struts;FTP;ExtJs;Flex;

 

2.我采用的是jsp+Servlet的方式来实现的。需要先从apache官网下载fileupload.jar。

UploadFileServlet.java的主要代码:

Java对Excel(03,07)进行上传、解析、验证、入库

testfileupload.jsp的主要代码:

Java对Excel(03,07)进行上传、解析、验证、入库


3.需要注意的几个问题

(1)上传文件的大小限制

 设置  sfu.setFileSizeMax(1024*1024*10000);单位byte.表示近10G。

测试上传1.5G的文件没问题,2.9G以上的文件上传失败,无错误提示,而是“无法显示该页面”的错误:

Java对Excel(03,07)进行上传、解析、验证、入库
所以目前不清楚到底支持多大的文件上传。

 

二。解析

1.java解析excel通常有两种方法:(1)jxl(2)poi.

至于选用哪种那做得看具体的需求,我这次主要是要支持03,07的excel,目前jxl更新慢,尚不支持07,所以只能选用poi;本次只涉及到excel的读操作,没有写操作,jxl的写快于poi,但jxl的读慢于poi.

综合下来,选用poi。这个poi具体是啥意思我不太清楚,但不是常说的感兴趣点(point of interest)哈.

 

2.是否支持大文件解析

excel03最多只能有65536条记录,65536行*256列;excel07:1048576行*16384列。

excel07文件超过65536条记录,另存为xls时会报错:

Java对Excel(03,07)进行上传、解析、验证、入库

所以要支持大文件,必须得用excel07格式来操作。

 

需要考虑这点,因为这决定了是用DOM做,还是用SAX。

就1和2来看,java解析excel比起.net解析excel是麻烦了很多很多啊,.net解析excel主要有三种方法:com组件;oledb;open xml。其中使用oledb最常用也最简单。

 

3.jar包下载

Java对Excel(03,07)进行上传、解析、验证、入库

jxl;http://www.andykhan.com/jexcelapi/download.html,最新版本为JExcelApi v2.6.12

poi:http://poi.apache.org/download.html#POI-3.9,最新版本为poi-bin-3.9-20121203.zip

 

4.poi+SAX,支持excel07大文件解析(EventUserModel)

 这种方式应该是最高级别的,能满足大数据量需求,也不会造成oom错误的。我这次需要采用的就是这种实现方式。

 主要定义了三个类来实现:

(1)Excel2003Reader.java,操作03Excel。


package com.cbe.excel;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hssf.eventusermodel.EventWorkbookBuilder.SheetRecordCollectingListener;
import org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
import org.apache.poi.hssf.eventusermodel.MissingRecordAwareHSSFListener;
import org.apache.poi.hssf.eventusermodel.dummyrecord.LastCellOfRowDummyRecord;
import org.apache.poi.hssf.eventusermodel.dummyrecord.MissingCellDummyRecord;
import org.apache.poi.hssf.model.HSSFFormulaParser;
import org.apache.poi.hssf.record.BOFRecord;
import org.apache.poi.hssf.record.BlankRecord;
import org.apache.poi.hssf.record.BoolErrRecord;
import org.apache.poi.hssf.record.BoundSheetRecord;
import org.apache.poi.hssf.record.FormulaRecord;
import org.apache.poi.hssf.record.LabelRecord;
import org.apache.poi.hssf.record.LabelSSTRecord;
import org.apache.poi.hssf.record.NumberRecord;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.hssf.record.SSTRecord;
import org.apache.poi.hssf.record.StringRecord;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;


public class Excel2003Reader implements HSSFListener {

 private int minColumns = -1;
 private POIFSFileSystem fs;
 private int lastRowNumber;
 private int lastColumnNumber;

 
 private boolean outputFormulaValues = true;

 
 private SheetRecordCollectingListener workbookBuildingListener;
 //excel2003工作薄
 private HSSFWorkbook stubWorkbook;

 // Records we pick up as we process
 private SSTRecord sstRecord;
 private FormatTrackingHSSFListener formatListener;

 //表索引
 private int sheetIndex = -1;
 private BoundSheetRecord[] orderedBSRs;
 //@SuppressWarnings("unchecked")
 @SuppressWarnings("rawtypes")
 private ArrayList boundSheetRecords = new ArrayList();

 // For handling formulas with string results
 private int nextRow;
 private int nextColumn;
 private boolean outputNextStringRecord;
 //当前行
 //private int curRow = 0;
 //存储行记录的容器
 private List<String> rowlist = new ArrayList<String>();
 
 private List<List<String>> exceldata=new ArrayList<List<String>>();
 @SuppressWarnings( "unused")
 private String sheetName;
 
 
  public List<List<String>> getExcelData(){
     return exceldata;
    }
 
 
 public void process(String fileName) throws IOException {
  this.fs = new POIFSFileSystem(new FileInputStream(fileName));
  MissingRecordAwareHSSFListener listener = new MissingRecordAwareHSSFListener(
    this);
  formatListener = new FormatTrackingHSSFListener(listener);
  HSSFEventFactory factory = new HSSFEventFactory();
  HSSFRequest request = new HSSFRequest();
  if (outputFormulaValues) {
   request.addListenerForAllRecords(formatListener);
  } else {
   workbookBuildingListener = new SheetRecordCollectingListener(
     formatListener);
   request.addListenerForAllRecords(workbookBuildingListener);
  }
  factory.processWorkbookEvents(request, fs);
 }
 
 
 @SuppressWarnings("unchecked")
 public void processRecord(Record record) {
     //List<String> rowlist = new ArrayList<String>();
  int thisRow = -1;
  int thisColumn = -1;
  String thisStr = null;
  String value = null;
  switch (record.getSid()) {
   case BoundSheetRecord.sid:
    boundSheetRecords.add(record);
    break;
   case BOFRecord.sid:
    BOFRecord br = (BOFRecord) record;
    if (br.getType() == BOFRecord.TYPE_WORKSHEET) {
     // 如果有需要,则建立子工作薄
     if (workbookBuildingListener != null && stubWorkbook == null) {
      stubWorkbook = workbookBuildingListener
        .getStubHSSFWorkbook();
     }
     
     sheetIndex++;
     if (orderedBSRs == null) {
      orderedBSRs = BoundSheetRecord
        .orderByBofPosition(boundSheetRecords);
     }
     sheetName = orderedBSRs[sheetIndex].getSheetname();
    }
    break;
 
   case SSTRecord.sid:
    sstRecord = (SSTRecord) record;
    break;
 
   case BlankRecord.sid:
    BlankRecord brec = (BlankRecord) record;
    thisRow = brec.getRow();
    thisColumn = brec.getColumn();
    thisStr = "";
    rowlist.add(thisColumn, thisStr);
    break;
   case BoolErrRecord.sid: //单元格为布尔类型
    BoolErrRecord berec = (BoolErrRecord) record;
    thisRow = berec.getRow();
    thisColumn = berec.getColumn();
    thisStr = berec.getBooleanValue()+"";
    rowlist.add(thisColumn, thisStr);
    break;
 
   case FormulaRecord.sid: //单元格为公式类型
    FormulaRecord frec = (FormulaRecord) record;
    thisRow = frec.getRow();
    thisColumn = frec.getColumn();
    if (outputFormulaValues) {
     if (Double.isNaN(frec.getValue())) {
      // Formula result is a string
      // This is stored in the next record
      outputNextStringRecord = true;
      nextRow = frec.getRow();
      nextColumn = frec.getColumn();
     } else {
      thisStr = formatListener.formatNumberDateCell(frec);
     }
    } else {
     thisStr = '"' + HSSFFormulaParser.toFormulaString(stubWorkbook,
       frec.getParsedexpression_r()) + '"';
    }
    rowlist.add(thisColumn,thisStr);
    break;
   case StringRecord.sid://单元格中公式的字符串
    if (outputNextStringRecord) {
     // String for formula
     StringRecord srec = (StringRecord) record;
     thisStr = srec.getString();
     thisRow = nextRow;
     thisColumn = nextColumn;
     outputNextStringRecord = false;
    }
    break;
   case LabelRecord.sid:
    LabelRecord lrec = (LabelRecord) record;
    //curRow = thisRow = lrec.getRow();
    thisColumn = lrec.getColumn();
    value = lrec.getValue().trim();
    value = value.equals("")?" ":value;
    rowlist.add(thisColumn, value);
    break;
   case LabelSSTRecord.sid:  //单元格为字符串类型
    LabelSSTRecord lsrec = (LabelSSTRecord) record;
    //curRow = thisRow = lsrec.getRow();
    thisColumn = lsrec.getColumn();
    if (sstRecord == null) {
     rowlist.add(thisColumn, " ");
    } else {
     value =  sstRecord
     .getString(lsrec.getSSTIndex()).toString().trim();
     value = value.equals("")?" ":value;
     rowlist.add(thisColumn,value);
    }
    break;
   case NumberRecord.sid:  //单元格为数字类型
    NumberRecord numrec = (NumberRecord) record;
    //curRow = thisRow = numrec.getRow();
    thisColumn = numrec.getColumn();
    value = formatListener.formatNumberDateCell(numrec).trim();
    value = value.equals("")?" ":value;
    // 向容器加入列值
    rowlist.add(thisColumn, value);
    break;
   default:
    break;
  }

  // 遇到新行的操作
  if (thisRow != -1 && thisRow != lastRowNumber) {
   lastColumnNumber = -1;
  }

  // 空值的操作
  if (record instanceof MissingCellDummyRecord) {
   MissingCellDummyRecord mc = (MissingCellDummyRecord) record;
   //curRow = thisRow = mc.getRow();
   thisColumn = mc.getColumn();
   rowlist.add(thisColumn," ");
  }

  // 更新行和列的值
  if (thisRow > -1)
   lastRowNumber = thisRow;
  if (thisColumn > -1)
   lastColumnNumber = thisColumn;

  // 行结束时的操作
  if (record instanceof LastCellOfRowDummyRecord) {
   if (minColumns > 0) {
    // 列值重新置空
    if (lastColumnNumber == -1) {
     lastColumnNumber = 0;
    }
   }
   lastColumnNumber = -1;
    // 每行结束时, 调用getRows() 方法
   //rowReader.getRows(sheetIndex,curRow, rowlist);
   
   List<String> rowlistcopy = new ArrayList<String>();
   for (int i = 0; i < rowlist.size(); i++) {
    rowlistcopy.add(rowlist.get(i));
   }
   exceldata.add(rowlistcopy);
   // 清空容器
   rowlist.clear();
   
  }
 }


}

(2)Excel2007ReaderNew.java,操作07excel,这个是核心,之所以有个New,是取代了之前的Excel2007Reader.java,解决有单元格为空的判定处理问题。

参考:http://gaosheng08.iteye.com/blog/624758


package com.cbe.excel;
import java.io.InputStream;  
import java.util.ArrayList;  
import java.util.Iterator;  
import java.util.List;  
 
import org.apache.poi.xssf.eventusermodel.XSSFReader;  
import org.apache.poi.xssf.model.SharedStringsTable;  
import org.apache.poi.xssf.usermodel.XSSFRichTextString;  
import org.apache.poi.openxml4j.opc.OPCPackage;  
import org.xml.sax.Attributes;  
import org.xml.sax.InputSource;  
import org.xml.sax.SAXException;  
import org.xml.sax.XMLReader;  
import org.xml.sax.helpers.DefaultHandler;  
import org.xml.sax.helpers.XMLReaderFactory;  


public  class Excel2007ReaderNew   extends DefaultHandler {

  private SharedStringsTable sst;  
     private String lastContents;  
     private boolean nextIsString;  
  
     @SuppressWarnings("unused")
  private int sheetIndex = -1;  
     private List<String> rowlist = new ArrayList<String>();  
     private int curRow = 0;     //当前行  
     private int curCol = 0;     //当前列索引  
     private int preCol = 0;     //上一列列索引  
     private int titleRow = 0;   //标题行,一般情况下为0  
     private int rowsize = 0;    //列数  
       
     private List<List<String>> exceldata=new ArrayList<List<String>>();//整个excel数据
     public List<List<String>> getExcelData(){
      return exceldata;
     }
    
     //excel记录行操作方法,以行索引和行元素列表为参数,对一行元素进行操作,元素为String类型  
 //  public abstract void optRows(int curRow, List<String> rowlist) throws SQLException ;  
       
     //excel记录行操作方法,以sheet索引,行索引和行元素列表为参数,对sheet的一行元素进行操作,元素为String类型  
    // public abstract void optRows(int sheetIndex,int curRow, List<String> rowlist) throws SQLException;  
       
     //只遍历一个sheet,其中sheetId为要遍历的sheet索引,从1开始,1-3  
     public void processOneSheet(String filename,int sheetId) throws Exception {  
         OPCPackage pkg = OPCPackage.open(filename);  
         XSSFReader r = new XSSFReader(pkg);  
         SharedStringsTable sst = r.getSharedStringsTable();  
           
         XMLReader parser = fetchSheetParser(sst);  
  
         // rId2 found by processing the Workbook  
         // 根据 rId# 或 rSheet# 查找sheet  
         InputStream sheet2 = r.getSheet("rId"+sheetId);  
         sheetIndex++;  
         InputSource sheetSource = new InputSource(sheet2);  
         parser.parse(sheetSource);  
         sheet2.close();  
         pkg.close();//add by lfc 20130710,解决文件使用后无法删除的问题
     }  
  
      
     public void process(String filename) throws Exception {  
         OPCPackage pkg = OPCPackage.open(filename);  
         XSSFReader r = new XSSFReader(pkg);  
         SharedStringsTable sst = r.getSharedStringsTable();  
  
         XMLReader parser = fetchSheetParser(sst);  
  
         Iterator<InputStream> sheets = r.getSheetsData();  
         while (sheets.hasNext()) {  
             curRow = 0;  
             sheetIndex++;  
             InputStream sheet = sheets.next();  
             InputSource sheetSource = new InputSource(sheet);  
             parser.parse(sheetSource);  
             sheet.close();  
         }
         pkg.close();//add by lfc 20130710,解决文件使用后无法删除的问题
     }  
  
     public XMLReader fetchSheetParser(SharedStringsTable sst)  
             throws SAXException {  
         XMLReader parser = XMLReaderFactory  
                 .createXMLReader("org.apache.xerces.parsers.SAXParser");  
         this.sst = sst;  
         parser.setContentHandler(this);  
         return parser;  
     }  
  
     public void startElement(String uri, String localName, String name,  
             Attributes attributes) throws SAXException {  
         // c => 单元格  
         if (name.equals("c")) {  
             // 如果下一个元素是 SST 的索引,则将nextIsString标记为true  
             String cellType = attributes.getValue("t");  
             String rowStr = attributes.getValue("r");  
             curCol = this.getRowIndex(rowStr);  
             if (cellType != null && cellType.equals("s")) {  
                 nextIsString = true;  
             } else {  
                 nextIsString = false;  
             }  
         }  
         // 置空  
         lastContents = "";  
     }  
  
     public void endElement(String uri, String localName, String name)  
             throws SAXException {  
         // 根据SST的索引值的到单元格的真正要存储的字符串  
         // 这时characters()方法可能会被调用多次  
         if (nextIsString) {  
             try {  
                 int idx = Integer.parseInt(lastContents);  
                 lastContents = new XSSFRichTextString(sst.getEntryAt(idx))  
                         .toString();  
             } catch (Exception e) {  
  
             }  
         }  
  
         // v => 单元格的值,如果单元格是字符串则v标签的值为该字符串在SST中的索引  
         // 将单元格内容加入rowlist中,在这之前先去掉字符串前后的空白符  
         if (name.equals("v")) {  
             String value = lastContents.trim();  
             value = value.equals("")?" ":value;  
             int cols = curCol-preCol;  
             if (cols>1){  
                 for (int i = 0;i < cols-1;i++){  
                     rowlist.add(preCol,"");  
                 }  
             }  
             preCol = curCol;  
             rowlist.add(curCol-1, value);  
         }else {  
             //如果标签名称为 row ,这说明已到行尾,调用 optRows() 方法  
             if (name.equals("row")) {  
                 int tmpCols = rowlist.size();  
                 if(curRow>this.titleRow && tmpCols<this.rowsize){  
                     for (int i = 0;i < this.rowsize-tmpCols;i++){  
                         rowlist.add(rowlist.size(), "");  
                     }  
                 }  
                 // add by lfc 20130710,注释掉
               
                 if(curRow==this.titleRow){  
                     this.rowsize = rowlist.size();  
                 }
                
                 List<String> rowlistcopy = new ArrayList<String>();
     for (int i = 0; i < rowlist.size(); i++) {
      rowlistcopy.add(rowlist.get(i));
     }
     exceldata.add(rowlistcopy);
     
                 rowlist.clear();  
                 curRow++;  
                 curCol = 0;  
                 preCol = 0;  
             }  
         }  
     }  
  
     public void characters(char[] ch, int start, int length)  
             throws SAXException {  
         //得到单元格内容的值  
         lastContents += new String(ch, start, length);  
     }  
       
     //得到列索引,每一列c元素的r属性构成为字母加数字的形式,字母组合为列索引,数字组合为行索引,  
     //如AB45,表示为第(A-A+1)*26+(B-A+1)*26列,45行  
     public int getRowIndex(String rowStr){  
         rowStr = rowStr.replaceAll("[^A-Z]", "");  
         byte[] rowAbc = rowStr.getBytes();  
         int len = rowAbc.length;  
         float num = 0;  
         for (int i=0;i<len;i++){  
             num += (rowAbc[i]-'A'+1)*Math.pow(26,len-i-1 );  
         }  
         return (int) num;  
     }  
  
     public int getTitleRow() {  
         return titleRow;  
     }  
  
     public void setTitleRow(int titleRow) {  
         this.titleRow = titleRow;  
     }  
 
}

 

(3)ExcelReaderUtil.java

核心代码:


  public static final String EXCEL03_EXTENSION = ".xls"; //excel2003扩展名
  public static final String EXCEL07_EXTENSION = ".xlsx"; //excel2007扩展名
  
  
  public static List<List<String>> readExcel(String fileName) throws Exception{
    List<List<String>> exceldata=new ArrayList<List<String>>();
   if (fileName.endsWith(EXCEL03_EXTENSION)){// 处理excel2003文件
    Excel2003Reader excel03 = new Excel2003Reader();
    excel03.process(fileName);
    exceldata=excel03.getExcelData();
   }    
   else if (fileName.endsWith(EXCEL07_EXTENSION)){// 处理excel2007文件
    //Excel2007Reader excel07 = new Excel2007Reader();
    Excel2007ReaderNew excel07 = new Excel2007ReaderNew();
    excel07.process(fileName);
    exceldata=excel07.getExcelData();
   }
   else {
    throw new  Exception("文件格式错误,fileName的扩展名只能是xls或xlsx!");
   }
   return exceldata;
  }

 

5.poi+DOM,普通模式(UserModel)

 相对简单,但数据量大了报OOM错误,所以没有发展前景。

Java对Excel(03,07)进行上传、解析、验证、入库

ExcelUtil.java类,代码如下:

import java.io.FileInputStream;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

import com.cbe.entity.Index;



public class ExcelUtil {

 public static List<Index> GetDistrictIndex(String filename){
  List<Index> indexsList=new ArrayList<Index>();
  try {  
   
            }
  catch (Exception ex) {
   ex.printStackTrace();
   }
  
  return indexsList;
  
            }

 
 public static List<List<Object>> ReadExcel(String fileName) throws Exception{
  Workbook workbook=null;
  try {
  if (fileName.endsWith(".xlsx") || fileName.endsWith(".xls")){
   //注:add by lfc,20130702,采用这种方式同时支持xls和xlsx,而不必分别用HSSF和XSSF来做
   FileInputStream inputStream = new FileInputStream(fileName);
   workbook= WorkbookFactory.create(inputStream);   
  }
  }
  catch (Exception e) {
  e.printStackTrace();
  }
  return analyzeWorkbook(workbook);
 }
  
 
 
 public static  List<List<Object>> analyzeWorkbook(Workbook workbook){
  
  Sheet sheet = workbook.getSheetAt(0);//第一个表单
  List<String> columnNames=new ArrayList<String>();//列
  List<Object> row=new ArrayList<Object>();//一行
  List<List<Object>> rows=new ArrayList<List<Object>>();//所有行
  int rowCount = sheet.getLastRowNum();//行数
  int columnCount=sheet.getRow(0).getPhysicalNumberOfCells();//列数
  
  for (int i = 0; i <columnCount; i++) {
   Cell cell=sheet.getRow(0).getCell(i);
   columnNames.add(getCellValue(cell).toString());
  }
  
   for(int j=0;j<=rowCount;j++){   //第一行为列名,若只是取数据则从第二行开始
    Row r=sheet.getRow(j);
             for(Cell cell : r){ 
              row.add(getCellValue(cell));
             }
             rows.add(row);
   }
   return rows;
 }
 

 public static Object getCellValue(Cell cell){
  int cellType = cell.getCellType();
  switch (cellType) {
             case Cell.CELL_TYPE_STRING://1
                 return cell.getStringCellValue();
             case Cell.CELL_TYPE_NUMERIC://0
              if (DateUtil.isCellDateFormatted(cell)) {//日期格式
               Date t = cell.getDateCellValue();
               SimpleDateFormat format=new SimpleDateFormat("yyyy-MM-dd");
               return format.format(t);
              }
              else{
               return cell.getNumericCellValue();
              }
             case Cell.CELL_TYPE_BOOLEAN://4
                 return cell.getBooleanCellValue();
             case Cell.CELL_TYPE_FORMULA://2
             case Cell.CELL_TYPE_BLANK://3
             case Cell.CELL_TYPE_ERROR://5
             default:
              return cell.getStringCellValue();
  }
 }
 
}

 

 

附:

测试的时候可能要监控一下耗时什么的,可以使用MyEclipse自带的jvisualvm.exe工具(我没测过)。

路径:D:\Program Files\MyEclipse\Common\binary\com.sun.java.jdk.win32.x86_1.6.0.013\bin

 

Java对Excel(03,07)进行上传、解析、验证、入库

初次运行时会提示要做jdk的校准标准化(calibration),没细研究是个什么概念,直接点确定就好。

运行后的主界面:

Java对Excel(03,07)进行上传、解析、验证、入库

 

三。验证

 这个就根据自己的需求,对特定的excel模板做特定的检查,废话就不多说了。

验证处理还是比较费劲的,所以还是控制好源头“excel模板”,整理好数据,否则问题会很多的,本身excel07采用xml的解析方式就存在不少问题的。

如图,不同的格式会有不同的xml,这个在调试程序的时候我也费了不少周折

Java对Excel(03,07)进行上传、解析、验证、入库


Java对Excel(03,07)进行上传、解析、验证、入库

 

四。入库

 这个工作就交个dao和daoImp包下的接口和类来实现了,自己写。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值