Reading Word documents: tm-extractors.jar http://www.textmining.org/
Reading Excel documents: jxl.jar http://sourceforge.net/
Reading PDF documents: PDFBox.jar http://www.pdfbox.org/
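
As a rough illustration of how the three jars listed above might be chosen at runtime, the sketch below picks one by file extension. It uses only the Java standard library and does not call any of the three libraries. The class name `ReaderLibraryChooser`, the method `libraryFor`, and the extension mapping (`.doc`, `.xls`, `.pdf`) are assumptions made for this example, not something the list itself specifies.

```java
import java.util.Locale;
import java.util.Map;

public class ReaderLibraryChooser {
    // Assumed mapping from file extension to the jar named in the list above.
    // The extensions are an assumption; the original list names only the document types.
    private static final Map<String, String> LIBRARY_BY_EXTENSION = Map.of(
            "doc", "tm-extractors.jar",
            "xls", "jxl.jar",
            "pdf", "PDFBox.jar");

    // Returns the jar name for the given file name, or null if none is listed.
    static String libraryFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no extension present
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return LIBRARY_BY_EXTENSION.get(extension);
    }

    public static void main(String[] args) {
        String[] names = {"report.doc", "budget.XLS", "paper.pdf", "notes.txt"};
        for (String name : names) {
            String library = libraryFor(name);
            System.out.println(name + " -> " + (library == null ? "no library listed" : library));
        }
    }
}
```

The extension is lowercased before lookup so that names such as `budget.XLS` still match. Files with no listed library return `null`, leaving the caller to decide how to handle them.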