tm-extractors is a Word-reading utility that wraps POI. Download the jar, add it to your project, and it is ready to use. The code is as follows:
package com.you.read;

import java.io.FileInputStream;

import org.textmining.text.extraction.WordExtractor;

public class WordReader {

    // Extracts the plain text of the .doc file at the given path.
    public static String readDoc(String doc) throws Exception {
        // try-with-resources closes the stream even if extraction fails
        try (FileInputStream in = new FileInputStream(doc)) {
            WordExtractor extractor = new WordExtractor();
            return extractor.extractText(in);
        }
    }

    public static void main(String[] args) {
        try {
            String text = WordReader.readDoc("d:/bloom.doc");
            System.out.println(text);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Running it throws the following exception:
org.textmining.text.extraction.FastSavedException: Fast-saved files are unsupported at this time
at org.textmining.text.extraction.WordExtractor.extract