Lucene - Indexing Word Documents

1. Indexing with the POI project.

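import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.StringWriter;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.poi.hdf.extractor.WordDocument;

// DocumentHandler and DocumentHandlerException are not shown in this post;
// they are assumed to be defined elsewhere in the project.
public class POIWordDocHandler implements DocumentHandler {

  public Document getDocument(InputStream is) throws DocumentHandlerException {
    String bodyText = null;

    try {
      // Extract the text: WordDocument takes the Word file's InputStream and
      // writes the document text to a Writer, from which it is read back.
      WordDocument wd = new WordDocument(is);
      StringWriter docTextWriter = new StringWriter();
      wd.writeAllText(new PrintWriter(docTextWriter));
      docTextWriter.close();
      bodyText = docTextWriter.toString();
    }
    catch (Exception e) {
      throw new DocumentHandlerException("Cannot extract text from a Word document", e);
    }

    // Only build a Lucene Document when some text was actually extracted.
    if ((bodyText != null) && (bodyText.trim().length() > 0)) {
      Document doc = new Document();
      // Field.UnStored (Lucene 1.x): indexed and tokenized, but not stored.
      doc.add(Field.UnStored("body", bodyText));
      return doc;
    }
    return null;
  }

  public static void main(String[] args) throws Exception {
    POIWordDocHandler handler = new POIWordDocHandler();
    Document doc = handler.getDocument(new FileInputStream(new File(args[0])));
    System.out.println(doc);
  }
}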

2. Using the TextMining.org API, which supports Word documents from Word 6/95 onward.

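import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.textmining.text.extraction.WordExtractor;

// DocumentHandler and DocumentHandlerException are not shown in this post;
// they are assumed to be defined elsewhere in the project.
public class TextMiningWordDocHandler implements DocumentHandler {

  public Document getDocument(InputStream is) throws DocumentHandlerException {
    String bodyText = null;

    try {
      // Extract the text directly from the InputStream.
      bodyText = new WordExtractor().extractText(is);
    }
    catch (Exception e) {
      throw new DocumentHandlerException("Cannot extract text from a Word document", e);
    }

    // Only build a Lucene Document when some text was actually extracted.
    if ((bodyText != null) && (bodyText.trim().length() > 0)) {
      Document doc = new Document();
      // Field.UnStored (Lucene 1.x): indexed and tokenized, but not stored.
      doc.add(Field.UnStored("body", bodyText));
      return doc;
    }
    return null;
  }

  public static void main(String[] args) throws Exception {
    TextMiningWordDocHandler handler = new TextMiningWordDocHandler();
    Document doc = handler.getDocument(new FileInputStream(new File(args[0])));
    System.out.println(doc);
  }
}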
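
Both handlers above wrap any extraction failure in a DocumentHandlerException, but the post never shows that class. The following is only a minimal sketch of what it might look like, assuming it is a plain checked exception that keeps the parser's original error as its cause. The class body here is hypothetical. The DocumentHandler interface is likewise not shown; judging from the two handlers, it presumably declares the single method getDocument(InputStream).

```java
// Hypothetical sketch: the original post does not show this class.
// A checked exception that preserves the underlying parser failure as its cause,
// matching the (message, exception) constructor call used in both handlers.
// In a real project this would normally be declared public in its own file.
class DocumentHandlerException extends Exception {

    private static final long serialVersionUID = 1L;

    // Failure described by a message only.
    DocumentHandlerException(String message) {
        super(message);
    }

    // Failure that wraps the exception thrown by the underlying parser.
    DocumentHandlerException(String message, Throwable cause) {
        super(message, cause);
    }
}
```

Keeping the cause means a caller that catches the exception can still call getCause() to see whether POI or the TextMining.org extractor rejected the file, and why.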
