txt转word java,如何将.doc或.docx文件转换为.txt

I'm wondering how you can convert Word .doc/.docx files to text files through Java. I understand that there's an option where I can do this through Word itself but I would like to be able to do something like this:

java DocConvert somedocfile.doc converted.txt

Thanks.

解决方案

If you're interested in a Java library that deals with Word document files, you might want to look at e.g. Apache POI. A quote from the website:

Why should I use Apache POI?

A major use of the Apache POI api is

for Text Extraction applications such

as web spiders, index builders, and

content management systems.

P.S.: If, on the other hand, you're simply looking for a conversion utility, Stack Overflow may not be the most appropriate place to ask for this.

Edit: If you don't want to use an existing library but do all the hard work yourself, you'll be glad to hear that Microsoft has published the required file format specifications. (The Microsoft Open Specification Promise lists the available specifications. Just google for any of them that you're interested in. In your case, you'd need e.g. the OLE2 Compound File Format, the Word 97 binary file format, and the Open XML formats.)

  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值