Android: Reading Word Documents

one1go

Article link: https://blog.csdn.net/one1go/article/details/72732160

Problem details

As far as I know, there are three ways to read a Word document on Android:
1. Hand the file to a third-party app such as WPS
2. Use tm-extractors-0.4.jar
3. Use POI

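Approach

Using a third-party app

This gives the best rendering of the document, but it requires a third-party app to be installed. You can either add code that links to an app store or simply prompt users to download an app themselves.
Once an app is installed, the following code launches a program that can open doc or docx files.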

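Intent intent = getWordFileIntent(wordInfo.getPath());
try {
    getContext().startActivity(intent);
} catch (ActivityNotFoundException e) {
    // No installed app can handle the Word MIME type
    Toast.makeText(getContext(), "No app found that can open this file", Toast.LENGTH_SHORT).show();
}

// Builds an intent that asks another app to open a Word file.
// Note: sharing a file:// URI from Uri.fromFile() with another app throws
// FileUriExposedException on Android 7.0+ when targetSdkVersion is 24 or higher.
public static Intent getWordFileIntent(String path) {
    Intent intent = new Intent(Intent.ACTION_VIEW);
    intent.addCategory(Intent.CATEGORY_DEFAULT);
    intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
    Uri uri = Uri.fromFile(new File(path));
    // "application/msword" is the MIME type of .doc files
    intent.setDataAndType(uri, "application/msword");
    return intent;
}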
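
The intent above is always typed as application/msword, which is the registered MIME type for .doc files, while .docx has its own type. As a minimal sketch, the type could be chosen from the file extension. The class and method names here (WordMimeTypes, forPath) are hypothetical and not part of the original code.

```java
import java.util.Locale;

// Hypothetical helper: maps a Word file path to its MIME type.
final class WordMimeTypes {
    static final String DOC = "application/msword";
    static final String DOCX =
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    private WordMimeTypes() {}

    // Returns the MIME type for a .doc or .docx path, or null for any other extension.
    static String forPath(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".docx")) {
            return DOCX;
        }
        if (lower.endsWith(".doc")) {
            return DOC;
        }
        return null;
    }
}
```

In getWordFileIntent, the result could be passed as the second argument of setDataAndType, with the null case handled by the caller.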

Using tm-extractors-0.4.jar

The output of this approach is basic: it extracts text only, so images and charts are not shown.
The main code is as follows:

public String readWord(String file) {
    String text = null;
    // Open an input stream on the doc file; try-with-resources closes it
    try (FileInputStream in = new FileInputStream(new File(file))) {
        // Create the WordExtractor and extract the text of the doc file
        WordExtractor extractor = new WordExtractor();
        text = extractor.extractText(in);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return text;
}
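
Since the extractor is meant for doc files, it can help to check what kind of file you actually have before extracting. The sketch below inspects the first bytes of a file: binary .doc files are stored in an OLE2 compound file, and .docx files are ZIP archives. The class name WordFormatSniffer is hypothetical, and a matching signature only identifies the container, not that the content is really a Word document.

```java
// Hypothetical helper: classifies a file by its leading signature bytes.
final class WordFormatSniffer {
    enum Format { OLE2_BINARY, ZIP_BASED, UNKNOWN }

    // OLE2 compound file signature (container used by binary .doc files)
    private static final int[] OLE2_SIGNATURE =
            {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature (.docx files are ZIP archives)
    private static final int[] ZIP_SIGNATURE = {0x50, 0x4B, 0x03, 0x04};

    private WordFormatSniffer() {}

    // header: the first bytes of the file (at least 8 are needed to detect OLE2)
    static Format sniff(byte[] header) {
        if (startsWith(header, OLE2_SIGNATURE)) {
            return Format.OLE2_BINARY;
        }
        if (startsWith(header, ZIP_SIGNATURE)) {
            return Format.ZIP_BASED;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int[] signature) {
        if (data == null || data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would read the first 8 bytes of the file into an array, pass it to sniff, and then reopen the file for the actual extraction.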

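Using POI

The idea is to convert the Word document to HTML and display it in a WebView. Two JAR files are required: poi-3.15.jar and poi-scratchpad-3.15.jar. In the code below, .doc files are read with POI's HWPF classes, while .docx files are unzipped and parsed directly with XmlPullParser.
The main code is as follows.
FileUtil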

public class FileUtil {
    private final static String TAG = "FileUtil";

    public static String getFileName(String pathandname) {
        int start = pathandname.lastIndexOf("/");
        int end = pathandname.lastIndexOf(".");
        if (start != -1 && end != -1) {
            return pathandname.substring(start + 1, end);
        } else {
            return "";
        }
    }

    public static String createFile(String dir_name, String file_name) {
        String sdcard_path = Environment.getExternalStorageDirectory().getAbsolutePath();
        String dir_path = String.format("%s/Download/%s", sdcard_path, dir_name);
        String file_path = String.format("%s/%s", dir_path, file_name);
        try {
            File dirFile = new File(dir_path);
            if (!dirFile.exists()) {
                dirFile.mkdirs(); // also creates any missing parent directories
            }
            File myFile = new File(file_path);
            myFile.createNewFile();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return file_path;
    }

    public static ZipEntry getPicEntry(ZipFile docxFile, int pic_index) {
        String entry_jpg = "word/media/image" + pic_index + ".jpeg";
        String entry_png = "word/media/image" + pic_index + ".png";
        String entry_gif = "word/media/image" + pic_index + ".gif";
        String entry_wmf = "word/media/image" + pic_index + ".wmf";
        ZipEntry pic_entry = null;
        pic_entry = docxFile.getEntry(entry_jpg);
        // Fall back to the other image formats if there is no JPEG entry
        if (pic_entry == null) {
            pic_entry = docxFile.getEntry(entry_png);
        }
        if (pic_entry == null) {
            pic_entry = docxFile.getEntry(entry_gif);
        }
        if (pic_entry == null) {
            pic_entry = docxFile.getEntry(entry_wmf);
        }
        return pic_entry;
    }

    public static byte[] getPictureBytes(ZipFile docxFile, ZipEntry pic_entry) {
        byte[] pictureBytes = null;
        // try-with-resources closes both streams exactly once
        try (InputStream pictIS = docxFile.getInputStream(pic_entry);
             ByteArrayOutputStream pOut = new ByteArrayOutputStream()) {
            byte[] b = new byte[1000];
            int len;
            while ((len = pictIS.read(b)) != -1) {
                pOut.write(b, 0, len);
            }
            pictureBytes = pOut.toByteArray();
            Log.d(TAG, "pictureBytes.length=" + pictureBytes.length);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return pictureBytes;
    }

    public static void writePicture(String pic_path, byte[] pictureBytes) {
        File myPicture = new File(pic_path);
        // try-with-resources closes the stream even if the write fails
        try (FileOutputStream outputPicture = new FileOutputStream(myPicture)) {
            outputPicture.write(pictureBytes);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
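
getPicEntry above looks only for entries named word/media/image<N> with the extensions .jpeg, .png, .gif, and .wmf, so an image stored under any other name or extension would not be found. A possible alternative, sketched here under the assumption that every image of interest sits under word/media/, is to list the archive entries instead of guessing their names. DocxMedia and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: finds media entries in a docx archive by prefix.
final class DocxMedia {
    private static final String MEDIA_PREFIX = "word/media/";

    private DocxMedia() {}

    // Collects the names of all entries in the archive, then filters them.
    static List<String> mediaEntryNames(ZipFile docxFile) {
        List<String> names = new ArrayList<>();
        Enumeration<? extends ZipEntry> entries = docxFile.entries();
        while (entries.hasMoreElements()) {
            names.add(entries.nextElement().getName());
        }
        return filterMedia(names);
    }

    // Keeps only file entries under word/media/, sorted lexicographically.
    // Lexicographic order is not guaranteed to match the order of images in the document.
    static List<String> filterMedia(List<String> entryNames) {
        List<String> media = new ArrayList<>();
        for (String name : entryNames) {
            if (name.startsWith(MEDIA_PREFIX) && !name.endsWith("/")) {
                media.add(name);
            }
        }
        Collections.sort(media);
        return media;
    }
}
```

Each returned name can be passed to docxFile.getEntry and then to getPictureBytes. Matching an image to its position in the text would still require more work than this sketch shows.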

WordUtil

public class WordUtil {
    private final static String TAG = "WordUtil";
    public String htmlPath;
    private String docPath;
    private String picturePath;
    private List<Picture> pictures;
    private TableIterator tableIterator;
    private int presentPicture = 0;
    private FileOutputStream output;

    private String htmlBegin = "<html><meta charset=\"utf-8\"><body>";
    private String htmlEnd = "</body></html>";
    private String tableBegin = "<table style=\"border-collapse:collapse\" border=1 bordercolor=\"black\">";
    private String tableEnd = "</table>";
    private String rowBegin = "<tr>", rowEnd = "</tr>";
    private String columnBegin = "<td>", columnEnd = "</td>";
    private String lineBegin = "<p>", lineEnd = "</p>";
    private String centerBegin = "<center>", centerEnd = "</center>";
    private String boldBegin = "<b>", boldEnd = "</b>";
    private String underlineBegin = "<u>", underlineEnd = "</u>";
    private String italicBegin = "<i>", italicEnd = "</i>";
    private String fontSizeTag = "<font size=\"%d\">";
    private String fontColorTag = "<font color=\"%s\">";
    private String fontEnd = "</font>";
    private String spanColor = "<span style=\"color:%s;\">", spanEnd = "</span>";
    private String divRight = "<div align=\"right\">", divEnd = "</div>";
    private String imgBegin = "<img src=\"%s\" >";

    public WordUtil(String doc_name) {
        docPath = doc_name;
        htmlPath = FileUtil.createFile("html", FileUtil.getFileName(docPath) + ".html");
        Log.d(TAG, "htmlPath=" + htmlPath);
        try {
            output = new FileOutputStream(new File(htmlPath));
            presentPicture = 0;
            output.write(htmlBegin.getBytes());
            if (docPath.endsWith(".doc")) {
                readDOC();
            } else if (docPath.endsWith(".docx")) {
                readDOCX();
            }
            output.write(htmlEnd.getBytes());
            output.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Reads the contents of a .doc file and writes them to an HTML file on the SD card
    private void readDOC() {
        try {
            FileInputStream in = new FileInputStream(docPath);
            POIFSFileSystem pfs = new POIFSFileSystem(in);
            HWPFDocument hwpf = new HWPFDocument(pfs);
            Range range = hwpf.getRange();
            pictures = hwpf.getPicturesTable().getAllPictures();
            tableIterator = new TableIterator(range);
            int numParagraphs = range.numParagraphs(); // total number of paragraphs in the document
            for (int i = 0; i < numParagraphs; i++) { // iterate over the paragraphs
                Paragraph p = range.getParagraph(i); // the current paragraph
                if (p.isInTable()) {
                    int temp = i;
                    if (tableIterator.hasNext()) {
                        Table table = tableIterator.next();
                        output.write(tableBegin.getBytes());
                        int rows = table.numRows();
                        for (int r = 0; r < rows; r++) {
                            output.write(rowBegin.getBytes());
                            TableRow row = table.getRow(r);
                            int cols = row.numCells();
                            int rowNumParagraphs = row.numParagraphs();
                            int colsNumParagraphs = 0;
                            for (int c = 0; c < cols; c++) {
                                output.write(columnBegin.getBytes());
                                TableCell cell = row.getCell(c);
                                int max = temp + cell.numParagraphs();
                                colsNumParagraphs = colsNumParagraphs + cell.numParagraphs();
                                for (int cp = temp; cp < max; cp++) {
                                    Paragraph p1 = range.getParagraph(cp);
                                    output.write(lineBegin.getBytes());
                                    writeParagraphContent(p1);
                                    output.write(lineEnd.getBytes());
                                    temp++;
                                }
                                output.write(columnEnd.getBytes());
                            }
                            int max1 = temp + rowNumParagraphs;
                            for (int m = temp + colsNumParagraphs; m < max1; m++) {
                                temp++;
                            }
                            output.write(rowEnd.getBytes());
                        }
                        output.write(tableEnd.getBytes());
                    }
                    i = temp;
                } else {
                    output.write(lineBegin.getBytes());
                    writeParagraphContent(p);
                    output.write(lineEnd.getBytes());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void readDOCX() {
        try {
            ZipFile docxFile = new ZipFile(new File(docPath));
            ZipEntry sharedStringXML = docxFile.getEntry("word/document.xml");
            InputStream inputStream = docxFile.getInputStream(sharedStringXML);
            XmlPullParser xmlParser = Xml.newPullParser();
            xmlParser.setInput(inputStream, "utf-8");
            boolean isTable = false; // inside a table
            boolean isSize = false; // a font size tag is open
            boolean isColor = false; // a font color tag is open
            boolean isCenter = false; // centered
            boolean isRight = false; // right-aligned
            boolean isItalic = false; // italic
            boolean isUnderline = false; // underline
            boolean isBold = false; // bold
            boolean isRegion = false; // inside a run (<w:r>) element
            int pic_ndex = 1; // image names in a docx start at image1, so the index starts at 1
            int event_type = xmlParser.getEventType();
            while (event_type != XmlPullParser.END_DOCUMENT) {
                switch (event_type) {
                case XmlPullParser.START_TAG: // start tag
                    String tagBegin = xmlParser.getName();
                    if (tagBegin.equalsIgnoreCase("r")) {
                        isRegion = true;
                    }
                    if (tagBegin.equalsIgnoreCase("jc")) { // alignment
                        String align = xmlParser.getAttributeValue(0);
                        if (align.equals("center")) {
                            output.write(centerBegin.getBytes());
                            isCenter = true;
                        }
                        if (align.equals("right")) {
                            output.write(divRight.getBytes());
                            isRight = true;
                        }
                    }
                    if (tagBegin.equalsIgnoreCase("color")) { // font color
                        String color = xmlParser.getAttributeValue(0);
                        output.write(String.format(spanColor, color).getBytes());
                        isColor = true;
                    }
                    if (tagBegin.equalsIgnoreCase("sz")) { // font size
                        if (isRegion == true) {
                            int size = getSize(Integer.valueOf(xmlParser.getAttributeValue(0)));
                            output.write(String.format(fontSizeTag, size).getBytes());
                            isSize = true;
                        }
                    }
                    if (tagBegin.equalsIgnoreCase("tbl")) { // table
                        output.write(tableBegin.getBytes());
                        isTable = true;
                    } else if (tagBegin.equalsIgnoreCase("tr")) { // table row
                        output.write(rowBegin.getBytes());
                    } else if (tagBegin.equalsIgnoreCase("tc")) { // table cell
                        output.write(columnBegin.getBytes());
                    }
                    if (tagBegin.equalsIgnoreCase("pic")) { // picture
                        ZipEntry pic_entry = FileUtil.getPicEntry(docxFile, pic_ndex);
                        if (pic_entry != null) {
                            byte[] pictureBytes = FileUtil.getPictureBytes(docxFile, pic_entry);
                            writeDocumentPicture(pictureBytes);
                        }
                        pic_ndex++; // advance the index after converting one image
                    }
                    if (tagBegin.equalsIgnoreCase("p") && !isTable) { // paragraph start; ignored inside a table
                        output.write(lineBegin.getBytes());
                    }
                    if (tagBegin.equalsIgnoreCase("b")) { // bold
                        isBold = true;
                    }
                    if (tagBegin.equalsIgnoreCase("u")) { // underline
                        isUnderline = true;
                    }
                    if (tagBegin.equalsIgnoreCase("i")) { // italic
                        isItalic = true;
                    }
                    // text
                    if (tagBegin.equalsIgnoreCase("t")) {
                        if (isBold == true) { // bold: write <b>
                            output.write(boldBegin.getBytes());
                        }
                        if (isUnderline == true) { // underline: write <u>
                            output.write(underlineBegin.getBytes());
                        }
                        if (isItalic == true) { // italic: write <i>
                            output.write(italicBegin.getBytes());
                        }
                        String text = xmlParser.nextText();
                        output.write(text.getBytes()); // write the text
                        if (isItalic == true) { // write the closing </i>
                            output.write(italicEnd.getBytes());
                            isItalic = false;
                        }
                        if (isUnderline == true) { // write the closing </u>
                            output.write(underlineEnd.getBytes());
                            isUnderline = false;
                        }
                        if (isBold == true) { // write the closing </b>
                            output.write(boldEnd.getBytes());
                            isBold = false;
                        }
                        if (isSize == true) { // write the closing </font>
                            output.write(fontEnd.getBytes());
                            isSize = false;
                        }
                        if (isColor == true) { // write the closing </span>
                            output.write(spanEnd.getBytes());
                            isColor = false;
                        }
//                      if (isCenter == true) { // </center> is written just before the paragraph ends instead, because the tag forces a line break
//                          output.write(centerEnd.getBytes());
//                          isCenter = false;
//                      }
                        if (isRight == true) { // write the closing </div>
                            output.write(divEnd.getBytes());
                            isRight = false;
                        }
                    }
                    break;
                // end tag
                case XmlPullParser.END_TAG:
                    String tagEnd = xmlParser.getName();
                    if (tagEnd.equalsIgnoreCase("tbl")) { // write the closing </table>
                        output.write(tableEnd.getBytes());
                        isTable = false;
                    }
                    if (tagEnd.equalsIgnoreCase("tr")) { // write the closing </tr>
                        output.write(rowEnd.getBytes());
                    }
                    if (tagEnd.equalsIgnoreCase("tc")) { // write the closing </td>
                        output.write(columnEnd.getBytes());
                    }
                    if (tagEnd.equalsIgnoreCase("p")) { // write the closing </p>; ignored inside a table
                        if (isTable == false) {
                            if (isCenter == true) { // write the closing </center>
                                output.write(centerEnd.getBytes());
                                isCenter = false;
                            }
                            output.write(lineEnd.getBytes());
                        }
                    }
                    if (tagEnd.equalsIgnoreCase("r")) {
                        isRegion = false;
                    }
                    break;
                default:
                    break;
                }
                event_type = xmlParser.next();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private int getSize(int sizeType) {
        if (sizeType >= 1 && sizeType <= 8) {
            return 1;
        } else if (sizeType >= 9 && sizeType <= 11) {
            return 2;
        } else if (sizeType >= 12 && sizeType <= 14) {
            return 3;
        } else if (sizeType >= 15 && sizeType <= 19) {
            return 4;
        } else if (sizeType >= 20 && sizeType <= 29) {
            return 5;
        } else if (sizeType >= 30 && sizeType <= 39) {
            return 6;
        } else if (sizeType >= 40) {
            return 7;
        } else {
            return 3;
        }
    }

    private String getColor(int colorType) {
        if (colorType == 1) {
            return "#000000";
        } else if (colorType == 2) {
            return "#0000FF";
        } else if (colorType == 3 || colorType == 4) {
            return "#00FF00";
        } else if (colorType == 5 || colorType == 6) {
            return "#FF0000";
        } else if (colorType == 7) {
            return "#FFFF00";
        } else if (colorType == 8) {
            return "#FFFFFF";
        } else if (colorType == 9 || colorType == 15) {
            return "#CCCCCC";
        } else if (colorType == 10 || colorType == 11) {
            return "#00FF00";
        } else if (colorType == 12 || colorType == 16) {
            return "#080808";
        } else if (colorType == 13 || colorType == 14) {
            return "#FFFF00";
        } else {
            return "#000000";
        }
    }

    public void writeDocumentPicture(byte[] pictureBytes) {
        picturePath = FileUtil.createFile("html", FileUtil.getFileName(docPath) + presentPicture + ".jpg");
        FileUtil.writePicture(picturePath, pictureBytes);
        presentPicture++;
        String imageString = String.format(imgBegin, picturePath);
        try {
            output.write(imageString.getBytes());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void writeParagraphContent(Paragraph paragraph) {
        Paragraph p = paragraph;
        int pnumCharacterRuns = p.numCharacterRuns();
        for (int j = 0; j < pnumCharacterRuns; j++) {
            CharacterRun run = p.getCharacterRun(j);
            if (run.getPicOffset() == 0 || run.getPicOffset() >= 1000) {
                if (presentPicture < pictures.size()) {
                    writeDocumentPicture(pictures.get(presentPicture).getContent());
                }
            } else {
                try {
                    String text = run.text();
                    if (text.length() >= 2 && pnumCharacterRuns < 2) {
                        output.write(text.getBytes());
                    } else {
                        String fontSizeBegin = String.format(fontSizeTag, getSize(run.getFontSize()));
                        String fontColorBegin = String.format(fontColorTag, getColor(run.getColor()));
                        output.write(fontSizeBegin.getBytes());
                        output.write(fontColorBegin.getBytes());
                        if (run.isBold()) {
                            output.write(boldBegin.getBytes());
                        }
                        if (run.isItalic()) {
                            output.write(italicBegin.getBytes());
                        }
                        output.write(text.getBytes());
                        if (run.isBold()) {
                            output.write(boldEnd.getBytes());
                        }
                        if (run.isItalic()) {
                            output.write(italicEnd.getBytes());
                        }
                        output.write(fontEnd.getBytes());
                        output.write(fontEnd.getBytes());
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
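
WordUtil writes the document text into the HTML file as-is, so text containing characters such as < or & can be misread as markup by the WebView. A minimal sketch of an escaping helper follows. HtmlEscaper is a hypothetical name, and on Android the framework method TextUtils.htmlEncode serves a similar purpose.

```java
// Hypothetical helper: escapes the characters that are significant in HTML.
final class HtmlEscaper {
    private HtmlEscaper() {}

    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '"':
                    sb.append("&quot;");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The text read from a run or from xmlParser.nextText() could be passed through escape before it is written to the output stream.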

Usage

Button open = (Button) findViewById(R.id.open);
content = (WebView) findViewById(R.id.content);
WebSettings settings = content.getSettings();
settings.setUseWideViewPort(true);
settings.setLoadWithOverviewMode(true);
settings.setSupportZoom(true);
settings.setBuiltInZoomControls(true); // allow pinch-to-zoom in the WebView

open.setOnClickListener(new OnClickListener() {

    @Override
    public void onClick(View v) {
        String path = Environment.getExternalStorageDirectory()
                + "/xx/a.docx";
        Log.d(TAG, "path=" + path);
        // tm-extractors-0.4.jar and the POI jars conflict at compile time, so import only one of them
        WordUtil wu = new WordUtil(path);
        Log.d(TAG, "htmlPath=" + wu.htmlPath);
        // htmlPath is absolute and already starts with "/"
        content.loadUrl("file://" + wu.htmlPath);
    }
});
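
The click listener constructs WordUtil, and therefore runs the whole file conversion, on the UI thread. For large documents it may be worth moving that work to a background thread. The following is only a sketch: BackgroundConversion and its Callback interface are hypothetical, and the conversion itself is passed in as a Callable so the example stays independent of Android classes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executor;

// Hypothetical helper: runs a conversion on an executor and reports the result.
final class BackgroundConversion {
    interface Callback {
        void onConverted(String htmlPath);

        void onFailed(Exception error);
    }

    private BackgroundConversion() {}

    // The callback is invoked on the executor's thread. On Android, the callback
    // would need to switch to the UI thread (for example with
    // Activity.runOnUiThread) before touching the WebView.
    static void run(Executor background, final Callable<String> conversion,
                    final Callback callback) {
        background.execute(new Runnable() {
            @Override
            public void run() {
                String htmlPath;
                try {
                    htmlPath = conversion.call();
                } catch (Exception e) {
                    callback.onFailed(e);
                    return;
                }
                callback.onConverted(htmlPath);
            }
        });
    }
}
```

Here the Callable would wrap `new WordUtil(path).htmlPath`, and the executor could be a single-thread executor from java.util.concurrent.Executors.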

References

http://blog.csdn.net/liubo253/article/details/54614886
http://blog.csdn.net/aqi00/article/details/69942521#comments

Source code and JAR files

Source code download link
