Parsing EPUB files with epublib (chapter content and book menus)


sonnyching



Column: Open-source libraries. Tags: android, epublib, e-reader

Article link: https://blog.csdn.net/sonnyching/article/details/47407549


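A while ago I needed to parse EPUB books on Android and came across this open-source EPUB parsing library. There is very little material about it! After tinkering with it for a while, I found that just using it is actually pretty simple. So cute~ Below is a brief introduction to epublib. PS: This is my first post on CSDN, so please don't mind the somewhat rough layout~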

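epublib can not only parse EPUB books but also generate them. Since I only use it for reading, I will only cover parsing here. Of course, to understand EPUB parsing you first need some familiarity with the EPUB format specification.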
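Since everything starts from the format itself: an EPUB file is a ZIP archive whose first entry is normally `mimetype` (containing `application/epub+zip`), with `META-INF/container.xml` pointing to the OPF package file. As a quick way to peek inside a book without any library, here is a minimal sketch using only `java.util.zip`. `EpubZipPeek` and the sample file names are made up for illustration, and the in-memory sample does not follow every packaging rule (a real EPUB stores `mimetype` uncompressed).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Peeks inside an EPUB with plain java.util.zip, since an EPUB is a ZIP container.
// EpubZipPeek is a hypothetical helper, not part of epublib.
final class EpubZipPeek {

    /** Returns the entry names of an EPUB (ZIP) stream in archive order, then closes it. */
    static List<String> listEntries(InputStream epub) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(epub)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a tiny in-memory archive with a typical EPUB layout (hypothetical file names). */
    static byte[] buildSampleEpub() {
        String[][] files = {
            {"mimetype", "application/epub+zip"},
            {"META-INF/container.xml", "<container/>"},
            {"OEBPS/content.opf", "<package/>"},
            {"OEBPS/chapter01.xhtml", "<html/>"}
        };
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String[] file : files) {
                zip.putNextEntry(new ZipEntry(file[0]));
                zip.write(file[1].getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

On a device you could pass `activity.getAssets().open("1.epub")` to `listEntries` to see which files a real book contains.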

Enough talk, let's get to it!

(I) Useful links:

1. Project repository:

https://github.com/psiegman/epublib

2. Official API docs:

http://www.siegmann.nl/static/epublib/apidocs/

3. Official website:

http://www.siegmann.nl/epublib

4. Dependencies needed to use it on Android

Slf4j-android: http://www.slf4j.org/android/

epublib-core-latest.jar: https://github.com/downloads/psiegman/epublib/epublib-core-latest.jar

5. Open-source Android projects that use epublib

PageTurner (I may write a separate post introducing it briefly later)

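(II) Important classes:

1. Book

Represents a book, and holds all of its content. From a Book object you can get every kind of book content, such as Resources, Metadata, etc.

2. Resource

All the Resource objects together make up a book, so a Resource is one piece of the book's content. It can be HTML, CSS, JS, an image, and so on.

3. Metadata

The book's header information, such as the author, publisher, language, etc.

4. Spine

The book's reading order, which is linear. The Spine tells you in which order the chapters should be read (note: a "chapter" here really means a resource, not just the book's text content; same below), and you can use it to find the content of each chapter.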

[Figure: example of the linear Spine structure]

5. TableOfContents

This differs from Spine. It also gives access to each chapter's content, but it is a tree structure.

[Figure: example of the tree-shaped TableOfContents structure]

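6. Resources

Gives you all the Resource objects, from which you can easily pick out the Resource you are looking for, so it is what you use to look up Resources.

You can locate a resource by id or by href, or get the resources of a given MediaType. Some of the important methods are shown in the figure below: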

7. MediaType

Describes the type of a Resource, i.e. what kind of resource it is (CSS/JS/image/HTML/video, etc.).

8. MediatypeService

This class holds the various MediaType constants, as shown in the figure below:
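A MediaType boils down to a MIME type name plus the file extensions it usually comes with. To illustrate the idea, here is a toy lookup from a resource href to a MIME type. It is not epublib's code (in practice, just use the `MediatypeService` constants); `MediaTypeGuess` and its table are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy illustration of what a MediaType describes: a MIME type name keyed by file extension.
// Not epublib's implementation; MediaTypeGuess is a hypothetical helper.
final class MediaTypeGuess {

    private static final Map<String, String> BY_EXTENSION = new HashMap<>();

    static {
        BY_EXTENSION.put("xhtml", "application/xhtml+xml");
        BY_EXTENSION.put("css", "text/css");
        BY_EXTENSION.put("gif", "image/gif");
        BY_EXTENSION.put("jpg", "image/jpeg");
        BY_EXTENSION.put("jpeg", "image/jpeg");
        BY_EXTENSION.put("png", "image/png");
        BY_EXTENSION.put("mp3", "audio/mpeg");
        BY_EXTENSION.put("mp4", "video/mp4");
        BY_EXTENSION.put("ncx", "application/x-dtbncx+xml");
    }

    /** Returns the MIME type for an href's file extension, or null if it is unknown. */
    static String guess(String href) {
        int hash = href.indexOf('#');
        String path = hash >= 0 ? href.substring(0, hash) : href;   // drop any #fragment
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash || dot == path.length() - 1) {
            return null;   // the file name itself has no extension
        }
        return BY_EXTENSION.get(path.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```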

(III) Basic usage

1. Reading a book

(1) The simplest way is to read the book directly from a stream

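import java.io.InputStream;
import nl.siegmann.epublib.domain.Book;
import nl.siegmann.epublib.epub.EpubReader;

try {
    EpubReader reader = new EpubReader();
    // activity is the current Activity; 1.epub is placed in the assets folder
    InputStream in = activity.getAssets().open("1.epub");
    // readEpub returns the parsed Book; keep the reference for later use
    Book book = reader.readEpub(in);
    in.close();
} catch (Exception e) {
    e.printStackTrace();
}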

(2) Loading too slow? Then don't read images, videos and the like up front; load them lazily instead.

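import java.util.Arrays;
import nl.siegmann.epublib.domain.Book;
import nl.siegmann.epublib.domain.MediaType;
import nl.siegmann.epublib.epub.EpubReader;
import nl.siegmann.epublib.service.MediatypeService;

try {
    EpubReader epubReader = new EpubReader();

    // Resources of these types are not loaded when the book is opened,
    // only when their content is actually needed
    MediaType[] lazyTypes = {
            MediatypeService.CSS,
            MediatypeService.GIF, MediatypeService.JPG,
            MediatypeService.PNG,
            MediatypeService.MP3,
            MediatypeService.MP4};
    // Path of the .epub file on the device's file system
    String fileName = "1.epub";
    Book book = epubReader.readEpubLazy(fileName, "UTF-8", Arrays.asList(lazyTypes));
} catch (Exception e) {
    e.printStackTrace();
}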
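As I understand it, `readEpubLazy` does not read resources of the listed types into memory when the book is opened; their data is only read from the file when it is first requested. The pattern itself is simple, so here is a minimal sketch of it. `LazyResource` here is a hypothetical class (not epublib's own), and a `Supplier` stands in for reading the entry from the zip file.

```java
import java.util.function.Supplier;

// Sketch of lazy loading: a resource's bytes are produced on first access, then cached.
// Hypothetical class; the Supplier stands in for "read this entry from the .epub file".
final class LazyResource {
    private final String href;
    private final Supplier<byte[]> loader;
    private byte[] data;   // stays null until the first getData() call

    LazyResource(String href, Supplier<byte[]> loader) {
        this.href = href;
        this.loader = loader;
    }

    String getHref() {
        return href;
    }

    synchronized boolean isLoaded() {
        return data != null;
    }

    /** Loads the bytes on first use, then returns the cached copy. */
    synchronized byte[] getData() {
        if (data == null) {
            data = loader.get();
        }
        return data;
    }
}
```

The trade-off: opening the book is faster and uses less memory, but the first access to an image or video pays the read cost, and the file must still exist on disk at that moment.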

2. Getting chapter content

(1) Get the content of all chapters.

List<Resource> contents = book.getContents();

(2) Get the content of a specific chapter.

// By index (position in the spine)
int index = 0;
Resource byIndex = book.getSpine().getResource(index);

// By href, as it appears in the OPF manifest (relative to the OPF file)
String href = "images/1.png";
Resource byHref = book.getResources().getByHref(href);

// By id
String id = "chapter01";
Resource byId = book.getResources().getById(id);

// Special resources can be fetched directly
book.getCoverImage();
book.getCoverPage();
book.getNcxResource();
book.getOpfResource();

// Other ways
book.getSpine().getSpineReferences().get(0).getResource();
book.getGuide().getReferences().get(0).getResource();
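A Resource only gives you raw content; to display a chapter you usually need it as a string. Assuming you pass in `resource.getData()` (the bytes) and `resource.getInputEncoding()` (the encoding name, which may be missing), a small decoding helper with a UTF-8 fallback might look like this. `ChapterText.decode` is a hypothetical helper name.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: turns a chapter's raw bytes into text.
final class ChapterText {

    /**
     * Decodes a chapter's bytes. encodingName is expected to come from
     * resource.getInputEncoding(); UTF-8 is used if it is null, malformed or unsupported.
     */
    static String decode(byte[] data, String encodingName) {
        Charset charset = StandardCharsets.UTF_8;
        if (encodingName != null) {
            try {
                if (Charset.isSupported(encodingName)) {
                    charset = Charset.forName(encodingName);
                }
            } catch (IllegalCharsetNameException e) {
                // Malformed charset name: keep the UTF-8 fallback
            }
        }
        return new String(data, charset);
    }
}
```

Usage: `String html = ChapterText.decode(resource.getData(), resource.getInputEncoding());`. Note that `getData()` throws `IOException`, so call it inside a try/catch; the resulting XHTML can then be shown in, for example, a WebView.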

3. Getting the book's menu

(1) Two different menus

// Get the linear reading menu from the spine; it is ordered by reading order
List<SpineReference> spineReferences = book.getSpine().getSpineReferences();
// Get the corresponding resource
spineReferences.get(0).getResource();

// Get the tree-shaped menu from the TableOfContents; it follows the (tree) hierarchy between chapters
TableOfContents tableOfContents = book.getTableOfContents();
// Get the corresponding resource
tableOfContents.getTocReferences().get(0).getResource();
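A tree menu is often displayed as a plain list with indentation. One way to get there is a pre-order walk that records each entry's depth. In this sketch `TocNode` is a hypothetical stand-in for epublib's `TOCReference` (with epublib you would read `ref.getTitle()` and `ref.getChildren()` instead), so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: flatten a tree-shaped table of contents into display rows (pre-order),
// remembering each entry's depth so a list UI can indent it.
final class TocFlattener {

    /** Hypothetical stand-in for TOCReference: a title plus child entries. */
    static final class TocNode {
        final String title;
        final List<TocNode> children;

        TocNode(String title, TocNode... children) {
            this.title = title;
            this.children = Arrays.asList(children);
        }
    }

    /** One row of the flattened menu. */
    static final class Row {
        final String title;
        final int depth;

        Row(String title, int depth) {
            this.title = title;
            this.depth = depth;
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                sb.append("  ");   // two spaces of indentation per level
            }
            return sb.append(title).toString();
        }
    }

    static List<Row> flatten(List<TocNode> roots) {
        List<Row> rows = new ArrayList<>();
        addAll(roots, 0, rows);
        return rows;
    }

    // Parent first, then its children one level deeper
    private static void addAll(List<TocNode> nodes, int depth, List<Row> rows) {
        for (TocNode node : nodes) {
            rows.add(new Row(node.title, depth));
            addAll(node.children, depth + 1, rows);
        }
    }
}
```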

For the TableOfContents, here is a reference way to traverse it.

First, the JavaBean for the menu:

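import java.util.ArrayList;
import java.util.List;

/**
 * Menu item: holds the title, the href and the child nodes
 * @author Administrator
 *
 */
public class ContentItem{

    private String title;
    private String url;//href
    private int size;//size of the resource

    //child nodes
    private List<ContentItem> children;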

    public ContentItem() {
        super();
    }

    public ContentItem(String title, String url,int size) {
        super();
        this.title = title;
        this.url = url;
        this.size = size;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }

    public List<ContentItem> getChildren() {
        if(this.children==null){
            children = new ArrayList<ContentItem>();
        }
        return children;
    }

    public void setChildren(List<ContentItem> children) {
        this.children = children;
    }

    public int getSize() {
        return size;
    }

    public void setSize(int size) {
        this.size = size;
    }

}

The url in ContentItem above is the href. When you later need to load a specific chapter's content, the href is all you need to get the corresponding resource.
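One detail about that href: `ref.getCompleteHref()` may include a fragment, such as `Text/ch01.xhtml#sec2`, when a TOC entry points into the middle of a file. The part before `#` identifies the file, and the part after it is an anchor inside that page, which you need if you want to scroll to the right spot. A minimal sketch of splitting the two, with `HrefParts` as a hypothetical helper:

```java
// Splits an href such as "Text/ch01.xhtml#sec2" into its file path and its fragment.
// HrefParts is a hypothetical helper, not an epublib class.
final class HrefParts {
    final String path;      // part before '#': identifies the resource file
    final String fragment;  // part after '#': anchor inside the page, or null if absent

    private HrefParts(String path, String fragment) {
        this.path = path;
        this.fragment = fragment;
    }

    static HrefParts parse(String href) {
        int hash = href.indexOf('#');
        if (hash < 0) {
            return new HrefParts(href, null);
        }
        return new HrefParts(href.substring(0, hash), href.substring(hash + 1));
    }
}
```

For example, `book.getResources().getByHref(parts.path)` gives you the chapter, and `parts.fragment` tells you where to scroll once it is displayed.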

Then, traverse the TableOfContents to build the tree-shaped menu we want.

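import java.util.List;

import nl.siegmann.epublib.domain.Book;
import nl.siegmann.epublib.domain.TOCReference;

public class EpubMenuParser {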

    private ContentItem menu = new ContentItem();

    public ContentItem startParse(Book book){
        // Start the traversal at depth 0
        parseMenu(book.getTableOfContents().getTocReferences(), 0);
        return menu;

    }

    /**
     * Traverses the table of contents of an EPUB book
     * @param refs TOC entries at the current level
     * @param level menu depth
     */
    private void parseMenu(List<TOCReference> refs, int level) {

        if (refs == null || refs.isEmpty()) {
            return;
        }

        for (TOCReference ref : refs) {

            if (ref.getResource() != null) {
                if (level == 0) {
                    // First level: a level-1 node whose parent is the root
                    ContentItem item = new ContentItem(ref.getTitle(),
                             ref.getCompleteHref(),(int)ref.getResource().getSize());
                    menu.getChildren().add(item);

                } else if (level == 1) {

                    int lastIndexOf_depth1 = menu.getChildren().size() - 1;// index of the current last level-1 node

                    // Add it under the last level-1 node among the root's children
                    ContentItem item2 = new ContentItem(ref.getTitle(),
                            ref.getCompleteHref(),(int)ref.getResource().getSize());

                    menu.getChildren().get(lastIndexOf_depth1).getChildren()
                            .add(item2);

                } else if (level == 2) {

                    int lastIndexOf_depth1 = menu.getChildren().size() - 1;// index of the current last level-1 node
                    int lastIndexOf_depth2 = menu.getChildren()
                            .get(lastIndexOf_depth1).getChildren().size() - 1;// index of the current last level-2 node

                    // The parent is the last of the level-2 nodes
                    ContentItem item = new ContentItem(ref.getTitle(),
                        ref.getCompleteHref(),(int)ref.getResource().getSize());

                    menu.getChildren().get(lastIndexOf_depth1).getChildren()
                            .get(lastIndexOf_depth2).getChildren().add(item);
                }
            }
            // Keep traversing its children
            parseMenu(ref.getChildren(), level + 1);
        }
    }

}

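Of course, this traversal still has some problems. For example, as written it only supports three levels of menus, though that is enough for most books. Still, hard-coding the depth isn't great; adapt it to your needs~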
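If you would rather not hard-code the depth, one option is to pass the parent node down the recursion instead of looking up "the last node at level N". That supports any nesting depth, and it also avoids a `get(-1)` crash when, for example, the very first top-level entry has no resource but does have children. A sketch under simplifying assumptions: `TocEntry` stands in for `TOCReference` (a null href models `getResource() == null`) and `MenuItem` is a trimmed-down `ContentItem`; both are hypothetical so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: build the menu tree recursively by handing each level its parent node.
final class RecursiveMenuParser {

    /** Hypothetical stand-in for TOCReference. */
    static final class TocEntry {
        final String title;
        final String completeHref;   // null models an entry without a resource
        final List<TocEntry> children;

        TocEntry(String title, String completeHref, TocEntry... children) {
            this.title = title;
            this.completeHref = completeHref;
            this.children = Arrays.asList(children);
        }
    }

    /** Trimmed-down version of the ContentItem bean (title, url, children). */
    static final class MenuItem {
        final String title;
        final String url;
        final List<MenuItem> children = new ArrayList<>();

        MenuItem(String title, String url) {
            this.title = title;
            this.url = url;
        }
    }

    /** Returns an untitled root whose children are the first-level menu items. */
    static MenuItem parse(List<TocEntry> roots) {
        MenuItem root = new MenuItem(null, null);
        addChildren(root, roots);
        return root;
    }

    private static void addChildren(MenuItem parent, List<TocEntry> entries) {
        for (TocEntry entry : entries) {
            if (entry.completeHref != null) {
                MenuItem item = new MenuItem(entry.title, entry.completeHref);
                parent.children.add(item);
                addChildren(item, entry.children);   // children hang under this item
            } else {
                // No resource: skip the entry itself but keep its children at this level
                addChildren(parent, entry.children);
            }
        }
    }
}
```

Attaching the children of a resource-less entry one level up is a design choice; depending on the book you might instead keep such entries as non-clickable group headers.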

There are many other uses as well; I'll dig into them over time.