Android Study Notes, Part 6: Making EPUB E-books

manmanbab

Last modified 2024-02-17 05:58:22


First published 2024-01-20 03:11:38

Article link: https://blog.csdn.net/manmanbab/article/details/135710021


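I have some old books in TXT format that I want to read in WeChat Read (微信读书) on my phone.

The e-book versions I found online were either badly formatted or full of typos.

So I learned to use Sigil to create, edit, and fix e-books.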

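EPUB Basics

I. What Is EPUB

EPUB is a fully open and free e-book standard. Its content is reflowable: the text is automatically re-laid out to fit the screen.

EPUB file extension: .epub

II. EPUB Structure

Internally, EPUB uses XHTML (or DTBook) to present the content and a set of CSS files to define formatting and layout; all of the files are then packed into a ZIP archive.

The EPUB format includes DRM-related features (for now, the EPUB engine does not take DRM information into account).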

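EPUB consists of three main specifications:

Open Publication Structure (OPS) 2.0, which defines the content and how it is presented;

Open Packaging Format (OPF) 2.0, which defines the XML-based structure of the .epub package;

OEBPS Container Format (OCF) 1.0, which collects all related files into a ZIP archive.

1. OPS:

The book's content is written in XHTML (or DTBook).

A set of CSS files defines the book's formatting and layout.

Supported image formats: PNG, JPEG, GIF, and SVG.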

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<link rel="stylesheet" type="text/css" href="stylesheet.css"/>
<title>边　城</title>
</head>
<body>
<h1 class="fxx">一</h1>
<p class="content">由四川过湖南去，靠东有一条官路。这官路将近湘西边境，到了一个地方名叫“茶峒”的小山城时，有一小溪，溪边有座白色小塔，塔下住了一户单独的人家。这人家只一个老人，一个女孩子，一只黄狗。</p>
......
......
<p class="content">茶峒山城只隔渡头一里路，买油买盐时，逢年过节祖父得喝一杯酒时，祖父不上城，黄狗就伴同翠翠入城里去备办节货。到了卖杂货的铺子里，有大把的粉条，大缸的白糖，有炮仗，有红蜡烛，莫不给翠翠一种很深的印象，回到祖父身边，总把这些东西说个半天。那里河边还有许多上行船，百十船夫忙着起卸百货，这种船只比起渡船来全大得多，有趣味得多，翠翠也不容易忘记。</p>
</body>
</html>
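A chapter file like the one above is ordinary XHTML, so it can also be generated from a list of paragraphs. The sketch below is a hypothetical helper, `make_chapter_xhtml`, that assumes the `fxx` and `content` class names and the `stylesheet.css` file name from the example; it escapes the text so characters like `&` and `<` do not break the XML, and it checks well-formedness before returning.

```python
import xml.etree.ElementTree as ET
from html import escape


def make_chapter_xhtml(title: str, paragraphs: list[str],
                       css_href: str = "stylesheet.css") -> str:
    """Build one chapter file shaped like the 边城 example (hypothetical helper)."""
    # escape() turns &, < and > (and quotes) into entities so the text stays valid XML
    body = "\n".join(f'<p class="content">{escape(p)}</p>' for p in paragraphs)
    doc = (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head>\n"
        f'<link rel="stylesheet" type="text/css" href="{escape(css_href)}"/>\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f'<h1 class="fxx">{escape(title)}</h1>\n'
        f"{body}\n"
        "</body>\n"
        "</html>\n"
    )
    ET.fromstring(doc.encode("utf-8"))  # raises ParseError if the result is not well-formed
    return doc
```

The returned string should be saved with `encoding="utf-8"` so that it agrees with the XML declaration.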

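2. OPF:

The OPF file is the most complex metadata file in the EPUB specification. It defines how the OPS content files are assembled into a book and provides the e-book with additional structure and content.

The OPF package element contains four child elements: metadata, manifest, spine, and guide.

In the OEBPS folder, the package is described by two XML files: the .opf and the .ncx.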

<?xml version="1.0" encoding="utf-8"?>
<package version="2.0" unique-identifier="BookId" xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:language>zh</dc:language>
    <dc:title>边城</dc:title>
    <dc:creator>沈从文</dc:creator>
    <meta name="Sigil version" content="2.0.2" />
    <dc:date opf:event="modification" xmlns:opf="http://www.idpf.org/2007/opf">2024-01-19</dc:date>
    <dc:identifier opf:scheme="UUID" id="BookId">urn:uuid:4cf1b6ce-4dc7-4e76-954d-02024718c8d4</dc:identifier>
    <meta name="cover" content="imgfrontcover" />
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="main-css" href="stylesheet.css" media-type="text/css"/>
    <item id="Section0000.xhtml" href="Section0000.xhtml" media-type="application/xhtml+xml"/>
    ......
    ......
    <item id="Section0022.xhtml" href="Section0022.xhtml" media-type="application/xhtml+xml"/>
    <item id="imgfrontcover" href="frontcover.jpg" media-type="image/jpeg"/>
    ......
    ......
    <item id="cover.xhtml" href="cover.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="cover.xhtml"/>
    <itemref idref="Section0000.xhtml"/>
    ......
    ......
    <itemref idref="Section0022.xhtml"/>
  </spine>
  <guide>
    <reference type="cover" title="Cover" href="cover.xhtml"/>
  </guide>
</package>

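(1) .opf

The OPF contains the following:

1) metadata: the EPUB's metadata, such as title, language, identifier, and cover. In OPF 2.0, title, identifier, and language are required.

Under the EPUB specification, the identifier is defined by the creator of the digital book and must be unique. For publishers this field usually holds an ISBN or a Library of Congress number; a URL or a randomly generated UUID can also be used. Note: the value of the unique-identifier attribute on the package element must match the id attribute of the dc:identifier element.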

<?xml version="1.0" encoding="utf-8"?>
<package version="2.0" unique-identifier="BookId" xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:language>zh</dc:language>
    <dc:title>边城</dc:title>
    <dc:creator>沈从文</dc:creator>
    <meta name="Sigil version" content="2.0.2" />
    <dc:date opf:event="modification" xmlns:opf="http://www.idpf.org/2007/opf">2024-01-19</dc:date>
    <dc:identifier opf:scheme="UUID" id="BookId">urn:uuid:4cf1b6ce-4dc7-4e76-954d-02024718c8d4</dc:identifier>
    <meta name="cover" content="imgfrontcover" />
  </metadata>
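A random UUID is the easiest way to get a unique identifier when a book has no ISBN. The sketch below uses Python's standard `uuid` module; the helper names `new_book_identifier` and `identifier_element` are hypothetical, and `BookId` is taken from the example above. The same string should also be used as `dtb:uid` in toc.ncx.

```python
import uuid


def new_book_identifier() -> str:
    """Random UUID in the urn:uuid:... form used in the OPF and NCX examples."""
    return uuid.uuid4().urn


def identifier_element(ident: str, element_id: str = "BookId") -> str:
    # element_id must equal the unique-identifier attribute on <package>
    return f'<dc:identifier opf:scheme="UUID" id="{element_id}">{ident}</dc:identifier>'


book_id = new_book_identifier()
print(identifier_element(book_id))
```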

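2) manifest: lists every file contained in the package (XHTML, CSS, PNG, NCX, and so on). EPUB encourages styling the book content with CSS, so the manifest also lists the CSS files. Note: every file that goes into the book must be listed in the manifest (the mimetype file and the files under META-INF are the exception).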

  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="main-css" href="stylesheet.css" media-type="text/css"/>
    <item id="Section0000.xhtml" href="Section0000.xhtml" media-type="application/xhtml+xml"/>
    <item id="Section0001.xhtml" href="Section0001.xhtml" media-type="application/xhtml+xml"/>
    ......
    ......
    <item id="Section0022.xhtml" href="Section0022.xhtml" media-type="application/xhtml+xml"/>
    <item id="imgfrontcover" href="frontcover.jpg" media-type="image/jpeg"/>
    <item id="imgfigure-0003-001" href="figure-0003-001.jpg" media-type="image/jpeg"/>
    ......
    ......
    <item id="cover.xhtml" href="cover.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
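Writing manifest entries by hand is repetitive, since each one is just an id, an href, and a media type. The sketch below is a hypothetical helper, `manifest_xml`, that derives the media type from the file extension using a small hand-written table (an assumption covering only the file types in this article). Like the Section files in the example, it reuses the file name as the id, and it gives the NCX the id `ncx` so that `<spine toc="ncx">` can refer to it.

```python
from pathlib import PurePosixPath
from xml.sax.saxutils import quoteattr

# Media types for the kinds of files used in this article (extend as needed).
MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".html": "application/xhtml+xml",
    ".css": "text/css",
    ".ncx": "application/x-dtbncx+xml",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".svg": "image/svg+xml",
}


def manifest_xml(hrefs: list[str]) -> str:
    """Build a <manifest> block from hrefs relative to the .opf file (hypothetical helper)."""
    lines = []
    for href in hrefs:
        suffix = PurePosixPath(href).suffix.lower()
        media_type = MEDIA_TYPES[suffix]  # KeyError means an unknown file type
        item_id = "ncx" if suffix == ".ncx" else PurePosixPath(href).name
        if item_id[0].isdigit():  # XML ids may not start with a digit
            item_id = "x" + item_id
        lines.append(f"  <item id={quoteattr(item_id)} href={quoteattr(href)} "
                     f"media-type={quoteattr(media_type)}/>")
    return "<manifest>\n" + "\n".join(lines) + "\n</manifest>"
```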

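3) spine: the linear reading order of all XHTML documents. The spine's toc attribute must hold the id of the .ncx item listed in the manifest. You can think of the OPF spine as the order of the "pages" in the book; a reader processes the spine from top to bottom in document order.

Each itemref element in the spine needs an idref attribute that matches an id in the manifest.

The linear attribute on an itemref indicates whether the item is part of the linear reading order or stands apart from it. Some readers treat linear="no" items as auxiliary content, while others ignore the attribute. For example, suppose a spine lists titlepage and chapter01 through chapter05, with chapter02, chapter03, and chapter04 marked linear="no". A reader that supports auxiliary content would present titlepage, chapter01, and chapter05 in sequence, and would show chapter02 to chapter04 only after the user clicks through to them (or opens them some other way).

Readers that support printing, however, should ignore linear="no" so that all of the OPS content is shown in full.

A good reader should offer users both options.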

  <spine toc="ncx">
    <itemref idref="cover.xhtml"/>
    <itemref idref="Section0000.xhtml"/>
    ......
    ......
    <itemref idref="Section0022.xhtml"/>
  </spine>
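The cross-references described above (spine toc pointing to the NCX item, every idref matching a manifest id, unique-identifier matching a dc:identifier id) are easy to break when editing by hand. The sketch below is a hypothetical checker, `check_opf`, built on Python's `xml.etree.ElementTree`. It covers only these three rules and is not a replacement for a full validator such as EpubCheck.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"
DC = "{http://purl.org/dc/elements/1.1/}"


def check_opf(opf_text: str) -> list[str]:
    """Return consistency problems found in an OPF 2.0 document (empty list = none found)."""
    root = ET.fromstring(opf_text.encode("utf-8"))
    problems = []
    manifest = {item.get("id"): item for item in root.iter(OPF + "item")}
    spine = root.find(OPF + "spine")
    if spine is None:
        return ["no <spine> element"]
    # spine/@toc must name the manifest item that has the NCX media type
    toc_id = spine.get("toc")
    toc_item = manifest.get(toc_id)
    if toc_item is None or toc_item.get("media-type") != "application/x-dtbncx+xml":
        problems.append(f"spine toc={toc_id!r} does not point to an NCX item")
    # every itemref/@idref must match a manifest id
    for ref in spine.iter(OPF + "itemref"):
        if ref.get("idref") not in manifest:
            problems.append(f"itemref {ref.get('idref')!r} is not in the manifest")
    # package/@unique-identifier must match the id of a dc:identifier
    uid = root.get("unique-identifier")
    if uid not in {e.get("id") for e in root.iter(DC + "identifier")}:
        problems.append(f"unique-identifier {uid!r} matches no dc:identifier id")
    return problems
```

Usage: `check_opf(open("OEBPS/content.opf", encoding="utf-8").read())`.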

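(2) .ncx

The NCX defines the book's table of contents (TOC). In complex books the TOC is usually hierarchical, with nested parts, chapters, and sections.

The NCX <head> element contains four meta elements:

uid: the book's unique ID. It should match dc:identifier in the OPF file.
depth: the depth of the TOC hierarchy.
totalPageCount and maxPageNumber: used only for paper books; leave them at 0.

The content of docTitle/text is the book's title and matches dc:title in the OPF.

navMap is the most important part of the NCX file: it defines the book's TOC. navMap contains one or more navPoint elements, and each navPoint carries the following:

playOrder (an attribute): the reading order of the document, following the order of the itemref elements in the OPF spine.
navLabel/text: the title of the section, usually the chapter title or number.
content: its src attribute points to the physical resource holding the content, i.e. a file declared in the OPF manifest.
A navPoint may also contain one or more nested navPoint elements; the NCX uses nested navigation points to represent a hierarchical document.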

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN"
 "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd"><ncx version="2005-1" xmlns="http://www.daisy.org/z3986/2005/ncx/">
  <head>
    <meta name="dtb:uid" content="urn:uuid:4cf1b6ce-4dc7-4e76-954d-02024718c8d4"/>
    <meta name="dtb:depth" content="1"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
  </head>
  <docTitle>
    <text>边城</text>
  </docTitle>
  <navMap>
    <navPoint id="navPoint-1" playOrder="1">
      <navLabel>
        <text>序</text>
      </navLabel>
      <content src="Section0000.xhtml"/>
    </navPoint>
    ......
    ......
    <navPoint id="navPoint-23" playOrder="23">
      <navLabel>
        <text>附录</text>
      </navLabel>
      <content src="Section0022.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
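For a flat TOC like the one above (depth 1), the navPoints follow a fixed pattern with playOrder counting up from 1. The sketch below is a hypothetical generator, `build_ncx`, that takes (label, src) pairs in reading order. It does not produce nested navPoints, so a hierarchical TOC would need more work.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ncx(uid: str, title: str, chapters: list[tuple[str, str]]) -> str:
    """chapters: (label, src) pairs in reading order; playOrder counts from 1."""
    points = []
    for n, (label, src) in enumerate(chapters, start=1):
        points.append(
            f'    <navPoint id="navPoint-{n}" playOrder="{n}">\n'
            f"      <navLabel>\n        <text>{escape(label)}</text>\n      </navLabel>\n"
            f"      <content src={quoteattr(src)}/>\n"
            "    </navPoint>"
        )
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN"\n'
        ' "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">\n'
        '<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">\n'
        "  <head>\n"
        f'    <meta name="dtb:uid" content={quoteattr(uid)}/>\n'
        '    <meta name="dtb:depth" content="1"/>\n'
        '    <meta name="dtb:totalPageCount" content="0"/>\n'
        '    <meta name="dtb:maxPageNumber" content="0"/>\n'
        "  </head>\n"
        f"  <docTitle>\n    <text>{escape(title)}</text>\n  </docTitle>\n"
        "  <navMap>\n" + "\n".join(points) + "\n  </navMap>\n</ncx>\n"
    )
```

The `uid` argument should be the same string as the OPF's dc:identifier.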

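(3) How Do the NCX and the OPF Spine Differ?

The two are easy to confuse because both describe the order and content of the documents. The simplest way to explain the difference is with a printed book: the OPF spine describes how the chapters are physically joined, for example turning past the last page of chapter 1 brings you to the first page of chapter 2. The NCX is the table of contents at the front of the book; it certainly lists the main chapters, but it may also list subsections that do not start on a page of their own.

As a rule of thumb, the NCX usually contains more navPoint elements than the OPF spine contains itemref elements. The main documents in the spine normally appear in the NCX, but the NCX can be more detailed. Conversely, some spine items, such as the cover, often have no NCX entry at all (cover.xhtml in the 边城 example is one).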
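One way to see the difference in practice is to list the spine documents that the NCX never points to. The sketch below, `spine_docs_missing_from_ncx`, is a hypothetical helper. It assumes the .opf and .ncx sit in the same folder, as in both examples here, so that their hrefs can be compared directly, and it drops any `#fragment` from NCX src values, since an NCX entry can point inside a document.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"
NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"


def spine_docs_missing_from_ncx(opf_text: str, ncx_text: str) -> list[str]:
    """Spine hrefs (in reading order) that no NCX navPoint refers to."""
    opf = ET.fromstring(opf_text.encode("utf-8"))
    ncx = ET.fromstring(ncx_text.encode("utf-8"))
    hrefs = {item.get("id"): item.get("href") for item in opf.iter(OPF_NS + "item")}
    spine_hrefs = [hrefs[ref.get("idref")] for ref in opf.iter(OPF_NS + "itemref")]
    # NCX src may carry a #fragment pointing inside a document; compare the file part only
    ncx_files = {c.get("src").split("#")[0] for c in ncx.iter(NCX_NS + "content")}
    return [h for h in spine_hrefs if h not in ncx_files]
```

For the 边城 book this would be expected to report only the cover page.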

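3. OCF:

OCF defines how the files are packaged into a ZIP, and adds two extra pieces:

1) A mimetype file in ASCII. It must contain the string application/epub+zip, must be the first file in the ZIP archive, and must be stored uncompressed.

2) A folder named META-INF, which must contain a container.xml file (it tells the reader where the .opf file is).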

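4. DRM

DRM information goes in a rights.xml file inside the META-INF folder.

5. In summary, an EPUB's ZIP archive contains the following:

1) The mimetype file, which must be the first file in the archive. Note: mimetype must be stored uncompressed.

2) The META-INF directory (the name is upper-case), containing at least a container.xml file.

3) The OEBPS directory (it can have another name, but this one is recommended), which contains:

a) an image subdirectory (not always present) holding all image files;

b) content.opf: the file name can differ, but the extension is always .opf. It is an XML list of the files in the package;

c) toc.ncx: the TOC file, a "logical table of contents" that serves as the navigation control file;

d) a number of XHTML or HTML files, which are the book's content.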

We can rename an .epub file to .zip, extract it with an unzip tool, and look at its directory and file structure:

边城>tree /f
Folder PATH listing
Volume serial number is 4A56-B1E8
C:.
│   mimetype
│   边城.zip
│
├───META-INF
│       container.xml
│       signatures.xml
│
└───OEBPS
        content.opf
        cover.xhtml
        figure-0003-001.jpg
        figure-0003-002.jpg
        figure-0003-003.jpg
        figure-0003-004.jpg
        figure-0003-005.jpg
        figure-0003-007.jpg
        figure-0003-008.jpg
        frontcover.jpg
        Section0000.xhtml
        Section0001.xhtml
        Section0002.xhtml
        Section0003.xhtml
        Section0004.xhtml
        Section0005.xhtml
        Section0006.xhtml
        Section0007.xhtml
        Section0008.xhtml
        Section0009.xhtml
        Section0010.xhtml
        Section0011.xhtml
        Section0012.xhtml
        Section0013.xhtml
        Section0014.xhtml
        Section0015.xhtml
        Section0016.xhtml
        Section0017.xhtml
        Section0018.xhtml
        Section0019.xhtml
        Section0020.xhtml
        Section0021.xhtml
        Section0022.xhtml
        stylesheet.css
        toc.ncx
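The OCF rules can also be checked without unpacking, because Python's standard `zipfile` module exposes the entry order and compression method of each entry. The sketch below, `check_ocf`, is a hypothetical helper that tests only the rules described in this article: mimetype first, stored uncompressed, with the exact content, and META-INF/container.xml present.

```python
import zipfile


def check_ocf(epub_path: str) -> list[str]:
    """Return OCF packaging problems in an .epub file (empty list = none found)."""
    problems = []
    with zipfile.ZipFile(epub_path) as zf:
        infos = zf.infolist()  # entries in the order they are stored in the archive
        if not infos or infos[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        else:
            first = infos[0]
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype is compressed")
            if zf.read(first) != b"application/epub+zip":
                problems.append("mimetype content is wrong")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("META-INF/container.xml missing")
    return problems
```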

III. The Full EPUB Specifications

OPF specification: http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html

OPS specification: http://www.idpf.org/2007/ops/OPS_2.0_final_spec.html

OCF specification: http://www.idpf.org/ocf/ocf1.0/download/ocf10.htm

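IV. How to Make an EPUB E-book

You can make an EPUB without any e-book editing software, using nothing but text editing and ZIP compression. The steps are:

Create an empty ZIP file. It can have any name, but ideally the same name as the book.
Add the mimetype file to the archive first, stored without compression.
Add the other directories and files to the archive with compression.
Change the file extension to .epub.
That's it: the EPUB e-book is done.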
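Many graphical ZIP tools give no control over entry order or per-file compression, so the steps above are easier to follow with a script. The sketch below is a hypothetical `pack_epub` helper using Python's standard `zipfile`. It assumes an unpacked folder laid out like the 边城 tree, and if META-INF/container.xml is missing it writes a minimal one that assumes the package file is OEBPS/content.opf. Stray files in the folder (such as the leftover 边城.zip in the listing) would be packed too, so remove them first.

```python
import zipfile
from pathlib import Path

# Minimal container.xml; assumes the package file lives at OEBPS/content.opf.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def pack_epub(src_dir: str, out_path: str) -> None:
    """Zip an unpacked book folder into an .epub following the OCF rules above."""
    src = Path(src_dir)
    out = Path(out_path).resolve()
    with zipfile.ZipFile(out, "w") as zf:
        # Step 1: mimetype must be the first entry, stored uncompressed, with no newline.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # Add a default container.xml only if the folder does not already have one.
        if not (src / "META-INF" / "container.xml").is_file():
            zf.writestr("META-INF/container.xml", CONTAINER_XML,
                        compress_type=zipfile.ZIP_DEFLATED)
        # Step 2: everything else, compressed, using forward-slash archive paths.
        for path in sorted(src.rglob("*")):
            arcname = path.relative_to(src).as_posix()
            if not path.is_file() or arcname == "mimetype" or path.resolve() == out:
                continue
            zf.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Usage: `pack_epub("边城", "边城.epub")`, run from the folder that contains the unpacked book.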

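Several mature e-book editors are also available:

epubBuilder: software written by a Chinese developer. It works well for making books by hand, especially for building each chapter and the TOC, and it can import CHM, TXT, and HTML files, which makes it quite user-friendly.
ecub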
Calibre
Adobe InDesign
Stanza
OpenBerg Rector
ePUB check tool
Convert uploads to ePUB
Web2FB2
Python converter
DAISY Pipeline

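The rest of this post introduces a free, open-source, cross-platform e-book editor.

Introducing Sigil

Sigil is a free, open-source, cross-platform e-book editor built with Qt (and QtWebEngine). It is designed for editing books in EPUB format (both EPUB 2 and EPUB 3).

Download: Download Sigil - Sigil-Ebook

Sigil features:

Completely free, open-source software licensed under GPLv3;
Cross-platform: Windows, Linux, and Mac can all be used to make EPUBs, good news for Mac users;
Unicode support; Chinese works without any problems in testing;
Several editing views: book preview, code view, and a split view showing both;
A built-in metadata editor;
A hierarchical TOC editor that can generate the TOC automatically from headings, which is excellent;
All imported documents are converted to Unicode automatically;
Imports TXT, HTML, and EPUB, with more formats planned;
A friendly user interface;
Written in C++, so no .NET, JRE, or other runtime is required.

Sigil user manual: Sigil User Guide - Sigil-Ebook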

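Quickly Turning a TXT File into an EPUB with Sigil

I. Preprocess the TXT Text

Any general-purpose text editor will do; UltraEdit is used here as an example.

1. Open the text file in UltraEdit

2. Switch to hexadecimal view

The line breaks show up as the ASCII codes 0D 0A (CR LF).

3. Convert the line breaks so that each paragraph of body text becomes one block of text, ready to become an XHTML paragraph

4. The converted text

Each paragraph is now, for the most part, a single block of text.

5. Convert the .txt directly to .xhtml with Sigil

In fact, opening a .txt file directly in Sigil automatically converts the whole document into a single .xhtml file, OEBPS/Text/section0001.xhtml, and wraps each paragraph in <p> tags.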
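The preprocessing above can also be scripted. The sketch below, `txt_to_paragraphs`, is a hypothetical helper. It assumes that a file which is not valid UTF-8 is GB18030/GBK (common for old Chinese TXT files, but check your own files), normalizes CR LF (0D 0A) and lone CR to `\n`, treats each non-empty line as one paragraph, and strips ASCII and full-width (U+3000) indentation. `paragraphs_to_html` then wraps each paragraph in `<p>` tags.

```python
from html import escape


def txt_to_paragraphs(raw: bytes) -> list[str]:
    """Split raw TXT bytes into paragraphs, one per non-empty line."""
    # Assumption: anything that is not valid UTF-8 is GB18030 (a superset of GBK).
    try:
        text = raw.decode("utf-8-sig")  # also drops a UTF-8 BOM if present
    except UnicodeDecodeError:
        text = raw.decode("gb18030")
    # 0D 0A (CR LF) and lone 0D (CR) both become a plain \n
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    indent = " \t\u3000"  # ASCII spaces, tabs and the full-width space
    return [line.strip(indent) for line in text.split("\n") if line.strip(indent)]


def paragraphs_to_html(paragraphs: list[str]) -> str:
    return "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
```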

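II. Making the E-book with Sigil

After Sigil has converted the whole .txt file into .xhtml, you can use split to cut it into individual chapters and build the book from them. Try this yourself; it is not demonstrated here.

Here I only show how to create a blank e-book first and then add the chapters one by one, filling in titles and content.

1. Open Sigil and create a new e-book in EPUB 2 format

2. Create blank chapters

Create one blank page for each chapter.

3. Fill in the book title and each chapter title, then save as an .epub file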
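As an alternative to splitting inside Sigil, the paragraphs can be grouped into chapters before import. The sketch below is a hypothetical `split_chapters` helper. Its heading pattern (lines starting with 第…章, using Arabic or Chinese numerals) is an assumption based on the chapter titles in this post and will need adjusting for other books. Any text before the first heading, such as a preface, is kept as an untitled chapter.

```python
import re

# Hypothetical heading pattern: "第一章 初听琴声", "第12章 ...", and so on.
CHAPTER_RE = re.compile(r"^第[0-9零一二三四五六七八九十百千]+章")


def split_chapters(paragraphs: list[str]) -> list[tuple[str, list[str]]]:
    """Group paragraphs into (heading, body paragraphs) pairs."""
    chapters: list[tuple[str, list[str]]] = []
    for p in paragraphs:
        if CHAPTER_RE.match(p):
            chapters.append((p, []))        # a heading starts a new chapter
        elif chapters:
            chapters[-1][1].append(p)       # body text goes to the current chapter
        else:
            chapters.append(("", [p]))      # text before the first heading
    return chapters
```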

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>海风阵阵吹</title>
</head>

<body>
<h1 class="fxx">第一章	初听琴声</h1>
  <p>&nbsp;</p>
</body>
</html>

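4. Fill in the body text of each chapter

You can check the result in the preview pane on the right; formatting errors are flagged.

5. Make the cover

Import a cover image.

Use the Tools menu to create the cover.

Select the cover image.

6. Generate the TOC

7. Complete the .opf and .ncx files and fill in the book title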

content.opf:

<?xml version="1.0" encoding="utf-8"?>
<package version="2.0" unique-identifier="BookId" xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:identifier opf:scheme="UUID" id="BookId">urn:uuid:20bb5f6e-3752-4649-8f46-b9a98e7c2276</dc:identifier>
    <dc:language>zh</dc:language>
    <dc:title>海风阵阵吹</dc:title>
    <meta name="Sigil version" content="2.0.2"/>
    <dc:date opf:event="modification" xmlns:opf="http://www.idpf.org/2007/opf">2024-01-19</dc:date>
    <meta name="cover" content="haifengzhenzhenchui.jpg"/>
  </metadata>
  <manifest>
    <item id="Section0001.xhtml" href="Text/Section0001.xhtml" media-type="application/xhtml+xml"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="Section0002.xhtml" href="Text/Section0002.xhtml" media-type="application/xhtml+xml"/>
    <item id="Section0003.xhtml" href="Text/Section0003.xhtml" media-type="application/xhtml+xml"/>
    <item id="Section0004.xhtml" href="Text/Section0004.xhtml" media-type="application/xhtml+xml"/>
    <item id="haifengzhenzhenchui.jpg" href="Images/海风阵阵吹.jpg" media-type="image/jpeg"/>
    <item id="cover.xhtml" href="Text/cover.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="cover.xhtml"/>
    <itemref idref="Section0001.xhtml"/>
    <itemref idref="Section0002.xhtml"/>
    <itemref idref="Section0003.xhtml"/>
    <itemref idref="Section0004.xhtml"/>
  </spine>
  <guide>
    <reference type="cover" title="Cover" href="Text/cover.xhtml"/>
  </guide>
</package>

toc.ncx:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN"
   "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta name="dtb:uid" content="urn:uuid:20bb5f6e-3752-4649-8f46-b9a98e7c2276"/>
    <meta name="dtb:depth" content="1"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
  </head>
  <docTitle>
    <text>海风阵阵吹</text>
  </docTitle>
  <navMap>
    <navPoint id="navPoint-1" playOrder="1">
      <navLabel>
        <text>第一章 初听琴声</text>
      </navLabel>
      <content src="Text/Section0001.xhtml"/>
    </navPoint>
    <navPoint id="navPoint-2" playOrder="2">
      <navLabel>
        <text>第二章 海滨重逢</text>
      </navLabel>
      <content src="Text/Section0002.xhtml"/>
    </navPoint>
    <navPoint id="navPoint-3" playOrder="3">
      <navLabel>
        <text>第三章 京城惊魂</text>
      </navLabel>
      <content src="Text/Section0003.xhtml"/>
    </navPoint>
    <navPoint id="navPoint-4" playOrder="4">
      <navLabel>
        <text>第四章 海岛深酬</text>
      </navLabel>
      <content src="Text/Section0004.xhtml"/>
    </navPoint>
  </navMap>
</ncx>