How to generate a GitHub markdown file from Microsoft Word using TypeScript

by Manish Bansal

What? Why would one want to generate an MD file from a Microsoft Word document? If that's the first thought you had after reading this title, then let me give you a strong use case.

Consider a situation where you are using Git or any other version control system (VCS) for your project's sources as well as its artifacts. Now, like most projects, you decide to use Microsoft Word for documentation and check it into Git. Multiple team members then edit the same document, and after editing, they check the document into the repository.

Now, Git will be able to maintain the history of your document. But how will you look at the changes that have been made to the document since you last checked it in? Yes, you can use Microsoft Word's Track Changes mode, but isn't that messy? Or, for heaven's sake, will you be able to use the Git diff utility to check the differences quickly? I would say no, because a docx file is a binary file and a text diff of it is not readable.

Then what is the solution? Should you stop using Microsoft Word for documentation? Or should you switch to some other VCS?

I would say neither. How about you keep maintaining your documentation in Microsoft Word, convert it into a markdown (MD) file (in layman's terms, a text file) during the build phase, and check that in? If that solution excites you, then keep reading.

But before jumping right into the conversion, let me first tell you what exactly a markdown file is.

What is a markdown or an MD file?

Markdown is a lightweight markup syntax designed to make structured text easy to read and write. Further, it is easy to learn, and it only requires a text editor to create a document.

Now, there are multiple implementations of the language (like GFM, aka GitHub Flavored Markdown). Each of these implementations has its own improvements and features, which are not necessarily compatible with each other.

Each implementation supports various common features like paragraphs, blockquotes, headings, and lists. This helps in maintaining text in a structured manner, like Microsoft Word does. But instead of using internal binary codes, MD files use plain text characters for these features. This makes an MD file a text file rather than a binary file like a docx file.

For example, GitHub's markdown flavor represents each of these features with a few plain text characters, where a Word document would store them as internal formatting.
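As a rough illustration of how GFM expresses common formatting with plain characters, here is a small sketch. The helper names (heading, bold, italic, bullet, blockquote) are made up for this example and are not part of the project discussed in this article.

```typescript
// Hypothetical helpers (not from the article's repository) that emit GFM syntax.
const heading = (level: number, text: string): string => `${"#".repeat(level)} ${text}`;
const bold = (text: string): string => `**${text}**`;
const italic = (text: string): string => `*${text}*`;
const bullet = (items: string[]): string => items.map((item) => `- ${item}`).join("\n");
const blockquote = (text: string): string => `> ${text}`;

// A small document assembled from the helpers; blocks are separated by blank lines.
const sample: string = [
  heading(2, "Release notes"),
  `${bold("Fixed")} a crash and ${italic("improved")} startup time.`,
  bullet(["First item", "Second item"]),
  blockquote("Quoted from the design document."),
].join("\n\n");

console.log(sample);
```

Everything a Word document would keep as styles ends up as visible characters, which is exactly what makes the file friendly to Git diff.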

For the detailed advantages of MD files over Word documents, you can also refer to this article.

OK! I am convinced. Show me the code.

Disclaimer: This project is inspired by the TypeScript source code. While browsing it, I found this idea of converting a Word document to an MD file. You can see its source code here.

For simplicity, I have removed a few sections of code in my repository. The original code was meant to convert the TypeScript specification document to an MD file, and that document contains lots of customized styles. So, once you are done with this article, you can go through the TypeScript converter code and appreciate its ability to perform more complex conversions.

The complete code mentioned in this article is available here. The whole code can be divided into 3 sections:

  1. Gulp configurations.
  2. CScript execution.
  3. TypeScript main function.

As stated earlier, you can convert a Word document to an MD file during the build phase. This can be done by any task runner. Here, I have chosen Gulp.

In the Gulp configuration, I have defined 3 tasks. The first one cleans the build directory, which is pretty standard. The second compiles the TypeScript code. And the last one calls CScript to execute the compiled JavaScript.
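To make the third task more concrete, here is a minimal sketch of how the command line for CScript could be assembled. The task names and file paths are assumptions made for this illustration and may differ from the ones in the repository; only the //nologo switch is a standard Windows Script Host option.

```typescript
// Hypothetical file names; the article's repository may use different ones.
interface ConversionJob {
  script: string; // compiled JavaScript produced by the TypeScript build step
  source: string; // Word document to read
  target: string; // markdown file to write
}

// Builds the argument list for cscript.exe. "//nologo" is a standard
// Windows Script Host switch that suppresses the startup banner.
function cscriptArgs(job: ConversionJob): string[] {
  return ["//nologo", job.script, job.source, job.target];
}

// The three tasks in the order the build would run them (names are assumed).
const buildOrder: string[] = ["clean", "compile", "convert"];

const args = cscriptArgs({
  script: "build/word2md.js",
  source: "docs/spec.docx",
  target: "docs/spec.md",
});

console.log(buildOrder.join(" -> "));
console.log(["cscript", ...args].join(" "));
```

In a real gulpfile, the last task would hand this argument list to a child process; the sketch stops at building the command so that it stays runnable on any platform.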

What is CScript?

CScript.exe (present in C:\Windows\System32) is the console-based executable of the Windows Script Host, which is used to run scripts. It can interpret scripting languages like VBScript or JavaScript (JScript). There is also WScript, but it is meant for windowed applications and no console is attached to it. So if you need a console-based application, you can use CScript.

Now, in our project, the main job of CScript is to provide a run-time environment for our script, i.e. the JavaScript. You must be wondering why I haven't used Node instead of CScript to run my JavaScript.

Both provide a run-time environment for JavaScript. However, CScript has inherent support for the Windows Component Object Model (COM), while Node does not. So if you try to run this script via Node, you will get an error like this.

var fileStream = new ActiveXObject("ADODB.Stream");
ReferenceError: ActiveXObject is not defined
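A Node-side build script could check for this situation up front instead of failing with the ReferenceError. The sketch below is only an illustration: hasActiveX and describeHost are hypothetical names, and the check assumes a modern runtime where globalThis exists (the old JScript engine behind CScript does not provide it).

```typescript
// Returns true only when the host exposes a global ActiveXObject constructor.
// Node.js does not define it, so this returns false there.
function hasActiveX(): boolean {
  const host = globalThis as { ActiveXObject?: unknown };
  return typeof host.ActiveXObject === "function";
}

// Turns the check into a message a build script could print.
function describeHost(): string {
  return hasActiveX()
    ? "COM automation is available in this host."
    : "ActiveXObject is not defined here; run the script with cscript.exe instead.";
}

console.log(describeHost());
```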

Now, what is the Component Object Model?

The Component Object Model is a technology developed by Microsoft. It is not a language but a binary standard. Microsoft defines it as follows:

The Microsoft Component Object Model (COM) is a platform-independent, distributed, object-oriented system for creating binary software components that can interact. COM is the foundation technology for Microsoft’s OLE (compound documents), ActiveX (Internet-enabled components), as well as others.

In layman's terms, COM objects are interfaces to various runtime objects. (That's why the definition uses the term "binary software components".) COM is not a language, but a technique that is agnostic of the programming language.

The only language requirement for COM is that code is generated in a language that can create structures of pointers and, either explicitly or implicitly, call functions through pointers. Object-oriented languages such as C++ and Smalltalk provide programming mechanisms that simplify the implementation of COM objects.

After that, we can use any other language like Java, VB or JavaScript to interact with those COM objects. This in turn gives us access to running applications, which in our case is the Microsoft Word application.

So, are you saying we cannot use Node at all here?

No, that is not true. We can also use Node instead of CScript. But Node has no built-in COM support, so we would need to install another package called win32com. Details can be found here.

Final code

Now, in order to interact with the Word application, various APIs have been used. And since we are going through COM, I referred to the Word object model.

Word provides hundreds of objects with which you can interact. These objects are organized in a hierarchy that closely follows the user interface. At the top of the hierarchy is the Application object. This object represents the current instance of Word. The Application object contains the Document, Selection, Bookmark, and Range objects. Each of these objects has many methods and properties that you can access to manipulate and interact with the object.

Now, in our script, we first create a Word application object by using ActiveXObject. Once the application object is obtained, the document object is created by passing it the name of the document (obtained from the command line arguments of the cscript call).
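The open-and-close sequence can be sketched roughly as follows. The interfaces are a hand-written subset of the Word object model for illustration only, and withDocument and fakeWord are hypothetical names. Under CScript the factory would be () => new ActiveXObject("Word.Application"); here a stand-in is used so that the sketch runs anywhere.

```typescript
// Minimal, hand-written subset of the Word object model used by this sketch.
interface WordDocument {
  Name: string;
  Close(saveChanges?: number): void;
}
interface WordApplication {
  Documents: { Open(fileName: string): WordDocument };
  Quit(): void;
}

// Under cscript the factory would be: () => new ActiveXObject("Word.Application")
type ApplicationFactory = () => WordApplication;

// Opens the document, hands it to the callback, then always closes the
// document and quits Word, even if the callback throws.
function withDocument<T>(createApp: ApplicationFactory, path: string, use: (doc: WordDocument) => T): T {
  const app = createApp();
  try {
    const doc = app.Documents.Open(path);
    try {
      return use(doc);
    } finally {
      doc.Close(0); // 0 = wdDoNotSaveChanges
    }
  } finally {
    app.Quit();
  }
}

// Stand-in for Word so the sketch can run without COM. Unlike real Word,
// it simply echoes the path back as the document name.
const log: string[] = [];
const fakeWord: ApplicationFactory = () => ({
  Documents: {
    Open: (fileName) => ({ Name: fileName, Close: () => { log.push("close"); } }),
  },
  Quit: () => { log.push("quit"); },
});

const openedName = withDocument(fakeWord, "C:\\docs\\spec.docx", (doc) => doc.Name);
console.log(openedName, log.join(","));
```

Wrapping the calls in try/finally is a design choice of this sketch: a Word process left running in the background after a failed build is easy to miss.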

This document object represents the actual document. It is capable of parsing as well as manipulating the Word document. However, in our use case, we only need to parse the document and write a text file.

This code is very generic. It converts only the basic features of a Word document, like cross-references, lists, subscript text, and bold and italic characters, into GFM format. However, you can write your own code to convert the customized styles of your Word document into the desired format.

You can find the actual TypeScript code here. The code is quite easy to read. Below are a few major highlights of it:

  1. First, a document object is passed to the convertDocumentToMarkdown function, which returns the text to be written to an MD file.

  2. Further, in convertDocumentToMarkdown, methods and properties of the document object are called to find relevant Word features and replace them with the corresponding GFM syntax. E.g. first, subscript, bold and italic text is searched for. After that, the text is replaced by GFM-specific markup. And finally, the Word styles are removed. All this is done here.

  3. After this, cross-references are replaced. However, this is tricky. First, the toggleShowCodes function is called. This has the same effect as pressing Alt+F9 in a Word document: it displays the field codes in place of all the cross-references in the document. After that, the find and replace method is called to find all cross-references and replace them with GFM-style links. Here, “19 REF” is passed as an argument to the function. This is a standard search criterion for finding all cross-references in a Word document. At last, after replacing, the toggleShowCodes function is called again to bring the document back to its original form.

  4. At last, the writeDocument function is called, which does the main job. It reads the document paragraph by paragraph and then, using a switch case, looks at the style of each paragraph (for example, whether it is a heading, a table, a list paragraph or an image). Depending on the style found, the desired text is written to the MD file.
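The inline replacement from step 2 and the style switch from step 4 can be sketched on a simplified model. This is not the converter's actual code: Run, Paragraph, renderRun, renderParagraph and renderDocument are hypothetical names, and the real script works against Word's COM objects rather than plain objects. The style names "Heading 1", "Heading 2" and "List Paragraph" are Word's built-in ones.

```typescript
// Hypothetical, simplified stand-ins for the Word paragraphs the converter walks.
interface Run {
  text: string;
  bold?: boolean;
  italic?: boolean;
  subscript?: boolean;
}
interface Paragraph {
  style: string;
  runs: Run[];
}

// Inline formatting: wrap each run in the matching GFM (or inline HTML) markers.
function renderRun(run: Run): string {
  let out = run.text;
  if (run.subscript) out = `<sub>${out}</sub>`;
  if (run.italic) out = `*${out}*`;
  if (run.bold) out = `**${out}**`;
  return out;
}

// Block formatting: pick the line prefix from the paragraph style.
function renderParagraph(p: Paragraph): string {
  const text = p.runs.map(renderRun).join("");
  switch (p.style) {
    case "Heading 1":
      return `# ${text}`;
    case "Heading 2":
      return `## ${text}`;
    case "List Paragraph":
      return `- ${text}`;
    default:
      return text;
  }
}

// Joins the rendered paragraphs with blank lines, as markdown expects.
function renderDocument(paragraphs: Paragraph[]): string {
  return paragraphs.map(renderParagraph).join("\n\n");
}

const markdown = renderDocument([
  { style: "Heading 1", runs: [{ text: "Overview" }] },
  {
    style: "Normal",
    runs: [
      { text: "H" },
      { text: "2", subscript: true },
      { text: "O is " },
      { text: "wet", bold: true },
      { text: "." },
    ],
  },
]);

console.log(markdown);
```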

A word or two on embedding images: Embedding images into an MD file is a bit tricky.

First, you need to store the images in your Git repository. Then a link to the image has to be given in the MD file in order to embed it. The syntax is ![alt text](path/in/the/repository/image1.jpg).

Now, in order to auto-generate this link for an image while converting the Word document into an MD file, hidden text is added to the Word document (just after the image, without any space), with the link itself as its content. Then, in the code, this hidden text is read out and inserted into the MD file.
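The idea can be sketched as follows, assuming the hidden text has already been read from the document. ImageParagraph and imageToMarkdown are hypothetical names for this illustration, and normalising backslashes and encoding spaces are additions of the sketch, not something the converter is known to do.

```typescript
// Hypothetical model: an inline image followed immediately by hidden text
// that holds the repository path of the image.
interface ImageParagraph {
  altText: string;
  hiddenText: string;
}

// Builds the GFM image link from the hidden text next to the image.
function imageToMarkdown(p: ImageParagraph): string {
  // Trim stray whitespace and normalise Windows path separators.
  const path = p.hiddenText.trim().replace(/\\/g, "/");
  if (path.length === 0) {
    throw new Error("Image has no hidden link text");
  }
  // encodeURI keeps slashes but escapes characters such as spaces.
  return `![${p.altText}](${encodeURI(path)})`;
}

const imageLink = imageToMarkdown({
  altText: "Architecture",
  hiddenText: " images\\architecture diagram.png ",
});

console.log(imageLink);
```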

Now, you might find the actual code that does all this very tedious, but that is simply how the API exposed by the Word application works. So do not worry about that. You can definitely refer to my code or to TypeScript's original code. Both will be a good starting point for your next project.

Oh wait! That is it. You made it to the end. Congratulations! And if you liked this article, please hit the clap button below. It would mean a lot to me and it will help other people see the story. Cheers!

Translated from: https://www.freecodecamp.org/news/how-to-generate-a-github-markdown-file-from-microsoft-word-using-typescript-a8976ea958c3/
