HTML DOM from a TypeScript Perspective (Part 2): Node and Element

元无心

Article link: https://blog.csdn.net/HermitSun/article/details/95780601

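Any discussion of the DOM has to start with the node (Node) [1]. In practice you have probably noticed that, for the same piece of HTML, children and childNodes return different things. Here is an example I found online: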

<html>
  <body>
    <h1>China</h1>
    <!-- My comment ...  -->
    <p>China is a popular country with...</p>
    <div>
      <button>See details</button>
    </div>
  </body>
</html>

Calling children and childNodes on it gives:

document.body.children
// HTMLCollection(3) [h1, p, div]
document.body.childNodes
// NodeList(9) [text, h1, text, comment, text, p, text, div, text]

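The difference is considerable. Even the return types differ: children returns an HTMLCollection, while childNodes returns a NodeList. What are these two? Let's look at Microsoft's implementation, that is, the DOM type declarations that ship with TypeScript: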

// HTMLCollection
interface HTMLCollectionBase {
    readonly length: number;
    item(index: number): Element | null;
    [index: number]: Element;
}

interface HTMLCollection extends HTMLCollectionBase {
    namedItem(name: string): Element | null;
}
// NodeList
interface NodeList {
    readonly length: number;
    item(index: number): Node | null;
    forEach(callbackfn: (value: Node, key: number, parent: NodeList) => void, thisArg?: any): void;
    [index: number]: Node;
}

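As we can see, HTMLCollection is a container of Element, while NodeList is a container of Node. In fact, Element is a subtype of Node: because of the way the DOM abstracts an HTML document, every tag becomes a node. Let's first look at how a node is defined. According to MDN: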

A Node is an interface from which a number of DOM types inherit, and allows these various types to be treated (or tested) similarly.

The following interfaces all inherit from Node its methods and properties: Document, Element, CharacterData (which Text, Comment, and CDATASection inherit), ProcessingInstruction, DocumentFragment, DocumentType, Notation, Entity, EntityReference.

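An expert I came across summarized it well:

“Node is a base class; Element, Text and Comment in the DOM all inherit from it. In other words, Element, Text and Comment are three special kinds of Node, called ELEMENT_NODE, TEXT_NODE and COMMENT_NODE respectively. The HTML elements we normally work with, that is, Element, are Nodes of type ELEMENT_NODE.”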

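In fact, this can be seen in the Node type constants table on MDN:
[Image: MDN table of Node type constants]
Is the actual implementation like this? Let's take a look (with the parts irrelevant to this discussion removed):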

interface Node extends EventTarget {}
interface Element extends Node {}
interface CharacterData extends Node {}
interface Text extends CharacterData {}
interface CDATASection extends Text {}
interface Comment extends CharacterData {}

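It is indeed. As an aside, under the current specification several node types have been deprecated; see MDN if you are interested. Interestingly, in Microsoft's implementation, besides the three interfaces mentioned above (Text, Comment and CDATASection), ProcessingInstruction also extends CharacterData: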

interface ProcessingInstruction extends CharacterData {}

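At first glance this differs from the MDN passage quoted above, which lists ProcessingInstruction alongside CharacterData rather than under it. The two do not actually conflict: the DOM Standard defines ProcessingInstruction as inheriting from CharacterData, so it still inherits from Node, just indirectly.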
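
The inheritance chain in the declarations above can also be observed at runtime through nodeType. Below is a minimal sketch, assuming we only care about the node types discussed here; `HasNodeType` and `isCharacterData` are hypothetical names introduced for illustration and are not part of the DOM or of TypeScript's declarations. The numeric values are the standard Node type constants.

```typescript
// nodeType values from the DOM Standard (the same numbers as Node.ELEMENT_NODE etc.).
const NODE_TYPES = {
  ELEMENT_NODE: 1,
  TEXT_NODE: 3,
  CDATA_SECTION_NODE: 4,
  PROCESSING_INSTRUCTION_NODE: 7,
  COMMENT_NODE: 8,
} as const;

// Hypothetical structural type: anything that exposes a numeric nodeType.
// Real DOM nodes satisfy it, and so do plain objects.
interface HasNodeType {
  readonly nodeType: number;
}

// Hypothetical helper: returns true for the node types whose interfaces
// extend CharacterData in the declarations quoted above.
function isCharacterData(node: HasNodeType): boolean {
  switch (node.nodeType) {
    case NODE_TYPES.TEXT_NODE:
    case NODE_TYPES.CDATA_SECTION_NODE:
    case NODE_TYPES.PROCESSING_INSTRUCTION_NODE:
    case NODE_TYPES.COMMENT_NODE:
      return true;
    default:
      return false;
  }
}
```

In a browser the helper can be called with a real node, for example `isCharacterData(document.body.childNodes[0])`; the structural parameter type is only there so the sketch does not depend on a DOM being present.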

Moving on. Now that we know Node is the coarsest-grained abstraction in the whole DOM, we also need to understand what Element covers. According to MDN:

Element is the most general base class from which all objects in a Document inherit. It only has methods and properties common to all kinds of elements. More specific classes inherit from Element. For example, the HTMLElement interface is the base interface for HTML elements, while the SVGElement interface is the basis for all SVG elements. Most functionality is specified further down the class hierarchy.

Element covers both HTMLElement and SVGElement. HTMLElement is the base interface for all HTML elements; SVGElement is outside the scope of this discussion. A few examples make this clear:

interface HTMLElement extends Element {}
interface HTMLFormElement extends HTMLElement {}
interface HTMLFrameElement extends HTMLElement {}
interface HTMLImageElement extends HTMLElement {}
interface HTMLInputElement extends HTMLElement {}
// ...and many more; these are just a few picked at random

As you can see, these correspond to the tags we usually write in HTML. In other words, HTMLElement covers exactly the HTML elements we use every day: <form>, <frame>, <img>, <input>, and so on.

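As an aside, it may be worth explaining why these interfaces extend HTMLElement. My feeling is that this is related to componentization. The theoretical soundness is obvious; object-oriented theory has long been borne out in practice. As for the implementation, I think it is related to Web Components: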

customElements.define('my-foo', class Foo extends HTMLElement { /* ... */ }); // a custom element name must contain a hyphen

This is how a custom tag is defined. In a broad sense, though, native tags can also be regarded as "custom" tags.

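With Node and Element understood, the answer to the earlier question becomes obvious. childNodes returns all child nodes, so text, comment and other kinds of nodes show up, which is where the long list [text, h1, text, comment, text, p, text, div, text] comes from. children, by contrast, only returns Elements. Element is just one subtype of Node, and here it only includes HTMLElements (there is no SVGElement in this example), so we only see what we intuitively think of as "nodes": h1, p and div.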
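
To make the relationship concrete, here is a minimal sketch that derives the children view from the childNodes view by keeping only nodes whose nodeType is ELEMENT_NODE. `NodeLike`, `elementsOnly` and `sampleChildNodes` are hypothetical names introduced for illustration; the sample array is a hand-written stand-in for document.body.childNodes from the example page, not output captured from a browser.

```typescript
// Hypothetical structural type: the two Node properties this sketch relies on.
interface NodeLike {
  readonly nodeType: number;
  readonly nodeName: string;
}

const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE
const COMMENT_NODE = 8; // same value as Node.COMMENT_NODE

// Keep only the element nodes of an array-like list of nodes.
function elementsOnly<T extends NodeLike>(nodes: ArrayLike<T>): T[] {
  const result: T[] = [];
  for (let i = 0; i < nodes.length; i++) {
    const node = nodes[i];
    if (node.nodeType === ELEMENT_NODE) {
      result.push(node);
    }
  }
  return result;
}

// Stand-in for the nine child nodes of <body> in the example page.
const sampleChildNodes: NodeLike[] = [
  { nodeType: TEXT_NODE, nodeName: "#text" },
  { nodeType: ELEMENT_NODE, nodeName: "H1" },
  { nodeType: TEXT_NODE, nodeName: "#text" },
  { nodeType: COMMENT_NODE, nodeName: "#comment" },
  { nodeType: TEXT_NODE, nodeName: "#text" },
  { nodeType: ELEMENT_NODE, nodeName: "P" },
  { nodeType: TEXT_NODE, nodeName: "#text" },
  { nodeType: ELEMENT_NODE, nodeName: "DIV" },
  { nodeType: TEXT_NODE, nodeName: "#text" },
];

// Contains the H1, P and DIV entries, in document order.
const sampleChildren = elementsOnly(sampleChildNodes);
```

Because the parameter is typed as `ArrayLike<T>`, the same function should also accept a real NodeList such as document.body.childNodes in a browser.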

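With the question of kinds settled, the question of counts remains. children yields three items, which is easy to understand and matches intuition. But why does childNodes yield nine? Let's pick any text node and look at its content: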

document.body.childNodes[0].wholeText
// "↵"

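Now it is clear. The line breaks we add for readability are also treated as Text nodes and inserted into the DOM tree. In other words, the HTML we wrote becomes something like this: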

<html>
  <body>
    ↵
    <h1>China</h1>
    ↵
    <!-- My comment ...  -->
    ↵
    <p>China is a popular country with...</p>
    ↵
    <div>
      <button>See details</button>
    </div>
    ↵
  </body>
</html>

Count them: 5 Text nodes + 3 HTMLElement nodes + 1 Comment node, exactly nine. That explains it.
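
The count can be reproduced with a small tally. This is a sketch under stated assumptions: `ChildLike`, `NodeTally`, `tallyNodes` and `sampleBodyNodes` are hypothetical names, and the sample data is a hand-written approximation of the nine child nodes, with whitespace strings standing in for the line breaks and indentation.

```typescript
// Hypothetical structural type: the two Node properties this sketch relies on.
interface ChildLike {
  readonly nodeType: number;
  readonly nodeValue: string | null;
}

// Hypothetical result shape for the tally.
interface NodeTally {
  elements: number;
  texts: number;
  comments: number;
  whitespaceOnlyTexts: number;
}

// Count nodes by kind and note how many text nodes contain only whitespace.
function tallyNodes(nodes: ArrayLike<ChildLike>): NodeTally {
  const tally: NodeTally = { elements: 0, texts: 0, comments: 0, whitespaceOnlyTexts: 0 };
  for (let i = 0; i < nodes.length; i++) {
    const node = nodes[i];
    if (node.nodeType === 1) {
      tally.elements++; // ELEMENT_NODE
    } else if (node.nodeType === 3) {
      tally.texts++; // TEXT_NODE
      const value = node.nodeValue === null ? "" : node.nodeValue;
      if (/^\s*$/.test(value)) {
        tally.whitespaceOnlyTexts++;
      }
    } else if (node.nodeType === 8) {
      tally.comments++; // COMMENT_NODE
    }
  }
  return tally;
}

// Stand-in for the example page: whitespace text nodes surround every tag and the comment.
const sampleBodyNodes: ChildLike[] = [
  { nodeType: 3, nodeValue: "\n    " },
  { nodeType: 1, nodeValue: null },
  { nodeType: 3, nodeValue: "\n    " },
  { nodeType: 8, nodeValue: " My comment ...  " },
  { nodeType: 3, nodeValue: "\n    " },
  { nodeType: 1, nodeValue: null },
  { nodeType: 3, nodeValue: "\n    " },
  { nodeType: 1, nodeValue: null },
  { nodeType: 3, nodeValue: "\n  " },
];

const sampleTally = tallyNodes(sampleBodyNodes);
```

Every text node in the sample is whitespace-only, which is the point of the passage above: none of the five Text nodes carries visible content.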

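Note [1]: I consider 结点 the more suitable Chinese rendering of Node here. As Xie Xiren's 《计算机网络》 (Computer Networks) notes, according to MINGCI94 the standard Chinese term for Node in the field of computer networking is "结点". The HTML DOM clearly falls within that field.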
