Extracting and Inserting Text with JavaScript's textContent

cungui5726

Original article: https://thenewcode.com/489/Extracting-and-Inserting-Text-with-JavaScripts-textContent

Many aspects of web development require extracting text from a page while cleansing it of markup: populating an RSS feed, for example, or filling a JSON request with page data. There are also plenty of occasions when you’ll need to fill a newly created element with text: for example, creating a label for a <button>. In both cases, the safest and most efficient way to achieve these ends in JavaScript is usually via the textContent property.

Inserting Text Content

Let’s say that we’ve created a new button element, referenced in JavaScript as hitSwitch:

var hitSwitch = document.createElement("button");

We want to place text inside that element: i.e. between the opening <button> and closing </button> tags, before placing the element on the page. Traditionally, that would call for innerHTML, but there are two downsides to that approach:

innerHTML can be used as a vector for cross-site scripting (XSS) attacks.
innerHTML is slower, because the browser must parse the string as HTML before adding it to the element.
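
To make the first point concrete: if you build markup with innerHTML from a string you don't control, you have to escape it by hand first. The sketch below shows the kind of helper that requires; escapeHTML is a hypothetical name, not a built-in, and it assumes the escaped string will end up in element content or a quoted attribute value. Assigning the same untrusted string to textContent needs no escaping at all, because the browser never parses it as HTML.

```javascript
// Hypothetical helper: escapes the characters that are special in HTML text
// and quoted attribute values, so a string can be embedded in markup.
function escapeHTML(str) {
  return String(str)
    .replace(/&/g, "&amp;") // must run first, or the entities below get double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// A string that would run script if it were parsed as HTML.
const untrusted = '<img src="x" onerror="alert(1)">';

// With innerHTML, forgetting escapeHTML() would let the onerror handler run:
//   element.innerHTML = escapeHTML(untrusted);
// With textContent, the string is inserted as literal text; no escaping needed:
//   element.textContent = untrusted;
```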

In most cases, textContent is a better choice:

hitSwitch.textContent = "Hit me with your rhythm stick";

hitSwitch now appears as:

<button>Hit me with your rhythm stick</button>
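
The insertion steps above can be wrapped in a small function. This is a sketch rather than a prescribed pattern: createLabelledButton is a hypothetical name, and the Document is passed in as the doc parameter (in a browser, just pass document) so the function doesn't rely on a global.

```javascript
// Hypothetical helper: creates a <button> and sets its label with textContent.
// `doc` is a DOM Document (in a browser, pass `document`).
function createLabelledButton(doc, label) {
  const button = doc.createElement("button");
  // The label becomes a single text node; characters such as < and & are
  // displayed literally rather than interpreted as HTML.
  button.textContent = label;
  return button;
}

// Usage in a browser:
//   const hitSwitch = createLabelledButton(document, "Hit me with your rhythm stick");
//   document.body.appendChild(hitSwitch);
```

Because the label goes through textContent, a label such as "<b>Go</b>" would appear on the button as those literal characters rather than as bold text.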

textContent only sets text: if you need to add HTML markup at the same time, innerHTML or insertAdjacentHTML are better choices. When textContent is used to set text, it replaces any text and markup that already exists inside the referenced element. For example, you could remove the entire content of a web page with the following:

document.body.textContent = "";
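
Because setting textContent replaces everything inside an element, it isn't the tool for adding text alongside existing content. One option is to append a Text node created with document.createTextNode(), which, like textContent, is never parsed as HTML. A sketch, with appendText as a hypothetical helper name and doc standing for the page's document:

```javascript
// Hypothetical helper: adds `text` after an element's existing children
// instead of replacing them, as assigning to textContent would.
// `doc` is the element's Document (in a browser, pass `document`).
function appendText(doc, el, text) {
  // createTextNode() builds a Text node; its contents are never parsed as markup.
  const node = doc.createTextNode(text);
  el.appendChild(node);
  return node;
}

// Usage in a browser, keeping the button's existing label:
//   appendText(document, hitSwitch, " (again)");
```

In current browsers, el.append(text) does the same in one step, since Element.append() converts string arguments into Text nodes.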

Extracting Text

textContent can also be used to extract content from a page. If we have the following:

<p id="futuro"><strong>Futurism</strong> (Italian: <em>Futurismo</em>):
an artistic and social movement that originated in Italy before 
<abbr title="World War">WW</abbr>I.
It emphasized speed, technology, youth, and violence, 
together with new industrial objects&hellip; the car, the aeroplane, 
the train, and the modern city.</p>

Then we can extract just the text content of the paragraph with the following:

var futurism = document.getElementById("futuro");
var textExtract = futurism.textContent;

Printed to the console, textExtract would appear as:

"Futurism (Italian: Futurismo):
an artistic and social movement that originated in Italy before 
WWI.
It emphasized speed, technology, youth, and violence, 
together with new industrial objects… the car, the aeroplane, 
the train, and the modern city."
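
Notice that textContent keeps the line breaks and spacing from the HTML source. Before putting extracted text into a JSON request or an RSS feed, you'll often want to collapse that whitespace. A minimal sketch; normalizeWhitespace and the summary field name are hypothetical, and the raw string stands in for the value you would read from textContent in a browser:

```javascript
// Hypothetical helper: collapses every run of whitespace (spaces, tabs,
// newlines) into a single space and trims both ends.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// In a browser, `raw` would come from document.getElementById("futuro").textContent.
const raw =
  "Futurism (Italian: Futurismo):\nan artistic and social movement that originated in Italy before \nWWI.";

// Build a JSON request body from the cleaned-up text.
const payload = JSON.stringify({ summary: normalizeWhitespace(raw) });
```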

There are several things to note about this extraction technique:

The original content on the page remains unchanged.
All HTML markup is removed from the extracted text, including tags nested inside the referenced element; the text between those tags is retained.
HTML entities are automatically converted into the characters they represent: &hellip; becomes …, for example.
Images are removed entirely, since an <img> element contains no text, and their alt values do not appear in the extraction.

To shorten our code, we could merge the two lines of JavaScript into:

var textExtract = document.getElementById("futuro").textContent;
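
One caveat with the one-line version: if no element has that id, getElementById() returns null, and reading .textContent from null throws a TypeError. A defensive sketch, assuming you would rather fall back to a default string; textOf is a hypothetical helper name:

```javascript
// Hypothetical helper: returns the text content of the element with the
// given id, or `fallback` when the document has no such element.
// `doc` is a DOM Document (in a browser, pass `document`).
function textOf(doc, id, fallback = "") {
  const el = doc.getElementById(id);
  return el !== null ? el.textContent : fallback;
}

// Usage in a browser:
//   const textExtract = textOf(document, "futuro");
```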

The Danger of Open Tags

I have previously pointed out that elements like <p> can be written without a closing tag. Leaving it out is always optional, and if you're ever going to use textContent, the practice can be dangerous. If we remove the closing tag from the text sample and add an inline <script> after it:

<p id="futuro"><strong>Futurism</strong> (Italian: <em>Futurismo</em>):
an artistic and social movement that originated in Italy before 
<abbr title="World War">WW</abbr>I.
It emphasized speed, technology, youth, and violence, 
together with new industrial objects&hellip; the car, the aeroplane, 
the train, and the modern city.
<script>
var dt = dy + dx;
</script>

Then repeat the same JavaScript:

var textExtract = document.getElementById("futuro").textContent;

The resulting value of textExtract is now:

Futurism (Italian: Futurismo):
an artistic and social movement that originated in Italy before 
WWI.
It emphasized speed, technology, youth, and violence, 
together with new industrial objects… the car, the aeroplane, 
the train, and the modern city.

var dt = dy + dx;

Why does this happen? Because the paragraph has no closing tag, the browser's HTML parser treats the new <script> element as part of the paragraph. textContent removes the <script> tags themselves, but the code between them counts as the script element's text, so it is included in the extraction. To avoid this, we just need to close the paragraph with a </p>, making it clear where the paragraph ends.
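
Closing the paragraph is the right fix when you control the markup. If you don't, one defensive option is to read the text from a copy of the element with any <script> and <style> elements stripped out, since textContent includes the contents of both. A sketch for current browsers (it relies on NodeList.forEach() and Element.remove(), which Internet Explorer does not support); textWithoutScripts is a hypothetical name:

```javascript
// Hypothetical helper: returns an element's text content with the contents
// of any <script> or <style> descendants left out.
function textWithoutScripts(el) {
  // Work on a deep copy so the live page is not modified.
  const copy = el.cloneNode(true);
  // Remove script and style elements from the copy before reading its text.
  copy.querySelectorAll("script, style").forEach((node) => node.remove());
  return copy.textContent;
}

// Usage in a browser:
//   const textExtract = textWithoutScripts(document.getElementById("futuro"));
```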

textContent will not pull text from inside a comment that happens to be part of a referenced element… which is fortunate, considering the language that many developers leave inside their comments.

Conclusion

textContent is a very useful property to have in your arsenal of JavaScript techniques for manipulating and extracting content from the DOM, and it has terrific support: most browsers have supported it from their earliest versions, and Internet Explorer has supported it since IE9. Earlier versions of IE offered only the proprietary innerText property, which is similar but not identical: innerText takes CSS styling into account, so, for example, it omits text hidden with display: none.
