post.html,GitHub - posthtml/posthtml-parser: Parse HTML/XML to PostHTMLTree

posthtml-parser

Parse HTML/XML to PostHTML AST.

More about PostHTML

Install

NPM install

$ npm install posthtml-parser

Usage

Input HTML

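<a class="animals" href="#">
    <span class="animals__cat" style="background: url(cat.png)">Cat</span>
</a>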

import parser from 'posthtml-parser'
import fs from 'fs'

const html = fs.readFileSync('path/to/input.html', 'utf-8')

console.log(parser(html)) // Logs a PostHTML AST

Result PostHTMLTree

[{
    tag: 'a',
    attrs: {
        class: 'animals',
        href: '#'
    },
    content: [
        '\n    ',
        {
            tag: 'span',
            attrs: {
                class: 'animals__cat',
                style: 'background: url(cat.png)'
            },
            content: ['Cat']
        },
        '\n'
    ]
}]
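Because the result is a plain array of strings and tag objects, it can be processed with ordinary recursion. The following is a minimal sketch, not part of the library: collectTags is a hypothetical helper, and the tree is written out by hand so the snippet runs without the parser installed.

```javascript
// Hypothetical helper: collect every tag name in a PostHTML tree, depth-first.
function collectTags(tree, found = []) {
  for (const node of tree) {
    // Strings are plain text; only objects describe tags.
    if (typeof node !== 'object' || node === null) continue
    found.push(node.tag)
    // `content` is optional and is itself a PostHTML tree.
    if (Array.isArray(node.content)) collectTags(node.content, found)
  }
  return found
}

// The result tree from the example, written out by hand.
const resultTree = [{
  tag: 'a',
  attrs: { class: 'animals', href: '#' },
  content: [
    '\n    ',
    {
      tag: 'span',
      attrs: { class: 'animals__cat', style: 'background: url(cat.png)' },
      content: ['Cat']
    },
    '\n'
  ]
}]

console.log(collectTags(resultTree)) // [ 'a', 'span' ]
```

The whitespace strings around the span are skipped here because they are text nodes, not tags.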

PostHTML AST Format

Any parser being used with PostHTML should return a standard PostHTML Abstract Syntax Tree (AST). Fortunately, this is a very easy format to produce and understand. The AST is an array that can contain strings and objects. Any strings represent plain text content to be written to the output. Any objects represent HTML tags.

Tag objects generally look something like this:

{
    tag: 'div',
    attrs: {
        class: 'foo'
    },
    content: ['hello world!']
}

Tag objects can contain three keys. The tag key takes the name of the tag as its value; this can include custom tags. The optional attrs key takes an object with key/value pairs representing the attributes of the HTML tag. A boolean attribute has an empty string as its value. Finally, the optional content key takes an array as its value, which is itself a PostHTML AST. In this manner, the AST is a tree that should be walked recursively.
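To make the format concrete, here is a rough sketch that turns a tree back into an HTML string using only the rules described above. It is not the renderer PostHTML itself uses: renderTree, renderAttrs and the VOID_TAGS list are hypothetical names for this illustration, and attribute values are not escaped.

```javascript
// Tags written without a closing tag (an assumption of this sketch, not a complete list).
const VOID_TAGS = new Set(['br', 'hr', 'img', 'input', 'link', 'meta'])

// Boolean attributes are stored as empty strings, so they are printed by name only.
function renderAttrs(attrs = {}) {
  return Object.entries(attrs)
    .map(([name, value]) => (value === '' ? ` ${name}` : ` ${name}="${value}"`))
    .join('')
}

function renderTree(tree) {
  return tree
    .map((node) => {
      // Strings are plain text and are written out unchanged.
      if (typeof node === 'string') return node
      const open = `<${node.tag}${renderAttrs(node.attrs)}>`
      if (VOID_TAGS.has(node.tag)) return open
      // `content` is optional; when present it is rendered recursively.
      return `${open}${renderTree(node.content || [])}</${node.tag}>`
    })
    .join('')
}

const sampleTree = [
  { tag: 'div', attrs: { class: 'foo' }, content: ['hello world!'] },
  { tag: 'input', attrs: { type: 'checkbox', checked: '' } }
]

console.log(renderTree(sampleTree))
// <div class="foo">hello world!</div><input type="checkbox" checked>
```

The recursion mirrors the structure of the format: every content array is handled by the same function that handles the top-level array.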

Options

directives

Type: Array

Default: [{name: '!doctype', start: '<', end: '>'}]

Description: Adds processing of custom directives. Note: the name property of a custom directive can be a String or a RegExp.
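As an illustration, an options object that registers one extra directive might look like the sketch below. The shape of each entry (name, start, end) and the '?php' entry are assumptions made for this example, as is passing the options as a second argument to the parser.

```javascript
// Assumed shape: each directive has a `name` plus its `start` and `end` delimiters.
const parserOptions = {
  directives: [
    // The doctype entry, listed explicitly alongside the custom one.
    { name: '!doctype', start: '<', end: '>' },
    // Hypothetical extra entry so that <?php ... ?> blocks are treated as directives.
    { name: '?php', start: '<', end: '>' }
  ]
}

// With the library installed, the options would be passed alongside the HTML (assumed call):
// const tree = parser(html, parserOptions)
console.log(parserOptions.directives.map((directive) => directive.name))
```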

xmlMode

Type: Boolean

Default: false

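Description: Indicates whether special tags (such as <script> and <style>) should get special treatment and whether "empty" tags (e.g. <br>) can have children. If false, the content of special tags will be text only. For feeds and other XML content (documents that don't consist of HTML), set this to true.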

decodeEntities

Type: Boolean

Default: false

Description: If set to true, entities within the document will be decoded.

lowerCaseTags

Type: Boolean

Default: false

Description: If set to true, all tag names will be lowercased.

lowerCaseAttributeNames

Type: Boolean

Default: false

Description: If set to true, all attribute names will be lowercased. This has a noticeable impact on speed.

recognizeCDATA

Type: Boolean

Default: false

Description: If set to true, CDATA sections will be recognized as text even if the xmlMode option is not enabled. NOTE: If xmlMode is set to true then CDATA sections will always be recognized as text.

recognizeSelfClosing

Type: Boolean

Default: false

Description: If set to true, self-closing tags will trigger the onclosetag event even if xmlMode is not set to true. NOTE: If xmlMode is set to true then self-closing tags will always be recognized.

sourceLocations

Type: Boolean

Default: false

Description: If set to true, AST nodes will have a location property containing the start and end line and column position of the node.
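A location like this is handy for error messages that point back at the source. The sketch below assumes the property has the shape { start: { line, column }, end: { line, column } }; describeLocation is a hypothetical helper, and the node is written by hand rather than produced by the parser.

```javascript
// Hypothetical helper: describe where a node came from, e.g. for error messages.
function describeLocation(node) {
  if (!node.location) return `<${node.tag}> (no location recorded)`
  const { start, end } = node.location
  return `<${node.tag}> ${start.line}:${start.column}-${end.line}:${end.column}`
}

// Hand-written node; the numbers are illustrative, not parser output.
const sampleNode = {
  tag: 'span',
  location: { start: { line: 2, column: 5 }, end: { line: 2, column: 40 } }
}

console.log(describeLocation(sampleNode)) // <span> 2:5-2:40
```

Nodes parsed without sourceLocations enabled have no location property, which is why the helper checks for it first.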

License
