HtmlParser Examples and Comparison Notes

Delphi HTML parser

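This code is adapted from wr960204's original HtmlParser. My own project needed to modify HTML, but the original only supports reading it, so I extended it and named the result HtmlParserEx.pas to distinguish it from the original.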

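Changes to IHtmlElement and THtmlElement:

1. Added a setter to the Attributes property

2. Added a setter to the TagName property

3. Added a Parent property

4. Added a RemoveAttr method

5. Added a Remove method

6. Added a RemoveChild method

7. Added a Find method, an alias of SimpleCSSSelector

8. _GetHtml no longer appends the value of FOrignal directly; instead it calls GetSelfHtml to regenerate the markup of the (possibly modified) element and updates FOrignal with the result

9. Added a Text property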
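Item 8 is what makes the setters useful: once tag names and attributes can be edited, serializing from the originally parsed text would return stale HTML. The sketch below illustrates the idea with a hypothetical `TMiniElement` class (it is not the `THtmlElement` from HtmlParserEx.pas, whose internals may differ): `GetHtml` rebuilds the markup from the current tag name, attributes and children on every call, so a rename, an added attribute and a `RemoveAttr` all show up in the output.

```pascal
program RegenerateHtmlSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

type
  // Hypothetical, simplified element; NOT the THtmlElement from HtmlParserEx.pas.
  TMiniElement = class
  private
    FTagName: string;
    FText: string;
    FParent: TMiniElement;
    FAttrs: TStringList;                    // attributes stored as Name=Value lines
    FChildren: TObjectList<TMiniElement>;   // owns and frees its children
  public
    constructor Create(const ATagName: string);
    destructor Destroy; override;
    function AddChild(const ATagName: string): TMiniElement;
    procedure RemoveAttr(const AName: string);
    // Rebuilds markup from the current state instead of returning cached source text.
    function GetHtml: string;
    property TagName: string read FTagName write FTagName;
    property Text: string read FText write FText;
    property Parent: TMiniElement read FParent;
    property Attrs: TStringList read FAttrs;
  end;

constructor TMiniElement.Create(const ATagName: string);
begin
  inherited Create;
  FTagName := ATagName;
  FAttrs := TStringList.Create;
  FChildren := TObjectList<TMiniElement>.Create(True);
end;

destructor TMiniElement.Destroy;
begin
  FChildren.Free;
  FAttrs.Free;
  inherited;
end;

function TMiniElement.AddChild(const ATagName: string): TMiniElement;
begin
  Result := TMiniElement.Create(ATagName);
  Result.FParent := Self;   // back-link, like the new Parent property
  FChildren.Add(Result);
end;

procedure TMiniElement.RemoveAttr(const AName: string);
var
  I: Integer;
begin
  I := FAttrs.IndexOfName(AName);
  if I >= 0 then
    FAttrs.Delete(I);
end;

function TMiniElement.GetHtml: string;
var
  I: Integer;
begin
  Result := '<' + FTagName;
  for I := 0 to FAttrs.Count - 1 do
    Result := Result + ' ' + FAttrs.Names[I] + '="' + FAttrs.ValueFromIndex[I] + '"';
  Result := Result + '>' + FText;
  for I := 0 to FChildren.Count - 1 do
    Result := Result + FChildren[I].GetHtml;
  Result := Result + '</' + FTagName + '>';
end;

// Builds a small tree, edits it through the setters, and returns the regenerated HTML.
function BuildDemoHtml: string;
var
  Root, Link: TMiniElement;
begin
  Root := TMiniElement.Create('div');
  try
    Link := Root.AddChild('span');
    Link.Text := 'home';
    Link.Attrs.Values['href'] := '/';
    Link.Attrs.Values['style'] := 'color:red';
    Link.TagName := 'a';                    // rename via the setter
    Link.Attrs.Values['class'] := 'nav';    // add an attribute
    Link.RemoveAttr('style');               // drop an attribute
    Result := Root.GetHtml;
  finally
    Root.Free;
  end;
end;

begin
  Writeln(BuildDemoHtml);  // <div><a href="/" class="nav">home</a></div>
end.
```

Rebuilding on every call costs more than returning a cached string, but it keeps the output consistent with whatever edits have been made since parsing.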

Changes to IHtmlElementList and THtmlElementList:

1. Added a RemoveAll method

2. Added a Remove method

3. Added an Each method

4. Added a Text property
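To show how the list-level methods fit together, here is a minimal sketch with hypothetical `TItem` and `TItemList` classes (not HtmlParserEx's `IHtmlElementList`). It assumes `Each` takes an anonymous method receiving an index and an element, which is what the truncated `procedure(AIn...` in the usage example suggests, and assumes the list's `Text` concatenates the text of every entry. `Remove` is not modeled.

```pascal
program EachSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

type
  // Hypothetical stand-in for a matched element; NOT HtmlParserEx's IHtmlElement.
  TItem = class
  public
    Text: string;
    constructor Create(const AText: string);
  end;

  // Assumed callback shape: index plus element, called once per entry.
  TEachProc = reference to procedure(AIndex: Integer; AItem: TItem);

  TItemList = class
  private
    FItems: TObjectList<TItem>;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Add(const AText: string);
    procedure Each(const AProc: TEachProc);
    procedure RemoveAll;
    function Text: string;
  end;

constructor TItem.Create(const AText: string);
begin
  inherited Create;
  Text := AText;
end;

constructor TItemList.Create;
begin
  inherited Create;
  FItems := TObjectList<TItem>.Create(True);
end;

destructor TItemList.Destroy;
begin
  FItems.Free;
  inherited;
end;

procedure TItemList.Add(const AText: string);
begin
  FItems.Add(TItem.Create(AText));
end;

procedure TItemList.Each(const AProc: TEachProc);
var
  I: Integer;
begin
  for I := 0 to FItems.Count - 1 do
    AProc(I, FItems[I]);
end;

procedure TItemList.RemoveAll;
begin
  FItems.Clear;  // the list owns its items, so Clear also frees them
end;

function TItemList.Text: string;
var
  Item: TItem;
begin
  Result := '';
  for Item in FItems do
    Result := Result + Item.Text;
end;

// Iterates with Each, reads the combined Text, then empties the list.
function RunDemo: string;
var
  List: TItemList;
  Log: string;
begin
  List := TItemList.Create;
  try
    List.Add('Home');
    List.Add('About');
    Log := '';
    List.Each(
      procedure(AIndex: Integer; AItem: TItem)
      begin
        Log := Log + IntToStr(AIndex) + '=' + AItem.Text + ';';
      end);
    Log := Log + List.Text;
    List.RemoveAll;
    Result := Log + ';[' + List.Text + ']';
  finally
    List.Free;
  end;
end;

begin
  Writeln(RunDemo);  // 0=Home;1=About;HomeAbout;[]
end.
```

The anonymous method captures the local `Log` variable by reference, so the callback can accumulate results without a separate accumulator class.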

Some usage examples of the new features

IHtmlElement
EL.Attributes['class'] := 'xxxx';

EL.TagName := 'a';

EL.Remove; // remove the element itself

EL.RemoveChild(El2);

EL.Find('a');

IHtmlElementList
// Remove the selected elements
LHtml.Find('a').RemoveAll;

// Find and iterate
LHtml.Find('a').Each(
procedure(AIn

The HTML pieces are:

CData Sections: CData Sections, found in XML, are used to escape blocks of text containing characters which would otherwise be recognized as markup. A CData section begins with <![CDATA[ and ends with ]]>.

Comments: The Comments' contents are returned with the comment markers already stripped. A comment starts with <!-- and ends with -->.

Document Type Definitions: A Document Type Definition defines the syntax of markup constructs. It begins with <!DOCTYPE and ends with >.

HTML Processing Instructions: HTML Processing Instructions are a mechanism to capture platform-specific idioms. They start with <? and end with >.

HTML-Tags: HTML-Tags are parsed into Name, Attributes and Values. DIHtmlParser recognizes Start Tags, End Tags and Empty Element Tags. Example: <TagName Attribute="Value" />.

Scripts: DIHtmlParser returns the contents between the <SCRIPT> and </SCRIPT> tags as simple text. The surrounding HTML tags are reported separately.

Styles: DIHtmlParser returns the contents between the <STYLE> and </STYLE> tags as simple text. The surrounding HTML tags are reported separately.

Text: Text is everything which is not markup. If the NormalizeWhiteSpace option is enabled, DIHtmlParser reduces runs of white space to a single character. Preformatted text wrapped in <PRE> and </PRE> is never normalized.

Titles: DIHtmlParser returns the contents between the <TITLE> and </TITLE> tags as simple text. Titles are not treated as normal text because they are parsed differently.

XML Processing Instructions: XML Processing Instructions are similar to HTML Processing Instructions but use a slightly different syntax: they begin with <?XML and end with ?>.

The Non-HTML pieces are:

Active Server Pages (ASP): Active Server Page markup is often used to enclose scripting macros. It begins with <% and runs up to %>.

Custom-Tags: Custom Tags are similar to HTML-Tags and to what Delphi's Help calls Transparent Tags. For DIHtmlParser, a Custom Tag's name must begin with a user-defined start character, such as # in <#Name Attribute="Value" />.

PHP: PHP is a powerful and popular scripting language. Its markup begins with <?PHP and ends with ?>.

Server Side Includes (SSI): SSI, an extension of the Apache Web Server, starts with <!--# and continues up to -->. It allows include files and other data to be inserted into HTML documents on the fly.

Parsing Efficiency
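Most of the pieces listed above can be told apart by their opening delimiter alone. As a rough sketch of that idea (not DIHtmlParser's implementation or API), the hypothetical `ClassifyPiece` function below labels a raw string by its prefix. It assumes `#` as the custom-tag start character and ignores context such as SCRIPT, STYLE and TITLE contents, which a real parser has to track as state.

```pascal
program PieceKindSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.StrUtils;

const
  // Assumed custom-tag start character; DIHtmlParser lets the user choose it.
  CustomTagChar = '#';

// Hypothetical helper (not part of DIHtmlParser): labels a raw piece by its
// opening delimiter only. Longer prefixes are tested before the shorter ones
// they overlap with, e.g. '<!--#' before '<!--' and '<?xml' before '<?'.
// StartsText compares case-insensitively, so '<?PHP' and '<?php' both match.
function ClassifyPiece(const S: string): string;
begin
  if StartsText('<![CDATA[', S) then Result := 'CDATA'
  else if StartsText('<!--#', S) then Result := 'SSI'
  else if StartsText('<!--', S) then Result := 'Comment'
  else if StartsText('<!DOCTYPE', S) then Result := 'DTD'
  else if StartsText('<?xml', S) then Result := 'XML PI'
  else if StartsText('<?php', S) then Result := 'PHP'
  else if StartsText('<?', S) then Result := 'HTML PI'
  else if StartsText('<%', S) then Result := 'ASP'
  else if StartsText('<' + CustomTagChar, S) then Result := 'Custom tag'
  else if StartsText('<', S) then Result := 'Tag'
  else Result := 'Text';
end;

const
  Samples: array[0..5] of string = (
    '<!-- note -->', '<!--#include file="x.inc" -->', '<?xml version="1.0"?>',
    '<#Name Attribute="Value" />', '</TagName>', 'plain text');

var
  S: string;
begin
  for S in Samples do
    Writeln(ClassifyPiece(S), ': ', S);
end.
```

The ordering of the tests matters: checking `<!--` before `<!--#` would misreport every SSI directive as a comment.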
