AngleSharp Examples

XBMY

Article link: https://blog.csdn.net/cxb2011/article/details/108847473

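AngleSharp Example Code

Parsing a well-defined document
Simple document manipulation
Getting certain elements
Getting single elements
Connecting JavaScript evaluation
More complex JavaScript DOM interaction
Events in JavaScript and C#

This is a list of examples for everyday use of AngleSharp.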

Parsing a well-defined document

using System;
using AngleSharp;

var source = @"
<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content=""initial-scale=1, minimum-scale=1, width=device-width"">
  <title>Error 404 (Not Found)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/errors/logo_sm_2.png) no-repeat}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/errors/logo_sm_2_hr.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/errors/logo_sm_2_hr.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/errors/logo_sm_2_hr.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:55px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>404.</b> <ins>That’s an error.</ins>
  <p>The requested URL <code>/error</code> was not found on this server.  <ins>That’s all we know.</ins>";

//Use the default configuration for AngleSharp
var config = Configuration.Default;

//Create a new context for evaluating webpages with the given config
var context = BrowsingContext.New(config);

//Just get the DOM representation
var document = await context.OpenAsync(req => req.Content(source));

//Serialize it back to the console
Console.WriteLine(document.DocumentElement.OuterHtml);

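So we define some source code and call the OpenAsync method of a BrowsingContext instance. The OpenAsync method lets us parse a document from any kind of request, for example a request to a web server. The callback style used here is called a "virtual request": it does not issue a real request but stays within the code.

In this case we use the provided source code to determine the content of the response to that request. The content of the response is then parsed as an HTML document. Afterwards we serialize the DOM back to a string. Finally, we write this string to the console.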
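For contrast with the virtual request, here is a minimal sketch of loading a document from a real web server. It assumes a reasonably recent AngleSharp release in which the WithDefaultLoader() configuration extension enables network requests. The address https://example.com and the class name are placeholders chosen for illustration, not part of the original example.

```csharp
using System;
using System.Threading.Tasks;
using AngleSharp;

static class RealRequestSketch
{
    static async Task Main()
    {
        // Assumption: WithDefaultLoader() registers the built-in document loader,
        // without which OpenAsync cannot fetch anything over the network.
        var config = Configuration.Default.WithDefaultLoader();
        var context = BrowsingContext.New(config);

        // Placeholder address; running this requires network access.
        var document = await context.OpenAsync("https://example.com");

        // Print the title of the downloaded and parsed page.
        Console.WriteLine(document.Title);
    }
}
```

The rest of the pipeline is unchanged: the result is the same kind of document object, so everything shown for the virtual request applies here too.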

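Simple document manipulation

AngleSharp constructs the DOM according to the official HTML5 specification. This also means that the resulting model is fully interactive and can be used for simple manipulation. The following example creates a document and changes the tree structure by inserting another paragraph element containing some text.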

static async Task FirstExample()
{
    //Use the default configuration for AngleSharp
    var config = Configuration.Default;

    //Create a new context for evaluating webpages with the given config
    var context = BrowsingContext.New(config);

    //Parse the document from the content of a response to a virtual request
    var document = await context.OpenAsync(req => req.Content("<h1>Some example source</h1><p>This is a paragraph element"));

    //Do something with document like the following
    Console.WriteLine("Serializing the (original) document:");
    Console.WriteLine(document.DocumentElement.OuterHtml);

    var p = document.CreateElement("p");
    p.TextContent = "This is another paragraph.";

    Console.WriteLine("Inserting another element in the body ...");
    document.Body.AppendChild(p);

    Console.WriteLine("Serializing the document again:");
    Console.WriteLine(document.DocumentElement.OuterHtml);
}

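Here the parser creates a new IHtmlDocument instance, which can then be queried or, as in this case, modified. In the sample code above we also create another IElement, which is in fact an IHtmlParagraphElement. This element is then appended to the Body node.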
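Appending is only one kind of change. The following sketch, which is an illustration rather than part of the original examples, also sets an attribute, replaces text, and removes a node using standard DOM members. The markup and the class name are made up for this purpose.

```csharp
using System;
using System.Threading.Tasks;
using AngleSharp;

static class ManipulationSketch
{
    static async Task Main()
    {
        var context = BrowsingContext.New(Configuration.Default);
        var document = await context.OpenAsync(req =>
            req.Content("<h1>Title</h1><p>First</p><p>Second</p>"));

        // Change an attribute and the text of the heading
        var heading = document.QuerySelector("h1");
        heading.SetAttribute("id", "main-title");
        heading.TextContent = "New title";

        // Remove the first paragraph from the tree
        var first = document.QuerySelector("p");
        first.Remove();

        // Serialize only the body to inspect the result
        Console.WriteLine(document.Body.InnerHtml);
    }
}
```

The printed body should contain the renamed heading with its new id and only the second paragraph.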

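Getting certain elements

AngleSharp exposes all DOM lists as IEnumerable<T>, for example IEnumerable<INode> for the NodeList class. This lets us combine LINQ with DOM functionality that is already provided, such as the QuerySelectorAll method.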

static async Task UsingLinq()
{
    //Create a new context for evaluating webpages with the default config
    var context = BrowsingContext.New(Configuration.Default);

    //Create a document from a virtual request / response pattern
    var document = await context.OpenAsync(req => req.Content("<ul><li>First item<li>Second item<li class='blue'>Third item!<li class='blue red'>Last item!</ul>"));

    //Do something with LINQ
    var blueListItemsLinq = document.All.Where(m => m.LocalName == "li" && m.ClassList.Contains("blue"));

    //Or directly with CSS selectors
    var blueListItemsCssSelector = document.QuerySelectorAll("li.blue");

    Console.WriteLine("Comparing both ways ...");

    Console.WriteLine();
    Console.WriteLine("LINQ:");

    foreach (var item in blueListItemsLinq)
    {
        Console.WriteLine(item.TextContent);
    }

    Console.WriteLine();
    Console.WriteLine("CSS:");

    foreach (var item in blueListItemsCssSelector)
    {
        Console.WriteLine(item.TextContent);
    }
}

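Because the All property of IDocument returns all IElement nodes contained in the document, we can use LINQ on it very effectively. QuerySelectorAll, on the other hand, also returns (like All) an IHtmlCollection<IElement> object. Its result can therefore be filtered further with LINQ as well, with the difference that this list has already been filtered by the selector.

The same elements as All can also be obtained with a selector, namely the special asterisk * selector: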

//Same as document.All
var blueListItemsLinq = document.QuerySelectorAll("*").Where(m => m.LocalName == "li" && m.ClassList.Contains("blue"));

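Is this exactly the same? Actually not: All returns a so-called live DOM list. That is, if we store the object somewhere, we always have access to the latest DOM changes through it.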
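The difference can be made visible with a small experiment. This is a sketch under the assumption that QuerySelectorAll returns a one-time result while All is live, as described above. The class name and markup are illustrative only.

```csharp
using System;
using System.Threading.Tasks;
using AngleSharp;

static class LiveCollectionSketch
{
    static async Task Main()
    {
        var context = BrowsingContext.New(Configuration.Default);
        var document = await context.OpenAsync(req => req.Content("<p>First</p>"));

        var live = document.All;                        // live view of the tree
        var snapshot = document.QuerySelectorAll("*");  // result of a one-time query

        Console.WriteLine($"Before: All={live.Length}, QuerySelectorAll={snapshot.Length}");

        // Change the tree after both collections were obtained
        document.Body.AppendChild(document.CreateElement("p"));

        Console.WriteLine($"After:  All={live.Length}, QuerySelectorAll={snapshot.Length}");
    }
}
```

If All behaves as a live list, its count grows by one after the insert, while the stored QuerySelectorAll result keeps its original count.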

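Getting single elements

In addition, we have the QuerySelector method. It is very close to LINQ statements that use FirstOrDefault() to produce their result: QuerySelector returns the first element matching the selector, or null if nothing matches.

Let's look at some sample code: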

static async Task SingleElements()
{
    //Create a new context for evaluating webpages with the default config
    var context = BrowsingContext.New(Configuration.Default);

    //Create a new document
    var document = await context.OpenAsync(req => req.Content("<b><i>This is some <em> bold <u>and</u> italic </em> text!</i></b>"));

    var emphasize = document.QuerySelector("em");

    Console.WriteLine("Difference between several ways of getting text:");
    Console.WriteLine();
    Console.WriteLine("Only from C# / AngleSharp:");
    Console.WriteLine();
    Console.WriteLine(emphasize.ToHtml());   //<em> bold <u>and</u> italic </em>
    Console.WriteLine(emphasize.Text());// bold and italic

    Console.WriteLine();
    Console.WriteLine("From the DOM:");
    Console.WriteLine();
    Console.WriteLine(emphasize.InnerHtml);  // bold <u>and</u> italic
    Console.WriteLine(emphasize.OuterHtml);  //<em> bold <u>and</u> italic </em>
    Console.WriteLine(emphasize.TextContent);// bold and italic
}

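The output statements try to demonstrate the differences between several ways of getting a string from a node. In fact, the DOM property OuterHtml uses ToHtml() to produce its HTML. The other variants all differ. Text() is just a helper for stripping text (omitting unwanted text content, such as …)

Extension methods such as Text() can be found in the AngleSharp.Dom namespace.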
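To make the relationship between QuerySelector and FirstOrDefault() concrete, the following sketch fetches the same element both ways and also shows the no-match case. The markup, the li.green selector, and the class name are chosen for illustration only.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;

static class SingleElementSketch
{
    static async Task Main()
    {
        var context = BrowsingContext.New(Configuration.Default);
        var document = await context.OpenAsync(req =>
            req.Content("<ul><li>One<li class='blue'>Two<li class='blue'>Three</ul>"));

        // First match via CSS selector
        var viaSelector = document.QuerySelector("li.blue");

        // First match via LINQ over all elements
        var viaLinq = document.All.FirstOrDefault(
            m => m.LocalName == "li" && m.ClassList.Contains("blue"));

        Console.WriteLine(viaSelector?.TextContent);
        Console.WriteLine(viaLinq?.TextContent);

        // A selector without any match yields null rather than throwing
        var missing = document.QuerySelector("li.green");
        Console.WriteLine(missing == null);
    }
}
```

Both lookups should yield the first blue list item, so a null check is all that is needed before using the result.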

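Connecting JavaScript evaluation

The project also contains a sample JavaScript engine based on Jint (a JavaScript interpreter).

The example starts by creating a customized version of the predefined Configuration class. Here we only include one additional engine, located in AngleSharp.Scripting (namespace and project); the code below registers it through the WithJs() extension method of the AngleSharp.Js package mentioned later. It is also important to enable scripting. AngleSharp is aware that having a script engine and using it are two different things.

This is the complete sample code.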

static async Task SimpleScriptingSample()
{
    //We require a custom configuration
    var config = Configuration.Default.WithJs();

    //Create a new context for evaluating webpages with the given config
    var context = BrowsingContext.New(config);

    //This is our sample source, we will set the title and write on the document
    var source = @"<!doctype html>
        <html>
        <head><title>Sample</title></head>
        <body>
        <script>
        document.title = 'Simple manipulation...';
        document.write('<span class=greeting>Hello World!</span>');
        </script>
        </body>";

    var document = await context.OpenAsync(req => req.Content(source));

    //Modified HTML will be output
    Console.WriteLine(document.DocumentElement.OuterHtml);
}

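This code simply parses the given HTML, encounters the provided JavaScript, and executes it. The JavaScript manipulates the document at that point in time, changing the document's title and appending more HTML to be parsed. In the end, the printed (serialized) HTML differs from the original HTML.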
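Instead of reading the whole serialized output, the effect of the script can also be checked through the DOM. The sketch below assumes that the AngleSharp.Js package is referenced and that the script has already run by the time OpenAsync completes, as the example above implies. The title text and the class name are placeholders.

```csharp
using System;
using System.Threading.Tasks;
using AngleSharp;
using AngleSharp.Js;

static class ScriptResultSketch
{
    static async Task Main()
    {
        // Assumption: WithJs() comes from the AngleSharp.Js NuGet package
        var config = Configuration.Default.WithJs();
        var context = BrowsingContext.New(config);

        var source = "<!doctype html><html><head><title>Sample</title></head><body>" +
                     "<script>document.title = 'Changed by script';" +
                     "document.write('<span class=greeting>Hello World!</span>');</script>" +
                     "</body>";

        var document = await context.OpenAsync(req => req.Content(source));

        // The title reflects the assignment made inside the script
        Console.WriteLine(document.Title);

        // The span written by document.write should now be part of the tree
        var greeting = document.QuerySelector("span.greeting");
        Console.WriteLine(greeting?.TextContent ?? "(no greeting element found)");
    }
}
```

The fallback string makes it obvious when the script did not run, for example because scripting was not enabled in the configuration.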

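More complex JavaScript DOM interaction

Using JavaScript in AngleSharp is no problem. In its current state we can also easily perform DOM manipulation, such as creating, appending, or removing elements.

The following sample code performs a DOM query, creates a new element, and removes an existing one.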

static async Task ExtendedScriptingSample()
{
    //We require a custom configuration with JavaScript and CSS
    var config = Configuration.Default.WithJs().WithCss();

    //Create a new context for evaluating webpages with the given config
    var context = BrowsingContext.New(config);

    //This is our sample source, we will do some DOM manipulation
    var source = @"<!doctype html>
        <html>
        <head><title>Sample</title></head>
        <style>
        .bold {
        font-weight: bold;
        }
        .italic {
        font-style: italic;
        }
        span {
        font-size: 12pt;
        }
        div {
        background: #777;
        color: #f3f3f3;
        }
        </style>
        <body>
        <div id=content></div>
        <script>
        (function() {
        var doc = document;
        var content = doc.querySelector('#content');
        var span = doc.createElement('span');
        span.id = 'myspan';
        span.classList.add('bold', 'italic');
        span.textContent = 'Some sample text';
        content.appendChild(span);
        var script = doc.querySelector('script');
        script.parentNode.removeChild(script);
        })();
        </script>
        </body>";

    var document = await context.OpenAsync(req => req.Content(source));

    //HTML will have changed completely (e.g., no more script element)
    Console.WriteLine(document.DocumentElement.OuterHtml);
}

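In principle, other JavaScript engines can be added as well. Of course, wrapping objects manually offers better performance than the automatic, reflection-based approach. Nevertheless, the AngleSharp.Js library (available on NuGet) shows the possibilities and the basics of binding an existing JavaScript engine to AngleSharp.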

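Events in JavaScript and C#

The following example starts exactly like the previous two. We create a custom configuration that includes the JavaScript engine. After enabling scripting, we can parse our document.

The sample document for this example consists of a script that calls console.log twice: once before adding the listeners and once after.

The load listener is invoked once the document has fully loaded. This happens after the provided JavaScript has executed, so we should see this event at the end. We also register another event listener, which is invoked once the custom event hello is dispatched.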

public static async Task EventScriptingExample()
{
    //We require a custom configuration
    var config = Configuration.Default.WithJs();

    //Create a new context for evaluating webpages with the given config
    var context = BrowsingContext.New(config);

    //This is our sample source, we will trigger the load event
    var source = @"<!doctype html>
        <html>
        <head><title>Event sample</title></head>
        <body>
        <script>
        console.log('Before setting the handler!');

        document.addEventListener('load', function() {
        console.log('Document loaded!');
        });

        document.addEventListener('hello', function() {
        console.log('hello world from JavaScript!');
        });

        console.log('After setting the handler!');
        </script>
        </body>";

    var document = await context.OpenAsync(req => req.Content(source));

    //HTML should be output in the end
    Console.WriteLine(document.DocumentElement.OuterHtml);

    //Register Hello event listener from C# (we also have one in JS)
    document.AddEventListener("hello", (s, ev) =>
    {
        Console.WriteLine("hello world from C#!");
    });

    var e = document.CreateEvent("event");
    e.Init("hello", false, false);
    document.Dispatch(e);
}

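We also register an event listener for this custom event in C#. Here we have IntelliSense and all the other comfortable tooling at our disposal. After dispatching the event through the official API, we will see output from both registered event listeners (from JavaScript and from C#).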
