Getting Started with NCrawler


BabyKylin


Category: c# · Tags: NCrawler, crawler, c#

Article link: https://blog.csdn.net/u011412226/article/details/50894301


My project needed a web crawler, so I chose NCrawler, a crawler project for .NET (project page). First, search for NCrawler on NuGet. I used two packages: NCrawler and NCrawler.HtmlProcessor.

Then add the following using directives:

using System; // Uri, Console
using NCrawler;
using NCrawler.Interfaces;
using NCrawler.HtmlProcessor;

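After that, you can start writing code. Note that some sites, such as Baidu, are configured so that they cannot be crawled this way; you may need to modify the HTTP request headers. I haven't yet looked into how to do that. This is only a demo of how to get started, for reference.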
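The post doesn't show how to change HTTP headers in NCrawler, and this sketch doesn't either. It uses plain System.Net.Http (HttpRequestMessage), independent of NCrawler, to show what a request carrying browser-like headers looks like in .NET. The BrowserLikeRequest class and the header values are hypothetical examples, and whether a particular site accepts such a request is an assumption you would need to test.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper, NOT part of NCrawler: builds a GET request that
// carries browser-like headers. Whether a given site accepts it is an
// assumption that must be checked against that site.
public static class BrowserLikeRequest
{
    // Example browser User-Agent string (an assumption; any realistic one works).
    public const string UserAgent =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36";

    public static HttpRequestMessage Create(Uri uri)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, uri);
        // TryAddWithoutValidation stores the UA string as-is instead of parsing it.
        request.Headers.TryAddWithoutValidation("User-Agent", UserAgent);
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("text/html"));
        request.Headers.AcceptLanguage.Add(new StringWithQualityHeaderValue("zh-CN"));
        return request;
    }

    public static void Main()
    {
        using (HttpRequestMessage request = Create(new Uri("http://www.csdn.net/")))
        {
            // Print the headers that would be sent. To actually fetch the page,
            // pass the request to HttpClient.SendAsync.
            foreach (var header in request.Headers)
                Console.WriteLine(header.Key + ": " + string.Join(", ", header.Value));
        }
    }
}
```

Building the request separately from sending it keeps the header logic testable without network access.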

class Program
{
    static void Main(string[] args)
    {
        Uri uri = new Uri("http://www.csdn.net/");
        Crawler c = new Crawler(uri,new HtmlDocumentProcessor(),new DumperStep());
        c.MaximumThreadCount = 300; // maximum number of crawler threads
        c.MaximumCrawlDepth = 2;    // how many links deep to follow from the start URI
        c.Crawl();                  // start crawling
    }
}
// Pipeline step invoked for each crawled page: prints the page's URI
public class DumperStep : IPipelineStep
{
    public void Process(Crawler crawler, PropertyBag propertyBag)
    {
        Console.WriteLine(propertyBag.Step.Uri);
    }
}
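Setting MaximumCrawlDepth = 2 limits how many links away from the start URI the crawler goes. The toy model below makes that concrete with a breadth-first traversal over an in-memory link graph. It is not NCrawler's implementation: the DepthLimitedCrawl class and the sample graph are hypothetical, and treating the start page as depth 0 is an assumption (NCrawler's exact counting convention may differ).

```csharp
using System;
using System.Collections.Generic;

// Toy model of a crawl-depth limit, NOT NCrawler's implementation.
// The "web" is an in-memory map of page -> outgoing links, and the
// start page is assumed to be depth 0.
public static class DepthLimitedCrawl
{
    // Returns pages in breadth-first visiting order, never following
    // links from pages that are already at maxDepth.
    public static List<string> Crawl(
        IReadOnlyDictionary<string, string[]> links, string start, int maxDepth)
    {
        var visited = new HashSet<string> { start };
        var order = new List<string>();
        var queue = new Queue<(string Page, int Depth)>();
        queue.Enqueue((start, 0));

        while (queue.Count > 0)
        {
            var (page, depth) = queue.Dequeue();
            order.Add(page); // plays the role of DumperStep.Process

            if (depth >= maxDepth || !links.TryGetValue(page, out var outgoing))
                continue; // depth limit reached, or page has no known links

            foreach (var next in outgoing)
            {
                // HashSet.Add returns false for pages already seen,
                // so back-links are not re-queued.
                if (visited.Add(next))
                    queue.Enqueue((next, depth + 1));
            }
        }
        return order;
    }

    public static void Main()
    {
        var web = new Dictionary<string, string[]>
        {
            ["/"] = new[] { "/a", "/b" },
            ["/a"] = new[] { "/a/1", "/" },
            ["/b"] = new[] { "/b/1" },
            ["/a/1"] = new[] { "/a/1/x" },
        };
        // Prints "/ /a /b /a/1 /b/1"
        Console.WriteLine(string.Join(" ", Crawl(web, "/", 2)));
    }
}
```

With a depth limit of 2, "/a/1/x" (three links from the start page) is never visited, and the link from "/a" back to "/" is ignored because visited pages are tracked. A real crawler does the same bookkeeping, only with downloaded pages and parsed links.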