Go Crawler colly Official Example 11 [max_depth]: Setting the Crawl Depth

Please credit the source when reposting:
http://blog.csdn.net/qq_27818541/article/details/112297218
From: [BigManing's blog]

Example overview

  1. Use colly.MaxDepth(1) to set how deep the crawler follows links

Example code

This example sets the depth to 1:

package main

import (
	"fmt"
	"strings"

	"github.com/gocolly/colly/v2"
)

func main() {
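	// Instantiate the default collector
	c := colly.NewCollector(
		// Maximum crawl depth. With 1, only the page passed to Visit is
		// fetched; links found on it are printed but not followed.
		colly.MaxDepth(1),
	)

	// Called for every <a> element that has an href attribute
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		link := e.Attr("href")
		// Skip relative links and non-HTTP schemes
		if !strings.HasPrefix(link, "http") {
			return
		}
		fmt.Println(link)
		// Try to visit the link; the depth limit is checked inside Visit
		e.Request.Visit(link)
	})

	// Start scraping at https://bigmaning.com/
	c.Visit("https://bigmaning.com/")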
}

Output

https://bigmaning.com/#blog
https://bigmaning.com/#blog
https://bigmaning.com/#blog
https://github.com/BigManing
https://bigmaning.com/rss.xml
https://bigmaning.com/archives/random
https://bigmaning.com/archives/random
https://bigmaning.com/archives/fan-xiang-dai-li-----kan-zhe-yi-pian-jiu-gou-le
https://bigmaning.com/archives/fan-xiang-dai-li-----kan-zhe-yi-pian-jiu-gou-le
https://creativecommons.org/licenses/by-nc-sa/4.0/
https://halo.run
https://github.com/halo-dev/halo-theme-vno

Process finished with exit code 0

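Effect of increasing the depth

With the depth set to 2 (colly.MaxDepth(2)), the crawler also follows each link on the entry URL, so the output contains the entry page's links plus the links found on each of those linked pages.

Result: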


https://bigmaning.com/#blog
https://bigmaning.com/#blog
https://bigmaning.com/#blog
https://github.com/BigManing
https://github.com/
https://lab.github.com/
https://opensource.guide
https://github.com/events
https://github.community
https://education.github.com
https://stars.github.com
https://enterprise.github.com/contact
https://education.github.com
https://avatars1.githubusercontent.com/u/12949958?s=400&u=9f3c2d8f3400dd12cc2dcc82ed42bd3f9d08ad47&v=4
https://docs.github.com/en/articles/blocking-a-user-from-your-personal-account
https://docs.github.com/en/articles/reporting-abuse-or-spam
https://docs.github.com/articles/why-are-my-contributions-not-showing-up-on-my-profile
https://docs.github.com/categories/setting-up-and-managing-your-github-profile
https://github.com/site/terms
https://github.com/site/privacy
https://github.com/security
https://githubstatus.com/
https://docs.github.com
https://github.com
https://github.com/contact
https://github.com/pricing
https://docs.github.com
https://services.github.com
https://github.blog
https://github.com/about
https://bigmaning.com/rss.xml
https://bigmaning.com/archives/random
https://bigmaning.com/#blog
https://bigmaning.com/#blog
https://bigmaning.com/#blog
https://github.com/BigManing
https://bigmaning.com/rss.xml
https://bigmaning.com/archives/fan-xiang-dai-li-----kan-zhe-yi-pian-jiu-gou-le
https://bigmaning.com/archives/fan-xiang-dai-li-----kan-zhe-yi-pian-jiu-gou-le
https://creativecommons.org/licenses/by-nc-sa/4.0/
https://halo.run
https://github.com/halo-dev/halo-theme-vno
https://bigmaning.com/archives/random
https://bigmaning.com/archives/fan-xiang-dai-li-----kan-zhe-yi-pian-jiu-gou-le
https://bigmaning.com/#blog
https://bigmaning.com/#blog
https://bigmaning.com/#blog
https://github.com/BigManing
https://bigmaning.com/rss.xml
https://www.cloudflare.com/zh-cn/learning/serverless/glossary/what-is-edge-computing
https://www.cloudflare.com/zh-cn/learning/cdn/cdn-load-balance-reliability
https://www.cloudflare.com/learning/cdn/glossary/reverse-proxy/
https://bigmaning.com/archives/random
https://bigmaning.com/archives/random
https://creativecommons.org/licenses/by-nc-sa/4.0/
https://halo.run
https://github.com/halo-dev/halo-theme-vno
https://bigmaning.com/archives/fan-xiang-dai-li-----kan-zhe-yi-pian-jiu-gou-le
https://creativecommons.org/licenses/by-nc-sa/4.0/
https://halo.run
https://halo.run
https://halo.run/archives/download.html
https://github.com/halo-dev/halo
https://bbs.halo.run
https://halo.run/archives/install-quick-start.html
https://creativecommons.org/licenses/by-nc-sa/4.0/
https://github.com/halo-dev
https://bbs.halo.run
https://t.me/halo_dev
https://t.me/HaloBlog
https://github.com/halo-dev/halo-theme-vno
https://github.com/
https://lab.github.com/
https://opensource.guide
https://github.com/events
https://github.community
https://education.github.com
https://stars.github.com
https://enterprise.github.com/contact
https://education.github.com
https://docs.github.com/articles/which-remote-url-should-i-use
https://cli.github.com
https://desktop.github.com
https://desktop.github.com/
https://desktop.github.com/
https://developer.apple.com/xcode/
https://visualstudio.github.com/
https://github.com/halo-dev
https://github.com/onevcat
https://github.com/onevcat/vno-jekyll
https://camo.githubusercontent.com/cc8afd398da6860f5b05a5bad395a645dc652a15e19a3e52b231330027f57ff0/68747470733a2f2f692e6c6f6c692e6e65742f323031392f30352f32392f3563656437303230303330643536383236302e706e67
https://camo.githubusercontent.com/292589c63ac456eedc22db294929b32e9fac372069863472e3382b460b0ed218/68747470733a2f2f692e6c6f6c692e6e65742f323031392f30352f32392f3563656437303230626138353838373137392e706e67
https://onevcat.com/
https://github.com/halo-dev/halo-theme-vno/archive/master.zip
https://github.com/halo-dev/halo-theme-vno/archive/master.zip
https://github.com/site/terms
https://github.com/site/privacy
https://github.com/security
https://githubstatus.com/
https://docs.github.com
https://github.com
https://github.com/contact
https://github.com/pricing
https://docs.github.com
https://services.github.com
https://github.blog
https://github.com/about

Process finished with exit code 0
