Go Language Study Notes - A Tour of Go Exercise Notes - Web Crawler

sxmatch

Category: Golang. Tags: golang, learning, crawler

Article link: https://blog.csdn.net/sxmatch/article/details/128316995

Web Crawler

Exercise:

In this exercise you'll use Go's concurrency features to parallelize a web crawler.

Modify the Crawl function to fetch URLs in parallel without fetching the same URL twice.

Hint: you can keep a cache of the URLs that have been fetched on a map, but maps alone are not safe for concurrent use!
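
To illustrate the hint, here is a minimal sketch of a concurrency-safe "already fetched" check. It uses the standard library's sync.Map rather than a mutex-guarded map, so it is an alternative to the approach taken in these notes, not the author's solution. The names visited and markVisited are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// visited records fetched URLs. sync.Map is safe for concurrent use
// without additional locking (assumption: not used by the original notes).
var visited sync.Map

// markVisited (hypothetical helper) atomically marks url as seen and
// reports whether this call was the first one to see it.
func markVisited(url string) bool {
	_, loaded := visited.LoadOrStore(url, struct{}{})
	return !loaded
}

func main() {
	var wg sync.WaitGroup
	var first int32 // number of goroutines that saw the URL first

	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if markVisited("https://golang.org/") {
				atomic.AddInt32(&first, 1)
			}
		}()
	}
	wg.Wait()

	// Only one of the 100 goroutines wins the check-and-mark race.
	fmt.Println("first visits:", first)
}
```

The important property is that checking and marking happen in one atomic step. Checking a plain map and then writing to it in two separate steps would let two goroutines both conclude that the URL is new.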

Exercise program:

// fakeFetcher is a Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
	body string
	urls []string
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	if res, ok := f[url]; ok {
		return res.body, res.urls, nil
	}
	return "", nil, fmt.Errorf("not found: %s", url)
}

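// url_counter records the URLs that have already been fetched; SafeCounter
// protects its map with a lock so that goroutines can share it safely.
var url_counter = SafeCounter{v: make(map[string]int)}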

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
	"https://golang.org/": &fakeResult{
		"The Go Programming Language",
		[]string{
			"https://golang.org/pkg/",
			"https://golang.org/cmd/",
		},
	},
	"https://golang.org/pkg/": &fakeResult{
		"Packages",
		[]string{
			"https://golang.org/",
			"https://golang.org/cmd/",
			"https://golang.org/pkg/fmt/",
			"https://golang.org/pkg/os/",
		},
	},
	"https://golang.org/pkg/fmt/": &fakeResult{
		"Package fmt",
		[]string{
			"https://golang.org/",
			"https://golang.org/pkg/",
		},
	},
	"https://golang.org/pkg/os/": &fakeResult{
		"Package os",
		[]string{
			"https://golang.org/",
			"https://golang.org/pkg/",
		},
	},
}

Output:

found: https://golang.org/ "The Go Programming Language"
not found: https://golang.org/cmd/
found: https://golang.org/pkg/ "Packages"
found: https://golang.org/pkg/os/ "Package os"
found: https://golang.org/pkg/fmt/ "Package fmt"

Study notes:

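The key point of this exercise is learning how to use a lock to synchronize goroutines in Go. The map inside SafeCounter is accessed by every goroutine, so it must be protected by a lock, which ensures that only one goroutine at a time can read and update it.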