My First Web Crawler While Learning Go
1. The first crawler code
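package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// resp is a *http.Response, a struct that holds the status, headers, and body.
	resp, err := http.Get("https://studygolang.com/pkgdoc")
	if err != nil {
		// Stop here: resp is nil when err is non-nil, so using it would panic.
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		fmt.Printf("Error status code: %d\n", resp.StatusCode)
		return
	}

	// ioutil.ReadAll still works; since Go 1.16 io.ReadAll is the preferred name.
	result, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s", result)
}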
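2. A brief analysis of the code
2.1 The Get function
Let's start with the Get function in the http package. Its documentation says:
Get issues a GET to the specified URL. If the response is one of the following redirect codes, Get follows the redirect, up to a maximum of 10 redirects: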
301 (Moved Permanently)
302 (Found)
303 (See Other)
307 (Temporary Redirect)
308 (Permanent Redirect)
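An error is returned if there were too many redirects or if there was an HTTP protocol error. A non-2xx response doesn't cause an error. Any returned error will be of type *url.Error. The url.Error value's Timeout method will report true if the request timed out or was canceled.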
When err is nil, resp always contains a non-nil resp.Body. The caller should close resp.Body when done reading from it.
Get is a wrapper around DefaultClient.Get.
To make a request with custom headers, use NewRequest and DefaultClient.Do.
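My own understanding: Get sends a GET request to the given URL, and the first value it returns, which we assign to resp, has type *http.Response.
Once the response body has been read, resp.Body should be closed by calling its Close method. That is the reason for this line: defer resp.Body.Close()
Example usage of Get: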
res, err := http.Get("http://www.google.com/robots.txt")
if err != nil {
log.Fatal(err)
}
robots, err := io.ReadAll(res.Body)
res.Body.Close()
if err != nil {
log.Fatal(err)
}
fmt.Printf("%s", robots)
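2.2 Response
After Get's return value is assigned to resp, resp has type *Response (that is, *http.Response).
Response is a struct that holds a lot of information about the fetched page. Response represents the response from an HTTP request. The Client and Transport return Responses from servers once the response headers have been received. The response body is streamed on demand as the Body field is read.
// The Response struct (field comments omitted)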
type Response struct {
Status string
StatusCode int
Proto string
ProtoMajor int
ProtoMinor int
Header Header
Body io.ReadCloser
ContentLength int64
TransferEncoding []string
Close bool
Uncompressed bool
Trailer Header
Request *Request
TLS *tls.ConnectionState
}
2.3 StatusCode
StatusCode is used to check whether the request succeeded. On success, resp.StatusCode should equal http.StatusOK (200).
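Since Get does not treat a non-200 status as an error, one common pattern is to wrap the check in a helper that does. This is a sketch under that assumption; fetch and newSiteServer are hypothetical names, and the test server only knows the path "/".

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetch returns the body of url, treating any status other than 200 OK as an error.
func fetch(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// newSiteServer returns a local test server (for illustration only) that
// serves "home" at "/" and 404 for every other path.
func newSiteServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, "home")
	}))
}

func main() {
	srv := newSiteServer()
	defer srv.Close()

	for _, p := range []string{"/", "/nope"} {
		body, err := fetch(srv.URL + p)
		fmt.Printf("%s: body=%q err=%v\n", p, body, err)
	}
}
```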
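2.4 The last step
result, err := ioutil.ReadAll(resp.Body) uses ReadAll to read the response body from resp, which is the data we set out to fetch. (Since Go 1.16, io.ReadAll does the same job and ioutil.ReadAll is deprecated.)
Body represents the response body. The response body is streamed on demand as the Body field is read. If the network connection fails or the server terminates the response, Body.Read calls return an error.
Once reading is done, all that remains is to print result.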
3. Output
GOROOT=D:\CodeLife\root\GoRoot #gosetup
GOPATH=D:\ego;D:\CodeLife\Code;C:\Users\Lenovo\go #gosetup
D:\CodeLife\root\GoRoot\bin\go.exe build -o C:\Users\Lenovo\AppData\Local\Temp\___go_build_gocrawl.exe gocrawl #gosetup
C:\Users\Lenovo\AppData\Local\Temp\___go_build_gocrawl.exe #gosetup
<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<meta name="description" content="Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="theme-color" content="#00ADD8">
<title>The Go Programming Language</title>
<link href="https://fonts.googleapis.com/css?family=Work+Sans:600|Roboto:400,700" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Product+Sans&text=Supported%20by%20Google&display=swap" rel="stylesheet">
<link type="text/css" rel="stylesheet" href="/lib/godoc/style.css">
<script>window.initFuncs = [];</script>
<script>
var _gaq = _gaq || [];
_gaq.push(["_setAccount", "UA-11222381-2"]);
window.trackPageview = function() {
_gaq.push(["_trackPageview", location.pathname+location.hash]);
};
window.trackPageview();
window.trackEvent = function(category, action, opt_label, opt_value, opt_noninteraction) {
_gaq.push(["_trackEvent", category, action, opt_label, opt_value, opt_noninteraction]);
};
</script>
<script src="/lib/godoc/jquery.js" defer></script>
<script src="/lib/godoc/playground.js" defer></script>
<script>var goVersion = "\"go1.15.12\"";</script>
<script src="/lib/godoc/godocs.js" defer></script>
<body class="Site">
<header class="Header js-header">
<div class="Header-banner">
Black Lives Matter.
<a href="https://support.eji.org/give/153413/#!/donation/checkout"
target="_blank"
rel="noopener">Support the Equal Justice Initiative.</a>
</div>
<nav class="Header-nav ">
<a href="/"><img class="Header-logo" src="/lib/godoc/images/go-logo-blue.svg" alt="Go"></a>
<button class="Header-menuButton js-headerMenuButton" aria-label="Main menu" aria-expanded="false">
<div class="Header-menuButtonInner"></div>
</button>
<ul class="Header-menu">
<li class="Header-menuItem"><a href="/doc/">Documents</a></li>
<li class="Header-menuItem"><a href="/pkg/">Packages</a></li>
<li class="Header-menuItem"><a href="/project/">The Project</a></li>
<li class="Header-menuItem"><a href="/help/">Help</a></li>
</ul>
</nav>
</header>
<main id="page" class="Site-content">
<div class="container">
<div id="nav"></div>
<div class="HomeContainer">
<section class="HomeSection Hero">
<h1 class="Hero-header">
Go is an open source programming language that makes it easy to build
<strong>simple</strong>, <strong>reliable</strong>, and <strong>efficient</strong> software.
</h1>
<i class="Hero-gopher"></i>
<a href="/dl/" class="Button Button--big HeroDownloadButton">
<img class="HeroDownloadButton-image" src="/lib/godoc/images/cloud-download.svg" alt="">
Download Go
</a>
<p class="Hero-description">
Binary distributions available for<br>
Linux, macOS, Windows, and more.
</p>
</section>
<section class="HomeSection Playground">
<div class="Playground-headerContainer">
<h2 class="HomeSection-header">Try Go</h2>
</div>
<div class="Playground-inputContainer">
<textarea class="Playground-input js-playgroundCodeEl" spellcheck="false" aria-label="Try Go">// You can edit this code!
// Click here and start typing.
package main
import "fmt"
func main() {
fmt.Println("Hello, 世界")
}
</textarea>
</div>
<div class="Playground-outputContainer js-playgroundOutputEl">
<pre class="Playground-output"><noscript>Hello, 世界</noscript></pre>
</div>
<div class="Playground-controls">
<select class="Playground-selectExample js-playgroundToysEl" aria-label="Code examples">
<option value="hello.go">Hello, World!</option>
<option value="life.go">Conway's Game of Life</option>
<option value="fib.go">Fibonacci Closure</option>
<option value="peano.go">Peano Integers</option>
<option value="pi.go">Concurrent pi</option>
<option value="sieve.go">Concurrent Prime Sieve</option>
<option value="solitaire.go">Peg Solitaire Solver</option>
<option value="tree.go">Tree Comparison</option>
</select>
<div class="Playground-buttons">
<button class="Button Button--primary js-playgroundRunEl" title="Run this code [shift-enter]">Run</button>
<div class="Playground-secondaryButtons">
</div>
</div>
</div>
</section>
</div>
<script>
(function() {
'use strict';
window.initFuncs.push(function() {
if (window.playground) {
window.playground({
"codeEl": ".js-playgroundCodeEl",
"outputEl": ".js-playgroundOutputEl",
"runEl": ".js-playgroundRunEl",
"shareEl": ".js-playgroundShareEl",
"shareRedirect": "//play.golang.org/p/",
"toysEl": ".js-playgroundToysEl"
});
document.querySelector(".js-playgroundOutputEl pre").classList.add("Playground-output");
} else {
$(".Playground").hide();
}
});
})();
</script>
</div>
</main>
<footer>
<div class="Footer ">
<img class="Footer-gopher" src="/lib/godoc/images/footer-gopher.jpg" alt="The Go Gopher">
<ul class="Footer-links">
<li class="Footer-link"><a href="/doc/copyright.html">Copyright</a></li>
<li class="Footer-link"><a href="/doc/tos.html">Terms of Service</a></li>
<li class="Footer-link"><a href="http://www.google.com/intl/en/policies/privacy/">Privacy Policy</a></li>
<li class="Footer-link"><a href="http://golang.org/issues/new?title=x/website:" target="_blank" rel="noopener">Report a website issue</a></li>
</ul>
<a class="Footer-supportedBy" href="https://google.com">Supported by Google</a>
</div>
</footer>
<script>
(function() {
var ga = document.createElement("script"); ga.type = "text/javascript"; ga.async = true;
ga.src = ("https:" == document.location.protocol ? "https://ssl" : "http://www") + ".google-analytics.com/ga.js";
var s = document.getElementsByTagName("script")[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
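Process finished with exit code 0
This time I crawled the official golang site; if you are interested, try crawling other sites. Personally, I think this piece of code is also quite helpful for learning front-end development.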