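I. The goQuery library:
1. Overview
According to the official description, goQuery brings a syntax and feature set similar to jQuery to Go. The main difference is that the underlying net/html parser returns nodes rather than a full-featured DOM tree, so jQuery's stateful manipulation functions (such as height(), css() and detach()) are left out. goQuery is built on Go's net/html package (golang.org/x/net/html, which is maintained by the Go team but is not part of the standard library) and on the CSS selector library cascadia. The parser requires the document to be UTF-8 encoded, so callers must convert the source document's encoding when necessary. (Source: goQuery README)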
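The UTF-8 requirement above can be illustrated with a small standard-library sketch. It assumes, purely for illustration, that a non-UTF-8 page is ISO-8859-1 (Latin-1); the helper names `latin1ToUTF8` and `ensureUTF8` are hypothetical and are not part of goQuery. For pages in GBK or other encodings, a decoder package such as golang.org/x/text would be the more likely choice.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// latin1ToUTF8 converts ISO-8859-1 bytes to a UTF-8 string.
// Every Latin-1 byte maps to the Unicode code point with the same value.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

// ensureUTF8 returns the input unchanged when it is already valid UTF-8.
// Otherwise it assumes (for this sketch only) that the source is Latin-1.
func ensureUTF8(b []byte) string {
	if utf8.Valid(b) {
		return string(b)
	}
	return latin1ToUTF8(b)
}

func main() {
	raw := []byte{'c', 'a', 'f', 0xE9} // "café" encoded as ISO-8859-1
	fmt.Println(utf8.Valid(raw))       // the lone 0xE9 byte is not valid UTF-8
	fmt.Println(ensureUTF8(raw))       // safe to hand to an HTML parser now
}
```

The converted string can then be wrapped in a reader and passed to the parser.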
2. Main types and methods
2.1 Document: the HTML document to be manipulated
// Document represents an HTML document to be manipulated. Unlike jQuery, which
// is loaded as part of a DOM document, and thus acts upon its containing
// document, GoQuery doesn't know which HTML document to act upon. So it needs
// to be told, and that's what the Document class is for. It holds the root
// document node to manipulate, and can make selections on this document.
type Document struct {
	*Selection
	Url      *url.URL
	rootNode *html.Node
}
2.2 Selection: the set of nodes matching the given criteria.
// Selection represents a collection of nodes matching some criteria. The
// initial Selection can be created by using Document.Find, and then
// manipulated using the jQuery-like chainable syntax and methods.
type Selection struct {
	Nodes    []*html.Node
	document *Document
	prevSel  *Selection
}
2.3 Selection manipulation methods:
Eq()
Index()
Last()
Slice()
Get()
······
II. Scraping proxy IPs:
1. Proxy IP pool
2. Scraping with goQuery:
Without further ado, here is the code:
// Import the goquery library.
import "github.com/PuerkitoBio/goquery"
// Fields returned for each scraped proxy.
type proxyResult struct {
	Ip           string `json:"ip"`           // IP address
	Port         int    `json:"port"`         // port
	Agreement    string `json:"agreement"`    // request protocol
	Anonymous    string `json:"anonymous"`    // anonymity level
	Region       string `json:"region"`       // region
	Speed        string `json:"speed"`        // response speed
	Source       string `json:"source"`       // source (the site that was scraped)
	Verification string `json:"verification"` // time of last verification
}
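To show what the struct tags do, here is a small sketch that serializes a proxyResult with encoding/json and reads it back. Each tag uses the quoted form `json:"name"`; a tag written without quotes is not recognized by encoding/json, and the field then falls back to its Go name. The helper names `encodeProxy` and `decodeProxy` and the sample values are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// proxyResult mirrors the struct above, with every tag in the quoted form.
type proxyResult struct {
	Ip           string `json:"ip"`
	Port         int    `json:"port"`
	Agreement    string `json:"agreement"`
	Anonymous    string `json:"anonymous"`
	Region       string `json:"region"`
	Speed        string `json:"speed"`
	Source       string `json:"source"`
	Verification string `json:"verification"`
}

// encodeProxy serializes one result to a JSON string.
func encodeProxy(p proxyResult) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeProxy parses a JSON string back into a proxyResult.
func decodeProxy(s string) (proxyResult, error) {
	var p proxyResult
	err := json.Unmarshal([]byte(s), &p)
	return p, err
}

func main() {
	// Sample values are invented for this example.
	s, err := encodeProxy(proxyResult{Ip: "127.0.0.1", Port: 8080, Agreement: "http"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(s)
}
```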
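// Parameters needed to scrape a proxy source.
// Note: these fields are unexported, so encoding/json ignores them and the tags have no effect.
type proxyParamet struct {
	ipIndex        int `json:"ipIndex"`   // index of the IP
	portIndex      int `json:"portIndex"` // index of the port
	agreementIndex int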