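A pitfall when scraping websites with Go: automatic redirects
I recently built a small tool that scrapes the video playback URLs of a certain live-streaming platform, and along the way I ran into a pitfall of scraping websites with Go. Allow me to show off a little.
Discovering the problem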
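The playback address is a link that looks like https://www.xxxx.com/xerR23xd: the main domain followed by a short-link suffix. I opened it in a browser and analyzed the traffic with Fiddler, as shown in the screenshot.
The request returned a 302, and the response headers contained a Location header, which is the address the browser has to redirect to.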
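That looked simple enough, so I first reproduced the request in GoLand and could see the link. Then I started writing the code: a simple GET request, nothing technically challenging.
But something spooky happened: after sending the request I simply could not get the Location header. So I printed the request, and then also printed the request URL after the request had completed.
There was the ghost: the request URL printed after the request was already the Location address. In other words, Go had followed the redirect for me automatically.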
After a quick Baidu search and a look at the source code, I found that http.Client has a field for exactly this:
// CheckRedirect specifies the policy for handling redirects.
// If CheckRedirect is not nil, the client calls it before
// following an HTTP redirect. The arguments req and via are
// the upcoming request and the requests made already, oldest
// first. If CheckRedirect returns an error, the Client's Get
// method returns both the previous Response (with its Body
// closed) and CheckRedirect's error (wrapped in a url.Error)
// instead of issuing the Request req.
// As a special case, if CheckRedirect returns ErrUseLastResponse,
// then the most recent response is returned with its body
// unclosed, along with a nil error.
//
// If CheckRedirect is nil, the Client uses its default policy,
// which is to stop after 10 consecutive requests.
CheckRedirect func(req *Request, via []*Request) error
CheckRedirect is what actually controls whether a redirect such as this 302 is followed.
So we add the following configuration:
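client := &http.Client{
	Timeout: 5 * time.Second,
	// Returning http.ErrUseLastResponse stops at the first response:
	// the 302 is handed back with its body unclosed and a nil error.
	CheckRedirect: func(req *http.Request, via []*http.Request) error {
		return http.ErrUseLastResponse
	},
}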
The final, complete GET function looks like this; with it, redirects are no longer followed automatically:
import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// Get sends a GET request to actionUrl with the given query params and headers.
// Redirects are NOT followed: a 3xx response is returned as-is, so its
// Location header shows up in the returned header map.
func Get(actionUrl string, params map[string]string, headers map[string]string) (err error, resp string, header map[string]string) {
	// Build the request.
	req, err := http.NewRequest("GET", actionUrl, nil)
	if err != nil {
		return fmt.Errorf("new request failed: %w", err), "", header
	}
	// Add query parameters.
	if params != nil {
		q := req.URL.Query()
		for key, val := range params {
			q.Add(key, val)
		}
		req.URL.RawQuery = q.Encode()
	}
	// Add headers (ranging over a nil map is a no-op).
	for key, val := range headers {
		req.Header.Add(key, val)
	}
	client := &http.Client{
		Timeout: 5 * time.Second,
		// Stop at the first response instead of following redirects.
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	fmt.Println("Go GET URL :", req.URL.String())
	// Send the request.
	res, err := client.Do(req)
	if err != nil {
		return err, "", header
	}
	defer res.Body.Close() // always close res.Body
	// Read the whole body (io.ReadAll replaces the deprecated ioutil.ReadAll).
	resBody, err := io.ReadAll(res.Body)
	if err != nil {
		return err, "", header
	}
	// Flatten the headers; only the first value of each key is kept.
	header = make(map[string]string, len(res.Header))
	for k := range res.Header {
		header[k] = res.Header.Get(k)
	}
	return nil, string(resBody), header
}
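Because the client now stops at every 3xx response, a chain of several short links can also be walked by hand, one hop at a time. A hedged sketch, assuming a local test server with a hypothetical two-hop chain `/s1 -> /s2 -> /video`; `traceRedirects` and its `maxHops` limit are illustrative, not part of the original code:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// newChainServer is a hypothetical chain: /s1 -> /s2 -> /video (200).
func newChainServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/s1", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/s2", http.StatusFound)
	})
	mux.HandleFunc("/s2", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/video", http.StatusMovedPermanently)
	})
	mux.HandleFunc("/video", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "video")
	})
	return httptest.NewServer(mux)
}

// traceRedirects walks a redirect chain manually and returns every URL it
// visited, starting with start. It fails once more than maxHops redirects occur.
func traceRedirects(start string, maxHops int) ([]string, error) {
	client := &http.Client{
		Timeout: 5 * time.Second,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	visited := []string{start}
	current := start
	for hops := 0; ; hops++ {
		res, err := client.Get(current)
		if err != nil {
			return visited, err
		}
		res.Body.Close()
		// Anything outside 3xx ends the chain.
		if res.StatusCode < 300 || res.StatusCode > 399 {
			return visited, nil
		}
		if hops == maxHops {
			return visited, fmt.Errorf("stopped after %d redirects", maxHops)
		}
		// Location() resolves a relative Location against the request URL.
		loc, err := res.Location()
		if err != nil {
			return visited, err
		}
		current = loc.String()
		visited = append(visited, current)
	}
}

func main() {
	srv := newChainServer()
	defer srv.Close()

	chain, err := traceRedirects(srv.URL+"/s1", 5)
	if err != nil {
		panic(err)
	}
	for i, u := range chain {
		fmt.Println(i, u)
	}
}
```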