Python Learning by Doing 8.1 Utility Classes: HTTP Utility


lufaxinT


Article link: https://blog.csdn.net/tomorrow13210073213/article/details/72629335


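My wife asked: "Have you seen XXX?" (some anime)
I said: "No."
She scoffed: "Then what did you even do as a kid?"
I wanted to say: my childhood was a wasteland.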

Original link: http://blog.csdn.net/tomorrow13210073213/article/category/6931287

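Start coding

Every project needs utility classes, and sometimes a whole utility package. Let's start building our own.

A quick Baidu search for "python3 http"

turns up a fairly simple article:

http://www.cnblogs.com/miniren/p/5885393.html

Based on that article, we can write a simple function: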

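import http.client


def get_html(host, uri):
    # Open a plain HTTP connection to host and send a GET request for uri
    conn = http.client.HTTPConnection(host)
    try:
        conn.request("GET", uri)
        response = conn.getresponse()
        return response.read()  # raw bytes of the response body
    finally:
        conn.close()  # always release the connection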

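Given a host and a URI, it returns the page content as raw bytes. I tested it against Baidu and it worked; it is simple enough that I won't post the test code here.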
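If you want a self-contained check that does not depend on Baidu being reachable, one option is to point get_html at a throwaway local server. This is a sketch under assumptions: the HelloHandler class and the /index path are made up for the test, and a compact get_html is repeated so the block runs on its own.

```python
import http.client
import http.server
import threading


def get_html(host, uri):
    # Compact version of get_html: GET uri from host and return the raw bytes
    conn = http.client.HTTPConnection(host)
    try:
        conn.request("GET", uri)
        return conn.getresponse().read()
    finally:
        conn.close()


class HelloHandler(http.server.BaseHTTPRequestHandler):
    # Hypothetical test handler: answers every GET with "hello from <path>"
    def do_GET(self):
        body = ("hello from " + self.path).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


# Port 0 lets the OS pick a free port; the server runs in a background thread
server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host = "127.0.0.1:%d" % server.server_address[1]  # HTTPConnection accepts "host:port"
html = get_html(host, "/index")
server.shutdown()
server.server_close()
print(html)  # b'hello from /index'
```

Note that the result is bytes; call `.decode(...)` with the page's charset to get text.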

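Fetching pages through an HTTP proxy in Python 3

The code above can fetch a page, but it doesn't fully meet our needs: we have to reach the target site through an HTTP proxy. So, back to Baidu, where I found a fairly reliable article:

http://mayulin.blog.51cto.com/1628315/543559
It contains the following, which is enough to get the hang of it: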

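# Python 2 (urllib2); user, password, proxyserver and url are assumed to be defined already
import urllib2

proxy = 'http://%s:%s@%s' % (user, password, proxyserver)
proxy_handler = urllib2.ProxyHandler({'http': proxy})
# Create the opener
opener = urllib2.build_opener(proxy_handler, urllib2.HTTPHandler)
try:
    # Open the URL with the opener we just created
    response = opener.open(url, timeout=3)
    # Print the page status code
    print response.code
    # Print the page content
    print response.read().decode('gb2312')
# Exception handling (HTTPError is a subclass of URLError, so it is caught first)
except urllib2.HTTPError as e:
    print 'The server couldn\'t fulfill the request.'
    print 'Error code:', e.code
except urllib2.URLError as e:
    print 'We failed to open the URL: %s' % (url)
    print 'Reason:', e.reason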

proxyserver='proxy.test.com:'

proxy='http://%s:%s@%s' %(user,password,proxyserver)

This defines a proxy string whose general structure is "http://username:password@proxy_ip:proxy_port".
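One caveat when building this string by plain concatenation: if the password contains characters such as `@` or `:`, the URL becomes ambiguous. A minimal Python 3 sketch that percent-encodes the credentials first; build_proxy_url and the sample credentials and proxy address are hypothetical. As far as I know, recent Python 3 versions of urllib.request's ProxyHandler unquote the credentials again before sending the Proxy-Authorization header.

```python
from urllib.parse import quote, urlsplit


def build_proxy_url(user, password, proxy_server):
    """Build "http://user:password@ip:port", percent-encoding the credentials."""
    # Without quoting, an '@' or ':' inside the password would corrupt the URL
    return "http://%s:%s@%s" % (quote(user, safe=""), quote(password, safe=""), proxy_server)


# Hypothetical credentials and proxy address, for illustration only
proxy_url = build_proxy_url("alice", "p@ss:word", "proxy.example.com:8080")
parts = urlsplit(proxy_url)
print(proxy_url)                   # http://alice:p%40ss%3Aword@proxy.example.com:8080
print(parts.hostname, parts.port)  # proxy.example.com 8080
```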

proxy_handler=urllib2.ProxyHandler({'http':proxy})

creates a "handler" from the proxy string;
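The Python 3 counterpart of urllib2.ProxyHandler is urllib.request.ProxyHandler. Its argument maps URL schemes to proxy URLs, and only the schemes you list are proxied: with just an 'http' key, https:// URLs do not go through this proxy. A small sketch with a hypothetical proxy address:

```python
import urllib.request

# Hypothetical proxy URL in the "http://user:password@ip:port" form
proxy_url = "http://alice:secret@proxy.example.com:8080"

# Only http:// requests are routed through this handler's proxy
http_only = urllib.request.ProxyHandler({"http": proxy_url})

# Listing "https" as well also tunnels https:// requests through the same proxy
http_and_https = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})

# CPython gives the handler one "<scheme>_open" method per key in the mapping
print(hasattr(http_only, "https_open"))       # False
print(hasattr(http_and_https, "https_open"))  # True
```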

opener=urllib2.build_opener(proxy_handler,urllib2.HTTPHandler)

builds an "opener" from that handler;
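In Python 3, urllib.request.build_opener() chains the handlers you pass with its default handlers (HTTP, HTTPS, redirects, error processing and so on), and a handler you supply replaces the default of the same class. One consequence: passing ProxyHandler({}) switches proxies off entirely, whereas the default ProxyHandler reads the http_proxy/https_proxy environment variables. A sketch with a hypothetical proxy address; has_proxy_handler is an illustrative helper, not part of the library:

```python
import urllib.request

proxy_handler = urllib.request.ProxyHandler({"http": "http://proxy.example.com:8080"})
opener = urllib.request.build_opener(proxy_handler)

# An empty mapping replaces the default ProxyHandler, so no proxy
# (not even one taken from environment variables) is used
direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def has_proxy_handler(o):
    # Hypothetical helper: does the opener contain any ProxyHandler?
    return any(isinstance(h, urllib.request.ProxyHandler) for h in o.handlers)


print(has_proxy_handler(opener))         # True
print(has_proxy_handler(direct_opener))  # False
```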

response=opener.open(url,timeout=3)

uses the opener to open the url (with a 3-second timeout);
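In Python 3 the exception classes move to urllib.error, and `except X, e` becomes `except X as e`. A sketch of the same error handling, exercised against a throwaway local server that answers every request with 404; NotFoundHandler and fetch_status are hypothetical names, and the proxy is disabled so the local request goes out directly.

```python
import http.server
import threading
import urllib.error
import urllib.request


class NotFoundHandler(http.server.BaseHTTPRequestHandler):
    # Hypothetical local server that answers every GET with 404
    def do_GET(self):
        self.send_error(404)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_status(opener, url, timeout=3):
    """Open url with opener; return (status code, error kind or reason)."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status, None
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status (caught before URLError,
        # because HTTPError is a subclass of URLError)
        return e.code, "HTTPError"
    except urllib.error.URLError as e:
        # No usable response at all: DNS failure, refused connection, timeout...
        return None, str(e.reason)


server = http.server.HTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/missing" % server.server_address[1]
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
status, reason = fetch_status(opener, url)
server.shutdown()
server.server_close()
print(status, reason)  # 404 HTTPError
```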

print response.read().decode('gb2312')

prints the fetched content, decoded as gb2312 (the print statement shows this is Python 2.x code);
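Hard-coding 'gb2312' (or UTF-8, as a bare `.decode()` does) only works when you already know the page's encoding. A more general approach is to read the charset the server declares in its Content-Type header. A minimal sketch; decode_body is a hypothetical helper, and the UTF-8 fallback is an assumption:

```python
from email.message import Message


def decode_body(raw, content_type, default="utf-8"):
    """Decode raw bytes using the charset declared in a Content-Type header."""
    # Parse e.g. "text/html; charset=GBK"; get_content_charset() lowercases the name
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default
    # errors="replace" keeps going if the page lies about its encoding
    return raw.decode(charset, errors="replace")


print(decode_body("你好".encode("gbk"), "text/html; charset=GBK"))  # 你好
print(decode_body(b"hello", "text/html"))                           # hello
```

With a urllib.request response you can skip the parsing step: `response.headers.get_content_charset()` returns the same value directly.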

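The Python version differs, but the overall flow should carry over: build a proxy string, wrap it in a handler, build an opener from the handler, and open the URL with the opener.

The result

I'll skip the intermediate steps; here is the final result (Python 3, using urllib.request):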

import urllib.parse
import urllib.request


class HtmlUtil():
    __def_ua = "***"  # default User-Agent (value elided in the original)

    __proxy = None
    __user_name = None
    __password = None

    def __init__(self, proxy=None, user_name=None, password=None):
        # proxy is "ip:port"; user_name and password are optional proxy credentials
        self.__proxy = proxy
        self.__user_name = user_name
        self.__password = password

    def get(self, url, ua=None):
        if url is None or len(url) == 0:
            raise Exception("url must not be empty")
        if ua is None or len(ua) == 0:
            ua = self.__def_ua
        headers = {'User-Agent': ua, 'Connection': 'keep-alive', "Accept": "*/*", "Referer": "http://www.baidu.com"}
        handler = None
        proxy = self.__proxy
        if proxy is not None:
            if self.__user_name is not None and self.__password is not None:
                # Prepend the credentials: "user:password@ip:port"
                proxy = self.__user_name + ":" + self.__password + "@" + proxy
            # Only http:// URLs are sent through the proxy; https:// URLs are not
            handler = urllib.request.ProxyHandler({'http': 'http://' + proxy + '/'})
        else:
            # A bare BaseHandler adds nothing; build_opener still installs its default
            # ProxyHandler, which honors http_proxy/https_proxy environment variables
            handler = urllib.request.BaseHandler()
        opener = urllib.request.build_opener(handler)
        get_request = urllib.request.Request(url, None, headers=headers, method="GET")
        get_response = opener.open(get_request)
        # decode() assumes the page is UTF-8
        html_str = get_response.read().decode()
        return html_str

    def post(self, url, data, ua=None):
        if url is None or len(url) == 0:
            raise Exception("url must not be empty")
        if data is None:
            data = {}
        # Form-encode the dict (e.g. {"a": 1} -> b"a=1") to use as the request body
        postdata = urllib.parse.urlencode(data).encode()
        if ua is None or len(ua) == 0:
            ua = self.__def_ua
        headers = {'User-Agent': ua, 'Connection': 'keep-alive', "Accept": "*/*", "Referer": "http://www.baidu.com"}
        handler = None
        proxy = self.__proxy
        if proxy is not None:
            if self.__user_name is not None and self.__password is not None:
                proxy = self.__user_name + ":" + self.__password + "@" + proxy
            handler = urllib.request.ProxyHandler({'http': 'http://' + proxy + '/'})
        else:
            handler = urllib.request.BaseHandler()
        opener = urllib.request.build_opener(handler)
        post_request = urllib.request.Request(url, postdata, headers=headers, method="POST")
        post_response = opener.open(post_request)
        html_str = post_response.read().decode()
        return html_str

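It can be used either with or without a proxy.

With a proxy (the first argument is the proxy's "ip:port", followed by the proxy user name and password):

html_util = HtmlUtil("proxy", "user", "password")
res_str = html_util.post(url, data)

Without a proxy:

html_util = HtmlUtil()
res_str = html_util.post(url, data)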

There is some duplicated code, but it works.
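One possible cleanup of that duplication is sketched below. This is not the original code: get() and post() share a private _request() helper. Every change here is my own assumption: the placeholder User-Agent "Mozilla/5.0", the 10-second timeout, routing https:// as well as http:// URLs through the proxy, percent-encoding the credentials, treating "no proxy" as ProxyHandler({}) so environment proxies are ignored, decoding with the server-declared charset, and the local EchoHandler server used for the demo.

```python
import http.server
import threading
import urllib.parse
import urllib.request


class HtmlUtil:
    """Refactored sketch: get() and post() share a single _request() helper."""

    DEFAULT_UA = "Mozilla/5.0"  # placeholder; the original default UA is elided as "***"

    def __init__(self, proxy=None, user_name=None, password=None):
        self._proxy = proxy  # "ip:port"
        self._user_name = user_name
        self._password = password

    def _build_opener(self):
        if self._proxy is None:
            # Empty mapping: connect directly, ignoring http_proxy/https_proxy env vars
            return urllib.request.build_opener(urllib.request.ProxyHandler({}))
        proxy = self._proxy
        if self._user_name is not None and self._password is not None:
            # Percent-encode credentials so '@' or ':' in them cannot break the URL
            user = urllib.parse.quote(self._user_name, safe="")
            password = urllib.parse.quote(self._password, safe="")
            proxy = "%s:%s@%s" % (user, password, proxy)
        proxy_url = "http://" + proxy
        # Route both http:// and https:// targets through the proxy
        handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        return urllib.request.build_opener(handler)

    def _request(self, method, url, data=None, ua=None):
        if not url:
            raise ValueError("url must not be empty")
        headers = {"User-Agent": ua or self.DEFAULT_UA, "Accept": "*/*",
                   "Referer": "http://www.baidu.com"}
        body = None if data is None else urllib.parse.urlencode(data).encode()
        request = urllib.request.Request(url, body, headers=headers, method=method)
        with self._build_opener().open(request, timeout=10) as response:
            # Use the charset the server declares, falling back to UTF-8
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")

    def get(self, url, ua=None):
        return self._request("GET", url, ua=ua)

    def post(self, url, data=None, ua=None):
        return self._request("POST", url, data or {}, ua=ua)


# Demo against a throwaway local echo server (no proxy involved)
class EchoHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self._reply(("GET " + self.path).encode())

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._reply(b"POST " + self.rfile.read(length))

    def _reply(self, body):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = http.server.HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]
html_util = HtmlUtil()
get_result = html_util.get(base + "/search?q=1")
post_result = html_util.post(base + "/login", {"q": "python 3", "page": 1})
server.shutdown()
server.server_close()
print(get_result)   # GET /search?q=1
print(post_result)  # POST q=python+3&page=1
```

The sketch also drops the 'Connection': 'keep-alive' header: urllib.request overrides it with "Connection: close" on every request, so it had no effect. It raises ValueError instead of a bare Exception for an empty URL.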

All of the above is for practice and learning purposes only.
