Yugong Moves the Mountains Diary · 15

Python_G－Dragon

Article link: https://blog.csdn.net/Python_G_Dragon/article/details/105329557

Study progress

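The day before yesterday I wrote about scraping free IP addresses from a web page. Yesterday's progress was slow: I only got a small part of the IP check working, so I did not post. Today I expected to get multithreading sorted out, but it did not go as hoped; it is still somewhat difficult for me.
The code for checking the IPs: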

import requests
import re
from fake_useragent import UserAgent

def get_html(url, retries=3):
    # Fetch the page with a random User-Agent, trying up to `retries` times
    for _ in range(retries):
        headers = {'user-agent': UserAgent().random}
        try:
            response = requests.get(url, headers=headers, timeout=10)
        except requests.RequestException:
            continue
        if response.status_code == 200:
            response.encoding = 'utf-8'
            return response
    return None  # every attempt failed

def get_infos(response):
    # Capture the first <td> of each table row (used below as the proxy address)
    return re.findall(r'<tr>[\s\S]*?<td>(.*?)</td>', response.text)

if __name__ == '__main__':
    urls = ['http://www.xiladaili.com/gaoni/{}/'.format(i) for i in range(1, 2)]
    for url in urls:
        response = get_html(url)
        if response is None:
            continue  # the page could not be fetched, skip it
        for ip in get_infos(response):
            try:
                # Without a timeout, a dead proxy can block for a very long time
                requests.get('http://wenshu.court.gov.cn/',
                             proxies={'http': 'http://' + ip}, timeout=5)
            except requests.RequestException:
                print('connect failed')
            else:
                print('success')

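The result is shown in the figure below.
[Figure: IP addresses]
This method is slow because it can only check the addresses one at a time.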
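Part of the time can be saved before any request is sent, by dropping table cells that cannot be proxy addresses. The following is only a minimal sketch, assuming the first cell of each row is an `ip:port` string; `clean_proxies`, `PROXY_RE` and `sample_html` are hypothetical names used for illustration and are not part of the script above.

```python
import re

# Hypothetical pattern: four dot-separated numbers, a colon, then a port
PROXY_RE = re.compile(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})')

def clean_proxies(cells):
    """Keep only well-formed, unique "ip:port" strings, in their original order."""
    seen = set()
    cleaned = []
    for cell in cells:
        candidate = cell.strip()
        match = PROXY_RE.fullmatch(candidate)
        if match is None:
            continue  # not of the form a.b.c.d:port
        *octets, port = (int(g) for g in match.groups())
        if any(o > 255 for o in octets) or not 1 <= port <= 65535:
            continue  # numbers out of range
        if candidate not in seen:
            seen.add(candidate)
            cleaned.append(candidate)
    return cleaned

# Made-up sample rows, only to exercise the same regex as get_infos
sample_html = """
<tr><td>1.2.3.4:8080</td><td>HTTP</td></tr>
<tr><td> 5.6.7.8:3128 </td><td>HTTPS</td></tr>
<tr><td>not-an-ip</td><td>?</td></tr>
<tr><td>1.2.3.4:8080</td><td>HTTP</td></tr>
<tr><td>999.1.1.1:80</td><td>HTTP</td></tr>
"""
cells = re.findall(r'<tr>[\s\S]*?<td>(.*?)</td>', sample_html)
print(clean_proxies(cells))  # prints ['1.2.3.4:8080', '5.6.7.8:3128']
```

In the script, the list returned by `get_infos` could be passed through such a filter before the checking loop, so that malformed or duplicate entries never cost a network round trip.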
Below is what I worked on today. I did not manage to solve the key problem, but I hope readers passing by will leave their valuable suggestions.

import requests
import re
from fake_useragent import UserAgent
import threading

def get_html(url, retries=3):
    # Fetch the page with a random User-Agent, trying up to `retries` times
    for _ in range(retries):
        headers = {'user-agent': UserAgent().random}
        try:
            response = requests.get(url, headers=headers, timeout=10)
        except requests.RequestException:
            continue
        if response.status_code == 200:
            response.encoding = 'utf-8'
            return response
    return None  # every attempt failed

def get_infos(response):
    # Capture the first <td> of each table row (used below as the proxy address)
    return re.findall(r'<tr>[\s\S]*?<td>(.*?)</td>', response.text)

def try_ip(ip):
    # Send one request through the proxy and report whether it went through
    try:
        requests.get('http://wenshu.court.gov.cn/',
                     proxies={'http': 'http://' + ip}, timeout=5)
    except requests.RequestException:
        print('connect failed')
    else:
        print('success')

if __name__ == '__main__':
    urls = ['http://www.xiladaili.com/gaoni/{}/'.format(i) for i in range(1, 2)]
    for url in urls:
        response = get_html(url)
        if response is None:
            continue  # the page could not be fetched, skip it
        for ip in get_infos(response):
            t = threading.Thread(target=try_ip, args=(ip,))
            t.start()
            t.join()  # wait for this thread to finish

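The code above is what I wrote with the threading module, but I do not know where the problem lies: after adding it, the script is still very slow, and the IP check itself misbehaves and is not accurate.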
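One likely cause of the slowness is visible in the loop itself: `t.join()` is called right after `t.start()`, so each thread has to finish before the next one is started, and the checks still run one at a time. The sketch below shows the usual pattern of starting every thread first and joining afterwards. It also stores a result per address instead of only printing, which may help with the accuracy problem, since bare `success` lines do not say which IP they belong to. `check_all` and `fake_check` are hypothetical names; `fake_check` sleeps instead of sending a request, so this is an illustration under those assumptions rather than a drop-in replacement.

```python
import threading
import time

def check_all(proxies, check):
    """Run check(proxy) for every proxy in its own thread and collect the results."""
    results = {}
    lock = threading.Lock()

    def worker(proxy):
        try:
            ok = bool(check(proxy))
        except Exception:
            ok = False  # any error counts as a failed proxy
        with lock:
            results[proxy] = ok  # remember which address gave which result

    threads = [threading.Thread(target=worker, args=(p,)) for p in proxies]
    for t in threads:
        t.start()  # start every thread first ...
    for t in threads:
        t.join()   # ... and only then wait for all of them
    return results

def fake_check(proxy):
    # Stand-in for try_ip: sleeps 0.2 s instead of making a network request
    time.sleep(0.2)
    return proxy.endswith(':8080')

proxies = ['1.1.1.1:8080', '2.2.2.2:3128', '3.3.3.3:8080',
           '4.4.4.4:3128', '5.5.5.5:8080']
start = time.perf_counter()
results = check_all(proxies, fake_check)
elapsed = time.perf_counter() - start
print(results)
print('{:.2f}s for {} checks'.format(elapsed, len(proxies)))
```

Run one after another, five 0.2 s checks would take about one second; started together they finish in roughly the time of a single check. In the real script, `check` would wrap the `requests.get` call with a timeout and return True or False. For a long proxy list, `concurrent.futures.ThreadPoolExecutor` from the standard library is a common way to cap the number of threads.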

I hope readers passing by will leave their valuable suggestions.
