Freeload some proxies to watch Bingbing on Bilibili!!! Sure you don't want to come in and take a look?

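  The code below explains in detail how to scrape and validate the proxies you want. It scrapes the IPs and ports listed on two free proxy sites, Kuaidaili and Nimadaili, and writes them to a host file. It then validates these IPs and ports against httpbin and the bilibili API.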
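
In the code that follows, the host file is a plain text file with one proxy per line: IP, port, and type, separated by tabs. As a minimal sketch of that format, a line can be written and then turned back into a `requests`-style proxies dict as shown here. The helper names `format_host_line` and `parse_host_line` are illustrative and do not appear in the original code.

```python
def format_host_line(ip, port, proxy_type):
    """Build one tab-separated host-file line: ip, port, type."""
    return "{}\t{}\t{}\n".format(ip, port, proxy_type)


def parse_host_line(line):
    """Turn one host-file line into a requests-style proxies dict."""
    ip, port, proxy_type = line.strip("\n").split("\t")
    scheme = proxy_type.lower()  # the sites may list the type in upper case
    return {scheme: "{}://{}:{}".format(scheme, ip, port)}


line = format_host_line("103.103.3.6", "8080", "HTTPS")
print(repr(line))
print(parse_host_line(line))
```

Keeping the type in the third column is what lets the validation step pick the matching httpbin URL for each proxy.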

1. Scraping proxy IPs and ports

1.1 Kuaidaili free proxy list

https://www.kuaidaili.com/free/inha/

import os
import time

import requests
from lxml import etree

# Scrape the free proxies listed on Kuaidaili and save them to a file
base_url = "https://www.kuaidaili.com/free/inha/"
os.makedirs('./host', exist_ok=True)  # make sure the output directory exists

with open('./host/kuaidaili_host.txt', 'w') as fp:
    for i in range(1, 6):  # pages 1 to 5
        url = base_url + str(i)
        response = requests.get(url)
        html = etree.HTML(response.text)
        # The table cells are located by their data-title attribute
        ips = html.xpath('//table//tr/td[@data-title="IP"]/text()')
        ports = html.xpath('//table//tr/td[@data-title="PORT"]/text()')
        URL_headers = html.xpath('//table//tr/td[@data-title="类型"]/text()')  # "类型" = type (HTTP/HTTPS)

        # One proxy per line: ip <TAB> port <TAB> type
        for ip, port, header in zip(ips, ports, URL_headers):
            fp.write('{}\t{}\t{}\n'.format(ip, port, header))
        print("Finish scratch page {} : {}".format(i, url))
        time.sleep(1)  # pause between pages
Finish scratch page 1 : https://www.kuaidaili.com/free/inha/1
Finish scratch page 2 : https://www.kuaidaili.com/free/inha/2
Finish scratch page 3 : https://www.kuaidaili.com/free/inha/3
Finish scratch page 4 : https://www.kuaidaili.com/free/inha/4
Finish scratch page 5 : https://www.kuaidaili.com/free/inha/5

1.2 Nimadaili

http://www.nimadaili.com/https/

# Scrape the free IPs and ports from Nimadaili
base_url_https = "http://www.nimadaili.com/https/"

with open('./host/nimadaili_host_https.txt', 'w') as fp:
    for i in range(1, 6):  # pages 1 to 5
        url = base_url_https + str(i)
        response = requests.get(url)
        html = etree.HTML(response.text)
        # The first column of each row holds "ip:port"
        ips = html.xpath('//table//tr//td[1]/text()')

        for entry in ips:
            ip, port = entry.split(':')[:2]
            fp.write('{}\t{}\thttps\n'.format(ip, port))
        print("Finish scratch page {} : {}".format(i, url))
        time.sleep(3)  # pause between pages
Finish scratch page 1 : http://www.nimadaili.com/https/1
Finish scratch page 2 : http://www.nimadaili.com/https/2
Finish scratch page 3 : http://www.nimadaili.com/https/3
Finish scratch page 4 : http://www.nimadaili.com/https/4
Finish scratch page 5 : http://www.nimadaili.com/https/5
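
The Nimadaili scraper splits each `ip:port` cell on the colon and trusts the result. A scraped table can also contain header text or stray whitespace, so a small validation step may help. The following is only a sketch: `split_ip_port` is a hypothetical helper, it handles IPv4-style `ip:port` strings only, and it relies on the standard-library `ipaddress` module.

```python
import ipaddress


def split_ip_port(cell):
    """Return (ip, port) from an 'ip:port' string, or None if malformed."""
    ip, sep, port = cell.strip().partition(":")
    if not sep or not port.isdecimal():
        return None  # no colon, or the port is not a plain number
    try:
        ipaddress.ip_address(ip)  # raises ValueError for invalid addresses
    except ValueError:
        return None
    if not 0 < int(port) < 65536:
        return None  # outside the valid TCP port range
    return ip, port


cells = ["169.57.1.84:80", "not-a-proxy", "59.124.224.180:3128 "]
print([split_ip_port(c) for c in cells])
```

Entries for which the helper returns `None` can simply be skipped before writing to the host file.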

2. Validating the IPs and ports

2.1 Using httpbin to check whether an IP and port are usable

import time

import requests


def get_proxys(host_file):
    """Validate the saved IPs and ports against httpbin."""
    url_https = "https://httpbin.org/ip"  # target for https proxies
    url_http = "http://httpbin.org/ip"  # target for http proxies

    # Read the saved host file: ip <TAB> port <TAB> type on each line
    with open(host_file, 'r') as fp:
        ips = fp.readlines()
    proxys, useful_proxys, proxys_type = [], [], []
    for ip in ips:
        temp_ip = ip.strip('\n').split('\t')
        # Build a proxies dict such as {'https': 'https://ip:port'}
        proxy = temp_ip[2].lower() + '://' + temp_ip[0] + ":" + temp_ip[1]
        proxies = {temp_ip[2].lower(): proxy}
        proxys.append(proxies)
        proxys_type.append(temp_ip[2].lower())

    # Request httpbin through each proxy to check that it works
    for i, pro in enumerate(proxys):
        if proxys_type[i] == 'http':
            url, timeout = url_http, 3
        elif proxys_type[i] == 'https':
            url, timeout = url_https, 5
        else:
            continue  # skip unknown proxy types
        try:
            with requests.session() as s:
                s.proxies = pro
                response = s.get(url, timeout=timeout)
            time.sleep(0.01)
            text = response.text
            print('The use of ip and port :\n{}'.format(pro))
            print('The validation ip and port :\n{}'.format(text))
            print("*" * 30)
            useful_proxys.append(proxys[i])
        except Exception:
            pass  # the proxy is unusable, move on to the next one
    # Remove duplicate usable proxies [https://blog.csdn.net/qq_43940950/article/details/117772034]
    return [dict(t) for t in set([tuple(d.items()) for d in useful_proxys])]


proxys = get_proxys("./host/nimadaili_host_https.txt")
The use of ip and port :
{'https': 'https://103.103.3.6:8080'}
The validation ip and port :
{
  "origin": "103.103.3.6"
}
... 
The use of ip and port :
{'https': 'https://169.57.1.84:80'}
The validation ip and port :
{
  "origin": "169.57.1.84"
}

******************************
proxys
[{'https': 'https://169.57.1.84:80'},
 {'https': 'https://159.8.114.37:8123'},
 {'https': 'https://220.163.129.150:808'},
 {'https': 'https://169.57.1.85:8123'},
 {'https': 'https://178.63.17.151:3128'},
 {'https': 'https://169.57.1.84:8123'},
 {'https': 'https://119.81.71.27:8123'},
 {'https': 'https://59.124.224.180:4378'},
 {'https': 'https://59.124.224.180:3128'},
 {'https': 'https://161.202.226.194:80'},
 {'https': 'https://119.81.71.27:80'},
 {'https': 'https://159.8.114.37:80'},
 {'https': 'https://103.103.3.6:8080'},
 {'https': 'https://118.190.244.234:3128'}]
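
`get_proxys` removes duplicates by passing the proxies through a `set`, which does not keep the order in which they were validated. If that order matters, an order-preserving variant is one option. This is a sketch rather than the original method, and `dedupe_proxies` is a hypothetical name.

```python
def dedupe_proxies(proxies):
    """Remove duplicate proxies dicts while keeping first-seen order."""
    seen = set()
    unique = []
    for proxy in proxies:
        # dicts are unhashable, so use a sorted tuple of items as the key
        key = tuple(sorted(proxy.items()))
        if key not in seen:
            seen.add(key)
            unique.append(proxy)
    return unique


sample = [
    {"https": "https://169.57.1.84:80"},
    {"https": "https://103.103.3.6:8080"},
    {"https": "https://169.57.1.84:80"},
]
print(dedupe_proxies(sample))
```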

2.2 Filtering further for the IPs and ports that meet the final goal (bilibili is used as the example here)

# Validate once more against the bilibili API
proxys_bilibili = []
url = 'https://api.bilibili.com/x/player/v2?cid=170868290&aid=967607091'
# Other URLs tried during testing:
# url = "https://www.ipip.net/"
# url = "http://httpbin.org/ip"
# url = "https://api.bilibili.com/x/v2/reply/main?jsonp=jsonp&next=1&type=1&oid=932056090&mode=3&plat=1"

for proxy in proxys:
    print(proxy)
    try:
        # Send a single request through the proxy and parse the JSON body
        response = requests.get(url, proxies=proxy, timeout=6)
        json_response = response.json()
        # Print the IP reported in the response
        print(json_response['data']['ip_info']['ip'])
        print('*' * 30)
        proxys_bilibili.append(proxy)
    except Exception:
        print("the ip address is error : {}".format(proxy))
        print("*" * 30)
{'https': 'https://169.57.1.84:80'}
the ip address is error : {'https': 'https://169.57.1.84:80'}
...
{'https': 'https://220.163.129.150:808'}
220.163.129.150
******************************
{'https': 'https://178.63.17.151:3128'}
178.63.17.151
******************************
{'https': 'https://103.103.3.6:8080'}
103.103.3.6
******************************
{'https': 'https://118.190.244.234:3128'}
118.190.244.234
******************************
proxys_bilibili
[{'https': 'https://220.163.129.150:808'},
 {'https': 'https://178.63.17.151:3128'},
 {'https': 'https://59.124.224.180:4378'},
 {'https': 'https://103.103.3.6:8080'},
 {'https': 'https://118.190.244.234:3128'}]
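
The bilibili check reads `data.ip_info.ip` from the JSON response and treats any failure as a bad proxy. A failing proxy may also return an HTML error page or a JSON body without that field. As a rough sketch, the lookup can be isolated in a helper that returns `None` in those cases. `extract_ip` is a hypothetical name, and the sample body below is a trimmed illustration containing only the keys used in the code above, not a real API response.

```python
import json


def extract_ip(payload_text):
    """Return data.ip_info.ip from a JSON response body, or None."""
    try:
        payload = json.loads(payload_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(payload, dict):
        return None
    data = payload.get("data") or {}
    ip_info = data.get("ip_info") if isinstance(data, dict) else None
    if not isinstance(ip_info, dict):
        return None
    return ip_info.get("ip")


# Hypothetical, heavily trimmed response body for illustration only.
sample = '{"data": {"ip_info": {"ip": "220.163.129.150"}}}'
print(extract_ip(sample))
print(extract_ip("<html>blocked</html>"))
```

A proxy would then be kept only when the helper returns a value.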

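  By now you have freeloaded enough IPs and ports!!! What are you waiting for? Take them to bilibili and keep freeloading ^^, and don't forget to study (remember the one-click triple: like, coin, and favorite).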
