A quick-and-dirty way to extract IPs and ports from Xici (source code included; I'm new to crawlers, so please go easy on me)

import re
from urllib import request

import chardet

# Send the request through an HTTP proxy
proxy = {"http": "123.207.30.131:80"}
proxy_support = request.ProxyHandler(proxy)
opener = request.build_opener(proxy_support)
request.install_opener(opener)

url = "http://www.xicidaili.com/nn"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}

# An IP is a dotted quad right after <td>; a port is 2-5 digits filling a whole <td> cell
patternIP = re.compile(r'(?<=<td>)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
patternPORT = re.compile(r'(?<=<td>)\d{2,5}(?=</td>)')

req = request.Request(url, headers=headers)
response = request.urlopen(req)
html = response.read()

# Detect the page encoding and decode the raw bytes before matching
charset = chardet.detect(html)['encoding']
text = html.decode(charset or 'utf-8', errors='replace')

findIP = patternIP.findall(text)
findPORT = patternPORT.findall(text)

# Pair each IP with its port as "ip:port"
IP_data = [ip + ":" + port for ip, port in zip(findIP, findPORT)]

print(charset)
print(IP_data)
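
The matching and pairing step can be tried without touching the network. Below is a minimal sketch, assuming table rows shaped like the ones the regexes above target. The helper name `extract_proxies` and the sample HTML are made up for illustration and are not part of the original script.

```python
import re

# Equivalent to the patterns in the script above
PATTERN_IP = re.compile(r'(?<=<td>)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
PATTERN_PORT = re.compile(r'(?<=<td>)\d{2,5}(?=</td>)')


def extract_proxies(text):
    """Return 'ip:port' strings found in decoded HTML text (hypothetical helper)."""
    ips = PATTERN_IP.findall(text)
    ports = PATTERN_PORT.findall(text)
    # zip stops at the shorter list, so a missing port cannot raise IndexError
    return [ip + ":" + port for ip, port in zip(ips, ports)]


# Hand-written sample rows (an assumption about the page layout, not real data)
SAMPLE = """
<tr><td>10.0.0.1</td><td>8080</td><td>HTTP</td></tr>
<tr><td>10.0.0.2</td><td>3128</td><td>HTTPS</td></tr>
"""

print(extract_proxies(SAMPLE))
```

Pairing by position only works if every row yields exactly one IP and one port in the same order, so it is worth comparing the lengths of the two lists when running against the real page.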

 
