0x001-Use Inspect Element to collect the URLs, then save them to url.txt
The console code is as follows:
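// Collect every element with class "r" and print the href of its first link.
var tag = document.getElementsByClassName('r');
for (var i = 0; i < tag.length; i++) {
    var a = tag[i].getElementsByTagName("a");
    if (a.length > 0) console.log(a[0].href);  // skip elements that contain no link
}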
0x002-Use Python code to filter the collected URLs
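import re

# Strip the "VM<id>:<line>" source marker that the console adds to copied output.
pattern = re.compile(r"VM\d+:\d+")
# Capture the scheme and host, i.e. everything up to the first "/" after "//".
pattern1 = re.compile(r"(\w+.*?//.*?)/")
urls = []
with open('url.txt', 'r') as f:
    for url in f:
        url2 = re.sub(pattern, "", url.strip())
        match = re.search(pattern1, url2)
        # Skip blank lines and lines that do not contain a URL.
        if match:
            urls.append(match.group(1).strip())
# Overwrite url.txt with the filtered list, one entry per line.
with open('url.txt', 'w') as a:
    a.write("\n".join(urls))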
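The same filtering step can also be sketched with the standard library's URL parser instead of a hand-written host regex. This is only an illustration, not the method used above: the helper names `to_origin` and `unique_origins` are hypothetical, the "VM<id>:<line>" marker is assumed to sit at the start of each copied line, and the de-duplication is an extra that the original script does not perform.

```python
import re
from urllib.parse import urlsplit

# Assumption: the console marker looks like "VM123:5" and starts the line.
CONSOLE_PREFIX = re.compile(r"^VM\d+:\d+\s*")


def to_origin(line):
    """Return 'scheme://host[:port]' for one copied console line, or None."""
    url = CONSOLE_PREFIX.sub("", line.strip())
    parts = urlsplit(url)
    # Lines without both a scheme and a host are not usable URLs.
    if not parts.scheme or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc}"


def unique_origins(lines):
    """Map lines to origins, dropping invalid lines and repeats, keeping order."""
    seen = set()
    result = []
    for line in lines:
        origin = to_origin(line)
        if origin is not None and origin not in seen:
            seen.add(origin)
            result.append(origin)
    return result


if __name__ == "__main__":
    # Made-up sample lines, only for demonstration.
    sample = [
        "VM123:5 https://example.com/a/b?x=1",
        "VM123:5 https://example.com/other",
        "http://test.org:8080/",
        "",
    ]
    for origin in unique_origins(sample):
        print(origin)
```

To use it on the real data, pass the lines read from url.txt to `unique_origins` and write the returned list back out, one entry per line.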
0x003-Results