Writing a simple web crawler

import requests
import re
import time
import os

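# Send a browser-like User-Agent; some sites reject the default one used by requests.
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'
}

# Fetch the page. The timeout keeps the script from hanging on a stalled connection.
response = requests.get('https://www.vmgirls.com/12945.html', headers=headers, timeout=10)
response.raise_for_status()  # Stop early on an HTTP error status.

html = response.text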

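# Debug output: the headers that were actually sent.
print(response.request.headers)

# Use the post title as the download directory name.
dir_name = re.findall(r'<h1 class="post-title h3">(.*?)</h1>', html)[-1]
os.makedirs(dir_name, exist_ok=True)

# Collect the links to download from the page.
urls = re.findall(r'a href="(.*?)" alt=".*?" title=".*?">', html)
print(urls)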

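for url in urls:
    time.sleep(1)  # Pause between requests to avoid hammering the server.
    file_name = url.split('/')[-1]
    response = requests.get(url, headers=headers, timeout=10)
    # Save the raw bytes under the directory named after the post title.
    with open(os.path.join(dir_name, file_name), 'wb') as f:
        f.write(response.content)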
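
The script above passes each matched link straight to requests.get and uses the post title directly as a directory name. Both can fail: a link may be relative or protocol-relative, and a title may contain characters that are not valid in a path. Below is a minimal sketch of two helpers that could guard against this. The names safe_dir_name and extract_image_urls are hypothetical, the sample HTML and the cdn.example.com host are made up for illustration, and the set of rejected characters is the one Windows disallows in file names. The link regex is the same one used in the script.

```python
import re
from urllib.parse import urljoin

# Characters that Windows does not allow in file or directory names (assumption:
# we want names that are valid on Windows as well as on other systems).
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_dir_name(title, fallback='download'):
    """Replace characters that are invalid in a path; fall back if nothing is left."""
    cleaned = _INVALID_CHARS.sub('_', title).strip()
    return cleaned or fallback


def extract_image_urls(html, base_url):
    """Find links with the script's regex and resolve them against the page URL."""
    links = re.findall(r'a href="(.*?)" alt=".*?" title=".*?">', html)
    return [urljoin(base_url, link) for link in links]


# Made-up sample markup, shaped to fit the regex above.
sample_html = (
    '<a href="/image/a.jpeg" alt="a" title="a">'
    '<a href="//cdn.example.com/b.jpeg" alt="b" title="b">'
)

print(extract_image_urls(sample_html, 'https://www.vmgirls.com/12945.html'))
print(safe_dir_name('a/b: c?'))
```

In the script, dir_name could then be wrapped in safe_dir_name, and urls could come from extract_image_urls(html, response.url), so every entry is an absolute URL before it is downloaded.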
