#
import requests
from bs4 import BeautifulSoup
from urllib import request
from lxml import etree
import time
import os
import re
# --- Fetch the start page and collect every anchor tag on it ---

# Browser-like User-Agent so sites that reject the default requests UA
# still respond.
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36'
}
url = input('请输入抓取网站: ')
# BUG FIX: `headers` was built but never passed to requests.get, so the
# browser User-Agent was never actually sent.  `timeout` keeps the script
# from hanging forever on an unresponsive host.
cent = requests.get(url, headers=headers, timeout=10).content.decode('utf8')
htm = BeautifulSoup(cent, 'lxml')
# All <a> tags on the start page; crawled by the loop further below.
name = htm.find_all('a')
def info(url):
    """Fetch *url* and save its HTML to a local file.

    The filename is ``index_<name>.html`` where ``<name>`` is taken from
    the last ``key=value`` pair of the URL (split on ``;`` then ``=``);
    URLs containing no ``=`` are saved as ``index_index.html``.

    NOTE(review): may raise requests exceptions on network failure and
    UnicodeDecodeError for non-UTF-8 pages — callers do not handle these.
    """
    # BUG FIX: send the browser headers here as well, and bound the
    # request time so one dead link cannot stall the whole crawl.
    cent = requests.get(url, headers=headers, timeout=10).content.decode('utf8')
    html = BeautifulSoup(cent, 'lxml')
    if '=' in url:
        # e.g. '...page;id=3' -> '3'.  NOTE(review): assumes the URL uses
        # ';'-separated 'key=value' parts — confirm against the target site.
        name = url.split(';')[-1].split('=')[1]
    else:
        name = 'index'
    with open('index_' + name + ".html", "w", encoding='utf-8') as file:
        file.write(str(html))
# --- Crawl every root-relative link found on the start page ---
# Hoisted out of the loop: compiling the same pattern once per tag was
# wasted work (behavior is unchanged).
_href_pattern = re.compile(r'href="/.*">')
for i in name:
    h = _href_pattern.findall(str(i))
    if h:
        # 'href="/path">' -> '/path' -> 'path', appended to the base URL.
        # NOTE(review): assumes single-segment paths and a base URL ending
        # in '/' — verify against the target site before relying on this.
        info(url + h[0].split('"')[1].split('/')[1])
# TODO: saving the fetched pages locally is unfinished (original note:
# "抓取页面的保存本地未完成").
# (Leftover blog metadata: latest recommended article published 2024-06-05 13:50:02.)