HTTP Error 403 when running a web crawler

Latest recommended article published on 2024-06-25 08:34:47

weixin_30651273


Article tags: R language

Original link: http://www.cnblogs.com/rener0424/p/10970096.html


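# coding=gbk

from bs4 import BeautifulSoup
import requests
import urllib.request  # "import urllib" alone does not guarantee urllib.request is loaded

x = 1  # running index for saved images
y = 1  # running index for saved pages


def crawl(url):
    global x, y
    # Fetch the page and parse it
    res = requests.get(url)
    soup = BeautifulSoup(res.text, 'html.parser')
    # Save the parsed HTML so it can be inspected later
    with open(f'F:/pachong/xnt/{y}.txt', 'w', encoding="utf-8") as f:
        f.write(str(soup))
    y += 1
    # Download every <img> found on the page
    yinhuns = soup.select('img')
    print(yinhuns)
    for yh in yinhuns:
        print(yh)
        link = yh.get('src')
        print(link)
        urllib.request.urlretrieve(link, f'F:/pachong/xnt/{x}.jpg')
        print(f'Downloading image {x}')
        x += 1


for i in range(1, 5):
    url = "https://acg.fi/hentai/23643.htm/" + str(i)

    try:
        crawl(url)
    except ValueError:
        # urllib raises ValueError for a src that is not an absolute URL; skip to the next page
        continue
    except Exception as e:
        print(e)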
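
"HTTP Error 403" is the wording urllib uses when a server refuses a request, so the failing call here is most likely urllib.request.urlretrieve, which identifies itself with urllib's default Python User-Agent. One common cause of a 403 in this situation is a server that rejects requests lacking a browser-like User-Agent or a Referer. The sketch below assumes that is what happens here; it has not been checked against the site in the listing. BROWSER_UA, build_image_request and download_image are illustrative names that do not appear in the original post.

```python
import shutil
import urllib.request
from urllib.parse import urljoin

# Assumption: a browser-like User-Agent string; any current browser value could be used.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_image_request(page_url, src):
    """Build a request for an image with browser-like headers (hypothetical helper)."""
    link = urljoin(page_url, src)  # resolves a relative src against the page URL
    headers = {"User-Agent": BROWSER_UA, "Referer": page_url}
    return urllib.request.Request(link, headers=headers)


def download_image(page_url, src, dest):
    """Save the image at src to dest; urlopen raises urllib.error.HTTPError on 4xx/5xx."""
    req = build_image_request(page_url, src)
    with urllib.request.urlopen(req, timeout=10) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Inside crawl, the urlretrieve line could then become download_image(url, link, f'F:/pachong/xnt/{x}.jpg'). The page fetch can send the same header through the headers= argument of requests.get. If the server still answers 403, it may be checking something else, such as cookies, and this sketch does not cover that.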
