Python scraper (Part 8: scraping and downloading photos of young women from an image site) error: import urllib.request,parser, ImportError: No module named request

Steven全

Categories: 爬虫 (web scraping), urlparser, urllib2; Tags: python, html

Article link: https://blog.csdn.net/weixin_42668334/article/details/116371513

Contents:
1. Learning reflections
2. Errors encountered
3. Reference links
4. Original article link
5. Reference code (Python 2.7)
6. Original code (Python 3.x)

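1. Learning reflections
After N days of learning web scraping, I have found that every pitfall runs deeper than it looks. You have to get used to falling into one and climbing straight back out; that is simply what programming work is like. I also assumed I could run code copied from other posts as-is, but every time it needed some debugging first. Looking back after getting it to work, I realized the original author had already done the hard part; as a colleague put it, only a few small changes were needed. Sadly, many people are scared off by those small changes and never find the courage to try one more time.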
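2. Errors encountered
Back to the point. While following the scraper article https://blog.csdn.net/jziwjxjd/article/details/106864267,
"python爬虫(八、爬取图片社的小姐姐图片并下载)" (Python scraper, Part 8: scraping and downloading photos of young women from an image site), I ran into what looked like three errors in a row:
import urllib.request,parser , ImportError: No module named request
Looking back, it really came down to two things, i.e. two errors.
After searching and debugging for a while, I found the cause was a version difference: I was using Python 2.7, while the author used Python 3.5.
Changing urllib.request to urllib2 and parser to urlparse was enough to get the imports through.
Python 2.7: urllib2, urlparse
Python 3.5: urllib.request, urllib.parse
Strictly speaking, parser is not the URL module at all: it is a separate standard-library module for accessing Python parse trees (removed in Python 3.10). The script never uses it, so it can simply be dropped from the import line.
The modified code for "python爬虫(八、爬取图片社的小姐姐图片并下载)" is in section 5 below.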
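If the same script has to run under both Python 2.7 and Python 3, a common pattern (a sketch, not the original author's code) is to try the Python 3 module names first and fall back to the Python 2 ones. `Request`, `urlopen` and `urljoin` exist in both versions; the helper `describe_backend` is hypothetical and only there for a quick sanity check.

```python
# Minimal sketch: import the same functions under Python 3 or Python 2.7.
try:
    # Python 3: urllib2 -> urllib.request, urlparse -> urllib.parse
    from urllib.request import Request, urlopen
    from urllib.parse import urljoin
except ImportError:
    # Python 2.7 fallback
    from urllib2 import Request, urlopen
    from urlparse import urljoin

import sys


def describe_backend():
    """Hypothetical helper: report which module family supplied urlopen."""
    return urlopen.__module__  # "urllib.request" on Python 3, "urllib2" on Python 2


if __name__ == "__main__":
    print(sys.version_info[0], describe_backend())
```

With this shim the rest of the script can call `urlopen(Request(url, headers=head))` without caring which interpreter runs it.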
3. Reference links
Pages I consulted while searching and debugging:
https://blog.csdn.net/qq_34802511/article/details/90754707
https://blog.csdn.net/echojosedream/article/details/52938136
https://blog.csdn.net/u012720990/article/details/84952602 (urllib2 was split into urllib.request and urllib.error in Python 3.x)
https://blog.csdn.net/testcs_dn/article/details/55807101 (the urlparse module was renamed to urllib.parse in Python 3)
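One place where urllib.parse (urlparse in Python 2.7) is genuinely useful in this scraper is building the image URL. The code below prefixes "http:" to the data-original value, which only works if that value is protocol-relative ("//host/path"). A sketch using `urljoin` instead, which also handles root-relative and already-absolute values; `absolute_image_url` is a hypothetical helper and the image paths are made-up examples:

```python
from urllib.parse import urljoin, urlparse  # Python 2.7: from urlparse import urljoin, urlparse

PAGE_URL = "http://699pic.com/tupian/xiaojiejie.html"


def absolute_image_url(page_url, src):
    """Resolve an <img> src/data-original value against the page URL.

    Handles protocol-relative ("//host/path"), root-relative ("/path")
    and absolute URLs, instead of blindly prefixing "http:".
    """
    return urljoin(page_url, src)


if __name__ == "__main__":
    # Hypothetical image paths, for illustration only
    for src in ["//img.example.com/a.jpg", "/images/b.jpg", "https://cdn.example.com/c.jpg"]:
        full = absolute_image_url(PAGE_URL, src)
        print(full, urlparse(full).netloc)
```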

4. Original article link
Original scraper article:
https://blog.csdn.net/jziwjxjd/article/details/106864267

5. Reference code (Python 2.7, PyCharm 2019.3)
Note: create the directory "D:\妹子图片" by hand first; otherwise the script fails with IOError: [Errno 2] No such file or directory:
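Instead of creating the folder by hand, the script can create it itself before the download loop. A minimal sketch that runs on both Python 2.7 and 3.x; `ensure_dir` is a hypothetical helper name:

```python
import os


def ensure_dir(path):
    """Create `path` (and any missing parents) if it does not exist yet."""
    # Python 2.7 has no exist_ok argument, so check first; on Python 3.2+
    # os.makedirs(path, exist_ok=True) does the same in one call.
    if not os.path.isdir(path):
        os.makedirs(path)
    return path
```

For example, calling `ensure_dir(u"D:\\妹子图片")` once at the top of `download_img` avoids the IOError above.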

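#!/usr/bin/env python
# -*- coding: utf-8 -*-
# author: StevenC
# datetime: 2021/5/2 23:34
# software: PyCharm
import sys
reload(sys)                          # Python 2 only: re-exposes sys.setdefaultencoding
sys.setdefaultencoding("utf-8")      # Python 2 only workaround for implicit ASCII decode errors
import re
# import xlwt                        # unused in this script; only needed for Excel export
import urllib2                       # Python 3: urllib.request
import urlparse                      # Python 3: urllib.parse (not used below)
from bs4 import BeautifulSoup

def ask(url):
    # Browser-like User-Agent so the site serves the normal page
    head = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"}
    req = urllib2.Request(url=url, headers=head)   # renamed from "re", which shadowed the re module
    res = urllib2.urlopen(req)
    html = res.read().decode('utf-8')
    return html

def download_img(baseurl):
    html = ask(baseurl)
    soup = BeautifulSoup(html, 'html.parser')
    k = 0
    for item in soup.find_all('div', class_='list'):
        item = str(item)
        matches = re.findall(findload, item)
        if not matches:                  # skip blocks without a data-original image
            continue
        tupian = matches[0]
        url = "http:" + tupian           # assumes data-original is protocol-relative ("//...")
        name = u"D:\\妹子图片\\"           # unicode literal so Windows receives the real Chinese path
        name = name + str(k) + ".jpg"
        k += 1
        img = urllib2.urlopen(url)
        with open(name, 'wb') as f:      # the original "f.close" had no (), so the file was never closed
            f.write(img.read())

findload = re.compile('<img.*data-original="(.*?)"')
url = "http://699pic.com/tupian/xiaojiejie.html"
download_img(url)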
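A quick way to see what `findload` actually captures is to run it on a small hand-written HTML snippet (hypothetical markup, not the real 699pic page). Because `.*` is greedy, when several `<img>` tags sit on one line the pattern returns only the last `data-original`; restricting the match to a single tag with `[^>]*?` returns each one. The `per_tag` pattern is my own alternative, not the original author's:

```python
import re

# The pattern from the scraper above
findload = re.compile('<img.*data-original="(.*?)"')
# A tighter variant: stay inside one <img ...> tag and stop at the closing quote
per_tag = re.compile(r'<img[^>]*?\bdata-original="([^"]*)"')

# Hypothetical markup, for illustration only (a single line, no newlines)
sample = ('<div class="list"><img src="x.gif" data-original="//img.example.com/1.jpg">'
          '<img src="x.gif" data-original="//img.example.com/2.jpg"></div>')

if __name__ == "__main__":
    print(findload.findall(sample))  # greedy .* -> only the last image
    print(per_tag.findall(sample))   # one match per <img> tag
```

If each `div.list` holds exactly one image, both patterns give the same result; the difference only shows up when a block contains several images.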

6. Original code (Python 3.x), from:
https://blog.csdn.net/jziwjxjd/article/details/106864267

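import re
# import xlwt                    # unused in this script; only needed for Excel export
import urllib.request            # original: import urllib.request,parser ("parser" is unused and was removed in Python 3.10)
from bs4 import BeautifulSoup


def ask(url):
    # Browser-like User-Agent so the site serves the normal page
    head = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"}
    req = urllib.request.Request(url=url, headers=head)   # renamed from "re", which shadowed the re module
    res = urllib.request.urlopen(req)
    html = res.read().decode('utf-8')
    return html

def download_img(baseurl):
    html = ask(baseurl)
    soup = BeautifulSoup(html, 'html.parser')
    k = 0
    for item in soup.find_all('div', class_='list'):
        item = str(item)
        matches = re.findall(findload, item)
        if not matches:                  # skip blocks without a data-original image
            continue
        tupian = matches[0]
        url = "http:" + tupian           # assumes data-original is protocol-relative ("//...")
        name = "D:\\妹子图片\\"
        name = name + str(k) + ".jpg"
        k += 1
        img = urllib.request.urlopen(url)
        with open(name, 'wb') as f:      # the original "f.close" had no (), so the file was never closed
            f.write(img.read())


findload = re.compile('<img.*data-original="(.*?)"')
url = "http://699pic.com/tupian/xiaojiejie.html"
download_img(url)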
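On Python 3 the download step can also be factored into a small helper that creates the target folder and closes both the response and the file automatically. This is a sketch, not the original author's code: `save_image`, `HEADERS` and the default `timeout` are my own illustrative choices, and the short User-Agent is an assumption (the scraper above uses a full Chrome string).

```python
import os
import urllib.request

# Assumption: any browser-like User-Agent string is accepted by the site.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def save_image(url, save_dir, index, timeout=10):
    """Hypothetical helper: download `url` to save_dir/<index>.jpg and return the path."""
    # Create the folder (and parents) instead of failing with FileNotFoundError.
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, "{}.jpg".format(index))
    req = urllib.request.Request(url, headers=HEADERS)
    # Both the response and the output file are closed automatically,
    # even if an exception is raised while reading or writing.
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(path, "wb") as f:
        f.write(resp.read())
    return path
```

Inside `download_img`, the download part of the loop could then call `save_image(url, "D:\\妹子图片", k)` before incrementing `k`.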