Python crawler demo

chenqiangdage

Category: python. Tags: crawler, python, utf-8, html, images

Article link: https://blog.csdn.net/chenqiangdage/article/details/51168231

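This post presents a simple crawler example written in Python 3.4 that fetches the images on the Baidu Images home page. It mentions the UnicodeDecodeError that may come up and gives a fix, setting Python's default encoding to utf-8, including creating a bat file to avoid the error. The project source code is hosted in a CSDN git repository.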

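A crawler written in Python 3.4

This is only a demo. It uses the Baidu Images home page as its example and can fetch the images shown on that page.

Written with Eclipse PyDev: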

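from SpiderSimple.HtmLHelper import *
import importlib
import sys

# Leftover from the Python 2 idiom "reload(sys); sys.setdefaultencoding('utf-8')".
# The imp module has been deprecated since Python 3.4, so importlib.reload is used.
importlib.reload(sys)
# sys.setdefaultencoding does not exist in Python 3, so this stays commented out.
#sys.setdefaultencoding('utf-8')

# Download the Baidu Images home page.
html = getHtml('http://image.baidu.com/')
try:
    # Extract the image addresses from the page.
    getImage(html)
    sys.exit()
except Exception as e:
    print(e)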
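
Since `sys.setdefaultencoding` is not available in Python 3, the usual alternative is to decode the downloaded bytes explicitly. The following is only a minimal sketch of that idea: the helper name `decode_html`, its parameters, and the `utf-8` fallback are assumptions for illustration and do not appear in the original code.

```python
def decode_html(raw, declared_charset=None, fallback="utf-8"):
    """Decode raw response bytes into str.

    declared_charset would typically come from the response headers,
    e.g. page.headers.get_content_charset() on a urlopen() result.
    The fallback encoding is an assumption, not something the page guarantees.
    """
    encoding = declared_charset or fallback
    # errors="replace" substitutes U+FFFD for undecodable bytes
    # instead of raising UnicodeDecodeError.
    return raw.decode(encoding, errors="replace")
```

With a helper like this, the result of `getHtml` could be passed through `decode_html` before any string-based processing. Whether replacing bad bytes is acceptable depends on the use case; for collecting image addresses it is usually harmless.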

The HtmlHelper.py file

SpiderSimple in the import above is the name of a custom package.

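from urllib.request import urlopen, urlretrieve
# Regular expression library
import re

# Open the web page and return its raw content (bytes)
def getHtml(url):
    page = urlopen(url)
    html = page.read()
    return html

# Use a regular expression to collect the image addresses in the page
def getImage(Html):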
    try:
        
        #reg = r'src="(.+?\.jpg)" class'
        #image = re.compile(reg)   
        image =  re.compile(r'<img[^>]*src[=\"\']+([^\"\']*)[\
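
The listing is cut off at this point in the source, so the author's actual pattern and download loop are not shown. As a rough, independent sketch of the approach the comments describe (match `src` attributes of `<img>` tags with a regular expression, then save each file with `urlretrieve`), something like the following could work. The names `IMG_SRC`, `find_image_urls`, `download_images`, the `images` directory, and the `.jpg` file extension are all assumptions made for illustration.

```python
import os
import re
from urllib.parse import urljoin
from urllib.request import urlretrieve

# Hypothetical pattern (not the author's): captures the quoted src value of an <img> tag.
IMG_SRC = re.compile(r'<img[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def find_image_urls(html_text, base_url):
    """Return absolute image URLs found in html_text, in document order."""
    # urljoin resolves relative and protocol-relative addresses against base_url.
    return [urljoin(base_url, src) for src in IMG_SRC.findall(html_text)]


def download_images(urls, target_dir="images"):
    """Save each URL into target_dir as 0.jpg, 1.jpg, ... (extension is assumed)."""
    os.makedirs(target_dir, exist_ok=True)
    for index, url in enumerate(urls):
        urlretrieve(url, os.path.join(target_dir, "{}.jpg".format(index)))
```

`find_image_urls` expects a `str`, so the bytes returned by `getHtml` would need to be decoded first, for example with `.decode('utf-8')`. A regular expression is enough for a demo like this one, though it will miss images that a page loads through scripts.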
