A Simple Python Web Crawler Example

Latest recommended article published on 2024-06-05 17:45:18

liyong635

Category: python. Tags: python, crawler, Ubuntu, Linux, images

Article link: https://blog.csdn.net/liyong635/article/details/49587729

I have been learning Python recently. After working through the syntax, I wrote a simple crawler out of interest. The code is pasted below:

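#!/usr/bin/env python3
# coding=utf-8
import os
import re
import sys
import urllib.error
import urllib.request

# Read the run label from the command line; images are saved under data/<label>
times = sys.argv[1]
print(times)


# Download a page and return its HTML as text
def getHtml(url):
    page = urllib.request.urlopen(url)
    html = page.read().decode('utf-8', errors='ignore')
    return html


# Check whether the given URL can be visited (HTTP status 200)
def isable2visit(url):
    try:
        statusCode = urllib.request.urlopen(url).getcode()
    except (urllib.error.URLError, ValueError):
        # HTTP errors, network failures and malformed URLs all count as "not visitable"
        return False
    return statusCode == 200


# Create the directory if it does not exist yet
def createDir(dir):
    if not os.path.exists(dir):
        os.makedirs(dir)
        print("Success to create file " + dir)
    return dir


# Find the .jpg images in the page and save them locally as <x>.jpg;
# returns the next free index so numbering continues across pages
def getImg(html, x, dir):
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    for imgurl in imglist:
        local = dir + os.sep + str(x) + '.jpg'
        urllib.request.urlretrieve(imgurl, local)
        x += 1
    return x


urls = input("Enter the preFix of the url:")
if len(urls) == 0:
    urls = "http://tieba.baidu.com/p/41254316"
print(urls)
x = 0
storeDir = "/home/liyong/python/spider/data/" + str(times)
dir = createDir(storeDir)

# Try the 100 URLs formed by appending 0..99 to the prefix
for i in range(100):
    url = urls + str(i)
    print("Done %.2f%%" % ((float(i) / 100) * 100))
    if isable2visit(url):
        x = getImg(getHtml(url), x, dir)
print("Done 100%")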
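
To see what the regular expression in getImg picks out, here is a minimal sketch that runs the same pattern on a hand-written HTML fragment. The fragment, the example.com URLs and the helper name find_image_urls are made up for illustration. Real Tieba markup may look different.

```python
import re

# Same pattern as in getImg: capture the src value of an <img> tag
# whose src ends in .jpg and is directly followed by a pic_ext attribute.
IMG_PATTERN = re.compile(r'src="(.+?\.jpg)" pic_ext')

# Hypothetical HTML fragment (an assumption, not copied from a real page).
sample_html = "\n".join([
    '<img class="BDE_Image" src="http://example.com/a.jpg" pic_ext="jpeg">',
    '<img class="BDE_Image" src="http://example.com/b.jpg" pic_ext="jpeg">',
    '<img src="http://example.com/logo.png">',
])


def find_image_urls(html):
    """Return every .jpg URL that the crawler's pattern would download."""
    return IMG_PATTERN.findall(html)


print(find_image_urls(sample_html))
```

Only the two .jpg sources that carry a pic_ext attribute are returned. The .png image is skipped because it does not fit the pattern.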

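The main loop builds each page URL by appending a counter to the prefix, and getImg names each saved file after a running index. The sketch below shows just those two steps without touching the network. The helper names candidate_urls and local_path are hypothetical and not part of the script above.

```python
import os


def candidate_urls(prefix, count=100):
    """Build prefix + "0", prefix + "1", ... the same way the main loop does."""
    return [prefix + str(i) for i in range(count)]


def local_path(directory, index):
    """Build the file path under which image number `index` would be saved."""
    return os.path.join(directory, str(index) + ".jpg")


# With the default prefix, the first three candidates end in ...160, ...161, ...162.
for url in candidate_urls("http://tieba.baidu.com/p/41254316", count=3):
    print(url)

print(local_path("data", 0))
```

Because the counter is appended as text, the default prefix covers the thread IDs 412543160 through 4125431699, not pages within a single thread.
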
The crawl results are as follows:
