Python crawler (8): scraping images from the tuchong site

Latest recommended article published on 2021-11-15 19:12:27

枫奇

Categories: Python crawlers, Python crawler series. Tags: python, crawler images, tuchong, beautifulsoup

Article link: https://blog.csdn.net/qiqiyingse/article/details/62231679

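This article shows how to use a Python crawler to download high-resolution images from the tuchong site, using the subject-type categories on its tag page as the example. The results are suitable for a private collection or for wallpapers.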

Python crawler: scraping images from the tuchong site

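The images on tuchong (图虫) are of very high quality. Whether for a private collection or for wallpapers, they are an excellent choice. (tuchong home page link)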

This article uses the subject-type categories on the site's tag page as the example to crawl.

With this program, you can crawl essentially all of the images on the site.
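The script that follows targets Python 2 and parses pages with BeautifulSoup and lxml. As a rough, self-contained illustration of the same parsing idea in Python 3, the sketch below collects the first link inside each `div` whose class is `post-collage` (the class name the script searches for). It swaps in the standard-library `html.parser` instead of BeautifulSoup, and the names `PostLinkParser`, `extract_post_links`, and `SAMPLE`, as well as the example.com URLs, are hypothetical and only serve the illustration. The real page markup may differ.

```python
from html.parser import HTMLParser


class PostLinkParser(HTMLParser):
    """Collect the first link found inside each <div class="post-collage">."""

    def __init__(self):
        super().__init__()
        self.links = []       # one href per matching post block
        self._depth = 0       # <div> nesting depth inside the current block
        self._found = False   # True once the current block has yielded a link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._depth > 0:
                # Nested <div> inside a post block: track it so we know
                # when the outer block really ends.
                self._depth += 1
            elif "post-collage" in (attrs.get("class") or "").split():
                self._depth = 1
                self._found = False
        elif tag == "a" and self._depth > 0 and not self._found:
            href = attrs.get("href")
            if href:
                self.links.append(href)
                self._found = True

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1


def extract_post_links(html):
    """Return the first href of every post-collage block, in page order."""
    parser = PostLinkParser()
    parser.feed(html)
    return parser.links


# Made-up sample markup (an assumption, not the real tuchong page).
SAMPLE = """
<div class="post-collage"><div><a href="https://example.com/post/1">one</a></div></div>
<div class="other"><a href="https://example.com/skip">skip</a></div>
<div class="post-collage"><a href="https://example.com/post/2">two</a><a href="https://example.com/post/2b">x</a></div>
"""

print(extract_post_links(SAMPLE))
```

Tracking the `div` depth keeps links in unrelated blocks out of the result even when a post block contains nested `div` elements.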

#!/usr/bin/python
#coding:utf-8

import urllib2,time,uuid,urllib,os,sys,re
from bs4 import BeautifulSoup
reload(sys)
sys.setdefaultencoding('utf-8')

# Fetch the page content; returns None if the request fails
def getHtml(url):
	try:
		print url
		html = urllib2.urlopen(url).read()  # raw bytes; decoding to utf-8 is left disabled
	except Exception:
		return
	return html

# Collect the sub-page URLs under the main page, along with the author names
def getUrls(html):
	if not html:
		print 'nothing can be found'
		return
	print 'start find url'
	mylist=[]
	soup=BeautifulSoup(html,'lxml')
	try:
		items=soup.find_all("div",{"class":"post-collage"})
		print len(items)
	
		for item in items:
			alist={}

			if item.find('a