1. Environment: Windows 7, Python 2.7; the requests and lxml libraries must be installed (pip install requests, pip install lxml).
2. Code:
#-*-coding:utf8-*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import requests
from lxml import etree

head = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.82 Safari/537.36'}

# The four original list-page URLs
base_url = "http://www.immtb.com/xiaohua/list_6_"
urls = []
for i in range(1, 5):
    urls.append(base_url + str(i) + ".html")
print urls

# Collect the image-page links from the four list pages
hrefs = []
for url in urls:
    print url
    response = requests.get(url, headers=head)
    print response
    html = response.content
    selector = etree.HTML(html)
    hs = selector.xpath('//ul[@class="i_pic"]/li/a/@href')
    print len(hs)
    for h in hs:
        hrefs.append(h)
print len(hrefs)

# Collect the page address of every image (including its pagination pages)
f = open("a.txt", "w")
for href in hrefs:
    response = requests.get(href, headers=head)
    print response
    html = response.content
    selector = etree.HTML(html)
    hss = selector.xpath('//div[@class="page page_c"]/ul/li/a/@href')
    print len(hss)
    f.write(href + "\n")
    # Relative pagination links are resolved against the directory of the current page
    i = href.rindex("/")
    print i
    for h in hss:
        if "#" not in h:  # skip in-page anchors
            f.write(href[:i+1] + h + "\n")
f.close()

# Extract the direct image address from every page
urls = []
f = open("a.txt", "r")
for line in f.readlines():
    urls.append(line[:-1])  # drop the trailing newline
f.close()
print urls
f2 = open("b.txt", "a")
for url in urls:
    response = requests.get(url, headers=head)
    print response
    html = response.content
    selector = etree.HTML(html)
    src = selector.xpath('//p[@id="showimg"]/a/img/@src')
    print src
    if src:  # guard against pages without a matching <img>
        f2.write(src[0] + "\n")
        print src[0]
f2.close()

# Download the images
urls = []
f = open("b.txt", "r")
for line in f.readlines():
    urls.append(line[:-1])
f.close()
i = 780  # starting number for the saved filenames
for url in urls:
    response = requests.get(url, headers=head)
    print response
    f2 = open(str(i) + ".jpg", "wb")
    f2.write(response.content)
    f2.close()
    i += 1
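The script above resolves relative pagination links by splicing strings with rindex("/"). A standard-library alternative is urljoin, which handles relative hrefs (and absolute ones) correctly without manual slicing. The sketch below is a hypothetical helper, not part of the original script, and is written with the Python 3 import path; on the Python 2.7 setup described above the same function lives at urlparse.urljoin.

```python
# A minimal sketch of link resolution with urljoin, assuming the same
# kind of hrefs the script collects (e.g. "123_2.html", "#comment").
from urllib.parse import urljoin  # Python 2.7: from urlparse import urljoin

def resolve_pages(page_url, hrefs):
    # Skip in-page anchors, resolve the rest against the current page URL.
    return [urljoin(page_url, h) for h in hrefs if "#" not in h]

print(resolve_pages("http://www.immtb.com/xiaohua/2016/123.html",
                    ["123_2.html", "#comment"]))
# → ['http://www.immtb.com/xiaohua/2016/123_2.html']
```

Unlike the rindex splicing, urljoin also copes with hrefs that start with "/" or are already absolute, so the same helper works for every step of the crawl.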