Python Crawler in Practice: Scraping Taobao MM Photos (Part 4)

Latest recommended article published on 2020-12-04 00:22:03

PatrickZheng

Category: Python ---- Crawlers
Tags: python, crawler

Article link: https://blog.csdn.net/PatrickZheng/article/details/73472983

As the final step, I added category tab switching, local directory creation, and logging to round out the overall code.
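The logging part can be sketched roughly as follows. This is only a minimal illustration, not the code from this series: the function name `build_logger`, the logger name `taobao_mm`, and the log file name are all hypothetical, and the sketch uses only the standard `logging` module. The file handler is opened with UTF-8 encoding so that Chinese tab titles can be written safely.

```python
# -*- coding: utf-8 -*-
import logging
import os
import tempfile


def build_logger(log_path, name="taobao_mm"):
    """Return a logger that appends UTF-8 encoded lines to log_path.

    Both `name` and `log_path` are illustrative choices, not values
    taken from the original crawler.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeat calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Hypothetical usage: log to a file inside a temporary directory.
    demo_path = os.path.join(tempfile.mkdtemp(), "crawler.log")
    log = build_logger(demo_path)
    log.info(u"Switched to tab: %s", u"示例")
    print(demo_path)
```

Passing the message arguments separately (rather than formatting the string first) leaves the formatting to the logging module, which skips the work when the level is disabled.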

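Along the way I tripped myself up on garbled Chinese text, i.e. an encoding problem (Python 3 reportedly fixes this)!

Be sure to keep the following in mind:

The Python source file must start with: # -*- coding: utf-8 -*-

Any string containing Chinese must be prefixed with u, for example (u"hi,你好")

There is one more step which, in my testing, turned out not to be required. See http://blog.csdn.net/isfirst/article/details/52787341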
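The two rules above can be seen in a small sketch like the one below. It is an illustration only; the variable names are made up for this example. The snippet runs on Python 2.7 and on Python 3.3+, where the `u` prefix is accepted but has no effect.

```python
# -*- coding: utf-8 -*-
# The coding line above tells Python 2 how to decode non-ASCII bytes
# in this source file.

greeting = u"hi,你好"                # unicode string: counted in characters
encoded = greeting.encode("utf-8")  # byte string: what is written to disk

print(len(greeting))  # 5 characters
print(len(encoded))   # 9 bytes (3 ASCII bytes + 2 Chinese characters * 3 bytes)
```

Without the `u` prefix, Python 2 treats the literal as a byte string, so its length would be 9 and mixing it with unicode text (for example, text returned by Selenium) can raise `UnicodeDecodeError`.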

The Taobao model (淘女郎) page is split into several category tabs:
[Screenshot of the category tabs]

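The locating technique was covered in an earlier part; the corresponding code for fetching the tabs is:

import time

# Get all category tabs (driver is the Selenium WebDriver created earlier)
selections = driver.find_elements_by_xpath('//div[@class="listing_tab"]/li')

# Test code: click each tab first, then read the page count of that tab
for selection in selections:
    print(selection.text)
    selection.click()
    time.sleep(2)  # crude wait for the tab content to load
    pages = int(driver.find_element_by_xpath('//div[@class="paginations"]/span[@class="skip-wrap"]/em').text)
    print('Total pages: %d' % pages)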

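After the complete code has run, a folder is created locally for each category (containing the downloaded images):
[Screenshot of the created folders]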
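Creating one folder per category tab could look something like the sketch below. It is an assumption about the approach, not the code used in this series: `safe_dir_name` and `ensure_dir` are hypothetical helper names, and the set of replaced characters is simply the set that Windows rejects in file names.

```python
# -*- coding: utf-8 -*-
import os
import re
import tempfile


def safe_dir_name(title):
    """Strip surrounding whitespace and replace characters that are
    not allowed in file names on common file systems."""
    return re.sub(r'[\\/:*?"<>|]', "_", title.strip())


def ensure_dir(base_dir, title):
    """Create base_dir/<title> if it does not exist and return its path."""
    path = os.path.join(base_dir, safe_dir_name(title))
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


if __name__ == "__main__":
    # Hypothetical usage with a made-up tab title.
    base = tempfile.mkdtemp()
    print(ensure_dir(base, u"category/one"))
```

Checking `os.path.isdir` before calling `os.makedirs` keeps the helper safe to call once per page, since Python 2 has no `exist_ok` argument.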

The complete code follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Date    : 2017-06-18 22:21:15
# @Author  : kk (zwk.patrick@foxmail.com)
# @Link    : blog.csdn.net/PatrickZheng
# @Version : $Id$


from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from bs4