Reposted from https://www.cnblogs.com/dyfblog/p/6073069.html
Getting a list of URLs with XPath:
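Take the cnblogs (博客园) home page as an example. If the data we need is the list of article URLs, XPath is the best tool for the job (see the figure).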
# coding: utf8
from lxml import etree
import requests

url = 'http://www.cnblogs.com/'
response = requests.get(url)
response.encoding = 'utf8'  # decode the response body as UTF-8
html = response.text

# Parse the page and select the <a> inside each post title
root = etree.HTML(html)
node_list = root.xpath("//div[@class='post_item_body']/h3/a")
for node in node_list:
    print(node.attrib['href'])

# Output:
# http://www.cnblogs.com/olivers/p/6073506.html
# http://www.cnblogs.com/-free/p/6073496.html
# ...
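To see how the XPath expression picks out the links without hitting the network, here is a minimal sketch that runs the same expression against a small inline HTML string. The sample markup, the example.com URLs, and the helper name `extract_post_urls` are assumptions made up for illustration; they only imitate the `post_item_body` structure that the expression above expects. The sketch also appends `/@href` to the expression, so lxml returns the attribute values directly instead of element nodes.

```python
from lxml import etree

# Hypothetical sample markup: it imitates the div/h3/a nesting that the
# XPath expression targets, plus one block that should NOT match.
SAMPLE_HTML = """
<html><body>
  <div class="post_item_body">
    <h3><a href="http://example.com/post/1.html">First post</a></h3>
  </div>
  <div class="post_item_body">
    <h3><a href="http://example.com/post/2.html">Second post</a></h3>
  </div>
  <div class="sidebar">
    <h3><a href="http://example.com/about.html">About</a></h3>
  </div>
</body></html>
"""


def extract_post_urls(html):
    """Return the href of every title link inside a post_item_body div."""
    root = etree.HTML(html)
    # The trailing /@href selects the attribute values themselves.
    hrefs = root.xpath("//div[@class='post_item_body']/h3/a/@href")
    # Convert lxml's string results to plain str objects.
    return [str(href) for href in hrefs]


for post_url in extract_post_urls(SAMPLE_HTML):
    print(post_url)
```

The link inside the `sidebar` div is skipped because its parent div does not carry the `post_item_body` class. Whether the live cnblogs page still uses this class name is not something the sketch checks.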