Parsing Baidu Tieba page source with XPath and scraping thread titles

蓬松的头发

Published 2020-05-08 12:00:41

Category: Web scraping. Tags: xpath, python

Article link: https://blog.csdn.net/yanyuan_985/article/details/105992455

Parsing a URL's page source with XPath and scraping the titles

# Fetch the page source
import requests
from lxml import etree

response = requests.get("https://tieba.baidu.com/f?kw=%E5%A4%A7%E6%95%B0%E6%8D%AE&ie=utf-8&pn=0")
htmlStr = response.content.decode("utf-8")
# print(htmlStr)

# Parse the HTML and select the text of the title links with XPath
content = etree.HTML(htmlStr)
titles = content.xpath("//li/div/div/div/div/a/text()")
print(titles)
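
The `kw` value in the request URL is the percent-encoded UTF-8 form of the forum name 大数据. Rather than pasting the encoded string by hand, the query string can be built with the standard library. This is only a minimal sketch: `build_tieba_url` is a hypothetical helper name, not part of the original post, and treating `pn` as a paging offset is an assumption based on the URL rather than anything the post confirms.

```python
from urllib.parse import urlencode

BASE_URL = "https://tieba.baidu.com/f"


def build_tieba_url(keyword, offset=0):
    """Build a forum list URL for `keyword` (hypothetical helper).

    `offset` is sent as `pn`; that it is a paging offset is an assumption.
    """
    # urlencode percent-encodes non-ASCII text as UTF-8 by default
    query = urlencode({"kw": keyword, "ie": "utf-8", "pn": offset})
    return f"{BASE_URL}?{query}"


url = build_tieba_url("大数据")
print(url)
```

The resulting string can be passed to `requests.get` in place of the hard-coded URL, which makes it easier to try other forum names.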