Scraping news articles while recording the positions of images and text


努力才能被爱慕


Article tags: web scraping, Python

Article link: https://blog.csdn.net/qq_39457091/article/details/86736001


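Template code:

```python
import lxml.etree

# `resp` is the HTTP response for the article page, e.g. resp = requests.get(url)
html = lxml.etree.HTML(resp.text)    # parse the page
nodes = html.xpath("//div[@class='text']//*")    # all elements under the news content container, in document order
for node in nodes:

    # Text inside the element, before its first child
    text = node.text
    if text and text.strip():
        print(text)

    # Image URL
    img_url = node.get('src')
    if node.tag == 'img' and img_url:
        print(img_url)

    # Tail: text after the element's closing tag
    tail = node.tail
    if tail and tail.strip():
        print(tail)
```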
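Because the loop above prints each element's text and tail during the same visit, a parent's tail comes out before its children's content when elements are nested (for `<p>A<b>B</b></p>Z` it prints A, Z, B). In addition, `//*` selects only descendants, so text sitting directly inside the container before its first child is skipped. A minimal sketch that addresses both issues, assuming the goal is an ordered list of entries rather than printed output: `extract_ordered`, `SKIP_TAGS` and the `("text" | "img", value)` record format are hypothetical names chosen for illustration. It walks the tree recursively so every tail is emitted after the element's children, treats only `<img>` elements as images, and (as an assumption) ignores `<script>`/`<style>` content.

```python
import lxml.etree

SKIP_TAGS = {"script", "style"}  # assumption: their text is not article content


def extract_ordered(html_text, container_xpath):
    """Return ("text", str) and ("img", url) records from the first element
    matched by container_xpath, in document (reading) order."""
    root = lxml.etree.HTML(html_text)
    matches = root.xpath(container_xpath)
    if not matches:
        return []
    container = matches[0]
    records = []

    def add_text(value):
        # Ignore None and whitespace-only fragments (indentation, newlines)
        if value and value.strip():
            records.append(("text", value.strip()))

    def walk(node):
        # Comments and processing instructions have a non-string tag;
        # their own .text is not page text, but their tail is.
        if isinstance(node.tag, str) and node.tag not in SKIP_TAGS:
            add_text(node.text)              # text before the first child
            if node.tag == "img" and node.get("src"):
                records.append(("img", node.get("src")))
            for child in node:               # children come before the tail
                walk(child)
        add_text(node.tail)                  # text after the closing tag

    add_text(container.text)  # leading text directly inside the container
    for child in container:   # the container's own tail lies outside the article
        walk(child)
    return records


sample = (
    '<div class="text">Intro<p>Para <b>bold</b> end</p>Between'
    '<p><img src="/a.jpg"></p><!-- note -->After</div>Outside'
)
for kind, value in extract_ordered(sample, "//div[@class='text']"):
    print(kind, value)
```

For the sample markup this yields Intro, Para, bold, end, Between, the image `/a.jpg`, then After; the text `Outside` after the container is not included. Each entry's index in the list is its position relative to the other text and images.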

Demo:

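Using Sohu News as an example:

```python
import requests
import lxml.etree

resp = requests.get('http://m.sohu.com/a/292745933_116897/?pvid=000115_3w_a&_f=index_chan08news_1', timeout=10)
resp.raise_for_status()    # stop on HTTP errors instead of parsing an error page

html = lxml.etree.HTML(resp.text)    # parse the page
nodes = html.xpath("//div[@class='display-content']//*")    # all elements under the article body, in document order
for node in nodes:

    # Text inside the element, before its first child
    text = node.text
    if text and text.strip():
        print(text)

    # Image URL
    img_url = node.get('src')
    if node.tag == 'img' and img_url:
        print(img_url)

    # Tail: text after the element's closing tag
    tail = node.tail
    if tail and tail.strip():
        print(tail)
```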
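The `src` values collected from a page may be relative (`/a.jpg`, `a.jpg`) or protocol-relative (`//host/a.jpg`), in which case they cannot be downloaded as-is. A minimal sketch, assuming the results are kept as ordered `(kind, value)` tuples with kind `"text"` or `"img"`: `absolutize_images` is a hypothetical helper name, and the image paths in the usage lines are made up for illustration. It uses the standard library's `urllib.parse.urljoin` and keeps the list order, so each image's position relative to the text is preserved.

```python
from urllib.parse import urljoin


def absolutize_images(records, page_url):
    """Resolve relative image URLs in ("text" | "img", value) records.

    Text records pass through unchanged, and order is preserved.
    """
    resolved = []
    for kind, value in records:
        if kind == "img":
            # urljoin leaves absolute URLs unchanged and resolves
            # "/a.jpg", "a.jpg" and protocol-relative "//host/a.jpg"
            value = urljoin(page_url, value)
        resolved.append((kind, value))
    return resolved


page = "http://m.sohu.com/a/292745933_116897/?pvid=000115_3w_a&_f=index_chan08news_1"
records = [("text", "Intro"), ("img", "/a.jpg"), ("img", "//img.example.com/b.png")]
for kind, value in absolutize_images(records, page):
    print(kind, value)
```

Against this page URL, `/a.jpg` resolves to `http://m.sohu.com/a.jpg` and `//img.example.com/b.png` to `http://img.example.com/b.png`.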
