Python Crawler Modules: The HTML Parsing Module

Latest recommended article published on 2024-07-25 12:15:00

dianyin7770

Tags: crawler, json, python

Original link: http://www.cnblogs.com/c-x-a/p/9175124.html

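This module is fairly simple and there is not much to emphasize. If the response is JSON, read the values directly by key; if it is a web page, parse it with XPath using the html module from lxml.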
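The choice between the two branches can be sketched as follows. This is only an illustration and not part of the original post: the helper name `detect_format` is hypothetical, and it assumes that a body which `json.loads` decodes into an object or array should be treated as JSON, with everything else treated as HTML.

```python
import json


def detect_format(source):
    """Classify a response body as 'json' or 'html' (hypothetical helper)."""
    try:
        data = json.loads(source)
    except (TypeError, ValueError):
        # json.JSONDecodeError subclasses ValueError: not JSON, so assume HTML
        return "html", source
    # Bare scalars such as "123" also decode, so accept only objects and arrays
    if isinstance(data, (dict, list)):
        return "json", data
    return "html", source
```

In a real crawler the response's Content-Type header is usually a more reliable signal than sniffing the body; the body check here just keeps the sketch free of any HTTP dependency.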

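from lxml import html
import json


class GetNodeList:
    def __init__(self):
        # XPath selecting the target <div> nodes; adjust it to the page being parsed
        self.getdivxpath = "//div[@class='demo']"

    def use_xpath(self, source):
        """Return the nodes matching self.getdivxpath, or None if there are none."""
        if not source:
            return None
        root = html.fromstring(source)  # parse the HTML string into an element tree
        nodelist = root.xpath(self.getdivxpath)  # evaluate the XPath against the tree
        return nodelist or None

    def use_json(self, source, keyname):
        """Return the value stored under keyname, or None if it is missing or empty."""
        if not source:
            return None
        data = json.loads(source)  # decode the JSON text (assumed to be an object)
        value = data.get(keyname)  # change keyname to match the actual response
        # value may be None or a number, so do not call len() on it
        if value in (None, "", [], {}):
            return None
        return value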
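A short usage sketch of the same lxml and json calls used in the class above. The sample HTML and JSON strings are made up for illustration, and the variable names are assumptions rather than anything from the original post.

```python
import json
from lxml import html

# Made-up inputs for illustration only
sample_html = (
    "<html><body>"
    "<div class='demo'>first</div>"
    "<div class='other'>skip</div>"
    "<div class='demo'>second</div>"
    "</body></html>"
)
sample_json = '{"title": "demo page", "tags": ["crawler", "python"]}'

# HTML branch: build the element tree, then select nodes with XPath
root = html.fromstring(sample_html)
nodes = root.xpath("//div[@class='demo']")
texts = [node.text_content() for node in nodes]

# JSON branch: decode the text, then read values by key
data = json.loads(sample_json)
tags = data.get("tags")
missing = data.get("author")  # absent key, so dict.get returns None
```

Note that `root.xpath` returns a list of element objects rather than strings, so the text has to be read from each node, here with `text_content()`.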

Reposted from: https://www.cnblogs.com/c-x-a/p/9175124.html