Python 24: lxml parsing + multithreading
1. lxml parsing
1.1 Basic XPath concepts
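1) Tree: the whole HTML (or XML) document is structured as a tree.
2) Node: every element (tag) in the tree is a node.
3) Root node (root element): the outermost tag (element) of the HTML or XML document.
4) Node content: the text content of a tag.
5) Node attributes: the attributes of a tag.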
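The five concepts above map directly onto objects you can inspect in code. The sketch below is a minimal illustration, assuming lxml is installed; if it is not, it falls back to the standard library's `xml.etree.ElementTree`, whose basic API `lxml.etree` mirrors. The sample string `XML` and the helper `describe` are hypothetical, modelled on the supermarket example in this section. It sticks to `findall()` paths (the ElementPath subset, which both libraries support); lxml elements additionally provide an `xpath()` method for full XPath 1.0 expressions.

```python
# Prefer lxml; fall back to the standard library, whose basic API lxml.etree mirrors
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample document, modelled on the supermarket example in this section
XML = """
<supermarket name="永辉超市" address="肖家河大厦">
    <staffs>
        <staff id="s001" class="c1"><name>小明</name><salary>4000</salary></staff>
        <staff id="s002" class="c2"><name>小花</name><salary>3500</salary></staff>
    </staffs>
</supermarket>
"""


def describe(xml_text):
    """Return the root tag, the root's attributes and a summary of each staff node."""
    root = etree.fromstring(xml_text.strip())       # root node: the outermost element
    staff = [
        {
            "id": node.get("id"),                   # node attribute
            "class": node.get("class"),
            "name": node.findtext("name"),          # node content (text of <name>)
        }
        for node in root.findall("./staffs/staff")  # child nodes, located by a path
    ]
    return root.tag, dict(root.attrib), staff


if __name__ == "__main__":
    tag, attrs, staff = describe(XML)
    print(tag, attrs)
    for s in staff:
        print(s)
    # A path with an attribute predicate, XPath-style: staff nodes whose class is "c1"
    root = etree.fromstring(XML.strip())
    print([node.get("id") for node in root.findall(".//staff[@class='c1']")])
```

Here `root` is the root node, each `<staff>` element is a node in the tree, `node.get(...)` reads a node attribute and `findtext("name")` reads node content.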
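1.2 The XML data format
- Like JSON, XML is a general-purpose data format supported by the vast majority of programming languages.
- XML stores data in the content and the attributes of its tags (elements).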
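To make "content vs. attributes" concrete, here is a minimal round-trip sketch that stores one staff record as XML, putting `id` in an attribute and the other fields in child-element content (the same layout the XML example below uses for `<staff>`). The helper names `staff_to_xml` and `xml_to_staff` are hypothetical; as before, it assumes lxml but falls back to `xml.etree.ElementTree`.

```python
# Prefer lxml; fall back to the standard library, whose basic API lxml.etree mirrors
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def staff_to_xml(record):
    """Encode a staff dict: 'id' becomes an attribute, the other fields child elements."""
    elem = etree.Element("staff", id=record["id"])   # data held in an attribute
    for key in ("name", "position", "salary"):
        child = etree.SubElement(elem, key)
        child.text = str(record[key])                # data held as tag content (always text)
    return etree.tostring(elem, encoding="unicode")


def xml_to_staff(xml_text):
    """Decode a <staff> element back into a dict; numeric fields need explicit conversion."""
    elem = etree.fromstring(xml_text)
    return {
        "id": elem.get("id"),
        "name": elem.findtext("name"),
        "position": elem.findtext("position"),
        "salary": int(elem.findtext("salary")),       # "4000" -> 4000
    }


if __name__ == "__main__":
    record = {"name": "小明", "id": "s001", "position": "cashier", "salary": 4000}
    xml_text = staff_to_xml(record)
    print(xml_text)
    print(xml_to_staff(xml_text) == record)
```

Which fields go into attributes and which into content is a design convention, not a rule. One practical difference from JSON shows up in `xml_to_staff`: without a schema, everything in XML is text, so numbers such as `salary` must be converted back by hand.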
Example: storing information about a supermarket. As JSON:
{
    "name": "永辉超市",
    "address": "肖家河大厦",
    "staff": [
        {"name": "小明", "id": "s001", "position": "cashier", "salary": 4000},
        {"name": "小亮", "id": "s002", "position": "cleaner", "salary": 3000},
        {"name": "小红", "id": "s003", "position": "inspector", "salary": 4000},
        {"name": "小蓝", "id": "s004", "position": "salesperson", "salary": 6000}
    ],
    "goodlist": [
        {"name": "instant noodles", "price": 3.5, "count": 120, "discount": 0.9},
        {"name": "mineral water", "price": 1, "count": 500, "discount": 0.98},
        {"name": "ham sausage", "price": 2, "count": 300, "discount": 0.8},
        {"name": "桃李 bread", "price": 3, "count": 180, "discount": 0.95}
    ]
}
<supermarket name="永辉超市" address="肖家河大厦"> <staffs> <staff id="s001" class="c1"> <name>小明</name> <position>收营员</position> <salary>4000</salary> </staff> <staff id="s002" class="c2"> <name>小花</name> <position>促销员</position> <salary>3500</salary> </staff> <staff id="s003" class="c1"> <name>