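Absolute path: /step/step/... (each step is walked in order from the document root)
Relative path: //step/step/... (the leading // matches the first step at any depth in the document)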
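To make the difference between the two path forms concrete, here is a minimal sketch using lxml. The small HTML string and the variable names are made up for illustration and do not come from the original notes.

```python
from lxml import etree

# Hypothetical document: one <p> nested inside a <div>, one <p> directly under <body>
doc = etree.HTML('<html><body><div><p>a</p></div><p>b</p></body></html>')

# Path written out step by step from the root: only the <p> that is a direct child of <body>
absolute = doc.xpath('/html/body/p/text()')

# Path starting with //: every <p>, no matter how deeply it is nested
anywhere = doc.xpath('//p/text()')

print(absolute)  # ['b']
print(anywhere)  # ['a', 'b']
```

The // form is usually more convenient for scraping because it keeps working when the page layout around the target node changes.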
<div class="article-type article-type-yc" data-v-6fe2b6a7="">原创</div>
Extract the element's text: div/text() --------> 原创
Extract an attribute value: div/@class --------> article-type article-type-yc
Extract the target element:
//div[@class="article-type article-type-yc"]
-------->
<div class="article-type article-type-yc" data-v-6fe2b6a7="">原创</div>
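The three kinds of extraction above can be tried on the sample element directly. This is a sketch that assumes lxml is installed; the variable names (`target`, `text`, `cls`, `element`) are chosen here for illustration.

```python
from lxml import etree

# The sample element from the notes, parsed as an HTML fragment
html = '<div class="article-type article-type-yc" data-v-6fe2b6a7="">原创</div>'
tree = etree.HTML(html)

target = '//div[@class="article-type article-type-yc"]'

text = tree.xpath(target + '/text()')   # text nodes inside the element
cls = tree.xpath(target + '/@class')    # value of the class attribute
element = tree.xpath(target)[0]         # the element object itself

print(text)         # ['原创']
print(cls)          # ['article-type article-type-yc']
print(element.tag)  # div
```

Note that xpath() returns a list even when only one node matches, so a single result is usually taken with [0].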
Usage:
1. requests
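from lxml import etree
import requests

url = " "  # target page URL (left blank in the notes)
html = requests.get(url).text  # download the page source as text
txt = etree.HTML(html)  # parse the source into an element tree that supports .xpath()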
2. scrapy
import scrapy

class CsdnSpider(scrapy.Spider):
    name = 'csdn'
    allowed_domains = ['']
    start_urls = []
    # Build the URLs for pages 1 and 2
    for i in range(1, 3):
        url = 'http://......page{}'.format(i)
        start_urls.append(url)

    def parse(self, response):
        # Select the target nodes from the downloaded page
        res = response.xpath('//div[@class=" "]')