Edit: Sorry, I'm using lxml here, but the same approach works with Scrapy's own selectors.
For the specific HTML you provided, this will work:
>>> s = """ label1
... value1
... label2
... value2
... """
>>>
>>> import lxml.html
>>> soup = lxml.html.fromstring(s)
>>> soup.xpath("//text()")
[' label1 ', '\nvalue1 ', ' label2 ', '\nvalue2 ']
>>> res = soup.xpath("//text()")
>>> for i in range(0, len(res), 2):
...     print(res[i:i+2])
...
[' label1 ', '\nvalue1 ']
[' label2 ', '\nvalue2 ']
>>>
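The pairing step above can also be sketched as a small script. This is only an illustration: the markup in `HTML` is an assumption about what the input looks like (labels in `<b>` tags, values as the text that follows), and `pair_text_nodes` is a hypothetical helper name. The `normalize-space()` predicate is added here to skip whitespace-only text nodes, which would otherwise throw off the even/odd pairing.

```python
import lxml.html

# Assumed sample markup: each label sits in a <b> tag and its value is
# the text right after it. The real page may be structured differently.
HTML = """<div>
<b> label1 </b>
value1 <br>
<b> label2 </b>
value2 <br>
</div>"""


def pair_text_nodes(html):
    """Pair consecutive non-blank text nodes as (label, value) tuples."""
    root = lxml.html.fromstring(html)
    # normalize-space() drops text nodes that contain only whitespace.
    texts = [t.strip() for t in root.xpath("//text()[normalize-space()]")]
    # Even positions are labels, odd positions are values.
    return list(zip(texts[0::2], texts[1::2]))


pairs = pair_text_nodes(HTML)
print(pairs)
```

Pairing by position is fragile: it only holds as long as every label is followed by exactly one non-blank text node.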
Edit 2:
>>> # etree is the parsed document, e.g. the result of lxml.html.fromstring(...)
>>> bs = etree.xpath("//text()[preceding-sibling::b/text()]")
>>> for b in bs:
...     if b.getparent().tag == "b":
...         print([b.getparent().text, b])
...
[' label1 ', '\nvalue1 ']
[' label2 ', '\nvalue2 ']
[' label3 ', '\nvalue3 ']
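As a rough, self-contained version of the second approach: lxml returns text nodes as "smart strings" that expose `getparent()` and `is_tail`, so a value can be matched to the `<b>` element it directly follows. The sample markup and the name `label_value_pairs` are assumptions made for this sketch, not taken from the question.

```python
import lxml.html

# Assumed sample markup with three label/value pairs.
HTML = """<div>
<b> label1 </b>
value1 <br>
<b> label2 </b>
value2 <br>
<b> label3 </b>
value3 <br>
</div>"""


def label_value_pairs(html):
    """Return (label, value) pairs for text that directly follows a <b>."""
    root = lxml.html.fromstring(html)
    pairs = []
    for node in root.xpath("//text()[preceding-sibling::b]"):
        parent = node.getparent()
        # For tail text, getparent() is the element the text follows.
        # Text after a <br> has that <br> as its parent and is skipped.
        if node.is_tail and parent.tag == "b":
            pairs.append((parent.text.strip(), node.strip()))
    return pairs


result = label_value_pairs(HTML)
print(result)
```

Unlike pairing by position, this keeps working when stray text nodes appear between the pairs, because each value is tied to its own `<b>` element.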
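Also, for what it's worth, if you're looping over selected elements, you want to use "./foo" in the XPath inside the for loop, not "/foo". A path that starts with "/" is evaluated from the document root rather than relative to the current element.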
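A small sketch of that difference, using made-up markup (`row` and the `<p>`/`<b>` layout are assumptions for illustration). It uses "//b" rather than "/foo", but the principle is the same: a leading slash ignores the element the query is called on.

```python
import lxml.html

# Hypothetical markup: two rows, each with one <b> child.
HTML = """<div>
<p class="row"><b>a</b></p>
<p class="row"><b>b</b></p>
</div>"""

root = lxml.html.fromstring(HTML)
rows = root.xpath("//p[@class='row']")

# Relative path: only looks inside the current row.
relative = [row.xpath("./b/text()") for row in rows]

# Absolute path: restarts from the document root on every iteration,
# so each row returns the <b> text of all rows.
absolute = [row.xpath("//b/text()") for row in rows]

print(relative)
print(absolute)
```

With the absolute path every iteration returns the same full result, which is usually the symptom that shows up as "duplicated" values in scraped items.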