There are at least two key things you can rely on: the container box with id="lclbox" and the elements with class="intrlu" that correspond to each result item.

How you extract the address and phone number from each result item may vary. Here is one option (definitely not pretty) that locates the phone number by checking the text of every span element against a regex:

    import re

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    driver.get('https://www.google.com/?gws_rd=ssl#q=plumbers%2Bhouston%2Btx')

    # wait for the local results box to load
    wait = WebDriverWait(driver, 10)
    box = wait.until(EC.visibility_of_element_located((By.ID, "lclbox")))

    # matches phone numbers formatted like "(123) 456-7890"
    phone_re = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

    for result in box.find_elements(By.CLASS_NAME, "intrlu"):
        for span in result.find_elements(By.TAG_NAME, "span"):
            if phone_re.search(span.text):
                # the element two levels up holds the address and phone number
                parent = span.find_element(By.XPATH, "../..")
                print(parent.text)
                break
        # separator between result items
        print(" -")
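I'm fairly sure this can be improved, but hopefully it gives you a starting point. For each result item, it prints the text of the element two levels above the span that contains the phone number, followed by a separator line.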
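If you want to check the phone-matching step on its own, without starting a browser, something like the following sketch might help. It reuses the same regular expression as the code above; the `find_phone` helper and the sample strings are hypothetical, made up purely for illustration, and are not taken from the actual Google results page.

```python
import re

# Same pattern as above: matches "(ddd) ddd-dddd"
phone_re = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

# Hypothetical span texts, invented for illustration only
sample_spans = [
    "Example Plumbing Co.",
    "123 Main St, Houston, TX",
    "(713) 555-0100",
    "713-555-0100",  # different format, not matched by this pattern
]


def find_phone(texts):
    """Return the first phone number found in texts, or None."""
    for text in texts:
        match = phone_re.search(text)
        if match:
            return match.group(0)
    return None


print(find_phone(sample_spans))
```

Note that the pattern only recognizes one phone format, so a number written as 713-555-0100 would be skipped; if the page uses other formats, the regex would need to be loosened accordingly.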