Python crawler source code (selenium): Scraping Ajax and dynamic HTML content with Selenium + PhantomJS in a Python crawler

Latest recommended article published 2020-12-01 19:14:16

weixin_39720662



Tags: python, crawler source code, selenium

#!/usr/bin/env python3

from lxml import etree
from selenium import webdriver
import time

# JD.com mobile phone product page

url="http://item.jd.com/1312640.html"

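# The XSLT below was generated automatically with the graphical interface of GooSeeker's (集搜客) MS 谋数台 tool.
# NOTE: the XPath expressions of the generated stylesheet were lost when this page was extracted;
# only the output element names 商品 (product), 价格 (price) and 名称 (name) survive.
# The selectors below are hypothetical placeholders and must be checked against the live page.
xslt_root = etree.XML("""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<商品>
<价格><xsl:value-of select="//*[@id='summary-price']//strong/text()"/></价格>
<名称><xsl:value-of select="//*[@id='name']/h1/text()"/></名称>
</商品>
</xsl:template>
</xsl:stylesheet>""")

# Use webdriver.PhantomJS, a headless browser that executes the page's JavaScript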

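# PhantomJS support was deprecated in later Selenium 3 releases and removed in Selenium 4,
# so this script needs an older Selenium release plus the PhantomJS binary.
browser = webdriver.PhantomJS(executable_path='C:\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe')
try:
    browser.get(url)
    # Crude fixed wait so the page's Ajax requests have time to finish
    time.sleep(3)

    transform = etree.XSLT(xslt_root)
    # Execute JavaScript to get the whole rendered DOM
    html = browser.execute_script("return document.documentElement.outerHTML")
    # Use the XSLT to extract the required fields from the DOM
    doc = etree.HTML(html)
    result_tree = transform(doc)
    print(result_tree)
finally:
    # Shut down the PhantomJS process even if a step above fails
    browser.quit()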
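The extraction step can be tried offline on two versions of the same page to show why the browser step matters. This is a minimal sketch under stated assumptions. The two HTML strings, the `price`/`name` ids, the price value and the `extract` helper are all hypothetical. The stylesheet only mimics the output element names (商品, 价格, 名称) of the article's XSLT.

```python
from lxml import etree

# Hypothetical stylesheet: same output element names as the article's XSLT
# (商品 = product, 价格 = price, 名称 = name); the selectors are illustrative only.
STYLESHEET = etree.XML("""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<商品>
<价格><xsl:value-of select="//*[@id='price']/text()"/></价格>
<名称><xsl:value-of select="//*[@id='name']/text()"/></名称>
</商品>
</xsl:template>
</xsl:stylesheet>""")
TRANSFORM = etree.XSLT(STYLESHEET)

# Hypothetical page as a plain HTTP client would receive it: the price
# element is still empty because the page's JavaScript has not run yet.
RAW_HTML = '<html><body><h1 id="name">Example Phone</h1><span id="price"></span></body></html>'

# The same hypothetical page after a browser has executed its scripts, i.e. what
# document.documentElement.outerHTML would return; the values are made up.
RENDERED_HTML = '<html><body><h1 id="name">Example Phone</h1><span id="price">999.00</span></body></html>'


def extract(html):
    """Apply the XSLT to an HTML string and return {output element name: text}."""
    result = TRANSFORM(etree.HTML(html))
    return {child.tag: (child.text or "").strip() for child in result.getroot()}


print(extract(RAW_HTML))       # the price field is empty before rendering
print(extract(RENDERED_HTML))  # the price field is filled after rendering
```

Only the input differs between the two calls. The XSLT step is identical in both. This mirrors the division of labour in the script above: PhantomJS runs the page's JavaScript, and lxml then reads the finished DOM.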
