Python web scraping of JavaScript-generated content

I am trying to use Python 3 to return the BibTeX citation generated by http://www.doi2bib.org/. The URLs are predictable, so the script can work out the URL without having to interact with the web page. I have tried using selenium, bs4, etc., but can't get the text inside the box.

import urllib.request
from bs4 import BeautifulSoup

url = "http://www.doi2bib.org/#/doi/10.1007/s00425-007-0544-9"

# This fetches only the static HTML; the citation is filled in later by JavaScript.
text = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")
print(text)

Can anyone suggest a way of returning the bibtex citation as a string (or whatever) in python?

Solution

You don't need BeautifulSoup here. The page sends an additional XHR request to the server to fill in the BibTeX citation. You can simulate that request yourself, for example with requests:

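import requests

bibtex_id = '10.1007/s00425-007-0544-9'
url = "http://www.doi2bib.org/#/doi/{id}".format(id=bibtex_id)
xhr_url = 'http://www.doi2bib.org/doi2bib'

with requests.Session() as session:
    # Visit the page first, then repeat the XHR request the page itself makes
    session.get(url)
    response = session.get(xhr_url, params={'id': bibtex_id})
    # .text is the decoded string; .content would print a bytes literal in Python 3
    print(response.text)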

Prints:

@article{Burgert_2007,
    doi = {10.1007/s00425-007-0544-9},
    url = {http://dx.doi.org/10.1007/s00425-007-0544-9},
    year = 2007,
    month = {jun},
    publisher = {Springer Science $\mathplus$ Business Media},
    volume = {226},
    number = {4},
    pages = {981--987},
    author = {Ingo Burgert and Michaela Eder and Notburga Gierlinger and Peter Fratzl},
    title = {Tensile and compressive stresses in tracheids are induced by swelling based on geometrical constraints of the wood cell},
    journal = {Planta}
}
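Since the question asks for the citation as a string, the request can be wrapped in a small function. This is only a sketch under some assumptions: `fetch_bibtex` and `XHR_URL` are names made up for illustration, the endpoint and its `id` parameter are the ones used in the snippet above and may have changed since, and it skips the initial page visit on the assumption that the endpoint answers without it.

```python
XHR_URL = "http://www.doi2bib.org/doi2bib"  # endpoint from the answer; may have changed


def fetch_bibtex(doi, session=None, timeout=10):
    """Return the BibTeX entry for `doi` as a string (hypothetical helper)."""
    own_session = session is None
    if own_session:
        # Imported lazily so a caller-supplied session object can be used instead
        import requests
        session = requests.Session()
    try:
        response = session.get(XHR_URL, params={"id": doi}, timeout=timeout)
        # Raise an exception on 4xx/5xx instead of returning an error page
        response.raise_for_status()
        return response.text.strip()
    finally:
        if own_session:
            session.close()
```

Calling `fetch_bibtex('10.1007/s00425-007-0544-9')` would then give the entry as a `str`. Passing in an existing `requests.Session` lets several lookups reuse one connection.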

You can also solve it with selenium. The key trick here is to use an Explicit Wait to wait for the citation to become visible:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
try:
    driver.get('http://www.doi2bib.org/#/doi/10.1007/s00425-007-0544-9')
    # Explicit wait: block for up to 10 seconds until the citation box is visible
    element = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.XPATH, '//pre[@ng-show="bib"]'))
    )
    print(element.text)
finally:
    # quit() ends the browser session even if the wait times out
    driver.quit()

Prints the same as the above solution.
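Either approach yields the citation as one BibTeX string. If individual fields are needed instead, a rough sketch like the following can split them out. `parse_bibtex_fields` is a hypothetical helper, not part of any library, and it assumes the one-field-per-line layout shown in the output above; for general BibTeX a dedicated parser would be the safer choice.

```python
import re

# Matches lines such as "doi = {10.1007/...}," or "year = 2007,"
FIELD_RE = re.compile(r'^\s*(\w+)\s*=\s*(.+?),?\s*$')


def parse_bibtex_fields(entry):
    """Return a dict of field name -> value for a one-field-per-line entry."""
    fields = {}
    for line in entry.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # skips the "@article{...," header and the closing "}"
        key, value = match.groups()
        # Strip one layer of surrounding braces or double quotes, if present
        if len(value) >= 2 and value[0] + value[-1] in ('{}', '""'):
            value = value[1:-1]
        fields[key] = value
    return fields
```

All values come back as strings, so a field such as `year` would need an explicit `int(...)` if a number is wanted.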
