I wanted to do get some experience with html crawling, so I wanted to see if I could grab some values of the following site: http://www.iex.nl/Aandeel-Koers/11890/Royal-Imtech/koers.aspx
This site shows the price of imtech shares.
If you take a look at the site, you see there is 1 number shown in bold, this is the price of the share.
As you may have seen, this price changes, and that's okay. I only want the value at the time I run my script at this point in time.
but if you reload the page, you may notice how it first shows "laatste koers" and after a delay of 1 second it shows "realtime"
As you may have figured out by now, I'm interested in the "realtime" value.
Here is my question, how do I get this value, I've tried time.sleep(2) on different places. I've tried a timeout at the request. Both didn't work.
How can I fix this?
from lxml import html
import requests
pagina = 'http://www.iex.nl/Aandeel-Koers/11890/Royal-Imtech/koers.aspx'
page = requests.get(pagina)
tree = html.fromstring(page.text)
koers = tree.xpath('//span[@class="RealtimeLabel"]/text()')
prices = tree.xpath('//span[@id="ctl00_ctl00_Content_LeftContent_PriceDetails_lblLastPrice"]/text()')
print koers[0], pagina.split("/")[5], prices[0]
I get output like this
Laatste koers Royal-Imtech 0,093
While I want output like this
Realtime Royal-Imtech 0,093
解决方案
I would suggest use a wait until the element changes.
Find the block of code below to help you.
def wait_while(condition, timeout, delta=1):
"""
@condition: lambda function which checks if the text contains "REALTIME"
@timeout: Max waiting time
@delta: time after which another check has to be made
"""
max_time = time.time() + timeout
while max_time > time.time():
if condition():
return True
time.sleep(delta)
return False