I am working on a project that needs a bit of automation and web scraping, for which I am using Selenium and BeautifulSoup (Python 2.7).
I want to open only one instance of a web browser, log in to a website and keep that session. Within that session, I am trying to open new tabs that are controlled independently by threads, with each thread controlling one tab and performing its own task. How should I do it? Example code would be nice. Here's my code:
import threading
from selenium import webdriver

def threadFunc(driver, tabId):
    if tabId == 1:
        # open a new tab and do something in it
        pass
    elif tabId == 2:
        # open another new tab with some different link and perform some task
        pass
    # .... other cases

class tabThreads(threading.Thread):
    def __init__(self, driver, tabId):
        threading.Thread.__init__(self)
        self.tabID = tabId
        self.driver = driver

    def run(self):
        print "Executing tab ", self.tabID
        threadFunc(self.driver, self.tabID)

def func():
    # Create the main window
    driver = webdriver.Firefox()
    driver.get("...someLink...")
    # This is the part where I am stuck: whether to create threads and send
    # them the same WebDriver, to stick with the current session, by using the
    # JavaScript call "window.open('')", or to use a separate driver for each
    # tab to operate on individual pages, but that would open a new browser
    # instance every time a driver is created.
    thread1 = tabThreads(driver, 1)
    thread2 = tabThreads(driver, 2)
    # ...... other threads
I am open to suggestions for using other modules, if needed.
Solution
My understanding is that Selenium drivers are not thread-safe. In the WebDriver spec, the Thread Safety section is empty, which I take to mean that the topic has not been addressed at all: https://www.w3.org/TR/2012/WD-webdriver-20120710/#thread-safety
So while you could share the driver reference among multiple threads and call the driver from each of them, there is no guarantee that the driver will handle multiple concurrent calls correctly.
Instead, you should either synchronize the calls from multiple threads so that each one completes before the next starts, or have just one thread make Selenium API calls, potentially handling commands from a queue that is filled by multiple other threads.
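A minimal sketch of the first option (synchronizing the calls), assuming Selenium's Python bindings with `driver.window_handles`, `driver.switch_to.window()` and `driver.execute_script()`. With tabs there is an extra catch: the driver has a single current window, so a thread's switch to its tab and the commands that follow must run as one unit under the lock; locking each call separately would let another thread switch tabs in between. `LockedDriver` and `tab_task` are hypothetical names, not part of Selenium.

```python
import threading


class LockedDriver(object):
    """Wraps one shared WebDriver so only one thread uses it at a time.

    Every method holds the same lock for its whole sequence of driver calls,
    so a thread's switch to its tab and the commands that follow cannot be
    interleaved with another thread's.
    """

    def __init__(self, driver):
        self._driver = driver
        self._lock = threading.Lock()

    def open_tab(self):
        """Open a blank tab with JavaScript and return its window handle."""
        with self._lock:
            before = set(self._driver.window_handles)
            self._driver.execute_script("window.open('');")
            return [h for h in self._driver.window_handles if h not in before][0]

    def run_in_tab(self, handle, action):
        """Switch to `handle` and return action(driver), all under the lock."""
        with self._lock:
            self._driver.switch_to.window(handle)
            return action(self._driver)


def tab_task(shared, url):
    """Hypothetical per-thread task: own tab, load url, return the title."""
    handle = shared.open_tab()
    shared.run_in_tab(handle, lambda d: d.get(url))
    return shared.run_in_tab(handle, lambda d: d.title)
```

Each thread would be started with something like `threading.Thread(target=tab_task, args=(shared, url))`, where `shared = LockedDriver(driver)` wraps the logged-in driver. Because the lock is held while `d.get(url)` loads a page, the other threads wait in the meantime: the tabs share the login session, but the browser work is serialized rather than parallel.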
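A sketch of the second option: one worker thread owns the driver and is the only thread that ever calls it, while the task threads (one per tab) send it jobs through a queue and wait for the results. It assumes Selenium's Python bindings (`window_handles`, `switch_to.window()`, `execute_script()`, `title`); `DriverWorker`, `open_tab`, `submit` and the placeholder URLs are hypothetical, and the `queue`/`Queue` import is there so the same file runs on Python 3 and on the asker's Python 2.7. Newer Selenium 4 releases also provide `driver.switch_to.new_window('tab')`, which could replace the `window.open('')` trick.

```python
from __future__ import print_function

import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2.7


def open_tab(driver):
    """Open a blank tab with JavaScript and return its new window handle."""
    before = set(driver.window_handles)
    driver.execute_script("window.open('');")
    return [h for h in driver.window_handles if h not in before][0]


class DriverWorker(threading.Thread):
    """The only thread that ever calls into the WebDriver.

    Other threads hand it (tab handle, action) jobs through a queue; the
    worker switches to that tab, runs action(driver) and sends the result
    (or the exception) back on a one-slot reply queue.
    """

    def __init__(self, driver):
        threading.Thread.__init__(self)
        self.daemon = True
        self.driver = driver
        self.jobs = queue.Queue()

    def run(self):
        while True:
            job = self.jobs.get()
            if job is None:  # sentinel posted by stop()
                break
            handle, action, reply = job
            try:
                if handle is not None:
                    self.driver.switch_to.window(handle)
                reply.put((True, action(self.driver)))
            except Exception as exc:
                reply.put((False, exc))

    def submit(self, handle, action):
        """Run action(driver) in tab `handle` (None = current tab); blocks.

        Call this from task threads only: calling it from inside an action
        would deadlock, because the worker would be waiting on itself.
        """
        reply = queue.Queue(maxsize=1)
        self.jobs.put((handle, action, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def stop(self):
        self.jobs.put(None)
        self.join()


if __name__ == "__main__":
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get("...someLink...")  # log in here; new tabs share its cookies
    worker = DriverWorker(driver)
    worker.start()

    def task(url):
        handle = worker.submit(None, open_tab)  # each task gets its own tab
        worker.submit(handle, lambda d: d.get(url))
        print(url, worker.submit(handle, lambda d: d.title))

    # Hypothetical placeholder URLs: replace with the pages each tab needs.
    urls = ["...link1...", "...link2..."]
    threads = [threading.Thread(target=task, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    worker.stop()
    driver.quit()
```

The task threads can do non-Selenium work (for example parsing `driver.page_source` with BeautifulSoup) in parallel, but the browser itself still executes one command at a time. If the page loads themselves need to run in parallel, separate driver instances (one browser each) are the usual route, at the cost of logging in to each one, possibly by copying the session cookies with `get_cookies()` and `add_cookie()`.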