RSelenium Basics
> RSelenium Headless Browsing
> RSelenium Vignette
Code example
# We want to make this as easy as possible to use
# So we need to install required packages for the user...
#
if (!require(RSelenium)) install.packages("RSelenium")
if (!require(XML)) install.packages("XML")
if (!require(RJSONIO)) install.packages("RJSONIO")
if (!require(stringr)) install.packages("stringr")
# Data
#
mainPage
businessPage
# StartServer
# We assume RSelenium is not set up, so we check whether the RSelenium
# server is available; if not, we install the RSelenium server.
checkForServer()
# OK. now we start the server
RSelenium::startServer()
remDr
# We assume the user has installed Firefox and the Selenium IDE
# https://addons.mozilla.org/en-US/firefox/addon/selenium-ide/
#
# OK, we open Firefox
remDr$open(silent = TRUE) # Open up a Firefox window...
# Now we open the browser and required URL...
# This is the page that matters...
remDr$navigate(businessPage)
# First things first: on the first page, let's get the ids for the radio_button,
# name element, and button. We need all three.
#
radioButton
nameElement
searchButton
# Optional: we can highlight the radio elements returned
# lapply(radioButton,function(x){x$highlightElement()})
# Optional: we can highlight the nameElement returned
# lapply(nameElement,function(x){x$highlightElement()})
# Optional: we can highlight the searchButton returned
# lapply(searchButton,function(x){x$highlightElement()})
# Now we can select and press the third radio button
radioButton[[3]]$clickElement()
# We fill in the required name...
nameElement[[1]]$sendKeysToElement(list("PROAÑO & ASOCIADOS CIA. LTDA."))
# This is subtle but required: the page triggers a drop-down list, so rather than
# hitting the searchButton, we first select an entry in the drop-down menu...
selectElement
selectElement[[1]]$clickElement()
# OK, now we can click the search button, which will cause the next page to open
searchButton[[1]]$clickElement()
# New Page opens...
#
# OK, so now we first pull the list of buttons...
finPageButton
# Now we can press the required button to open the page we want to get to...
finPageButton[[9]]$clickElement()
# We are now on the required page.
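We are now on the target page [see image]
Extracting the table values…
The next step is to extract the table values. To do this, we pull the data matching the .z-listitem CSS selector. We can then check that we are actually seeing rows of data, so that we can extract the returned values and populate a list or data frame.
# OK, now we need to extract the table: we identify and pull out the
# '.z-listitem' elements and assign them to modalWindow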
modalWindow
# Now we can extract the lines from modalWindow. Note that each row is
# returned as a single line of text, so we split it into three based on the
# line marker "\n"
lineText
lineText
Here, the result is:
> lineText
> lineText
[[1]]
[1] "10"
[2] "OPERACIONES DE INGRESO CON PARTES RELACIONADAS EN PARAÍSOS FISCALES,JURISDICCIONES DE MENOR IMPOSICIÓN Y REGÍMENES FISCALES PREFERENTES"
[3] "0.00"
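As a minimal sketch of the split-and-collect step described above, the following base R code splits each row's text on the "\n" marker and binds the three pieces into a data frame. The first sample string is built from the result shown above; the second row, the column names, and the helper name rows_to_df are made up for illustration and are not part of the original code.

```r
# Hypothetical helper: turn row texts (three fields separated by "\n")
# into a data frame. The column names are assumptions, not taken from the page.
rows_to_df <- function(row_text) {
  parts <- strsplit(row_text, "\n", fixed = TRUE)
  # Keep only the rows that split into exactly three fields
  parts <- parts[lengths(parts) == 3]
  if (length(parts) == 0) {
    return(data.frame(code = character(0), description = character(0),
                      value = numeric(0), stringsAsFactors = FALSE))
  }
  df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  names(df) <- c("code", "description", "value")
  df$value <- as.numeric(df$value)
  df
}

sample_rows <- c(
  "10\nOPERACIONES DE INGRESO CON PARTES RELACIONADAS EN PARAÍSOS FISCALES,JURISDICCIONES DE MENOR IMPOSICIÓN Y REGÍMENES FISCALES PREFERENTES\n0.00",
  "20\nEXAMPLE ROW (made up for illustration)\n1.50"
)
result <- rows_to_df(sample_rows)
```

With real data, the input would be the text returned for each '.z-listitem' element rather than sample_rows.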
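Handling hidden data
Selenium WebDriver and RSelenium only interact with the visible elements of a web page. If we try to read the whole table, only the visible (non-hidden) table items are returned.
We can work around this by scrolling to the bottom of the table. The scroll forces the table to populate, after which we can extract the complete table.
# Select the .z-listBox-body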
modalWindow
# Now we tell the window we want to scroll to the bottom of the table
# This triggers the table to populate all the rows
modalWindow[[1]]$executeScript("window.scrollTo(0,document.body.scrollHeight)")
# Now we can extract the complete table
modalWindow
lineText
lineText
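The scroll-then-read sequence above could be wrapped in a small function, roughly as sketched below. The function name scroll_and_read, the default selector, and the pause argument are my own assumptions; it only relies on the object passed in behaving like an RSelenium remote driver, i.e. offering executeScript() and findElements(), with the returned elements offering getElementText().

```r
# Hypothetical helper: scroll to the bottom, then re-read the rows.
# `driver` is assumed to behave like an RSelenium remoteDriver.
scroll_and_read <- function(driver, css = ".z-listitem", pause = 1) {
  # Same scroll script as used above
  driver$executeScript("window.scrollTo(0,document.body.scrollHeight)")
  # Give the page a moment to populate the previously hidden rows
  Sys.sleep(pause)
  rows <- driver$findElements(using = "css selector", value = css)
  # getElementText() returns a list; take its first element as the row text
  vapply(rows, function(x) x$getElementText()[[1]], character(1))
}

# Usage with a live session (not run here):
# lineText <- scroll_and_read(remDr)
```

The pause is a guess; how long the table needs to fill in will depend on the page.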
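What the code does
The code example above is self-contained. By that I mean it should install everything you need, including the required packages. Once the dependent R packages are installed, the R code calls checkForServer(), which installs the Selenium server if it is not already installed. This may take a while.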
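The install-if-missing step can also be written as a pair of small helpers, sketched below. The names missing_packages and ensure_packages are hypothetical and not part of RSelenium; the sketch uses requireNamespace() to test for a package without attaching it, so you would still call library() afterwards.

```r
# Hypothetical helpers, not part of RSelenium.

# Return the names of the packages that are not currently installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Install whatever is missing and return (invisibly) what was installed.
ensure_packages <- function(pkgs) {
  need <- missing_packages(pkgs)
  if (length(need) > 0) install.packages(need)
  invisible(need)
}

# Usage (not run here, since it may download packages):
# ensure_packages(c("RSelenium", "XML", "RJSONIO", "stringr"))
```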
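My advice is to step through the code line by line, because I have not included any delays (which you would want in production). Note also that I have not optimized for speed, but rather for a bit of clarity [from my point of view]…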
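For the delays mentioned above, one possible approach is a small polling helper that waits until a condition holds or a timeout expires. This is a sketch only: the name wait_until and its default timings are assumptions of mine, and it is plain base R rather than anything provided by RSelenium.

```r
# Hypothetical helper: poll `condition` (a function returning TRUE/FALSE)
# until it succeeds or `timeout` seconds have passed.
wait_until <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat errors (e.g. element not found yet) as "not ready"
    ok <- tryCatch(isTRUE(condition()), error = function(e) FALSE)
    if (ok) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Usage with a live session (not run here): wait for the table rows to appear
# wait_until(function() length(remDr$findElements("css selector", ".z-listitem")) > 0)
```

Calling something like this between the click and the next lookup is less brittle than a fixed Sys.sleep(), though a fixed sleep is fine while stepping through by hand.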
The code has been shown to work on:
> Mac OS X 10.11.5
> RStudio 0.99.893
> R version 3.2.4 (2016-03-10) -- "Very Secure Dishes"