问题
I'm trying to pull information from an html table. I want to add each element with in the table to a collection. This is what I have so far.
Dim htmlTable As Object
Dim coll2 As Collection
Set coll2 = New Collection
Set IE = New InternetExplorerMedium
With IE
'.AddressBar = False
'.MenuBar = False
.Navigate ("PASSWORDED SITE")
.Visible = True
End With
Set htmlTable = IE.Document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
Set tableItem = IE.Document.getElementsByTagName("td")
With coll2
For Each tableItem In htmlTable.innerHTML
.Add tableItem
Next
End With
I have a problem with this line For Each tableItem In htmlTable.innerText I tried diffent variations of htmlTable.innerText each throwing differant errors.
This is the HTML Extract for the table.
Delete | Download | Extract Date | User Id Owner |
---|---|---|---|
Delete | Work Order Inquiry - Work Order | 06/20/2017 07:50:37 | MBMAYO |
Delete | Work Order Inquiry - Work Order | 06/20/2017 07:39:29 | MBMAYO |
Delete | Work Order Inquiry - Work Order | 06/20/2017 07:23:54 | MBMAYO |
Delete | Work Order Inquiry - Work Order | 06/20/2017 07:16:16 | MBMAYO |
Delete | Work Order Inquiry - Work Order | 06/20/2017 07:14:06 | MBMAYO |
The goal is to store each
as an item for a collection and then retrieve the date for example 06/20/2017 07:50:37 from it. This table Grows so I think an array is out of the question?Edit from comment:
I have been trying call this function, I'm getting a object does not support this method error:
Public Function htmlCell(id As String) As String
htmlCell = IE.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
.getElementsByTagName("td")(id).innerHTML
End Function
回答1:
What you probably need is something like this. HTH
Dim htmlTable As MSHTML.htmlTable
Dim htmlTableCells As MSHTML.IHTMLElementCollection
Dim htmlTableCell As MSHTML.htmlTableCell
Dim htmlAnchor As MSHTML.HTMLAnchorElement
Set htmlTable = ie.document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
Set htmlTableCells = htmlTable.getElementsByTagName("td")
With coll2
For Each htmlTableCell In htmlTableCells
If VBA.TypeName(htmlTableCell.FirstChild) = "HTMLAnchorElement" Then
Set htmlAnchor = htmlTableCell.FirstChild
.Add htmlAnchor.innerHTML
Else
.Add htmlTableCell.innerHTML
End If
Next
End With
Result
Dim el
For Each el In coll2
Debug.Print el
Next el
Output:
Delete
Work Order Inquiry - Work Order
06/20/2017 07:50:37
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:39:29
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:23:54
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:16:16
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:14:06
MBMAYO
回答2:
I would try something like this:
Dim htmlTable As Object
Dim collTD As Collection
Dim oNode as Object
' Set IE ...
Set htmlTable = IE.Document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
' You only want the td's inside htmlTable !
Set collTD = htmlTable.getElementsByTagName("td")
For Each oNode In collTD
Debug.Print oNode.InnerHTML
' Stop -> use Watch window to drill down into oNode subitems
Next oNode
and go from there.
回答3:
I think it should be something like this.
Sub Scrape_HTML()
Set ie = CreateObject("InternetExplorer.application")
With ie
.Visible = True
.navigate "your_URL_here"
' Wait for the page to fully load; you can't do anything if the page is not fully loaded
Do While .Busy Or _
.readyState <> 4
DoEvents
Loop
Set Links = ie.document.getElementsByTagName("tr")
RowCount = 1
' Scrape out the innertext of each 'tr' element.
With Sheets("DataSheet")
For Each lnk In Links
.Range("A" & RowCount) = lnk.innerText
RowCount = RowCount + 1
Next
End With
End Sub
来源:https://stackoverflow.com/questions/44653360/vba-scraping-information-from-html-table