Scraping Data from Web Pages

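This is based on the article at http://blog.csdn.net/hsn1982/archive/2003/04/01/9939.aspx, which was written in VB6. I modified it slightly for .NET 2003, where the AxWebBrowser COM component is enough to do the job. The code is as follows (VB.NET):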

        ' Requires "Imports System.IO" and "Imports mshtml" at the top of the file.
        Dim Table1 As HTMLTable, Tables As IHTMLElementCollection
        Dim Row As HTMLTableRow, Cell As HTMLTableCell
        Dim i As Integer
        Dim Line As String, Text1 As String = ""
        Dim fi As New FileInfo("C:/temp.txt")
        Dim sw As StreamWriter
        TextBox1.Text = AxWebBrowser1.LocationURL
        Tables = AxWebBrowser1.Document.getElementsByTagName("Table")
        For Each Table1 In Tables
            If Microsoft.VisualBasic.Left(Table1.innerText, 4) = "市场概况" Then ' Found the table we need ("Market Overview")
                ' Convert the table to ".csv" format, skipping the header row (index 0)
                For i = 1 To Table1.rows.length - 1
                    Row = Table1.rows.item(i)
                    Line = ""
                    For Each Cell In Row.cells
                        Line = Line + Trim(Cell.innerText) + ","
                    Next
                    If Len(Line) > 0 Then
                        ' Drop the trailing comma and end the row
                        Text1 = Text1 + Microsoft.VisualBasic.Left(Line, Len(Line) - 1) + vbCrLf
                    End If
                Next

                If Len(Text1) > 0 Then
                    ' AppendText creates the file if it does not exist yet,
                    ' so no separate CreateText call is needed.
                    sw = fi.AppendText()
                    ' Strip the final CRLF because WriteLine adds its own
                    sw.WriteLine(Microsoft.VisualBasic.Left(Text1, Len(Text1) - 2))
                    ' Close here so sw is never used when no table matched
                    sw.Close()
                End If
                Exit For
            End If
        Next
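
One thing to watch in the conversion step: the cells are joined with plain commas, so a cell whose own text contains a comma (a number such as 1,234.56, for example) would shift the columns in the resulting file. Below is a minimal sketch of one way to handle that, using the common CSV convention of wrapping such fields in double quotes and doubling any embedded quotes. EscapeCsvField and ToCsvLine are hypothetical helper names that are not part of the original code, and the sample row in Main is made up for illustration.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CsvSketch
    ' Hypothetical helper: trim a field and quote it only when it
    ' contains a comma, a double quote, or a line break.
    Function EscapeCsvField(ByVal value As String) As String
        If value Is Nothing Then Return ""
        Dim trimmed As String = value.Trim()
        Dim special As Char() = New Char() {","c, """"c, ControlChars.Cr, ControlChars.Lf}
        If trimmed.IndexOfAny(special) >= 0 Then
            ' Double any embedded quotes, then wrap the field in quotes
            Return """" & trimmed.Replace("""", """""") & """"
        End If
        Return trimmed
    End Function

    ' Hypothetical helper: join the cell texts of one row into one CSV line.
    Function ToCsvLine(ByVal cells As String()) As String
        Dim sb As New StringBuilder()
        For i As Integer = 0 To cells.Length - 1
            If i > 0 Then sb.Append(","c)
            sb.Append(EscapeCsvField(cells(i)))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Made-up sample row; prints: Index,"1,234.56",up
        Console.WriteLine(ToCsvLine(New String() {" Index ", "1,234.56", "up"}))
    End Sub
End Module
```

In the loop above, the same idea would mean passing Cell.innerText through EscapeCsvField in place of the bare Trim call.
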
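While debugging, the program often could not connect to the network. Adding this line, which tells System.Net to use no proxy, fixed it:

   System.Net.GlobalProxySelection.Select = System.Net.GlobalProxySelection.GetEmptyWebProxy()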
