Python crawler: how do I get the next page's URL and page content?

I used BeautifulSoup to scrape the first page, but I don't know how to crawl the remaining pages.

The first page's URL looks like this:

http://gdemba.gicp.net:82/interunit/ListMain.asp?FirstEnter=Yes&Style=0000100003&UID={A270A117-76A7-4059-AB8F-B11AC370240B}&TimeID=39116.81

You get to the next page by clicking a "next page" GIF image button:


The second page's URL looks like this:

http://gdemba.gicp.net:82/interunit/ListMain.asp?Keywords=&Style=0000100003&DateLowerLimit= 2000-1-1&DateUpperLimit= 2015-9-11&DateLowerLimitModify= 2000-1-1&DateUpperLimitModify= 2015-9-11&Classification1=0&Classification2=0&Classification3=0&Classification4=0&Classification6=0&Classification7=0&Classification8=0&Class=&Department=001&CreatorName=&CreatorTypeID=&UID={A270A117-76A7-4059-AB8F-B11AC370240B}&SortField=&CustormCondition=&PageNo=2&TimeID=39453.14

How can I work out the URL pattern from these?
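One way I thought of to spot the pattern is to parse the two query strings and diff them parameter by parameter. A minimal sketch (Python 2, the query strings below are shortened for readability; the real ones are the full URLs above and the onclick URL shown further down):

import urlparse  # Python 2 module; in Python 3 this lives in urllib.parse

# Shortened versions of the page-2 URL above and of the page-3 URL taken
# from the "next page" button's onclick handler shown below.
page2 = 'ListMain.asp?Keywords=&Style=0000100003&Department=001&PageNo=2&TimeID=39453.14'
page3 = 'ListMain.asp?Keywords=&Style=0000100003&Department=001&PageNo=3&TimeID=39509.16'

q2 = urlparse.parse_qs(urlparse.urlparse(page2).query, keep_blank_values=True)
q3 = urlparse.parse_qs(urlparse.urlparse(page3).query, keep_blank_values=True)

# Print only the parameters whose values differ between the two pages.
for key in sorted(set(q2) | set(q3)):
    if q2.get(key) != q3.get(key):
        print key, q2.get(key), '->', q3.get(key)

For these URLs only PageNo and TimeID differ, and PageNo looks like the page index.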

The "next page" link looks like this:

onclick="javascript:window.location.href = 'ListMain.asp?Keywords=&Style=0000100003&DateLowerLimit= 2000-1-1&DateUpperLimit= 2015-9-11&DateLowerLimitModify= 2000-1-1&DateUpperLimitModify= 2015-9-11&Classification1=0&Classification2=0&Classification3=0&Classification4=0&Classification6=0&Classification7=0&Classification8=0&Class=&Department=001&CreatorName=&CreatorTypeID=&UID={A270A117-76A7-4059-AB8F-B11AC370240B}&SortField=&CustormCondition=&PageNo=3&TimeID=39509.16';">

(the attribute sits on the IconNextPage.gif button image, WIDTH="16" HEIGHT="16")

How do I get the next page's URL and fetch its content?

If more information is needed, I can add it.
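What I was thinking of trying is to grab that button's onclick value and pull the relative URL out of it with a regex, roughly like this (untested; the onclick string below is shortened, the real one is the full value shown above):

import re

# Shortened onclick value from the "next page" button shown above.
onclick = "javascript:window.location.href = 'ListMain.asp?Keywords=&Department=001&PageNo=3&TimeID=39509.16'; "
match = re.search(r"'([^']+)'", onclick)  # everything between the single quotes
if match:
    print match.group(1)  # relative URL of the next page, e.g. ListMain.asp?...&PageNo=3&...

But I'm not sure how to turn that into fetching every following page.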

Supplementary code:

import urllib
import urllib2
import cookielib
import re
import csv
import codecs
from bs4 import BeautifulSoup

# Log in first; the CookieJar keeps the session cookie for later requests.
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
postdata = urllib.urlencode({
    'LoginName': '02',
    'Password': 'dc20150820if'
})
req = urllib2.Request(
    url='http://gdemba.gicp.net:82/VerifyUser.asp',
    data=postdata
)
result = opener.open(req)
for item in cookie:
    print 'Cookie:Name = ' + item.name
    print 'Cookie:Value = ' + item.value

# Fetch the first list page with the logged-in opener and parse it.
result = opener.open('http://gdemba.gicp.net:82/interunit/ListMain.asp?FirstEnter=Yes&Style=0000100003&UID={4C10B953-C0F3-4114-8341-81EF93DE7C55}&TimeID=49252.53')
info = result.read()
soup = BeautifulSoup(info, from_encoding="gb18030")

table = soup.find(id='Table11')
print table

client = ""
tag = ""
tel = ""
catalogue = ""
region = ""
client_type = ""
email = ""
creater = ""
department = ""
action = ""

f = open('table.csv', 'w')
csv_writer = csv.writer(f)
td = re.compile('td')

# Each data row has exactly 10 cells; write one CSV line per such row.
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) == 10:
        client = cells[0].text
        tag = cells[1].text
        tel = cells[2].text
        catalogue = cells[3].text
        region = cells[4].text
        client_type = cells[5].text
        email = cells[6].text
        creater = cells[7].text
        department = cells[8].text
        action = cells[9].text
        csv_writer.writerow([x.encode('utf-8') for x in [client, tag, tel, catalogue, region, client_type, email, creater, department, action]])

f.close()
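For the paging itself, the rough plan I have in mind is to keep following that onclick link with the same logged-in opener until no such link is left. A minimal sketch (untested; it assumes the "next page" button is the only element whose onclick contains a PageNo= parameter, which I have not verified against the real pages, and the helper name is my own):

import re
import urlparse
from bs4 import BeautifulSoup

def next_page_url(html, current_url):
    # Return the absolute URL of the next page, or None if this page has
    # no "next page" button.
    soup = BeautifulSoup(html, from_encoding="gb18030")
    node = soup.find(onclick=re.compile(r'PageNo=\d+'))
    if node is None:
        return None
    match = re.search(r"'([^']*PageNo=\d+[^']*)'", node['onclick'])
    if match is None:
        return None
    # The onclick URL is relative (ListMain.asp?...), so join it against
    # the URL of the page we are currently on.
    return urlparse.urljoin(current_url, match.group(1))

# Usage with the opener from the code above, starting from the first list page:
url = 'http://gdemba.gicp.net:82/interunit/ListMain.asp?FirstEnter=Yes&Style=0000100003&UID={4C10B953-C0F3-4114-8341-81EF93DE7C55}&TimeID=49252.53'
while url:
    page = opener.open(url)
    html = page.read()
    # ... parse Table11 and write the CSV rows here, as in the code above ...
    url = next_page_url(html, page.geturl())

Does this look like a reasonable approach, or is there a better way to enumerate the pages?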
