Adding a variable's value to a URL in Python: appending an int variable to a URL string

pgno = 1
while pgno < 4304:
    result = urllib.urlopen("http://www.example.com/traderesources/pincode.aspx?" +
                            "&GridInfo=Pincode0"+ pgno)
    print pgno
    html = result.read()
    parser = etree.HTMLParser()
    tree = etree.parse(StringIO.StringIO(html), parser)
    pgno += 1

In http://.......=Pincode0 I need to append a number, e.g. 'Pincode01', and then step through 01, 02, 03 and so on. I am using a while loop for this, with the counter held in the variable 'pgno'.

The problem is that the counter increases by 1, but 'Pincode01' does not become 'Pincode02', so the second page of the site is never opened.

I even tried + str(pgno), with no luck.

Please show me how to do this. I have tried several times without success.

Solution

You probably want something like this:

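from urllib import urlopen
import re

pgno = 2
url = "http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode0%s" % str(pgno)
print url + '\n'

sock = urlopen(url)
htmlcode = sock.read()
sock.close()

# The HTML tags that separated the capture groups are missing from this
# pattern, as is the re.search() call that computed the start offset x;
# only the four groups seen in the result tuples survive.
pat = ('%;">(\d+)'
       '(.+?)'
       '(.+?)'
       '(.+?)\r\n')
regx = re.compile(pat)
print '\n'.join(map(repr, regx.findall(htmlcode)))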

Result:

http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode02

('110001', 'New Delhi', 'Delhi', 'Baroda House')
('110001', 'New Delhi', 'Delhi', 'Bengali Market')
('110001', 'New Delhi', 'Delhi', 'Bhagat Singh Market')
('110001', 'New Delhi', 'Delhi', 'Connaught Place')
('110001', 'New Delhi', 'Delhi', 'Constitution House')
('110001', 'New Delhi', 'Delhi', 'Election Commission')
('110001', 'New Delhi', 'Delhi', 'Janpath')
('110001', 'New Delhi', 'Delhi', 'Krishi Bhawan')
('110001', 'New Delhi', 'Delhi', 'Lady Harding Medical College')
('110001', 'New Delhi', 'Delhi', 'New Delhi Gpo')
('110001', 'New Delhi', 'Delhi', 'New Delhi Ho')
('110001', 'New Delhi', 'Delhi', 'North Avenue')
('110001', 'New Delhi', 'Delhi', 'Parliament House')
('110001', 'New Delhi', 'Delhi', 'Patiala House')
('110001', 'New Delhi', 'Delhi', 'Pragati Maidan')
('110001', 'New Delhi', 'Delhi', 'Rail Bhawan')
('110001', 'New Delhi', 'Delhi', 'Sansad Marg Hpo')
('110001', 'New Delhi', 'Delhi', 'Sansadiya Soudh')
('110001', 'New Delhi', 'Delhi', 'Secretariat North')
('110001', 'New Delhi', 'Delhi', 'Shastri Bhawan')
('110001', 'New Delhi', 'Delhi', 'Supreme Court')
('110002', 'New Delhi', 'Delhi', 'Rajghat Power House')
('110002', 'New Delhi', 'Delhi', 'Minto Road')
('110002', 'New Delhi', 'Delhi', 'Indraprastha Hpo')
('110002', 'New Delhi', 'Delhi', 'Darya Ganj')
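The question asks for a loop over page numbers, while the code above fetches a single page. As a minimal sketch, written for Python 3 rather than the Python 2 used in this thread, the per-page URLs could be built as follows. The names BASE_URL and page_urls are illustrative and not part of the original answer, and the sketch keeps the literal "Pincode0" prefix exactly as the answer's URL does.

```python
# Illustrative constant: the answer's URL with a placeholder for the page number.
BASE_URL = "http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode0{}"


def page_urls(first, last):
    """Yield one URL per page number from first to last, inclusive."""
    for pgno in range(first, last + 1):
        # str.format converts the int to text; writing "Pincode0" + pgno
        # would raise TypeError because str and int cannot be concatenated.
        yield BASE_URL.format(pgno)


urls = list(page_urls(1, 3))
for url in urls:
    print(url)
```

Each URL can then be passed to urlopen in turn. Building the strings separately from fetching them makes it easy to print and check the URLs before any request is sent.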

I wrote the regex code above after studying the structure of the HTML source with the following script (it should be self-explanatory):

from urllib2 import Request,urlopen
import re

pgno = 2
url = "http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode0%s" % str(pgno)
print url + '\n'

sock = urlopen(url)
htmlcode = sock.read()
sock.close()

li = htmlcode.splitlines(True)

# The rest of the condition on the next line is missing in the source.
print '\n'.join(str(i) + ' ' + repr(line)+'\n' for i,line in enumerate(li) if 275

ch = ''.join(li[0:291])

# Count how often each character occurs in the first 291 lines.
from collections import defaultdict
didi = defaultdict(int)
for c in ch:
    didi[c] += 1

print '\n\n' + repr(li[289])
print '\n'.join('%r -> %s' % (c, didi[c]) for c in li[289] if didi[c] < 35)


Now, the problem is that the same HTML is returned for every value of pgno. The site may be detecting that a program, rather than a browser, is connecting to fetch data. That would have to be handled with the tools in urllib2, which I'm not experienced with.
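One of the tools urllib2 offers for this is the Request class, which lets a script send its own headers. The sketch below is written for Python 3, where that class lives in urllib.request. The helper name build_request and the User-Agent string are assumptions made for illustration, and nothing in this thread confirms that a different User-Agent changes what the site returns.

```python
from urllib.request import Request


def build_request(pgno, user_agent="Mozilla/5.0 (compatible; example-script)"):
    """Build a Request for one page, with a custom User-Agent header.

    The helper name and the default User-Agent value are illustrative only.
    """
    url = ("http://www.eximguru.com/traderesources/pincode.aspx"
           "?&GridInfo=Pincode0{}".format(pgno))
    return Request(url, headers={"User-Agent": user_agent})


req = build_request(2)
print(req.full_url)
# Request stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))

# Nothing has been sent yet; to fetch the page, pass the object to
# urllib.request.urlopen(req) and call .read() on the response.
```

Comparing the bodies returned for two different page numbers is then a quick way to see whether the header made any difference.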
