python soup findall div tr td_python BeautifulSoup: getting the content of each node among multiple child nodes on a page...

The page's HTML has the following format:

lyl5577d92

李永利

lyl5577d
469680008
2016-05-21 15:24:27.0
0
0000
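
To make the row and cell structure concrete, here is a minimal sketch in Python 3 using the `bs4` package (the script in this post uses Python 2 and BeautifulSoup 3). The markup in `SAMPLE_HTML`, the cell values, and the helper name `extract_rows` are all hypothetical. The sketch assumes each record is a `tr` with `bgcolor="#7bb5de"` and a `style` attribute, and that the header row has no `style`. It also checks the attribute with `has_attr`, which is stricter than searching the tag's string form for the word `style`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a header row without a style attribute and
# one data row with it, both using the same bgcolor.
SAMPLE_HTML = """
<table>
  <tr bgcolor="#7bb5de"><td>Username</td><td>Name</td><td>Created</td></tr>
  <tr bgcolor="#7bb5de" style="height:20px">
    <td>user01</td><td>Example Name</td><td>2016-05-21 15:24:27.0</td>
  </tr>
</table>
"""


def extract_rows(html):
    """Return the text of every td, grouped by matching tr."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr", attrs={"bgcolor": "#7bb5de"}):
        # Skip rows that carry no style attribute (assumed to be headers).
        if not tr.has_attr("style"):
            continue
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(" ".join(row))
```

Collecting each row into a list first keeps the parsing separate from whatever is done with the cells afterwards, such as printing them or writing them to a file.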

# Python 2 script (httplib and BeautifulSoup 3)
import httplib
from BeautifulSoup import BeautifulSoup


def main():
    f = open('result', 'a')

    headers = {'Content-Type': 'application/x-www-form-urlencoded',
               'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
               'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
               'Accept-Encoding': 'gzip, deflate',
               'Referer': 'http://xxx.xxx.com/admin/userlist',
               'Cookie': 'JSESSIONID=9F6F2D03D2C11400B3D6731E90D73117',
               'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:46.0) Gecko/20100101 Firefox/46.0',
               }

    conn = httplib.HTTPConnection('*.*.*.*', timeout=50)

    # Walk through every page of the user list
    for p in range(1, 1287):
        print p
        conn.request(method='GET',
                     url="/admin/userlist?toPage=%s&sessionID=" % str(p),
                     headers=headers)
        resp = conn.getresponse()
        html_doc = resp.read()
        mainSoup = BeautifulSoup(html_doc)
        # Data rows are the tr nodes with this bgcolor that also contain 'style'
        for s in mainSoup.findAll('tr', attrs={'bgcolor': '#7bb5de'}):
            if 'style' not in str(s):
                continue

            # Write the cells of one row on one line, separated by spaces
            for d in s.findAll('td'):
                print d.getText(),
                # Encode to UTF-8 before writing.
                # f.write("%s " % d.getText()) ==> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
                f.write("%s " % d.getText().encode('utf-8'))
            f.write("\n")
            print

    f.close()
    conn.close()


if __name__ == '__main__':
    main()
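
The `UnicodeEncodeError` noted in the script comes from writing unicode text to a file that falls back to the ASCII codec. One way to avoid encoding each cell by hand is to open the file with an explicit encoding. The following is a minimal sketch, not the original author's approach. The helper names `append_rows` and `read_back` are hypothetical, and it uses `io.open`, which accepts an `encoding` argument on both Python 2 and Python 3.

```python
import io
import os
import tempfile


def append_rows(path, rows):
    """Append each row as one space-separated line, encoded as UTF-8."""
    with io.open(path, "a", encoding="utf-8") as f:
        for row in rows:
            f.write(u" ".join(row) + u"\n")


def read_back(path):
    """Read the whole file back as text, decoding it as UTF-8."""
    with io.open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Write to a throwaway directory so the sketch leaves no stray files.
    path = os.path.join(tempfile.mkdtemp(), "result")
    append_rows(path, [[u"user01", u"李永利"]])
    print(read_back(path))
```

With the encoding set when the file is opened, non-ASCII cell text can be written as-is, and the `with` block closes the file even if a write fails.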
