How to parse an HTML table with Python and BeautifulSoup and write it to CSV

I am trying to parse an HTML page, fetch the currency values, and write them to CSV.

I have the following code:

#!/usr/bin/env python
import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://www.bank.gov.ua/control/en/curmetal/detail/currency?period=daily"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

table = soup.find('div', attrs={'class': 'content'})
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True) + ';'
        print text,
    print

The problem is that I do not know how to retrieve only the currency values.

I tried a regular expression such as '^[0-9]{3}' (start with three digits), but it doesn't work.

Solution

You'd be much better off picking out specific cells in the table. The td cells with the cell_c class contain data you are interested in, and the last one is always the currency exchange rate:

rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    if 'cell_c' in cols[0]['class']:
        # currency row
        digital_code, letter_code, units, name, rate = [c.text for c in cols]
        print digital_code, letter_code, units, name, rate

With the data in separate variables, you can now turn the text to decimal numbers, store them in a database, whatever.
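
Since the question also asks about CSV output, here is a minimal sketch of that last step. It is written for Python 3 (unlike the Python 2 code above) and uses only the standard library. The sample rows are made-up values shaped like the five cells unpacked above, and the helper names parse_rate and write_rates are hypothetical, not part of BeautifulSoup or of the original answer.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# Hypothetical sample rows in the same order as the unpacked cells:
# digital code, letter code, units, name, rate. The values are invented.
SAMPLE_ROWS = [
    ["036", "AUD", "100", "Australian Dollar", "825.1234"],
    ["826", "GBP", "100", "Pound Sterling", "1290.5000"],
]


def parse_rate(text):
    """Convert a rate cell to Decimal; return None if it is not numeric."""
    try:
        return Decimal(text.strip())
    except InvalidOperation:
        return None


def write_rates(rows, out):
    """Write currency rows to a file-like object as semicolon-separated CSV."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(["digital_code", "letter_code", "units", "name", "rate"])
    for digital_code, letter_code, units, name, rate in rows:
        value = parse_rate(rate)
        if value is None:
            continue  # skip rows whose rate cell is not a number
        writer.writerow([digital_code.strip(), letter_code.strip(),
                         int(units), name.strip(), value])


# Write to an in-memory buffer here; a real script would pass an open file.
buffer = io.StringIO()
write_rates(SAMPLE_ROWS, buffer)
print(buffer.getvalue())
```

To write to disk instead, open the target with open("rates.csv", "w", newline="") and pass that file object as out. The digital code is kept as text so that a leading zero such as "036" survives.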
