Getting attribute values with BeautifulSoup

I'm writing a Python script that parses a webpage and extracts the locations of its scripts.

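Let's say there are two scenarios: a script tag that points to an external file through its src attribute, and a script tag with the JS written directly between the tags.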

I'm able to get the JS in the second scenario, that is, when the JS is written within the tags.

But is there any way to get the value of src in the first scenario (i.e. extract the src attribute values of all script tags, such as http://example.com/something.js)?

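Here's my code:

#!/usr/bin/python

import requests
from bs4 import BeautifulSoup

r = requests.get("http://rediff.com/")
data = r.text

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(data, "html.parser")

# Print every <script> tag in full, whether inline or external
for n in soup.find_all('script'):
    print(n)

Output: Some JS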

Solution

The code below prints the src value of every script tag that has one and skips script tags without a src attribute.

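from urllib.request import urlopen  # Python 3 replacement for urllib2

from bs4 import BeautifulSoup

url = "http://rediff.com/"
page = urlopen(url)
soup = BeautifulSoup(page.read(), "html.parser")

# {"src": True} matches only <script> tags that carry a src attribute
sources = soup.find_all('script', {"src": True})
for source in sources:
    print(source['src'])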

I get the following two src values as the result:

http://imworld.rediff.com/worldrediff/js_2_5/ws-global_hm_1.js

http://im.rediff.com/uim/common/realmedia_banner_1_5.js

I guess this is what you want. Hope this is useful.
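If it helps, here is a minimal offline sketch that handles both scenarios from the question in one pass. The SAMPLE_HTML string and the split_scripts helper are hypothetical, made up for illustration only; the sketch assumes the built-in "html.parser" backend, and a real page may need extra handling (for example, relative src values).

```python
from bs4 import BeautifulSoup

# Hypothetical sample page covering both scenarios from the question.
SAMPLE_HTML = """
<html><head>
<script src="http://example.com/something.js"></script>
<script>var x = 1;</script>
</head><body></body></html>
"""


def split_scripts(html):
    """Return (external_srcs, inline_bodies) for all <script> tags in html."""
    soup = BeautifulSoup(html, "html.parser")
    external, inline = [], []
    for tag in soup.find_all("script"):
        src = tag.get("src")  # None when the tag has no src attribute
        if src:
            external.append(src)
        else:
            # tag.string is None for an empty <script></script>
            inline.append((tag.string or "").strip())
    return external, inline


if __name__ == "__main__":
    srcs, bodies = split_scripts(SAMPLE_HTML)
    print(srcs)
    print(bodies)
```

Using tag.get("src") rather than tag["src"] avoids a KeyError on inline scripts, so a single loop can sort the tags into the two groups.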
