I have HTML like this:
Ages 15
getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1);
I am trying to extract the age text, "Ages 15", using BeautifulSoup,
so I wrote the following Python code.
Code:
from bs4 import BeautifulSoup as bs
import urllib3
URL = 'html file'
http = urllib3.PoolManager()
page = http.request('GET', URL)
soup = bs(page.data, 'html.parser')
age = soup.find("span", {"class": "age"})
print(age.text)
Output:
Age 15 getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1);
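I only want the age text, not the function call inside the script tag. Is there a way to get only the text "Ages 15", or some way to exclude the contents of script tags?
PS: There are too many script tags and different URLs, so I would prefer
not to clean up the output with string replacement.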
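
For reference, here is a minimal sketch of the "exclude the script tag" idea. The markup in the `html` string is an assumption: the tags in the sample above were not preserved, so the span/script nesting here is only a guess at its shape. `text_without_scripts` is a hypothetical helper name. It uses BeautifulSoup's `decompose()` to drop nested script elements before reading the text.

```python
from bs4 import BeautifulSoup

# Assumed markup: the original tags were lost, so this nesting is hypothetical.
html = """
<span class="age">
  Ages 15
  <script>getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1);</script>
</span>
"""

def text_without_scripts(tag):
    """Return the tag's text after removing nested <script>/<style> elements."""
    # find_all returns a list-like result, so removing nodes while looping is safe.
    for unwanted in tag.find_all(["script", "style"]):
        unwanted.decompose()  # removes the element and its contents from the tree
    return tag.get_text(" ", strip=True)

soup = BeautifulSoup(html, "html.parser")
age = soup.find("span", {"class": "age"})
age_text = text_without_scripts(age)
print(age_text)  # prints: Ages 15
```

Note that `decompose()` modifies the parsed tree in place, so this works regardless of how many script tags a page has, but the scripts are no longer available from `soup` afterwards. If they are needed later, parse the page a second time.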