How to extract data from a website with Python: extracting HTML data from a sports website with Python 3

I have been trying to extract data from a sports site, so far without success. I am trying to extract the three values below (35, Shots on Goal and 23):

35
Shots on Goal
23

from bs4 import BeautifulSoup
import requests

result = requests.get("https://www.scoreboard.com/uk/match/lvbns58C/#match-statistics;0")
src = result.content
soup = BeautifulSoup(src, 'html.parser')
stats = soup.find("div", {"class": "tab-statistics-0-statistic"})
print(stats)

This is the code I have been using. When I run it, it prints "None". Could someone help me print out the data?

Solution

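Because the page is rendered by JavaScript, the statistics are not in the HTML that requests downloads, which is why find() returns None. One option is to load the page with Selenium and then parse the rendered content with BeautifulSoup: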

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# initialize a headless Chrome driver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# '<>' is a placeholder kept from the original, presumably for the path to the
# chromedriver executable; recent Selenium releases expect that path via a
# Service object rather than as a positional argument
wd = webdriver.Chrome('<>', options=chrome_options)

# load the page via selenium
wd.get("https://www.scoreboard.com/uk/match/lvbns58C/#match-statistics;0")

# wait up to 30 seconds for the element with id "statistics-content" to be present
table = WebDriverWait(wd, 30).until(EC.presence_of_element_located((By.ID, 'statistics-content')))

# parse the content of the statistics element
soup = BeautifulSoup(table.get_attribute('innerHTML'), 'html.parser')
print(soup)

# close the selenium driver
wd.quit()
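
The code above prints the raw markup of the statistics block, while the question asks for the individual values (35, Shots on Goal, 23). The page's actual markup is not shown here, so the following is only a minimal sketch under one assumption: that the text fragments inside the block come in groups of three, ordered home value, statistic name, away value. The helper names group_stats and find_stat are hypothetical and not part of any library.

```python
from typing import Iterable, List, Optional, Tuple


def group_stats(strings: Iterable[str]) -> List[Tuple[str, ...]]:
    """Group a flat sequence of text fragments into (home, label, away) rows.

    Assumption: each statistic contributes exactly three fragments,
    in the order home value, statistic name, away value.
    """
    # Strip whitespace and discard empty fragments.
    items = [s.strip() for s in strings if s.strip()]
    # Ignore a trailing incomplete group instead of guessing what it means.
    usable = len(items) - len(items) % 3
    return [tuple(items[i:i + 3]) for i in range(0, usable, 3)]


def find_stat(rows: List[Tuple[str, ...]], label: str) -> Optional[Tuple[str, str]]:
    """Return (home, away) for the row whose name matches label, else None."""
    for home, name, away in rows:
        if name.lower() == label.lower():
            return home, away
    return None


# Example using the three values from the question.
rows = group_stats(["35", "Shots on Goal", "23"])
print(find_stat(rows, "Shots on Goal"))
```

With the soup object from the answer, the fragments could be supplied as group_stats(soup.stripped_strings). If the real page nests extra text inside the block, the grouping by three would need adjusting.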
