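Most of us could rattle off the world's most-visited websites without a second thought: Google, YouTube, Facebook, PxxnHub, and so on. Today we'll look at these global traffic giants from several angles!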
Data Collection
As usual, we start by scraping the data. The target page is:
https://www.visualcapitalist.com/the-50-most-visited-websites-in-the-world/
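The page contains a table listing the 50 most-visited websites in the world, and that table is what we will scrape.
Now for the code: we fetch the page with requests and parse the HTML with BeautifulSoup.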
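import requests
import pandas as pd
from bs4 import BeautifulSoup

# Send a browser-like User-Agent so the site serves the normal page
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36"}
res = requests.get("https://www.visualcapitalist.com/the-50-most-visited-websites-in-the-world/", headers=headers)
res.raise_for_status()  # stop early on HTTP errors such as 403 or 404
# Name the parser explicitly; omitting it makes BeautifulSoup emit a warning
soup = BeautifulSoup(res.text, "html.parser")
# The ranking is in the first <table> on the page; each <tr> in <tbody> is one site
tbody = soup.find("table").find("tbody")
tr_list = tbody.find_all("tr")
data_list = []
for tr in tr_list:
    # Collect the text of every cell in the row: rank, site, visits, country, category
    tds = tr.find_all("td")
    tmp = []
    for td in tds:
        tmp.append(td.text)
    data_list.append(tmp)
print(data_list)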
Output:
[['1', 'Google.com', '92.5B', 'U.S.', 'Search Engines'],
['2', 'Youtube.com', '34.6B', 'U.S.', 'TV Movies and Streaming'],
['3',
'Facebook.com',
'25.5B',
'U.S.',
'Social Networks and Online Communities'],
['4',
'Twitter.com',
'6.6B',
'U.S.',
'Social Networks and Online Communities'],
['5', 'Wikipedia.org', '6.1B', 'U.S.', 'Dictionaries and Encyclopedias'],
['6',
'Instagram.com',
'6.1B',
'U.S.',
'Social Networks and Online Communities'],
....
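Note that the visit counts come back as strings such as '92.5B', so they cannot be sorted or plotted numerically as they are. Below is a minimal sketch of one way to convert them, assuming the column only ever uses K, M or B suffixes (the helper name parse_visits and the suffix table are our own illustration, not part of the original code):

```python
# Hypothetical helper: convert abbreviated visit counts such as "92.5B" into numbers.
# Assumes only K (thousand), M (million) and B (billion) suffixes appear.
SUFFIX_MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_visits(text: str) -> float:
    """Turn a string like '92.5B' or '870M' into a float (e.g. 92.5e9)."""
    text = text.strip().upper()
    if text and text[-1] in SUFFIX_MULTIPLIERS:
        return float(text[:-1]) * SUFFIX_MULTIPLIERS[text[-1]]
    return float(text)  # no suffix: treat the value as a plain number


# Two rows copied from the scraped output above
rows = [
    ['1', 'Google.com', '92.5B', 'U.S.', 'Search Engines'],
    ['2', 'Youtube.com', '34.6B', 'U.S.', 'TV Movies and Streaming'],
]
for rank, site, visits, country, category in rows:
    print(site, parse_visits(visits))
```

Once the data is in a DataFrame, the same function could be applied to the visits column with Series.apply.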
With the data in hand, we organize it into a DataFrame.
df = pd.DataFrame(data_list)
df.rename(co