Scraping BTC correlation data from 非小号 (Feixiaohao) with Python

达科索斯

Article link: https://blog.csdn.net/weixin_40264579/article/details/115423096

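This post explains how to use Python with Selenium and XPath to scrape cryptocurrency correlation data (specifically, correlation with BTC) from the 非小号 (Feixiaohao) website. First, download chromedriver and add it to the PATH environment variable. Then locate elements with XPath and use Selenium to simulate a browser and obtain the complete page source. Next, parse the HTML to find the URL of every coin and request each URL to get its BTC correlation data. Finally, save the data to a CSV file for further analysis.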

Download chromedriver and add it to PATH

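We start by using Google's browser test driver, chromedriver, to fetch the page source. A plain requests.get() grabs the HTML before the page has finished loading, so the source it returns is incomplete. The driver can be downloaded from the chromedriver site.
After downloading it, remember to add it to the PATH environment variable and place it in the following folder:
[image in the original post]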
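
Before moving on, it can help to confirm that the driver is actually reachable through PATH. The following is a minimal sketch that uses only the standard library. The helper name `find_driver` is hypothetical, and the executable name `chromedriver` is an assumption based on the download described above.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of an executable found on PATH, or None."""
    # shutil.which searches the directories listed in the PATH variable
    # (on Windows it also tries extensions such as .exe).
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("chromedriver was not found on PATH")
    else:
        print("chromedriver found at", path)
```

If the script reports that the driver was not found, the PATH entry most likely points at the wrong folder or the terminal needs to be reopened.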

Set up the XPath and selenium environment

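XPath is used to locate elements (through the lxml library, which the code below imports), and selenium is used to drive chromedriver.

conda install lxml
conda install selenium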

Then test chromedriver:

from selenium import webdriver
browser = webdriver.Chrome()

If a Chrome browser window opens automatically at this point, the test has succeeded.

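Locating elements

First open the 非小号 website and press F12 to inspect the page:
[image in the original post]
Using the element picker, we find that every coin's URL sits under a div class="ivu-table-cell", so we use XPath to parse the page and collect these divs.

1. Simulate a browser opening the 非小号 website, wait 5 seconds (so that the page loads completely), then get and parse the source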

    browser.get(url)
    time.sleep(5)
    page_text = browser.page_source
    tree = etree.HTML(page_text)
    li_list = tree.xpath("//div[@class='ivu-table-cell']")
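
A fixed 5-second sleep may be too short on a slow connection and longer than needed on a fast one. As a sketch of an alternative, the helper below polls a condition until it holds or a timeout expires. The name `wait_until` and its parameters are hypothetical and not part of Selenium; Selenium's own `WebDriverWait` class serves the same purpose, and this dependency-free version only illustrates the idea.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Hypothetical usage with the browser object from above:
# wait_until(lambda: "ivu-table-cell" in browser.page_source, timeout=15)
```

With this approach the script continues as soon as the table cells appear in the page source, instead of always waiting the full 5 seconds.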

2. Save the URL of every coin
The URL is stored in an a tag, so we use XPath to read the href of each a tag.

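    url_list = []
    for coin_url in li_list:
        hrefs = coin_url.xpath('./a/@href')
        if len(hrefs) == 0:  # skip table cells that contain no link
            continue
        url_list.append(str(hrefs[0]))

    # The hrefs are relative, so prepend the site root (as the complete code does)
    url_list = [url + x for x in url_list]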
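
The complete code at the end of this post turns the relative hrefs into full URLs by string concatenation. As a hedged alternative, the sketch below uses `urllib.parse.urljoin` from the standard library and also drops duplicate links, which may occur if several cells in a row point to the same coin. The helper name `absolute_urls` and the example paths are assumptions for illustration, not values taken from the site.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.feixiaohao.com"


def absolute_urls(hrefs, base=BASE_URL):
    """Turn relative hrefs into absolute URLs, dropping duplicates but keeping order."""
    seen = set()
    result = []
    for href in hrefs:
        full = urljoin(base, href)
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


# The paths below are illustrative assumptions, not values taken from the site.
print(absolute_urls(["/currencies/bitcoin/", "/currencies/ethereum/", "/currencies/bitcoin/"]))
```

`urljoin` also leaves an href untouched when it is already an absolute URL, which plain concatenation would not.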

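3. Request every coin URL to get the BTC correlation data
First open the page of any coin:
[image in the original post]
Then use the element picker again to select the place where the correlation is shown, right-click the matching line in the source, and choose Copy > Copy XPath. That gives the XPath used to read the correlation value.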

    coef_list = []
    for temp_url in url_list:
        try:
            # Fetch the coin page and parse it into an element tree
            temp_text = requests.get(temp_url).text
            temp_text = etree.HTML(temp_text)
            # XPath copied from the browser's developer tools
            coef = temp_text.xpath("//body/div[@id='__nuxt']/div[@id='__layout']/section[1]/div[1]/div[1]/div[1]/div[1]/div[3]/div[2]/div[2]/div[8]/span[2]")
            coef = float(coef[0].text.split('\n')[1])
            coef_list.append(coef)
            print(temp_url + ' is finished!!!')
        except Exception as e:
            # Keep coef_list aligned with url_list when a page fails
            coef_list.append(None)
            print(e)
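
The expression `coef[0].text.split('\n')[1]` relies on the value sitting on the second line of the element's text, so any change in whitespace raises an exception. A somewhat more tolerant approach, sketched here under the assumption that the element text contains a single decimal number, is to pull out the first number with a regular expression. The helper name `parse_coef` is hypothetical.

```python
import re

# Optional sign, optional integer part, optional decimal point, then digits
_NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def parse_coef(text):
    """Return the first number found in text as a float, or None if there is none."""
    if text is None:
        return None
    match = _NUMBER.search(text)
    return float(match.group()) if match else None
```

Inside the loop above, this could replace the split-based line with something like `coef = parse_coef(coef[0].text)`, so a page without a value yields None rather than an exception.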

Saving the data

Finally, save the data as a CSV file.

    coin_name = [x.split('/')[4] for x in url_list]
    coef_df = pd.DataFrame([coin_name, coef_list]).T
    coef_df.columns = ['coin', 'coef']
    # os.path.join avoids the invalid '\c' escape and works on any OS
    coef_df.to_csv(os.path.join(os.getcwd(), 'coef_df.csv'), index=False)
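
Taking the coin name with `x.split('/')[4]` depends on the exact number of slashes in the URL. The sketch below shows one possible alternative that takes the last non-empty path segment instead. To stay dependency-free it writes the file with the standard-library `csv` module as a stand-in for the pandas call above. The helper names `coin_name` and `save_coefs`, and the example URL shape, are assumptions for illustration.

```python
import csv
import os
from urllib.parse import urlparse


def coin_name(url):
    """Return the last non-empty path segment of a coin URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1] if segments else ""


def save_coefs(path, urls, coefs):
    """Write one 'coin,coef' row per URL; a missing coefficient becomes an empty cell."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["coin", "coef"])
        for url, coef in zip(urls, coefs):
            writer.writerow([coin_name(url), "" if coef is None else coef])


# Hypothetical usage with the lists built above:
# save_coefs(os.path.join(os.getcwd(), "coef_df.csv"), url_list, coef_list)
```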

Open the file and you will see the correlation coefficient of every coin with BTC!
[image in the original post]

Complete code

import requests
from lxml import etree
from selenium import webdriver
import time
import pandas as pd
import os


browser = webdriver.Chrome()
url = 'https://www.feixiaohao.com'
datapath = os.getcwd()

if __name__ == "__main__":
    browser.get(url) 
    time.sleep(5)
    page_text = browser.page_source
    tree = etree.HTML(page_text)
    li_list = tree.xpath("//div[@class='ivu-table-cell']")
    
    # Collect the URL of every coin
    url_list = []
    for coin_url in li_list:
        if len(coin_url.xpath('./a/@href')) == 0:
            continue
        url_list.append(str(coin_url.xpath('./a/@href')[0]))

    url_list = [url + x for x in url_list ]
    
    # Scrape the BTC correlation from every coin URL
    coef_list = []
    for temp_url in url_list:
        try:
            temp_text = requests.get(temp_url).text
            temp_text = etree.HTML(temp_text)
            coef = temp_text.xpath("//body/div[@id='__nuxt']/div[@id='__layout']/section[1]/div[1]/div[1]/div[1]/div[1]/div[3]/div[2]/div[2]/div[8]/span[2]")
            coef = float(coef[0].text.split('\n')[1])
            coef_list.append(coef)
            print(temp_url + ' is finished!!!')
        except Exception as e:
            coef_list.append(None)
            print(e)
            
    # Save the data
    coin_name = [x.split('/')[4] for x in url_list]
    coef_df = pd.DataFrame([coin_name, coef_list]).T
    coef_df.columns = ['coin', 'coef']
    
    coef_df.to_csv(os.path.join(datapath, 'coef_df.csv'), index=False)