Flask + ECharts + Flask Project Deployment

Article link: https://blog.csdn.net/m0_55848559/article/details/122413936

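This article describes how to collect football player market-value data with the Scrapy crawling framework and then display it on the front end using Flask and ECharts with AJAX. On the back end, it explains how the Flask application retrieves the data, then covers deploying the application on an Alibaba Cloud server with Gunicorn and configuring and starting Nginx, so that the visualization can be accessed online.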


Visualizing Football Player Market-Value Data

Data Acquisition: The Scrapy Crawling Framework

Introduction to Scrapy

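Framework components:

Engine: runs automatically and needs no attention from you; it coordinates all request objects and dispatches them to the downloader.

Scheduler: applies its own scheduling rules and automatically filters out duplicate URLs.

Spiders: a Spider class defines how a site (or a group of sites) is crawled, including the crawl behavior (for example, whether to follow links) and how to extract structured data (items) from page content. In other words, the Spider is where you define the crawl actions and the parsing logic for a particular page or set of pages.

Pipeline: stores the data persistently and can also drop duplicate items.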
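Deduplication in a pipeline is not automatic in Scrapy; it is something a pipeline can be written to do. A minimal sketch of that idea follows, in plain Python so that it runs without Scrapy installed. The class name DedupPipeline, the local DropItem stand-in, and the choice of (name, club) as the uniqueness key are assumptions for illustration; in a real project the exception would come from scrapy.exceptions.DropItem.

```python
class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem (assumption: lets the sketch run without Scrapy)."""


class DedupPipeline:
    """Hypothetical pipeline that drops items whose (name, club) pair was already seen."""

    def __init__(self):
        self.seen = set()

    def process_item(self, item, spider=None):
        # The uniqueness key is an assumption; pick whatever identifies a record in your data.
        key = (item.get("name"), item.get("club"))
        if key in self.seen:
            raise DropItem(f"duplicate item: {key}")
        self.seen.add(key)
        return item


def run_pipeline(pipeline, items):
    """Feed items through the pipeline and collect the ones that survive."""
    kept = []
    for item in items:
        try:
            kept.append(pipeline.process_item(item))
        except DropItem:
            pass  # duplicates are skipped
    return kept
```

In a Scrapy project, a class like this would be registered under ITEM_PIPELINES in settings.py, and Scrapy itself would call process_item for every item the spider yields.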

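Overall flow:

The spider starts crawling from the start page (start_urls). Its URLs are handed to the scheduler and queued; the scheduler passes each URL to the downloader; the downloader fetches the response and hands it back to the spider, which invokes the callback parse_item. The callback processes the HTML with XPath and fills the extracted data into an Item, and the pipeline stores the data persistently. From then on, data parsed by the spider goes to the pipeline, and newly discovered URLs go back into the scheduler's queue.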
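The loop described above can be sketched as a toy model. This is not Scrapy's implementation, only an illustration of how the pieces hand work to each other; the in-memory FAKE_SITE, the Scheduler class, and the function names are all hypothetical.

```python
from collections import deque

# Hypothetical in-memory "website": URL -> (data rows, links found on the page).
FAKE_SITE = {
    "/players/": (["player-1", "player-2"], ["/players/?offset=60", "/players/"]),
    "/players/?offset=60": (["player-3"], ["/players/", "/players/?offset=60"]),
}


class Scheduler:
    """Queues URLs and silently ignores ones it has already seen."""

    def __init__(self):
        self.queue = deque()
        self.seen = set()

    def enqueue(self, url):
        if url not in self.seen:
            self.seen.add(url)
            self.queue.append(url)


def download(url):
    """Downloader: returns the 'response' for a URL (here, a dict lookup)."""
    return FAKE_SITE[url]


def parse_item(response):
    """Spider callback: splits a response into items and follow-up URLs."""
    rows, links = response
    items = [{"name": row} for row in rows]
    return items, links


def crawl(start_url):
    """Engine loop: scheduler -> downloader -> spider -> pipeline / scheduler."""
    scheduler = Scheduler()
    stored = []  # stands in for the pipeline's persistent storage
    scheduler.enqueue(start_url)
    while scheduler.queue:
        url = scheduler.queue.popleft()
        response = download(url)
        items, links = parse_item(response)
        stored.extend(items)      # items go to the pipeline
        for link in links:        # URLs go back to the scheduler
            scheduler.enqueue(link)
    return stored
```

Calling crawl("/players/") visits each page once even though the pages link to each other, because the scheduler refuses URLs it has already queued.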

Creating the Project

	##### 1. Create the Scrapy project

		scrapy startproject football

	##### 2. Create the spider file

		scrapy genspider players sofifa.com

	##### 3. Write the spider class
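The spider's link-extraction rule relies on a regular expression to pick out pagination links. In a regular expression an unescaped ? is a quantifier, so matching the literal text ?offset= requires \?. The sketch below compares the two spellings using only the standard re module, assuming search-style matching (the pattern may occur anywhere in the URL). The sample URLs are hypothetical and only illustrate the difference.

```python
import re

# Question mark escaped: matches a literal "?offset=<digits>".
ESCAPED = re.compile(r"\?offset=\d+")
# Unescaped: "/?" means "an optional slash", so this matches any "offset=<digits>".
UNESCAPED = re.compile(r"/?offset=\d+")

# Hypothetical URLs for illustration only.
urls = [
    "https://sofifa.com/players?offset=60",
    "https://sofifa.com/players/?offset=120",
    "https://sofifa.com/teams",
    "https://sofifa.com/players?type=all&offset=60",
]


def matching(pattern, candidates):
    """Return the candidates in which the pattern occurs anywhere (re.search semantics)."""
    return [u for u in candidates if pattern.search(u)]


escaped_hits = matching(ESCAPED, urls)
unescaped_hits = matching(UNESCAPED, urls)
```

The escaped pattern selects only URLs whose query string begins with offset, while the unescaped one also accepts offset appearing after other parameters. Which behavior is wanted depends on how the target site builds its pagination links.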

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from football.items import FootballItem

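# Inherits from CrawlSpider (whose parent class is scrapy.Spider)
class PlayersSpider(CrawlSpider):
    # Identifies the spider. The name must be unique; different spiders cannot share a name.
    name = 'players'
    # Scope: restricts the crawler to this domain
    allowed_domains = ['sofifa.com']
    # The first page fetched is one of these URLs; later URLs are extracted from the fetched data.
    start_urls = ['https://sofifa.com/players/']
    # A collection of one or more Rule objects; each Rule defines a specific crawling behavior.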
    rules = (
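        # Extract links matching the allow pattern, request each one, and pass the response to
        # the callback. With follow=False, links on those pages are not followed any further.
        Rule(LinkExtractor(allow=r'\?offset=\d+'), callback='parse_item', follow=False),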
    )
    # Callback: receives the response from the downloader and parses it
    def parse_item(self, response):
        name_list = response.xpath('//tbody/tr/td[2]/a[1]/div/text()')
        country_list = response.xpath('//tbody/tr/td[2]/img/@title')
        age_list = response.xpath('//tbody/tr/td[3]/text()')
        overall_rating_list = response.xpath('//tbody/tr/td[4]/span/text()')
        potential_list = response.xpath('//tbody/tr/td[5]/span/text()')
        club_list = response.xpath('//tbody/tr/td[6]//a/text()')
        value_list = response.xpath('//tbody/tr/td[7]/text()')
        wage_list = response.xpath('//tbody/tr/td[8]/text()')
        total_stats_list = response.xpath('//tbody/tr/td[9]/span/text()')

        for i in range(len(name_list)):
            name = name_list[i].extract()
            try:
                # extract() returns the matched result as a Unicode string
                country = country_list[i].extract()
            except IndexError:
                country = None
            try:
                age = age_list[i].extract()
            except IndexError:
                age = None
            try:
                overall_rating = overall_rating_list[i].extract()
            except IndexError:
                overall_rating = None
            try:
                potential = potential_list[i].extract()
            except IndexError:
                potential = None
            try:
                club = club_list[i].extract()
            except IndexError:
                club = 0
            try:
                value1 = value_list[i].extract()
                if value1[-1