【项目实战】基于python+爬虫的电影数据分析及可视化系统

注意:该项目只展示部分功能,如需了解,文末咨询即可。

在这里插入图片描述

1.开发环境

开发语言:Python
技术框架:Flask、爬虫
数据库:MySQL
开发工具:PyCharm

2 系统设计

2.1 设计背景

在数字时代,大数据技术在各行各业得到广泛应用,尤其在影视行业,通过分析大量的数据可以帮助人们更好地了解市场趋势和用户偏好。然而传统的数据分析方法难以处理如此庞大和复杂的数据集,且结果的展示不够直观,这就迫切需要开发一个既能高效处理大数据又能直观展示分析结果的系统。通过实现这一系统,不仅能够为用户提供直观的数据分析结果,帮助他们快速了解影视作品的各种统计信息,而且能够为影视行业提供决策支持,比如电影推荐和评分预测等功能,有助于提升用户体验和满意度。本系统的开发展示了大数据技术和可视化技术在影视数据分析领域的应用价值,对促进该领域的技术进步和创新发展具有重要意义。

2.2 设计内容

本课题旨在开发一个基于python的电影数据分析及可视化,该系统利用Scrapy爬虫技术从“豆瓣网站”爬取相关电影数据,然后通过Python和Spark技术进行数据处理和清理,最终存储到MySQL数据库中。系统的核心功能包括电影标签、电影星际、电影评分区间、年代、数量、词云等。所有这些功能都通过Echarts可视化框架以图形的方式展现,同时,系统还提供用户管理和登录功能,为管理员和用户提供不同的操作权限和数据展示。
1. 系统架构
系统采用分层架构设计,主要包括以下几个层次:
数据采集层: 使用Scrapy爬虫框架
数据处理层: 使用Python进行数据清洗和处理
数据存储层: 使用MySQL数据库
数据分析层: 使用Python进行数据分析
数据可视化层: 使用Echarts框架
应用层: 提供用户界面和交互功能
2. 主要功能模块
2.1 数据采集模块

使用Scrapy框架爬取豆瓣网站的电影数据
设计爬虫规则,提取电影标题、评分、标签、上映年份等信息
实现数据的增量更新和定期爬取
2.2 数据处理模块
使用Python和Spark进行数据清洗,去除重复和无效数据
数据格式化,统一数据格式
数据转换,将原始数据转换为适合分析的结构
2.3 数据存储模块
设计MySQL数据库schema,包括电影信息表、用户表等
实现数据的批量导入和更新
优化数据库查询性能
2.4 数据分析模块
实现电影标签分析功能
开发电影评分区间统计功能
实现电影年代分布分析
电影数量统计分析
词云分析功能
2.5 数据可视化模块
使用Echarts框架实现各种图表展示
包括柱状图、饼图、折线图、词云图等
实现图表的交互功能,如数据筛选、缩放等
2.6 用户管理模块
实现用户注册、登录功能
用户权限管理,区分管理员和普通用户
根据用户权限展示不同的数据和功能
3. 系统流程
数据采集: Scrapy爬虫定期从豆瓣网站爬取电影数据
数据处理: 使用PySpark进行数据清洗和转换
数据存储: 将处理后的数据存入MySQL数据库
数据分析: 使用Python进行各种统计分析
数据可视化: 使用Echarts将分析结果以图表形式展示
用户交互: 用户通过Web界面查看分析结果,管理员可进行系统管理

3 系统页面展示

3.1 用户页面

在这里插入图片描述
在这里插入图片描述

3.2 后台页面

在这里插入图片描述
在这里插入图片描述

3.3 功能展示视频

基于python+爬虫的豆瓣电影数据分析及可视化系统

4 更多推荐

计算机毕设选题精选汇总
基于Hadoop大数据电商平台用户行为分析与可视化系统
Django+Python数据分析岗位招聘信息爬取与分析
基于python爬虫的商城商品比价数据分析
基于Python的网络小说榜单信息爬取与数据可视化系统
基于Spark大数据的餐饮外卖数据分析可视化系统

5 部分功能代码

5.1 爬虫代码

# # -*- coding: utf-8 -*-

# 数据爬取文件

import scrapy
import pymysql
import pymssql
from ..items import DianyingxinxiItem
import time
from datetime import datetime,timedelta
import datetime as formattime
import re
import random
import platform
import json
import os
import urllib
from urllib.parse import urlparse
import requests
import emoji
import numpy as np
import pandas as pd
from sqlalchemy import create_engine
from selenium.webdriver import ChromeOptions, ActionChains
from scrapy.http import TextResponse
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
# 电影信息
class DianyingxinxiSpider(scrapy.Spider):
    name = 'dianyingxinxiSpider'
    spiderUrl = 'https://m.douban.com/rexxar/api/v2/movie/recommend?refresh=0&start={}&count=20&selected_categories=%7B%22%E7%B1%BB%E5%9E%8B%22:%22%E5%96%9C%E5%89%A7%22%7D&uncollect=false&tags=%E5%96%9C%E5%89%A7&ck=IsFz'
    start_urls = spiderUrl.split(";")
    protocol = ''
    hostname = ''
    realtime = False

    headers = {
        'Referer':'https://movie.douban.com/explore',
'Cookie':'用你的Cookie'
    }

    def __init__(self,realtime=False,*args, **kwargs):
        super().__init__(*args, **kwargs)
        self.realtime = realtime=='true'

    def start_requests(self):

        plat = platform.system().lower()
        if not self.realtime and (plat == 'linux' or plat == 'windows'):
            connect = self.db_connect()
            cursor = connect.cursor()
            if self.table_exists(cursor, 'c2xklu0u_dianyingxinxi') == 1:
                cursor.close()
                connect.close()
                self.temp_data()
                return
        pageNum = 1 + 1

        for url in self.start_urls:
            if '{}' in url:
                for page in range(1, pageNum):

                    next_link = url.format(page)
                    yield scrapy.Request(
                        url=next_link,
                        headers=self.headers,
                        callback=self.parse
                    )
            else:
                yield scrapy.Request(
                    url=url,
                    headers=self.headers,
                    callback=self.parse
                )

    # 列表解析
    def parse(self, response):
        _url = urlparse(self.spiderUrl)
        self.protocol = _url.scheme
        self.hostname = _url.netloc
        plat = platform.system().lower()
        if not self.realtime and (plat == 'linux' or plat == 'windows'):
            connect = self.db_connect()
            cursor = connect.cursor()
            if self.table_exists(cursor, 'c2xklu0u_dianyingxinxi') == 1:
                cursor.close()
                connect.close()
                self.temp_data()
                return
        data = json.loads(response.body)
        try:
            list = data["items"]
        except:
            pass
        for item in list:
            fields = DianyingxinxiItem()


            try:
                fields["title"] = str( item["title"])

            except:
                pass
            try:
                fields["year"] = int( item["year"])
            except:
                pass
            try:
                fields["picture"] = str( item["pic"]["normal"])

            except:
                pass
            try:
                fields["ypxx"] = str( item["card_subtitle"])

            except:
                pass
            try:
                fields["pingfen"] = float( item["rating"]["value"])
            except:
                pass
            try:
                fields["pjs"] = int( item["rating"]["count"])
            except:
                pass
            try:
                fields["wxs"] = float( item["rating"]["star_count"])
            except:
                pass
            try:
                fields["tags"] = str( item["tags"][0]["name"])

            except:
                pass
            try:
                fields["plnr"] = str( item["comment"]["comment"])

            except:
                pass
            try:
                fields["uname"] = str( item["comment"]["user"]["name"])

            except:
                pass
            try:
                fields["xqdz"] = str('https://movie.douban.com/subject/'+ item["id"])

            except:
                pass

            yield fields

    # 详情解析
    def detail_parse(self, response):
        fields = response.meta['fields']
        return fields

    # 数据清洗
    def pandas_filter(self):
        engine = create_engine('mysql+pymysql://root:123456@localhost/spiderc2xklu0u?charset=UTF8MB4')
        df = pd.read_sql('select * from dianyingxinxi limit 50', con = engine)

        # 重复数据过滤
        df.duplicated()
        df.drop_duplicates()

        #空数据过滤
        df.isnull()
        df.dropna()

        # 填充空数据
        df.fillna(value = '暂无')

        # 异常值过滤

        # 滤出 大于800 和 小于 100 的
        a = np.random.randint(0, 1000, size = 200)
        cond = (a<=800) & (a>=100)
        a[cond]

        # 过滤正态分布的异常值
        b = np.random.randn(100000)
        # 3σ过滤异常值,σ即是标准差
        cond = np.abs(b) > 3 * 1
        b[cond]

        # 正态分布数据
        df2 = pd.DataFrame(data = np.random.randn(10000,3))
        # 3σ过滤异常值,σ即是标准差
        cond = (df2 > 3*df2.std()).any(axis = 1)
        # 不满⾜条件的⾏索引
        index = df2[cond].index
        # 根据⾏索引,进⾏数据删除
        df2.drop(labels=index,axis = 0)

    # 去除多余html标签
    def remove_html(self, html):
        if html == None:
            return ''
        pattern = re.compile(r'<[^>]+>', re.S)
        return pattern.sub('', html).strip()

    # 数据库连接
    def db_connect(self):
        type = self.settings.get('TYPE', 'mysql')
        host = self.settings.get('HOST', 'localhost')
        port = int(self.settings.get('PORT', 3306))
        user = self.settings.get('USER', 'root')
        password = self.settings.get('PASSWORD', '123456')

        try:
            database = self.databaseName
        except:
            database = self.settings.get('DATABASE', '')

        if type == 'mysql':
            connect = pymysql.connect(host=host, port=port, db=database, user=user, passwd=password, charset='utf8')
        else:
            connect = pymssql.connect(host=host, user=user, password=password, database=database)
        return connect

    # 断表是否存在
    def table_exists(self, cursor, table_name):
        cursor.execute("show tables;")
        tables = [cursor.fetchall()]
        table_list = re.findall('(\'.*?\')',str(tables))
        table_list = [re.sub("'",'',each) for each in table_list]

        if table_name in table_list:
            return 1
        else:
            return 0

    # 数据缓存源
    def temp_data(self):

        connect = self.db_connect()
        cursor = connect.cursor()
        sql = '''
            insert into `dianyingxinxi`(
                id
                ,title
                ,year
                ,picture
                ,ypxx
                ,pingfen
                ,pjs
                ,wxs
                ,tags
                ,plnr
                ,uname
                ,xqdz
            )
            select
                id
                ,title
                ,year
                ,picture
                ,ypxx
                ,pingfen
                ,pjs
                ,wxs
                ,tags
                ,plnr
                ,uname
                ,xqdz
            from `c2xklu0u_dianyingxinxi`
            where(not exists (select
                id
                ,title
                ,year
                ,picture
                ,ypxx
                ,pingfen
                ,pjs
                ,wxs
                ,tags
                ,plnr
                ,uname
                ,xqdz
            from `dianyingxinxi` where
                `dianyingxinxi`.id=`c2xklu0u_dianyingxinxi`.id
            ))
            order by rand()
            limit 50;
        '''

        cursor.execute(sql)
        connect.commit()
        connect.close()

5.2 电影信息代码

# coding:utf-8
__author__ = "ila"

import logging, os, json, configparser
import time
from datetime import datetime

from flask import request, jsonify,session
from sqlalchemy.sql import func,and_,or_,case
from sqlalchemy import cast, Integer,Float
from api.models.brush_model import *
from . import main_bp
from utils.codes import *
from utils.jwt_auth import Auth
from configs import configs
from utils.helper import *
import random
import smtplib
from email.mime.text import MIMEText
from email.utils import formataddr
from email.header import Header
from utils.baidubce_api import BaiDuBce
from api.models.config_model import config




from flask import current_app as app
from utils.spark_func import spark_read_mysql
from utils.hdfs_func import upload_to_hdfs
from utils.mapreduce1 import MRMySQLAvg


# 注册接口
@main_bp.route("/python9532dr50/dianyingxinxi/register", methods=['POST'])
def python9532dr50_dianyingxinxi_register():
    if request.method == 'POST':
        msg = {'code': normal_code, 'message': 'success', 'data': [{}]}
        req_dict = session.get("req_dict")


        error = dianyingxinxi.createbyreq(dianyingxinxi, dianyingxinxi, req_dict)
        if error!=None:
            msg['code'] = crud_error_code
            msg['msg'] = "注册用户已存在"
        return jsonify(msg)

# 登录接口
@main_bp.route("/python9532dr50/dianyingxinxi/login", methods=['GET','POST'])
def python9532dr50_dianyingxinxi_login():
    if request.method == 'GET' or request.method == 'POST':
        msg = {"code": normal_code, "msg": "success", "data": {}}
        req_dict = session.get("req_dict")
        req_model = session.get("req_dict")
        try:
            del req_model['role']
        except:
            pass


        datas = dianyingxinxi.getbyparams(dianyingxinxi, dianyingxinxi, req_model)
        if not datas:
            msg['code'] = password_error_code
            msg['msg']='密码错误或用户不存在'
            return jsonify(msg)


        req_dict['id'] = datas[0].get('id')
        try:
            del req_dict['mima']
        except:
            pass


        return Auth.authenticate(Auth, dianyingxinxi, req_dict)


# 登出接口
@main_bp.route("/python9532dr50/dianyingxinxi/logout", methods=['POST'])
def python9532dr50_dianyingxinxi_logout():
    if request.method == 'POST':
        msg = {
            "msg": "退出成功",
            "code": 0
        }
        req_dict = session.get("req_dict")

        return jsonify(msg)

# 重置密码接口
@main_bp.route("/python9532dr50/dianyingxinxi/resetPass", methods=['POST'])
def python9532dr50_dianyingxinxi_resetpass():
    '''
    '''
    if request.method == 'POST':
        msg = {"code": normal_code, "msg": "success"}

        req_dict = session.get("req_dict")

        if req_dict.get('mima') != None:
            req_dict['mima'] = '123456'

        error = dianyingxinxi.updatebyparams(dianyingxinxi, dianyingxinxi, req_dict)

        if error != None:
            msg['code'] = crud_error_code
            msg['msg'] = error
        else:
            msg['msg'] = '密码已重置为:123456'
        return jsonify(msg)

# 获取会话信息接口
@main_bp.route("/python9532dr50/dianyingxinxi/session", methods=['GET'])
def python9532dr50_dianyingxinxi_session():
    '''
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, "data": {}}
        req_dict={"id":session.get('params').get("id")}
        msg['data']  = dianyingxinxi.getbyparams(dianyingxinxi, dianyingxinxi, req_dict)[0]

        return jsonify(msg)

# 分类接口(后端)
@main_bp.route("/python9532dr50/dianyingxinxi/page", methods=['GET'])
def python9532dr50_dianyingxinxi_page():
    '''
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success",  "data":{"currPage":1,"totalPage":1,"total":1,"pageSize":10,"list":[]}}
        req_dict = session.get("req_dict")
        userinfo = session.get("params")

        try:
            __hasMessage__=dianyingxinxi.__hasMessage__
        except:
            __hasMessage__=None
        if __hasMessage__ and __hasMessage__!="否":
            tablename=session.get("tablename")
            if tablename!="users" and session.get("params")!=None and dianyingxinxi!='chat':
                req_dict["userid"]=session.get("params").get("id")

        tablename=session.get("tablename")
        if tablename=="users" :
            try:
                pass
            except:
                pass
        else:
            mapping_str_to_object = {}
            for model in Base_model._decl_class_registry.values():
                if hasattr(model, '__tablename__'):
                    mapping_str_to_object[model.__tablename__] = model

            try:
                __isAdmin__=mapping_str_to_object[tablename].__isAdmin__
            except:
                __isAdmin__=None
            try:
                __authSeparate__ =mapping_str_to_object[tablename].__authSeparate__
            except:
                __authSeparate__ = None

            if __isAdmin__!="是" and __authSeparate__ == "是" and session.get("params")!=None:
                req_dict["userid"]=session.get("params").get("id")
            else:
                try:
                    del req_dict["userid"]
                except:
                    pass



        clause_args = []
        or_clauses = or_(*clause_args)

        msg['data']['list'], msg['data']['currPage'], msg['data']['totalPage'], msg['data']['total'], \
        msg['data']['pageSize']  = dianyingxinxi.page(dianyingxinxi, dianyingxinxi, req_dict, or_clauses)

        return jsonify(msg)

# 排序接口
@main_bp.route("/python9532dr50/dianyingxinxi/autoSort", methods=['GET'])
def python9532dr50_dianyingxinxi_autosort():
    '''
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success",  "data":{"currPage":1,"totalPage":1,"total":1,"pageSize":10,"list":[]}}
        req_dict = session.get("req_dict")
        req_dict['sort']='clicktime'
        req_dict['order']='desc'

        try:
            __browseClick__= dianyingxinxi.__browseClick__
        except:
            __browseClick__=None

        if __browseClick__ =='是':
            req_dict['sort']='clicknum'
        elif __browseClick__ =='时长':
            req_dict['sort']='browseduration'
        else:
            req_dict['sort']='clicktime'
        msg['data']['list'], msg['data']['currPage'], msg['data']['totalPage'], msg['data']['total'], \
        msg['data']['pageSize']  = dianyingxinxi.page(dianyingxinxi, dianyingxinxi, req_dict)

        return jsonify(msg)

# 分页接口(前端)
@main_bp.route("/python9532dr50/dianyingxinxi/list", methods=['GET'])
def python9532dr50_dianyingxinxi_list():
    '''
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success",  "data":{"currPage":1,"totalPage":1,"total":1,"pageSize":10,"list":[]}}
        req_dict = session.get("req_dict")
        if req_dict.__contains__('vipread'):
            del req_dict['vipread']
            
        userinfo = session.get("params")

        try:
            __foreEndList__=dianyingxinxi.__foreEndList__
        except:
            __foreEndList__=None

        if __foreEndList__ and __foreEndList__!="否":
            tablename=session.get("tablename")
            if tablename!="users" and session.get("params")!=None:
                req_dict['userid']=session.get("params").get("id")

        try:
            __foreEndListAuth__=dianyingxinxi.__foreEndListAuth__
        except:
            __foreEndListAuth__=None

        if __foreEndListAuth__ and __foreEndListAuth__!="否":
            tablename=session.get("tablename")
            if tablename!="users" and session.get("params")!=None:
                req_dict['userid']=session.get("params").get("id")

        tablename=session.get("tablename")
        if tablename=="users" :
            try:
                del req_dict["userid"]
            except:
                pass
        else:
            mapping_str_to_object = {}
            for model in Base_model._decl_class_registry.values():
                if hasattr(model, '__tablename__'):
                    mapping_str_to_object[model.__tablename__] = model

            try:
                __isAdmin__=mapping_str_to_object[tablename].__isAdmin__
            except:
                __isAdmin__=None

            if __isAdmin__!="是" and session.get("params")!=None:
                req_dict["userid"]=session.get("params").get("id")

        if 'luntan' in 'dianyingxinxi':
            if 'userid' in req_dict.keys():
                del req_dict["userid"]


        if 'discuss' in 'dianyingxinxi':
            if 'userid' in req_dict.keys():
                del req_dict["userid"]

        msg['data']['list'], msg['data']['currPage'], msg['data']['totalPage'], msg['data']['total'], \
        msg['data']['pageSize']  = dianyingxinxi.page(dianyingxinxi, dianyingxinxi, req_dict)

        return jsonify(msg)

# 保存接口(后端)
@main_bp.route("/python9532dr50/dianyingxinxi/save", methods=['POST'])
def python9532dr50_dianyingxinxi_save():
    '''
    '''
    if request.method == 'POST':
        msg = {"code": normal_code, "msg": "success", "data": {}}
        req_dict = session.get("req_dict")
        for key in req_dict:
            if req_dict[key] == '':
                req_dict[key] = None

        error= dianyingxinxi.createbyreq(dianyingxinxi, dianyingxinxi, req_dict)
        if error!=None:
            msg['code'] = crud_error_code
            msg['msg'] = error
        return jsonify(msg)

# 添加接口(前端)
@main_bp.route("/python9532dr50/dianyingxinxi/add", methods=['POST'])
def python9532dr50_dianyingxinxi_add():
    '''
    '''
    if request.method == 'POST':
        msg = {"code": normal_code, "msg": "success", "data": {}}
        req_dict = session.get("req_dict")
        try:
            __foreEndListAuth__=dianyingxinxi.__foreEndListAuth__
        except:
            __foreEndListAuth__=None

        if __foreEndListAuth__ and __foreEndListAuth__!="否":
            tablename=session.get("tablename")
            if tablename!="users":
                req_dict['userid']=session.get("params").get("id")

        error= dianyingxinxi.createbyreq(dianyingxinxi, dianyingxinxi, req_dict)
        if error!=None:
            msg['code'] = crud_error_code
            msg['msg'] = error
        return jsonify(msg)

# 踩、赞接口
@main_bp.route("/python9532dr50/dianyingxinxi/thumbsup/<id_>", methods=['GET'])
def python9532dr50_dianyingxinxi_thumbsup(id_):
    '''
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success", "data": {}}
        req_dict = session.get("req_dict")
        id_=int(id_)
        type_=int(req_dict.get("type",0))
        rets=dianyingxinxi.getbyid(dianyingxinxi, dianyingxinxi,id_)

        update_dict={
        "id":id_,
        }
        if type_==1:#赞
            update_dict["thumbsupnum"]=int(rets[0].get('thumbsupnum'))+1
        elif type_==2:#踩
            update_dict["crazilynum"]=int(rets[0].get('crazilynum'))+1
        error = dianyingxinxi.updatebyparams(dianyingxinxi, dianyingxinxi, update_dict)
        if error!=None:
            msg['code'] = crud_error_code
            msg['msg'] = error
        return jsonify(msg)

# 获取详情信息(后端)
@main_bp.route("/python9532dr50/dianyingxinxi/info/<id_>", methods=['GET'])
def python9532dr50_dianyingxinxi_info(id_):
    '''
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success", "data": {}}

        data = dianyingxinxi.getbyid(dianyingxinxi, dianyingxinxi, int(id_))
        if len(data)>0:
            msg['data']=data[0]
        #浏览点击次数
        try:
            __browseClick__= dianyingxinxi.__browseClick__
        except:
            __browseClick__=None

        if __browseClick__  and  "clicknum"  in dianyingxinxi.__table__.columns:
            click_dict={"id":int(id_),"clicknum":str(int(data[0].get("clicknum") or 0)+1)}
            ret=dianyingxinxi.updatebyparams(dianyingxinxi,dianyingxinxi,click_dict)
            if ret!=None:
                msg['code'] = crud_error_code
                msg['msg'] = ret
        return jsonify(msg)

# 获取详情信息(前端)
@main_bp.route("/python9532dr50/dianyingxinxi/detail/<id_>", methods=['GET'])
def python9532dr50_dianyingxinxi_detail(id_):
    '''
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success", "data": {}}

        data = dianyingxinxi.getbyid(dianyingxinxi, dianyingxinxi, int(id_))
        if len(data)>0:
            msg['data']=data[0]

        #浏览点击次数
        try:
            __browseClick__= dianyingxinxi.__browseClick__
        except:
            __browseClick__=None

        if __browseClick__ and "clicknum" in dianyingxinxi.__table__.columns:
            click_dict={"id":int(id_),"clicknum":str(int(data[0].get("clicknum") or 0)+1)}
            ret=dianyingxinxi.updatebyparams(dianyingxinxi,dianyingxinxi,click_dict)
            if ret!=None:
                msg['code'] = crud_error_code
                msg['msg'] = ret
        return jsonify(msg)

# 更新接口
@main_bp.route("/python9532dr50/dianyingxinxi/update", methods=['POST'])
def python9532dr50_dianyingxinxi_update():
    '''
    '''
    if request.method == 'POST':
        msg = {"code": normal_code, "msg": "success", "data": {}}
        req_dict = session.get("req_dict")
        if req_dict.get("mima") and "mima" not in dianyingxinxi.__table__.columns :
            del req_dict["mima"]
        if req_dict.get("password") and "password" not in dianyingxinxi.__table__.columns :
            del req_dict["password"]
        try:
            del req_dict["clicknum"]
        except:
            pass


        error = dianyingxinxi.updatebyparams(dianyingxinxi, dianyingxinxi, req_dict)
        if error!=None:
            msg['code'] = crud_error_code
            msg['msg'] = error


        return jsonify(msg)

# 删除接口
@main_bp.route("/python9532dr50/dianyingxinxi/delete", methods=['POST'])
def python9532dr50_dianyingxinxi_delete():
    '''
    '''
    if request.method == 'POST':
        msg = {"code": normal_code, "msg": "success", "data": {}}
        req_dict = session.get("req_dict")

        error=dianyingxinxi.delete(
            dianyingxinxi,
            req_dict
        )
        if error!=None:
            msg['code'] = crud_error_code
            msg['msg'] = error
        return jsonify(msg)

# 投票接口
@main_bp.route("/python9532dr50/dianyingxinxi/vote/<int:id_>", methods=['POST'])
def python9532dr50_dianyingxinxi_vote(id_):
    '''
    '''
    if request.method == 'POST':
        msg = {"code": normal_code, "msg": "success"}


        data= dianyingxinxi.getbyid(dianyingxinxi, dianyingxinxi, int(id_))
        for i in data:
            votenum=i.get('votenum')
            if votenum!=None:
                params={"id":int(id_),"votenum":votenum+1}
                error=dianyingxinxi.updatebyparams(dianyingxinxi,dianyingxinxi,params)
                if error!=None:
                    msg['code'] = crud_error_code
                    msg['msg'] = error
        return jsonify(msg)


@main_bp.route("/python9532dr50/dianyingxinxi/sectionStat/pingfen", methods=['GET'])
def python9532dr50_dianyingxinxi_sectionStat_pingfen():
    '''
    分段统计接口
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success", "data": []}
        where = " where 1=1"
        tablename=session.get("tablename")
        sql = """
            SELECT '四分以下' as pingfen,case when t.四分以下 is null then 0 else t.四分以下 end total
            from 
            (select
            sum(case when pingfen >= 0 and pingfen <= 4 then 1 else 0 end) as 四分以下,            sum(case when pingfen >= 4.1 and pingfen <= 6 then 1 else 0 end) as 四分到六分,            sum(case when pingfen >= 6.1 and pingfen <= 8 then 1 else 0 end) as 六分到八分,            sum(case when pingfen >= 8.1 and pingfen <= 10 then 1 else 0 end) as 八到十分            from dianyingxinxi """ +where+""" ) t union all            SELECT '四分到六分' as pingfen,case when t.四分到六分 is null then 0 else t.四分到六分 end total
            from 
            (select
            sum(case when pingfen >= 0 and pingfen <= 4 then 1 else 0 end) as 四分以下,            sum(case when pingfen >= 4.1 and pingfen <= 6 then 1 else 0 end) as 四分到六分,            sum(case when pingfen >= 6.1 and pingfen <= 8 then 1 else 0 end) as 六分到八分,            sum(case when pingfen >= 8.1 and pingfen <= 10 then 1 else 0 end) as 八到十分            from dianyingxinxi """ +where+""" ) t union all            SELECT '六分到八分' as pingfen,case when t.六分到八分 is null then 0 else t.六分到八分 end total
            from 
            (select
            sum(case when pingfen >= 0 and pingfen <= 4 then 1 else 0 end) as 四分以下,            sum(case when pingfen >= 4.1 and pingfen <= 6 then 1 else 0 end) as 四分到六分,            sum(case when pingfen >= 6.1 and pingfen <= 8 then 1 else 0 end) as 六分到八分,            sum(case when pingfen >= 8.1 and pingfen <= 10 then 1 else 0 end) as 八到十分            from dianyingxinxi """ +where+""" ) t union all            SELECT '八到十分' as pingfen,case when t.八到十分 is null then 0 else t.八到十分 end total
            from 
            (select
            sum(case when pingfen >= 0 and pingfen <= 4 then 1 else 0 end) as 四分以下,            sum(case when pingfen >= 4.1 and pingfen <= 6 then 1 else 0 end) as 四分到六分,            sum(case when pingfen >= 6.1 and pingfen <= 8 then 1 else 0 end) as 六分到八分,            sum(case when pingfen >= 8.1 and pingfen <= 10 then 1 else 0 end) as 八到十分            from dianyingxinxi """ +where+""" ) t  """

        data = db.session.execute(sql)
        data = data.fetchall()
        results = []
        for i in range(len(data)):
            result = {
                'pingfen': decimalEncoder(data[i][0]),
                'total': decimalEncoder(data[i][1])
            }
            results.append(result)
            
        msg['data'] = results
        return jsonify(msg)
@main_bp.route("/python9532dr50/dianyingxinxi/sectionStat/wxs", methods=['GET'])
def python9532dr50_dianyingxinxi_sectionStat_wxs():
    '''
    分段统计接口
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success", "data": []}
        where = " where 1=1"
        tablename=session.get("tablename")
        sql = """
            SELECT '二分以下' as wxs,case when t.二分以下 is null then 0 else t.二分以下 end total
            from 
            (select
            sum(case when wxs >= 0 and wxs <= 2 then 1 else 0 end) as 二分以下,            sum(case when wxs >= 2.1 and wxs <= 4 then 1 else 0 end) as 二到4分,            sum(case when wxs >= 4.1 and wxs <= 5 then 1 else 0 end) as 四分到五分            from dianyingxinxi """ +where+""" ) t union all            SELECT '二到4分' as wxs,case when t.二到4分 is null then 0 else t.二到4分 end total
            from 
            (select
            sum(case when wxs >= 0 and wxs <= 2 then 1 else 0 end) as 二分以下,            sum(case when wxs >= 2.1 and wxs <= 4 then 1 else 0 end) as 二到4分,            sum(case when wxs >= 4.1 and wxs <= 5 then 1 else 0 end) as 四分到五分            from dianyingxinxi """ +where+""" ) t union all            SELECT '四分到五分' as wxs,case when t.四分到五分 is null then 0 else t.四分到五分 end total
            from 
            (select
            sum(case when wxs >= 0 and wxs <= 2 then 1 else 0 end) as 二分以下,            sum(case when wxs >= 2.1 and wxs <= 4 then 1 else 0 end) as 二到4分,            sum(case when wxs >= 4.1 and wxs <= 5 then 1 else 0 end) as 四分到五分            from dianyingxinxi """ +where+""" ) t  """

        data = db.session.execute(sql)
        data = data.fetchall()
        results = []
        for i in range(len(data)):
            result = {
                'wxs': decimalEncoder(data[i][0]),
                'total': decimalEncoder(data[i][1])
            }
            results.append(result)
            
        msg['data'] = results
        return jsonify(msg)


# 分组统计接口
@main_bp.route("/python9532dr50/dianyingxinxi/group/<columnName>", methods=['GET'])
def python9532dr50_dianyingxinxi_group(columnName):
    '''
    分组统计接口
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success", "data": {}}
        req_dict = session.get("req_dict")
        userinfo = session.get("params")


        msg['data'] = dianyingxinxi.groupbycolumnname(dianyingxinxi,dianyingxinxi,columnName,req_dict)
        msg['data'] = msg['data'][:10]
        msg['data'] = [ {**i,columnName:str(i[columnName])} if columnName in i else i for i in msg['data']]
        json_filename='dianyingxinxi'+f'_group_{columnName}.json'

        where = ' where 1 = 1 '
        sql = "SELECT COUNT(*) AS total, " + columnName + " FROM dianyingxinxi " + where + " GROUP BY " + columnName
        with open(json_filename, 'w', encoding='utf-8') as f:
            f.write(json.dumps(msg['data'], indent=4, ensure_ascii=False))
        app.executor.submit(upload_to_hdfs, json_filename)
        app.executor.submit(MRMySQLAvg.run)
        return jsonify(msg)

# 按值统计接口
@main_bp.route("/python9532dr50/dianyingxinxi/value/<xColumnName>/<yColumnName>", methods=['GET'])
def python9532dr50_dianyingxinxi_value(xColumnName, yColumnName):
    '''
    按值统计接口,
    {
        "code": 0,
        "data": [
            {
                "total": 10.0,
                "shangpinleibie": "aa"
            },
            {
                "total": 20.0,
                "shangpinleibie": "bb"
            },
            {
                "total": 15.0,
                "shangpinleibie": "cc"
            }
        ]
    }
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success", "data": {}}
        req_dict = session.get("req_dict")
        userinfo = session.get("params")


        msg['data'] = dianyingxinxi.getvaluebyxycolumnname(dianyingxinxi,dianyingxinxi,xColumnName,yColumnName,req_dict)
        msg['data'] = msg['data'][:10]
        return jsonify(msg)

# 按日期统计接口
@main_bp.route("/python9532dr50/dianyingxinxi/value/<xColumnName>/<yColumnName>/<timeStatType>", methods=['GET'])
def python9532dr50_dianyingxinxi_value_riqi(xColumnName, yColumnName, timeStatType):
    '''
    按日期统计接口
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success", "data": {}}

        userinfo = session.get("params")
        where = ' where 1 = 1 '
        sql = ''
        if timeStatType == '日':
            sql = "SELECT DATE_FORMAT({0}, '%Y-%m-%d') {0}, sum({1}) total FROM dianyingxinxi {2} GROUP BY DATE_FORMAT({0}, '%Y-%m-%d')".format(xColumnName, yColumnName, where, '%Y-%m-%d')

        if timeStatType == '月':
            sql = "SELECT DATE_FORMAT({0}, '%Y-%m') {0}, sum({1}) total FROM dianyingxinxi {2} GROUP BY DATE_FORMAT({0}, '%Y-%m')".format(xColumnName, yColumnName, where, '%Y-%m')

        if timeStatType == '年':
            sql = "SELECT DATE_FORMAT({0}, '%Y') {0}, sum({1}) total FROM dianyingxinxi {2} GROUP BY DATE_FORMAT({0}, '%Y')".format(xColumnName, yColumnName, where, '%Y')

        data = db.session.execute(sql)
        data = data.fetchall()
        results = []
        for i in range(len(data)):
            result = {
                xColumnName: decimalEncoder(data[i][0]),
                'total': decimalEncoder(data[i][1])
            }
            results.append(result)
            
        msg['data'] = results
        json_filename='dianyingxinxi'+f'_value_{xColumnName}_{yColumnName}.json'
        with open(json_filename, 'w', encoding='utf-8') as f:
            f.write(json.dumps(results, indent=4, ensure_ascii=False))
        app.executor.submit(upload_to_hdfs, json_filename)
        app.executor.submit(MRMySQLAvg.run)

        return jsonify(msg)

# 按值统计(多)
@main_bp.route("/python9532dr50/dianyingxinxi/valueMul/<xColumnName>", methods=['GET'])
def python9532dr50_dianyingxinxi_valueMul(xColumnName):

    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success", "data": []}

        req_dict = session.get("req_dict")
        userinfo = session.get("params")

        where = ' where 1 = 1 '

        for item in req_dict['yColumnNameMul'].split(','):
            sql = "SELECT {0}, sum({1}) AS total FROM dianyingxinxi {2} GROUP BY {0} LIMIT 10".format(xColumnName, item, where)
            L = []
            data = db.session.execute(sql)
            data = data.fetchall() 
            for i in range(len(data)):
                result = {
                    xColumnName: decimalEncoder(data[i][0]),
                    'total': decimalEncoder(data[i][1])
                }
                L.append(result)
            msg['data'].append(L)

        return jsonify(msg)

# 按值统计(多)
@main_bp.route("/python9532dr50/dianyingxinxi/valueMul/<xColumnName>/<timeStatType>", methods=['GET'])
def python9532dr50_dianyingxinxi_valueMul_time(xColumnName):

    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success", "data": []}

        req_dict = session.get("req_dict")
        userinfo = session.get("params")
        timeStatType = req_dict['timeStatType']
        where = ' where 1 = 1 '

        for item in req_dict['yColumnNameMul'].split(','):
            sql = ''
            if timeStatType == '日':
                sql = "SELECT DATE_FORMAT({0}, '%Y-%m-%d') {0}, sum({1}) total FROM dianyingxinxi {2} GROUP BY DATE_FORMAT({0}, '%Y-%m-%d') LIMIT 10".format(xColumnName, item, where, '%Y-%m-%d')

            if timeStatType == '月':
                sql = "SELECT DATE_FORMAT({0}, '%Y-%m') {0}, sum({1}) total FROM dianyingxinxi {2} GROUP BY DATE_FORMAT({0}, '%Y-%m') LIMIT 10".format(xColumnName, item, where, '%Y-%m')

            if timeStatType == '年':
                sql = "SELECT DATE_FORMAT({0}, '%Y') {0}, sum({1}) total FROM dianyingxinxi {2} GROUP BY DATE_FORMAT({0}, '%Y') LIMIT 10".format(xColumnName, item, where, '%Y')
            L = []
            data = db.session.execute(sql)
            data = data.fetchall() 
            for i in range(len(data)):
                result = {
                    xColumnName: decimalEncoder(data[i][0]),
                    'total': decimalEncoder(data[i][1])
                }
                L.append(result)
            msg['data'].append(L)

        return jsonify(msg)

import math
def cosine_similarity(a, b):
    numerator = sum([a[key] * b[key] for key in a if key in b])
    denominator = math.sqrt(sum([a[key]**2 for key in a])) * math.sqrt(sum([b[key]**2 for key in b]))
    return numerator / denominator

#收藏协同算法
@main_bp.route("/python9532dr50/dianyingxinxi/autoSort2", methods=['GET'])
def python9532dr50_dianyingxinxi_autoSort2():
    if request.method == 'GET':
        user_ratings = {}
        req_dict = session.get("req_dict")
        userinfo = session.get("params")
        sql = "select * from storeup where type = 1 and tablename = 'dianyingxinxi' order by addtime desc"
        data = db.session.execute(sql)
        data_dict = [dict(zip(result.keys(), result)) for result in data.fetchall()]

        for item in data_dict:
            if user_ratings.__contains__(item["userid"]):
                ratings_dict = user_ratings[item["userid"]]
                if ratings_dict.__contains__(item["refid"]):
                    ratings_dict[str(item["refid"])]+=1
                else:
                    ratings_dict[str(item["refid"])] =1
            else:
                user_ratings[item["userid"]] = {
                    str(item["refid"]):1
                }
        sorted_recommended_goods=[]
        try:
            # 计算目标用户与其他用户的相似度
            similarities = {other_user: cosine_similarity(user_ratings[userinfo.get("id")], user_ratings[other_user])
                            for other_user in user_ratings if other_user != userinfo.get("id")}
            # 找到与目标用户最相似的用户
            most_similar_user = sorted(similarities, key=similarities.get, reverse=True)[0]
            # 找到最相似但目标用户未购买过的商品
            recommended_goods = {goods: rating for goods, rating in user_ratings[most_similar_user].items() if
                                 goods not in user_ratings[userinfo.get("id")]}
            # 按评分降序排列推荐
            sorted_recommended_goods = sorted(recommended_goods, key=recommended_goods.get, reverse=True)
        except:
            pass

        L = []
        where = " AND ".join([f"{key} = '{value}'" for key, value in req_dict.items() if key!="page" and key!="limit" and key!="order"and key!="sort"])
        if where:
            sql = f'''SELECT * FROM (SELECT * FROM dianyingxinxi WHERE {where}) AS table1 WHERE id IN ('{"','".join(sorted_recommended_goods)}') union all SELECT * FROM (SELECT * FROM dianyingxinxi WHERE {where}) AS table1 WHERE id NOT IN ('{"','".join(sorted_recommended_goods)}')'''
        else:
            sql ="select * from dianyingxinxi where id in ('%s"%("','").join(sorted_recommended_goods)+"') union all select * from dianyingxinxi where id not in('%s"%("','").join(sorted_recommended_goods)+"')"
        data = db.session.execute(sql)
        data_dict = [dict(zip(result.keys(), result)) for result in data.fetchall()]
        for online_dict in data_dict:
            for key in online_dict:
                if 'datetime.datetime' in str(type(online_dict[key])):
                    online_dict[key] = online_dict[key].strftime(
                        "%Y-%m-%d %H:%M:%S")
                elif 'datetime' in str(type(online_dict[key])):
                    online_dict[key] = online_dict[key].strftime(
                        "%Y-%m-%d %H:%M:%S")
                else:
                    pass
            L.append(online_dict)

        return jsonify({"code": 0, "msg": '',  "data":{"currPage":1,"totalPage":1,"total":1,"pageSize":5,"list": L[0:int(req_dict['limit'])]}})


# 总数量
@main_bp.route("/python9532dr50/dianyingxinxi/count", methods=['GET'])
def python9532dr50_dianyingxinxi_count():
    '''
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success",  "data": 0}
        req_dict = session.get("req_dict")
        userinfo = session.get("params")


        msg['data']  = dianyingxinxi.count(dianyingxinxi, dianyingxinxi, req_dict)

        return jsonify(msg)

# 统计接口
@main_bp.route("/python9532dr50/dianyingxinxi/remind/<columnName>/<type>", methods=['GET'])  #
def python9532dr50_dianyingxinxi_remind(columnName,type):
    '''
    '''
    if request.method == 'GET':
        msg = {"code": normal_code, 'count': 0}
        # 组合查询参数
        params = session.get("req_dict")
        remindstart = 0
        remindend =9999990
        if int(type)==1:#数字
            if params.get('remindstart') == None and params.get('remindend') != None:
                remindstart = 0
                remindend = int(params['remindend'])
            elif params.get('remindstart') != None and params.get('remindend') == None:
                remindstart = int(params['remindstart'])
                remindend = 999999
            elif params.get('remindstart') == None and params.get('remindend') == None:
                remindstart = 0
                remindend = 999999
            else:
                remindstart = params.get('remindstart')
                remindend =  params.get('remindend')
        elif int(type)==2:#日期
            current_time=int(time.time())
            if params.get('remindstart') == None and params.get('remindend') != None:
                starttime=current_time-60*60*24*365*2
                params['remindstart'] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(starttime))
                endtime=current_time+60*60*24*params.get('remindend')
                params['remindend'] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(endtime))

            elif params.get('remindstart') != None and params.get('remindend') == None:
                starttime= current_time - 60 * 60 * 24 * params.get('remindstart')
                params['remindstart']=time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(starttime))
                endtime=current_time+60*60*24*365*2
                params['remindend'] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(endtime))
            elif params.get('remindstart') == None and params.get('remindend') == None:
                starttime = current_time - 60 * 60 * 24 * 365 * 2
                params['remindstart'] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(starttime))
                endtime = current_time + 60 * 60 * 24 * 365 * 2
                params['remindend'] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(endtime))

        data = dianyingxinxi.getbetweenparams(
            dianyingxinxi,
            dianyingxinxi,
            columnName,
            {
                "remindStart": remindstart,
                "remindEnd": remindend
            }
        )

        msg['count'] = len(data)
        return jsonify(msg)

#分类列表
@main_bp.route("/python9532dr50/dianyingxinxi/lists", methods=['GET'])
def python9532dr50_dianyingxinxi_lists():
    if request.method == 'GET':
        msg = {"code": normal_code, "msg": "success", "data": []}
        list,_,_,_,_ = dianyingxinxi.page(dianyingxinxi,dianyingxinxi,{})
        msg['data'] = list
        return jsonify(msg)


源码项目、定制开发、文档报告、PPT、代码答疑
希望和大家多多交流!!

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值