Admin File Upload: Flask Backend Data Cleaning and Processing (Lab Project, Part 8)

暗雾飘扬

Last modified on 2023-12-01 13:18:44


Column: python机器学习_实验项目 (Python machine learning lab project) · Tags: flask, python, backend, data mining

First published on 2023-12-01 13:17:05

Article link: https://blog.csdn.net/m0_72541842/article/details/134728532



Admin data cleaning

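Data cleaning and processing are a prerequisite for training a machine learning model.
It is a good idea to run the code in Jupyter first and add it to the backend only once it works.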

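My requirement here:
clean the data from three tables into a single table that is used to train the model. The workflow is shown in the figure below: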
[Figure: workflow of the data-cleaning process]

1. Checking the file extension

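Once the backend receives a file, it should check that the file extension is allowed
(Excel, for example, has two formats: the older .xls and the newer .xlsx; the function below accepts only .xlsx)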

# Check whether the file extension is allowed (only .xlsx is accepted here)
ALLOWED_EXTENSIONS = {'xlsx'}

def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
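If you also want to accept the older .xls format, a minimal sketch could let pathlib extract the extension instead of splitting the string by hand. Accepting both extensions is an assumption here, and pandas reads .xls files through the separate xlrd package, so that package would also have to be installed.

```python
from pathlib import PurePath

# Assumption: accept both the newer .xlsx and the older .xls format.
# pandas reads .xls through the separate xlrd package, so install it as well.
ALLOWED_EXTENSIONS = {'.xlsx', '.xls'}


def allowed_file(filename: str) -> bool:
    # PurePath(...).suffix is the last extension including the dot, e.g. '.xlsx';
    # it is '' when the name has no extension at all
    return PurePath(filename).suffix.lower() in ALLOWED_EXTENSIONS


if __name__ == '__main__':
    for name in ['数据1.xlsx', 'report.XLS', 'notes.csv', 'xlsx', 'archive.tar.xlsx']:
        print(name, allowed_file(name))
```

Because the comparison is done after `.lower()`, upper-case extensions such as `.XLS` are accepted too, and a bare name like `xlsx` without a dot is rejected.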

2. Saving the files to a temporary directory

Frontend code:

<el-upload ref="uploadRef" class="upload-demo" :action="uploadUrl" :auto-upload="false" :on-success="handleSuccess"
    :on-error="handleError" :with-credentials="true" :multiple="true"> 
    <!-- with-credentials: send cookies and other credentials with the upload request -->
    <template #trigger>
      <el-button type="primary">选择文件</el-button>
    </template>

    <el-button class="ml-3" type="success" @click="submitUpload">
      上传文件
    </el-button>
  </el-upload>


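// Assumes the <script> block contains: import { ref } from 'vue' and import { ElMessage } from 'element-plus'
setup() {
    const uploadRef = ref(null)
    const uploadUrl = 'http://localhost:5000/upload' // upload endpoint on the backend server

    const submitUpload = () => {
      uploadRef.value.submit()  // upload the selected files (auto-upload is turned off)
    }

    const handleSuccess = (response, file, fileList) => {
      // setup() has no `this` (and `that` was never defined), so call ElMessage directly
      ElMessage.success('Upload succeeded')
      console.log('Upload succeeded')
      console.log(response) // response data returned by the backend
      // other success handling can go here
    }

    const handleError = (error, file, fileList) => {
      ElMessage.error('Upload failed')
      console.error('Upload failed')
      console.error(error) // error details
      // other error handling can go here
    }

    return {
      uploadRef,
      uploadUrl,
      submitUpload,
      handleSuccess,
      handleError
    }
  },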
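The page that calls uploadUrl usually runs on a different origin (for example a Vite dev server) than the Flask backend on port 5000, and with-credentials asks the browser to send cookies. In that case the browser only accepts the response if the backend sends matching CORS headers, and Access-Control-Allow-Origin may not be `*`. Below is a minimal sketch using plain Flask; the origin http://localhost:5173, the allowed request headers and the simplified upload view are assumptions for illustration, and the Flask-CORS extension is a common alternative.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical origin of the Vue dev server (Vite's default port); replace with your own.
FRONTEND_ORIGIN = 'http://localhost:5173'


@app.after_request
def add_cors_headers(response):
    # Only allow the known frontend origin; with credentials, browsers reject '*'.
    if request.headers.get('Origin') == FRONTEND_ORIGIN:
        response.headers['Access-Control-Allow-Origin'] = FRONTEND_ORIGIN
        response.headers['Access-Control-Allow-Credentials'] = 'true'
        response.headers['Access-Control-Allow-Methods'] = 'GET, POST, OPTIONS'
        # Assumption: the request headers an upload request may carry
        response.headers['Access-Control-Allow-Headers'] = 'Content-Type, X-Requested-With'
        response.headers.add('Vary', 'Origin')
    return response


@app.route('/upload', methods=['POST'])
def upload():
    # Simplified stand-in for admin_data_filter
    return jsonify({'status': 'success'}), 200


if __name__ == '__main__':
    client = app.test_client()
    # Simulate the browser's preflight request; Flask answers OPTIONS automatically
    resp = client.options('/upload', headers={
        'Origin': FRONTEND_ORIGIN,
        'Access-Control-Request-Method': 'POST',
    })
    print(resp.status_code, resp.headers.get('Access-Control-Allow-Origin'))
```

Flask answers OPTIONS for every route on its own, and after_request hooks also run on that automatic response, so the preflight gets the same headers as the real POST.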

Backend code:
Use Python's standard os module to save the file to a temporary directory of your choice.

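import os
from flask import jsonify, request

# Save a single uploaded table into a directory, where it waits to be cleaned.
# Register this view on the URL used by the frontend, e.g. @app.route('/upload', methods=['POST'])
def admin_data_filter():
    # Get the single file sent by the frontend; el-upload sends it under the field name 'file'.
    # .get() returns None instead of raising an error when the field is missing.
    excel_file = request.files.get('file')

    # 1. Directory for files waiting to be processed (choose your own)
    path = r'C:\Test\files'  # the r prefix makes this a raw string, so backslashes are not treated as escape characters
    # os.makedirs creates the folder together with any missing parent folders.
    # Without exist_ok=True it raises an error when the folder already exists;
    # with it, there is no need to check os.path.exists(path) first.
    os.makedirs(path, exist_ok=True)

    # 2. Save the uploaded file
    if excel_file is None or excel_file.filename == '':  # no file was sent
        print("Empty file")
        return jsonify({  # a missing file is a client error, so return 400
            'status': 'fail',
            'message': 'Failed to upload the Excel file'
        }), 400

    if allowed_file(excel_file.filename):  # the file exists and its extension is correct
        print("File received and extension is correct")
        # os.path.basename drops any directory part a client might put in the file name
        filename = os.path.basename(excel_file.filename)
        excel_file.save(os.path.join(path, filename))  # save the file into the folder (path)
        # return 200
        return jsonify({
            'status': 'success',
            'message': 'Files uploaded successfully'
        }), 200
    else:
        # wrong extension: return 400
        return jsonify({
            'status': 'error',
            'message': 'Wrong file extension'
        }), 400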
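The view above is not bound to a URL yet. Here is a minimal, self-contained sketch of how it might be registered on the /upload route that the frontend calls, with Flask's test client standing in for el-upload, which by default sends each selected file in its own request under the field name 'file'. The temporary folder replaces C:\Test\files only so the sketch runs on any system; that folder and the shortened error messages are assumptions.

```python
import io
import os
import tempfile

from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumption: a fresh temporary folder stands in for C:\Test\files so the sketch runs anywhere
UPLOAD_FOLDER = tempfile.mkdtemp()


def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in {'xlsx'}


# Bind the view to the URL used by uploadUrl in the frontend
@app.route('/upload', methods=['POST'])
def admin_data_filter():
    excel_file = request.files.get('file')  # el-upload's default field name is 'file'
    if excel_file is None or excel_file.filename == '':
        return jsonify({'status': 'fail', 'message': 'No file received'}), 400
    if not allowed_file(excel_file.filename):
        return jsonify({'status': 'error', 'message': 'Wrong file extension'}), 400
    # basename() drops any directory part a client might add to the file name
    filename = os.path.basename(excel_file.filename)
    excel_file.save(os.path.join(UPLOAD_FOLDER, filename))
    return jsonify({'status': 'success', 'message': 'Files uploaded successfully'}), 200


if __name__ == '__main__':
    client = app.test_client()
    # Simulate el-upload: one multipart request per selected file
    for name in ['数据1.xlsx', '数据2.xlsx', '数据3.xlsx']:
        resp = client.post('/upload', data={'file': (io.BytesIO(b'placeholder'), name)},
                           content_type='multipart/form-data')
        print(name, resp.status_code)
    print(len(os.listdir(UPLOAD_FOLDER)), 'files saved')
```

Testing the route this way, without a browser, is a quick check before wiring up the Vue page.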

3. Cleaning the three tables in the storage directory into a single file, '清洗过的数据.xlsx' (cleaned data)

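(The frontend must upload all three tables, named 数据1.xlsx, 数据2.xlsx and 数据3.xlsx, because these are the file names the backend looks for.)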

import os
import pandas as pd
from flask import jsonify

# Register this view on a route of your choice, e.g. @app.route('/process', methods=['POST'])
def admin_data_process():
    # 1. Directory where the cleaned data is stored
    download_folder = r'C:\Test\processed_files'  # raw string: backslashes are not treated as escape characters
    # Create the folder and any missing parents; exist_ok=True avoids an error if it already exists
    os.makedirs(download_folder, exist_ok=True)

    # 2. Read the data to be processed
    upload_folder = r'C:\Test\files'  # where the uploaded files were saved
    file_path1 = os.path.join(upload_folder, '数据1.xlsx')
    file_path2 = os.path.join(upload_folder, '数据2.xlsx')
    file_path3 = os.path.join(upload_folder, '数据3.xlsx')
    if not os.path.exists(file_path1):
        print("数据1.xlsx does not exist")
        return jsonify({
            'status': '数据1.xlsx does not exist',
            'error': 'File not found'
        }), 404
    if not os.path.exists(file_path2):
        print("数据2.xlsx does not exist")
        return jsonify({
            'status': '数据2.xlsx does not exist',
            'error': 'File not found'
        }), 404
    if not os.path.exists(file_path3):
        print("数据3.xlsx does not exist")
        return jsonify({
            'status': '数据3.xlsx does not exist',
            'error': 'File not found'
        }), 404
    # Read the Excel files with pandas (reading .xlsx requires the openpyxl package)
    try:
        # 2.1 Read 数据1.xlsx
        ori_score_data = pd.read_excel(file_path1)
        # 2.2 Read 数据2.xlsx
        ori_job_data = pd.read_excel(file_path2)
        # 2.3 Read 数据3.xlsx
        total_data = pd.read_excel(file_path3)

    except Exception as e:
        return jsonify({
            'status': 'Failed to read the data with pandas',
            'error': f'Failed to read Excel file: {str(e)}'
        }), 500
    print("Files read successfully with pandas")
    
    # 3. Process the data
    # Write your own cleaning logic here (run it in Jupyter first, then copy it into the backend).
    # The result must be a pandas DataFrame named train_data.
    # 4. Save train_data to '清洗过的数据.xlsx' in download_folder
    try:
        download_file_path = os.path.join(download_folder, '清洗过的数据.xlsx')  # path of the new file
        train_data.to_excel(download_file_path, index=False)  # save the DataFrame as an Excel file (needs openpyxl)
        print("Data cleaned and saved to the directory")
        # All processing finished, return 200
        return jsonify({
            'status': 'success',
            'message': 'Processed and saved the file successfully'
        }), 200
    except Exception as e:
        print("An error occurred while cleaning the data")
        return jsonify({
            'status': 'error',
            'message': f'Failed to process and save the file: {str(e)}'
        }), 500
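Step 3 leaves the cleaning logic to the reader. As one possible shape for it, here is a hypothetical sketch that assumes the three tables share a key column (called student_id here purely for illustration) and that cleaning means dropping rows without a key, removing duplicate keys, joining the tables on the key and dropping rows that still have missing values. The column names and sample data are invented. Keeping the logic in a plain function makes it easy to try in Jupyter first, as recommended above, and then call it inside admin_data_process as `train_data = build_train_data(ori_score_data, ori_job_data, total_data)`.

```python
import pandas as pd


def build_train_data(ori_score_data, ori_job_data, total_data, key='student_id'):
    """Hypothetical cleaning step: join three tables on a shared key column."""
    cleaned = []
    for df in (ori_score_data, ori_job_data, total_data):
        df = df.dropna(subset=[key])           # rows without a key cannot be joined
        df = df.drop_duplicates(subset=[key])  # keep the first row for each key
        cleaned.append(df)
    score, job, total = cleaned
    # Inner joins keep only keys that appear in all three tables
    train_data = total.merge(score, on=key, how='inner').merge(job, on=key, how='inner')
    # Drop rows that still have missing values, then renumber the index
    return train_data.dropna().reset_index(drop=True)


if __name__ == '__main__':
    # Invented sample data: student 2 is duplicated, student 3 has no score
    score = pd.DataFrame({'student_id': [1, 2, 2, 3], 'score': [80, 90, 90, None]})
    job = pd.DataFrame({'student_id': [1, 2, 3, 4], 'job': ['dev', 'ops', 'qa', 'pm']})
    total = pd.DataFrame({'student_id': [1, 2, 3], 'major': ['CS', 'SE', 'CS']})
    print(build_train_data(score, job, total))
```

Whether to use inner joins or keep unmatched rows (`how='left'`) depends on your data; inner joins are chosen here so every training row has values from all three tables.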

4. Downloading the cleaned file on the frontend

Backend code:

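import os
from flask import jsonify, send_file

# Let the client download the cleaned data
# Register this view on the URL the frontend requests, e.g. @app.route('/download', methods=['GET'])
def admin_data_download():
    # 1. Location where the cleaned data is stored
    download_folder = r'C:\Test\processed_files'
    file_path = os.path.join(download_folder, '清洗过的数据.xlsx')

    # Return 404 if '清洗过的数据.xlsx' does not exist yet
    if not os.path.exists(file_path):
        return jsonify({'error': 'File not found'}), 404

    # send_file sends the file to the client for download;
    # as_attachment=True makes the browser download it as an attachment instead of opening it
    return send_file(file_path, as_attachment=True)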

Frontend code:

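// Download the data
// Note: '/download' is a relative path, so this assumes axios's baseURL or a dev-server proxy points at the Flask backend
getDownload(){
      axios.get('/download', {
        responseType: 'blob'                    // set the response type to blob so that binary data is returned
      })
        .then(response => {
          // Save the returned file locally
          const url = window.URL.createObjectURL(new Blob([response.data]));  // wrap the binary response in a Blob object
          const a = document.createElement('a');
          a.href = url;                        // set the <a> element's href attribute to the temporary URL
          a.download = '清洗过的数据.xlsx';     // set the <a> element's download attribute to the file name
          document.body.appendChild(a);        // add the <a> element to the page body
          a.click();                           // simulate a user click on <a> to start the download
          document.body.removeChild(a);        // remove the <a> element from the page body
          window.URL.revokeObjectURL(url);     // revoke the temporary URL to free resources
          // show a success message
          this.$message.success('File downloaded successfully')
        })
        .catch(error => {
          console.error('Download error:', error);
          this.$message.error('Failed to download the file')
        })
    }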
