【Python】读取最新Excel表存至MySQL数据库

最新推荐文章于 2022-11-11 10:26:03 发布

百分之50

最新推荐文章于 2022-11-11 10:26:03 发布

阅读量1.4k

点赞数 2

分类专栏： Python

本文链接：https://blog.csdn.net/bfz_50/article/details/102532084

版权

Python 专栏收录该内容

9 篇文章 0 订阅

订阅专栏

实现效果：从指定目录下读取新增的点位表并存入MySQL数据库
阅读提示：请直接将程序复制到你的IDE下运行，并根据报错信息进行修改。

1、读取Excel

读取Excel文件有很多种方法，比较简单的可以用openpyxl模块

1.1、安装openpyxl模块

打开终端，输入

pip install openpyxl

即可安装，如果使用pycharm，可以进入Preferences(Win下应该是Setting)-Project:文件名.py-Project Interpreter，然后点击底部的+号进入Available Packages，在搜索框输入"openpyxl"，选中搜索到的项目，点击底部的Install Package即可。

1.2、导入模块与打开Excel

在该python程序目录下新建一个Excel，我们可以起名为“测试读取Excel.xlsx”，然后在Sheet1下的A1表格内任意输入一段字符，如123，之后在python程序中输入如下内容：

import openpyxl

# 打开excel文件,获取工作簿对象
wb1 = openpyxl.load_workbook('测试读取Excel.xlsx')

sheets = wb1.sheetnames
print('包含工作表如下：', sheets, type(sheets))
sheet1 = wb1['Sheet1']
print('sheet1坐标为（1，1）的内容如下：', sheet1.cell(1, 1).value)

输出：

包含工作表如下： ['Sheet1', 'Sheet2', 'Sheet3'] <class 'list'>
sheet1坐标为（1，1）的内容如下： 123

Process finished with exit code 0

可以看到打印了Excel表格的表单和sheet1中A1的内容。

2、定时读取新数据

需求上要求表格内容更新就要读取到并存入数据库，所以我们要判断文件的修改时间，每次只读取最新修改的Excel文件

2.1、获取最新的文件

# -*- coding:UTF-8 -*-
import datetime
import os
import os.path

path_name = "../1、读取Excel/"
file_list = os.listdir(path_name)


def get_file_time(file_name):
    if os.path.isdir(path_name + file_name):
        return 0
    else:
        return os.path.getmtime(path_name + file_name)


# 获取最后改动的文件名
file_list.sort(key=get_file_time)

# 获取文件时间
d = datetime.datetime.fromtimestamp(os.path.getmtime(path_name + file_list[-1]))

print('最后改动的文件是：' + file_list[-1] + "，时间：" + d.strftime("%Y年%m月%d日 %H时%M分%S秒"))

说明：

1、“…/1、读取Excel/”是指上一层目录下的《1、读取Excel》文件夹，由于我这个程序放在和Excel表格不同的文件夹下，所以通过相对目录去定位文件夹，如果你Excel文件和程序放在同一个目录下，用“./”即可，也可使用绝对路径。

2、os.listdir会获取该目录下的所有文件和文件夹，并返回一个列表

3、get_file_time函数的作用，首先判断该文件是否为文件夹，如果是文件夹则返回0，否则返回文件的修改时间

4、列表的sort方法用于排序列表，key参数用于指定排序的方式，要求传入函数。具体操作步骤为，首先将list里面每个数据都传入函数，得到一组返回值，然后将原列表按照新的返回值进行排序。举个例子

def f1(l1):
    return len(l1)


aList = ['Google', 'Firefox', '360', 'QQ']
aList.sort(key=f1)    # 按长度排序
print(aList)

输出：

['QQ', '360', 'Google', 'Firefox']

5、file_list[-1]是指按照文件修改时间排列后，列表的最后一个数据。

6、fromtimestamp将时间戳格式转化为字符串日期时间，如1570858824.155304转化为2019-10-12 13:40:24.155304，然后strftime将字符串时间转化为自定义排列的格式。

2.2、获取最新的Excel

通过len(file_list)得出文件列表的长度，然后从最后一个逐渐往前遍历，每个将文件名与“.xlsx”匹配，如果包含子串关系则不返回None，将该文件名传给new_file后结束循环，如果该文件不是xlsx格式，则继续寻找下一个最新文件。

import re
import os.path
import os
import datetime

path_name = "../1、读取Excel/"


def get_file_time(file_name):
    if os.path.isdir(path_name + file_name):
        return 0
    else:
        return os.path.getmtime(path_name + file_name)


def find_new_file():
    file_list = os.listdir(path_name)
    # 按照时间排序文件列表
    file_list.sort(key=get_file_time)

    new_file = file_list[-1]
    for i in range(len(file_list), 0, -1):
        if re.search(".xlsx", file_list[i - 1]) is not None:
            new_file = file_list[i - 1]
            break

    # 获取文件改动时间
    d = datetime.datetime.fromtimestamp(os.path.getmtime(path_name + new_file))
    print('最后改动的文件是：' + new_file + "，时间：" + d.strftime("%Y年%m月%d日 %H时%M分%S秒"))

    return path_name + new_file


find_new_file()

输出：

最后改动的文件是：测试读取Excel.xlsx，时间：2019年10月08日 16时19分34秒

Process finished with exit code 0

3、存至MySQL

批量存入MySQL，由于MySQL需要元组数据，所以我们需要先将Excel数据转为元组，然后以列表组织元组，批量导入。

3.1、Excel转化为元组列表

1、先定义一个元组列表，列表的长度按照Excel的行来定义，列表里存入元组，每个元组存放一行数据

2、一般第一行为表头，不需要读，所以列表长度为行数-1

3、从sheet第二行开始读，sheet.cell函数需要从1开始，假如sheet有100行，sheet1_max_row=100，下标为1-100，而list是从0开始，下标0-99，由于第一行不读，将所以需要sheet第2行放入list第0行，将sheet第100行放入list第98行，for循环是读不到最后一个数的，所以循环终止为sheet1_max_row + 1=101，当i1=100时读到最后一行，然后填入list98行，所以填入list时需要-2

4、元组内元素不可被修改，但可以添加元素，所以需要在内循环前先插入第一个元素，后续通过不断的添加补充后续数据

import openpyxl


def read_excel(file_name):
    # 打开excel文件,获取工作簿对象
    wb1 = openpyxl.load_workbook(file_name)
    # 从工作薄中获取表单(sheet)对象
    sheets = wb1.sheetnames
    print('该Excel文件包含工作表如下：', sheets, type(sheets))
    sheet1 = wb1.active
    return sheet1


# 将Excel数据转化为元组列表，核心算法之一
def excel_to_tuple_list(sheet1):

    sheet1_max_row = sheet1.max_row
    sheet1_max_column = sheet1.max_column

    print("最大行：{}".format(sheet1_max_row))
    print("最大列：{}".format(sheet1_max_column))

    # 定义一个元组列表，用来存放Excel数据
    # 一行就是一个元组，100行row就是100个元组，第一行不读，故-1
    # 一个元组有7个元素
    tuple_list_excel_data = [0 for i2 in range(sheet1_max_row-1)]

    # 首行表头忽略，从第二行开始读
    for i1 in range(2, sheet1_max_row + 1):

        # 将每行第一个数据放入元组
        tuple1 = (sheet1.cell(i1, 1).value,)

        # 从每行第二个数开始
        for j1 in range(2, sheet1_max_column + 1):
            # 之后每次添加一个元素，直到该行结束
            tuple2 = (sheet1.cell(i1, j1).value,)
            tuple1 = tuple1 + tuple2

        # 处理完毕后将该元组放入列表，列表从0开始
        tuple_list_excel_data[i1 - 2] = tuple1

    # 检查列表数据是否完善
    for tuple3 in tuple_list_excel_data:
        print(tuple3)

    return tuple_list_excel_data

# 文件名可以用find_new_file获取
excel_to_tuple_list(read_excel("./测试Excel.xlsx"))

执行上述代码即可实现读取Excel并转化为元组列表，并显示出来。

3.2、创建数据库

3.2.1、安装MySQL

以下仅针对MacOS，Windows请自行查找教程。

1、由于官网速度特别慢，建议找其他网站下载MySQL，比如

https://www.cr173.com/mac/430352.html

2、安装时Configuration有一步要选择Use Legacy Password Encryption，否则会跟低版本不兼容

3、用户名默认为root，同时配置密码

4、进入系统偏好设置，在最底下选择MySQL

5、点击Stars MySQL Server即可

6、使用Navicat for MySQL或Sequel Pro登录MySQL数据库，输入之前配置即可，host可输入127.0.01或localhost

3.2.2、创建数据库

1、首先需要安装mysql.connector，可以使用

pip install mysql-connector

安装，如果使用pycharm，也可以进入Preferences(Win下应该是Setting)-Project:文件名.py-Project Interpreter，然后点击底部的+号进入Available Packages，在搜索框输入"mysql.connector"，选中搜索到的项目，点击底部的Install Package即可。

2、使用以下代码即可创建数据库

import mysql.connector

db1 = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="12345678",
)

cursor1 = db1.cursor()


# 创建数据库名称有规范，不能用#和.等符号，可以用下划线
create_db_name = "test_db1"
cursor1.execute("CREATE DATABASE " + create_db_name)
# 重复创建会报错

运行成功后到Navicat刷新一下即可看到新创建的数据库

3.2.3、创建表单和主键

使用以下代码即可创建表单和主键，表单名称不可纯数字。

通过connect方法连接数据库，通过cursor方法获取对数据库的操作对象，然后通过execute方法执行相关语句。

import mysql.connector

db1 = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="12345678",
    database="test_db1"

)
cursor1 = db1.cursor()

'''
创建表的时候我们一般都会设置一个主键（PRIMARY KEY），主键内容不可重复，否则报错
'''
# 不可以纯数字
table_name = "table_name1"
create_table = ('CREATE TABLE ' + table_name +
                ' (' +
                'data varchar(255) PRIMARY KEY, ' +
                'time varchar(255), ' +
                'order_number varchar(255), ' +
                'start_time varchar(255), ' +
                'stop_time varchar(255), ' +
                'time_quantum varchar(255), ' +
                'type varchar(255)' +
                ')')

cursor1.execute(create_table)
cursor1.execute("SHOW TABLES")

for x in cursor1:
    print(x)

创建完毕后刷新，即可在Navicat看到这个数据库下新增的表单和7个字段。

3.3、插入数据

使用以下代码可以实现将Excel文件存至MySQL。

批量插入数据使用executemany方法，注意该方法的第一个参数指定插入的格式，第二个接收数据，只能接收元组格式，所以我在一开始将数据转化为了元组列表以方便批量插入。

使用IGNORE关键字可以忽略主键重复的数据，避免主键重复插入报错。

import openpyxl
import mysql.connector


def read_excel(file_name):
    # 打开excel文件,获取工作簿对象
    wb1 = openpyxl.load_workbook(file_name)
    # 从工作薄中获取表单(sheet)对象
    sheets = wb1.sheetnames
    print('该Excel文件包含工作表如下：', sheets, type(sheets))
    sheet1 = wb1.active
    return sheet1


# 将Excel数据转化为元组列表，核心算法之一
def excel_to_tuple_list(sheet1):

    sheet1_max_row = sheet1.max_row
    sheet1_max_column = sheet1.max_column

    print("最大行：{}".format(sheet1_max_row))
    print("最大列：{}".format(sheet1_max_column))

    # 定义一个元组列表，用来存放Excel数据
    # 一行就是一个元组，100行row就是100个元组，第一行不读，故-1
    # 一个元组有7个元素
    tuple_list_excel_data = [0 for i2 in range(sheet1_max_row-1)]

    # 首行表头忽略，从第二行开始读
    for i1 in range(2, sheet1_max_row + 1):

        # 将每行第一个数据放入元组
        tuple1 = (sheet1.cell(i1, 1).value,)

        # 从每行第二个数开始
        for j1 in range(2, sheet1_max_column + 1):
            # 之后每次添加一个元素，直到该行结束
            tuple2 = (sheet1.cell(i1, j1).value,)
            tuple1 = tuple1 + tuple2

        # 处理完毕后将该元组放入列表，列表从0开始
        tuple_list_excel_data[i1 - 2] = tuple1

    # 检查列表数据是否完善
    for tuple3 in tuple_list_excel_data:
        print(tuple3)

    return tuple_list_excel_data


def send_mysql(tuple_list):
    # 连接到MySQL
    # 这里要修改成自己的MySQL地址
    db1 = mysql.connector.connect(
        host="localhost",
        user="root",
        passwd="12345678",
        database="test_db1"
    )
    cursor1 = db1.cursor()
    cursor1.execute("SHOW TABLES")
    for x in cursor1:
        print('将插入MySQL表：', x)

    sql = ("INSERT IGNORE INTO 204_207 " +
           "(data, time, order_number, start_time, stop_time, time_quantum, type) " +
           "VALUES (%s, %s, %s, %s, %s, %s, %s)")

    val = tuple_list

    cursor1.executemany(sql, val)

    db1.commit()  # 数据表内容有更新，必须使用到该语句
    print(cursor1.rowcount, "记录插入成功。\n")

# 文件名可以用find_new_file获取
send_mysql(excel_to_tuple_list(read_excel("./测试Excel.xlsx")))

4、定时读取与按钮中止

后面部分比较简单，所以将两部分合并在一起，以下是完整代码。

1、定义一个类A用于跑线程，线程中使用true循环实现不断读取，当检测到i1为0时结束循环

2、使用tkinter模块创建一个窗口，定义该窗口的长宽和坐标，以及对应的文字和按钮

3、不同的按钮对应不同的功能，分别对应启动读取和停止读取

# -*- coding:UTF-8 -*-
import datetime
import os
import os.path
import openpyxl
import mysql.connector
import re
import time
import tkinter
import threading

# 此处需要改成点位表所在路径，可使用绝对路径
path_name = "./Excel文件目录/"
# 全局变量用于按钮操作停止线程读取
i1 = 1


def get_file_time(file_name):
    if os.path.isdir(path_name + file_name):
        return 0
    else:
        return os.path.getmtime(path_name + file_name)


def find_new_file():
    file_list = os.listdir(path_name)
    # 按照时间排序文件列表
    file_list.sort(key=get_file_time)

    new_file = file_list[-1]
    for i in range(len(file_list), 0, -1):
        if re.search(".xlsx", file_list[i - 1]) is not None:
            new_file = file_list[i - 1]
            break

    # 获取文件改动时间
    d = datetime.datetime.fromtimestamp(os.path.getmtime(path_name + new_file))
    print('最后改动的文件是：' + new_file + "，时间：" + d.strftime("%Y年%m月%d日 %H时%M分%S秒"))

    return path_name + new_file


def read_excel(file_name):
    # 打开excel文件,获取工作簿对象
    wb1 = openpyxl.load_workbook(file_name)
    # 从工作薄中获取表单(sheet)对象
    sheets = wb1.sheetnames
    print('该Excel文件包含工作表如下：', sheets, type(sheets))
    sheet1 = wb1.active
    return sheet1


# 将Excel数据转化为元组列表，核心算法之一
def excel_to_tuple_list(sheet1):

    sheet1_max_row = sheet1.max_row
    sheet1_max_column = sheet1.max_column

    print("最大行：{}".format(sheet1_max_row))
    print("最大列：{}".format(sheet1_max_column))

    # 定义一个元组列表，用来存放Excel数据
    # 一行就是一个元组，100行row就是100个元组，第一行不读，故-1
    # 一个元组有7个元素
    tuple_list_excel_data = [0 for i2 in range(sheet1_max_row-1)]

    # 首行表头忽略，从第二行开始读
    for i1 in range(2, sheet1_max_row + 1):

        # 将每行第一个数据放入元组
        tuple1 = (sheet1.cell(i1, 1).value,)

        # 从每行第二个数开始
        for j1 in range(2, sheet1_max_column + 1):
            # 之后每次添加一个元素，直到该行结束
            tuple2 = (sheet1.cell(i1, j1).value,)
            tuple1 = tuple1 + tuple2

        # 处理完毕后将该元组放入列表，列表从0开始
        tuple_list_excel_data[i1 - 2] = tuple1

    # 检查列表数据是否完善
    # for tuple3 in tuple_list_excel_data:
    #     print(tuple3)

    return tuple_list_excel_data


def send_mysql(tuple_list):
    # 连接到MySQL
    db1 = mysql.connector.connect(
        host="localhost",
        user="root",
        passwd="12345678",
        database='bm'
    )
    cursor1 = db1.cursor()
    cursor1.execute("SHOW TABLES")
    for x in cursor1:
        print('将插入MySQL表：', x)

    sql = ("INSERT IGNORE INTO 204_207 " +
           "(data, time, order_number, start_time, stop_time, time_quantum, type) " +
           "VALUES (%s, %s, %s, %s, %s, %s, %s)")

    val = tuple_list

    cursor1.executemany(sql, val)

    db1.commit()  # 数据表内容有更新，必须使用到该语句
    print(cursor1.rowcount, "记录插入成功。\n")


class A(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        global i1
        i1 = 1
        while True:
            send_mysql(excel_to_tuple_list(read_excel(find_new_file())))
            if i1 == 0:
                break
            time.sleep(950)


def start():
    label['text'] = '读取中。。。'
    a = A()
    a.start()


def stop():
    label['text'] = '停止读取'
    global i1
    i1 = 0


window = tkinter.Tk(className='读取Excel写入mysql')
# "窗口宽x窗口高+窗口位于屏幕x轴+窗口位于屏幕y轴"
window.geometry('300x100+400+300')

label = tkinter.Label(window)
label['text'] = '点击启动开始读取目录下最新Excel文件'
label.pack()

button = tkinter.Button(window)
button['text'] = '启动'

button['command'] = start
button.pack()

button2 = tkinter.Button(window)
button2['text'] = '停止'
button2['command'] = stop
button2.pack()

window.mainloop()

百分之50

关注

2
点赞
踩
14

收藏

觉得还不错? 一键收藏
0
评论
【Python】读取最新Excel表存至MySQL数据库

实现效果：从指定目录下读取新增的点位表并存入MySQL数据库1、读取Excel读取Excel文件有很多种方法，比较简单的可以用openpyxl模块1.1、安装openpyxl模块打开终端，输入pip install openpyxl即可安装，如果使用pycharm，可以进入Preferences(Win下应该是Setting)-Project:文件名.py-Project Inter...
复制链接

扫一扫

专栏目录