用python读取Excel数据，并插入到MySQL数据库

最新推荐文章于 2024-07-04 04:10:13 发布

chen072086

最新推荐文章于 2024-07-04 04:10:13 发布

阅读量2.2k

点赞数 1

文章标签：数据库

版权声明：未经本人允许请勿转载 https://blog.csdn.net/weixin_43179111/article/details/82745390

文章目录

一、业务场景

最近在工作中遇到这样一个问题：为了系统功能权限设置，收集了若干Excel文件，表中数据样式如下：
数据已经过处理
其中标三角号的表示需要权限，无权限则无任何数据。
目前初级需求是将姓名，员工号，开通权限的报表编号整理出来，并插入MySQL数据库，插入效果如下图：

二、用到的包

1.xlrd

python中用于Excel文件读取的包，在线文档地址：https://xlrd.readthedocs.io/en/latest/

主要方法：

    file = xlrd.open_workbook(file_path) #打开Excel文件
    sheet_1 = file.sheet_by_index(0) #根据sheet页的排序选取sheet
    row_content = sheet_1.row_values(3) #获取指定行的数据，返回列表，排序自0开始
    row_number = sheet_1.nrows #获取有数据的最大行数
    col_number = sheet_1.ncols #获取有数据的最大列数

2.pymysql

1官方网址：https://pypi.org/project/PyMySQL/

主要方法：

db = pymysql.connect('localhost','username','password','dbname') #建立数据库连接
cusor = db.cursor() # 使用 cursor() 方法创建一个游标对象 cursor

# SQL 插入语句
sql = "INSERT INTO EMPLOYEE(FIRST_NAME, \
       LAST_NAME, AGE, SEX, INCOME) \
       VALUES ('%s', '%s', '%d', '%c', '%d' )" % \
       ('Mac', 'Mohan', 20, 'M', 2000)
try:
   # 执行sql语句
   cursor.execute(sql)
 # 提交到数据库执行
   db.commit()
except:
   # 发生错误时回滚
   db.rollback()
# 关闭数据库连接
db.close()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18

三、Excel数据格式化思路

1.数据结构选择

最开始我是没想一步到位，直接整理好然后插数据库的。我是想能不能把Excel里的数据格式化下来，存入文件中。最理想的存储结构就是字典了，把每一行数据都看作是一个字典，行表头作为键，数据作为值。每个报表有权限记为1，无权限记为0。整理出来的结果大概是这样子：

row1 = {'name':'张三','id':'3333','apart':'震惊部','office':'震惊科','A':0,'B':1,'C':1,'D':1,'E':1,'F':1,'G':0,'H':0}

具体代码实现如下：

def parse(file_path):
    file = xlrd.open_workbook(file_path)
    sheet_1 = file.sheet_by_index(0)
    report_name = sheet_1.row_values(2) #获取报表名称行数据
    row_num = sheet_1.nrows #获取行数
    report_num = sheet_1.ncols #获取列数
    for i in range(3,row_num): #循环每一行数据
        row = sheet_1.row_values(i) #获取行数据
        dict = {}
        dict['name']= "".join(row[0].split()) #姓名
        sap_id = "".join(str(row[1]).split())
        dict['id'] = sap_id.split('.')[0] #编号
        dict['partment'] = "".join(row[2].split()) #部门
        dict['office'] = "".join(row[3].split()) #科室
        for j in range(4,report_num):
            if row[j] is not '': #如果行内没有数据，则对应报表名称无权限，设为0，否则为1
                dict[report_name[j]] = 1
            else:
                dict[report_name[j]] = 0
        print(dict)

2.数据存储

本例用的存储方案是使用json.dumps()的格式化存储，将字典转化为字符串存储，使用时再转换回来，实现代码如下：

def write_line(dict):
    line = '{}\n'.format(json.dumps(dict))
    f.write(line)

3.多个文件的情况

如果提报上来的文档有多个，手工输入文件名也是一项繁琐的工作，其实在我的这项工作中就有20个文件左右。所以又使用os包做了一个文档遍历的函数，具体代码如下：

def listdir(path): #传入根目录
    file_list = []
    for file in os.listdir(path):
        file_path = os.path.join(path, file) #获取绝对路径
        if os.path.isdir(file_path): #如果还是文件夹，就继续迭代本函数
            listdir(file_path)
        elif os.path.splitext(file_path)[1] == '.xls' or os.path.splitext(file_path)[1] == '.xlsx': #判断文件是否是Excel文件
            file_list.append(file_path)
    return file_list #返回Excel文件路径列表

4.主程序

将上述几个函数结合一下，主程序就可以有了：

if __name__ == '__main__':
    path = r'C:\Users\Administrator\Desktop\权限提报汇总表-v1.0'
    file_list = listdir(path)
    f = open('portal.txt','w',encoding='utf-8')
    #print(file_list)
    for file_name in file_list:
        print('start translating',file_name)
        parse(file_name)
        print('translate complete',file_name)
    f.close()

四、MySQL数据插入思路

1.格式化文本解析

首先再使用json.loads()方法，将字符串解析回来

def get_data(file_name):
    with open(file_name,'r') as f: #读取文件
        content = f.read()
        list = content.split('\n') #以换行符分割，每一个字典作为列表中的一项
        dict_list = []
        for item in list:
            dict_list.append(json.loads(item)) #循环恢复字典结构
        return dict_list

2.反向查找值为1的报表名

这里用到了字典的反向查找，查找值为1的报表名，并将用户姓名和ID一块写入元组。由于一个用户不止一个报表权限，所以发现一个写一个。

    data_list = []
    for k in dict:
        if dict[k] == 1:
            data_list.append((dict['name'],dict['id'],k))

这样我们就可以得到最终要插入数据库的结构了。

3.插入数据库

data_list已经是最终的结构化数据了，而且会有不止一条数据，所以我们用了executemany()这个方法，用于批量执行SQL语句。

def insert_db(data):
    db = pymysql.connect('localhost','username','password','dbname')
    cusor = db.cursor()
    sql = """INSERT INTO `USER_INFO` (`USER_NAME`,`SAP_ID`,`MODEL_NAME`) VALUES (%s,%s,%s)
    """
    try:
        cusor.executemany(sql,data) #sql执行
        db.commit() #提交到数据库
    except Exception as e: #获取报错信息
        print(e)
    db.close()

4.大功告成

五、注意事项&踩坑

解析Excel出来后，有时数据会出现’/xa0’这个字符，百度以后发现是不间断空白符，解决方案：分割再组合dict['partment'] = "".join(row[2].split()) #部门
插入数据库时，有一个报错，提示信息：pymysql.err.ProgrammingError: (1064, “You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘1.00748912842339’’, ‘‘0’’, ‘‘0’’, ‘‘16.114739990234’’, ‘‘0.00759455235674977’’, ’ at line 1”)（原报错信息找不到了，搜的我的一个回帖，帖子链接），查了半天，发现是在变量代换的时候，如果本身是字符串，就不用再给%s加引号了，然后就好了。