Python database insertion problems (speed, delimiters, and encoding)

maybe_fate

Categories: python, database. Tags: python, mysql, encode, replace/sub

Article link: https://blog.csdn.net/maybe_fate/article/details/79920443

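1. Today I tried reading data from a CSV file (or any other text file) and inserting it into a MySQL database. My original plan was to read one row and insert one row at a time, but after 9 hours only about 700,000 of the 2 million rows had been inserted. That clearly would not do, so I switched to reading 10,000 rows into a list first and then inserting them with a single multi-row statement: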

insert into table values(row1, row2, ...)

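With this approach, all 2 million rows were inserted in just 90 seconds.

The code is below: the original row-by-row version first, then the batched version.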

import csv
import time
from DateBase.connect_DB import connect_db

def insert_user_infor():
    db = connect_db()
    cursor = db.cursor()
    file_path = './ceshi.csv'
    with open(file_path, encoding='utf-8') as file_read:
        lines = csv.reader(file_read)
        for line in lines:
            name = line[1]
            gender = line[2]
            location = line[3]
            uid = line[4]

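            sql = "insert into user_infor_copy values('%s', '%s', '%s', '%s')" % (name, gender, location, uid)
            try:
                # One statement and one commit per row
                cursor.execute(sql)
                db.commit()
            except Exception as exc:
                # Report the failure instead of silently discarding the row
                print('insert failed:', exc)
                db.rollback()


def main():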
    time1 = time.time()
    print(time1)
    insert_user_infor()
    time2 = time.time()
    print(time2)
    print(time2 - time1)

if __name__ == '__main__':
    main()

'''This code reads data from the CSV file and inserts it into the database.'''

import csv
import time
from DateBase.connect_DB import connect_db

def read_user_infor():
    db = connect_db()
    # cursor = db.cursor()

    file_path = './weibo_users.csv'
    # file_path = './ceshi.csv'
    with open(file_path, encoding='utf-8') as file_read:
        lines = csv.reader(file_read)
        line_count = 0  # row counter used to trigger the multi-row insert
        user_infor_list = []
        for line in lines:
            # Sample row: false,小神万里,m,湖北 武汉,44528425,农民
            name = line[1]
            gender = line[2]
            location = line[3]
            if '其他' in location:
                continue
            uid = line[4]

            line_count += 1
            user_infor_list.append([name, gender, location, uid])
            if line_count % 10000 == 0:
                # A full batch of 10,000 rows is ready: insert it
                print(line_count)  # progress, printed once per batch
                insert_user_infor(db, user_infor_list)
                user_infor_list = []  # start a new, empty batch

        # Insert the remaining rows (fewer than 10,000)
        insert_user_infor(db, user_infor_list)

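def insert_user_infor(db, user_infor_list):
    if not user_infor_list:
        return  # nothing to insert; an empty VALUES list is invalid SQL
    cursor = db.cursor()

    # Build one multi-row statement: insert ... VALUES (...),(...),...
    # join() puts a comma between rows and never after the last one.
    # Note: a value containing a single quote still breaks this statement (see section 3).
    rows = ["('%s', '%s', '%s', '%s')" % (data[0], data[1], data[2], data[3])
            for data in user_infor_list]
    sql = "insert into user_infor_copy VALUES" + ",".join(rows)
    try:
        cursor.execute(sql)
        db.commit()
    except Exception as exc:
        # Report the failure instead of silently losing the whole batch
        print('batch insert failed:', exc)
        db.rollback()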

def main():
    time1 = time.time()
    print(time1)
    read_user_infor()
    time2 = time.time()
    print(time2)
    print(time2 - time1)

if __name__ == '__main__':
    main()
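
As a variation on the batched version above, here is a minimal sketch that lets the driver build the multi-row statement. It assumes `db` is a pymysql-style connection like the one returned by `connect_db()`; the names `batches` and `insert_in_batches` are made up for this illustration and are not part of the original code. PyMySQL's `executemany` is documented to improve performance for multi-row INSERT statements, and the `%s` placeholders (without quotes around them) leave the escaping to the driver. I have not timed this against the 90-second figure above.

```python
from itertools import islice

BATCH_SIZE = 10000  # same batch size as in the article


def batches(rows, size=BATCH_SIZE):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(db, rows, size=BATCH_SIZE):
    """Insert rows batch by batch and return the number of rows sent.

    `db` is assumed to be a pymysql-style connection (hypothetical helper).
    """
    sql = "insert into user_infor_copy values (%s, %s, %s, %s)"
    cursor = db.cursor()
    total = 0
    for chunk in batches(rows, size):
        cursor.executemany(sql, chunk)  # one call per batch of rows
        db.commit()                     # one commit per batch
        total += len(chunk)
    return total
```

With the CSV layout used above, the rows could be passed as a generator such as `(line[1:5] for line in csv.reader(file_read))`, so the whole file never has to sit in memory at once.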

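2. First, the encoding problem

Chinese text inserted into the database (the body of a Weibo post, for example) often contains special symbols such as emoji. If the database uses MySQL's utf8 encoding, which stores at most 3 bytes per character, the insert fails with an error like this: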

Incorrect string value: '\xF0\x9F\x98\xAD",...' for column 'commentContent' at row 1

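I later found an expert's article titled "彻底解决：Incorrect string value: '\xF0\x9F\x98\xAD",...' for column 'commentContent' at row 1" ("A complete fix for Incorrect string value ...") and finally had a solution:

1) Change the encoding of the MySQL database to utf8mb4. This is easy to do in Navicat.

2) Change the Python code that connects to the database, i.e. the charset setting: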

import pymysql

db = pymysql.connect(
    host='localhost',
    user='root',
    db='chen',  # the database name; pay attention to this one
    password='*******',
    charset='utf8mb4'  # must match the database encoding
)

With this in place, these "illegal" strings can be inserted.
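
To see why utf8mb4 is needed, here is a small sketch that decodes the four bytes from the error message. They form a single emoji character outside the Basic Multilingual Plane, which takes 4 bytes in UTF-8. The helper `needs_utf8mb4` is a name made up for this illustration, not part of any library.

```python
raw = b'\xF0\x9F\x98\xAD'   # the bytes quoted in the error message
char = raw.decode('utf-8')  # a single character
print(hex(ord(char)), len(raw))


def needs_utf8mb4(text):
    """Return True if any character needs 4 bytes in UTF-8 (code point above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)
```

A check like `needs_utf8mb4(content)` could be used to find the rows that would fail in a 3-byte utf8 column before attempting the insert.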

3. Delimiters

I covered the pitfalls of delimiters in an earlier article. The problem I ran into today is a different one. Consider this code:

sql = "insert into table values('%s', '%s')" % (name, content)
cursor = db.cursor()
cursor.execute(sql)
db.commit()

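This code looks fine, but remember that the sql passed to cursor.execute(sql) is just a string. Inside it, values are separated by "," and each value is delimited by "'", the ASCII single quote. If a variable we pass in itself contains a single quote, the statement breaks badly: the program only builds a string and cannot tell a quote inside the data from a delimiter. So single and double quotes in the incoming values have to be dealt with first.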
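
An alternative to cleaning the quotes out of the data is to let the driver do the quoting through placeholders. The sketch below uses the standard-library sqlite3 module as a stand-in so that it runs without a MySQL server; that substitution and the table name `weibo` are assumptions for illustration only. With pymysql the placeholder is `%s` instead of `?`, written without surrounding quotes, and the values go in as the second argument of `cursor.execute`.

```python
import sqlite3

# Stand-in database so the sketch runs anywhere; the article itself uses MySQL.
db = sqlite3.connect(':memory:')
cursor = db.cursor()
cursor.execute("create table weibo (name text, content text)")  # hypothetical table

name = "O'Brien"
content = 'He said "it\'s fine", really'

# Placeholders instead of % formatting: the driver handles the quoting,
# so quotes inside the values are stored as ordinary characters.
cursor.execute("insert into weibo values (?, ?)", (name, content))
db.commit()

stored = cursor.execute("select name, content from weibo").fetchone()
print(stored)
```

The values come back unchanged, quotes included, so nothing has to be stripped from the text just to make the insert succeed.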

Here I ran into another problem. While handling single and double quotes, I found that code like this:

import re

pattern = re.compile(r'\'')
content = re.sub(pattern, '', content)

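did not effectively remove the single quotes, and I don't know why. (The pattern does match the ASCII ' character, so the text may have contained typographic quotes such as ’, which are different characters.) As a result I could not strip single and double quotes with one piece of code and had to handle them separately: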

import re


def clear_txt(weibo_txt):
    # Remove surrounding whitespace, then all spaces, tabs and newlines
    weibo_txt = weibo_txt.strip()
    weibo_txt = weibo_txt.replace(' ', '')
    weibo_txt = weibo_txt.replace('\t', '')
    weibo_txt = weibo_txt.replace('\n', '')

    # Remove noise symbols, including the double quote
    pattern = re.compile(r'[#*._哈%&◆"★]')
    weibo_txt = pattern.sub('', weibo_txt)

    # Remove Chinese quotes and brackets, the single quote, and ASCII letters
    pattern = re.compile(r"[“”【】'A-Za-z]")
    weibo_txt = pattern.sub('', weibo_txt)

    return weibo_txt
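
As a quick check on the quote handling, here is a small sketch with made-up sample strings. It suggests that an escaped single quote in a pattern does remove ASCII apostrophes, that one character class can cover both quote types, and that typographic quotes are untouched unless they are listed explicitly. Whether typographic quotes were the cause of the failure described above is only a guess.

```python
import re

ascii_text = "it's \"quoted\""  # ASCII quotes only
curly_text = "it’s “quoted”"    # typographic quotes only

only_single = re.sub(r"\'", '', ascii_text)           # removes the ASCII '
both = re.sub(r"['\"]", '', ascii_text)               # removes ' and " in one pass
curly_after = re.sub(r"['\"]", '', curly_text)        # typographic quotes survive
all_quotes = re.sub(r"['\"‘’“”]", '', curly_text)     # list them explicitly

print(only_single, both, curly_after, all_quotes, sep='\n')
```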

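4. I have to admit that these problems kept me stuck for a whole day. Hopefully I have learned something from it.

If you need the complete code, leave a comment below.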
