背景
在Linux上部署的时候, 使用pandas
模块读取Excel, 然后pymysql
模块入库, 结果发生了错误
Traceback (most recent call last):
File "/usr/local/lib64/python3.6/site-packages/tornado/web.py", line 1702, in _execute
result = method(*self.path_args, **self.path_kwargs)
File "/root/.jenkins/workspace/git_python_web_linux/handler.py", line 1135, in post
self._replyPrivateMessage(message)
File "/root/.jenkins/workspace/git_python_web_linux/handler.py", line 1221, in _replyPrivateMessage
self._junming(filepath)
File "/root/.jenkins/workspace/git_python_web_linux/handler.py", line 1263, in _junming
sdk.mysql(config.db_qihua).insmany('qihua.log_bjmantis_cost',data_db)
File "/root/.jenkins/workspace/git_python_web_linux/sdk.py", line 65, in insmany
self._cur.executemany(sqlInsert,listTuple)
File "/usr/local/lib64/python3.6/site-packages/pymysql/cursors.py", line 182, in executemany
self.rowcount = sum(self.execute(query, arg) for arg in args)
File "/usr/local/lib64/python3.6/site-packages/pymysql/cursors.py", line 182, in <genexpr>
self.rowcount = sum(self.execute(query, arg) for arg in args)
File "/usr/local/lib64/python3.6/site-packages/pymysql/cursors.py", line 146, in execute
query = self.mogrify(query, args)
File "/usr/local/lib64/python3.6/site-packages/pymysql/cursors.py", line 125, in mogrify
query = query % self._escape_args(args, conn)
TypeError: not all arguments converted during string formatting
尝试1: pymysql模块的escape_string方法
from pymysql.converters import escape_string
...
data_database = [escape_string(x) for x in data]
...
还是报错, 所以应该不是字符的问题.
打印出来肉眼观察, 也没有字符问题
尝试2: 修改pandas.read_excel引擎
修改引擎, 是可以解决字符问题, 但出现了新的问题
...
excel = pandas.read_excel(filepath, engine='openpyxl')
...
正常入库. 但是, 无法读取xls
文件
Traceback (most recent call last):
File "/usr/local/lib64/python3.6/site-packages/tornado/web.py", line 1702, in _execute
result = method(*self.path_args, **self.path_kwargs)
File "/root/.jenkins/workspace/git_python_web_linux/handler.py", line 1135, in post
self._replyPrivateMessage(message)
File "/root/.jenkins/workspace/git_python_web_linux/handler.py", line 1224, in _replyPrivateMessage
self._qingzhilan(filepath)
File "/root/.jenkins/workspace/git_python_web_linux/handler.py", line 1296, in _qingzhilan
df = pandas.read_excel(filepath,sheet_name=['Sheet1'], engine='openpyxl')
File "/usr/local/lib64/python3.6/site-packages/pandas/util/_decorators.py", line 296, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/pandas/io/excel/_base.py", line 304, in read_excel
io = ExcelFile(io, engine=engine)
File "/usr/local/lib64/python3.6/site-packages/pandas/io/excel/_base.py", line 867, in __init__
self._reader = self._engines[engine](self._io)
File "/usr/local/lib64/python3.6/site-packages/pandas/io/excel/_openpyxl.py", line 480, in __init__
super().__init__(filepath_or_buffer)
File "/usr/local/lib64/python3.6/site-packages/pandas/io/excel/_base.py", line 353, in __init__
self.book = self.load_workbook(filepath_or_buffer)
File "/usr/local/lib64/python3.6/site-packages/pandas/io/excel/_openpyxl.py", line 492, in load_workbook
filepath_or_buffer, read_only=True, data_only=True, keep_links=False
File "/usr/local/lib/python3.6/site-packages/openpyxl/reader/excel.py", line 345, in load_workbook
data_only, keep_links, rich_text)
File "/usr/local/lib/python3.6/site-packages/openpyxl/reader/excel.py", line 123, in __init__
self.archive = _validate_archive(fn)
File "/usr/local/lib/python3.6/site-packages/openpyxl/reader/excel.py", line 93, in _validate_archive
raise InvalidFileException(msg)
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.
尝试3: 回退xlrd版本
注: Linux平台上默认应该用pip3
替代pip
pip list | grep xlrd # 查看现有xlrd版本
# pip list | findstr 'xlrd' ## Windows平台查包
pip uninstall xlrd # 卸载包
pip install xlrd==1.2.0 # 安装包
回退到1.2.0版本后,我之前遇到的错误消失了。我的代码再次可以正常运行,我能够继续我的数据分析工作。
总结
Linux平台使用pandas.read_excel
并且可能会有xls
和xlsx
格式的话, 建议回退xlrd
版本到1.2.0来解决字符和格式问题