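While optimizing the performance of a Python project recently, I found that using Cython to translate Python code into C and then compile it into a .so shared library (an extension module) can speed things up considerably.
Take the list concatenation code I wrote earlier as an example: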
# extend.py
import datetime

def main():
    start_time = datetime.datetime.now()
    test_list = []
    for chunk in range(20000000):
        test_list.extend([chunk])
    end_time = datetime.datetime.now()
    # total_seconds() keeps the microseconds zero-padded, unlike joining
    # .seconds and .microseconds by hand
    print(f'{(end_time - start_time).total_seconds():.6f}')
Before optimization, the run time is:
$ python3 -c "from extend import main;main()"
2.196843
After optimization:
$ python3 -c "from extend import main;main()"
0.629144
That is roughly a 3.5x speedup (about 2.20 s down to 0.63 s).
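For more repeatable numbers, the timing can be moved out of the function under test. The sketch below is my own assumption, not the measurement code used for the figures above: `build_list` and `best_time` are hypothetical names, and it uses `time.perf_counter` with the best of several runs instead of a single `datetime` difference.

```python
import time


def build_list(n):
    """Same loop as main() above, with the list size as a parameter."""
    test_list = []
    for chunk in range(n):
        test_list.extend([chunk])
    return test_list


def best_time(func, *args, repeat=3):
    """Return the shortest wall-clock time in seconds over `repeat` calls."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

Calling `best_time(build_list, 20_000_000)` once against the plain .py module and once against the compiled one should give a comparison that is less sensitive to one-off noise.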
The build process is described below.
First, prepare a setup.py file:
# setup.py
from setuptools import setup  # distutils is deprecated and was removed in Python 3.12
from Cython.Build import cythonize
setup(ext_modules=cythonize('extend.py'))
Then run the build:
$ python3 setup.py build_ext --inplace
Compiling extend.py because it changed.
[1/1] Cythonizing extend.py
/usr/local/lib/python3.8/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /Users/microfat/test2/extend.py
tree = Parsing.p_module(s, pxd, full_module_name)
running build_ext
building 'extend' extension
creating build
creating build/temp.macosx-11-x86_64-3.8
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -I/usr/local/opt/openssl@1.1/include -I/usr/local/opt/sqlite/include -I/usr/local/opt/tcl-tk/include -I/usr/local/Cellar/python@3.8/3.8.10/Frameworks/Python.framework/Versions/3.8/include/python3.8 -c extend.c -o build/temp.macosx-11-x86_64-3.8/extend.o
extend.c:2998:5: warning: 'tp_print' is deprecated [-Wdeprecated-declarations]
0,
^
/usr/local/Cellar/python@3.8/3.8.10/Frameworks/Python.framework/Versions/3.8/include/python3.8/cpython/object.h:260:5: note: 'tp_print' has been explicitly marked deprecated here
Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
^
/usr/local/Cellar/python@3.8/3.8.10/Frameworks/Python.framework/Versions/3.8/include/python3.8/pyport.h:515:54: note: expanded from macro 'Py_DEPRECATED'
#define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
^
1 warning generated.
clang -bundle -undefined dynamic_lookup -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk build/temp.macosx-11-x86_64-3.8/extend.o -L/usr/local/lib -L/usr/local/opt/openssl@1.1/lib -L/usr/local/opt/sqlite/lib -L/usr/local/opt/tcl-tk/lib -o /Users/microfat/test2/extend.cpython-38-darwin.so
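Note that the --inplace option places the generated .so file in the same directory as the Python source file. Without it, the .so file is written inside the build directory next to the source instead.
The resulting directory structure is: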
.
├── build
│   └── temp.macosx-11-x86_64-3.8
│       └── extend.o
├── extend.c
├── extend.cpython-38-darwin.so
├── extend.py
└── setup.py
2 directories, 5 files
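A few details about the build process:
According to the official documentation, the source is first translated into a .c file, and the .c file is then compiled into a .so file.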
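Because extend.py and the compiled .so end up side by side, it is worth confirming which one an `import` actually picked up. As far as I know, CPython tries extension modules before .py sources in the same directory, but the sketch below checks it directly. `is_compiled_extension` is a hypothetical helper name; it relies only on the module's `__file__` attribute and on `importlib.machinery.EXTENSION_SUFFIXES`.

```python
import importlib.machinery


def is_compiled_extension(module):
    """Return True if `module` was loaded from a compiled extension file.

    Built-in modules that have no __file__ attribute are reported as False.
    """
    path = getattr(module, "__file__", None)
    if not path:
        return False
    # EXTENSION_SUFFIXES lists the platform's suffixes, e.g. ones ending in
    # ".so" on Linux/macOS or ".pyd" on Windows.
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))
```

After the build, `import extend` followed by `is_compiled_extension(extend)` should report whether the timing run used the .so or fell back to extend.py.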
Compiling Python code into a .so file not only improves speed, it also offers some protection for the source code.
Next, I plan to try compiling a Flask project.
Update 2021-06-24
Compiling on Windows 10 today produced an error:
Error compiling Cython file:
------------------------------------------------------------
...
device_info_detail['location_la'] = ''
device_info_detail['location_lo'] = ''
device_info_list.append(device_info_detail)
count+=1
print(f"\r{count}",end='',flush=True)
^
------------------------------------------------------------
spider.py:99:34: Expected ')', found '='
Traceback (most recent call last):
File "C:\Users\gaoxi\Desktop\AdBlue_dispenser\setup.py", line 3, in <module>
setup(ext_modules=cythonize('spider.py'))
File "C:\Users\gaoxi\AppData\Local\Programs\Python\Python39\lib\site-packages\Cython\Build\Dependencies.py", line 1102, in cythonize
cythonize_one(*args)
File "C:\Users\gaoxi\AppData\Local\Programs\Python\Python39\lib\site-packages\Cython\Build\Dependencies.py", line 1225, in cythonize_one
raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: spider.py
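But there is no syntax error here, and compiling on the Mac raised no error.
After some searching, it turns out that Cython falls back to Python 2 syntax when language_level is not set (the FutureWarning in the build log above says the same). There are three ways to fix this: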
- compiler_directives
extensions = cythonize(
    extensions,
    compiler_directives={'language_level': "3"}  # or "2" or "3str"
)
- language_level
extensions = cythonize(extensions, language_level = "3")
- Add a comment at the top of the Python file:
# cython: language_level=3
Reference: https://stackoverflow.com/a/53992016/7151777
Reference: https://stackoverflow.com/a/35582289/7151777
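When many files need the third option, the header comment can be added by a script. This is only a sketch under my own assumptions: `ensure_language_level` is a hypothetical helper, it works on source text rather than files, and it assumes the directive only counts when it sits in the leading comment block of the file.

```python
import re

DIRECTIVE = "# cython: language_level=3"
_DIRECTIVE_RE = re.compile(r"^#\s*cython:.*\blanguage_level\s*=")


def ensure_language_level(source):
    """Return `source` with a language_level directive in its header.

    The text is returned unchanged if the leading comment block already
    sets language_level. A shebang or encoding line in the first two
    lines stays ahead of the inserted directive.
    """
    lines = source.splitlines(keepends=True)
    insert_at = 0
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            break  # first line of real code: the header block is over
        if _DIRECTIVE_RE.match(stripped):
            return source
        if i < 2 and (stripped.startswith("#!") or "coding" in stripped):
            insert_at = i + 1
    lines.insert(insert_at, DIRECTIVE + "\n")
    return "".join(lines)
```

Reading each .py file, passing its text through this function and writing the result back before running the build keeps the setting with the source, independent of setup.py.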
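Today, after compiling the modules in my code, I packaged the entry file main.py with pyinstaller. Everything looked normal, but running the exe failed because third-party libraries could not be found.
My guess is that once a module is compiled, its imports are hidden from pyinstaller, which therefore cannot discover them. The fix is simple.
Option 1:
Import the third-party libraries in any uncompiled Python file (such as the entry file main.py):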
# main.py
import requests
import pandas
import bs4
...
Option 2:
Explicitly declare the third-party libraries in the packaging command:
$ pyinstaller -F --hidden-import requests --hidden-import pandas --hidden-import bs4 main.py
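The `--hidden-import` list for option 2 can be derived from the original .py source before it is compiled, instead of being maintained by hand. The following is a minimal sketch, assuming the source is valid Python 3; `hidden_import_args` is a hypothetical helper, and the standard-library filter only takes effect on Python 3.10+ where `sys.stdlib_module_names` exists.

```python
import ast
import sys

# Empty on Python < 3.10, in which case nothing is filtered out.
_STDLIB = getattr(sys, "stdlib_module_names", frozenset())


def hidden_import_args(source):
    """Build PyInstaller --hidden-import arguments from a module's source text."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x"
            modules.add(node.module)
    args = []
    for name in sorted(modules):
        if name.split(".")[0] in _STDLIB:
            continue
        args += ["--hidden-import", name]
    return args
```

The returned list can be spliced into the pyinstaller command line, for example via `subprocess.run(["pyinstaller", "-F", *args, "main.py"])`.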
Update 2021-06-25
Today's build failed with the error below. After looking into it, the cause turned out to be interference from the __init__.py file.
creating build/lib.macosx-11-x86_64-3.8/models
clang -bundle -undefined dynamic_lookup -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk build/temp.macosx-11-x86_64-3.8/./swagger_server/models/current_activity_response.o -L/usr/local/lib -L/usr/local/opt/openssl@1.1/lib -L/usr/local/opt/sqlite/lib -L/usr/local/opt/tcl-tk/lib -o build/lib.macosx-11-x86_64-3.8/models/current_activity_response.cpython-38-darwin.so
copying build/lib.macosx-11-x86_64-3.8/models/current_activity_response.cpython-38-darwin.so -> models
error: could not create 'models/current_activity_response.cpython-38-darwin.so': No such file or directory
.
├── xxx
├── models
│ ├── __init__.py
│ ├── current_activity_response.py
│ ├── xxx
│ └── xxx
└── setup.py
# setup.py
from setuptools import setup
from Cython.Build import cythonize
setup(
    ext_modules=cythonize('./models/phone_response.py', language_level = "3")
)
The fix is to delete or rename the __init__.py file.
Reference: https://github.com/jmschrei/pomegranate/issues/382#issuecomment-613676285
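If the package still needs its __init__.py at run time, the rename can be limited to the duration of the build. This is a sketch of the workaround above, not a Cython or setuptools feature: `init_py_hidden` and the `.bak` suffix are hypothetical choices, and it assumes no file with the backup name already exists.

```python
import contextlib
from pathlib import Path


@contextlib.contextmanager
def init_py_hidden(package_dir):
    """Temporarily rename package_dir/__init__.py and restore it afterwards."""
    init_file = Path(package_dir) / "__init__.py"
    backup = init_file.with_name("__init__.py.bak")
    renamed = init_file.exists()
    if renamed:
        init_file.rename(backup)
    try:
        yield
    finally:
        # Restore the file even if the build inside the block fails.
        if renamed:
            backup.rename(init_file)
```

Running the build inside `with init_py_hidden("models"):` keeps the file out of the way while the build runs and puts it back afterwards.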