博主转译文章,原文见:
http://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/
I've spent much of the last decade using Python for my research, teaching Python tools to other scientists and developers, and developing Python tools for efficient data manipulation, scientific and statistical computation, and visualization. The Python-for-data landscape has changed immensely since I first installed NumPy and SciPy via a flickering CRT display. Among the new developments since those early days, the one with perhaps the broadest impact on my daily work has been the introduction of conda, the open-source cross-platform package manager first released in 2012.
In the four years since its initial release, many words have been spilt introducing conda and espousing its merits, but one thing I have consistently noticed is the number of misconceptions that seem to remain in the (often fervent) discussions surrounding this tool. I hope in this post to do a small part in putting these myths and misconceptions to rest.
I've tried to be as succinct as I can, but if you want to skim this article and get the gist of the discussion, you can read each heading along with the bold summary just below it.
我在过去十年一直在使用Python做研究,教其他学者和开发人员使用Python工具,为高效的数据操作、科学与统计计算、以及可视化开发Python工具。自从我借助闪烁的CRT显示器第一次安装Numpy和Scipy以来,使用Python研究数据的景象已经发生巨大的变化。从早先的时期以来,对我日常工作影响也许最广泛的一项进步就是conda的引入,一个从2012年首次发布的开源跨平台的包管理器。
自从首次发布的四年来,人们已经费了很多口舌来介绍conda以及弘扬其价值,但我注意到在围绕其(通常热烈)的讨论中始终存在种种误解。我希望在本帖中为平息这些谬见与误解尽一份力。
我尽可能地简洁,但如果你想略读本文并获得讨论的主旨,你可以阅读每个标题和其下的粗体结语。
Myth #1: Conda is a distribution, not a package manager
Reality: Conda is a package manager; Anaconda is a distribution. Although Conda is packaged with Anaconda, the two are distinct entities with distinct goals.
A software distribution is a pre-built and pre-configured collection of packages that can be installed and used on a system. A package manager is a tool that automates the process of installing, updating, and removing packages. Conda, with its "conda install", "conda update", and "conda remove" sub-commands, falls squarely under the second definition: it is a package manager.
Perhaps the confusion here comes from the fact that Conda is tightly coupled to two software distributions: Anaconda and Miniconda. Anaconda is a full distribution of the central software in the PyData ecosystem, and includes Python itself along with binaries for several hundred third-party open-source projects. Miniconda is essentially an installer for an empty conda environment, containing only Conda and its dependencies, so that you can install what you need from scratch.
But make no mistake: Conda is as distinct from Anaconda/Miniconda as is Python itself, and (if you wish) can be installed without ever touching Anaconda/Miniconda. For more on each of these, see the conda FAQ.
谬见#1:Conda是一个发行版,而非一个软件包管理器
事实:Conda是一个包管理器,Anaconda是一个发行版。虽然Conda打包在Anaconda内,但两者是用于不同目标的不同事物。
软件发行版是预构建和预配置的包的集合,可以在系统上安装和使用。包管理器是用于自动化安装、更新和删除包的处理工具。Conda,连同其“conda install”,“conda update”和“conda remove”子命令,完全属于第二个定义:它是一个包管理器。
也许这里的混淆由来于Conda紧密结合到了两个软件发行版:Anaconda和Miniconda。 Anaconda是包含了PyData生态系统的核心软件的完整发行版,包括Python本身以及几百个第三方开源项目的二进制文件。 Miniconda实质上是一个空的conda环境,只包含Conda和其依赖项,以使你可以从头按需安装。
但不要弄错:Conda相对于Anaconda/Miniconda是独立的,就如Python本身,(如果你愿意)可以脱离Anaconda/Miniconda安装。
Myth #2: Conda is a Python package manager
Reality: Conda is a general-purpose package management system, designed to build and manage software of any type from any language. As such, it also works well with Python packages.
Because conda arose from within the Python (more specifically PyData) community, many mistakenly assume that it is fundamentally a Python package manager. This is not the case: conda is designed to manage packages and dependencies within any software stack. In this sense, it's less like pip, and more like a cross-platform version of apt or yum.
If you use conda, you are already probably taking advantage of many non-Python packages; the following command will list the ones in your environment:
$ conda search --canonical | grep -v 'py[0-9][0-9]'
On my system, there are 350 results: these are packages within my Conda/Python environment that are fundamentally unmanageable by Python-only tools like pip & virtualenv.
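To make the filter above concrete, here is a minimal sketch that applies the same kind of pattern to a short, hand-written list of canonical package names. The names are made up for illustration (they follow the `name-version-build` layout but are not real search output), and the pattern is written with `[0-9]` so that it behaves the same under any POSIX grep.

```shell
# Hypothetical canonical names (name-version-build); not real conda output.
packages='numpy-1.11.1-py35_0
scipy-0.18.0-np111py35_0
openssl-1.0.2h-1
mkl-11.3.3-0
zlib-1.2.8-3'

# Keep only entries whose build string carries no pyXY tag, i.e. packages
# that are not built against a particular Python version.
non_python=$(printf '%s\n' "$packages" | grep -v 'py[0-9][0-9]')
printf '%s\n' "$non_python"
```

In this toy list the NumPy and SciPy builds are filtered out, while the OpenSSL, MKL, and zlib entries remain: those are the kind of packages a Python-only tool has no way to track.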
谬见#2:Conda是一个Python包管理器
事实:Conda是一种通用的包管理系统,用于构建和管理来自任何语言的任何类型的软件。因此,它也适用于Python包。
因为conda发起于Python(更具体地说是PyData)社区,许多人错误地认为它基本上是一个Python包管理器。情况并非如此:conda用于管理任何软件库中的包和依赖项。在这个意义上,它不像pip,更像是apt或yum的跨平台版本。
如果你使用conda,你可能已经使用了很多非Python包。以下命令可列出当前环境中的包:
$ conda search --canonical | grep -v 'py[0-9][0-9]'
在我的系统上,有350个结果:这些是我的Conda/Python环境中的包,此环境基本上是只考虑Python的工具如pip和virtualenv无法管理的。
Myth #3: Conda and pip are direct competitors
Reality: Conda and pip serve different purposes, and only directly compete in a small subset of tasks: namely installing Python packages in isolated environments.
Pip, which stands for Pip Installs Packages, is Python's officially-sanctioned package manager, and is most commonly used to install packages published on the Python Package Index (PyPI). Both pip and PyPI are governed and supported by the Python Packaging Authority (PyPA).
In short, pip is a general-purpose manager for Python packages; conda is a language-agnostic cross-platform environment manager. For the user, the most salient distinction is probably this: pip installs Python packages within any environment; conda installs any package within conda environments. If all you are doing is installing Python packages within an isolated environment, conda and pip+virtualenv are mostly interchangeable, modulo some difference in dependency handling and package availability. By isolated environment I mean a conda-env or virtualenv, in which you can install packages without modifying your system Python installation.
Even setting aside Myth #2, if we focus on just installation of Python packages, conda and pip serve different audiences and different purposes. If you want to, say, manage Python packages within an existing system Python installation, conda can't help you: by design, it can only install packages within conda environments. If you want to, say, work with the many Python packages which rely on external dependencies (NumPy, SciPy, and Matplotlib are common examples), while tracking those dependencies in a meaningful way, pip can't help you: by design, it manages Python packages and only Python packages.
Conda and pip are not competitors, but rather tools focused on different groups of users and patterns of use.
谬见#3:Conda和pip是直接竞争对手
事实:Conda和pip服务于不同的目的,只在一个小的任务子集中直接竞争:在隔离的环境中安装Python包。
Pip表示 Pip Installs Packages,是Python的官方认可的包管理器,常用于安装在Python包索引(PyPI)上发布的包。pip和PyPI都由Python包装管理局(PyPA)管理和支持。
简言之,pip是一个Python包的通用管理器;conda是一个与语言无关的跨平台环境管理器。对于用户,最显著的区别可能是:pip可在任何环境中安装python包;conda可在conda环境中安装任何包。如果你要做的是在隔离的环境中安装Python包,conda和pip+virtualenv大多是可互换的,除了依赖处理和包可用性存在一些差异。隔离的环境指的是conda环境或virtualenv环境,你可以在其中安装软件包,而不改动系统中的Python安装。
即使不管谬见#2,如果我们只关注安装Python包,conda和pip仍服务不同的受众和不同的目的。如果说你想在现有的系统的Python安装中管理Python包,conda帮不上你:它只能在conda环境中安装包。如果说你想,使用基于外部依赖项的许多Python包(常见的例如NumPy,SciPy和Matplotlib),同时以一种有效追踪这些依赖项,pip就帮不上你了:它只管理Python包。
Conda和pip不是竞争对手,而是关注不同用户群和不同使用模式的工具。
Myth #4: Creating conda in the first place was irresponsible and divisive
Reality: Conda's creators pushed Python's standard packaging to its limits for over a decade, and only created a second tool when it was clear it was the only reasonable way forward.
According to the Zen of Python, when doing anything in Python "There should be one – and preferably only one – obvious way to do it." So why would the creators of conda muddy the field by introducing a new way to install Python packages? Why didn't they contribute back to the Python community and improve pip to overcome its deficiencies?
As it turns out, that is exactly what they did. Prior to 2012, the developers of the PyData/SciPy ecosystem went to great lengths to work within the constraints of the package management solutions developed by the Python community. As far back as 2001, the NumPy project forked distutils in an attempt to make it handle the complex requirements of a NumPy distribution. They bundled a large portion of NETLIB into a single monolithic Python package (you might know this as SciPy), in effect creating a distribution-as-python-package to circumvent the fact that Python's distribution tools cannot manage these extra-Python dependencies in any meaningful way. An entire generation of scientific Python users spent countless hours struggling with the installation hell created by this exercise of forcing a square peg into a round hole – and those were just the ones lucky enough to be using Linux. If you were on Windows, forget about it. To read some of the details about these pain-points and how they led to Conda, I'd suggest Travis Oliphant's 2013 blog post on the topic.
But why didn't Conda's creators just talk to the Python packaging folks and figure out these challenges together? As it turns out, they did.
The genesis of Conda came after Guido van Rossum was invited to speak at the inaugural PyData meetup in 2012; in a Q&A on the subject of packaging difficulties, he told us that when it comes to packaging, "it really sounds like your needs are so unusual compared to the larger Python community that you're just better off building your own" (See video of this discussion). Even while following this nugget of advice from the BDFL, the PyData community continued dialog and collaboration with core Python developers on the topic: one more public example of this was the invitation of CPython core developer Nick Coghlan to keynote at SciPy 2014 (See video here). He gave an excellent talk which specifically discusses pip and conda in the context of the "unsolved problem" of software distribution, and mentions the value of having multiple means of distribution tailored to the needs of specific users.
Far from insinuating that Conda is divisive, Nick and others at the Python Packaging Authority officially recognize conda as one of many important redistributors of Python code, and are working hard to better enable such tools to work seamlessly with the Python Package Index.
谬见#4:起初创造conda是不负责任和制造分裂
事实:Conda的开创者们为将Python的标准包装推向了极限工作了十多年,并且只有在明确了这是唯一合理的前进道路时才创造了第二个工具。
根据Python的哲学,用Python做任何事情 “应该有一个——最好只有一个 ——明显的方式来完成”。那么为什么conda的开创者会引入一种新的方式安装Python包来搞乱呢?为什么他们不回Python社区改进pip来克服其缺陷呢?
事实证明,这正是他们所做的。2012年之前,PyData/SciPy生态系统的开发者全力地在Python社区开发的包管理解决方案的限制下工作。早在2001年,NumPy项目试图使它处理NumPy发行版的复杂需求。他们将大部分的NETLIB绑定成一个整体的Python包(你可能知道这是SciPy),实际相当于创建了一个python包发行版,以避开Python的分发工具不能有效管理这些Python体系外的依赖项这一问题。一整代的Python科学计算用户花费了不计其数的时间应付这一安装中的混乱地狱,总是在迫使方桩打进圆孔里——这还是幸运地使用Linux情况下。如果你想用在Windows,还是算了吧。关于这些痛点以及如何导致Conda的细节,我建议阅读Travis Oliphant的2013年博客文章。
但是为什么Conda的创作者不和Python打包的人员谈谈,一起解决呢?事实证明,他们谈了。
Conda的起源来自Guido van Rossum受邀在2012年首届PyData聚会上发表演讲;在一个关于打包难题的问答中,他告诉我们说,“确实听起来你们的需求与广义的Python社区相比很不一样,你们最好建立自己专属的”(见原文链接视频)。即使遵循这个来自BDFL的宝贵建议,PyData社区仍继续与核心Python开发者在这一主题上保持对话和合作:另一个公开的例子是邀请CPython核心开发人员Nick Coghlan在SciPy 2014上的主题演讲(见原文链接视频)。他特别谈到了在软件分发的“未解决问题”的背景下的pip和conda,涉及了拥有多种符合特定用户需求的分发方式的价值。
显然Conda不是在制造分裂,Nick和其他人在PyPA正式认可conda作为Python代码的重要的分发器之一,并且正在努力更好地使这些工具与PyPI无缝地工作。
Myth #5: conda doesn't work with virtualenv, so it's useless for my workflow
Reality: You actually can install (some) conda packages within a virtualenv, but it is better to use Conda's own environment manager: it is fully compatible with pip and has several advantages over virtualenv.
virtualenv/venv are utilities that allow users to create isolated Python environments that work with pip. Conda has its own built-in environment manager that works seamlessly with both conda and pip, and in fact has several advantages over virtualenv/venv:
- conda environments integrate management of different Python versions, including installation and updating of Python itself. Virtualenvs must be created upon an existing, externally managed Python executable.
- conda environments can track non-Python dependencies; for example, seamlessly managing dependencies and parallel versions of essential tools like LAPACK or OpenSSL.
- Rather than environments built on symlinks – which break the isolation of the virtualenv and can be flimsy at times for non-Python dependencies – conda-envs are true isolated environments within a single executable path.
- While virtualenvs are not compatible with conda packages, conda environments are entirely compatible with pip packages. First "conda install pip", and then you can "pip install" any available package within that environment. You can even explicitly list pip packages in conda environment files, meaning the full software stack is entirely reproducible from a single environment metadata file.
That said, if you would like to use conda within your virtualenv, it is possible:
$ virtualenv test_conda
$ source test_conda/bin/activate
$ pip install conda
$ conda install numpy
This installs conda's MKL-enabled NumPy package within your virtualenv. I wouldn't recommend this: I can't find documentation for this feature, and the result seems to be fairly brittle – for example, trying to "conda update python" within the virtualenv fails in a very ungraceful and unrecoverable manner, seemingly related to the symlinks that underlie virtualenv's architecture. This appears not to be some fundamental incompatibility between conda and virtualenv, but rather related to some subtle inconsistencies in the build process, and thus is potentially fixable (see conda Issue 1367 and anaconda Issue 498, for example).
If you want to avoid these difficulties, a better idea would be to "pip install conda" and then create a new conda environment in which to install conda packages. For someone accustomed to pip/virtualenv/venv command syntax who wants to try conda, the conda docs include a translation table between conda and pip/virtualenv commands.
谬见#5:conda不能使用virtualenv,所以它对我的工作流没有用
事实:你实际上可以在一个virtualenv中安装(一些)conda包,但更好的是使用Conda自己的环境管理器:它与pip完全兼容,并且比virtualenv有几个优点。
virtualenv/venv是允许用户创建与pip一起使用的隔离的Python环境的实用程序。Conda有自己的内置环境管理器,可以与conda和pip无缝工作,并且事实上比virtualenv/venv有几个优点:
- conda 环境集成了不同 Python 版本的管理,包括 Python 本身的安装和更新。Virtualenvs 必须在现有的、外部管理的 Python 可执行文件上创建。
- conda 环境可以追踪非 Python 依赖项,例如无缝管理 LAPACK 或 OpenSSL 等底层工具的依赖项和并行版本。
- 与建立在符号链接上的环境不同(那样会破坏 virtualenv 的隔离性,并且对于非 Python 依赖项有时可能是脆弱的),conda-envs 是真正的建立在单独的可执行路径中的隔离环境。
- 虽然 virtualenvs 与 conda 包不兼容,但 conda 环境与 pip 包完全兼容。首先 conda install pip,之后你可以在那个环境里 pip install 任何可用的包。你甚至可以在 conda 环境文件中显式地列出 pip 包,这意味着整个软件库可以从单独的环境元数据文件中完全重现。
然而,如果你想在你的virtualenv中使用conda,也是可以的:
$ virtualenv test_conda
$ source test_conda/bin/activate
$ pip install conda
$ conda install numpy
这将在你的virtualenv中安装conda的启用了MKL的NumPy包。我不推荐这么做:我找不到这个功能的文档,并且结果似乎相当脆弱,例如,试图在virtualenv中conda update python会以一种很不体面且不可恢复的方式失败,似乎与virtualenv架构所使用的符号链接方式有关。似乎conda和virtualenv之间不存在根本的不兼容,而是与构建过程中的一些微妙的不一致有关,因此是可能解决的(例如,conda Issue 1367和anaconda Issue 498,见原文链接)。
如果你想避免这些困难,一个更好的方法是pip install conda,然后创建一个新的conda环境安装conda包。对于习惯于使用pip/virtualenv/venv命令语法的人,conda文档包含一份conda和pip/virtualenv命令之间的转换表。
Myth #6: Now that pip uses wheels, conda is no longer necessary
Reality: wheels address just one of the many challenges that prompted the development of conda, and wheels have weaknesses that Conda's binaries address.
One difficulty which drove the creation of Conda was the fact that pip could distribute only source code, not pre-compiled binary distributions, an issue that was particularly challenging for users building extension-heavy modules like NumPy and SciPy. After Conda had solved this problem in its own way, pip itself added support for wheels, a binary format designed to address this difficulty within pip. With this issue addressed within the common tool, shouldn't Conda early-adopters now flock back to pip?
Not necessarily. Distribution of cross-platform binaries was only one of the many problems solved within conda. Compiled binaries spotlight the other essential piece of conda: the ability to meaningfully track non-Python dependencies. Because pip's dependency tracking is limited to Python packages, the main way of doing this within wheels is to bundle released versions of dependencies with the Python package binary, which makes updating such dependencies painful (recent security updates to OpenSSL come to mind). Additionally, conda includes a true dependency resolver, a component which pip currently lacks.
For scientific users, conda also allows things like linking builds to optimized linear algebra libraries, as Continuum does with its freely-provided MKL-enabled NumPy/SciPy. Conda can even distribute non-Python build requirements, such as gcc, which greatly streamlines the process of building other packages on top of the pre-compiled binaries it distributes. If you try to do this using pip's wheels, you'd better hope that your system has compilers and settings compatible with those used to originally build the wheel in question.
谬见#6:现在pip使用wheels,conda不再有必要了
事实:wheels只是处理了推动conda发展的许多挑战之一,而wheels有的弱点是Conda的二进制文件可处理的。
驱动Conda创建的一个难题是,pip可以只分发源代码,而不是预编译的二进制发行版。这个问题对于构建诸如NumPy和SciPy这类重扩展(译者注:这类包使用大量的预编译二进制文件实现核心计算功能)模块的用户来说尤其有挑战性。 Conda以独有的方式解决了这个问题后,pip本身增加了对wheels的支持,这是一个用来在pip内部处理这个难题的二进制格式。当用常规工具解决了这个问题后,Conda的早期使用者现在不是应该回到pip吗?
不必要。跨平台二进制文件的分发只是conda中解决的许多问题之一。编译的二进制文件聚焦了conda的另一个重要部分:有效追踪非Python依赖项的能力。因为pip的依赖追踪只限于Python包,所以在wheels中的主要实现方式是将依赖项的发布版本与Python包二进制包捆绑在一起,这使得更新这样的依赖项很痛苦(想想最近OpenSSL的安全更新)。此外,conda包括一个真正的依赖解析器,这个组件是pip目前缺少的。
对于科学计算用户,conda还允许将构建的包链接到优化的线性代数库,如同Continuum在其自由提供的启用MKL的NumPy/SciPy包中那样。Conda甚至可以分发非Python构建需求,例如gcc,这使得在其分发的预编译二进制代码上构建其他包的过程更为流水线化。如果你试图使用pip的wheels,你最好希望你的系统和构建wheels包的系统有相兼容的编译器和设置。