Managing JupyterLab-based data science projects using Conda (+pip)

weixin_26718993

Original article: https://towardsdatascience.com/managing-jupyterlab-based-data-science-projects-using-conda-pip-cd2ee8521705

This article discusses two approaches for managing JupyterLab-based data science projects using Conda (+pip): a “system-wide” approach where Conda (+pip) are used to manage a single JupyterLab installation that is shared across all projects, and a “project-based” approach where Conda (+pip) are used to manage separate JupyterLab installations for each project. After describing the two approaches I will walk through some examples and discuss the relevant tradeoffs.

“System-wide” JupyterLab install

With a “system-wide” approach to managing JupyterLab, Conda (+pip) are used to manage a JupyterLab installation that is shared across all of your data science projects. There are several benefits to a “system-wide” approach.

A common set of JupyterLab extensions simplifies the user interface (UI) and user experience (UX).
New projects start more quickly because there is no need to install (and build!) JupyterLab for every project.
Low-level configuration of JupyterLab is easy via files inside the ~/.jupyter directory in your user home directory.

Typical Conda environment.yml for a “system-wide” install

Below is a stub environment.yml file for a “system-wide” JupyterLab installation.

name: jupyterlab-base-env

channels:
  - conda-forge
  - defaults

dependencies:
  - jupyterlab
  - jupyterlab-git # provides git support
  - nodejs # required for building (some) extensions
  - pip
  - pip:
    - -r file:requirements.txt # extensions available via pip go here
  - python

A couple of things are worth noting. First, you should include any JupyterLab extensions available via Conda (typically from the conda-forge channel) as dependencies in this file. JupyterLab extensions available via Pip should be listed separately in a requirements.txt file (discussed below). Second, I explicitly include nodejs as a dependency. Node.js is required to rebuild JupyterLab (which might be necessary depending on your collection of extensions). Finally, I install Pip and use it to install all packages and extensions listed in the separate requirements.txt file.

Typical pip requirements.txt for a “system-wide” install

Nothing fancy about the requirements.txt file. You just list the packages and extensions that you want to install via pip. Here I am including the Jupyter Language Server Protocol extension which brings full IDE capabilities such as code navigation, hover suggestions, linters, autocomplete and rename to JupyterLab.

jupyter-lsp
python-language-server[all]

Automate `jupyterlab-base-env` build using a Bash script

Because the environment build process for Conda (+pip) environments is a bit complicated (as it involves installing packages via Conda, installing packages via Pip, and then possibly installing extensions and rebuilding JupyterLab itself) it is a good idea to automate the environment build using a Bash script.

#!/bin/bash --login
set -e

conda env create \
    --name jupyterlab-base-env \
    --file environment.yml \
    --force
conda activate jupyterlab-base-env
source postBuild # put jupyter labextension install commands here

Note the use of the --login flag. This ensures that the script runs inside a login shell, which sources the Bash profiles required for the conda activate command to work as expected. Also note the reference to a postBuild file. This is a Bash script that contains any jupyter labextension install commands required to enable those extensions and rebuild JupyterLab. I have included a working example of all the configuration files referenced above on GitHub.
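
As a rough illustration of what such a postBuild file might contain, here is a minimal sketch. The helper name install_labextensions is hypothetical, and the extension named in the usage comment is only an example (the lab extension commonly paired with jupyter-lsp); substitute whatever your setup needs.

```shell
# postBuild (sketch): meant to be sourced by the build script after `conda activate`.
# Assumption: install_labextensions is a hypothetical helper, not part of JupyterLab.
install_labextensions() {
    local extension
    for extension in "$@"; do
        # --no-build defers the slow JupyterLab rebuild until every extension is added
        jupyter labextension install "$extension" --no-build || return 1
    done
    # Rebuild JupyterLab once so the newly added extensions are picked up
    jupyter lab build
}

# Example call (uncomment in your own postBuild):
# install_labextensions "@krassowski/jupyterlab-lsp"
```

Deferring the build and running it once at the end keeps the script from rebuilding JupyterLab after every single extension.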

Keep your `jupyterlab-base-env` lean

Your jupyterlab-base-env environment should only contain JupyterLab and any required extensions (+dependencies). Do not install packages which you will use for data science projects into your jupyterlab-base-env. Instead, you should create separate Conda (+pip) environments for each of your projects and then create custom Jupyter kernels for each of your project-specific Conda (+pip) environments.

Creating Jupyter kernels for Conda environments

Creating a custom Jupyter kernel for each of your project’s Conda (+pip) environments is what will allow you to launch Jupyter Notebooks and IPython consoles from those environments within a common JupyterLab installation. You can even automate the kernel creation process for all Conda (+pip) environments on your machine using the jupyter-conda extension!

However, rather than create custom kernels for every Conda (+pip) environment, I prefer to manually create custom Jupyter kernels for particular Conda (+pip) environments that I really care about.

How to manually create a custom Jupyter kernel

Before you can create a custom kernel for your Conda (+pip) environment you need to make sure that the ipykernel package is installed in your Conda environment as you will need to use this package to create the kernel spec file.

conda activate $PROJECT_DIR/env # don't forget to activate!
# requires ipykernel!
python -m ipykernel install \
    --user \
    --name name-for-internal-use-only \
    --display-name "Name you will see in the JupyterLab launcher"
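
To confirm that the kernel was registered, one option is to inspect the output of jupyter kernelspec list. The sketch below wraps that check in a small helper; the function name kernel_is_registered is hypothetical, and it assumes the usual "name  path" row layout of that command's output.

```shell
# Sketch: check whether a kernel spec with the given internal name is registered.
# Assumption: kernel_is_registered is a hypothetical helper for illustration.
kernel_is_registered() {
    # `jupyter kernelspec list` prints one "name  path" row per kernel;
    # keep the first column and look for an exact match.
    jupyter kernelspec list | awk '{print $1}' | grep -Fxq "$1"
}

# Usage:
# kernel_is_registered name-for-internal-use-only && echo "kernel found"
# jupyter kernelspec uninstall name-for-internal-use-only   # remove it again
```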

Project-based JupyterLab install

With a “project-based” approach to managing JupyterLab, Conda (+pip) are used to manage separate JupyterLab installations for each project. There are several advantages to the “project-based” approach.

More flexible UI/UX, as the JupyterLab version and extensions can be customized for each project.
Easier experimentation with bleeding-edge features of JupyterLab.
Automatically makes a data science project repo “Binder-ready”.

Typical Conda environment.yml for a “project-based” install

The structure of this environment.yml file is similar to that of the “system-wide” approach. The difference is that, with the “project-based” approach, you should add all the required packages and extensions for your project that are available via Conda to this file; similarly for Pip and the requirements.txt file.

name: null

channels:
  - conda-forge
  - defaults

dependencies:
  - jupyterlab
  - jupyterlab-git # extensions available via conda go here
  - nodejs
  - pip
  - pip:
    - -r file:requirements.txt # packages available via pip go here
  - python

Automate project environment build with Bash script

Again you should automate the Conda (+pip) environment build wherever possible. The only difference between this script and the script used in the “system-wide” approach is that I install my Conda (+pip) environment inside a subdirectory called env of my project directory. This is a Conda (+pip) “best practice”.

#!/bin/bash --login
set -e

export ENV_PREFIX=$PROJECT_DIR/env
conda env create \
    --prefix $ENV_PREFIX \
    --file environment.yml \
    --force
conda activate $ENV_PREFIX
source postBuild # put jupyter labextension install commands here
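
The script above relies on PROJECT_DIR already being set in the calling shell. A small guard like the following sketch can make a missing variable fail early with a clear message rather than silently creating the environment at /env. The helper name require_project_dir is hypothetical.

```shell
# Sketch: derive the environment prefix, failing early if PROJECT_DIR is missing.
# Assumption: require_project_dir is a hypothetical helper for illustration.
require_project_dir() {
    if [ -z "${PROJECT_DIR:-}" ]; then
        echo "PROJECT_DIR is not set; export it before building" >&2
        return 1
    fi
    # The environment lives in the env subdirectory of the project
    echo "$PROJECT_DIR/env"
}

# Usage (inside the build script):
# ENV_PREFIX="$(require_project_dir)" || exit 1
```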

Examples of “project-based” JupyterLab installs

As promised, here are a few examples of the “project-based” approach that you can use as inspiration for your next data science project.

JupyterLab + Scikit Learn + Dask: Environment for CPU-based data science projects that combines JupyterLab with Scikit-learn and Dask (and friends!). Includes some common JupyterLab extensions.
JupyterLab + PyTorch: Standard environment for GPU-accelerated deep learning with JupyterLab and PyTorch. Includes GPU and deep learning specific JupyterLab extensions such as jupyterlab-nvdashboard and jupyterlab-tensorboard.
JupyterLab + NVIDIA RAPIDS + BlazingSQL + Dask: More complex environment for GPU-accelerated machine learning with JupyterLab, NVIDIA RAPIDS, BlazingSQL, and Dask (and many friends!). Includes some common JupyterLab extensions as well as some GPU-specific ones such as jupyterlab-nvdashboard.

`%conda` and `%pip` magic commands

Any discussion of JupyterLab, Conda, and pip would be incomplete without mentioning built-in IPython magic commands for installing packages into an active environment/kernel via Conda (%conda) or Pip (%pip).

Both commands can be used from within Jupyter Notebooks or IPython consoles.
Both %conda and %pip are mostly useful for prototyping new projects.
For “production”, prefer adding new packages to either the environment.yml or requirements.txt files (and rebuilding the environment).
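
If you would rather not recreate the environment from scratch after editing environment.yml or requirements.txt, one alternative (not the approach used in the build scripts above) is conda env update. The sketch below wraps it in a hypothetical helper, update_project_env, and assumes it is run from the directory that contains environment.yml.

```shell
# Sketch: sync an existing environment with the edited environment.yml.
# Assumption: update_project_env is a hypothetical helper; run it from the
# directory that contains environment.yml.
update_project_env() {
    local env_prefix="$1"
    # --prune is intended to remove packages no longer listed in environment.yml
    conda env update --prefix "$env_prefix" --file environment.yml --prune
}

# Usage:
# update_project_env "$PROJECT_DIR/env"
```

A full rebuild with the build script remains the more reproducible choice; an in-place update is mainly a time saver during development.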

Summary

Hopefully by this point you understand the difference between “system-wide” and “project-based” approaches to managing JupyterLab installs with Conda (+pip). You have also seen several examples of both approaches, including some starter code that you can use for your next data science project.

In general, I recommend the “project-based” approach for its greater flexibility with minimal additional overhead. If you work with GPUs on only some of your projects, then you may prefer the “project-based” approach, as there are some nice JupyterLab extensions for GPU-accelerated data science projects (that you don’t want to install for CPU-only projects). However, if all of your projects are CPU-only, or you almost always use GPUs, and you always use a common set of JupyterLab extensions, then you may prefer the “system-wide” approach.
