
How to Simplify Data Science Development Environments in Windows by using Docker on Windows Subsystem for Linux 2.


Problems with Python Environments


Python and its extensive ecosystem of packages provide libraries and applications for almost every use case in a data science and machine learning development workflow.


Like many practitioners, you are most likely using the Anaconda distribution of Python, which bundles Python with the most commonly used data science and machine learning packages. To manage its extensive library of packages (over 7,500 data science and machine learning packages at last count), it comes with the Conda package manager, which automates installing, updating, and removing packages.

When installing a new Python package, Conda first resolves the dependencies, checks whether they are already installed on the system, and, if not, installs them. Satisfying all dependencies may require installing new packages or upgrading or downgrading existing ones; once that is done, Conda installs the requested package(s). By default, this all happens globally, with everything installed on the machine in a single, operating system-dependent location.

The problem is that most Python packages depend on other Python packages, sometimes on specific versions of them. When installing and using these packages, you will often run into missing permissions, incompatible library dependencies, and installations that break in surprising ways.
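To see how deep these dependency chains go, you can ask Python which requirements an installed package declares. The following is a minimal sketch using only the standard library; the function name `declared_requirements` and the package names in the demo are illustrative assumptions, not part of Conda.

```python
from importlib import metadata


def declared_requirements(package: str) -> list:
    """Return the requirement strings a package declares, or [] if it is not installed."""
    try:
        # metadata.requires() returns None when a package declares no requirements
        return metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []


if __name__ == "__main__":
    # Example package names; substitute whatever is installed in your environment
    for name in ("pandas", "scikit-learn"):
        reqs = declared_requirements(name)
        print(f"{name}: {len(reqs)} declared requirements")
        for req in reqs[:5]:
            print("   ", req)
```

Each printed requirement may carry a version constraint, which is exactly where two packages in the same environment can disagree.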

Conda tries to resolve these issues with environments: you can create, export, list, remove, and update environments that have different versions of Python and/or packages installed in them, and switch between them. As a result, you can spend countless hours creating and managing environments to reduce the chance of your code breaking due to environment issues.

But in many instances, running either of the two most dangerous Conda commands (“conda update conda” and “conda update anaconda”) can make all of your environments unusable.

Using Docker in Data Science


There is a better way to manage data science and machine learning development environments: Docker. It lets developers easily create, manage, and distribute environments without worrying that an error in one environment will break the others.


Docker does this by letting developers isolate code and its dependencies in a single Docker container, which makes the program easier to modify and update. Docker makes it possible to set up local development environments that are exactly like your co-workers’ environments; to run multiple development environments on the same machine, each with its own software, operating system, and configuration; and to let anyone work on the same project with the exact same settings, regardless of their machine configuration.

Benefits of Docker in Data Science


Using Docker containers for model development has some important benefits.


  1. Maintain the development environment of production machine learning models


Some of the models we develop need to be maintained for many years, which requires periodic assessment of model performance and, if needed, re-calibration of the model. With a Docker container, you can preserve the exact development environment used to develop the model and reuse it to perform those tasks.

2. Sync development environments with co-workers


Model development is usually a team effort, depending on the effort needed to develop the model(s). With Docker containers, you can ensure that every team member has the same development environment and easily share the team’s development environment with new team members. There is nothing worse than “it works on my machine” when other people cannot replicate the same results due to environment issues.

3. Have as many development environments as needed without the drawbacks of virtual environments


The problem with virtual environments is that they still share the underlying machine, so changes to one environment may affect other environments. Docker containers are self-contained, each with its own file system and installed packages. You can have as many development environments as needed without worrying about one environment affecting the others.
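One way to check that two team members really do have the same environment is to snapshot the installed package versions in each and compare them. The sketch below uses only the standard library; the function names `installed_packages` and `diff_environments` are hypothetical helpers, not part of Docker or Conda.

```python
from importlib import metadata


def installed_packages() -> dict:
    """Map each installed distribution name (lowercased) to its version."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken metadata
            packages[name.lower()] = dist.version
    return packages


def diff_environments(mine: dict, theirs: dict) -> dict:
    """Return {package: (my version, their version)} for every mismatch.

    None marks a package that is missing on one side.
    """
    names = sorted(set(mine) | set(theirs))
    return {n: (mine.get(n), theirs.get(n)) for n in names if mine.get(n) != theirs.get(n)}


if __name__ == "__main__":
    snapshot = installed_packages()
    print(f"{len(snapshot)} packages installed in this environment")
```

Running `installed_packages()` inside two containers started from the same image should give an empty `diff_environments` result.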

Step-by-Step Instructions


Install Docker


This article provides a step-by-step approach to installing Docker on Windows 10 using Windows Subsystem for Linux 2 (WSL 2). This approach offers the best of both worlds: the ease of use of Windows 10 and the robustness of Linux, where all Python libraries/packages are available.

Step 1: Upgrade Windows 10 to the most recent version. WSL 2 requires Version 2004 or later. There are multiple tutorials available online on this topic.

Step 2: Install Windows Subsystem for Linux 2 (WSL 2). There are multiple tutorials available online on this topic.

Step 3: Install and enable a Linux distribution for WSL 2. As of this writing, I recommend installing and using Ubuntu 20.04 LTS. There are multiple tutorials available online on this topic.

Step 4: Install Docker Desktop for Windows

Navigate to https://www.docker.com/products/docker-desktop in your browser and download the Windows (stable) version of Docker Desktop.

When the download is complete, install Docker Desktop. When prompted for Configuration, ensure both boxes are checked: “Enable WSL 2 Windows Features” and “Add shortcut to desktop”.

When the installation is complete, click “Close and log out”.

Restart your computer. You will find a little whale icon in your task bar. By default, Docker starts when your computer starts.

Step 5: Install Docker on WSL — Ubuntu 20.04 LTS

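Find and open the Ubuntu 20.04 LTS terminal from the Windows Start menu. Enter the following command in the terminal: “sudo apt install docker.io” (on Ubuntu, the Docker engine package is named docker.io). When prompted for your password, enter the password you used to set up Ubuntu 20.04 LTS. If Docker Desktop's WSL 2 integration is already enabled for this distribution, the docker command is available in the terminal without this step.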
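Before moving on, it can help to confirm from the Ubuntu terminal that you are in fact running under WSL and that the docker command is on your PATH. This is a rough sketch, not an official check: it assumes the WSL kernel reports “microsoft” in its release string, and the function names are illustrative.

```python
import platform
import shutil


def running_under_wsl() -> bool:
    """Heuristic: WSL kernels include 'microsoft' in their release string."""
    return "microsoft" in platform.release().lower()


def docker_cli_path():
    """Return the path of the docker executable on PATH, or None if it is absent."""
    return shutil.which("docker")


if __name__ == "__main__":
    print("WSL detected:", running_under_wsl())
    print("docker CLI:", docker_cli_path() or "not found on PATH")
```

Save it under any file name and run it with python3 in the Ubuntu terminal; if the docker CLI is reported as not found, revisit Step 4 or Step 5.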

Start Using Docker Container


Step 1: Pull (Download) Docker Container of Anaconda Distribution


Find and open the Ubuntu 20.04 LTS terminal from the Windows Start menu. Enter the following command in the terminal: “docker pull continuumio/anaconda3”. You will see multiple progress bars, indicating that the anaconda3 image is being downloaded. When the download is complete, the terminal will show “Downloaded newer image for continuumio/anaconda3:latest”.

Step 2: Run multiple versions of Anaconda Distribution

  1. Open the Ubuntu 20.04 LTS terminal and create a directory named “tensorflow” (command: mkdir tensorflow). This creates the directory in your home directory (in my case, “/home/sungkim/tensorflow”).

2. From the same Ubuntu 20.04 LTS terminal, enter “docker run --name tensorflow -v ~/tensorflow:/root -i -t -p 8888:8888 continuumio/anaconda3 /bin/bash”

  • where “--name tensorflow” names the Docker container “tensorflow”

  • where “-v ~/tensorflow:/root” bind-mounts the ~/tensorflow (/home/sungkim/tensorflow) directory in Ubuntu on your Windows machine to the /root directory in the Docker container, so files written to /root in the container are accessible outside it

  • where “-i -t” keeps the Docker container interactive (standard input stays open and a terminal is allocated)

  • where “-p 8888:8888” publishes the container’s port 8888 on the host’s port 8888, which is needed if you want to use Jupyter Notebook/Lab from Windows

  • where “continuumio/anaconda3” is the name of the Docker image

  • where “/bin/bash” starts the bash shell in the Docker container, since I like the bash shell

This should create and enter, in the same terminal, a Docker container named “tensorflow” running the Anaconda distribution, where you can install Python packages and run Python programs.

3. Enter “conda install -c anaconda tensorflow” to install TensorFlow. When the installation is complete, enter “exit” to leave the Docker container and return to the Ubuntu 20.04 LTS terminal.

4. Save the changes to the Docker container by entering "docker commit -m="Anaconda3 with Tensorflow" tensorflow continuumio/anaconda3-tensorflow". This saves all your changes to a new Docker image, which you can also share with your co-workers so they have the same development environment as you.

5. Navigate to Docker Desktop and delete the “tensorflow” Docker container by clicking the “DELETE” icon next to it.

6. Run the Docker container again, this time from the new image, by entering “docker run --name tensorflow -v ~/tensorflow:/root -i -t -p 8888:8888 continuumio/anaconda3-tensorflow /bin/bash”.
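Since the same docker run command recurs for every environment with only the name and image changed, it can be convenient to assemble it programmatically. The sketch below only builds and prints the command; it does not run Docker. The helper name `docker_run_command` is an assumption for illustration, and the flags are exactly those used in the steps above.

```python
import shlex
from pathlib import Path


def docker_run_command(name: str, image: str = "continuumio/anaconda3", port: int = 8888) -> list:
    """Build the `docker run` argument list used in this article for one container."""
    host_dir = Path.home() / name  # e.g. ~/tensorflow in the Ubuntu home directory
    return [
        "docker", "run",
        "--name", name,              # container name
        "-v", f"{host_dir}:/root",   # bind-mount the host directory at /root
        "-i", "-t",                  # interactive session with a terminal
        "-p", f"{port}:{port}",      # publish the Jupyter port on the host
        image,                       # image to start from
        "/bin/bash",                 # command to run inside the container
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed and pasted into the terminal
    print(shlex.join(docker_run_command("tensorflow")))
    print(shlex.join(docker_run_command("tensorflow", image="continuumio/anaconda3-tensorflow")))
```

Printing rather than executing keeps the sketch safe to run anywhere; passing the list to `subprocess.run` would be the natural next step once the command looks right.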

  1. Open the Ubuntu 20.04 LTS terminal and create a directory named “tensorflow-gpu” (command: mkdir tensorflow-gpu). This creates the directory in your home directory (in my case, “/home/sungkim/tensorflow-gpu”).

2. From the same Ubuntu 20.04 LTS terminal, enter “docker run --name tensorflow-gpu -v ~/tensorflow-gpu:/root -i -t -p 8888:8888 continuumio/anaconda3 /bin/bash”

This should create and enter, in the same terminal, a Docker container named “tensorflow-gpu” running the Anaconda distribution, where you can install Python packages and run Python programs.

3. Run “conda install -c anaconda tensorflow-gpu” to install tensorflow-gpu. When the installation is complete, enter “exit” to leave the Docker container and return to the Ubuntu 20.04 LTS terminal.

4. Save the changes to the Docker container by entering "docker commit -m="Anaconda3 with Tensorflow GPU" tensorflow-gpu continuumio/anaconda3-tensorflow-gpu". This saves all your changes to a new Docker image, which you can also share with your co-workers so they have the same development environment as you.

5. Navigate to Docker Desktop and delete the “tensorflow-gpu” Docker container by clicking the “DELETE” icon next to it.

6. Run the Docker container again, this time from the new image, by entering “docker run --name tensorflow-gpu -v ~/tensorflow-gpu:/root -i -t -p 8888:8888 continuumio/anaconda3-tensorflow-gpu /bin/bash”.

  1. Open the Ubuntu 20.04 LTS terminal and create a directory named “pytorch” (command: mkdir pytorch). This creates the directory in your home directory (in my case, “/home/sungkim/pytorch”).

2. Open another Ubuntu 20.04 LTS terminal and enter “docker run --name pytorch -v ~/pytorch:/root -i -t -p 8888:8888 continuumio/anaconda3 /bin/bash”

This should create and enter, in the same terminal, a Docker container named “pytorch” running the Anaconda distribution, where you can install Python packages and run Python programs.

3. Run “conda install -c pytorch pytorch” to install PyTorch. When the installation is complete, enter “exit” to leave the Docker container and return to the Ubuntu 20.04 LTS terminal.

4. Save the changes to the Docker container by entering "docker commit -m="Anaconda3 with Pytorch" pytorch continuumio/anaconda3-pytorch". This saves all your changes to a new Docker image, which you can also share with your co-workers so they have the same development environment as you.

5. Navigate to Docker Desktop and delete the “pytorch” Docker container by clicking the “DELETE” icon next to it.

6. Run the Docker container again, this time from the new image, by entering “docker run --name pytorch -v ~/pytorch:/root -i -t -p 8888:8888 continuumio/anaconda3-pytorch /bin/bash”.

When the above steps are completed, start Docker Desktop; you should see the three newly created Docker containers.

From Docker Desktop, you can start any of these Docker containers and open its Command Line Interface (CLI) as needed. Unlike with Conda environments, if one Docker container is corrupted by package versioning or Conda issues, the other Docker containers are not affected.
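To confirm that each container holds only the framework you installed in it, you can run a small check inside the container. The following sketch uses the standard library to test whether a module can be found without importing it (importing TensorFlow or PyTorch can take several seconds); the helper name `available_modules` is illustrative.

```python
import importlib.util


def available_modules(names):
    """Return {module name: True/False} without importing the modules themselves."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # "tensorflow" and "torch" are the import names of TensorFlow and PyTorch
    for name, found in available_modules(["tensorflow", "torch"]).items():
        print(f"{name}: {'installed' if found else 'not installed'}")
```

Run inside the “tensorflow” container, this should report tensorflow as installed and torch as not installed, and the reverse inside the “pytorch” container, assuming nothing else was added to either image.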

Step 3: Set Up Development Environment (Microsoft Visual Studio Code)

If you have not done so already, download and install Microsoft Visual Studio Code, which is the best FREE editor out there right now. It is available at https://code.visualstudio.com/; just follow the on-screen instructions to install it.

  1. Install the following extensions (you can open the Extensions view by pressing CTRL-SHIFT-X or selecting the Extensions icon in the left navigation bar):

  • Search and install “Python” (by Microsoft)

  • Search and install “Anaconda Extension Pack” (by Microsoft)

  • Search and install “Docker” (by Microsoft)

  • Search and install “Remote - Containers” (by Microsoft)

  • Search and install “Remote - WSL” (by Microsoft)

2. After you have installed all of the above extensions, you should see a Docker icon in the left navigation bar, where you can:


  • Start Docker container

  • View Logs

  • Attach Visual Studio Code

  • Attach Shell

  • Stop Docker Container

  • Restart Docker Container

  • and more…




3. Right-click on selected container name and select Start to start the docker container.


4. Right-click on selected container name and select Attach Visual Studio Code.

5. Create a new file with the code “print('Hello World!')” and save it as “hello.py” under the /root directory. This file will also be available outside the container, in the mounted directory (e.g., /home/sungkim/tensorflow).

6. Select “Run | Run Without Debugging” to execute the Python script.

7. Start using Visual Studio Code
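If you want hello.py to also confirm where it is running, a slightly longer variant can report the Python version and whether it is executing inside a container. This is a sketch under one assumption: Docker normally creates a /.dockerenv file at the container's root, which the code uses as a heuristic. The function names are illustrative.

```python
import os
import platform
import sys


def in_docker() -> bool:
    """Heuristic: Docker normally creates /.dockerenv at the container's root."""
    return os.path.exists("/.dockerenv")


def describe_runtime() -> str:
    """Build a greeting that names the interpreter and where it is running."""
    where = "inside a Docker container" if in_docker() else "outside Docker"
    return f"Hello World! Python {platform.python_version()} at {sys.executable}, running {where}."


if __name__ == "__main__":
    print(describe_runtime())
```

Running it from Visual Studio Code attached to the container and then from a regular Windows or Ubuntu terminal should show the two different interpreters.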

Step 4: Set Up Development Environment (Jupyter Lab)

To use Jupyter Lab as your editor, follow the steps below:

  1. Start any Docker container and open its terminal

  2. Create a directory named “notebooks” (command: “mkdir notebooks”). This should create the /root/notebooks directory

  3. Run this command to start a Jupyter Lab instance: jupyter lab --notebook-dir=/root/notebooks --ip='*' --port=8888 --no-browser --allow-root

  4. Start the browser on Windows

  5. Enter “http://localhost:8888/” in the address bar

  6. Copy the long token from the Docker terminal and paste it into the browser to log in to Jupyter Lab.

  7. Create a Python notebook. This file will also be available outside the container, in the mounted directory (e.g., /home/sungkim/tensorflow/notebooks).

  8. Start using Jupyter Lab!
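The token in step 6 appears in the Jupyter startup output as the `token` query parameter of a URL. If you capture that output, the token can be pulled out with the standard library instead of being copied by hand. This is a minimal sketch; the function name `extract_token` is hypothetical and the sample log line uses a made-up token.

```python
from urllib.parse import parse_qs, urlparse


def extract_token(log_line: str):
    """Return the `token` query parameter from the first URL in a log line, or None."""
    for word in log_line.split():
        if word.startswith(("http://", "https://")):
            tokens = parse_qs(urlparse(word).query).get("token")
            if tokens:
                return tokens[0]
    return None


if __name__ == "__main__":
    # Made-up sample line in the general shape of Jupyter's startup output
    sample = "     or http://127.0.0.1:8888/lab?token=0123456789abcdef"
    print(extract_token(sample))
```

Alternatively, opening the full URL with the token included in the Windows browser logs you in directly.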

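Docker is an open-source platform for building, running, and managing applications. It lets developers package an application and all of its dependencies into a portable container that can run in any environment, whether development, testing, or production. Docker isolates an application and its dependencies, provides consistency and reproducibility, and simplifies deployment and scaling.

When deploying machine learning or deep learning models with Docker, you can create a customized runtime environment by building your own Docker image. This is done with a Dockerfile, a text file containing a series of instructions and settings used to build a Docker image. In a Dockerfile you can specify the packages to install, environment variables, files to copy, and other operations, producing an environment tailored to a machine learning or deep learning model.

Once the Dockerfile is written, the command `sudo docker build -t mydocker:v1 .` creates a Docker image. The `-t` option names (tags) the image `mydocker:v1`, and `.` is the directory containing the Dockerfile. The command generates an image according to the Dockerfile's instructions and settings, containing the environment and dependencies the model needs.

A few other commonly used commands help manage Docker images and containers. For example, `docker images` lists all images on the machine, `docker ps -a` lists all containers created on the machine, `docker rmi` deletes the specified image, and `docker rm` deletes the specified container.

In summary, Docker is a widely used tool for deploying machine learning or deep learning models: it provides isolation, consistency, and reproducibility, and customized images together with a handful of commands are enough to build and manage the environments.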
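As an illustration of the Dockerfile approach, the sketch below generates a minimal Dockerfile that mirrors the manual steps from earlier in this article (start from continuumio/anaconda3, install a Conda package). It is one possible layout, not a recommended production setup; the helper name `dockerfile_text` and its default arguments are assumptions chosen for the example.

```python
import tempfile
from pathlib import Path


def dockerfile_text(base_image="continuumio/anaconda3",
                    conda_packages=("tensorflow",),
                    channel="anaconda") -> str:
    """Return the text of a minimal Dockerfile that installs Conda packages."""
    lines = [f"FROM {base_image}"]
    if conda_packages:
        # -y answers the confirmation prompt, since image builds are non-interactive
        lines.append(f"RUN conda install -y -c {channel} " + " ".join(conda_packages))
    lines.append('CMD ["/bin/bash"]')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write to a temporary directory so nothing in your project is overwritten
    target = Path(tempfile.mkdtemp()) / "Dockerfile"
    target.write_text(dockerfile_text())
    print(f"Wrote {target}:")
    print(target.read_text())
```

Building from the directory that holds the generated file with `docker build -t mydocker:v1 .` would then produce an image comparable to the one saved with docker commit, with the advantage that the recipe is a text file you can keep under version control.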


