Data scientists working on a proof-of-concept model or presentation in Jupyter or a similar prototyping environment will need to find a way to share it with their client when the project is complete. Reproducing the environment that ran it successfully on their machine is not always easy.
Docker offers a solution to reproducibility — the environment and workspace files can be packaged up into an image.
Ensuring it can be easily deployed to the client in an enterprise setting is not always straightforward though. Their IT department might not have the resources to host the app temporarily on either cloud or internal servers; installing Docker Desktop on the user’s own computer may not be permitted.
VirtualBox is a long-standing software component that is generally accepted by IT departments and can be installed with little red tape in most enterprises.
This article describes the steps you can take to wrap up a Jupyter notebook environment, initially as a Docker image but ultimately sent to the client as a standalone file that can be imported straight into VirtualBox on their machine.
This is a process we can automate for enterprise clients at ContainDS, but the manual steps described below cover the most important aspects.
Saving your Jupyter experience
The most important part of your project is of course building the visualization in the first place. I won’t teach you how to do that… but for the purposes of this article we’ll assume it is in a local web app such as a Jupyter notebook, or if you want it to be even more client-friendly, a Jupyter Voila presentation.
There are plenty of guides to using Docker to build an image of your project that you could share with someone else. Alternatively, if you would rather not have to understand Docker at all, our desktop application ContainDS will do all of this for you, including converting your Jupyter-based environment into a Voila presentation.
For this article, we will just use a simple example Voila presentation that I’ve already packaged into an image and uploaded to Docker Hub.
I’m running these steps on a Mac, but the process on Windows is very similar.
Working with VirtualBox
We first set up the VirtualBox virtual machine (VM) on our own computer, loading in the Docker image and applying some extra configuration. Then we export it as a single ‘OVA’ file to send to the client.
Creating your copy of the VirtualBox machine
We will start with a VirtualBox image that Docker has provided in order to run Docker inside a VM. It really dates from the days when Docker Desktop didn’t run natively on Mac or Windows, but it is still useful here.
It would be technically possible to interact directly with Docker inside this VM — without needing any Docker products installed natively on our computer — but that would need a lot more work. There is a product called docker-machine which we can use to initialise the VirtualBox VM to contain the Docker daemon, and enable us to access its Docker daemon from the host computer. It will also make things simpler if we have the docker command line utility on our machine.
Thus, you should install Docker Toolbox. The installer contains VirtualBox as well as docker-machine and the docker command. You can download the latest pkg installer (for Mac) or exe (for Windows) from the GitHub releases page.
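Once Docker Toolbox is installed, it may be worth confirming that the three tools used below are on your PATH before going further. The function name require_tools is just an illustrative helper, not part of Docker Toolbox. On Windows, VBoxManage may need its VirtualBox install folder added to PATH first.

```shell
# Illustrative helper (not part of Docker Toolbox): report any missing commands.
# Prints "missing: <name>" to stderr for each absent tool and returns 1 if any are missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: require_tools docker docker-machine VBoxManage && echo "all tools found"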
Creating the VirtualBox VM
Use docker-machine to create a new VM on VirtualBox containing the Docker boot2docker image. We will call the VM ‘dsdeploy’. In a terminal window run:
docker-machine create --driver virtualbox dsdeploy
Then configure our terminal to talk to the Docker daemon inside the VM, rather than expecting a Docker daemon running natively on the host:
docker-machine env dsdeploy
This displays the environment variables that need to be set in your terminal, together with the command that sets them. Once they are set, the docker command will talk to Docker within the VM.
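On Mac and Linux shells, the command that docker-machine env suggests is typically eval $(docker-machine env dsdeploy). Below is a small sketch that applies it and then checks that DOCKER_HOST now points at a tcp:// address (the VM) rather than a local socket. The function names are hypothetical, and Windows shells need the syntax that docker-machine env prints for them instead.

```shell
# Hypothetical helper: succeed only if the given DOCKER_HOST value is a remote tcp:// endpoint.
is_remote_docker_host() {
  case "$1" in
    tcp://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: load 'docker-machine env' output into this shell (bash/zsh assumed).
use_machine() {
  machine="${1:-dsdeploy}"
  # Capture first so a docker-machine failure is not hidden by eval of an empty string
  env_script=$(docker-machine env "$machine") || return 1
  eval "$env_script"
  if is_remote_docker_host "${DOCKER_HOST:-}"; then
    echo "docker now talks to $DOCKER_HOST"
  else
    echo "DOCKER_HOST does not point at the VM; check 'docker-machine env $machine'" >&2
    return 1
  fi
}
```

Usage: use_machine dsdeploy. The variables only apply to the current terminal session, so a new terminal window needs them set again.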
Run the following to obtain the sample Voila presentation Docker image for our project and to set it running as a new container, all inside our new VM:
docker pull danlester/voila-sincos:de3b79a7
docker run -p 8888:8888 danlester/voila-sincos:de3b79a7
This fetches the image from Docker Hub and then starts a new container exposing the Jupyter/Voila server on port 8888.
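Because docker run above stays attached to your terminal, you can check from a second terminal that the presentation is reachable from the host. This sketch assumes curl is installed and that the Voila page is served at the root path. voila_url and check_voila are illustrative names. docker-machine ip prints the VM's address on the network the host can reach.

```shell
# Hypothetical helper: build the Voila URL for an IP address and port (default 8888).
voila_url() {
  printf 'http://%s:%s/\n' "$1" "${2:-8888}"
}

# Hypothetical helper: ask the VM for its IP, then request the page from the host.
check_voila() {
  ip=$(docker-machine ip "${1:-dsdeploy}") || return 1
  url=$(voila_url "$ip")
  # -f makes curl fail on HTTP errors; -sS hides progress but still shows errors
  if curl -fsS -o /dev/null "$url"; then
    echo "Voila is up at $url"
  else
    echo "no response from $url yet" >&2
    return 1
  fi
}
```

Usage: check_voila dsdeploy. The printed URL is also the one to open in your own browser to try the presentation before exporting.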
Autostart the container
We could just export the VM now and share it with the client. However, we are missing a couple of things that would make life easier for our recipient.
When imported cold into their own VirtualBox, the Docker container will still exist but it will not be running. They would have to obtain a terminal into the VM and set it running again. It would also be difficult to know what URL they need to access the Voila server in their web browser, so we’ll find a way to notify them of this.
On your host computer, make a text file called docker_start_webapps containing the following:
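# Start every container in the VM, including stopped ones
echo -e "\n\n Starting all web apps..."
docker start $(docker ps -a -q)
# Print the Voila URL, built from the IP address of the VM's eth1 (host-only) adapter
echo -e "\n ******* \n\n Please visit this address in your web browser\n"
ifconfig -a eth1 | grep 'inet addr' | awk -F '[: ]+' -v OFS='' '{print "http://",$4,":8888/"}'
echo -e "\n ******* \n"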
This script will start all stopped containers and output the URL required to access the Voila presentation. It is ideal as a startup script at the end of the boot process in our VM.
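The last pipeline in the script depends on the exact ifconfig output format inside the VM, so it can help to exercise it on your own machine first. Here it is wrapped in a function (print_voila_url is an illustrative name) and fed a made-up sample in the "inet addr:" style that the script expects. The addresses are illustrative, not taken from a real VM.

```shell
# The URL-printing pipeline from docker_start_webapps, wrapped as a function.
# Splitting on runs of ':' or ' ' makes field 4 the IPv4 address on the 'inet addr' line.
print_voila_url() {
  grep 'inet addr' | awk -F '[: ]+' -v OFS='' '{print "http://",$4,":8888/"}'
}

# Illustrative ifconfig output in the "inet addr:" style; addresses are made up.
sample_ifconfig='eth1      Link encap:Ethernet  HWaddr 08:00:27:12:34:56
          inet addr:192.168.99.100  Bcast:192.168.99.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1'

printf '%s\n' "$sample_ifconfig" | print_voila_url   # prints http://192.168.99.100:8888/
```

If the VM's ifconfig ever printed a different format, this is the line that would need adjusting.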
Copy the file to the VM using the following command:
docker-machine scp docker_start_webapps dsdeploy:/var/lib/boot2docker/docker_start_webapps
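Below is a sketch that combines a local syntax check with the copy above, then lists the file inside the VM to confirm it arrived. sh -n parses the script without running it. The helper names are hypothetical. The docker-machine scp and docker-machine ssh calls mirror the commands used in this section.

```shell
# Hypothetical helper: destination of a local file in the VM's /var/lib/boot2docker directory.
remote_path_for() {
  printf '/var/lib/boot2docker/%s\n' "$(basename "$1")"
}

# Hypothetical helper: syntax-check locally, copy into the VM, then list the copy there.
copy_startup_script() {
  file="$1"; machine="${2:-dsdeploy}"
  sh -n "$file" || return 1   # parse only; nothing in the script is executed
  dest=$(remote_path_for "$file")
  docker-machine scp "$file" "$machine:$dest" || return 1
  docker-machine ssh "$machine" ls -l "$dest"
}
```

Usage: copy_startup_script docker_start_webapps dsdeploy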
Next we need to ssh into the VM to configure its boot process so that our new script is called on startup. Run this command in your terminal:
docker-machine ssh dsdeploy -t
Now run the following commands inside the newly-created ssh session in the VM:
sudo chmod a+x /var/lib/boot2docker/docker_start_webapps
sudo chmod o+w /var/lib/boot2docker/profile
sudo ln -s /var/lib/boot2docker/docker_start_webapps /etc/profile.d/docker_start_webapps.sh
sudo chmod o-w /var/lib/boot2docker/profile
The main point here is just to link our new startup script into the /etc/profile.d folder so that it will be run automatically on startup.
Type exit to quit the ssh session.
A large portion of the VM’s filesystem is wiped and reinitialised every time it restarts, which is why this section needed some workarounds to make our startup script permanent.
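The chmod o+w and o-w lines above temporarily make /var/lib/boot2docker/profile writable. This suggests the link is meant to be recorded in that file so that it is recreated on each boot. Below is a minimal sketch of that step. It assumes, based on our understanding of boot2docker, that /var/lib/boot2docker sits on the VM's persistent disk, that its profile is sourced as root during every boot, and that /etc is rebuilt from scratch. append_once and LINK_CMD are illustrative names.

```shell
# Run inside the VM (docker-machine ssh dsdeploy -t).
# Assumption: /var/lib/boot2docker/profile survives restarts and is sourced as root at boot.

# Append a line to a file unless an identical line is already there,
# so re-running the setup does not add duplicates.
append_once() {
  line="$1"; file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Command to recreate the profile.d link at boot; -f replaces any stale link.
LINK_CMD='ln -sf /var/lib/boot2docker/docker_start_webapps /etc/profile.d/docker_start_webapps.sh'
```

Inside the ssh session, this would run between the two chmod commands: sudo chmod o+w /var/lib/boot2docker/profile, then append_once "$LINK_CMD" /var/lib/boot2docker/profile, then sudo chmod o-w /var/lib/boot2docker/profile. Restarting the VM with docker-machine restart dsdeploy is a quick way to check that the URL message appears on the console.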
Exporting the VirtualBox VM
We are finally ready to export the VM as a file to be shared with our client.
Run the following to shut down the VM:
docker-machine stop dsdeploy
Then export the VM as a self-contained file using VirtualBox’s VBoxManage command:
VBoxManage export dsdeploy --iso -o voila_sincos_saved.ova
If you can’t run the VBoxManage command for some reason, the same export can be done in the VirtualBox GUI application: choose ‘Export Appliance…’ from the File menu, pick an Open Virtualization Format option, and make sure ISO image files are included.
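An OVA file is a tar archive containing the OVF descriptor alongside the disk image (plus the ISO image, if it was included), so you can sanity-check the export before sending it. tar -tf voila_sincos_saved.ova lists the contents. VBoxManage import voila_sincos_saved.ova --dry-run shows what an import would create without actually importing anything. The helper below (a hypothetical name) only checks that a .ovf descriptor is present:

```shell
# Hypothetical helper: succeed if the archive contains an .ovf descriptor file.
ova_has_descriptor() {
  tar -tf "$1" | grep -q '\.ovf$'
}
```

Usage: ova_has_descriptor voila_sincos_saved.ova && echo "descriptor found"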
Sharing with the client
The OVA file voila_sincos_saved.ova can be sent to your client through your usual channels for large, secure files. Note that it will be over 1 GB, so email won’t be suitable!
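For a file this size, it can be reassuring to send a SHA-256 checksum through a separate channel so the client can confirm the download is intact. Below is a small sketch that uses sha256sum where available (most Linux systems) and falls back to shasum -a 256 (included with macOS). sha256_of is an illustrative name. On Windows, the client can compare against the output of certutil -hashfile voila_sincos_saved.ova SHA256.

```shell
# Hypothetical helper: print only the SHA-256 hex digest of a file.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  elif command -v shasum >/dev/null 2>&1; then
    shasum -a 256 "$1" | awk '{print $1}'
  else
    echo "no SHA-256 tool found" >&2
    return 1
  fi
}
```

Usage: sha256_of voila_sincos_saved.ova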
On their computer, the client just runs the friendly VirtualBox GUI application as usual and selects ‘Import Appliance…’ from the File menu.
Locate the OVA file and import it using the default settings.
There will be a limited screen display as the VM starts up, but it should be enough to see the final instructions we supplied: ‘Please visit this address in your web browser’ along with the URL to the Voila presentation.
If they visit that URL they should be able to interact with our presentation exactly as we built it!
Conclusion
It would certainly be neater to find a way to host this Voila application in the cloud. But sometimes this isn’t possible, for example if you don’t have permission to host all relevant data in the cloud — or if you can’t expect reliable access to the internet at the location where the presentation is due to be viewed.
Another ‘purer’ approach would be to share the presentation directly as a Docker image. Our ContainDS desktop software enables this easily as described in another recent tutorial, but that relies on Docker being installable on both origin and destination machines.
I hope this article helps if you have the very specific need to ensure a client can access your work through VirtualBox.
Working with data science environments on your local machine is something that we are thinking about every day at ContainDS, and I’d be really pleased to hear how you are addressing these challenges, or if you need any help with them at all.