Efficient Use of TigerGraph and Docker

TigerGraph is my graph database and graph analytics platform of choice as it is fast, scalable, and has an active open-source community. I regularly make use of TigerGraph locally due to my location not having nearby TigerGraph Cloud servers.

At the time of writing, the TigerGraph software requirements specify support for the following operating systems:

  • Red Hat and CentOS versions 6.5–6.9, 7.0–7.4, and 8.0–8.2
  • Ubuntu 14.04, 16.04, and 18.04
  • Debian 8

For anyone using operating systems beyond this list, a logical solution would be to make use of containerization: Docker, in the case of this article.

In this article we will cover:

  1. How to make use of the official TigerGraph image and what’s inside
  2. Stripping the official Docker image of unnecessary bloat
  3. Modifying the ENTRYPOINT to:
     • Run gadmin on startup
     • Run GSQL scripts bound at a certain directory
     • Route the output from a log file to STDOUT
  4. Using Docker Compose to run TigerGraph images

The Official TigerGraph Image

The official TigerGraph image, running the developer edition, can be obtained by the following command:

docker pull docker.tigergraph.com/tigergraph-dev:latest

Run with:

docker run -d -p 14022:22 -p 9000:9000 -p 14240:14240 --name tigergraph_dev --ulimit nofile=1000000:1000000 -v ~/data:/home/tigergraph/mydata -t docker.tigergraph.com/tigergraph-dev:latest

This source gives more in-depth instructions on how the image is constructed, but in summary:

  • A base image of Ubuntu 16.04 is used
  • Required software such as tar and curl is installed
  • Optional software such as emacs, vim, and wget is installed
  • The GSQL 101 and 102 tutorials and the GSQL Algorithms library are downloaded
  • The SSH server, the REST++ API, and GraphStudio each listen on a port; these are the three notable ports that can be exposed and used to communicate with the server

The total image is a 1.8–2.0GB download (version dependent), which puts considerable strain on bandwidth, especially in resource-sensitive use cases like CI/CD. It is also worth noting that all one needs in order to use TigerGraph is a GSQL socket connection, which tools such as Giraffle and pyTigerGraph can interface with.

I’ve identified two large sources of bloat which are:

  • The optional and unnecessary software, e.g. vim and GSQL Tutorial 101
  • GraphStudio and other binaries that are not necessary for the minimal operation of TigerGraph Developer Edition

Stripping the TigerGraph Image

I’ve replaced the base image ubuntu:16.04 with Bitnami’s MiniDeb image in order to shave off a few megabytes of unnecessary space. This runs Debian Jessie.

The next step was to remove unnecessary binaries installed during the apt-get stage of the official image. I’ve kept Vim as the only command-line text editor but binaries such as wget, git, unzip, emacs, etc. are no longer installed.

During the TigerGraph installation, the hardware requirements are strictly enforced and the installation will fail if they are not met. Since I want DockerHub runners to automatically build and push my image, I hacked the check such that the low-resource runners can continue to build the image.

This is done by replacing the os_utils binary with my version, which makes the check_cpu_number() and check_memory_capacity() functions more lenient. This binary can be found under:

/home/tigergraph/tigergraph-${DEV_VERSION}-developer/utils/os_utils

This has already reduced the bloat by around 400MB, and my DockerHub image reports a compressed size of 1.52GB for TigerGraph 3.0.0 (although I did notice that downloading these layers suggests it comes to around 1.62GB).

Note: I have attempted to haphazardly delete GraphStudio binaries, but this causes the gadmin start script to fail. More meticulous adjustments, e.g. editing the gadmin Python scripts, will be needed in order to remove more from TigerGraph.

The final result between the two images once I’ve downloaded and uncompressed them can be seen by calling docker images:
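
A minimal sketch of how one might list only the TigerGraph images and their uncompressed sizes, assuming both images have already been pulled locally. The helper name list_tigergraph_images and the filter pattern are hypothetical and not part of the original build:

```shell
# Hypothetical helper: list local images whose name mentions "tigergraph",
# together with the size Docker reports for them.
list_tigergraph_images() {
  # --format takes a Go template; one "repository:tag size" pair per line.
  docker images --format '{{.Repository}}:{{.Tag}} {{.Size}}' | grep -i 'tigergraph'
}
```

Calling list_tigergraph_images prints one line per matching image, which makes the two sizes easy to compare side by side.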

The source code for my build can be found here and I encourage anyone with suggestions to contact me!

Modifying the ENTRYPOINT

Before we add features let’s have a look at the original ENTRYPOINT:

ENTRYPOINT /usr/sbin/sshd && su - tigergraph bash -c "tail -f /dev/null"

This does two things:

  1. The SSH server is started by running /usr/sbin/sshd.
  2. The container is kept alive by running the tail command as the user tigergraph. This continuously follows /dev/null, which never produces any output, and that is also why the container’s STDOUT is empty.

Starting “gadmin” on Startup

To improve the user experience, I’ve added a command that starts the gadmin services in the Docker entry point, changing it from

ENTRYPOINT /usr/sbin/sshd && su - tigergraph bash -c "tail -f /dev/null"

to

ENTRYPOINT /usr/sbin/sshd && su - tigergraph bash -c "/home/tigergraph/tigergraph/app/cmd/gadmin start all && tail -f /dev/null"

A very simple but valuable change!

Run GSQL Scripts on Startup Using Volumes

Something that the TigerGraph Docker image lacks (and which other database images such as MySQL, MariaDB, and PostgreSQL have) is a directory named something along the lines of docker-entrypoint-initdb.d, where a user can bind database scripts to run at startup, e.g. for schema creation or database population.

There are various ways to go about this, but I’ve chosen a fairly simple implementation: adding the following line between the gadmin and tail commands:

How this command works is:

  1. The if statement checks whether a directory called /docker-entrypoint-initdb.d exists, and the next step is only performed if it does.
  2. The for file in /docker-entrypoint-initdb.d/*.gsql; do line starts a for-each loop over all the files ending with the gsql extension in the entry point folder.
  3. The su tigergraph bash -c line runs the GSQL command on the file given by the for-each loop.
  4. By appending || continue, neither the container nor the loop will stop if a script fails to execute.
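
The four steps above can be sketched as a small shell function. This is a reconstruction under stated assumptions rather than the exact line from the image: the function name run_init_scripts and the GSQL_CMD variable are hypothetical, and the sketch assumes it already runs as the tigergraph user, so the su tigergraph bash -c wrapper is left out.

```shell
# Hypothetical helper: run every *.gsql file found in an init directory.
# GSQL_CMD is an assumed override for the gsql client (defaults to "gsql").
run_init_scripts() {
  local init_dir="${1:-/docker-entrypoint-initdb.d}"
  local gsql_cmd="${GSQL_CMD:-gsql}"

  # Step 1: do nothing unless the init directory exists.
  if [ -d "$init_dir" ]; then
    # Step 2: loop over every file with the .gsql extension.
    for file in "$init_dir"/*.gsql; do
      # An unmatched glob is left as a literal pattern, so skip non-files.
      [ -e "$file" ] || continue
      # Steps 3 and 4: run the script; a failure moves on to the next file.
      "$gsql_cmd" "$file" || continue
    done
  fi
}
```

Because a failed script only triggers continue, one broken file will not prevent the remaining scripts from running or the container from starting.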

This will most likely look neater if placed into an entrypoint.sh but this is up to you! The final result now looks something like this:

Routing Logs to STDOUT

To figure out where the logs are located, one can run gadmin log, which will return something along the lines of

ADMIN  : /home/tigergraph/tigergraph/log/admin/ADMIN#1.out
ADMIN : /home/tigergraph/tigergraph/log/admin/ADMIN.INFO
CTRL : /home/tigergraph/tigergraph/log/controller/CTRL#1.log
CTRL : /home/tigergraph/tigergraph/log/controller/CTRL#1.out
DICT : /home/tigergraph/tigergraph/log/dict/DICT#1.out
DICT : /home/tigergraph/tigergraph/log/dict/DICT.INFO
ETCD : /home/tigergraph/tigergraph/log/etcd/ETCD#1.out
EXE : /home/tigergraph/tigergraph/log/executor/EXE_1.log
EXE : /home/tigergraph/tigergraph/log/executor/EXE_1.out
...etc

I’m mostly interested in the admin logs so I will change the tail command to read from /home/tigergraph/tigergraph/log/admin/ADMIN.INFO instead of /dev/null.

Now anything written to the admin logs will be piped to the container’s logs automatically. The final product of all three steps is now:
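
A minimal sketch of how the three steps might be combined in an entrypoint.sh, as suggested earlier. The gadmin path and the admin log path come from this article; the gsql path, the variable names, and the entrypoint_main function are assumptions, and the sketch assumes the script runs as the tigergraph user after sshd has been started.

```shell
# Sketch of an entrypoint.sh body. GSQL's default path is an assumption;
# GADMIN and ADMIN_LOG use the paths quoted in the article.
GADMIN="${GADMIN:-/home/tigergraph/tigergraph/app/cmd/gadmin}"
GSQL="${GSQL:-/home/tigergraph/tigergraph/app/cmd/gsql}"
ADMIN_LOG="${ADMIN_LOG:-/home/tigergraph/tigergraph/log/admin/ADMIN.INFO}"
INIT_DIR="${INIT_DIR:-/docker-entrypoint-initdb.d}"

entrypoint_main() {
  # Step 1: start all TigerGraph services.
  "$GADMIN" start all

  # Step 2: run any bound GSQL scripts, ignoring individual failures.
  if [ -d "$INIT_DIR" ]; then
    for file in "$INIT_DIR"/*.gsql; do
      [ -e "$file" ] || continue
      "$GSQL" "$file" || continue
    done
  fi

  # Step 3: follow the admin log on STDOUT; this also keeps the container alive.
  tail -F "$ADMIN_LOG"
}
```

In a real entrypoint.sh the last line would simply call entrypoint_main. The sketch uses tail -F rather than tail -f so that the log is picked up again if the file is rotated or recreated.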

Using Docker Compose with TigerGraph

If you would like a GSQL script to run on startup, add the following entry under volumes:

- ./my_script.gsql:/docker-entrypoint-initdb.d/my_script.gsql
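
For readers who prefer plain docker run, the same bind mount and a comparable health check can be sketched on the command line. This is an illustration only: the function name run_tigergraph, the container name, and the choice of curl for the health command are assumptions, and the image is passed in as an argument instead of being hard-coded.

```shell
# Sketch: start a TigerGraph container with a bound GSQL script and a
# REST++ health check. Assumes curl is available inside the image.
run_tigergraph() {
  local image="$1"    # image with the modified ENTRYPOINT
  local script="$2"   # absolute host path to a .gsql file
  docker run -d \
    --name tigergraph \
    --ulimit nofile=1000000:1000000 \
    -p 9000:9000 -p 14240:14240 \
    -v "$script:/docker-entrypoint-initdb.d/$(basename "$script")" \
    --health-cmd 'curl --fail --silent http://localhost:9000/echo || exit 1' \
    --health-interval 5s \
    "$image"
}
```

The host path is expected to be absolute here, since docker run can interpret a bare name on the left-hand side of -v as a named volume instead of a file.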

Note that I have added a health check which calls the REST++ echo endpoint every 5 seconds to determine whether the container is healthy. This is useful for many applications and, in my use case, is used during integration testing to check that the container is ready before starting the tests.
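
The same readiness idea can be sketched from the client side, for example in a CI script that has to wait before running integration tests. The function name wait_for_tigergraph, its default retry count, and the use of curl are assumptions; the URL assumes REST++ is published on localhost:9000 as in the docker run command earlier.

```shell
# Hypothetical helper: poll the REST++ echo endpoint until it answers.
# Arguments: URL, maximum number of attempts, seconds between attempts.
wait_for_tigergraph() {
  local url="${1:-http://localhost:9000/echo}"
  local retries="${2:-60}"
  local interval="${3:-5}"
  local attempt=1

  while [ "$attempt" -le "$retries" ]; do
    # --fail makes curl return non-zero on HTTP errors as well as refusals.
    if curl --silent --fail "$url" > /dev/null; then
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  return 1   # the service never became reachable
}
```

A test pipeline could then run something like wait_for_tigergraph && ./run_tests.sh, where run_tests.sh stands in for whatever starts the test suite.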

If you use the official image, you would need to SSH into the container to manually start all services:

The default password is “tigergraph” after which you would call the command

gadmin start all (v3.0.0 and later)
gadmin start (before v3.0.0)
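
The version rule above can be sketched as a small helper that prints the right command for a given version string. The function name gadmin_start_command is hypothetical, and the SSH command in the comment is an assumption based on the 14022:22 port mapping used in the docker run command earlier.

```shell
# Assumed SSH step, given the 14022:22 mapping:
#   ssh -p 14022 tigergraph@localhost

# Hypothetical helper: print the gadmin start command for a version string.
gadmin_start_command() {
  local version="$1"
  local major="${version%%.*}"   # text before the first dot, e.g. "3" in "3.0.0"
  if [ "$major" -ge 3 ]; then
    echo "gadmin start all"
  else
    echo "gadmin start"
  fi
}
```

For example, gadmin_start_command 3.0.0 prints the "start all" form, while any 2.x version string prints the plain "gadmin start" form.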

GraphStudio can then be found at localhost:14240 in your web browser, and REST++ can be found at localhost:9000.

Conclusion

In this article we have:

  • inspected the official Dockerfile,
  • identified and removed obviously unnecessary files,
  • built a slimmer version of the TigerGraph image, saving close to 500MB of storage space,
  • modified the ENTRYPOINT to add automation to the container, and
  • used Docker Compose to run this image.

If you have any suggestions or thoughts on ways to further reduce the size of the image, then leave a comment, issue, or fork and give a pull request on the GitHub repository for the code in this article. If you would like to see more Docker-related guides for building custom setups for databases such as JanusGraph then leave a comment below.

You can find these images on Docker Hub, and I will continue to update them as new versions of TigerGraph come out, or until TigerGraph releases an official slimmer version.

If you would like to join the TigerGraph community and contribute, or start awesome projects and get a deeper look into what’s coming for TigerGraph, then join us on the following platforms:

If you are interested in seeing some of my other work, then have a look at my personal page at https://davidbakereffendi.github.io/.

Credit goes to Jon Herke at TigerGraph for his leadership in the community and for equipping us to contribute in meaningful ways.

Translated from: https://towardsdatascience.com/efficient-use-of-tigergraph-and-docker-5e7f9918bf53
