Using MongoDB Client-Side Field Level Encryption on AWS Lambda

MongoDB Client-Side Field Level Encryption (FLE) is a great tool to protect sensitive data like PII. Data is encrypted and decrypted on the client side, meaning even the database administrator or cloud provider can’t access the data in plaintext.

Following the official guide in the docs, it was fairly easy to set up automatic FLE with an Atlas M10 cluster. The problems started when I tried to port the same setup to AWS Lambda.

This is how a lambda would look after you have completed the MongoDB guide:

const { Binary, MongoClient } = require("mongodb");
const path = require("path");


const connectionString =
  "mongodb+srv://OMITTED.mongodb.net/yourDatabase?retryWrites=true&w=majority";
const keyVaultNamespace = "yourDatabase.__keyVault";
const base64KeyId = "OMITTED";


const createSchema = () => {
  return {
    "yourDatabase.test": {
      bsonType: "object",
      encryptMetadata: {
        keyId: [new Binary(Buffer.from(base64KeyId, "base64"), 4)],
      },
      properties: {
        foo: {
          encrypt: {
            bsonType: "string",
            algorithm: "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic",
          },
        },
      },
    },
  };
};


const kmsProviders = {
  aws: {
    accessKeyId: "OMITTED",
    secretAccessKey: "OMITTED",
  },
};


module.exports.hello = async (event) => {
  const secureClient = new MongoClient(connectionString, {
    connectTimeoutMS: 7000,
    useNewUrlParser: true,
    useUnifiedTopology: true,
    autoEncryption: {
      keyVaultNamespace,
      kmsProviders,
      schemaMap: createSchema()
    },
  });


  await secureClient.connect();


  // the database name must match the namespace used in the schema map ("yourDatabase.test")
  const collection = secureClient.db("yourDatabase").collection("test");


  await collection.insertOne({
    foo: "bar",
  });


  const resp = await collection.find({}).toArray();


  return {
    statusCode: 200,
    body: JSON.stringify(resp),
  };
};

The code above results in a connection error when running on lambda: connect ECONNREFUSED 127.0.0.1:27020

In order to use Automatic FLE you need a running instance of the mongocryptd process. Otherwise you get the error above. The npm wrapper package mongodb-client-encryption by default spawns an instance for you if you have the binary of mongocryptd installed.

However, on lambda I still could not connect to my cluster. Since the mongocryptd process does not log any errors to the console (probably for security reasons), it was rather difficult to debug. There are three things you need to do to fix this:

1. Include the mongocryptd library

You need the mongocryptd binary compiled for Amazon Linux 2. Put it in a folder inside your lambda deployment package, e.g. in a folder called bin. Then add the path to the binary inside your deployment package to your autoEncryption configuration like this:

autoEncryption: {
    keyVaultNamespace,
    kmsProviders,
    schemaMap: createSchema(),
    extraOptions: {
        mongocryptdSpawnPath: `${process.env.LAMBDA_TASK_ROOT}/bin/mongocryptd`,
    },
},

Please note that process.env.LAMBDA_TASK_ROOT is an environment variable set by lambda that points to the directory containing your handler file. Make sure that the binary has the correct file permissions so that it is executable.
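
A quick way to catch a missing or non-executable binary early is to probe it before the client is created. The following is only a sketch; the helper name isExecutable is made up for this illustration and is not part of the driver:

```javascript
const fs = require("fs");

// Hypothetical helper (not part of any MongoDB package): returns true when
// `file` exists and the current user may execute it, false otherwise.
function isExecutable(file) {
  try {
    fs.accessSync(file, fs.constants.X_OK);
    return true;
  } catch (err) {
    return false;
  }
}

// Possible usage inside the handler, with the path from the configuration above:
// const bin = `${process.env.LAMBDA_TASK_ROOT}/bin/mongocryptd`;
// if (!isExecutable(bin)) console.error(`${bin} is missing or not executable`);
```

If the check fails, the file mode was most likely lost while building the deployment package, so the permissions need to be set before the package is zipped.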

2. Include shared libraries

The lambda execution context is missing a lot of shared libraries that are included in the Amazon Linux 2 docker image. Since the mongocryptd process fails silently, as mentioned above, I resorted to the following test lambda to spin up the process manually so I could get proper logs:

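const { spawn } = require("child_process");

module.exports.hello = async (event) => {
  // make the bundled binary resolvable by name
  process.env.PATH = `${process.env.PATH}:${process.env.LAMBDA_TASK_ROOT}/bin`;

  const child = spawn("mongocryptd", ["--idleShutdownTimeoutSecs=60"]);

  // fires when the process cannot be spawned at all (e.g. binary not found)
  child.on("error", (error) => console.log(error));

  child.stdout.on("data", (data) => {
    console.log(data.toString());
  });

  // missing shared libraries are reported here
  child.stderr.on("data", (data) => {
    console.error(`mongocryptd stderr: ${data}`);
  });

  child.unref();

  // give the process a few seconds to start and write its logs
  await new Promise((resolve) => setTimeout(resolve, 3000));

  return {
    statusCode: 200,
    body: JSON.stringify({ message: "check the logs for the mongocryptd output" }),
  };
};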

Running this will show you the missing libraries in the error message. You can spin up an Amazon Linux 2 docker image and copy the libraries from the lib64 folder of the image. I copied all missing libraries into a folder called lib inside my lambda deployment package. At the time of writing, the following libraries were missing:

libcrypt.so.1
libidn2.so.0
libldap-2.4.so.2
libnghttp2.so.14
libsasl2.so.3
libssh2.so.1
libunistring.so.0
libcurl.so.4
liblber-2.4.so.2
liblzma.so.5
libnss3.so
libsmime3.so
libssl3.so
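
Because the dynamic loader stops at the first library it cannot find, the test lambda usually has to be run several times, revealing one missing library per run. A small helper can pull the name out of the stderr output. This is a sketch that assumes the usual glibc loader wording ("error while loading shared libraries: NAME: cannot open shared object file"); the function name missingLibrary is made up for this illustration:

```javascript
// Hypothetical helper: extracts the library name from the dynamic loader's
// "error while loading shared libraries" message, or returns null when the
// text does not contain such a message.
function missingLibrary(stderrText) {
  const match = /error while loading shared libraries: ([^:\s]+):/.exec(stderrText);
  return match ? match[1] : null;
}

// Possible usage in the stderr handler of the test lambda:
// child.stderr.on("data", (data) => {
//   const lib = missingLibrary(data.toString());
//   if (lib) console.error(`missing shared library: ${lib}`);
// });
```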

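Make sure to copy the correct symlinks too, as mongocryptd looks the libraries up by the names listed above (for example libcurl.so.4), not by the fully versioned file names installed in the Docker image. Lambda includes the lib folder of your deployment package in LD_LIBRARY_PATH by default, so libraries placed there are found automatically. If you put these libraries in a different folder inside your deployment package, you need to append that path to LD_LIBRARY_PATH.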
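
Appending a folder to LD_LIBRARY_PATH from inside the handler could look like the sketch below. The helper name withLibraryDir and the folder name vendor are assumptions for illustration only. Changing the variable at runtime affects only child processes started afterwards, which is sufficient here because mongocryptd runs as a child process.

```javascript
// Hypothetical helper: returns `current` with `dir` appended, without
// producing "undefined:..." when the variable is unset and without
// adding the same folder twice.
function withLibraryDir(current, dir) {
  const parts = (current || "").split(":").filter(Boolean);
  if (!parts.includes(dir)) parts.push(dir);
  return parts.join(":");
}

// Possible usage inside the handler ("vendor" is a made-up folder name):
// process.env.LD_LIBRARY_PATH = withLibraryDir(
//   process.env.LD_LIBRARY_PATH,
//   `${process.env.LAMBDA_TASK_ROOT}/vendor`
// );
```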

3. Set a path for the PID

Lambda gives you read-only access to your deployment package. By default, the mongocryptd process would write a PID file containing the process ID to your lambda package root. This results in an error, as the folder is not writable. Hence the mongocryptd process fails and you are not able to connect to your cluster.

There is, however, one folder that is writable: Lambda offers 512MB of temporary storage in a folder called tmp at the root of the file system. Relative to the task root, its path is path.resolve(process.env.LAMBDA_TASK_ROOT, "../../tmp");
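
To see what that expression evaluates to, here is a tiny sketch. It hard-codes /var/task, the value LAMBDA_TASK_ROOT has on Lambda, as an assumption so that it can be run anywhere:

```javascript
const path = require("path");

// Assumption: on Lambda, process.env.LAMBDA_TASK_ROOT is "/var/task".
const taskRoot = "/var/task";

// path.posix keeps the result identical on every platform.
const tmpPath = path.posix.resolve(taskRoot, "../../tmp");

console.log(tmpPath); // prints "/tmp"
```

In other words, the relative expression is simply another way of writing /tmp.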

So we need to tell the mongocryptd process where to place the PID file.

Our final lambda would look like this:

const { Binary, MongoClient } = require("mongodb");
const path = require("path");


const connectionString =
  "mongodb+srv://OMITTED.mongodb.net/yourDatabase?retryWrites=true&w=majority";
const keyVaultNamespace = "yourDatabase.__keyVault";
const base64KeyId = "OMITTED";


const createSchema = () => {
  return {
    "yourDatabase.test": {
      bsonType: "object",
      encryptMetadata: {
        keyId: [new Binary(Buffer.from(base64KeyId, "base64"), 4)],
      },
      properties: {
        foo: {
          encrypt: {
            bsonType: "string",
            algorithm: "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic",
          },
        },
      },
    },
  };
};


const kmsProviders = {
  aws: {
    accessKeyId: "OMITTED",
    secretAccessKey: "OMITTED",
  },
};


module.exports.hello = async (event) => {
  const tmpPath = path.resolve(process.env.LAMBDA_TASK_ROOT, "../../tmp");
  process.env.LD_LIBRARY_PATH = `${process.env.LD_LIBRARY_PATH}:${process.env.LAMBDA_TASK_ROOT}/lib`;


  const secureClient = new MongoClient(connectionString, {
    connectTimeoutMS: 7000,
    useNewUrlParser: true,
    useUnifiedTopology: true,
    autoEncryption: {
      keyVaultNamespace,
      kmsProviders,
      schemaMap: createSchema(),
      extraOptions: {
        mongocryptdSpawnArgs: [`--pidfilepath=${tmpPath}/mongocryptd.pid`],
        mongocryptdSpawnPath: `${process.env.LAMBDA_TASK_ROOT}/bin/mongocryptd`,
      },
    },
  });


  await secureClient.connect();


  // the database name must match the namespace used in the schema map ("yourDatabase.test")
  const collection = secureClient.db("yourDatabase").collection("test");


  await collection.insertOne({
    foo: "bar",
  });


  const resp = await collection.find({}).toArray();


  return {
    statusCode: 200,
    body: JSON.stringify(resp),
  };
};

With this you should be able to connect to your cluster. Since the execution context, including the tmp folder with the process id, might be reused between invocations, we should probably clean up after ourselves. I will leave this up to you…
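
One possible shape for that cleanup is sketched below. It assumes the PID file contains just the numeric process ID, and the helper names isProcessAlive and pidFileIsLive are made up for this illustration. The idea is to trust the PID file only if the process it names is still running, and to delete it otherwise:

```javascript
const fs = require("fs");

// Returns true if a process with this PID exists. Signal 0 does not
// deliver a signal, it only performs the existence/permission check.
function isProcessAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return err.code === "EPERM";
  }
}

// Hypothetical helper: returns true if `pidFile` points at a live process.
// Otherwise it removes the stale file (if there is one) and returns false.
function pidFileIsLive(pidFile) {
  let pid;
  try {
    pid = Number.parseInt(fs.readFileSync(pidFile, "utf8").trim(), 10);
  } catch (err) {
    return false; // no readable PID file
  }
  if (Number.isInteger(pid) && pid > 0 && isProcessAlive(pid)) {
    return true;
  }
  try {
    fs.unlinkSync(pidFile);
  } catch (err) {
    // the file vanished in the meantime; nothing left to clean up
  }
  return false;
}
```

A handler could call pidFileIsLive(`${tmpPath}/mongocryptd.pid`) at the start of each invocation to decide whether a previous mongocryptd is still around.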

Update: September 1st 2020

I faced another obstacle when migrating this proof of concept to production. In my opinion, the Node.js bindings of the libmongocrypt module leave room for a race condition.

What happens: There can be cases where the mongo driver tries to connect to your cluster while the mongocryptd process has been spawned but is not ready yet. This time window is tiny, but I had a lot of intermittent errors in my lambda that came and went, so I checked the source code. If you connect while the mongocryptd process is not ready, you will get a connection refused error or some other weird error like “topology does not support sessions”.

Although the comment in the linked source code says that the spawn method will wait for the process “to be up”, it does not really do that. At least it is debatable what “up” means in this case. It hands off the spawning of the process to the OS and waits until this “handing off” has happened, using process.nextTick(callback). But this does not necessarily mean that the process is “up” in the sense that it is ready to accept incoming connections.
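
One way to make “up” concrete is to define it as “the port accepts TCP connections” and to poll for that. The sketch below is not what the driver does; waitForPort is a made-up helper, and the port 27020 in the usage comment is taken from the connection error shown earlier:

```javascript
const net = require("net");

// Hypothetical helper: resolves once a TCP connection to host:port
// succeeds, rejects when the deadline passes without a successful connect.
function waitForPort(port, host = "127.0.0.1", timeoutMs = 3000, intervalMs = 50) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const attempt = () => {
      const socket = net.connect({ port, host });
      socket.once("connect", () => {
        socket.destroy();
        resolve();
      });
      socket.once("error", (err) => {
        socket.destroy();
        if (Date.now() >= deadline) {
          reject(new Error(`port ${port} not reachable after ${timeoutMs} ms: ${err.message}`));
        } else {
          setTimeout(attempt, intervalMs); // try again shortly
        }
      });
    };
    attempt();
  });
}

// Possible usage after spawning mongocryptd manually:
// await waitForPort(27020);
```

An accepted TCP connection is a weaker signal than a log message from the process itself, but it does not depend on the log format.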

I reached out to MongoDB support regarding this issue. Hopefully they will fix this soon.

In the meantime, a fix is to spawn the mongocryptd process manually, disable automatic spawning in the driver, and wait for the process to be up. The mongocryptd process emits a log message when it is ready. Another problem here is that the MongoDB bson package, which should be able to parse mongo extended JSON logs, is in fact not able to do so. It fails to parse the logs output by the mongocryptd process.

// at the top of the file
const fs = require("fs");
const path = require("path");
const { spawn } = require("child_process");

// the rest belongs inside the async handler, before the MongoClient connects

// this is the writable folder where our .pid file lives
const tmpPath = path.resolve(process.env.LAMBDA_TASK_ROOT, "../../tmp");

// if the file exists our ephemeral function container got reused, so we do not need to spawn the process again
if (!fs.existsSync(`${tmpPath}/mongocryptd.pid`)) {
  await new Promise((resolve, reject) => {
    // to prevent waiting for Godot
    const safeGuard = setTimeout(
      () => reject(new Error("mongocryptd was not ready within 3 seconds")),
      3000
    );

    const child = spawn(
      // full path to the bundled binary, so we do not depend on PATH
      `${process.env.LAMBDA_TASK_ROOT}/bin/mongocryptd`,
      ["--idleShutdownTimeoutSecs=240", `--pidfilepath=${tmpPath}/mongocryptd.pid`],
      {
        // note that we need stdio (the default pipes) in order to determine when the process is ready to accept connections
        detached: true,
      }
    );

    // fires when the process cannot be spawned at all
    child.on("error", reject);

    child.stdout.on("data", (data) => {
      // I checked the mongocryptd logs:
      // id 23016 means the process is ready to accept incoming connections.
      // This looks hacky, but the bson package is unable to parse the mongocryptd logs.
      if (data.toString().includes(`"id":23016`)) {
        console.debug("Mongocryptd is ready to accept connections. Proceed to connect to cluster.");
        clearTimeout(safeGuard);
        resolve();
      }
    });

    child.stderr.on("data", (data) => {
      // you could add debug logs here
      clearTimeout(safeGuard);
      reject(new Error(data.toString()));
    });

    child.unref();
  });
}
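
One caveat with matching on individual "data" events: a stream may deliver a log line split across two chunks, in which case the marker would be missed. A buffering variant could look like the sketch below. The helper name createReadyDetector is made up for this illustration, and the marker is the log id from the snippet above:

```javascript
// Hypothetical helper: returns a function that is fed stdout chunks and
// reports true as soon as a complete log line containing `marker` was seen.
function createReadyDetector(marker = '"id":23016') {
  let buffer = "";
  let ready = false;
  return (chunk) => {
    if (ready) return true;
    buffer += chunk.toString();
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    ready = lines.some((line) => line.includes(marker));
    return ready;
  };
}

// Possible usage in the stdout handler:
// const isReady = createReadyDetector();
// child.stdout.on("data", (data) => {
//   if (isReady(data)) { clearTimeout(safeGuard); resolve(); }
// });
```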

Since we spawn the process manually, we need to alter our autoEncryption.extraOptions object like so:

extraOptions: {mongocryptdBypassSpawn: true}

Update: September 8th 2020

Thank you to Matt from the MongoDB team. The issue is that the libmongocrypt library uses an internal MongoClient to connect to the mongocryptd process. This internal client does not use the default or a user-set server selection timeout (serverSelectionTimeoutMS), but a fixed timeout of 1 second. Depending on your lambda environment and application, this can lead to a situation where it is not possible to connect to your cluster, because the driver could not connect to the mongocryptd process in time. I asked whether this could be changed, or at least whether the timeout could be set via a parameter. Currently the only way to fix it is to monkey-patch libmongocrypt yourself and set a higher timeout: https://github.com/mongodb/libmongocrypt/blob/master/bindings/node/lib/autoEncrypter.js#L99
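
If patching the library is not an option, a generic retry around the connection attempt might soften the problem. This is not the fix discussed with MongoDB, only a sketch of an alternative workaround, and whether it helps depends on why the first attempt timed out. The helper name retry and its parameters are made up for this illustration:

```javascript
// Hypothetical helper: runs `operation` up to `attempts` times, waiting
// `delayMs` between failed tries, and rethrows the last error if all fail.
async function retry(operation, attempts = 3, delayMs = 500) {
  let lastError;
  for (let i = 0; i < attempts; i += 1) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Possible usage: build a fresh client per attempt (`options` stands for
// the client options shown earlier).
// const secureClient = await retry(async () => {
//   const client = new MongoClient(connectionString, options);
//   await client.connect();
//   return client;
// });
```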

Again, whether you encounter this problem at all depends a lot on your application. In my case the lambda function is a background task with a rather large codebase and cold start time over 1s.

I will update this post once there is a permanent solution out there.
