Like many other software companies, Pipedrive relies heavily on logging, because logs are extremely useful when analyzing issues as they come up. Logging is important, but so is privacy, which is one of our core values when it comes to development.
For logging to be really useful, it should capture as much contextual information as possible. The more context you have, the less time it takes to understand what went wrong in a particular situation (the quality of error messages is just as important, but that's a topic for another day).
While more context makes an issue easier to understand, exposing private information (names, emails, addresses, etc.) in logs is not an option for us: we value the privacy of our customers and would not want to abuse their trust.
The problem: How do we find a balance between full contextual logging and preventing any privacy violations?
As it turns out, using Golang and protocol buffers to communicate between microservices seems to solve the problem fairly seamlessly.
Near the beginning of March this year, The Go Blog published an article titled "A New Go API for Protocol Buffers", which introduced a new version of the API with very handy reflection over protocol messages. We gave this API a spin and tried to make the white-listing of logged fields automatic, declared right at the protocol message design stage.
Hands on: Sample Logger and its pitfalls
Sample logger wrapper:
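As a minimal sketch of what such a wrapper could look like, the following uses only the Go standard library. The names Logger, NewLogger, and formatError are assumptions made for illustration, not the original code:

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// Logger is a thin wrapper around the standard library logger.
// (Hypothetical type, introduced for this sketch only.)
type Logger struct {
	std *log.Logger
}

// NewLogger returns a Logger that writes timestamped lines to stderr.
func NewLogger() *Logger {
	return &Logger{std: log.New(os.Stderr, "", log.LstdFlags)}
}

// formatError appends every parameter to the message as-is.
// Nothing is filtered here, which is the pitfall of this version.
func formatError(msg string, params ...interface{}) string {
	line := "error: " + msg
	for _, p := range params {
		line += fmt.Sprintf(" %v", p)
	}
	return line
}

// Error logs the message together with all of its parameters.
func (l *Logger) Error(msg string, params ...interface{}) {
	l.std.Println(formatError(msg, params...))
}

func main() {
	NewLogger().Error("failed to process company", "id:11")
}
```

Whatever is passed as a parameter ends up in the log line verbatim, so the caller alone decides what gets exposed.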
When called, it simply writes to the standard logger. Let's imagine we're using it in some controller:
Our main function calls the controller (of course, data doesn't normally reach an application hard-coded in another file; it usually arrives through something like a form, but for the sake of simplicity we won't build a web page):
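To make the setup concrete, here is a rough sketch of a controller that fails at random and logs its whole input, plus a main function that calls it. The User and Company structs are hypothetical plain-Go stand-ins for the generated protobuf types, and processCompany and errorLine are names assumed for this example:

```go
package main

import (
	"fmt"
	"log"
	"math/rand"
)

// User and Company are hypothetical stand-ins for the generated protobuf types.
type User struct {
	ID    int64
	Name  string
	Email string
}

type Company struct {
	ID    int64
	Owner User
	Size  int32
}

// errorLine formats the failure message with the whole company attached.
func errorLine(c Company) string {
	return fmt.Sprintf("error: failed to process company %+v", c)
}

// processCompany fails at random and logs the full payload when it does.
func processCompany(c Company) {
	if rand.Intn(2) == 0 {
		log.Println(errorLine(c))
	}
}

func main() {
	// Hard-coded data, standing in for input that would normally come from a form.
	c := Company{ID: 11, Owner: User{ID: 1, Name: "Batman", Email: "batman@cave.com"}, Size: 3}
	for i := 0; i < 5; i++ {
		processCompany(c)
	}
}
```

Because the entire struct is handed to the logger, every field it carries, sensitive or not, is printed.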
This all looks fine until we see what kind of information is stored inside the Company message: besides the company id and size, it carries an owner and a co-owner, each with an id, a name, an email address, and a title.
If we run our main function a couple of times (until a random failure), we will eventually receive an output showing:
1970/01/01 00:00:00 error: failed to process company id:11 owner:{id:1 name:"Batman" email:"batman@cave.com" title:{id:100001 name:"CLSO - Chief Life Savior Officer"}} coOwner:{id:2 name:"Catwoman" email:"catwoman@box.com" title:{id:100002 name:"CCO - Chef Cuddling Officer"}} size:3
Clearly, there’s a lot of sensitive information here which we’d like to avoid having in our logs.
Advanced Logger with parameter sanitization
It seems that inside our logger we need to do some sanitizing steps before sending data to the output:
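One way to picture this step is a type switch over the logged parameters. In the sketch below, the loggable interface is a hypothetical stand-in for protoreflect.ProtoMessage, and passing every other parameter through unchanged is an assumption of this example rather than a statement about the original logger:

```go
package main

import (
	"fmt"
	"log"
)

// loggable is a hypothetical stand-in for protoreflect.ProtoMessage:
// any parameter that can report its own white-listed fields.
type loggable interface {
	LogFields() map[string]interface{}
}

// sanitize replaces message-like parameters with their white-listed fields
// and passes every other parameter through unchanged (an assumption here).
func sanitize(params ...interface{}) []interface{} {
	out := make([]interface{}, 0, len(params))
	for _, p := range params {
		switch v := p.(type) {
		case loggable:
			out = append(out, v.LogFields())
		default:
			out = append(out, v)
		}
	}
	return out
}

// company is a toy message: only its id and size are white-listed.
type company struct {
	id         int64
	ownerEmail string
	size       int32
}

func (c company) LogFields() map[string]interface{} {
	return map[string]interface{}{"company_id": c.id, "size": c.size}
}

// errorLine builds the log line from the sanitized parameters.
func errorLine(msg string, params ...interface{}) string {
	line := "error: " + msg
	for _, p := range sanitize(params...) {
		line += fmt.Sprintf(" %v", p)
	}
	return line
}

func main() {
	c := company{id: 11, ownerEmail: "batman@cave.com", size: 3}
	log.Println(errorLine("failed to process company", c))
}
```

The email address never reaches the output because it is not part of the white-listed map. Plain strings and other non-message parameters would still need care from the caller.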
From the protocol message, we want to keep only those fields that are not considered to contain any sensitive information. Ideally, there should be a way to declare, inside the protocol message itself, which fields may be logged (white-listing).
As it turns out, custom options for protocol messages are exactly what we were looking for. (The documentation can be found in the Language Guide under Custom Options. It is written for the proto2 syntax, but custom options work the same way in proto3.)
For what we need, we're specifically interested in the section on custom Field Options. Here we introduce our own custom field option, logField, which states the name under which a field will be logged:
Here we’re extending the default google.protobuf.FieldOptions by providing our own options. Pay special attention to the number for our extension:
One last thing: Since custom options are extensions, they must be assigned field numbers like any other field or extension. In the examples above, we have used field numbers in the range 50000–99999. This range is reserved for internal use within individual organizations, so you can use numbers in this range freely for in-house applications. — https://developers.google.com/protocol-buffers/docs/proto#customoptions
Below, you’ll see how we use our option by changing the company message description:
We simply add the option to all non-sensitive fields, so that any other field inside the message is considered to be sensitive.
Reflection magic for sanitizing
The reflection API was improved in the recent version of the Go API for Protocol Buffers: https://blog.golang.org/protobuf-apiv2
Among the improvements, .Range is particularly handy for walking through protocol message fields, so let's use it!
On the line where the case protoreflect.ProtoMessage is used, the sanitizeProtoMessage function will be called:
With this, we go over all field values in the message and process those fields that have the logField option specified. If the field is a simple value, the code combines the prefix with the field's log name and stores the value in a map[string]interface{}. If the field is itself another protocol message, the function recurses into it. This process guarantees that only fields with the logField option will end up in the log output:
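The walk described above can be sketched without the protobuf library. Here a small field struct is a hypothetical stand-in for a populated protobuf field together with its logField option, and the option values chosen (owner, co_owner, profession, user_id, id) are assumptions picked so that the result matches the log output shown in this post. The real implementation would iterate with .Range and read the option from the field descriptor instead:

```go
package main

import "fmt"

// field is a hypothetical stand-in for one populated protobuf field.
// logField mirrors the custom option; an empty value means "not white-listed".
type field struct {
	logField string
	value    interface{} // scalar value, or []field for a nested message
}

// sanitizeMessage copies only white-listed fields into out, recursing into
// nested messages and extending the prefix on the way down.
func sanitizeMessage(prefix string, msg []field, out map[string]interface{}) {
	for _, f := range msg {
		if f.logField == "" {
			continue // no option set: treat the field as sensitive
		}
		name := f.logField
		if prefix != "" {
			name = prefix + "_" + f.logField
		}
		if nested, ok := f.value.([]field); ok {
			sanitizeMessage(name, nested, out)
			continue
		}
		out[name] = f.value
	}
}

// user builds a stand-in User message; name, email, and title text are not white-listed.
func user(id int, name, email string, titleID int, title string) []field {
	return []field{
		{"user_id", id},
		{"", name},
		{"", email},
		{"profession", []field{{"id", titleID}, {"", title}}},
	}
}

// sanitizedCompany runs the walk over the sample company and formats the result.
func sanitizedCompany() string {
	company := []field{
		{"company_id", 11},
		{"owner", user(1, "Batman", "batman@cave.com", 100001, "CLSO - Chief Life Savior Officer")},
		{"co_owner", user(2, "Catwoman", "catwoman@box.com", 100002, "CCO - Chef Cuddling Officer")},
		{"size", 3},
	}
	out := map[string]interface{}{}
	sanitizeMessage("", company, out)
	return fmt.Sprint(out)
}

func main() {
	fmt.Println(sanitizedCompany())
}
```

A nested message contributes nothing unless the field that holds it is itself white-listed, so adding a new sensitive field anywhere in the tree is safe by default.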
The extractLogField function extracts the value of the custom field option logField:
The logFieldName function filters out empty prefixes and joins everything together using _:
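A helper with this behavior might look like the following. The variadic signature is an assumption, since only the function's name and purpose are given above:

```go
package main

import (
	"fmt"
	"strings"
)

// logFieldName drops empty parts and joins the rest with "_".
// (The signature is assumed for this sketch.)
func logFieldName(parts ...string) string {
	kept := make([]string, 0, len(parts))
	for _, p := range parts {
		if p != "" {
			kept = append(kept, p)
		}
	}
	return strings.Join(kept, "_")
}

func main() {
	fmt.Println(logFieldName("", "size"))
	fmt.Println(logFieldName("co_owner", "profession", "id"))
}
```

Skipping empty parts means top-level fields, which have no prefix, do not end up with a leading underscore.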
After all these manipulations, our new logger prints only the white-listed information to the output:
1970/01/01 00:00:00 error: failed to process company map[co_owner_profession_id:100002 co_owner_user_id:2 company_id:11 owner_profession_id:100001 owner_user_id:1 size:3]
Perfect! We can now keep our logs free of sensitive information, while still having enough context to debug issues.
This technique works for any type of protocol message with any depth of nesting, while making sure that only white-listed message fields end up in our logging system.
Thank you very much for reading, and make sure to keep your customers' data safe!