Integrating OpenTelemetry into Cloud Pub/Sub


The Observability Problem

The Cloud Pub/Sub service provides a convenient way to communicate information as an alternative to a typical request-response model. This makes it an ideal system to use for event-driven applications that are composed of multiple services.


However, the inner workings of Cloud Pub/Sub are largely hidden from the user. This makes it much harder to debug undelivered messages or large gaps between sending and receiving, since there is no good way to track a message from the moment it is published to the point where a subscriber picks it up.


The Problem in Action

Take, for example, a Python chat application that sends messages through the Cloud Pub/Sub service. Messages sent from a publisher to a subscriber always arrive, but there seems to be a significant delay between when the publisher sends a message and when the subscriber receives it. Suppose, for instance, that the message “hello world” is sent to all clients.



The message eventually appears in the message list, but the time between when it was sent and when it appeared in the list of messages was approximately six seconds.
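The six-second gap above was observed by eye. A minimal sketch of measuring it in the subscriber instead, assuming the received message exposes a timezone-aware `publish_time` datetime (as the Python client's subscriber `Message` does). `delivery_latency_seconds` is a hypothetical helper, and the fake message stands in for a real one. This only yields an end-to-end number; it cannot say where in the delivery cycle the time was spent.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace
from typing import Optional


def delivery_latency_seconds(message, now: Optional[datetime] = None) -> float:
    """Seconds between a message's publish time and `now` (default: current UTC time).

    Assumes `message.publish_time` is a timezone-aware datetime.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - message.publish_time).total_seconds()


# Hypothetical stand-in for a received message published six seconds before "now".
received_at = datetime(2020, 8, 1, 12, 0, 6, tzinfo=timezone.utc)
fake_message = SimpleNamespace(publish_time=received_at - timedelta(seconds=6))
print(delivery_latency_seconds(fake_message, now=received_at))  # 6.0
```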


This delay affects only some messages, and its length varies. Without some way to monitor a message throughout its delivery cycle, finding the source of this inconsistent delay can be difficult.


Integration with OpenTelemetry

This is where OpenTelemetry comes into play. OpenTelemetry is an observability library that adds tracing and metrics to applications. When added to Cloud Pub/Sub, it produces traces that give more visibility into how a message is handled outside the scope of the application. At the time of writing, the Node.js and Python Cloud Pub/Sub clients support OpenTelemetry. The Java and Go clients support OpenCensus, the predecessor to OpenTelemetry, but have not yet been instrumented with OpenTelemetry.


To find the cause of the message delay, OpenTelemetry tracing can be added to this application. OpenTelemetry support is optional and is not enabled by default. For the Python library, it can be enabled simply by installing the following pip packages:


pip3 install opentelemetry-api opentelemetry-sdk


Now that the packages are installed, the publisher and subscriber need to be told how to create and handle traces. This is done through the tracer provider, which serves as a factory for creating traces. In this example, each span is exported as soon as it completes. OpenTelemetry supports exporting to a variety of tracing backends; here, traces are exported to Google Cloud Trace, whose exporter is distributed separately in the opentelemetry-exporter-gcp-trace package. Add the following to both the publisher and the subscriber:


from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor  # named SimpleExportSpanProcessor in older releases

# For this example, we will use the Google Cloud Trace exporter
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter

# Set the global tracer provider to the one that was imported
trace.set_tracer_provider(TracerProvider())

# Tell the tracer provider to export spans to Google Cloud Trace when spans are completed
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(CloudTraceSpanExporter())
)

The application now knows how to export traces. All that is left is to redeploy it and send a new chat message. An OpenTelemetry trace showing the new message being sent should then appear in Google Cloud Trace.



Looking at the trace generated for the Cloud Pub/Sub message, the message appears to have been delivered to the same subscriber multiple times. That suggests delivery itself is working, and that something goes wrong in how the application's subscriber handles messages once they are received. Let's take a look at the subscriber to see what's going on.
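The trace is what revealed the repeated deliveries. As a rough subscriber-side cross-check, a callback could also count deliveries per `message_id`. `record_delivery` is a hypothetical helper, not part of the Pub/Sub client, and the counter lives in process memory, so it only sees deliveries to this one subscriber instance.

```python
from collections import Counter
from types import SimpleNamespace

delivery_counts = Counter()


def record_delivery(message) -> int:
    """Count deliveries per message ID; a count above 1 means the message was redelivered."""
    delivery_counts[message.message_id] += 1
    return delivery_counts[message.message_id]


# Hypothetical deliveries: message "1" arrives twice, message "2" once.
for mid in ["1", "1", "2"]:
    record_delivery(SimpleNamespace(message_id=mid))
print({mid: n for mid, n in delivery_counts.items() if n > 1})  # {'1': 2}
```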


from random import randint
from google.cloud import pubsub_v1
received_messages = []

def callback(message):
    # Values 1-4 (a 40% chance) return early: the message is never displayed or acked
    if randint(1, 10) < 5:
        return
    received_messages.append(message.data.decode("utf-8"))
    message.ack()

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("sethmaxwl-playground", "chat-subscriber")
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)

The culprit appears to be the callback the subscriber uses to handle incoming messages. Whenever randint(1, 10) returns a value from 1 to 4 (a 40% chance), the callback returns early, so the message is neither added to the list shown to the user nor acknowledged. Because the subscriber never sends back an acknowledgement, Cloud Pub/Sub redelivers the message.
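To see why only some messages were delayed, here is a small stdlib simulation of the buggy callback's effect. It assumes every delivery is an independent draw with the same `randint(1, 10) < 5` condition and ignores timing entirely; it only counts deliveries. Under that assumption the number of deliveries per message is geometric with success probability 0.6, so a message needs 1/0.6 ≈ 1.67 deliveries on average, and about 40% of messages need at least one redelivery.

```python
import random


def deliveries_until_ack(rng: random.Random) -> int:
    """Count how many times one message is delivered before the buggy callback acks it.

    Mirrors the callback's condition: randint(1, 10) < 5 (values 1-4, a 40% chance)
    returns early without acking, which leaves the message to be redelivered.
    """
    attempts = 1
    while rng.randint(1, 10) < 5:
        attempts += 1
    return attempts


rng = random.Random(42)  # fixed seed so the run is reproducible
trials = [deliveries_until_ack(rng) for _ in range(100_000)]
mean_deliveries = sum(trials) / len(trials)
redelivered_share = sum(t > 1 for t in trials) / len(trials)
print(f"mean deliveries per message: {mean_deliveries:.2f}")  # close to 1 / 0.6
print(f"messages delivered more than once: {redelivered_share:.0%}")  # close to 40%
```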


Removing the random early return from the callback appears to fix the issue. Messages now show up about one second after they are sent, and the trace for each message confirms this.
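For reference, a sketch of the fixed callback with the early return removed. It is exercised against a hypothetical `FakeMessage` stand-in that exposes only the `data` and `ack()` members the callback uses, so it runs without a Pub/Sub subscription.

```python
received_messages = []


def callback(message):
    # Every message is recorded and acknowledged, so Pub/Sub has no reason to redeliver it.
    received_messages.append(message.data.decode("utf-8"))
    message.ack()


class FakeMessage:
    """Hypothetical stand-in exposing only the members the callback uses."""

    def __init__(self, data: bytes):
        self.data = data
        self.ack_count = 0

    def ack(self):
        self.ack_count += 1


msg = FakeMessage(b"hello world")
callback(msg)
print(received_messages, msg.ack_count)  # ['hello world'] 1
```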



Yay! Adding OpenTelemetry observability to Cloud Pub/Sub messages helped to diagnose why the messages were being delayed, and it ultimately provided a better user experience.


This was a simple example of why OpenTelemetry can be useful in Cloud Pub/Sub applications. In a real application, the equivalent of this random failure might be an asynchronous call that takes longer than the message's acknowledgement deadline. That would cause the same symptom as the one explored here, and the OpenTelemetry traces for the affected messages would be a useful tool for finding its cause.
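One hypothetical way to flag that slow-handler case from inside the subscriber is to time each callback against a budget and report when it is exceeded. The 10-second budget assumes Pub/Sub's default subscription acknowledgement deadline; use your subscription's actual value. `warn_if_slow` and its `clock`/`on_slow` parameters are illustrative names, with `clock` injectable so the check can be tested without sleeping. Client libraries may extend leases automatically, so exceeding this budget is a warning sign rather than a guaranteed redelivery.

```python
import logging
import time
from typing import Callable, Optional

# Assumption: Pub/Sub's default ack deadline (10 s); use your subscription's value.
ACK_DEADLINE_SECONDS = 10.0


def warn_if_slow(
    handler: Callable,
    budget: float = ACK_DEADLINE_SECONDS,
    clock: Callable[[], float] = time.monotonic,
    on_slow: Optional[Callable[[float], None]] = None,
) -> Callable:
    """Wrap a message callback and report runs that take longer than `budget` seconds."""
    if on_slow is None:
        def on_slow(elapsed: float) -> None:
            logging.warning("callback took %.1fs, over the %.0fs ack budget", elapsed, budget)

    def wrapped(message):
        start = clock()
        try:
            return handler(message)
        finally:
            elapsed = clock() - start
            if elapsed > budget:
                on_slow(elapsed)

    return wrapped


# Usage with a fake clock: the handler "takes" 12.5 s, so the wrapper reports it.
ticks = iter([0.0, 12.5])
slow_runs = []
checked = warn_if_slow(lambda message: None, clock=lambda: next(ticks), on_slow=slow_runs.append)
checked(object())
print(slow_runs)  # [12.5]
```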


Support for OpenTelemetry in the Node.js Cloud Pub/Sub client is now available, and support for the Python client will be available shortly.


To try out this example for yourself, you can find the source here. See also the OpenTelemetry Python 3 docs and Node.js docs for more information.


Translated from: https://medium.com/google-cloud/integrating-opentelemetry-into-cloud-pub-sub-19aacd83692a
