Discord Notification Using CloudWatch Alarms, SNS, and AWS Lambda

Alarms exist to notify us when our system behaves in an unexpected way, which warrants manual intervention to correct. When we have multiple systems in a production environment and an error passes unnoticed, the consequences can be catastrophic.

An alarm should be created when the system cannot automatically recover and human intervention is required. If an alert fires too frequently, it can lead to longer response times or even be missed.

In this article, we will build an alarm notification pipeline for an AWS Lambda function. For that, we will use three AWS services: AWS Lambda, Simple Notification Service (SNS), and CloudWatch. The goal is to send a notification to a Discord channel when a CloudWatch Alarm is triggered.

[Image: diagram, created with https://www.lucidchart.com]

Step 1: CloudWatch Alarm

Select Metric

First, choose a CloudWatch metric for the alarm to watch. Lambda functions expose three types of metrics:

  • Invocation Metrics: binary indicators of the outcome of an invocation. Examples: Invocations, Errors, DeadLetterErrors, DestinationDeliveryFailures, Throttles.

  • Performance Metrics: performance details about a single invocation. Examples: Duration, IteratorAge.

  • Concurrency Metrics: aggregate count of the number of instances processing events across a function, version, alias, or AWS Region. Examples: ConcurrentExecutions, ProvisionedConcurrentExecutions, ProvisionedConcurrencyUtilization, UnreservedConcurrentExecutions.

You can read more about AWS Lambda metrics in the AWS Developer Guide.

In this example, we will be monitoring the Errors metric, i.e. the number of invocations that result in a function error.

Graphed Metrics

Next, choose the statistic used to aggregate the metric's data over the specified period of time: SampleCount, Average, Sum, Minimum, or Maximum. We will use Sum.

The period is the length of time, in seconds, over which the above statistic is applied. We will set it to 60 seconds. The number of evaluation periods, i.e. the number of periods over which data is compared to the specified threshold, will be set to 1, since we want the Errors metric to be evaluated in 1-minute intervals.

Conditions

Lastly, set the comparison operator, which is the operation used to compare the specified statistic against the threshold. Supported conditions:

  • GreaterThanOrEqualToThreshold
  • GreaterThanThreshold
  • LessThanThreshold
  • LessThanOrEqualToThreshold

We will use the condition GreaterThanThreshold with a threshold of 0, so a single error within a 1-minute period is enough to put the alarm into the ALARM state.
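
To make the interplay of statistic, period, threshold, and comparison operator concrete, here is a minimal Python sketch of the evaluation. It is a simplified model rather than CloudWatch's actual implementation: the function name alarm_state and its parameters are hypothetical, and missing data is not handled.

```python
import operator

# CloudWatch comparison operator names mapped to Python comparisons.
COMPARISONS = {
    "GreaterThanOrEqualToThreshold": operator.ge,
    "GreaterThanThreshold": operator.gt,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}


def alarm_state(period_sums, threshold=0, comparison="GreaterThanThreshold",
                evaluation_periods=1, datapoints_to_alarm=1):
    """Return 'ALARM' or 'OK' for the most recent evaluation window.

    period_sums holds one Sum of the Errors metric per 60-second period,
    oldest first. Simplified model: missing data is not handled.
    """
    compare = COMPARISONS[comparison]
    # Only the latest `evaluation_periods` datapoints are considered.
    window = period_sums[-evaluation_periods:]
    breaching = sum(1 for value in window if compare(value, threshold))
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


print(alarm_state([0, 0, 0]))  # no errors in the latest minute
print(alarm_state([0, 0, 2]))  # two errors in the latest minute
```

With the settings chosen in this article, only the most recent minute matters: earlier errors do not keep the alarm in ALARM once a clean minute has passed.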

Notifications

As mentioned at the beginning of this article, the CloudWatch Alarm will notify an SNS topic when it enters the ALARM state.

Terraform Code

The code below contains all the settings discussed above, plus some additional settings that won't be discussed here. Note that alarm_actions is set to the ARN of the SNS topic that we will create in the following section, which means that this alarm will notify that topic when it transitions into the ALARM state.

resource "aws_cloudwatch_metric_alarm" "lambda_alarm" {
  # One alarm per entry in local.alarms_dimensions (an empty map creates none)
  for_each = local.alarms_dimensions

  alarm_name          = "${each.key}-alarm"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  period              = 60
  statistic           = "Sum"
  threshold           = 0
  datapoints_to_alarm = 1
  alarm_actions       = [aws_sns_topic.sns_alarms.arn]
  alarm_description   = "Triggered by errors in lambdas"
  treat_missing_data  = "notBreaching"

  # Dimensions select the Lambda function this alarm watches
  dimensions = each.value

  tags = {
    Product = local.name_dash
  }
}

The above code allows you to create as many CloudWatch Alarms as you want. You will just have to edit the local.alarms_dimensions:

locals {
  name_dash = "${var.name}-${var.environment}"
  # Lambda with Alarms
  alarms_dimensions = {
    "${var.name}-${var.environment}-lambda-1" = {
      FunctionName = "${var.name}-${var.environment}-lambda-1"
    },
    "lambda-2" = {
      FunctionName = "lambda-y"
    },
    "ADD_NEW_LAMBDA" = {
      FunctionName = "ADD_YOUR_LAMBDA_TO_MONITORED"
    }
  }
}
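
As a rough illustration of what the for_each expansion produces, the following Python sketch builds one parameter set per entry of the map. This is not part of the article's Terraform setup: build_alarm_params is a hypothetical helper, the function names and topic ARN are made-up values, and the keys follow the parameter names of boto3's CloudWatch put_metric_alarm call.

```python
def build_alarm_params(alarms_dimensions, sns_topic_arn):
    """Build one put_metric_alarm parameter set per monitored function."""
    params = []
    for alarm_key, dimensions in alarms_dimensions.items():
        params.append({
            "AlarmName": f"{alarm_key}-alarm",
            "ComparisonOperator": "GreaterThanThreshold",
            "EvaluationPeriods": 1,
            "DatapointsToAlarm": 1,
            "MetricName": "Errors",
            "Namespace": "AWS/Lambda",
            "Period": 60,
            "Statistic": "Sum",
            "Threshold": 0,
            "TreatMissingData": "notBreaching",
            "AlarmActions": [sns_topic_arn],
            "AlarmDescription": "Triggered by errors in lambdas",
            # boto3 expects dimensions as a list of Name/Value pairs.
            "Dimensions": [{"Name": name, "Value": value}
                           for name, value in dimensions.items()],
        })
    return params


# Hypothetical values for illustration only.
alarms_dimensions = {
    "my-app-dev-lambda-1": {"FunctionName": "my-app-dev-lambda-1"},
    "lambda-2": {"FunctionName": "lambda-y"},
}
topic_arn = "arn:aws:sns:us-east-1:123456789012:my-app-dev-sns-alarms"

for alarm in build_alarm_params(alarms_dimensions, topic_arn):
    print(alarm["AlarmName"], alarm["Dimensions"])
    # With boto3 installed and credentials configured, each set could be
    # applied with: boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Each key of the map becomes the alarm name prefix, and each value becomes the dimensions that tie the alarm to a single function.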

Step 2: Simple Notification Service (SNS)

From AWS Documentation:

Amazon Simple Notification Service (Amazon SNS) is a web service that coordinates and manages the delivery or sending of messages to subscribing endpoints or clients. It enables you to communicate between systems through publish/subscribe (pub/sub) patterns

Our SNS topic needs to deliver its messages to the Lambda-Alarm, which will then send a custom message to the Discord channel.
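
To make this hand-off concrete, here is a minimal sketch of the event shape the Lambda-Alarm receives from SNS. The alarm values are hypothetical, and only the keys that the Lambda code in this article reads are included; real messages carry more keys than shown.

```python
import json

# Hypothetical alarm message, reduced to the keys used in this article.
alarm_message = {
    "AlarmName": "my-app-dev-lambda-1-alarm",
    "AlarmDescription": "Triggered by errors in lambdas",
    "OldStateValue": "OK",
    "NewStateReason": "Threshold Crossed: 1 datapoint was greater than the threshold (0.0).",
    "Trigger": {
        "MetricName": "Errors",
        "Namespace": "AWS/Lambda",
        "Dimensions": [{"value": "my-app-dev-lambda-1", "name": "FunctionName"}],
    },
}

# SNS delivers the alarm in Records[].Sns.Message as a JSON *string*,
# so the Lambda has to decode it before reading any field.
sample_event = {"Records": [{"Sns": {"Message": json.dumps(alarm_message)}}]}


def extract_alarms(event):
    """Decode every SNS record in a Lambda event into a dict."""
    return [json.loads(record["Sns"]["Message"])
            for record in event.get("Records", [])]


for alarm in extract_alarms(sample_event):
    trigger = alarm["Trigger"]
    print(alarm["AlarmName"], trigger["Namespace"], trigger["Dimensions"][0]["value"])
```

A sample event like this is also handy for exercising the handler locally before wiring up the real topic.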

Terraform Code

In the code below, we subscribe the Lambda-Alarm to our SNS topic by passing the Lambda ARN in the endpoint parameter.

resource "aws_sns_topic" "sns_alarms" {
  name = "${local.name_dash}-sns-alarms"


  tags = {
    Product = local.name_dash
  }
}


resource "aws_sns_topic_subscription" "lambda_alarm" {
  topic_arn = aws_sns_topic.sns_alarms.arn
  protocol  = "lambda"
  endpoint  = "arn:aws:lambda:${var.region}:${data.aws_caller_identity.current.account_id}:function:${local.name_dash}-alarms-discord"
}
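
A subscription alone is not enough for delivery: SNS also needs permission to invoke the function. The Terraform shown here does not include that permission, and it may be among the settings the article leaves out. As a hedged sketch, the Python below rebuilds the endpoint ARN the same way the subscription does and assembles the parameters that boto3's Lambda add_permission call accepts. The helper names, account ID, region, and statement ID are all hypothetical.

```python
def lambda_arn(region, account_id, function_name):
    """Assemble a Lambda function ARN, mirroring the endpoint string above."""
    return f"arn:aws:lambda:{region}:{account_id}:function:{function_name}"


def sns_invoke_permission(function_name, topic_arn):
    """Parameters allowing one SNS topic to invoke one Lambda function."""
    return {
        "FunctionName": function_name,
        "StatementId": "AllowExecutionFromSNS",  # hypothetical statement ID
        "Action": "lambda:InvokeFunction",
        "Principal": "sns.amazonaws.com",
        # Restrict the grant to this topic rather than all of SNS.
        "SourceArn": topic_arn,
    }


# Hypothetical values for illustration only.
name_dash = "my-app-dev"
endpoint = lambda_arn("us-east-1", "123456789012", f"{name_dash}-alarms-discord")
topic_arn = f"arn:aws:sns:us-east-1:123456789012:{name_dash}-sns-alarms"

print(endpoint)
print(sns_invoke_permission(f"{name_dash}-alarms-discord", topic_arn))
# With boto3 installed and credentials configured, the grant could be
# applied with: boto3.client("lambda").add_permission(**permission)
```

In Terraform, the equivalent grant is usually expressed with an aws_lambda_permission resource.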

Step 3: Lambda Alarm

When triggered by the SNS topic, this Lambda will send an alert to a Discord Channel using the following message format:

[Image: screenshot of the Discord channel]

Discord Webhooks

First, you will need to create a Webhook in the desired channel. On the Edit Channel page, go to Integrations and select Create Webhook.

[Image: screenshot of the Discord channel settings]

Then copy the Webhook URL and make it available to your Lambda. The code below reads it from the WEBHOOK_URL environment variable.

Lambda Code

The code below is fairly intuitive and doesn't need much explanation. You can find further information about the Channel Object on Discord's Developer Portal.

import sys
sys.path.insert(0, 'package/')  # make dependencies bundled under package/ importable
import json
import logging
import os

import requests

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def parse_service_event(event, service='Service'):
    """Turn a CloudWatch alarm message into a list of Discord embed fields."""
    return [
        {
            'name': service,
            'value': event['Trigger']['Dimensions'][0]['value'],
            'inline': True
        },
        {
            'name': 'alarm',
            'value': event['AlarmName'],
            'inline': True
        },
        {
            'name': 'description',
            'value': event['AlarmDescription'],
            'inline': True
        },
        {
            'name': 'oldestState',
            'value': event['OldStateValue'],
            'inline': True
        },
        {
            'name': 'trigger',
            'value': event['Trigger']['MetricName'],
            'inline': True
        },
        {
            'name': 'event',
            'value': event['NewStateReason'],
            'inline': True
        }
    ]


def handler(event, context):
    webhook_url = os.getenv('WEBHOOK_URL')
    for record in event.get('Records', []):
        # SNS delivers the alarm as a JSON string
        sns_message = json.loads(record['Sns']['Message'])
        # Start empty for every record so fields never carry over between messages
        parsed_message = []
        trigger = sns_message.get('Trigger')
        if trigger and trigger['Namespace'] == 'AWS/Lambda':
            logger.info('Alarm from LAMBDA')
            parsed_message = parse_service_event(sns_message, 'Lambda')
        if not parsed_message:
            # Fallback: forward the raw message when it is not a Lambda alarm
            parsed_message = [{
                'name': 'Something not parsed happened',
                'value': json.dumps(sns_message)
            }]
        discord_data = {
            'username': 'AWS',
            'avatar_url': 'https://a0.awsstatic.com/libra-css/images/logos/aws_logo_smile_1200x630.png',
            'embeds': [{
                'color': 16711680,  # red (0xFF0000)
                'fields': parsed_message
            }]
        }

        headers = {'content-type': 'application/json'}
        response = requests.post(webhook_url, data=json.dumps(discord_data),
                                 headers=headers)

        logger.info(f'Discord response: {response.status_code}')
        logger.info(response.content)
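
One detail the handler above does not cover: Discord limits an embed to 25 fields, field names to 256 characters, and field values to 1024 characters, and it rejects empty values. The fallback branch, which dumps the whole SNS message into one field, can exceed the value limit. The sketch below shows one possible way to trim fields before posting. clamp_fields is a hypothetical helper, not part of the original code, and the "n/a" placeholder is an arbitrary choice.

```python
MAX_FIELDS = 25         # Discord embed limit on the number of fields
MAX_FIELD_NAME = 256    # limit on a field name's length
MAX_FIELD_VALUE = 1024  # limit on a field value's length


def clamp_fields(fields):
    """Return a copy of `fields` trimmed to fit Discord's embed limits."""
    clamped = []
    for field in fields[:MAX_FIELDS]:
        raw = field.get("value")
        # Empty values are rejected, so substitute a placeholder.
        value = "n/a" if raw in (None, "") else str(raw)
        clamped.append({
            **field,  # keep extra keys such as 'inline'
            "name": str(field["name"])[:MAX_FIELD_NAME],
            "value": value[:MAX_FIELD_VALUE],
        })
    return clamped


fields = [
    {"name": "description", "value": None, "inline": True},
    {"name": "event", "value": "x" * 3000, "inline": True},
]
for field in clamp_fields(fields):
    print(field["name"], len(field["value"]))
```

In the handler, this would mean passing clamp_fields(parsed_message) as the embed's fields instead of parsed_message.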

Conclusion

In this article, we have built an alarm notification system using CloudWatch Alarms, SNS, and AWS Lambda. When an error occurs in our lambda we will receive a message in our Discord Channel.

You can, of course, integrate the Lambda Alarms with different services besides Discord, such as:

  • Any email service: Gmail, Outlook, Yahoo, etc.
  • Slack
  • Microsoft Teams

You can check the full code here!

Translated from: https://towardsdatascience.com/discord-notification-using-cloudwatch-alarms-sns-and-aws-lambda-71393861699f
