AWS SAP-C02 Tutorial 1 -- Compute Resources

linmoo1986

Last modified: 2023-11-27 09:44:41

First published: 2023-10-10 14:13:56

Article link: https://blog.csdn.net/linwu_2006_2006/article/details/133715603

AWS offers many different compute resources; the SAP-C02 exam typically covers the following ones:

1 Amazon EC2

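EC2 can be thought of simply as a virtual server (running Linux, Windows, and so on). Its full name is Amazon Elastic Compute Cloud (Amazon EC2), a web service that provides resizable compute capacity: in short, servers in Amazon's data centers that you can use to build and host your software systems.
Alibaba Cloud equivalent: ECS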

1.1 Instance Types

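The first thing to understand about EC2 is its instance types. The table below lists several instance families and the scenarios they suit: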

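Family	Description	Use cases
R series	Memory optimized: large amounts of memory	Caching, e.g. running Redis
C series	Compute optimized: high-performance CPUs	Compute-intensive workloads, e.g. batch processing, high-performance web servers, scientific modeling
M series	General purpose: balanced CPU and memory	Workloads with balanced CPU and memory needs, e.g. web applications
I series	Storage optimized: high disk I/O	Storage-heavy workloads, e.g. databases
G series	Accelerated computing: GPUs	GPU applications, e.g. machine learning
T series	Burstable performance: a baseline CPU level with the ability to burst above it using CPU credits	Workloads with unsteady traffic, where they save cost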

For the other instance types, see the documentation: https://docs.aws.amazon.com/zh_cn/AWSEC2/latest/UserGuide/instance-types.html

Example: A company uses an on-premises data analytics platform. The system is highly available in a fully redundant configuration across 12 servers in the company’s data center.
The system runs scheduled jobs, both hourly and daily, in addition to one-time requests from users. Scheduled jobs can take between 20 minutes and 2 hours to finish running and have tight SLAs. The scheduled jobs account for 65% of the system usage. User jobs typically finish running in less than 5 minutes and have no SLA. The user jobs account for 35% of system usage. During system failures, scheduled jobs must continue to meet SLAs. However, user jobs can be delayed.
A solutions architect needs to move the system to Amazon EC2 instances and adopt a consumption-based model to reduce costs with no long-term commitments. The solution must maintain high availability and must not affect the SLAs.
Which solution will meet these requirements MOST cost-effectively?
A. Split the 12 instances across two Availability Zones in the chosen AWS Region. Run two instances in each Availability Zone as On-Demand Instances with Capacity Reservations. Run four instances in each Availability Zone as Spot Instances.
B. Split the 12 instances across three Availability Zones in the chosen AWS Region. In one of the Availability Zones, run all four instances as On-Demand Instances with Capacity Reservations. Run the remaining instances as Spot Instances.
C. Split the 12 instances across three Availability Zones in the chosen AWS Region. Run two instances in each Availability Zone as On-Demand Instances with a Savings Plan. Run two instances in each Availability Zone as Spot Instances.
D. Split the 12 instances across three Availability Zones in the chosen AWS Region. Run three instances in each Availability Zone as On-Demand Instances with Capacity Reservations. Run one instance in each Availability Zone as a Spot Instance.
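Answer: D
Explanation: The question has two kinds of jobs: scheduled jobs (strict SLAs) and user jobs (can be delayed). The solution must provide high availability, must not affect the SLAs, and must be the MOST cost-effective. The instance layout of each option is:
A: AZ1 (2 On-Demand + 4 Spot) / AZ2 (2 On-Demand + 4 Spot)
B: AZ1 (4 On-Demand) / AZ2 (4 Spot) / AZ3 (4 Spot)
C: AZ1 (2 On-Demand + 2 Spot) / AZ2 (2 On-Demand + 2 Spot) / AZ3 (2 On-Demand + 2 Spot)
D: AZ1 (3 On-Demand + 1 Spot) / AZ2 (3 On-Demand + 1 Spot) / AZ3 (3 On-Demand + 1 Spot)
Scheduled jobs account for 65% of usage (about 8 of the 12 instances) and must not be interrupted, so that capacity has to be On-Demand. A and B provide only 4 On-Demand instances (33%), and B also puts all of them in a single AZ, so neither meets the SLA and high-availability requirements. C provides only 6 On-Demand instances (50%), and a Savings Plan is a 1- or 3-year commitment, which conflicts with "no long-term commitments". D runs 9 On-Demand instances (75%) with Capacity Reservations, which need no long-term commitment, spreads them across three AZs, and uses Spot only for the delay-tolerant user jobs. The answer is therefore D.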
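The option comparison above is simple arithmetic. The sketch below encodes it under three assumptions taken from the explanation: scheduled jobs need at least 65% of the 12 instances as non-interruptible On-Demand capacity, every AZ in use should hold some On-Demand capacity, and a Savings Plan counts as a long-term commitment. The OPTIONS table is a hand-written model of the answer choices, not an AWS API.

```python
# Hand-written model of the four answer options.
# Each AZ entry is (On-Demand instances, Spot instances).
# Option C buys its On-Demand capacity through a Savings Plan,
# which is a 1- or 3-year commitment.
OPTIONS = {
    "A": {"azs": [(2, 4), (2, 4)], "long_term_commitment": False},
    "B": {"azs": [(4, 0), (0, 4), (0, 4)], "long_term_commitment": False},
    "C": {"azs": [(2, 2), (2, 2), (2, 2)], "long_term_commitment": True},
    "D": {"azs": [(3, 1), (3, 1), (3, 1)], "long_term_commitment": False},
}
SCHEDULED_SHARE = 0.65  # scheduled jobs use 65% of total capacity


def total_instances(azs):
    return sum(on_demand + spot for on_demand, spot in azs)


def on_demand_share(azs):
    """Fraction of the fleet that cannot be interrupted (On-Demand)."""
    return sum(on_demand for on_demand, _ in azs) / total_instances(azs)


def evaluate(name):
    """Return the On-Demand share and the pass/fail result of each check."""
    option = OPTIONS[name]
    azs = option["azs"]
    share = on_demand_share(azs)
    checks = {
        "no_long_term_commitment": not option["long_term_commitment"],
        "covers_scheduled_load": share >= SCHEDULED_SHARE,
        "on_demand_in_every_az": all(on_demand > 0 for on_demand, _ in azs),
    }
    return share, checks


if __name__ == "__main__":
    for name in OPTIONS:
        share, checks = evaluate(name)
        print(name, f"{share:.0%}", "PASS" if all(checks.values()) else "FAIL")
```

Running it prints A 33% FAIL, B 33% FAIL, C 50% FAIL, and D 75% PASS, which matches the explanation.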

Example: A company runs a memory-intensive analytics application using on-demand Amazon EC2 C5 compute optimized instance. The application is used continuously and application demand doubles during working hours. The application currently scales based on CPU usage. When scaling in occurs, a lifecycle hook is used because the instance requires 4 minutes to clean the application state before terminating.
Because users reported poor performance during working hours, scheduled scaling actions were implemented so additional instances would be added during working hours. The Solutions Architect has been asked to reduce the cost of the application.
Which solution is MOST cost-effective?
A. Use the existing launch configuration that uses C5 instances, and update the application AMI to include the Amazon CloudWatch agent. Change the Auto Scaling policies to scale based on memory utilization. Use Reserved Instances for the number of instances required after working hours, and use Spot Instances to cover the increased demand during working hours.
B. Update the existing launch configuration to use R5 instances, and update the application AMI to include SSM Agent. Change the Auto Scaling policies to scale based on memory utilization. Use Reserved Instances for the number of instances required after working hours, and use Spot Instances with On-Demand instances to cover the increased demand during working hours.
C. Use the existing launch configuration that uses C5 instances, and update the application AMI to include SSM Agent. Leave the Auto Scaling policies to scale based on CPU utilization. Use scheduled Reserved Instances for the number of instances required after working hours, and use Spot Instances to cover the increased demand during working hours.
D. Create a new launch configuration using R5 instances, and update the application AMI to include the Amazon CloudWatch agent. Change the Auto Scaling policies to scale based on memory utilization. Use Reserved Instances for the number of instances required after working hours, and use Standard Reserved Instances with On-Demand Instances to cover the increased demand during working hours.
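Answer: D
Explanation: The application is memory-intensive, so only B and D, which switch to R5 (memory optimized) instances, remain. Scaling on memory utilization also requires the CloudWatch agent, because EC2 does not publish memory metrics by default; B installs only SSM Agent. Finally, each instance needs 4 minutes to clean up before terminating, while a Spot Instance gets only a two-minute interruption notice, so Spot Instances are clearly unsuitable. The answer is therefore D.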

1.2 Placement Groups

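The second EC2 concept to understand is placement groups. You can launch a group of EC2 instances together and, depending on your availability and disaster-recovery requirements, choose how that group is placed: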

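Placement strategy	Description	Use cases
Cluster	Instances are packed close together in a single AZ	Applications that need low network latency
Partition	Instances are split into partitions, each on its own set of racks, so the failure of one rack does not take down the whole cluster; partitions can be in one or more AZs of the same Region (up to 7 partitions per AZ)	Applications with multiple replicas, e.g. Hadoop
Spread	Instances can be placed in different AZs of the same Region, each on a distinct rack, with at most 7 running instances per AZ	Applications with strict disaster-recovery requirements; high availability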
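A minimal sketch of creating each placement group type with boto3. It only builds the keyword arguments for the EC2 create_placement_group call (GroupName, Strategy, PartitionCount) and checks the per-AZ limits from the table; the group names are hypothetical, and the 7-instance limit shown is the one for rack-level spread groups.

```python
VALID_STRATEGIES = {"cluster", "partition", "spread"}
MAX_PARTITIONS_PER_AZ = 7
MAX_SPREAD_INSTANCES_PER_AZ = 7  # rack-level spread placement groups


def placement_group_request(name, strategy, partition_count=None):
    """Build kwargs for boto3 EC2 create_placement_group."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    params = {"GroupName": name, "Strategy": strategy}
    if strategy == "partition":
        if partition_count is None or not 1 <= partition_count <= MAX_PARTITIONS_PER_AZ:
            raise ValueError("partition groups need 1-7 partitions per AZ")
        params["PartitionCount"] = partition_count
    elif partition_count is not None:
        raise ValueError("PartitionCount only applies to partition groups")
    return params


def spread_layout_ok(instances_per_az):
    """Check a planned spread layout, e.g. {"us-east-1a": 5, "us-east-1b": 7}."""
    return all(count <= MAX_SPREAD_INSTANCES_PER_AZ for count in instances_per_az.values())
```

With AWS credentials configured, boto3.client("ec2").create_placement_group(**placement_group_request("low-latency-app", "cluster")) would create the cluster group used in the example below; instances are then launched into it by passing Placement={"GroupName": "low-latency-app"} to run_instances.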

Example: A company has a new application that needs to run on five Amazon EC2 instances in a single AWS Region. The application requires high-throughput, low-latency network connections between all of the EC2 instances where the application will run. There is no requirement for the application to be fault tolerant.
Which solution will meet these requirements?
A. Launch five new EC2 instances into a cluster placement group. Ensure that the EC2 instance type supports enhanced networking.
B. Launch five new EC2 instances into an Auto Scaling group in the same Availability Zone. Attach an extra elastic network interface to each EC2 instance.
C. Launch five new EC2 instances into a partition placement group. Ensure that the EC2 instance type supports enhanced networking.
D. Launch five new EC2 instances into a spread placement group. Attach an extra elastic network interface to each EC2 instance.
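Answer: A
Explanation: The question asks for a cluster with high-throughput, low-latency networking and no fault-tolerance requirement. Comparing the placement group types, a cluster placement group is clearly the right fit, and enhanced networking further improves network performance. The answer is therefore A.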

1.3 Instance Purchasing Options

Besides the instance type (the CPU, memory, and other configuration), AWS also offers purchasing options that let you optimize cost for your needs:

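On-Demand Instances: pay by the second for the instances you launch.
Savings Plans: reduce your Amazon EC2 costs by committing to a consistent amount of usage, in USD per hour, for a 1-year or 3-year term.
Reserved Instances: reduce your Amazon EC2 costs by committing to a consistent instance configuration, including instance type and Region, for a 1-year or 3-year term. RIs are usually purchased at the AWS Organizations level, where the RI discount can be shared across accounts or sharing can be turned off. RIs do not renew automatically; to keep the discount for another term, you must renew them (buy replacement RIs) before they expire.
Spot Instances: request unused EC2 capacity, which can significantly reduce your Amazon EC2 costs. AWS can reclaim Spot capacity with a two-minute interruption notice.
Dedicated Hosts: pay for a physical host that is fully dedicated to running your instances, and bring your existing per-socket, per-core, or per-VM software licenses to reduce costs.
Dedicated Instances: pay, by the hour, for instances that run on single-tenant hardware.
Capacity Reservations: reserve capacity for your EC2 instances in a specific Availability Zone for any duration.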

1.4 Metrics

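By default you can view an EC2 instance's status checks and its CPU, network, and disk metrics, but not memory; memory metrics require the CloudWatch agent.
Using the status check metrics, you can give an instance automatic recovery: a CloudWatch alarm watches the status check and, if the instance becomes impaired, automatically recovers it. The alarm can also send a notification through SNS.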
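A sketch of the auto-recovery alarm described above, assuming boto3. It builds the arguments for the CloudWatch put_metric_alarm call on the StatusCheckFailed_System metric, with the EC2 recover action arn:aws:automate:REGION:ec2:recover and an optional SNS topic; the instance ID, Region, and topic ARN used below are placeholders.

```python
def recovery_alarm_request(instance_id, region, sns_topic_arn=None):
    """Build kwargs for boto3 CloudWatch put_metric_alarm (EC2 auto recovery)."""
    actions = [f"arn:aws:automate:{region}:ec2:recover"]  # built-in recover action
    if sns_topic_arn:
        actions.append(sns_topic_arn)  # also notify operators
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,  # two consecutive failed minutes
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": actions,
    }
```

Usage, with credentials configured: boto3.client("cloudwatch").put_metric_alarm(**recovery_alarm_request("i-0123456789abcdef0", "us-east-1", "arn:aws:sns:us-east-1:111122223333:ops-alerts")).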

2 EC2 Auto Scaling (ASG)

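Amazon EC2 Auto Scaling helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You create collections of EC2 instances, called Auto Scaling groups. You can specify the minimum number of instances in each Auto Scaling group, and Amazon EC2 Auto Scaling ensures that your group never goes below this size. You can specify the maximum number of instances in each Auto Scaling group, and Amazon EC2 Auto Scaling ensures that your group never goes above this size. If you specify the desired capacity, either when you create the group or at any time thereafter, Amazon EC2 Auto Scaling ensures that your group has this many instances. If you specify scaling policies, Amazon EC2 Auto Scaling can launch or terminate instances as demand on your application increases or decreases.
Alibaba Cloud equivalent: Auto Scaling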

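2.1 Scaling Policies

Maintaining a fixed number of instances: achieved with health checks plus setting the minimum, maximum, and desired capacity to the same value.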
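The capacity rules can be sketched as three small helpers, assuming that a scaling policy's computed target is kept within [minimum, maximum], while an explicitly set desired capacity outside that range is rejected rather than adjusted. Setting all three values equal gives the fixed-size group just described. The function names are hypothetical.

```python
def validate_capacity(min_size, max_size, desired):
    """Mirror the ASG rule min <= desired <= max for an explicitly set capacity."""
    if not min_size <= desired <= max_size:
        raise ValueError(f"desired {desired} must be within [{min_size}, {max_size}]")
    return desired


def clamp_policy_target(min_size, max_size, computed_target):
    """A scaling policy never pushes the group outside its min/max bounds."""
    return max(min_size, min(computed_target, max_size))


def is_fixed_size(min_size, max_size, desired):
    """Fixed-size pattern: health checks replace failed instances, the count never changes."""
    return min_size == max_size == desired
```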

Example: An entertainment company recently launched a new game. To ensure a good experience for players during the launch period, the company deployed a static quantity of 12 r6g.16xlarge (memory optimized) Amazon EC2 instances behind a Network Load Balancer. The company’s operations team used the Amazon CloudWatch agent and a custom metric to include memory utilization in its monitoring strategy.
Analysis of the CloudWatch metrics from the launch period showed consumption at about one quarter of the CPU and memory that the company expected. Initial demand for the game has subsided and has become more variable. The company decides to use an Auto Scaling group that monitors the CPU and memory consumption to dynamically scale the instance fleet. A solutions architect needs to configure the Auto Scaling group to meet demand in the most cost-effective way.
Which solution will meet these requirements?
A. Configure the Auto Scaling group to deploy c6g.4xlarge (compute optimized) instances. Configure a minimum capacity of 3, a desired capacity of 3, and a maximum capacity of 12.
B. Configure the Auto Scaling group to deploy m6g.4xlarge (general purpose) instances. Configure a minimum capacity of 3, a desired capacity of 3, and a maximum capacity of 12.
C. Configure the Auto Scaling group to deploy r6g.4xlarge (memory optimized) instances. Configure a minimum capacity of 3, a desired capacity of 3, and a maximum capacity of 12.
D. Configure the Auto Scaling group to deploy r6g.8xlarge (memory optimized) instances. Configure a minimum capacity of 2, a desired capacity of 2, and a maximum capacity of 6.
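Answer: C
Explanation: The question asks how to size the Auto Scaling group so that it meets demand cost-effectively. The workload already runs on R-series (memory optimized) instances and memory is monitored, so it should stay on the R series, which rules out A and B. Actual consumption at the launch peak was only about one quarter of 12 r6g.16xlarge instances, which equals 12 r6g.4xlarge instances, so a minimum of 3 and a maximum of 12 r6g.4xlarge covers the peak with a lower floor and finer scaling steps than D. The answer is therefore C.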
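The sizing argument can be checked with arithmetic. The sketch assumes, as holds for the r6g family, that each size step doubles vCPUs and memory, so capacity is measured in "4xlarge units" (r6g.4xlarge = 1, r6g.8xlarge = 2, r6g.16xlarge = 4).

```python
SIZE_UNITS = {"4xlarge": 1, "8xlarge": 2, "16xlarge": 4}  # r6g, relative capacity


def capacity(size, count):
    return SIZE_UNITS[size] * count


# Launch fleet: 12 x r6g.16xlarge, of which only about one quarter was used.
observed_peak = capacity("16xlarge", 12) // 4

OPTIONS = {"C": ("4xlarge", 3, 12), "D": ("8xlarge", 2, 6)}  # (size, min, max)

if __name__ == "__main__":
    print("observed peak:", observed_peak)
    for name, (size, min_count, max_count) in OPTIONS.items():
        print(name, "floor", capacity(size, min_count),
              "ceiling", capacity(size, max_count), "step", SIZE_UNITS[size])
```

Both options top out at 12 units, exactly the observed peak, but C idles at 3 units instead of 4 and scales in steps of 1 instead of 2, so it follows variable demand more closely.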

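Target tracking scaling policy: you specify an Amazon CloudWatch metric (such as CPU utilization or network traffic) and a target value that represents the ideal average utilization or throughput level for your application; Auto Scaling adds or removes instances to keep the metric close to the target.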
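A sketch of a target tracking policy, assuming boto3's Auto Scaling put_scaling_policy call with the predefined ASGAverageCPUUtilization metric. The rough_capacity_estimate helper is only a simplified proportional model for intuition, not the exact algorithm AWS uses; the group name is hypothetical.

```python
import math


def cpu_target_tracking_request(asg_name, target_cpu_percent):
    """Build kwargs for boto3 autoscaling put_scaling_policy."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


def rough_capacity_estimate(current_instances, current_metric, target_metric):
    """Proportional intuition: instances needed to bring the average to the target."""
    return math.ceil(current_instances * current_metric / target_metric)
```

For example, 4 instances averaging 90% CPU with a 60% target suggest about 6 instances; at 30% CPU the same target suggests about 2.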

2.2 Basic Features

Launch templates: configure the AMI and other launch settings

Example: A company is running a compute workload by using Amazon EC2 Spot Instances that are in an Auto Scaling group. The launch template uses two placement groups and a single instance type.
Recently, a monitoring system reported Auto Scaling instance launch failures that correlated with longer wait times for system users. The company needs to improve the overall reliability of the workload.
Which solution will meet this requirement?
A. Replace the launch template with a launch configuration to use an Auto Scaling group that uses attribute-based instance type selection.
B. Create a new launch template version that uses attribute-based instance type selection. Configure the Auto Scaling group to use the new launch template version.
C. Update the launch template Auto Scaling group to increase the number of placement groups.
D. Update the launch template to use a larger instance type.
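Answer: B
Explanation: The ASG fails to launch Spot Instances, and the company wants better reliability. With attribute-based instance type selection, the Auto Scaling group can use any instance type that matches the specified attributes, so it adapts to changes in Spot availability and launches instances from capacity pools that have capacity, which reduces launch failures. Launch configurations do not support this feature, which rules out A. The answer is therefore B. Reference: https://docs.aws.amazon.com/autoscaling/ec2/userguide/create-mixed-instances-group-attribute-based-instance-type-selection.html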

Example: An application is deployed on Amazon EC2 instances that run in an Auto Scaling group. The Auto Scaling group configuration uses only one type of instance.
CPU and memory utilization metrics show that the instances are underutilized. A solutions architect needs to implement a solution to permanently reduce the EC2 cost and increase the utilization.
Which solution will meet these requirements with the LEAST number of configuration changes in the future?
A. List instance types that have properties that are similar to the properties that the current instances have. Modify the Auto Scaling group’s launch template configuration to use multiple instance types from the list.
B. Use the information about the application’s CPU and memory utilization to select an instance type that matches the requirements. Modify the Auto Scaling group’s configuration by adding the new instance type. Remove the current instance type from the configuration.
C. Use the information about the application’s CPU and memory utilization to specify CPU and memory requirements in a new revision of the Auto Scaling group’s launch template. Remove the current instance type from the configuration.
D. Create a script that selects the appropriate instance types from the AWS Price List Bulk API. Use the selected instance types to create a new revision of the Auto Scaling group’s launch template.
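Answer: C
Explanation: The question asks to adjust the EC2 configuration to actual usage. "LEAST number of configuration changes in the future" means attribute-based instance type selection should be used; otherwise, as new instance types are created and old ones are retired, the launch configuration would have to be changed again.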
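A minimal sketch of attribute-based instance type selection, assuming boto3's EC2 create_launch_template_version, whose LaunchTemplateData accepts InstanceRequirements with VCpuCount and MemoryMiB ranges instead of a fixed InstanceType. The CATALOG and matching_types function are a local, hypothetical stand-in that shows why future instance types need no configuration change; they are not how EC2 evaluates requirements internally. The AMI ID is a placeholder.

```python
def launch_template_data(image_id, vcpu_min, vcpu_max, mem_min_mib, mem_max_mib=None):
    """LaunchTemplateData using InstanceRequirements instead of InstanceType."""
    memory = {"Min": mem_min_mib}
    if mem_max_mib is not None:
        memory["Max"] = mem_max_mib
    return {
        "ImageId": image_id,
        "InstanceRequirements": {
            "VCpuCount": {"Min": vcpu_min, "Max": vcpu_max},
            "MemoryMiB": memory,
        },
    }


# Hypothetical local catalog: instance type -> (vCPUs, memory in MiB).
CATALOG = {
    "m5.large": (2, 8192),
    "m5.xlarge": (4, 16384),
    "c5.xlarge": (4, 8192),
    "r5.large": (2, 16384),
}


def matching_types(requirements, catalog):
    """Local illustration: every type whose attributes fall inside the ranges."""
    vcpu, mem = requirements["VCpuCount"], requirements["MemoryMiB"]
    return sorted(
        name
        for name, (cpus, mib) in catalog.items()
        if vcpu["Min"] <= cpus <= vcpu.get("Max", cpus)
        and mem["Min"] <= mib <= mem.get("Max", mib)
    )
```

With 2-4 vCPUs and exactly 8 GiB, matching_types returns ["c5.xlarge", "m5.large"]; adding a newer type such as m6i.large (2 vCPUs, 8 GiB) to the catalog makes it match without touching the requirements, which is the "fewest future configuration changes" property the question asks for.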

Example: A company has a website that runs on four Amazon EC2 instances that are behind an Application Load Balancer (ALB). When the ALB detects that an EC2 instance is no longer available, an Amazon CloudWatch alarm enters the ALARM state. A member of the company’s operations team then manually adds a new EC2 instance behind the ALB.
A solutions architect needs to design a highly available solution that automatically handles the replacement of EC2 instances. The company needs to minimize downtime during the switch to the new solution.
Which set of steps should the solutions architect take to meet these requirements?
A. Delete the existing ALB. Create an Auto Scaling group that is configured to handle the web application traffic. Attach a new launch template to the Auto Scaling group. Create a new ALB. Attach the Auto Scaling group to the new ALB. Attach the existing EC2 instances to the Auto Scaling group.
B. Create an Auto Scaling group that is configured to handle the web application traffic. Attach a new launch template to the Auto Scaling group. Attach the Auto Scaling group to the existing ALB. Attach the existing EC2 instances to the Auto Scaling group.
C. Delete the existing ALB and the EC2 instances. Create an Auto Scaling group that is configured to handle the web application traffic. Attach a new launch template to the Auto Scaling group. Create a new ALB. Attach the Auto Scaling group to the new ALB. Wait for the Auto Scaling group to launch the minimum number of EC2 instances.
D. Create an Auto Scaling group that is configured to handle the web application traffic. Attach a new launch template to the Auto Scaling group. Attach the Auto Scaling group to the existing ALB. Wait for the existing ALB to register the existing EC2 instances with the Auto Scaling group.
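Answer: B
Explanation: When an EC2 instance in the fleet fails, a replacement must be launched and connected to the ALB automatically, with minimal downtime. A and C delete the ALB, which increases downtime, so they are ruled out. D is wrong because the ALB does not register existing instances with the Auto Scaling group; the existing instances must be attached to the ASG, which then manages them. The answer is therefore B.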

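Supports mixing Spot and On-Demand Instances
Supports scheduled scaling
Lifecycle hooks: perform custom actions when an instance launches or terminates, such as cleaning up logs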

Example: A company is running an application on several Amazon EC2 instances in an Auto Scaling group behind an Application Load Balancer. The load on the application varies throughout the day, and EC2 instances are scaled in and out on a regular basis. Log files from the EC2 instances are copied to a central Amazon S3 bucket every 15 minutes. The security team discovers that log files are missing from some of the terminated EC2 instances.
Which set of actions will ensure that log files are copied to the central S3 bucket from the terminated EC2 instances?
A. Create a script to copy log files to Amazon S3, and store the script in a file on the EC2 instance. Create an Auto Scaling lifecycle hook and an Amazon EventBridge (Amazon CloudWatch Events) rule to detect lifecycle events from the Auto Scaling group. Invoke an AWS Lambda function on the autoscaling:EC2_INSTANCE_TERMINATING transition to send ABANDON to the Auto Scaling group to prevent termination, run the script to copy the log files, and terminate the instance using the AWS SDK.
B. Create an AWS Systems Manager document with a script to copy log files to Amazon S3. Create an Auto Scaling lifecycle hook and an Amazon EventBridge (Amazon CloudWatch Events) rule to detect lifecycle events from the Auto Scaling group. Invoke an AWS Lambda function on the autoscaling:EC2_INSTANCE_TERMINATING transition to call the AWS Systems Manager API SendCommand operation to run the document to copy the log files and send CONTINUE to the Auto Scaling group to terminate the instance.
C. Change the log delivery rate to every 5 minutes. Create a script to copy log files to Amazon S3, and add the script to EC2 instance user data. Create an Amazon EventBridge (Amazon CloudWatch Events) rule to detect EC2 instance termination. Invoke an AWS Lambda function from the EventBridge (CloudWatch Events) rule that uses the AWS CLI to run the user-data script to copy the log files and terminate the instance.
D. Create an AWS Systems Manager document with a script to copy log files to Amazon S3. Create an Auto Scaling lifecycle hook that publishes a message to an Amazon Simple Notification Service (Amazon SNS) topic. From the SNS notification, call the AWS Systems Manager API SendCommand operation to run the document to copy the log files and send ABANDON to the Auto Scaling group to terminate the instance.
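Answer: B
Explanation: A is wrong because sending ABANDON does not prevent the instance from terminating, so the script never gets to run. C is wrong because once the instance terminates, its logs are deleted. D does not say which component receives the SNS notification and calls SendCommand. The answer is therefore B: a lifecycle hook plus an EventBridge rule.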
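Option B's flow can be sketched as a Lambda handler for the EventBridge "EC2 Instance-terminate Lifecycle Action" event, assuming boto3 clients are passed in. It runs an SSM document with send_command, polls get_command_invocation until the command finishes, then calls complete_lifecycle_action with CONTINUE. The document name CopyLogsToS3 is hypothetical, and waiting for the command instead of sending CONTINUE immediately is a design choice of this sketch so the logs are copied before the instance is released; the Lambda timeout must be longer than the polling window.

```python
import time

IN_FLIGHT = {"Pending", "InProgress", "Delayed"}


def lifecycle_hook_request(asg_name, hook_name, heartbeat_seconds=300):
    """Build kwargs for boto3 autoscaling put_lifecycle_hook."""
    return {
        "AutoScalingGroupName": asg_name,
        "LifecycleHookName": hook_name,
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        "HeartbeatTimeout": heartbeat_seconds,  # time the instance waits in Terminating:Wait
        "DefaultResult": "CONTINUE",
    }


def handle_terminating(event, ssm, autoscaling, document_name="CopyLogsToS3",
                       poll_seconds=5, max_polls=50, sleep=time.sleep):
    """Copy logs via SSM, then let Auto Scaling finish terminating the instance."""
    detail = event["detail"]
    instance_id = detail["EC2InstanceId"]
    command = ssm.send_command(InstanceIds=[instance_id], DocumentName=document_name)
    command_id = command["Command"]["CommandId"]
    status = "Pending"
    for _ in range(max_polls):
        sleep(poll_seconds)
        status = ssm.get_command_invocation(
            CommandId=command_id, InstanceId=instance_id)["Status"]
        if status not in IN_FLIGHT:
            break
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=detail["LifecycleHookName"],
        AutoScalingGroupName=detail["AutoScalingGroupName"],
        LifecycleActionToken=detail["LifecycleActionToken"],
        LifecycleActionResult="CONTINUE",
    )
    return status


def lambda_handler(event, context):
    import boto3  # available in the Lambda Python runtime
    return handle_terminating(event, boto3.client("ssm"), boto3.client("autoscaling"))
```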

2.3 Scaling Processes

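Launch: adds instances to the group when the desired capacity increases
Terminate: removes instances from the group when the desired capacity decreases
HealthCheck: checks the health of instances and marks unhealthy ones
ReplaceUnhealthy: terminates unhealthy instances and launches replacements
AZRebalance: keeps instances balanced across Availability Zones
AlarmNotification: accepts notifications from CloudWatch alarms
ScheduledActions: performs scheduled scaling actions
AddToLoadBalancer: registers instances with the attached load balancer or target group
Suspend and resume: you can suspend and later resume one or more of these processes on your Auto Scaling group (for example, to stop instances from being terminated when they exceed their maximum lifetime or fail health checks)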
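A sketch for suspending and resuming processes, assuming boto3's Auto Scaling suspend_processes and resume_processes calls, which both take AutoScalingGroupName and ScalingProcesses. The process names use their exact API spelling (the set also includes InstanceRefresh); the validator is a local convenience added here.

```python
SCALING_PROCESSES = {
    "Launch", "Terminate", "AddToLoadBalancer", "AlarmNotification", "AZRebalance",
    "HealthCheck", "InstanceRefresh", "ReplaceUnhealthy", "ScheduledActions",
}


def processes_request(asg_name, processes):
    """Build kwargs shared by suspend_processes and resume_processes."""
    unknown = set(processes) - SCALING_PROCESSES
    if unknown:
        raise ValueError(f"unknown scaling processes: {sorted(unknown)}")
    return {"AutoScalingGroupName": asg_name, "ScalingProcesses": sorted(set(processes))}
```

For the troubleshooting question that follows, client.suspend_processes(**processes_request("web-asg", ["Terminate"])) keeps unhealthy instances alive for inspection, and client.resume_processes with the same arguments restores normal behavior afterwards.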

Example: A large company is running a popular web application. The application runs on several Amazon EC2 Linux instances in an Auto Scaling group in a private subnet.
An Application Load Balancer is targeting the instances in the Auto Scaling group in the private subnet. AWS Systems Manager Session Manager is configured, and AWS Systems Manager Agent is running on all the EC2 instances.
The company recently released a new version of the application. Some EC2 instances are now being marked as unhealthy and are being terminated. As a result, the application is running at reduced capacity. A solutions architect tries to determine the root cause by analyzing Amazon CloudWatch logs that are collected from the application, but the logs are inconclusive.
How should the solutions architect gain access to an EC2 instance to troubleshoot the issue?
A. Suspend the Auto Scaling group’s HealthCheck scaling process. Use Session Manager to log in to an instance that is marked as unhealthy.
B. Enable EC2 instance termination protection. Use Session Manager to log in to an instance that is marked as unhealthy.
C. Set the termination policy to OldestInstance on the Auto Scaling group. Use Session Manager to log in to an instance that is marked as unhealthy.
D. Suspend the Auto Scaling group’s Terminate process. Use Session Manager to log in to an instance that is marked as unhealthy.
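Answer: D
Explanation: The CloudWatch logs cannot explain the failure, so the logs on the EC2 instance itself are needed, and the question becomes how to keep unhealthy instances from being terminated. A is wrong because suspending the HealthCheck process does not stop instances already marked unhealthy from being terminated. B is wrong because EC2 termination protection does not prevent the Auto Scaling group from terminating instances. C is wrong because setting the termination policy to OldestInstance does not prevent instances marked unhealthy from being terminated. The answer is therefore D.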

Example: A company is processing videos in the AWS Cloud by using Amazon EC2 instances in an Auto Scaling group. It takes 30 minutes to process a video. Several EC2 instances scale in and out depending on the number of videos in an Amazon Simple Queue Service (Amazon SQS) queue.
The company has configured the SQS queue with a redrive policy that specifies a target dead-letter queue and a maxReceiveCount of 1. The company has set the visibility timeout for the SQS queue to 1 hour. The company has set up an Amazon CloudWatch alarm to notify the development team when there are messages in the dead-letter queue.
Several times during the day, the development team receives notification that messages are in the dead-letter queue and that videos have not been processed properly. An investigation finds no errors in the application logs.
How can the company solve this problem?
A. Turn on termination protection for the EC2 instances
B. Update the visibility timeout for the SQS queue to 3 hours
C. Configure scale-in protection for the instances during processing
D. Update the redrive policy and set maxReceiveCount to 0.
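Answer: C
Explanation: Processing a video takes 30 minutes. If the ASG scales in while an instance is processing a video, the instance is terminated and the message is never deleted. After the 1-hour visibility timeout, the message is received again, which exceeds the maxReceiveCount of 1, so it moves to the dead-letter queue without any application error being logged. Enabling scale-in protection for instances while they process a video prevents this. Reference: https://aws.amazon.com/blogs/aws/new-instance-protection-for-auto-scaling/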
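A sketch of the fix for the video worker, assuming boto3 SQS and Auto Scaling clients are passed in and process_video is a hypothetical function supplied by the application. The worker turns on scale-in protection with set_instance_protection while it works on a message and turns it off afterwards, so scale-in only removes idle instances. A small window remains between receiving the message and enabling protection; a production worker might enable protection first.

```python
def process_one_message(sqs, autoscaling, queue_url, asg_name, instance_id, process_video):
    """Receive one message and process it under scale-in protection. True if work was done."""
    response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    messages = response.get("Messages", [])
    if not messages:
        return False
    message = messages[0]
    target = {"InstanceIds": [instance_id], "AutoScalingGroupName": asg_name}
    autoscaling.set_instance_protection(ProtectedFromScaleIn=True, **target)
    try:
        process_video(message["Body"])  # ~30 minutes, within the 1-hour visibility timeout
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    finally:
        autoscaling.set_instance_protection(ProtectedFromScaleIn=False, **target)
    return True
```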

2.4 Classic Architectures

Switch traffic directly with an ALB
Distribute traffic by weight with an ALB
Fail over automatically with Route 53
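The two ALB patterns differ only in the weights of a forward action. A sketch, assuming boto3's ELBv2 modify_listener with a ForwardConfig that lists target groups and weights; the target group ARNs are placeholders. Moving all weight to one group is the direct switch, and intermediate weights give weighted distribution.

```python
def weighted_forward_action(blue_tg_arn, green_tg_arn, green_percent):
    """DefaultActions value for elbv2 modify_listener splitting traffic blue/green."""
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")
    return [{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_tg_arn, "Weight": 100 - green_percent},
                {"TargetGroupArn": green_tg_arn, "Weight": green_percent},
            ]
        },
    }]
```

weighted_forward_action(blue, green, 10) sends about 10% of requests to the new version, and weighted_forward_action(blue, green, 100) is the direct cutover.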

3 Amazon ECS

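Before learning ECS, you should understand container technology such as Docker; that makes ECS much easier to understand.
ECS, whose full name is Amazon Elastic Container Service (Amazon ECS), is a fully managed container orchestration service that helps you easily deploy, manage, and scale containerized applications. You can think of it as a Kubernetes-like orchestration layer that manages many nodes: ECS tasks (roughly the equivalent of pods) run on those nodes, and each task runs one or more containers. With the EC2 launch type, the nodes are EC2 instances that you maintain yourself; with Fargate, you do not maintain any nodes (that is, EC2 instances).
Alibaba Cloud equivalent: ECI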

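3.1 Related Components

ECS core: the configuration of ECS itself; with the EC2 launch type, ECS ultimately runs on EC2 instances
Fargate: runs ECS workloads in a serverless way, with no EC2 infrastructure to manage. Fargate appears frequently in the exam: when a scenario does not want to maintain infrastructure, Fargate is almost always the answer.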

Example: A company is running a traditional web application on Amazon EC2 instances. The company needs to refactor the application as microservices that run on containers. Separate versions of the application exist in two distinct environments: production and testing. Load for the application is variable, but the minimum load and the maximum load are known. A solutions architect needs to design the updated application with a serverless architecture that minimizes operational complexity.
Which solution will meet these requirements MOST cost-effectively?
A. Upload the container images to AWS Lambda as functions. Configure a concurrency limit for the associated Lambda functions to handle the expected peak load. Configure two separate Lambda integrations within Amazon API Gateway: one for production and one for testing.
B. Upload the container images to Amazon Elastic Container Registry (Amazon ECR). Configure two auto scaled Amazon Elastic Container Service (Amazon ECS) clusters with the Fargate launch type to handle the expected load. Deploy tasks from the ECR images. Configure two separate Application Load Balancers to direct traffic to the ECS clusters.
C. Upload the container images to Amazon Elastic Container Registry (Amazon ECR). Configure two auto scaled Amazon Elastic Kubernetes Service (Amazon EKS) clusters with the Fargate launch type to handle the expected load. Deploy tasks from the ECR images. Configure two separate Application Load Balancers to direct traffic to the EKS clusters.
D. Upload the container images to AWS Elastic Beanstalk. In Elastic Beanstalk, create separate environments and deployments for production and testing. Configure two separate Application Load Balancers to direct traffic to the Elastic Beanstalk deployments.
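Answer: B
Explanation: Keywords: microservices and minimizes operational complexity. A traditional application is refactored into microservices with the least operational effort, so ECS on Fargate is preferred. A: converting a traditional application into Lambda functions is not recommended, because the refactoring cost is too high. C: EKS has higher operational overhead than ECS on Fargate. D: Elastic Beanstalk leaves more underlying infrastructure to manage than Fargate. The answer is therefore B.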

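EKS: Amazon Elastic Kubernetes Service, which runs containers on managed Kubernetes clusters as an alternative to ECS
ECR: Amazon Elastic Container Registry (Amazon ECR) is an AWS managed container image registry service that stores the images ECS uses; it is covered in the ECR chapter.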

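3.2 Basic Concepts and Use Cases

ECS cluster: a logical grouping of tasks or services. With the EC2 launch type, the cluster's capacity comes from the EC2 instances registered to it; with Fargate, AWS provides the capacity and there are no instances to manage.
ECS service: runs and maintains a specified number of tasks of an application on an ECS cluster.
ECS task definition: a JSON text file that describes the parameters and one or more containers that make up an application, including the launch type, images, and network mode. A task is a running instance of a task definition.
Use cases: microservices, batch processing, scheduled jobs, migrating applications to the cloud, and more.
Resources and tags: Amazon ECS resources are assigned an Amazon Resource Name (ARN) and a unique resource identifier (ID). These resources include task definitions, clusters, tasks, services, and container instances. You can tag them with values you define to help organize and identify them.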
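A minimal Fargate task definition, built as a Python dict that serializes to the JSON described above and matches the shape of the arguments to boto3's ECS register_task_definition. The family, image, role ARNs, and secret are placeholders, and FARGATE_MEMORY_OPTIONS lists only a subset of the valid Fargate CPU/memory combinations.

```python
import json

# Subset of valid Fargate CPU units -> memory (MiB) combinations.
FARGATE_MEMORY_OPTIONS = {
    256: {512, 1024, 2048},
    512: {1024, 2048, 3072, 4096},
    1024: {2048, 3072, 4096, 5120, 6144, 7168, 8192},
}


def fargate_task_definition(family, image, cpu, memory, execution_role_arn,
                            task_role_arn=None, secrets=None, container_port=80):
    if memory not in FARGATE_MEMORY_OPTIONS.get(cpu, set()):
        raise ValueError(f"unsupported Fargate cpu/memory pair: {cpu}/{memory}")
    container = {
        "name": family,
        "image": image,
        "essential": True,
        "portMappings": [{"containerPort": container_port, "protocol": "tcp"}],
    }
    if secrets:  # injected from SSM Parameter Store or Secrets Manager at start-up
        container["secrets"] = [{"name": k, "valueFrom": v} for k, v in secrets.items()]
    definition = {
        "family": family,
        "networkMode": "awsvpc",  # required for Fargate
        "requiresCompatibilities": ["FARGATE"],
        "cpu": str(cpu),
        "memory": str(memory),
        "executionRoleArn": execution_role_arn,  # pulls images, reads secrets
        "containerDefinitions": [container],
    }
    if task_role_arn:
        definition["taskRoleArn"] = task_role_arn  # permissions for the application code
    return definition


if __name__ == "__main__":
    print(json.dumps(fargate_task_definition(
        "web", "public.ecr.aws/nginx/nginx:latest", 256, 512,
        "arn:aws:iam::111122223333:role/ecsTaskExecutionRole"), indent=2))
```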

Example: A delivery company needs to migrate its third-party route planning application to AWS. The third party supplies a supported Docker image from a public registry. The image can run in as many containers as required to generate the route map.
The company has divided the delivery area into sections with supply hubs so that delivery drivers travel the shortest distance possible from the hubs to the customers. To reduce the time necessary to generate route maps, each section uses its own set of Docker containers with a custom configuration that processes orders only in the section’s area.
The company needs the ability to allocate resources cost-effectively based on the number of running containers.
Which solution will meet these requirements with the LEAST operational overhead?
A. Create an Amazon Elastic Kubernetes Service (Amazon EKS) cluster on Amazon EC2. Use the Amazon EKS CLI to launch the planning application in pods by using the --tags option to assign a custom tag to the pod.
B. Create an Amazon Elastic Kubernetes Service (Amazon EKS) cluster on AWS Fargate. Use the Amazon EKS CLI to launch the planning application. Use the AWS CLI tag-resource API call to assign a custom tag to the pod.
C. Create an Amazon Elastic Container Service (Amazon ECS) cluster on Amazon EC2. Use the AWS CLI with run-tasks set to true to launch the planning application by using the --tags option to assign a custom tag to the task.
D. Create an Amazon Elastic Container Service (Amazon ECS) cluster on AWS Fargate. Use the AWS CLI run-task command and set enableECSManagedTags to true to launch the planning application. Use the --tags option to assign a custom tag to the task.
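Answer: D
Explanation: The question requires running a group of Docker containers per delivery section with the LEAST operational overhead. EKS requires more operations and maintenance effort, which rules out A and B; Fargate is simpler than managing EC2 instances, which rules out C. Custom tags on each task allow costs to be allocated per section. The answer is therefore D.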
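Option D as a sketch, assuming boto3's ECS run_task (the SDK equivalent of the CLI run-task command). ECS tags use lowercase key/value; the cluster, task definition, subnet IDs, and the section tag key are hypothetical. With public IPs disabled, the subnets need a NAT gateway to pull the third party's image from a public registry.

```python
def run_section_task_request(cluster, task_definition, subnets, section, count=1):
    """Build kwargs for boto3 ECS run_task on Fargate with a per-section cost tag."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": count,
        "enableECSManagedTags": True,  # ECS also adds its own managed tags
        "tags": [{"key": "section", "value": section}],  # custom cost-allocation tag
        "networkConfiguration": {
            "awsvpcConfiguration": {"subnets": list(subnets), "assignPublicIp": "DISABLED"}
        },
    }
```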

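3.3 ECS Security

IAM roles: with the EC2 launch type you typically need two roles: the EC2 instance role, used by the ECS agent on the container instance, and the IAM task role, used by the application inside the task. A task execution role additionally lets ECS pull images and read secrets on your behalf.
Integration with SSM Parameter Store and Secrets Manager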

3.4 ECS Networking

AWS recommends the awsvpc network mode unless you have a specific need for another network mode.

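3.5 Scaling

ECS scaling happens at the service level (the number of tasks) and likewise supports metric-based automatic scaling and scheduled scaling. Note that if you manage the EC2 instances yourself, the cluster can run out of capacity when the service scales out; you then have to add EC2 instances, either manually or through an Auto Scaling group capacity provider with managed scaling. Fargate is simpler, because AWS provisions the underlying capacity for every task and there are no EC2 instances for you to scale.
In addition, ECS clusters can now use more kinds of capacity, such as Spot Instances and On-Demand Instances.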
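A sketch of service-level scaling for ECS, assuming boto3's Application Auto Scaling client (register_scalable_target and put_scaling_policy) and the predefined ECSServiceAverageCPUUtilization metric; the cluster and service names are hypothetical. This scales the number of tasks; with the EC2 launch type, instance capacity still has to come from a capacity provider or your own Auto Scaling group.

```python
def ecs_service_scaling_requests(cluster, service, min_tasks, max_tasks, target_cpu):
    """Return kwargs for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    target = {**common, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    policy = {
        **common,
        "PolicyName": f"{service}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_cpu),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
        },
    }
    return target, policy
```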

3.6 ECS Anywhere

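Amazon ECS Anywhere supports registering external instances, such as on-premises servers or virtual machines (VMs), to your Amazon ECS cluster. External instances are optimized for running applications that generate outbound traffic or process data.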

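Runs your workloads closer to the site where they are needed
Suitable scenarios: on-site machine learning, video processing, data processing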

4 AWS Lambda

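Among compute services, Lambda is the purest form of compute. Before learning Lambda, it helps to understand what serverless means; that gives you a good foundation.
AWS Lambda is a compute service that lets you run code without provisioning or managing servers. You only care about your code logic and do not need to think about infrastructure, operating systems, scaling, or capacity; Lambda allocates all of that on demand, and you pay only for what you use.
Alibaba Cloud equivalent: Function Compute (FC)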

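4.1 Integration with Many Services

The more services Lambda integrates with, the easier it is to write functions; if a service is not integrated, you may have to expose an API yourself for Lambda to call.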

Example: A company runs a Python script on an Amazon EC2 instance to process data. The script runs every 10 minutes. The script ingests files from an Amazon S3 bucket and processes the files. On average, the script takes approximately 5 minutes to process each file. The script will not reprocess a file that the script has already processed.
The company reviewed Amazon CloudWatch metrics and noticed that the EC2 instance is idle for approximately 40% of the time because of the file processing speed. The company wants to make the workload highly available and scalable. The company also wants to reduce long-term management overhead.
Which solution will meet these requirements MOST cost-effectively?
A. Migrate the data processing script to an AWS Lambda function. Use an S3 event notification to invoke the Lambda function to process the objects when the company uploads the objects.
B. Create an Amazon Simple Queue Service (Amazon SQS) queue. Configure Amazon S3 to send event notifications to the SQS queue. Create an EC2 Auto Scaling group with a minimum size of one instance. Update the data processing script to poll the SQS queue. Process the S3 objects that the SQS message identifies.
C. Migrate the data processing script to a container image. Run the data processing container on an EC2 instance. Configure the container to poll the S3 bucket for new objects and to process the resulting objects.
D. Migrate the data processing script to a container image that runs on Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Create an AWS Lambda function that calls the Fargate RunTask API operation when the container processes the file. Use an S3 event notification to invoke the Lambda function.
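Answer: A
Explanation: The task is to refactor a script running on EC2 so that it is highly available, scalable, and as cost-effective as possible. A, processing with Lambda, is the best approach. B, using SQS with an Auto Scaling group of EC2 instances processing the messages, is also highly available and scalable, but less cost-effective than Lambda. C does not change the underlying problem. D, using ECS on Fargate, makes the architecture redundant and less cost-effective. The answer is therefore A.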
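A sketch of option A's handler, assuming the standard S3 event notification format (Records[].s3.bucket.name and Records[].s3.object.key). Object keys arrive URL-encoded, so they are decoded with urllib.parse.unquote_plus. process_file is a hypothetical placeholder for the original script's logic.

```python
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


def process_file(bucket, key):
    """Hypothetical placeholder for the existing per-file processing logic."""
    print(f"processing s3://{bucket}/{key}")


def lambda_handler(event, context):
    objects = extract_objects(event)
    for bucket, key in objects:
        process_file(bucket, key)
    return {"processed": len(objects)}
```

Each file takes about 5 minutes, well inside Lambda's 15-minute limit, and one invocation per upload removes both the polling schedule and the idle instance.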

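4.2 Lambda Limits

Memory: 128 MB to 10,240 MB (10 GB)
CPU: allocated in proportion to memory; at 1,769 MB a function has the equivalent of one vCPU
Execution time cannot exceed 15 minutes (note: if an exam scenario runs longer than 15 minutes, Lambda cannot be used)
/tmp ephemeral storage: 512 MB by default, configurable up to 10,240 MB
Deployment package: 50 MB zipped for direct upload, 250 MB unzipped (including layers)
Container images up to 10 GB, stored in Amazon ECR, can be used as the deployment package, but a Lambda layer cannot be a container image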
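The limits above can be turned into a quick feasibility check. This sketch hard-codes the values listed (900-second timeout, 128-10,240 MB memory, 512-10,240 MB /tmp) and the documented ratio of one vCPU per 1,769 MB; treat approx_vcpus as an approximation and check current quotas before relying on it.

```python
MAX_TIMEOUT_SECONDS = 15 * 60
MEMORY_RANGE_MB = (128, 10_240)
TMP_RANGE_MB = (512, 10_240)
MB_PER_VCPU = 1_769


def lambda_fit_problems(runtime_seconds, memory_mb, tmp_mb=512):
    """Return the reasons a job would not fit in a single Lambda invocation."""
    problems = []
    if runtime_seconds > MAX_TIMEOUT_SECONDS:
        problems.append("runtime exceeds the 15-minute timeout")
    if not MEMORY_RANGE_MB[0] <= memory_mb <= MEMORY_RANGE_MB[1]:
        problems.append("memory outside 128-10,240 MB")
    if not TMP_RANGE_MB[0] <= tmp_mb <= TMP_RANGE_MB[1]:
        problems.append("/tmp outside 512-10,240 MB")
    return problems


def approx_vcpus(memory_mb):
    """Approximate vCPU share: CPU is allocated in proportion to memory."""
    return round(memory_mb / MB_PER_VCPU, 2)
```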

Example: A company is developing a new serverless API by using Amazon API Gateway and AWS Lambda. The company integrated the Lambda functions with API Gateway to use several shared libraries and custom classes.
A solutions architect needs to simplify the deployment of the solution and optimize for code reuse.
Which solution will meet these requirements?
A. Deploy the shared libraries and custom classes into a Docker image. Store the image in an S3 bucket. Create a Lambda layer that uses the Docker image as the source. Deploy the API’s Lambda functions as Zip packages. Configure the packages to use the Lambda layer.
B. Deploy the shared libraries and custom classes to a Docker image. Upload the image to Amazon Elastic Container Registry (Amazon ECR). Create a Lambda layer that uses the Docker image as the source. Deploy the API’s Lambda functions as Zip packages. Configure the packages to use the Lambda layer.
C. Deploy the shared libraries and custom classes to a Docker container in Amazon Elastic Container Service (Amazon ECS) by using the AWS Fargate launch type. Deploy the API’s Lambda functions as Zip packages. Configure the packages to use the deployed container as a Lambda layer.
D. Deploy the shared libraries, custom classes, and code for the API’s Lambda functions to a Docker image. Upload the image to Amazon Elastic Container Registry (Amazon ECR). Configure the API’s Lambda functions to use the Docker image as the deployment package.
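Answer: D
Explanation: A, B, and C are wrong because AWS Lambda layers do not support Docker images or deployed containers as a source; packaging the shared code and the functions into one container image in ECR is the supported approach. Reference: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html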

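4.3 Lambda Security

IAM: if the function needs to access other resources such as S3, give it an IAM execution role with the required S3 permissions
Policy: a resource-based policy, written in JSON, controls which resources or accounts (such as SNS) can invoke your Lambda function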
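The two kinds of permission point in opposite directions: the execution role says what the function may call, and the resource-based policy says who may invoke the function. A sketch, assuming boto3's Lambda add_permission call; the bucket, topic, and function names are placeholders.

```python
# Trust policy that lets the Lambda service assume the execution role.
EXECUTION_ROLE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def s3_read_policy(bucket):
    """Identity policy attached to the execution role: what the function may call."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }


def allow_sns_invoke_request(function_name, topic_arn, statement_id="allow-sns-invoke"):
    """Kwargs for boto3 lambda add_permission: who may invoke the function."""
    return {
        "FunctionName": function_name,
        "StatementId": statement_id,
        "Action": "lambda:InvokeFunction",
        "Principal": "sns.amazonaws.com",
        "SourceArn": topic_arn,
    }
```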

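4.4 Lambda Networking

By default a Lambda function runs inside the AWS network but not inside any particular VPC, unless you attach it to a VPC when you configure it.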

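Without VPC attachment, the function can freely access other AWS services (such as S3 and DynamoDB) and the internet through their public endpoints, but it cannot reach private resources inside a VPC, such as a database in a private subnet.
With VPC attachment, the function runs in your VPC and can access resources in that VPC, but to reach other AWS services (such as S3 and DynamoDB) it needs VPC endpoints, and to reach the internet it needs a NAT gateway.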
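The networking rules above can be summarized as a small lookup. This is a hand-written sketch of the reasoning, not an AWS API; vpc_config builds the VpcConfig argument accepted by boto3's Lambda create_function and update_function_configuration, with placeholder subnet and security group IDs.

```python
def required_network_pieces(attached_to_vpc, destination):
    """destination: 'vpc-resource', 's3', 'dynamodb', 'other-aws-service', or 'internet'."""
    if not attached_to_vpc:
        # Default networking reaches public endpoints but not private VPC resources.
        return ["attach the function to the VPC"] if destination == "vpc-resource" else []
    if destination == "vpc-resource":
        return []
    if destination in ("s3", "dynamodb"):
        return ["gateway VPC endpoint, or a NAT gateway"]
    if destination == "other-aws-service":
        return ["interface VPC endpoint (PrivateLink), or a NAT gateway"]
    if destination == "internet":
        return ["NAT gateway (the function never gets a public IP)"]
    raise ValueError(f"unknown destination: {destination}")


def vpc_config(subnet_ids, security_group_ids):
    return {"SubnetIds": list(subnet_ids), "SecurityGroupIds": list(security_group_ids)}
```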

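4.5 Three Invocation Methods

Synchronous invocation: when you invoke a function synchronously, Lambda runs the function and waits for a response. When the function completes, Lambda returns the response from the function's code along with additional data.
Asynchronous invocation: when you invoke a function asynchronously, you don't wait for a response from the function code. You hand off the event to Lambda, and Lambda handles the rest. You can configure how Lambda handles errors and send invocation records to a downstream resource.
Event source mapping: an event source mapping is a Lambda resource that reads from an event source and invokes a Lambda function. You can use event source mappings to process items from a stream or queue in services that don't invoke Lambda functions directly. Supported event sources: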
Amazon DynamoDB
Amazon Kinesis
Amazon MQ
Amazon Managed Streaming for Apache Kafka (Amazon MSK)
Self-managed Apache Kafka
Amazon Simple Queue Service (Amazon SQS)
Amazon DocumentDB (with MongoDB compatibility) (Amazon DocumentDB)
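A sketch of the three invocation styles, assuming boto3's Lambda invoke (InvocationType RequestResponse for synchronous, Event for asynchronous) and create_event_source_mapping. For an SQS mapping with ReportBatchItemFailures enabled, the handler returns batchItemFailures so that only the failed messages are retried. Function and queue names, and the handle_body helper, are hypothetical.

```python
import json


def invoke_request(function_name, payload, asynchronous=False):
    """Kwargs for boto3 lambda invoke: wait for the result, or hand off the event."""
    return {
        "FunctionName": function_name,
        "InvocationType": "Event" if asynchronous else "RequestResponse",
        "Payload": json.dumps(payload).encode("utf-8"),
    }


def sqs_mapping_request(function_name, queue_arn, batch_size=10):
    """Kwargs for boto3 lambda create_event_source_mapping (Lambda polls the queue)."""
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
        "FunctionResponseTypes": ["ReportBatchItemFailures"],
    }


def handle_body(body):
    """Hypothetical message processor; rejects empty messages."""
    if not body:
        raise ValueError("empty message")


def sqs_handler(event, context):
    """Report only the failed messages so the rest of the batch is not retried."""
    failures = []
    for record in event["Records"]:
        try:
            handle_body(record["body"])
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```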

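4.6 Other Features

X-Ray: traces calls between services and visualizes them, which makes call problems easier to find.
Destinations: you can configure a destination (such as SNS or SQS) to receive the results of asynchronous invocations.
Versions: publishing a version creates an immutable snapshot of the function's code and configuration; updating the code changes only the unpublished $LATEST version until you publish again.
Alias: you can create one or more aliases for a Lambda function. An alias is like a pointer to a specific function version. Aliases are typically used to switch deployments, for example together with CodeDeploy for blue/green or canary releases.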

Example: A company built an application based on AWS Lambda deployed in an AWS CloudFormation stack. The last production release of the web application introduced an issue that resulted in an outage lasting several minutes. A solutions architect must adjust the deployment process to support a canary release.
Which solution will meet these requirements?
A. Create an alias for every new deployed version of the Lambda function. Use the AWS CLI update-alias command with the routing-config parameter to distribute the load.
B. Deploy the application into a new CloudFormation stack. Use an Amazon Route 53 weighted routing policy to distribute the load.
C. Create a version for every new deployed Lambda function. Use the AWS CLI update-function-configuration command with the routing-config parameter to distribute the load.
D. Configure AWS CodeDeploy and use CodeDeployDefault.OneAtATime in the Deployment configuration to distribute the load.
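Answer: A
Explanation: The question asks for a canary release of a Lambda function, ideally without downtime, and the best way is an alias with weighted routing. B shifts traffic between two stacks with Route 53 weighted records, which is coarse and subject to DNS caching, so a bad release can still cause an outage. D's CodeDeployDefault.OneAtATime is a deployment configuration for EC2/on-premises deployments, not a Lambda canary configuration. C is wrong because the routing-config parameter belongs to aliases (create-alias and update-alias), not to update-function-configuration. Alias deployment approach: https://docs.aws.amazon.com/zh_cn/lambda/latest/dg/configuration-aliases.html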
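Option A as a sketch, assuming boto3's Lambda update_alias with RoutingConfig={"AdditionalVersionWeights": {...}}, the SDK form of update-alias --routing-config. The alias keeps pointing at the stable version while a fraction of invocations goes to the canary version; the function name, alias, and version numbers are hypothetical.

```python
def canary_alias_request(function_name, alias, stable_version, canary_version, canary_weight):
    """Send canary_weight (0.0-1.0) of the alias's traffic to canary_version."""
    if not 0.0 <= canary_weight <= 1.0:
        raise ValueError("canary_weight must be between 0.0 and 1.0")
    return {
        "FunctionName": function_name,
        "Name": alias,
        "FunctionVersion": stable_version,
        "RoutingConfig": {"AdditionalVersionWeights": {canary_version: canary_weight}},
    }


def promote_request(function_name, alias, canary_version):
    """Point the alias fully at the canary version and clear the weighted routing."""
    return {
        "FunctionName": function_name,
        "Name": alias,
        "FunctionVersion": canary_version,
        "RoutingConfig": {"AdditionalVersionWeights": {}},
    }
```

Rolling back means setting the canary weight to 0.0, and clients keep calling the alias ARN throughout, which is also what the next question relies on.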

Example: A company gives users the ability to upload images from a custom application. The upload process invokes an AWS Lambda function that processes and stores the image in an Amazon S3 bucket. The application invokes the Lambda function by using a specific function version ARN.
The Lambda function accepts image processing parameters by using environment variables. The company often adjusts the environment variables of the Lambda function to achieve optimal image processing output. The company tests different parameters and publishes a new function version with the updated environment variables after validating results. This update process also requires frequent changes to the custom application to invoke the new function version ARN. These changes cause interruptions for users.
A solutions architect needs to simplify this process to minimize disruption to users.
Which solution will meet these requirements with the LEAST operational overhead?
A. Directly modify the environment variables of the published Lambda function version. Use the $LATEST version to test image processing parameters.
B. Create an Amazon DynamoDB table to store the image processing parameters. Modify the Lambda function to retrieve the image processing parameters from the DynamoDB table.
C. Directly code the image processing parameters within the Lambda function and remove the environment variables. Publish a new function version when the company updates the parameters.
D. Create a Lambda function alias. Modify the client application to use the function alias ARN. Reconfigure the Lambda alias to point to new versions of the function when the company finishes testing.
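Answer: D
Explanation: With a function alias, the custom application always invokes the alias ARN. After testing, the company only repoints the alias to the newly published version, so the application code no longer has to change every time the image processing parameters are updated. This reduces the risk of interrupting users. Therefore, choose D.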

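CodeDeploy: CodeDeploy can automate version upgrades. For Lambda, it shifts an alias's traffic from the old version to the new one according to a deployment configuration (canary, linear, or all-at-once).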
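The predefined Lambda deployment configurations differ only in how fast traffic moves: `CodeDeployDefault.LambdaCanary10Percent5Minutes` shifts 10% first and the remaining 90% after five minutes, while `CodeDeployDefault.LambdaLinear10PercentEvery1Minute` adds 10% every minute. The Python sketch below is a simulation of that schedule, not the CodeDeploy API, and it assumes no CloudWatch alarm triggers a rollback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficShift:
    kind: str               # "canary", "linear" or "all_at_once"
    percent: int = 100      # canary: first step; linear: increment per interval
    interval_min: int = 0   # minutes between steps


# Three of CodeDeploy's predefined Lambda deployment configurations.
CONFIGS = {
    "CodeDeployDefault.LambdaAllAtOnce": TrafficShift("all_at_once"),
    "CodeDeployDefault.LambdaCanary10Percent5Minutes": TrafficShift("canary", 10, 5),
    "CodeDeployDefault.LambdaLinear10PercentEvery1Minute": TrafficShift("linear", 10, 1),
}


def percent_on_new_version(config_name: str, minutes_elapsed: float) -> int:
    """Share of alias traffic on the new version, assuming no alarm triggers a rollback."""
    shift = CONFIGS[config_name]
    if shift.kind == "all_at_once":
        return 100
    if shift.kind == "canary":
        return shift.percent if minutes_elapsed < shift.interval_min else 100
    # Linear: one increment immediately, then one more per completed interval.
    steps_done = int(minutes_elapsed // shift.interval_min)
    return min(100, shift.percent * (steps_done + 1))


for minute in (0, 3, 6, 9):
    print(minute, percent_on_new_version("CodeDeployDefault.LambdaLinear10PercentEvery1Minute", minute))
```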

4.7 Classic Architecture

Using aliases to upgrade application versions

5 AWS Batch

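AWS Batch runs batch computing workloads on the AWS Cloud. Like traditional batch computing software, it removes the undifferentiated heavy lifting of configuring and managing the required infrastructure. AWS Batch jobs ultimately run on EC2 instances (or on Fargate, see 5.2): you pay only for that underlying capacity, yet you do not manage the infrastructure yourself. It is somewhat similar to Lambda, but differs from it in important ways.
Alibaba Cloud equivalent: Batch Compute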

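5.1 Differences from Lambda

Lambda
Time limit (a function can run for at most 15 minutes)
Concurrency limit (concurrent executions are capped by a per-Region quota)
Temporary disk limit (/tmp ephemeral storage, up to 10,240 MB)
Serverless
AWS Batch
No time limit
Uses Docker images
Can mount EBS volumes
Not serverless in the Lambda sense: you still pay for the underlying EC2 capacity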
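The two hard limits above are usually what decide an exam question. A hypothetical Python helper that encodes only those two limits (Lambda's 900-second maximum timeout and its 10,240 MB /tmp ceiling) as a rule of thumb; it ignores cost, concurrency, and every other factor.

```python
# Documented Lambda hard limits used for this rule of thumb.
LAMBDA_MAX_TIMEOUT_SECONDS = 15 * 60        # 900 s maximum function timeout
LAMBDA_MAX_EPHEMERAL_STORAGE_MB = 10_240    # /tmp can be configured up to 10,240 MB


def fits_in_lambda(runtime_seconds: float, scratch_disk_mb: int) -> bool:
    """True if a job stays within Lambda's timeout and /tmp storage limits."""
    return (runtime_seconds <= LAMBDA_MAX_TIMEOUT_SECONDS
            and scratch_disk_mb <= LAMBDA_MAX_EPHEMERAL_STORAGE_MB)


def suggest_service(runtime_seconds: float, scratch_disk_mb: int) -> str:
    """Hypothetical helper: Lambda for short jobs, AWS Batch for anything past its limits."""
    return "AWS Lambda" if fits_in_lambda(runtime_seconds, scratch_disk_mb) else "AWS Batch"


print(suggest_service(5 * 60, 512))          # AWS Lambda
print(suggest_service(2 * 60 * 60, 512))     # AWS Batch: runs for two hours
print(suggest_service(10 * 60, 20_000))      # AWS Batch: needs ~20 GB of scratch disk
```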

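5.2 Compute Environment Types

Managed compute environment using AWS Fargate resources: jobs run on Fargate capacity, so there are no EC2 instances to manage
Managed compute environment using EC2 resources: you specify the EC2 settings (instance types, vCPU limits, and so on), and AWS scales the EC2 instances automatically
Unmanaged compute environment using EC2 resources: you add EC2 instances that you have already created; AWS does not manage them
Managed compute environment using EKS resources: uses an existing EKS cluster as its foundation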
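The four options map onto the AWS Batch CreateComputeEnvironment request as sketched below in Python. The function only builds keyword arguments (for boto3's `create_compute_environment`), so it runs offline; the subnet, security group, role, and cluster identifiers, the default `m5` instance family, and the `batch-jobs` namespace are placeholders. For the EKS option, the cluster itself must already be prepared for AWS Batch, which this sketch does not cover.

```python
from typing import Any, Dict, List, Optional


def compute_environment_request(name: str, kind: str, subnets: List[str],
                                security_groups: List[str], max_vcpus: int = 64,
                                instance_role: Optional[str] = None,
                                instance_types: Optional[List[str]] = None,
                                eks_cluster_arn: Optional[str] = None,
                                eks_namespace: str = "batch-jobs") -> Dict[str, Any]:
    """kwargs for AWS Batch CreateComputeEnvironment.

    kind is one of "FARGATE", "EC2", "EC2_UNMANAGED" or "EKS" (the four options above).
    """
    if kind == "EC2_UNMANAGED":
        # You launch and register the EC2 instances yourself; Batch does not scale them.
        return {"computeEnvironmentName": name, "type": "UNMANAGED", "state": "ENABLED"}

    resources: Dict[str, Any] = {
        "maxvCpus": max_vcpus,
        "subnets": subnets,
        "securityGroupIds": security_groups,
    }
    if kind == "FARGATE":
        resources["type"] = "FARGATE"  # no instance types or instance profile needed
    elif kind in ("EC2", "EKS"):
        if instance_role is None:
            raise ValueError("EC2-backed environments need an instance profile")
        resources.update(type="EC2", minvCpus=0,
                         instanceTypes=instance_types or ["m5"],
                         instanceRole=instance_role)
    else:
        raise ValueError(f"unknown kind: {kind}")

    request: Dict[str, Any] = {"computeEnvironmentName": name, "type": "MANAGED",
                               "state": "ENABLED", "computeResources": resources}
    if kind == "EKS":
        if eks_cluster_arn is None:
            raise ValueError("EKS environments need the ARN of an existing cluster")
        request["eksConfiguration"] = {"eksClusterArn": eks_cluster_arn,
                                       "kubernetesNamespace": eks_namespace}
    return request


fargate_ce = compute_environment_request("fargate-ce", "FARGATE", ["subnet-aaa"], ["sg-bbb"])
# boto3.client("batch").create_compute_environment(**fargate_ce)
```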

5.3 Classic Architecture

[Figure: AWS Batch classic architecture]

6 AWS Elastic Beanstalk

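AWS Elastic Beanstalk is not itself a compute resource; it is a service for quickly deploying and managing applications in the AWS Cloud. It is included here because it is typically used to provision compute resources quickly. AWS offers hundreds of services, and deploying them and wiring them together from scratch is laborious. With Elastic Beanstalk, you can quickly deploy and manage applications in the AWS Cloud without having to learn about the infrastructure that runs them. Elastic Beanstalk reduces management complexity without restricting choice or control. You simply upload your application, and Elastic Beanstalk automatically handles the details of capacity provisioning, load balancing, scaling, and application health monitoring.
Alibaba Cloud equivalent: Simple Application Server or EDAS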

6.1 Supported Platforms

Docker
Go
Java SE
Tomcat
.NET Core on Linux
.NET on Windows Server
Node.js
PHP
Python
Ruby

6.2 Deployment Architectures

Single-instance environment
ASG only (Auto Scaling group without a load balancer)
Load-balanced, scalable environment

例题：A company developed a pilot application by using AWS Elastic Beanstalk and Java. To save costs during development, the company’s development team deployed the application into a single-instance environment. Recent tests indicate that the application consumes more CPU than expected. CPU utilization is regularly greater than 85%, which causes some performance bottlenecks.
A solutions architect must mitigate the performance issues before the company launches the application to production.
Which solution will meet these requirements with the LEAST operational overhead?
A. Create a new Elastic Beanstalk application. Select a load-balanced environment type. Select all Availability Zones. Add a scale-out rule that will run if the maximum CPU utilization is over 85% for 5 minutes.
B. Create a second Elastic Beanstalk environment. Apply the traffic-splitting deployment policy. Specify a percentage of incoming traffic to direct to the new environment if the average CPU utilization is over 85% for 5 minutes.
C. Modify the existing environment’s capacity configuration to use a load-balanced environment type. Select all Availability Zones. Add a scale-out rule that will run if the average CPU utilization is over 85% for 5 minutes.
D. Select the Rebuild environment action with the load balancing option. Select an Availability Zone. Add a scale-out rule that will run if the sum CPU utilization is over 85% for 5 minutes.
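Answer: C
Explanation: The environment deployed with Beanstalk has a performance problem. Beanstalk offers three deployment architectures, and the load-balanced, scalable environment is the one suited to production. Option C converts the existing environment in place, which has less overhead than creating a new application (A) or rebuilding the environment (D); traffic splitting (B) is a deployment policy for testing new application versions, not a way to add capacity. Therefore, choose C.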
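Option C corresponds to a handful of Elastic Beanstalk option settings. The Python sketch below builds the keyword arguments for UpdateEnvironment (boto3's `update_environment`) without calling AWS; the environment name, instance counts, and the 30% scale-in threshold are assumptions. `LowerThreshold` is set explicitly because the default trigger measures NetworkOut, and its default thresholds make no sense for a CPU percentage.

```python
from typing import Dict, List


def _opt(namespace: str, name: str, value: object) -> Dict[str, str]:
    return {"Namespace": namespace, "OptionName": name, "Value": str(value)}


def load_balanced_update_request(env_name: str, cpu_upper: int = 85, cpu_lower: int = 30,
                                 minutes: int = 5, min_size: int = 1,
                                 max_size: int = 4) -> Dict[str, object]:
    """kwargs for Elastic Beanstalk UpdateEnvironment: switch to a load-balanced
    environment and scale on average CPU instead of the default NetworkOut trigger."""
    trigger = "aws:autoscaling:trigger"
    settings: List[Dict[str, str]] = [
        _opt("aws:elasticbeanstalk:environment", "EnvironmentType", "LoadBalanced"),
        _opt("aws:autoscaling:asg", "MinSize", min_size),
        _opt("aws:autoscaling:asg", "MaxSize", max_size),
        _opt(trigger, "MeasureName", "CPUUtilization"),
        _opt(trigger, "Statistic", "Average"),
        _opt(trigger, "Unit", "Percent"),
        _opt(trigger, "UpperThreshold", cpu_upper),   # scale out above this
        _opt(trigger, "LowerThreshold", cpu_lower),   # scale in below this
        _opt(trigger, "Period", minutes),             # minutes per metric evaluation
        _opt(trigger, "BreachDuration", minutes),     # minutes the breach must last
    ]
    return {"EnvironmentName": env_name, "OptionSettings": settings}


req = load_balanced_update_request("pilot-app-env")
# boto3.client("elasticbeanstalk").update_environment(**req)
```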

6.3 Classic Architecture

Deploying a two-tier service with Beanstalk
Implementing blue/green deployment with Beanstalk

例题：A Solutions Architect must update an application environment within AWS Elastic Beanstalk using a blue/green deployment methodology. The Solutions Architect creates an environment that is identical to the existing application environment and deploys the application to the new environment.
What should be done next to complete the update?
A. Redirect to the new environment using Amazon Route 53
B. Select the Swap Environment URLs option
C. Replace the Auto Scaling launch configuration
D. Update the DNS records to point to the green environment
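Answer: B
Explanation: Swap Environment URLs exchanges the CNAMEs of the two environments, so traffic moves to the new (green) environment without editing DNS records yourself. Reference: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.CNAMESwap.html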
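A minimal Python sketch of what the swap does. `swap_request` builds the keyword arguments for the Elastic Beanstalk SwapEnvironmentCNAMEs API (boto3's `swap_environment_cnames`), and `swap_cnames` is a toy model of its effect; the environment names and URLs are hypothetical. Swapping a second time is the rollback.

```python
from typing import Dict


def swap_request(blue_env: str, green_env: str) -> Dict[str, str]:
    """kwargs for Elastic Beanstalk SwapEnvironmentCNAMEs."""
    return {"SourceEnvironmentName": blue_env, "DestinationEnvironmentName": green_env}


def swap_cnames(cnames: Dict[str, str], source: str, destination: str) -> Dict[str, str]:
    """Model the effect of the swap: the two environments exchange their CNAMEs."""
    swapped = dict(cnames)
    swapped[source], swapped[destination] = cnames[destination], cnames[source]
    return swapped


before = {
    "myapp-blue": "myapp.us-east-1.elasticbeanstalk.com",         # URL users hit today
    "myapp-green": "myapp-green.us-east-1.elasticbeanstalk.com",  # new version, already tested
}
after = swap_cnames(before, "myapp-blue", "myapp-green")
# boto3.client("elasticbeanstalk").swap_environment_cnames(**swap_request("myapp-blue", "myapp-green"))
print(after["myapp-green"])  # myapp.us-east-1.elasticbeanstalk.com
```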

7 Amazon ECR

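Amazon Elastic Container Registry (Amazon ECR) is an AWS managed container image registry service that is secure, scalable, and reliable. Amazon ECR supports private repositories with resource-based permissions using AWS IAM. Simply put, it is a hosted image registry. It is not a compute resource, but it is closely tied to compute resources: to learn ECS and similar services you first need to understand it, which is why it is placed here.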

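Supports private and public image repositories
Cross-Region replication (once configured, an image you push can be used in other Regions)
Image scanning
1) ECR's built-in (basic) scanning
2) Enhanced scanning integrated with Amazon Inspector (more capable than the built-in scanning)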
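Replication and scanning are both registry-level settings. The Python sketch below builds keyword arguments for ECR PutRegistryScanningConfiguration and PutReplicationConfiguration (boto3's `put_registry_scanning_configuration` and `put_replication_configuration`) without calling AWS; the account ID, Regions, and the match-all `*` repository filter are assumptions.

```python
from typing import Any, Dict, List


def registry_scanning_request(enhanced: bool, repo_filter: str = "*") -> Dict[str, Any]:
    """kwargs for ECR PutRegistryScanningConfiguration.

    BASIC = ECR's built-in scanning; ENHANCED = Amazon Inspector integration.
    """
    return {
        "scanType": "ENHANCED" if enhanced else "BASIC",
        "rules": [{
            "scanFrequency": "SCAN_ON_PUSH",  # valid for both scan types
            "repositoryFilters": [{"filter": repo_filter, "filterType": "WILDCARD"}],
        }],
    }


def replication_request(account_id: str, regions: List[str]) -> Dict[str, Any]:
    """kwargs for ECR PutReplicationConfiguration: copy every pushed image to `regions`."""
    return {
        "replicationConfiguration": {
            "rules": [{
                "destinations": [{"region": r, "registryId": account_id} for r in regions],
            }],
        },
    }


scan = registry_scanning_request(enhanced=False)
repl = replication_request("123456789012", ["us-west-2", "eu-west-1"])
# ecr = boto3.client("ecr")
# ecr.put_registry_scanning_configuration(**scan)
# ecr.put_replication_configuration(**repl)
```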

例题：A company is running a containerized application in the AWS Cloud. The application is running by using Amazon Elastic Container Service (Amazon ECS) on a set of Amazon EC2 instances. The EC2 instances run in an Auto Scaling group.
The company uses Amazon Elastic Container Registry (Amazon ECR) to store its container images. When a new image version is uploaded, the new image version receives a unique tag.
The company needs a solution that inspects new image versions for common vulnerabilities and exposures. The solution must automatically delete new image tags that have Critical or High severity findings. The solution also must notify the development team when such a deletion occurs.
Which solution meets these requirements?
A. Configure scan on push on the repository. Use Amazon EventBridge (Amazon CloudWatch Events) to invoke an AWS Step Functions state machine when a scan is complete for images that have Critical or High severity findings. Use the Step Functions state machine to delete the image tag for those images and to notify the development team through Amazon Simple Notification Service (Amazon SNS).
B. Configure scan on push on the repository. Configure scan results to be pushed to an Amazon Simple Queue Service (Amazon SQS) queue. Invoke an AWS Lambda function when a new message is added to the SQS queue. Use the Lambda function to delete the image tag for images that have Critical or High severity findings. Notify the development team by using Amazon Simple Email Service (Amazon SES).
C. Schedule an AWS Lambda function to start a manual image scan every hour. Configure Amazon EventBridge (Amazon CloudWatch Events) to invoke another Lambda function when a scan is complete. Use the second Lambda function to delete the image tag for images that have Critical or High severity findings. Notify the development team by using Amazon Simple Notification Service (Amazon SNS).
D. Configure periodic image scan on the repository. Configure scan results to be added to an Amazon Simple Queue Service (Amazon SQS) queue. Invoke an AWS Step Functions state machine when a new message is added to the SQS queue. Use the Step Functions state machine to delete the image tag for images that have Critical or High severity findings. Notify the development team by using Amazon Simple Email Service (Amazon SES).
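Answer: A
Explanation: The question requires scanning new ECR images, deleting tags that have Critical or High findings, and notifying the team. Scan on push starts a scan for every new image, and ECR sends an event to EventBridge when a scan completes, which can start the Step Functions workflow. ECR cannot push scan results directly to an SQS queue, which rules out B and D, and C scans only once an hour instead of on every push. For how to configure ECR image scanning, see: https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-scanning.html#scanning-repository. Therefore, choose A.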
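The decision logic of option A can be sketched in Python. This assumes the event shape of ECR basic scanning ("ECR Image Scan" events with `finding-severity-counts`); enhanced scanning emits Amazon Inspector events with a different shape. `plan_cleanup` is a hypothetical function that only returns the kwargs for ECR BatchDeleteImage and SNS Publish; in option A, the Step Functions state machine would perform the equivalent calls. The repository name, tag, and topic ARN are made up.

```python
from typing import Any, Dict, Optional, Tuple

# EventBridge rule pattern that matches completed ECR basic-scan events.
EVENT_PATTERN = {
    "source": ["aws.ecr"],
    "detail-type": ["ECR Image Scan"],
    "detail": {"scan-status": ["COMPLETE"]},
}
BLOCKING_SEVERITIES = ("CRITICAL", "HIGH")


def plan_cleanup(event: Dict[str, Any], topic_arn: str
                 ) -> Optional[Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Return (ECR BatchDeleteImage kwargs, SNS Publish kwargs), or None if nothing to do."""
    detail = event.get("detail", {})
    counts = detail.get("finding-severity-counts", {})
    blocking = {s: counts[s] for s in BLOCKING_SEVERITIES if counts.get(s, 0) > 0}
    tags = detail.get("image-tags", [])
    if detail.get("scan-status") != "COMPLETE" or not blocking or not tags:
        return None
    repo = detail["repository-name"]
    delete_kwargs = {"repositoryName": repo,
                     "imageIds": [{"imageTag": t} for t in tags]}
    publish_kwargs = {
        "TopicArn": topic_arn,
        "Subject": f"Deleted vulnerable image tag(s) in {repo}",
        "Message": f"Tags {tags} removed; findings: {blocking}",
    }
    return delete_kwargs, publish_kwargs


sample_event = {  # hypothetical event for illustration
    "source": "aws.ecr",
    "detail-type": "ECR Image Scan",
    "detail": {
        "scan-status": "COMPLETE",
        "repository-name": "web-app",
        "image-digest": "sha256:example",
        "image-tags": ["v42"],
        "finding-severity-counts": {"HIGH": 2, "MEDIUM": 5},
    },
}
plan = plan_cleanup(sample_event, "arn:aws:sns:us-east-1:123456789012:dev-team")
# Equivalent SDK calls: ecr.batch_delete_image(**plan[0]); sns.publish(**plan[1])
```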

8 Amazon EKS

8.1 Kubernetes

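Before learning EKS, you should understand Kubernetes. Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. It is much like Amazon ECS in that it runs applications as containers, except that ECS is proprietary while Kubernetes is open source, and many Kubernetes conventions have become the de facto standards for container orchestration. To go deeper, consult Kubernetes learning resources on your own.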
[Figure: Kubernetes architecture]
The figure above is a Kubernetes architecture diagram. You don't need to understand all of it, but you should be clear on a few Kubernetes concepts:

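Node: a worker machine, which can be a physical or virtual machine; think of it as a physical node.
Pod: the basic unit of operation in Kubernetes. A Pod contains one or more containers and can be thought of as a "logical host".
Data Volume: a volume is a directory that may contain data. Ephemeral volumes (such as emptyDir) share the Pod's lifecycle, while persistent volumes outlive it. Different volume types differ in purpose, use case, and configuration.
Putting these concepts together: Kubernetes manages many Nodes (in AWS, these could be a fleet of EC2 instances). When you deploy an application on Kubernetes, it creates Pods, and the containers in those Pods are your application.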
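To tie the three concepts together, here is a Pod manifest expressed as a Python dict (the same structure `kubectl apply -f` accepts as JSON or YAML). The Pod name, the container images, and the sidecar design are hypothetical. Note that no Node appears in the manifest: the Kubernetes scheduler decides which Node runs the Pod.

```python
import json
from typing import Any, Dict


def pod_manifest(name: str, app_image: str, sidecar_image: str) -> Dict[str, Any]:
    """A Pod with two containers sharing one emptyDir volume.

    No Node is named here: the Kubernetes scheduler picks the Node the Pod runs on.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # emptyDir is an ephemeral volume: created with the Pod, deleted with it.
            "volumes": [{"name": "shared-data", "emptyDir": {}}],
            "containers": [
                {"name": "app", "image": app_image,
                 "volumeMounts": [{"name": "shared-data", "mountPath": "/data"}]},
                {"name": "log-shipper", "image": sidecar_image,
                 "volumeMounts": [{"name": "shared-data", "mountPath": "/logs"}]},
            ],
        },
    }


manifest = pod_manifest("web", "nginx:1.25", "busybox:1.36")
print(json.dumps(manifest, indent=2))  # kubectl apply -f accepts this JSON
```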

8.2 Amazon EKS

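Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that eliminates the need to install, operate, and maintain your own Kubernetes control plane on Amazon Web Services (AWS). Once you understand Kubernetes, EKS is easy to grasp: it is Kubernetes that AWS deploys and runs for you.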

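Node Types
1) Managed node groups: EKS automates the provisioning and lifecycle management of the nodes (Amazon EC2 instances) for Kubernetes clusters.
2) Self-managed nodes: the cluster contains one or more Amazon EC2 nodes on which Pods are scheduled, and you add, remove, and otherwise manage those nodes yourself.
3) Fargate: Fargate is a technology that provides on-demand, right-sized compute capacity for containers. With Fargate, you no longer have to provision, configure, or scale groups of virtual machines to run containers. You don't need to choose server types, decide when to scale your node groups, or optimize cluster packing. (Note: if an exam question says the company wants to avoid managing infrastructure, the answer is usually Fargate.)
Data Volumes
1) EBS
2) EFS (note: this is the only one of these storage types that supports Fargate)
3) FSx for Lustre
4) FSx for NetApp ONTAP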
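A Python sketch combining the two lists above. `volume_supported` encodes the note that only EFS works with Fargate Pods (on EC2-backed nodes, all four types can be used through their CSI drivers), and `fargate_profile_request` builds keyword arguments for EKS CreateFargateProfile (boto3's `create_fargate_profile`) without calling AWS. The cluster name, role ARN, subnet, and namespace are placeholders.

```python
from typing import Any, Dict, List

# Of these four storage types, only EFS can be mounted by Fargate Pods.
FARGATE_VOLUME_TYPES = {"EFS"}
EC2_NODE_VOLUME_TYPES = {"EBS", "EFS", "FSx for Lustre", "FSx for NetApp ONTAP"}


def volume_supported(node_type: str, volume_type: str) -> bool:
    """node_type: "managed", "self-managed" or "fargate" (the three node types above)."""
    if node_type == "fargate":
        return volume_type in FARGATE_VOLUME_TYPES
    return volume_type in EC2_NODE_VOLUME_TYPES


def fargate_profile_request(cluster: str, profile: str, pod_role_arn: str,
                            private_subnets: List[str], namespace: str) -> Dict[str, Any]:
    """kwargs for EKS CreateFargateProfile: Pods in `namespace` run on Fargate."""
    return {
        "clusterName": cluster,
        "fargateProfileName": profile,
        "podExecutionRoleArn": pod_role_arn,
        "subnets": private_subnets,  # Fargate Pods only run in private subnets
        "selectors": [{"namespace": namespace}],
    }


req = fargate_profile_request("demo", "default-fp",
                              "arn:aws:iam::123456789012:role/FargatePodRole",
                              ["subnet-aaa"], "default")
# boto3.client("eks").create_fargate_profile(**req)
```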

8.3 EKS Anywhere

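EKS Anywhere provides a way to manage Kubernetes clusters using the same operational excellence and practices that AWS uses for Amazon Elastic Kubernetes Service (Amazon EKS). Simply put, you install EKS Anywhere in your own data center, and you can then connect your on-premises Kubernetes clusters to the AWS EKS console. Two connectivity modes are available:
1) Fully connected & partially disconnected: the clusters can be connected to and viewed from the AWS EKS console
2) Fully disconnected: the clusters have no connection to AWS EKS and are not managed through the EKS console; you manage them with your own tooling (one benefit is that a later migration to the AWS Cloud is easier)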