Overview
- AWS Data Pipeline is a web service that you can use to automate the movement and transformation of data.
- With AWS Data Pipeline, you can define data-driven workflows, in which tasks depend on the successful completion of previous tasks.
- You define the parameters of your data transformations, and AWS Data Pipeline enforces the logic that you've set up.
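For example, creating and activating a pipeline can be scripted with boto3, the AWS SDK for Python. This is a minimal sketch; the pipeline name and uniqueId are placeholders, and a pipeline definition must be attached (see Pipeline Definition below) before activation succeeds.

```python
import boto3

client = boto3.client("datapipeline")  # assumes credentials and region are configured

# Register an empty pipeline; uniqueId makes the call idempotent.
created = client.create_pipeline(
    name="my-etl-pipeline",         # placeholder name
    uniqueId="my-etl-pipeline-v1",  # placeholder idempotency token
)
pipeline_id = created["pipelineId"]

# Once a pipeline definition has been attached, activation starts scheduling.
client.activate_pipeline(pipelineId=pipeline_id)
```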
Components
- A pipeline definition specifies the business logic of your data management.
- A pipeline schedules and runs tasks by creating Amazon EC2 instances to perform the defined work activities.
- Task Runner polls for tasks and then performs those tasks.
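AWS provides Task Runner as a ready-made agent, but the same polling contract is exposed through the API, so a custom worker can fetch work with PollForTask and report results with SetTaskStatus. A rough sketch, assuming a placeholder worker group and a hypothetical do_work helper:

```python
import time
import boto3

client = boto3.client("datapipeline")

def do_work(task):
    """Hypothetical helper that performs the activity described by the task."""
    ...

# The poll-work-report loop that Task Runner implements.
while True:
    response = client.poll_for_task(workerGroup="my-worker-group")  # placeholder group
    task = response.get("taskObject")
    if not task:
        time.sleep(30)  # no task was assigned; back off and poll again
        continue
    try:
        do_work(task)
        client.set_task_status(taskId=task["taskId"], taskStatus="FINISHED")
    except Exception as err:
        client.set_task_status(
            taskId=task["taskId"],
            taskStatus="FAILED",
            errorId="WorkerError",  # placeholder error code
            errorMessage=str(err),
        )
```

In practice, a worker running a long task also calls report_task_progress periodically so the service knows the task is still alive.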
Related Services
- AWS Data Pipeline works with the following services to store data:
- Amazon DynamoDB
- Amazon RDS
- Amazon Redshift
- Amazon S3
- AWS Data Pipeline works with the following compute services to transform data:
- Amazon EC2
- Amazon EMR
Concepts
Pipeline Definition
- A pipeline definition is how you communicate your business logic to AWS Data Pipeline. It contains the following information:
- Names, locations, and formats of your data sources
- Activities that transform the data
- The schedule for those activities
- Resources that run your activities and preconditions
- Preconditions that must be satisfied before the activities can be scheduled
- Ways to alert you with status updates as pipeline execution proceeds
- From your pipeline definition, AWS Data Pipeline determines the tasks, schedules them, and assigns them to task runners.
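Concretely, a pipeline definition is a set of typed objects wired together by references. A sketch of attaching one with boto3's put_pipeline_definition; the ids, names, schedule values, and IAM role names are illustrative placeholders, and a real definition typically needs more fields to pass validation:

```python
import boto3

client = boto3.client("datapipeline")

# Three components: a schedule, an EC2 resource to run on, and an activity.
# refValue fields wire components together by id.
objects = [
    {
        "id": "DailySchedule",
        "name": "DailySchedule",
        "fields": [
            {"key": "type", "stringValue": "Schedule"},
            {"key": "period", "stringValue": "1 day"},
            {"key": "startDateTime", "stringValue": "2024-01-01T00:00:00"},
        ],
    },
    {
        "id": "WorkerResource",
        "name": "WorkerResource",
        "fields": [
            {"key": "type", "stringValue": "Ec2Resource"},
            {"key": "schedule", "refValue": "DailySchedule"},
            {"key": "role", "stringValue": "DataPipelineDefaultRole"},
            {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
            {"key": "terminateAfter", "stringValue": "1 Hour"},
        ],
    },
    {
        "id": "TransformActivity",
        "name": "TransformActivity",
        "fields": [
            {"key": "type", "stringValue": "ShellCommandActivity"},
            {"key": "runsOn", "refValue": "WorkerResource"},
            {"key": "schedule", "refValue": "DailySchedule"},
            {"key": "command", "stringValue": "echo transform step"},
        ],
    },
]

client.put_pipeline_definition(
    pipelineId="df-EXAMPLE",  # placeholder; use the id returned by create_pipeline
    pipelineObjects=objects,
)
```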
Pipeline Components, Instances, and Attempts
- Pipeline Components
- Pipeline components represent the business logic of the pipeline; each component corresponds to a section of the pipeline definition.
- Pipeline components specify the data sources, activities, schedule, and preconditions of the workflow.
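One way to see these objects in the API: QueryObjects lists a pipeline's objects at the component level (the spheres INSTANCE and ATTEMPT cover the other two levels named in this section's title), and DescribeObjects returns their fields. A sketch with a placeholder pipeline id:

```python
import boto3

client = boto3.client("datapipeline")

# List component objects; INSTANCE and ATTEMPT are the other valid spheres.
response = client.query_objects(
    pipelineId="df-EXAMPLE",  # placeholder pipeline id
    sphere="COMPONENT",
)
ids = response.get("ids", [])

# Fetch each component's fields (data sources, schedule, preconditions, ...).
if ids:
    described = client.describe_objects(
        pipelineId="df-EXAMPLE",
        objectIds=ids,
    )
    for obj in described["pipelineObjects"]:
        print(obj["name"], obj["fields"])
```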