Data Pipelines

Understanding Data Pipelines

 

AWS Data Pipeline is a web service that provides a simple management system for data-driven workflows. Using AWS Data Pipeline, you define a pipeline composed of the “data sources” that contain your data, the “activities” or business logic such as EMR jobs or SQL queries, and the “schedule” on which your business logic executes. For example, you could define a job that, every hour, runs an Amazon Elastic MapReduce (Amazon EMR)–based analysis on that hour’s Amazon Simple Storage Service (Amazon S3) log data, loads the results into a relational database for future lookup, and then automatically sends you a daily summary email.

                                                                                ----- From AWS

 

As this shows, Data Pipeline is a job-level resource manager, scheduler, and summary reporter. Its basic idea is the same as that of the traditional pipeline concept.

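AWS Data Pipeline handles:

•  Job scheduling, execution, and retry-on-failure logic

•  Tracking the dependencies between pieces of business logic, so that a job runs only after all of its dependencies are satisfied

•  Generating and sending any necessary failure notifications

•  Creating and managing the temporary resources that jobs need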
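The first three responsibilities above can be sketched in a few lines of Python. This is only an illustration of the idea, not how AWS Data Pipeline is implemented: the function name run_pipeline, the max_attempts parameter, and the notify callback are all hypothetical, and the sketch assumes every name in deps is also a key in jobs. It uses graphlib from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter


def run_pipeline(jobs, deps, max_attempts=3, notify=print):
    """Run jobs in dependency order, retrying failures (hypothetical helper).

    jobs: dict mapping a job name to a zero-argument callable
    deps: dict mapping a job name to the set of job names it depends on
    Returns a dict mapping each job name to "ok", "failed", or "skipped".
    """
    status = {}
    graph = {name: deps.get(name, set()) for name in jobs}
    # static_order() yields every job after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        # Dependency tracking: never run a job whose dependencies did not succeed.
        if any(status.get(d) != "ok" for d in graph[name]):
            status[name] = "skipped"
            continue
        last_error = None
        for _attempt in range(max_attempts):
            try:
                jobs[name]()
                status[name] = "ok"
                break
            except Exception as exc:  # retry on any failure
                last_error = exc
        else:
            # All attempts were used up: record the failure and send a notification.
            status[name] = "failed"
            notify(f"{name} failed after {max_attempts} attempts: {last_error}")
    return status
```

A job that fails on every attempt triggers one notification, and everything downstream of it is skipped rather than run on incomplete input.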

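To make sure an activity can run successfully, AWS Data Pipeline checks that every resource the activity needs is available. (The AWS website mentions only data; I think this could also cover other resources such as CPU and network bandwidth.) This availability check is called a "precondition", and the activity stays blocked until the check passes.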
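The blocking behavior of a precondition can be sketched as a simple polling loop. This is a minimal illustration under my own assumptions, not the AWS API: run_when_ready, poll_seconds, and timeout_seconds are hypothetical names, and each precondition is modeled as a zero-argument function that returns True once its resource is available.

```python
import time


def run_when_ready(activity, preconditions, poll_seconds=1.0,
                   timeout_seconds=60.0, sleep=time.sleep,
                   clock=time.monotonic):
    """Block until every precondition passes, then run the activity.

    activity: zero-argument callable holding the business logic
    preconditions: iterable of zero-argument callables returning True/False
    Raises TimeoutError if the preconditions do not pass in time.
    """
    checks = list(preconditions)
    deadline = clock() + timeout_seconds
    # The activity stays blocked while any precondition still fails.
    while not all(check() for check in checks):
        if clock() >= deadline:
            raise TimeoutError("preconditions not met before timeout")
        sleep(poll_seconds)
    return activity()
```

A check such as "the input file for this hour exists" would be passed in as one of the preconditions; the sleep and clock parameters exist only so the loop can be tested without real waiting.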

For user interfaces, AWS Data Pipeline provides:

•  Management Console

•  CLI

•  Service APIs (for defining data sources, preconditions, activities, the schedule, and notification levels)

 

 

Data Pipeline Components

 

Data Collection: Moves data from the data sources to the storage location

Data Acquisition: Fetches data from various external data sources

Data Storage: The storage system

Data Processing: The ability to transform data in various useful ways, including annotation, filtering, and aggregation

Table Management/Meta Data: Provides a consistent API for data consumers, backed by a standard metadata system

Job Coordination/Scheduling: The ability to schedule, submit, manage, retry, reprocess, and catch up a DAG of jobs

Data Output: Enables push- or pull-based delivery of data, subject to policies

Data Policy Management: Anonymizes, retains, cleans up, and archives data

Monitoring/System Management: Provides the ability to operate, visualize, and install pipelines
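As a small illustration of the Data Processing component, the three transformations it names (annotation, filtering, aggregation) can be chained as plain Python stages. The record fields (path, status) and the function names here are made up for the example and do not come from any particular system.

```python
from collections import Counter


def annotate(records):
    """Annotation: add a derived field to each record."""
    for record in records:
        yield {**record, "is_error": record["status"] >= 500}


def keep_errors(records):
    """Filtering: pass through only the records flagged as errors."""
    return (record for record in records if record["is_error"])


def count_by_path(records):
    """Aggregation: count the remaining records per request path."""
    return Counter(record["path"] for record in records)


# Hypothetical log records standing in for one hour of access logs.
logs = [
    {"path": "/a", "status": 500},
    {"path": "/a", "status": 200},
    {"path": "/b", "status": 503},
    {"path": "/a", "status": 502},
]

# Each stage consumes the output of the previous one.
summary = count_by_path(keep_errors(annotate(logs)))
```

Because the first two stages are generators, records flow through one at a time and only the final aggregate is held in memory.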
