Overview
What is Apache NiFi
Put simply, NiFi was built to automate the flow of data between systems.
While the term "dataflow" is used in a variety of contexts, we use it here to mean the automated and managed flow of information between systems.
This problem space has been around ever since enterprises had more than one system, where some of the systems created data and some of the systems consumed data.
The problems and solution patterns that emerged have been discussed and articulated extensively.
A comprehensive and readily consumed form is found in the Enterprise Integration Patterns [eip].
Some of the high-level challenges of dataflow include:
Systems fail
Networks fail, disks fail, software crashes, and people make mistakes.
Data access exceeds capacity to consume
Sometimes a given data source can outpace some part of the processing or delivery chain; it only takes one weak link to cause a problem.
Boundary conditions are mere suggestions
You will invariably get data that is too big, too small, too fast, too slow, corrupt, wrong, or in the wrong format.
What is noise one day becomes signal the next
The priorities of an organization change, and they change rapidly. Enabling new flows and changing existing ones must be fast.
Systems evolve at different rates
The protocols and formats used by a given system can change at any time, often irrespective of the systems around them.
Dataflow exists to connect what is essentially a massively distributed system of components that are only loosely designed, or not designed at all, to work together.
Compliance and security
Laws, regulations, and policies change. Business-to-business agreements change. System-to-system and system-to-user interactions must be secure, trusted, and accountable.
Continuous improvement occurs in production
It is often not possible to come even close to replicating production environments in the lab.
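The "one weak link" point under "Data access exceeds capacity to consume" comes down to simple arithmetic. A linear chain can sustain no more than its slowest stage, and any excess from the source has to queue somewhere. The Python sketch below makes this concrete. It is an illustration only, not NiFi code. The stage names, the rates, and the helpers `chain_throughput` and `backlog_after` are hypothetical. It also assumes unbounded queues between stages and constant rates.

```python
"""Sketch: a linear chain's sustained throughput is capped by its slowest stage.

Assumption: stage names and rates are hypothetical, and queues between stages
are treated as unbounded. This is an illustration, not how NiFi is implemented.
"""
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    rate_per_sec: float  # records this (hypothetical) stage can handle per second


def chain_throughput(stages: list[Stage]) -> tuple[float, str]:
    """Return the sustained rate of a linear chain and the name of the limiting stage."""
    bottleneck = min(stages, key=lambda s: s.rate_per_sec)
    return bottleneck.rate_per_sec, bottleneck.name


def backlog_after(source_rate: float, stages: list[Stage], seconds: float) -> float:
    """Records left queued somewhere in the chain when the source outpaces the weak link."""
    rate, _ = chain_throughput(stages)
    return max(0.0, (source_rate - rate) * seconds)


if __name__ == "__main__":
    chain = [
        Stage("parse", 5000),
        Stage("enrich", 800),  # the weak link in this made-up chain
        Stage("deliver", 3000),
    ]
    rate, name = chain_throughput(chain)
    print(f"sustained rate: {rate:.0f}/s, limited by '{name}'")
    print(f"backlog after 60 s at 1000/s: {backlog_after(1000, chain, 60):.0f} records")
```

Run as a script, it reports a sustained rate of 800/s limited by 'enrich', and a backlog of 12000 records after 60 seconds with a 1000/s source. Making 'parse' or 'deliver' faster changes neither number. Only the weak link matters.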
Over the years, dataflow has been one of those necessary evils in an architecture.
Now, though, there are a number of active and rapidly evolving movements making dataflow a lot more interesting and a lot more vital to the success of a given enterprise.
These include Service Oriented Architecture [soa], the rise of the API, the Internet of Things, and Big Data.
In addition, the level of rigor necessary for compliance, privacy, and security is constantly on the rise.
Even with all of these new concepts coming about, the patterns and needs of dataflow remain largely the same.
The primary differences, then, are the scope of complexity, the rate of change necessary to adapt, and the fact that at scale the edge case becomes a common occurrence.