[English Article Reading Share 201406[01]] Google Finds: Centralized Control, Distributed Data Architectures Work Better Than Fully Decentralized Architectures


For years a war has been fought in the software architecture trenches between the ideal of decentralized services and the power and practicality of centralized services. Centralized architectures, at least at the management and control plane level, are winning. And Google not only agrees, they are enthusiastic adopters of this model, even in places you don't think it should work.

Here's an excerpt from Google Lifts Veil On “Andromeda” Virtual Networking, an excellent article by Timothy Morgan that includes a money quote from Amin Vahdat, distinguished engineer and technical lead for networking at Google:

Like many of the massive services that Google has created, the Andromeda network has centralized control. By the way, so did the Google File System and the MapReduce scheduler that gave rise to Hadoop when it was mimicked, so did the BigTable NoSQL data store that has spawned a number of quasi-clones, and even the B4 WAN and the Spanner distributed file system that have yet to be cloned.
[Translator's note: Hadoop is the open-source implementation of GFS and MapReduce, and many of today's NoSQL databases follow the BigTable model. Strictly speaking, Spanner is a globally distributed database rather than a file system.]

“What we have seen is that a logically centralized, hierarchical control plane with a peer-to-peer data plane beats full decentralization,” explained Vahdat in his keynote. “All of these flew in the face of conventional wisdom,” he continued, referring to all of those projects above, and added that everyone was shocked back in 2002 that Google would, for instance, build a large-scale storage system like GFS with centralized control. “We are actually pretty confident in the design pattern at this point. We can build a fundamentally more efficient system by prudently leveraging centralization rather than trying to manage things in a peer-to-peer, decentralized manner.”

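
GFS, the first system named in the excerpt, shows what “centralized control plane, peer-to-peer data plane” means in practice: a master holds only metadata, and clients fetch the actual bytes directly from chunk servers. The toy sketch below assumes exactly that split and nothing more; Master, ChunkServer and read_file are hypothetical names, and everything real GFS does beyond this (replication, leases, heartbeats, location caching) is left out.

```python
from dataclasses import dataclass, field

@dataclass
class ChunkServer:
    """Data plane: stores chunk bytes and serves them straight to clients."""
    name: str
    chunks: dict = field(default_factory=dict)  # chunk handle -> bytes

    def read(self, handle: str) -> bytes:
        return self.chunks[handle]

@dataclass
class Master:
    """Control plane: knows where every chunk lives but never touches file data."""
    # file name -> ordered list of (chunk handle, [replica server names])
    metadata: dict = field(default_factory=dict)
    lookups: int = 0

    def locate(self, filename: str):
        self.lookups += 1
        return self.metadata[filename]

def read_file(master: Master, servers: dict, filename: str) -> bytes:
    """One metadata lookup at the master, then bytes fetched peer-to-peer."""
    data = b""
    for handle, replicas in master.locate(filename):
        # Use the first replica; a real client would pick by proximity or load.
        data += servers[replicas[0]].read(handle)
    return data

servers = {
    "cs1": ChunkServer("cs1", {"h1": b"hello, "}),
    "cs2": ChunkServer("cs2", {"h2": b"world"}),
}
master = Master({"/greeting": [("h1", ["cs1"]), ("h2", ["cs2"])]})
print(read_file(master, servers, "/greeting"))  # b'hello, world'
print(master.lookups)                           # 1
```

In this sketch the master's work grows with the number of lookups and the amount of metadata, not with the volume of data read, which is one way to see why a single logical controller can sit in front of a very large data plane.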

The context of the article is Google's impressive homebrew SDN (software-defined network) system, which uses a centralized control architecture instead of the Internet's decentralized Autonomous System model, a model that treats the Internet as individual islands connected by routing protocols.

SDN completely changes that model, as Greg Ferro explains:

The major difference between SDN and traditional networking lies in the model of controller-based networking. In a software-defined network, a centralized controller has a complete end-to-end view of the entire network, and knowledge of all network paths and device capabilities resides in a single application. As a result, the controller can calculate paths based on both source and destination addresses; use different network paths for different traffic types; and react quickly to changing networking conditions. 

In addition to delivering these features, the controller serves as a single point of configuration. This full programmability of the entire network from a single location, which finally enables network automation, is the most valuable aspect of SDN.
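
Ferro's description of a controller with an end-to-end view can be sketched as follows. This is a minimal illustration under stated assumptions: the controller holds the whole topology as one weighted graph per traffic class, computes paths with Dijkstra's algorithm, and “programs” switches by writing next-hop entries into in-memory tables. Controller, program_route, undirected and the per-class cost maps are hypothetical; real controllers push such state to switches over a protocol such as OpenFlow, and nothing here describes how Andromeda or B4 actually work.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over a dict-of-dicts graph where graph[u][v] is the link cost."""
    dist, prev, visited = {src: 0}, {}, set()
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, cost in graph[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v], prev[v] = d + cost, u
                heapq.heappush(heap, (d + cost, v))
    if dst not in visited:
        return None  # unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def undirected(edges):
    """Build a symmetric dict-of-dicts graph from (u, v, cost) triples."""
    g = {}
    for u, v, c in edges:
        g.setdefault(u, {})[v] = c
        g.setdefault(v, {})[u] = c
    return g

class Controller:
    """Hypothetical SDN controller with a global view of every link."""

    def __init__(self, topologies):
        self.topologies = topologies  # traffic class -> cost graph
        self.tables = {}              # switch -> {(dst, traffic_class): next_hop}

    def program_route(self, src, dst, traffic_class):
        path = shortest_path(self.topologies[traffic_class], src, dst)
        if path is None:
            raise ValueError(f"no path from {src} to {dst}")
        # Push one forwarding entry to every switch along the path.
        for hop, next_hop in zip(path, path[1:]):
            self.tables.setdefault(hop, {})[(dst, traffic_class)] = next_hop
        return path

ctrl = Controller({
    "latency": undirected([("A", "B", 1), ("B", "D", 1), ("A", "C", 2), ("C", "D", 2)]),
    "bulk": undirected([("A", "B", 5), ("B", "D", 5), ("A", "C", 1), ("C", "D", 1)]),
})
print(ctrl.program_route("A", "D", "latency"))  # ['A', 'B', 'D']
print(ctrl.program_route("A", "D", "bulk"))     # ['A', 'C', 'D']
print(ctrl.tables["A"])  # {('D', 'latency'): 'B', ('D', 'bulk'): 'C'}
```

Because one process sees every link, sending bulk traffic over a different path is just a different cost map; no router-to-router negotiation is involved, and the same program_route call is the single point of configuration.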

So a centralized controller knows all, sees all, and hard-wires routes by directly programming routers. In the olden days, slow BGP convergence after a fault was detected would kill performance. With your own SDN on your own hardware, failure response can be immediate, because the centralized controller can program routers with a possibly precalculated alternative route. This is a key feature for today's cloud-based systems, which demand highly available, low-latency connections, even across the WAN.
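
The “precalculated alternative route” idea can be sketched by having the controller compute, ahead of time, a primary path and a backup path that shares no links with it, so that a single link failure never takes out both. This is a hypothetical illustration (FastFailover, bfs_path and links are invented names) using fewest-hop paths; a production system would also weigh capacity, handle the backup itself failing, and push the change out to the switches.

```python
from collections import deque

def bfs_path(adj, src, dst, banned=frozenset()):
    """Fewest-hop path from src to dst, skipping links listed in `banned`."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev and frozenset((u, v)) not in banned:
                prev[v] = u
                queue.append(v)
    return None  # dst unreachable

def links(path):
    """The set of undirected links a path uses."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

class FastFailover:
    """Hypothetical controller logic: precompute a primary and a link-disjoint backup."""

    def __init__(self, adj, src, dst):
        self.primary = bfs_path(adj, src, dst)
        if self.primary is None:
            raise ValueError(f"{dst} is unreachable from {src}")
        # The backup avoids every primary link, so one link failure cannot hit both.
        self.backup = bfs_path(adj, src, dst, banned=links(self.primary))
        self.active = self.primary

    def on_link_down(self, u, v):
        failed = frozenset((u, v))
        if self.active is not None and failed in links(self.active):
            if self.active is self.primary and self.backup is not None:
                self.active = self.backup  # immediate switch, no path computation
            else:
                self.active = None  # a real controller would recompute here
        return self.active

ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
ff = FastFailover(ring, "A", "C")
print(ff.primary, ff.backup)      # ['A', 'B', 'C'] ['A', 'D', 'C']
print(ff.on_link_down("B", "C"))  # ['A', 'D', 'C']
```

The design choice is to move the expensive work (path computation) off the failure path: when a link dies, the reaction is a table lookup rather than a protocol-wide reconvergence.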

Does this mean the controller is a single process? Not at all. It's logically centralized, but may be split up among numerous machines, as is typical in any service architecture. This is how it can scale. With today's big iron, big memory, and fast networks, the motivation for adopting a completely decentralized architecture for capacity reasons is not compelling for all but the largest problems.
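
The article does not say how a logically centralized controller is divided among machines. One common technique, swapped in here purely for illustration, is consistent hashing: each switch is owned by exactly one controller replica, and adding a replica moves only the switches that the new replica takes over. ControllerRing, stable_hash and the replica and switch names are hypothetical.

```python
import bisect
import hashlib

def stable_hash(key: str) -> int:
    # Python's built-in hash() is salted per process; use a fixed digest instead.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class ControllerRing:
    """Hypothetical partitioning of switches across controller replicas."""

    def __init__(self, replicas, vnodes=64):
        # Each replica gets many points on the ring to even out the load.
        self.ring = sorted(
            (stable_hash(f"{r}#{i}"), r) for r in replicas for i in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    def owner(self, switch_id: str) -> str:
        # First ring point clockwise from the switch's hash, wrapping around.
        i = bisect.bisect(self.points, stable_hash(switch_id)) % len(self.ring)
        return self.ring[i][1]

switches = [f"sw{n}" for n in range(1000)]
before = ControllerRing(["ctl-1", "ctl-2", "ctl-3"])
after = ControllerRing(["ctl-1", "ctl-2", "ctl-3", "ctl-4"])
moved = sum(before.owner(s) != after.owner(s) for s in switches)
# Roughly a quarter of the switches move, all to ctl-4; the exact count depends on the hashes.
print(moved)
```

Every switch still talks to a single owner, so from the switch's point of view there is one controller, even though the work is spread over several machines.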

At Internet scale, the Autonomous System model of being logically and physically decentralized is still a win: it scales wonderfully, but at the price of high coordination costs and slow reaction times. That was fine in the past, but it doesn't work for today's networking needs.
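
Why decentralized routing reacts slowly can be seen in a toy synchronous distance-vector model, in which each router learns only from its neighbors, so news travels one hop per exchange. This is a simplification, not BGP (which is a path-vector protocol with policies and rate-limiting timers); distance_vector_rounds is a hypothetical helper meant only to show that convergence takes a number of rounds that grows with the network's diameter, whereas a controller with the global view computes every route in one place.

```python
def distance_vector_rounds(adj):
    """Synchronous distance-vector routing: each round, every router reads only
    its neighbours' tables from the previous round. Returns (tables, rounds)."""
    inf = float("inf")
    nodes = list(adj)
    table = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
    rounds = 0
    while True:
        new = {}
        for u in nodes:
            new[u] = {}
            for dst in nodes:
                best = table[u][dst]
                for nbr, cost in adj[u].items():
                    best = min(best, cost + table[nbr][dst])
                new[u][dst] = best
        rounds += 1
        if new == table:  # nothing changed: the network has converged
            return table, rounds
        table = new

# Five routers in a line: 0 - 1 - 2 - 3 - 4, every link costing 1.
line = {i: {} for i in range(5)}
for i in range(4):
    line[i][i + 1] = 1
    line[i + 1][i] = 1

tables, rounds = distance_vector_rounds(line)
print(rounds)        # 5: four rounds to learn the farthest router, one to confirm
print(tables[0][4])  # 4
```

Each of those rounds is a full exchange of messages between neighbors; on a real network every exchange also pays propagation delay and timer waits, which is where the slow reaction times come from.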

Google isn't running an Internet. They are running a special-purpose network for their own particular portfolio of needs. Why should they use an overgeneralized technology meant for a completely different purpose?

We Can See Centralization Winning In The Services That People Choose To Use.

Email and NNTP, both fully decentralized services, while by no means dead, have given way to centralized services like Twitter, Facebook, G+, WhatsApp, and push notifications. While decentralization plays an important part in the back end of almost every software service, the services themselves are logically centralized.

Centralization makes a lot of things easier. Search, for example. If you want great search you need all the data in one place. That's why Google crawls the web and stashes it in their very large back pocket. Identity is a dish best served centralized. As are things like follow lists, joins, profiles, A/B testing, frequent pushes, iterative design, fraud detection, DDoS mitigation, deep learning, and virtually any kind of high value add feature you want to create.

Also, relying on a remote entity outside your control as a key component of your product invites high latency and a user experience that varies with that entity's failures. That's not something you want in your service. End-to-end control is key to creating a good user experience.

So when you argue for a fully decentralized architecture, it's hard to make the case on features or scalability; you have to look elsewhere.

Decentralization Is Also A Political Choice.

Attempts to make a decentralized or federated Twitter service, for example, while technically feasible, have not broken out into general adoption. The simple reason is that centralization works, and as a user what you want is something that works. That's primary. Secondary qualities like security, owning your own data, resilience, and free speech, while of great importance to some, barely register as issues for the many.

But for the few, these secondary qualities are exactly what they prize most. In articles like Escaping the Black Holes of Centralization, Doc Searls makes the case that decentralization matters for reasons of human rights and personal sovereignty. A fully distributed and encrypted P2P chat system is a lot harder to compromise than a centralized service run by a large, faceless corporation.

When You Are Thinking About The Architecture Of Your Own System...

If your system exists for personal sovereignty purposes, operates at Internet or interplanetary scale, or must otherwise operate autonomously, then federation is your friend.

If your system is smallish then a completely centralized architecture is still quite attractive.

For the vast middle ground, Google has shown that centralized management and control combined with distributed data is probably now the canonical architecture. Don't get caught up trying to make distributed everything work. You probably don't need it, and it's really, really hard.

But then again Oceania has always been at war with Eastasia.

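Personal summary by mlinlcnan

1. From this article I learned the basic idea behind SDN: unlike the traditional network, which is decentralized, flexible, easy to extend, and self-adapting, an SDN is centrally controlled and programmable. For large Internet companies, building and maintaining data centers accounts for the bulk of their costs, and the bandwidth demands on traffic within and between data centers are very high, so the traditional (flexible) Internet approach cannot meet their needs. It feels like a move from complex to simple. The Internet needs flexibility (and the complexity that comes with it) because it has to interconnect endpoints all over the world (is it uncontrollable? but then why would it need to be controlled?! it has run fine all these years). Internet companies' data centers are the opposite: they need control to achieve high bandwidth and high quality of service, and to make full use of the data centers' compute and service capacity (cost matters!).

2. Is this the era in which software is king? SDN (software-defined networking), SDS (software-defined storage), and NFV (network functions virtualization) all fit the general pattern of how things develop and what people strive for: from complex to simple.

3. In a distributed system, the controller (master) is often the component most likely to become the bottleneck of the whole system, so it comes down to trade-offs. In the distributed DB project I am currently working on, data migration (scaling out or scaling in) must not block the business (which sounds impressive), and that doubles the complexity of designing and implementing the scheme. We are about to deliver, but the original design was incomplete and some scenarios are not supported, so we are now reworking the design and the code. By contrast, tair, the distributed K/V DB that Taobao open-sourced, blocks during data migration and still works perfectly well.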
