From Distributed Systems: Concepts and Design (5th Edition)

Definition of a distributed system: the book defines a distributed system as one in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages.
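As a small illustration of "coordinate only by passing messages", the sketch below lets two components interact purely through message queues, with no shared variables. It is a toy under stated assumptions: threads and in-process queues stand in for networked computers, and the names `server` and `run_exchange` are invented for this example.

```python
import queue
import threading


def server(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Answer each request message; stop when the sentinel None arrives."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        outbox.put(("reply", msg.upper()))


def run_exchange(requests):
    """Send each request as a message and collect the reply messages."""
    to_server = queue.Queue()
    to_client = queue.Queue()
    worker = threading.Thread(target=server, args=(to_server, to_client))
    worker.start()
    replies = []
    for text in requests:
        to_server.put(text)                       # the only way to reach the server
        replies.append(to_client.get(timeout=5))  # the only way to hear back
    to_server.put(None)
    worker.join()
    return replies
```

The client never reads the server's state directly; everything it learns arrives as a message, which is the property the definition emphasizes.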


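Characteristics of distributed systems
(1) Concurrency: programs run independently on different computers.
(2) No global clock: there is no way to provide a precise clock that all machines agree on.
(3) Independent failures: any software or hardware component may fail, for example the power supply of one computer, a switch connecting some of the computers, or an operating system crash on one machine.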


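Purpose of distributed systems
Resource sharing. (This is an abstract notion: on the hardware side it covers sharing disk storage; on the software side, sharing files or databases, sharing video, and so on.)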

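Variety and examples of distributed systems
Given the definition and purpose above, distributed systems can support a wide variety of applications.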

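(1) Search engines (Google)
Features: a very large number of networked computers; a distributed file system that supports very large files and is optimized for search, in particular for sustained high-speed reads from files; a structured distributed storage system that offers fast access to very large data sets; a lock service that provides the distributed locking and distributed agreement the system needs; and a programming model that supports large-scale parallel computation across the distributed system.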

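(2) MMOGs (massively multiplayer online games)
Features: the system must process game events in real time, propagate them to all players, and maintain a consistent system state (the virtual game world shared by all players).
a> Model 1: in EVE Online, the entire system state is maintained by one centralized server (a large cluster). Keeping a single copy of the state simplifies the consistency model. The key concern is responding quickly to incoming events, so the load is partitioned by allocating individual "star systems" to particular computers within the cluster.
b> Model 2: a large number of networked servers, with each player assigned to one particular server.
c> Model 3: researchers are exploring decentralized P2P techniques for implementing MMOGs.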

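(3) Financial trading systems
Features: real-time access to a wide range of information, such as exchange rates, prices and trends, and economic and political developments. Relevant events, such as a price change or the latest unemployment figures, are delivered reliably and in real time to the clients interested in them. (These are event-based distributed systems, covered separately in Chapter 6.)

Events from different information sources are first converted into a common internal message format by a FIX adapter.
Events arrive at a high rate and must be processed in real time to detect patterns that indicate trading opportunities.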
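Example rule: when the MS stock price moves outside 2% of its moving average, followed by my basket of stocks moving up 0.5% together with either HP's price moving up 5% or MS's price moving down 2%, all within a two-minute window, then buy MS and sell HP.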
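The rule above can be sketched as a small pattern matcher over a time-ordered event stream. This is only a sketch under assumptions: the events are taken to be already classified into the hypothetical kinds named below, and the two-minute condition is read as a window in which all of the follow-up events must occur. It is not how any real trading engine is implemented.

```python
WINDOW_SECONDS = 120  # the two-minute condition from the rule

# Hypothetical event kinds, assumed to be produced by an upstream classifier.
TRIGGER = "MS_OUTSIDE_2PCT_OF_MOVING_AVG"
BASKET_UP = "BASKET_UP_0_5PCT"
HP_UP = "HP_UP_5PCT"
MS_DOWN = "MS_DOWN_2PCT"


def rule_fires(events):
    """Return the orders to place, or [] if the pattern is not matched.

    events: list of (timestamp_seconds, kind) tuples, sorted by timestamp.
    """
    for i, (t0, kind) in enumerate(events):
        if kind != TRIGGER:
            continue
        # Everything that follows the trigger within the window.
        later = {k for t, k in events[i + 1:] if t - t0 <= WINDOW_SECONDS}
        if BASKET_UP in later and (HP_UP in later or MS_DOWN in later):
            return ["BUY MS", "SELL HP"]
    return []
```

Scanning every trigger against the events that follow it is quadratic in the worst case; it keeps the sketch short, whereas a production system would match incrementally as events arrive.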



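Trends in distributed systems
(1) A variety of network technologies:
    WiFi, 3G, Bluetooth, ultra-wideband
(2) Ubiquitous computing on many kinds of devices:
    laptops, smartphones, wearable devices (smart watches), embedded systems (in-car systems, computerized refrigerators), GPS, sensors.
(3) Growing demand for multimedia services
Distributed systems support the storage, transmission and playback of multiple media types such as images, text, audio and video. Example applications include Internet telephony (Skype, a P2P-based Internet phone service) and webcasting.
Requirements: support for multiple formats (encoding and encryption), QoS guarantees, resource management mechanisms, and QoS adaptation strategies (high definition or standard definition depending on bandwidth).
(4) Distributed systems as a utility (cloud, everything as a service)
Online storage drives, AWS compute nodes, virtual machines, Google Apps.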

Challenges of distributed systems
(1) Heterogeneity
A variety of networks, computer hardware, operating systems and programming languages, and implementations by different developers.
Middleware: The term middleware applies to a software layer that provides a
programming abstraction as well as masking the heterogeneity of the underlying
networks, hardware, operating systems and programming languages.
Heterogeneity and mobile code: The term mobile code is used to refer to program code
that can be transferred from one computer to another and run at the destination; Java
applets are an example. Virtual machines and JavaScript also belong in this category.

(2) Openness
The openness of distributed systems is determined primarily by the degree to which new resource-sharing services can be added and be made available for use by a variety of client programs.

The key interfaces are published, for example as RFCs or the HTTP specification, alongside the interfaces that developers define themselves.

Open systems are characterized by the fact that their key interfaces are published.
Open distributed systems are based on the provision of a uniform communication
mechanism and published interfaces for access to shared resources.
Open distributed systems can be constructed from heterogeneous hardware and
software, possibly from different vendors.

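(3) Security
Security for information resources has three components:
confidentiality (protection against disclosure to unauthorized individuals),
integrity (protection against alteration or corruption),
availability (protection against interference with the means to access the resources).

Confidentiality and integrity are both achieved with cryptography: encryption protects confidentiality, message authentication codes or digital signatures protect integrity, and public/private key pairs are commonly used for authentication. Examples: SSH and HTTPS.
Denial-of-service attacks: an attacker drives a large number of zombie machines to attack a website (Chapter 3).
Security of mobile code: for example, a malicious program carried in an email attachment (Chapter 11).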
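To make the integrity component concrete, here is a minimal sketch that uses a message authentication code (HMAC-SHA256 from the Python standard library) to detect alteration of a message. The shared key and the function names `sign` and `verify` are assumptions for illustration; protocols such as SSH and HTTPS combine this kind of check with encryption and key exchange, which the sketch leaves out.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag over the message with a shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Return True only if the tag matches the message (constant-time compare)."""
    return hmac.compare_digest(sign(key, message), tag)
```

A receiver that holds the same key recomputes the tag; any change to the message in transit makes `verify` return False. Note that this protects integrity only: the message itself is still sent in the clear.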

(4) Scalability
A system is described as scalable if it will remain effective when there is a significant increase in the number of resources and the number of users.

The design of scalable distributed systems presents the following challenges:
a> Controlling the cost of physical resources
Physical resources should grow at most linearly, O(n), with the number of users.
In general, for a system with n users to be scalable, the quantity of physical resources required to support them should be at most O(n), that is, proportional to n. For example, if a single file server can support 20 users, then two such servers should be able to support 40 users.

b> Controlling the performance loss
As the number of users grows, the performance loss should be held to O(log n).
Originally a single machine answered every name lookup; the DNS later replaced this with a hierarchical tree structure.
The time taken to access hierarchically structured data is O(log n), where n is the size of the set of data. For a system to be scalable, the maximum performance loss should be no worse than this.

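c> Preventing software resources running out
IPv4 addresses have run out. That said, over-provisioning for anticipated growth can be worse than adapting when the need arises.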

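d> Avoiding performance bottlenecks
(Global) Distributed algorithms should be decentralized to avoid performance bottlenecks, as in the DNS.
(Local) In addition, frequently accessed hot resources can be replicated and cached to improve performance under heavy concurrent use.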

The issue of scale is a dominant theme in the development of distributed systems.
(Chapter 18: replication; Chapters 2 and 12: caching)
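The O(n) and O(log n) bounds from items a and b can be checked with a little arithmetic. The sketch below assumes the figure of 20 users per file server from the example in the text, and models hierarchical lookup as a balanced tree with a chosen fan-out; both function names and the fan-out parameter are illustrative assumptions.

```python
import math

USERS_PER_SERVER = 20  # taken from the file-server example in the text


def servers_needed(users: int) -> int:
    """Resources grow linearly with users: O(n)."""
    return math.ceil(users / USERS_PER_SERVER)


def lookup_steps(entries: int, fanout: int = 2) -> int:
    """Levels to descend in a balanced tree holding `entries` items: O(log n)."""
    steps, capacity = 0, 1
    while capacity < entries:
        capacity *= fanout
        steps += 1
    return steps
```

Doubling the user count doubles the servers, while multiplying the data set by a thousand adds only about ten levels to a binary hierarchy, which is why a logarithmic performance loss is considered acceptable.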

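(5) Failure handling
Process and network failures (Chapter 2)

Failures in a distributed system are partial: some components fail while others continue to function. Therefore the handling of failures is particularly difficult.
a> Detecting failures
The challenge is to manage in the presence of failures that cannot be detected but may be suspected, for example when a remote system may have crashed, the network may be congested, or the remote OS may simply be busy.
b> Masking failures
Examples: TCP message retransmission, writing files to multiple disks so that one copy can be used for recovery, IP failover, and active/standby switchover.
c> Tolerating failures
Examples: on the client side, a web browser tolerates a failure and lets the user reconnect; on the server side, failures are tolerated through redundancy.
d> Recovery from failures
Examples: restoring corrupted data such as files (using checksums), and rolling back after a failed software upgrade.
e> Redundancy
Replication and redundancy: multiple routers, DNS data replicated across several servers, and databases replicated across several servers.
The design of effective techniques for keeping replicas of rapidly changing data up-to-date without excessive loss of performance is a challenge. Approaches are discussed in Chapter 18.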
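Masking a failure by retransmission, as in item b, can be sketched as a retry wrapper: the caller sees a successful result even though some attempts failed. This is a simplified illustration, not TCP's actual mechanism; the function name `call_with_retries` and the choice of `ConnectionError` as the transient failure are assumptions.

```python
def call_with_retries(operation, attempts: int = 3):
    """Run operation(), retrying on ConnectionError up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try again
    # Every attempt failed: the failure can no longer be masked.
    raise last_error
```

Retrying only masks transient failures, and it is safe only when the operation can be repeated without side effects piling up.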

MTBF formula? (99.999% availability)
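One common way to relate mean time between failures to an availability figure is availability = MTBF / (MTBF + MTTR), where MTTR is the mean time to repair. MTTR does not appear in these notes, so treat the sketch below as an assumption about which formula was meant; it also converts an availability figure into expected downtime per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, given mean time between failures and to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected minutes of downtime per year for a given availability."""
    return (1 - avail) * MINUTES_PER_YEAR
```

Under this formula, "five nines" (99.999%) leaves a little over five minutes of downtime per year.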

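(6) Concurrency
Multiple users access a resource at the same time. Synchronization (locks) is needed so that concurrent modifications of a shared resource (for example by multiple threads) remain correct.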
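A minimal sketch of the locking idea, using threads within one process as a stand-in for concurrent clients: every update to the shared counter happens while holding a lock, so no increments are lost. The class and function names are invented for this example.

```python
import threading


class SharedCounter:
    """A shared resource whose updates are serialized by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:  # only one thread at a time may update the value
            self.value += 1


def run_workers(workers: int, increments: int) -> int:
    """Start several threads that all update the same counter."""
    counter = SharedCounter()

    def work() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place the final value always equals workers times increments; a distributed system needs the same guarantee across machines, which is what a lock service provides.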

(7) Transparency
Transparency is defined as the concealment from the user and the application programmer of the separation of components in a distributed system, so that the system is perceived as a whole rather than as a collection of independent components.
In other words, transparency hides the individual components of a distributed system from users and application programmers.

Access transparency enables local and remote resources to be accessed using identical operations.
Local calls and remote calls use the same operations (as with RPC).

Location transparency enables resources to be accessed without knowledge of their physical or network location (for example, which building or IP address).
Resources are accessed without knowing where they sit on the physical network (neither the IP address nor the machine).

Concurrency transparency enables several processes to operate concurrently using shared resources without interference between them.
Callers do not have to coordinate explicitly with the other processes accessing the same shared resource.

Replication transparency enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers.
Users and application programmers do not need to know whether the instance they are accessing is a replica.

Failure transparency enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components.
Faults are hidden, so users and applications can complete their tasks even when hardware or software fails.

Mobility transparency allows the movement of resources and clients within a system without affecting the operation of users or programs.
Resources can be moved without affecting how users and programs operate.

Performance transparency allows the system to be reconfigured to improve performance as loads vary.
The system can be reconfigured in response to the current load.

Scaling transparency allows the system and applications to expand in scale without change to the system structure or the application algorithms.
Capacity can be expanded without changing the system structure or the application algorithms.

The two most important transparencies are access and location transparency; their presence or absence most strongly affects the utilization of distributed resources. They are sometimes referred to together as network transparency.
Example of access transparency: reading a file on a local disk and reading a file on an SMB/NFS share.
Example of location transparency: using URLs to reach a web server (several IP addresses can sit behind the same web address, as in DNS load balancing).

(8) Quality of service
The main nonfunctional properties of systems that affect the quality of the service experienced by clients and users are reliability, security and performance. Adaptability to meet changing system configurations and resource availability has been recognized as a further important aspect of service quality.

Performance:
The performance aspect of quality of service was originally defined in terms of
responsiveness and computational throughput, but it has been redefined in terms of
ability to meet timeliness guarantees,
for example the real-time requirements of video playback.

QoS applies to operating systems as well as networks. Each critical resource must be reserved by the applications that require QoS, and there must be resource managers that provide guarantees. Reservation requests that cannot be met are rejected. These issues will be addressed further in Chapter 20.
The availability of a resource is guaranteed by reserving it in advance.
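The reservation idea can be sketched as a small admission-control check: a resource manager grants a reservation only if enough capacity remains, and rejects it otherwise. This is an illustrative sketch; the class name `ResourceManager`, the single bandwidth-like resource and its units are assumptions, not an API from the book.

```python
class ResourceManager:
    """Grants reservations against a fixed capacity and rejects the rest."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.reserved = 0

    def reserve(self, amount: int) -> bool:
        """Reserve `amount` units; return False if the request cannot be met."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.reserved + amount > self.capacity:
            return False  # reservation requests that cannot be met are rejected
        self.reserved += amount
        return True

    def release(self, amount: int) -> None:
        """Give back previously reserved units."""
        self.reserved = max(0, self.reserved - amount)
```

Rejecting a request up front is what lets the manager keep its guarantee to the applications it has already admitted.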



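Example of a distributed system (the WWW)
The Web is an open system.
(1) Its operation is based on the HTTP communication protocol and the HTML standard (with multiple browser and web server implementations).
(2) It is open with respect to the types of resource that can be shared and published, for example media files (plug-ins handle the different file types).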

The Web has moved beyond these simple data resources to encompass services, such as electronic purchasing of goods. It has evolved without changing its basic architecture. The Web is based on three main standard technological components:

a> HTML
The HTML language for presentation.
JavaScript
Links are king, whether they point to a person or to a resource.

b> URLs, of the form scheme:scheme-specific-identifier
(ftp://172.24.12.11/delivery/xxx.rpm, http://www.sina.com.cn, https://www.alibaba.com)

c> HTTP
Request-reply interactions: GET/POST
Content types (MIME): text/html, image/gif, application/zip
One resource per request (multiple resources can be requested concurrently)
Simple access control (access rights)
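To tie the URL and HTTP components together, the sketch below splits a URL into its parts with the standard library, builds the text of a minimal GET request from it, and parses a reply into status, headers and body. It handles only the `http` scheme, sends nothing over the network, and works on a canned reply; the function names are invented for this example.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request for an http:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles the http scheme")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Connection: close",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw HTTP reply into (status code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body
```

The `Content-Type` header in the reply carries the MIME type, which is how a browser decides whether to render the body as HTML, show it as an image, or hand it to a plug-in.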


The following kinds of application are built on these interfaces:

Dynamic pages: CGI programs generate "HTML" content dynamically rather than serving an HTML file from the local file system.
Downloaded code: JavaScript and AJAX, ActiveX and applets
Web services: replace HTML with XML (REST design style)

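Problems with the Web
1. Deleting or moving a resource leaves dangling links. (Users rely on search engines to find information, and the results shown can be confusing; one attempted solution is the Semantic Web.)
2. The Web faces problems of scale. Chapter 2 describes browser caches and proxy servers as ways to improve responsiveness.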