A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.
- System: a collection of computing elements -- Nodes.
- Node: a hardware device or a software process
- Autonomous computing elements --
- Nodes can act independently
- Nodes are programmed to achieve common goals
- Nodes lack a common reference of time (i.e. there is no global clock)
- Hard to manage group membership
- Hard to do admission control
- Nodes need to be organized
- An overlay network structure:
- a node is a software process
- a node has a number of neighbors
- nodes can communicate with each other through TCP/IP or UDP channels
- Structured overlay network: neighbors of a node are well-defined
- Unstructured overlay network: neighbors of a node are randomly selected
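The difference between the two kinds of overlay can be sketched in a few lines. This is only an illustration under stated assumptions: the ring topology, the integer node identifiers, and the function names `structured_ring` and `unstructured_random` are hypothetical choices, not something the notes prescribe.

```python
import random


def structured_ring(node_ids):
    """Structured overlay: every node's neighbors are well-defined.

    Assumed example topology: nodes sit on a ring ordered by identifier,
    so a node's neighbors are its predecessor and its successor.
    """
    ordered = sorted(node_ids)
    n = len(ordered)
    return {
        node: [ordered[(i - 1) % n], ordered[(i + 1) % n]]
        for i, node in enumerate(ordered)
    }


def unstructured_random(node_ids, degree, seed=None):
    """Unstructured overlay: every node picks `degree` neighbors at random."""
    rng = random.Random(seed)
    overlay = {}
    for node in node_ids:
        others = [other for other in node_ids if other != node]
        overlay[node] = rng.sample(others, degree)
    return overlay


nodes = [10, 20, 30, 40, 50]
print(structured_ring(nodes)[10])  # [50, 20]
print(unstructured_random(nodes, degree=2, seed=1)[10])  # two randomly chosen nodes
```

In the ring, the neighbor list follows from the identifiers alone; in the random overlay it depends on the random draw, which is why such overlays typically need searching or flooding to locate a node.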
- Single coherent system --
- Coherent: the system behaves according to the expectations of its users
- End users should not be able to tell exactly on which computer a process is currently executing, or exactly on which computer data is stored
- Developers need to deal with partial failures: some applications may continue to execute successfully while others come to a grinding halt.
Middleware
A middleware is a separate layer of software that is logically placed on top of the respective operating systems of the computers that are part of the system.
- A middleware is a container of commonly used components and functions
- Typical middleware services:
- Communication (RPC): allows an application to invoke a function that is implemented and executed on a remote computer
- Atomic transactions: when invoking services on multiple computers, the middleware makes sure that either every service is invoked, or none at all
- Service composition: taking existing programs and gluing them together
- Reliability
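The all-or-nothing behaviour of an atomic transaction can be sketched as follows. This is a simplified illustration, assuming every service offers an `undo` operation; it uses compensation (undoing completed calls) rather than a real commit protocol such as two-phase commit. The `Service` class and `invoke_all_or_nothing` are hypothetical names, and the "remote" services are simulated locally.

```python
class Service:
    """Hypothetical stand-in for a service running on a remote computer."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.applied = False

    def invoke(self):
        if self.fail:
            raise RuntimeError(f"{self.name} is unavailable")
        self.applied = True

    def undo(self):
        self.applied = False


def invoke_all_or_nothing(services):
    """Invoke every service; if one fails, undo those already invoked."""
    done = []
    try:
        for service in services:
            service.invoke()
            done.append(service)
    except RuntimeError:
        # Roll back in reverse order so that no partial effect remains.
        for service in reversed(done):
            service.undo()
        return False
    return True


billing = Service("billing")
shipping = Service("shipping", fail=True)
ok = invoke_all_or_nothing([billing, shipping])
print(ok, billing.applied, shipping.applied)  # False False False
```

Because `shipping` fails, the already completed `billing` call is undone, so the caller sees either the effect of every service or of none.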
Distribution transparency
Making the distribution of processes and resources transparent, that is, invisible, to end users and applications.
Access transparency
Hiding differences in data representation and the way that objects can be accessed
- Hide differences in machine architectures, i.e. hide how data is represented by different machines and operating systems
- Hide differences in naming conventions
- Hide differences in file operations
- Hide differences in how low-level communication with other processes is to take place
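One common way to hide differences in data representation is to agree on a single wire format that does not depend on the host machine. The sketch below uses Python's `struct` module for this; the record layout (a 32-bit unsigned integer followed by a 64-bit float) and the names `WIRE_FORMAT`, `encode` and `decode` are assumptions made for the example.

```python
import struct

# Assumed wire format: "!" selects network (big-endian) byte order with
# standard sizes and no padding, "I" is a 32-bit unsigned integer and
# "d" is a 64-bit float. The result is the same on every architecture.
WIRE_FORMAT = "!Id"


def encode(record_id, value):
    """Convert native Python values into the machine-independent format."""
    return struct.pack(WIRE_FORMAT, record_id, value)


def decode(payload):
    """Convert the machine-independent format back into native values."""
    return struct.unpack(WIRE_FORMAT, payload)


payload = encode(7, 2.5)
print(len(payload))     # 12
print(decode(payload))  # (7, 2.5)
```

Sender and receiver only ever see their own native values; the byte order and field sizes on the wire are fixed by the format string rather than by either machine.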
Location transparency
Hiding where an object is physically located in the system
- Can be achieved by assigning only logical names to resources, e.g. a URL
- A URL gives no clue about the actual location of the Web server
Relocation transparency
Sometimes an entire site may have been moved from one data center to another, yet users should not notice
Migration transparency
A system supports the mobility of processes and resources initiated by users without affecting ongoing communication and operations
- e.g. mobile phone calls, teleconferencing
Replication transparency
Hiding the fact that several copies of a resource exist in the system
- Replication (copies) improves availability and performance
- But should not be noticed by the end user
- Can be achieved by giving all replicas the same name
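A minimal sketch of this idea: one logical name in front of several copies. It assumes a simple write-all, read-any scheme and ignores replica failures and concurrent writers; the class `ReplicatedResource` and its methods are hypothetical.

```python
import random


class ReplicatedResource:
    """One logical name backed by several copies (replicas)."""

    def __init__(self, name, replica_count):
        self.name = name
        # Each replica is modelled as its own dictionary.
        self._replicas = [{} for _ in range(replica_count)]

    def write(self, key, value):
        # Update every copy so that all replicas stay identical.
        for replica in self._replicas:
            replica[key] = value

    def read(self, key):
        # Any copy can serve a read; the caller never learns which one.
        return random.choice(self._replicas)[key]


catalog = ReplicatedResource("catalog", replica_count=3)
catalog.write("isbn-1", "Distributed Systems")
print(catalog.read("isbn-1"))  # Distributed Systems
```

The user only ever addresses `catalog`; how many copies exist and which one answered stays hidden.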
Concurrency transparency
Concurrent access to a shared resource leaves that resource in a consistent state
- Locking, transactions ...
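Locking can be illustrated on a single machine with threads standing in for concurrent clients. This is a sketch under that assumption; a real distributed system would need a distributed lock or a transaction mechanism. `SharedCounter` and `run_clients` are hypothetical names.

```python
import threading


class SharedCounter:
    """A shared resource whose updates are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one client at a time may execute the read-modify-write step.
        with self._lock:
            self.value += 1


def run_clients(counter, clients=8, increments=1000):
    """Let several clients update the counter concurrently."""

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


print(run_clients(SharedCounter()))  # 8000
```

Each client behaves as if it were the only user of the counter, and the final value is consistent with all updates having been applied one after another.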
Failure transparency
Hiding a partial failure from the user, and automatically recovering from it later
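One simple way to mask a failed node is to retry the request on another replica. The sketch below assumes that replicas are interchangeable and that a failure shows up as a `ConnectionError`; `call_with_failover`, `crashed` and `healthy` are hypothetical names, and the replicas are plain local functions.

```python
def call_with_failover(replicas, request):
    """Try each replica in turn and return the first successful reply."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as error:
            # Hide this failure from the caller and try the next replica.
            last_error = error
    raise RuntimeError("all replicas failed") from last_error


def crashed(request):
    raise ConnectionError("node down")


def healthy(request):
    return f"handled {request}"


print(call_with_failover([crashed, healthy], "GET /index"))  # handled GET /index
```

The caller receives a normal reply and never sees that the first node was down. Only when every replica fails does the error become visible, which shows that failure transparency can only be approximated.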