Fall 98 - EECS 591 Handout #1
Lectures: weeks 1 - 3
Introduction, characteristics of distributed systems, design issues, hardware/software concepts, distributed programming models
Reading list: Tanenbaum Chapter 1, pp. 1-32.
Communication issues and network technologies
Reading list: Tanenbaum Chapter 2.1, pp. 34-42 (network protocol layers).
Handout: Chapter 4 - “The Road to Bandwidth Heaven” from C/S Survival Guide.
Slide 1
Handout: Chapter 13 - “Internetworking: Concepts, Architecture, and Protocols” from Computer
Networks and Internets by Doug Comer, 1996.
Handout: Chapter 14 - “IP: Internet Protocol Address” from Computer Networks and Internets by Doug
Comer, 1996.
Handout: Chapter 28 - “Internet Protocols” from Internetworking Technologies Handbook by Ford et al., Cisco Press, 1997.
Models of distributed computation: synchronous vs. asynchronous systems, failure models
Reading list: Mullender Chapter 5.1-5.2, pp. 97-102.
Distributed computations, global system states, consistent cuts and runs
Reading list: Mullender Chapter 4.
Distributed Systems
Three Technology Advances:
Development of powerful microprocessors
Development of high-speed networks
Slide 2
Development of denser and cheaper memory/storage
Easy: put together a large # of powerful processors connected by a high-speed network.
Hard: SOFTWARE! SOFTWARE! SOFTWARE!
Distributed Systems
What is a distributed system?
“You know you have one when the crash of a computer you’ve never heard of stops you from getting any work done.” (Leslie Lamport)
Slide 3
A collection of (perhaps) heterogeneous nodes, connected by one or more interconnection networks, that provides access to system-wide shared resources and services.
A collection of independent computers that appear to the users of the system as a single
computer. (Tanenbaum’s text)
Characteristics of a Distributed System:
- Multiple Computers:
More than one physical computer, each consisting of CPUs, local memory, possibly stable storage, and I/O paths connecting it to its environment.
- Interconnections:
I/O paths for communicating with other nodes via a network.
Slide 4
- Shared State:
If a subset of the nodes cooperate to provide a service, those nodes maintain a shared state, which is distributed or replicated among the participants.
Distributed vs. PCs/Centralized Systems
Why distribute?
- Resource sharing
- Device sharing
Slide 5
- Flexibility to spread load
- Incremental growth
- Cost/performance
- Reliability/Availability
- Inherent distribution
- Security?
Why NOT distribute?
- Software
- Network
- Security
- System management
Numerous sources of complexity, including:
- Transparent/uniform access to data or resources
Slide 6
- Independent failures of processors (or processes)
- Dynamic membership of the system
- Unreliable/insecure communication
- Overhead of message communication
- Distribution or replication of data (or metadata)
- Lack of clean common interfaces
Key Design Issues
Transparency
Flexibility
Dependability
Performance
Slide 7
Scalability
Transparency
How to achieve “single-system image”? How to hide distribution from users or programs?
Location transparency: users cannot tell where hardware and software resources are located.
Migration transparency: resources can move without changing their names.
Replication transparency: number of copies or their locations are hidden from users.
Slide 8
Concurrency transparency: multiple users can share resources without being aware of others.
Parallelism transparency: a task can be parallelized and run on multiple machines without the
user being aware of it.
Is transparency always a good idea?
trade off transparency for performance
Flexibility
The key question: how should a distributed system be structured?
Monolithic vs. Microkernel
Monolithic kernel: each machine runs a traditional OS kernel and provides most services via a system call interface to the kernel.
Slide 9
Microkernel: the kernel provides as little as possible, usually:
o IPC
o memory management mechanism (not policy)
o low-level process management and scheduling
o low-level I/O
Most other services are provided by user-level servers. MODULARITY!
Figure 1-14, Tanenbaum, p. 25
Potential problems with microkernels?
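To make the structure concrete, here is a minimal sketch (Python, purely illustrative; the file-server example, message format, and queue-based stand-in for kernel IPC are all invented for this example) of a service living in user space and reached only via message passing:

    # Illustrative sketch: a user-level "file server" reached only via IPC.
    # The kernel's only job here is message passing (played by Queue);
    # the service itself runs entirely in user space.
    from multiprocessing import Process, Queue

    def file_server(requests: Queue, replies: Queue) -> None:
        files = {"motd": "hello, distributed world"}   # server-private state
        while True:
            op, name = requests.get()                  # blocking receive
            if op == "read":
                replies.put(files.get(name, ""))       # send reply message
            elif op == "shutdown":
                break

    if __name__ == "__main__":
        req, rep = Queue(), Queue()                    # stand-in for IPC ports
        server = Process(target=file_server, args=(req, rep))
        server.start()
        req.put(("read", "motd"))     # the client's "system call" is a message
        print(rep.get())              # -> hello, distributed world
        req.put(("shutdown", None))
        server.join()

One answer to the question above is visible in the sketch: every service request now pays for at least two messages instead of a single trap into the kernel.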
Dependability
Reliability: measured by the probability R(t) that the system is up (and providing correct service) throughout the time interval [0, t], assuming it was operational at time t = 0.
Availability: measured by the probability A(t) that the system is operational at the instant of time t. As t → ∞, availability expresses the fraction of time that the system is usable.
Slide 10
Integrity: replicated copies of data must be kept consistent.
Security: protection from unauthorized usage/access. Why more difficult in distributed
systems?
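To make R(t) and A(t) concrete, a standard worked example (this assumes an exponential failure model with constant failure rate λ, which the handout itself does not fix; MTTF/MTTR are mean time to failure/repair):

    \[
      R(t) = e^{-\lambda t}, \qquad
      \mathrm{MTTF} = \int_0^{\infty} R(t)\,dt = \frac{1}{\lambda}, \qquad
      A = \lim_{t \to \infty} A(t) = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
    \]

For example, MTTF = 1000 h and MTTR = 1 h give A = 1000/1001 ≈ 0.999, i.e. roughly 9 hours of downtime per year.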
Performance
Various performance metrics:
o response time
o throughput
o system utilization
Slide 11
o network capacity utilization
Key issue in parallelizing computations in a distributed system?
overhead of message communication
Trade-off:
More tasks → more parallelism → better performance
More tasks → more communication → worse performance
Grain size affects the # of messages:
Slide 12
fine-grained parallelism: small computations, high interaction rate
coarse-grained parallelism: large computations, low interaction rate
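One simple cost model (an illustrative formulation, not from the handout) makes the trade-off explicit. With total work W on p processors, grain size g, and per-message overhead c, roughly W/g messages are exchanged:

    \[
      T(g) \;\approx\; \frac{W}{\min(p,\,W/g)} \;+\; \frac{W}{g}\,c
    \]

Shrinking g exposes enough tasks to keep all p processors busy, but the communication term Wc/g blows up; growing g saves messages, but eventually there are too few tasks for the processors.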
Scalability
The challenge is to build distributed systems that scale as the number of CPUs, users, and processes grows, databases get larger, etc.
A few simple principles: (Tanenbaum’s text)
o Avoid centralized components
o Avoid centralized tables
Slide 13
o Avoid centralized algorithms
A few lessons from AFS:
o “Clients have cycles to burn.”
o “Cache whenever possible.”
o “Exploit usage properties.”
o “Minimize system-wide knowledge/change.”
o “Batch if possible.”
o Multicast often works!
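As a toy illustration of “clients have cycles to burn” and “cache whenever possible” (a hypothetical client-side cache; the class name and TTL policy are invented for this sketch):

    # Toy client-side cache with a time-to-live, trading client memory and
    # cycles for fewer remote messages. Purely illustrative.
    import time

    class TTLCache:
        def __init__(self, ttl_seconds: float = 30.0):
            self.ttl = ttl_seconds
            self.entries = {}                        # key -> (value, expiry)

        def get(self, key, fetch_from_server):
            entry = self.entries.get(key)
            if entry is not None and entry[1] > time.monotonic():
                return entry[0]                      # hit: no message sent
            value = fetch_from_server(key)           # miss: one round trip
            self.entries[key] = (value, time.monotonic() + self.ttl)
            return value

    cache = TTLCache(ttl_seconds=60.0)
    fetch = lambda k: f"contents of {k}"             # stand-in for a remote read
    print(cache.get("/etc/motd", fetch))             # remote fetch
    print(cache.get("/etc/motd", fetch))             # served from the cache

The catch is the Integrity issue from Slide 10: a TTL bounds staleness but does not eliminate it.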
Distributed Programming Paradigms
Client/server model with distributed filesystems
Remote procedure calls
Group communication and multicasts
Slide 14
Distributed transactions
Distributed shared memory
Distributed objects
Java and the WWW
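As a small taste of the remote procedure call paradigm listed above, a minimal sketch using Python's standard xmlrpc module (the port and function name are arbitrary choices for this example):

    # Minimal RPC sketch: the call proxy.add(2, 3) looks like a local call
    # but is marshalled, shipped to the server, executed, and returned.
    import threading
    from xmlrpc.server import SimpleXMLRPCServer
    from xmlrpc.client import ServerProxy

    server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
    server.register_function(lambda a, b: a + b, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()

    proxy = ServerProxy("http://localhost:8000")
    print(proxy.add(2, 3))   # -> 5, computed on the "remote" server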
Taxonomy for Parallel & Distributed Systems
Flynn’s (1972) Classification of Multiple CPU Computer Systems:
(# of instruction streams and # of data streams)
SISD
Slide 15
SIMD
MISD
MIMD
MIMD machines can be classified further:
Multiprocessors: Single virtual address space is shared by all CPUs / more tightly coupled.
Multicomputers: Each machine has its own private memory / more loosely coupled.
Bus-Based Multiprocessors
All CPUs are connected to a memory module via a common bus.
Coherency must be ensured as concurrent reads/writes occur
Slide 16
Problem: the bus may become overloaded with only a few CPUs
Common solution: use a high-speed cache between each CPU and the bus
high hit rate → more CPUs supported → better performance
Cache coherency problem? snoopy write-through cache
32 to 64 CPUs on a bus are possible.
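A toy simulation of the snoopy write-through idea (an illustrative sketch only; real protocols live in hardware and handle many more cases):

    # Every write goes through to memory over the bus; all other caches
    # snoop the bus and invalidate their copy of the written address.
    class Cache:
        def __init__(self, memory):
            self.memory = memory
            self.lines = {}                   # address -> cached value

        def read(self, addr):
            if addr not in self.lines:        # miss: fetch over the bus
                self.lines[addr] = self.memory[addr]
            return self.lines[addr]

        def write(self, bus, addr, value):
            self.lines[addr] = value
            self.memory[addr] = value         # write-through to memory
            bus.broadcast(self, addr)         # other caches snoop this write

        def snoop(self, addr):
            self.lines.pop(addr, None)        # invalidate any stale copy

    class Bus:
        def __init__(self, caches):
            self.caches = caches

        def broadcast(self, writer, addr):
            for cache in self.caches:
                if cache is not writer:
                    cache.snoop(addr)

    memory = {0x10: 0}
    c0, c1 = Cache(memory), Cache(memory)
    bus = Bus([c0, c1])
    c1.read(0x10)              # c1 now holds a copy
    c0.write(bus, 0x10, 42)    # write-through + invalidation of c1's copy
    print(c1.read(0x10))       # -> 42, refetched after the invalidation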
Switched Multiprocessors
Put a single switch between CPUs and memory modules.
Various switch topologies (Figure 1-6, Tanenbaum, p. 12).
Crossbar switch: n CPUs, n memories, n² crosspoints.
Omega switch: log₂ n stages with n/2 switches in each stage.
Delay through multiple stages can be a problem: NUMA.
Slide 17
Each CPU has its own local memory and can access remote memory modules, but more slowly.
Tanenbaum’s comments:
Bus-based multiprocessors are limited to about 64 CPUs.
Large crossbar switches are expensive.
Large omega switches are potentially expensive and have long delays.
NUMA machines require complex software.
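To see the scale behind these comments, a worked comparison for n = 1024 (the usual textbook figure; exact constants vary by design):

    \[
      \text{crossbar: } n^2 = 1024^2 = 1{,}048{,}576 \text{ crosspoints}
      \qquad\text{vs.}\qquad
      \text{omega: } \frac{n}{2}\log_2 n = 512 \times 10 = 5120 \text{ switches}
    \]

The omega network is far cheaper, but every memory reference now traverses 10 switching stages each way, which is exactly the delay problem noted above.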
SO BUILDING A LARGE TIGHTLY-COUPLED MULTIPROCESSOR IS DIFFICULT AND
EXPENSIVE. BUT LARGE LOOSELY-COUPLED MULTICOMPUTERS ARE EASY TO BUILD.
Multicomputers
Bus-based Multicomputers:
Each CPU has its own private memory.
Connected via a comparatively slow bus.
Slide 18
10-100 Mbps LANs vs. backplane buses at 300 Mbps and up
Switched Multicomputers:
An interconnection network connects CPUs, each of which has its own local memory.
Various topologies, including grid and hypercube. (Figure 1-8, Tanenbaum)
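As a flavor of how such a topology is used, an illustrative e-cube (dimension-ordered) routing sketch for a hypercube (the function name and interface are invented here):

    # Nodes in a d-dimensional hypercube are d-bit labels; neighbors differ
    # in exactly one bit. Route by fixing differing bits lowest-first.
    def hypercube_route(src: int, dst: int, dim: int) -> list:
        path, node = [src], src
        for bit in range(dim):                # correct one dimension at a time
            if (node ^ dst) & (1 << bit):     # this bit still differs
                node ^= (1 << bit)            # hop across that dimension
                path.append(node)
        return path

    # In a 3-cube (8 nodes), route from node 000 to node 101:
    print(hypercube_route(0b000, 0b101, 3))   # -> [0, 1, 5], i.e. 000→001→101

Each route takes at most dim hops, which is why the hypercube's logarithmic diameter made it a popular switched-multicomputer topology.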