Distributed operating system
1. Distributed System
• A distributed system is a collection of independent
computers that appears to its users as a single
coherent system – Andrew Tanenbaum
• A distributed system is one that prevents you from
working because of the failure of a machine that you had
never heard of – Veríssimo & Rodrigues
• Weak definition: A distributed system is a collection of
independent computers that are used jointly to
perform a single task or to provide a single service.
2. Examples
• Collection of web servers: more precisely,
servers implementing the HTTP protocol
that jointly provide the distributed database
of hypertext and multimedia documents that
we know as the World-Wide Web
• DNS servers
• ATM networks, etc.
3. Why Distributed System? – Objectives of DS
1. Cost. Better price/performance as long as commodity hardware
is used for the component computers.
2. Reliability. By having redundant components the impact of
hardware and software faults on users can be reduced.
3. Inherent distribution. Some applications are naturally and physically distributed.
4. Transparency. The actual separation of the components can be
hidden from the end users, so that the system appears to them
as a single system.
5. Scalability. Resources such as processing and storage capacity
can be increased significantly.
6. Dependability. Systems can depend on one another
to solve a particular task jointly.
7. Performance. By using the combined processing and storage
capacity of many nodes, performance levels can be reached that
are beyond the range of centralized machines.
8. Flexibility. Nodes can easily be added or removed.
4. Transparency
• Transparency is the hiding of the separation of the
components of the distributed system from users and the
application programmer.
• Transparency means that any form of distributed
system should hide its distributed nature from its users,
appearing and functioning as a normal centralized system.
There are many types of transparency:
1. Access transparency – Regardless of how resource access and
representation has to be performed on each individual computing
entity, the users of a distributed system should always access resources
in a single, uniform way.
2. Location transparency – Users of a distributed system should not have to
be aware of where a resource is physically located.
3. Migration transparency – Users should not be aware of whether a
resource or computing entity possesses the ability to move to a different
physical or logical location.
5. 4. Relocation transparency – Should a resource move while in use, this
should not be noticeable to the end user.
5. Replication transparency – If a resource is replicated among several
locations, it should appear to the user as a single resource.
6. Concurrency transparency – While multiple users may compete for and
share a single resource, this should not be apparent to any of them.
7. Failure transparency – Always try to hide any failure and recovery of
computing entities and resources.
8. Persistence transparency – Whether a resource lies in volatile or
permanent memory should make no difference to the user.
6. Why Are There Challenges in DS?
• The objectives of a DS are very difficult to achieve because of
– the additional component (the network),
– security,
– and software complexity.
• We are also tempted to assume that
– The network is reliable
– The network is secure
– Everything is homogeneous
– The topology does not change
– Latency is zero
– Bandwidth is infinite
– Transport cost is zero
– There is one administrator
• These assumptions may hold in a non-distributed system, but they do not
hold in a distributed system. Hence there are challenges in
DS.
7. Challenges in DS
• Hence the different challenges that we repeatedly
face in DS are to achieve the following things.
– Transparency
– Scalability
– Dependability
– Performance
– Flexibility
8. Transparency as a challenge
• Complete transparency is not always desirable
due to the trade-offs with performance and
scalability, as well as the problems that can be
caused by confusing local and remote
operations.
• It is not always possible either, because of natural
limitations on how fast communication can take
place in a WAN.
9. Scalability as a challenge
• A wide-area distributed system is expected to grow considerably.
• According to Neuman, a system is said to be scalable if
it can handle the addition of users and resources without suffering a
noticeable loss of performance or increase in administrative complexity.
• But this does not always happen, because of problems in the following dimensions.
– Size – growth in the number of users and resources overloads a distributed
system, because it must process too many user requests
– Geography – an increase in geography (distance between the nodes) generally
results in greater communication delays and the potential for communication
failure.
– Administration – as a distributed system grows, its various components (users,
resources, nodes, networks, etc.) will start to cross administrative domains
10. Dependability as a challenge
• Although distributed systems provide the potential for higher availability
due to replication, the distributed nature of services means that more
components have to work properly for a single service to function.
• When the data are fragmented and distributed, the systems depend on
each other and must all work jointly for the service to be dependable.
• In this situation, dependability requires:
– Consistency
– Security
– Concurrency control and
– Fault tolerance
which are very difficult to achieve.
11. Performance as a challenge
• Any system should strive for maximum performance
• But in the case of distributed systems this is a
particularly interesting challenge, since it directly
conflicts with some other desirable properties. Striving for
– Transparency
– Security
– Dependability
– Scalability
can easily be detrimental to performance.
12. Flexibility as a Challenge
• A flexible distributed system can be configured to
provide exactly the services that a user or
programmer needs.
• But for that kind of flexibility a system should
have a number of key properties.
– Extensibility
– Openness
– Interoperability
These are very difficult to achieve because they
conflict with the other desirable properties as well
13. True Distributed System
• A distributed system in which all the challenges
(transparency, scalability, dependability,
performance, and flexibility) are solved to their
full extent is called a True Distributed System
(TDS).
• However, because of the problems faced during the
development of a distributed system (the need for a
new component (the network), security,
and software complexity), a TDS is
virtually impossible.
15. NOS: characteristics
• Network Operating System
– extension of centralized operating systems
– offers local services to remote clients
– each processor has its own operating system
– user owns a machine, but can access others (e.g.
rlogin, telnet)
– no global naming of resources
– system has little fault tolerance
– e.g. UNIX, Windows NT, Windows 2000
16. Distributed Operating System (1)
Tightly-coupled operating system for multi-processors and
homogeneous multi-computers. Strong transparency.
18. Distributed Operating System (3)
• A distributed operating system is the logical aggregation of operating
system software over a collection of independent, networked, communicating,
and physically separate computational nodes (a distributed system)
• Individual nodes each hold a specific software subset of the global aggregate
operating system.
• Each subset is a composite of two distinct service provisioners.
– Kernel: directly controls that node’s hardware.
– System management components: coordinate the node's individual
and collaborative activities. These components abstract microkernel
functions and support user applications.
• The collection of kernel and system management components work together
to make the distributed system appear as a single system image.
19. DOS: characteristics (1)
• Distributed Operating Systems
– allow the resources of a multiprocessor or multicomputer network
to be integrated as a single system image
– hide and manage hardware and software resources
– provide transparency support
– provide heterogeneity support
– control the network in the most effective way
– consist of low-level commands + local operating systems
+ distributed features
– provide inter-process communication (IPC)
20. DOS: characteristics (2)
• remote file and device access
• global addressing and naming
• trading and naming services
• synchronization and deadlock avoidance
• resource allocation and protection
• global resource sharing
• communication security
• no examples in general use, but many research systems:
Amoeba, Chorus, etc. (search for “distributed systems
research”)
23. Different Types of Client/Server Architectures
1. Client Application (One Tier)
2. Client/Server (2-Tier)
3. 3-Tier
4. N-Tier
24. Client Application (One Tier)
• The program runs on a single
machine.
• In most cases there are no
separate application layers.
• Database applications are on the
same machine.
• At any given moment, only one
user can use the program
25. Client/Server (2-Tier)
• The program runs on both machines.
• The two-tier architecture is a typical client-server application.
• Communication takes place directly between client and server;
there is no intermediary between them.
• The relationship between the computers is that of broker and customer.
• It is a message-based infrastructure.
• This architecture improves the performance and flexibility of the system
26. Client/Server (2-Tier) - Example
Example: an architecture that is needed to save employee details in a
database. The two tiers of the two-tier architecture are
• Client application (client tier)
• Database (data tier)
• This is one-to-one communication between client and server
• In the client application, the client writes the program for saving the record in SQL Server
and thereby saves the data in the database.
27. Advantages & Disadvantages of Client/Server
• Advantages
– The architecture is easy to understand and maintain
• Disadvantages
– The database may be too large to store on a single
system (server)
– Performance degrades as the number of
users grows.
28. 3-Tier Architecture
• Introduced to solve the problems of client/server
• There is an additional layer between client
and server that helps clients access
the database in an efficient way.
29. 3-Tier Architecture
• The three-tier architecture has three layers.
1. Client layer – Presentation Layer containing the UI (e.g. written in ASP.NET)
2. Business layer – Business Logic Layer containing the business logic. The UI
calls this layer instead of calling the data layer directly, for security reasons
3. Data layer – Data Access Layer, so that all calls to the database are abstracted
and no one can fire a query directly into the database
• Client layer: here we design a client application (say, a form using text boxes,
labels, etc.)
• Business layer: the intermediate layer, which provides functions to the client
layer and is used to make communication between the client layer and the data
layer faster. It provides the business process logic and the data access.
• Data layer: it holds the database. The data tier may be a collection of a large set of
database servers interconnected with each other in a distributed network
through a DOS
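The layering above can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database stands in for the data tier; the names `DataLayer`, `BusinessLayer`, `submit_form` and the `employee` table are hypothetical and do not come from the slides.

```python
import sqlite3


class DataLayer:
    """Data access layer: the only code that issues SQL."""

    def __init__(self):
        # Assumption: an in-memory database stands in for the data tier.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE employee (name TEXT, salary REAL)")

    def insert_employee(self, name, salary):
        self.conn.execute(
            "INSERT INTO employee (name, salary) VALUES (?, ?)", (name, salary)
        )
        self.conn.commit()

    def count_employees(self):
        return self.conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]


class BusinessLayer:
    """Business logic layer: validates requests before touching the data layer."""

    def __init__(self, data):
        self.data = data

    def save_employee(self, name, salary):
        if not name.strip():
            raise ValueError("name must not be empty")
        if salary < 0:
            raise ValueError("salary must not be negative")
        self.data.insert_employee(name.strip(), salary)


def submit_form(business, name, salary):
    """Client layer: turns user input into a call on the business layer."""
    try:
        business.save_employee(name, float(salary))
        return "saved"
    except ValueError as err:
        return f"rejected: {err}"


data = DataLayer()
business = BusinessLayer(data)
print(submit_form(business, "Asha", "1200"))
print(submit_form(business, "", "1200"))
```

The client layer never sees SQL, and the data layer never sees unvalidated input, which is the separation the three layers are meant to provide.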
33. Advantages/Disadvantages of 3-Tier Architecture
• Advantages
– Easy to modify data/application logic without
affecting other modules
– Fast communication
– Good performance
• Disadvantages
– Needs a database expert, e.g. a DBA
– Needs a concurrency control mechanism if data is not distributed
34. N-Tier Architecture
• N-Tier: an unlimited number of tiers.
• Each tier may have multiple computers.
• By adding more layers, such as a service layer, we can make an N-tier
architecture.
1. Client layer – Presentation Layer containing the UI (e.g. written in
ASP.NET)
2. Business layer – Business Logic Layer containing the business logic.
The UI calls this layer instead of calling the data layer directly, for
security reasons
3. Service layer – problem-specific logic or data access layer
4. Data layer – Data Access Layer, so that all calls to the database are
abstracted and no one can fire a query directly into the database
36. • In a database management system (DBMS),
a stored procedure is a set of Structured
Query Language (SQL) statements with an
assigned name that is stored in the database in
compiled form so that it can be shared by a
number of programs.
37. IPC - Introduction
• Inter-process communication (IPC) is communication between two
processes that reside in the same or different
systems.
• E.g. communication between client and server
– Here a client process communicates with
a server process for a specific purpose.
38. IPC by Shared Memory
• The different processes share a common
memory space.
• Both communicating processes operate on
the same space, so there is no need for explicit
message passing.
• This mostly happens in a centralized system
rather than in a distributed system.
39. IPC by Shared Memory
• However, the nodes of a distributed system with shared
memory (tightly coupled distributed systems)
may communicate with each other via shared
memory.
• Advantages
– No communication overhead
– Highly reliable
• Disadvantages (issues)
– Concurrency control
– Race conditions and mutual exclusion
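The mutual-exclusion issue can be illustrated with a small sketch. It assumes Python threads stand in for communicating processes (threads in one process share memory by default); `deposit`, `run` and the shared `counter` are hypothetical names chosen for the example.

```python
import threading

counter = 0                    # the shared memory location
lock = threading.Lock()        # enforces mutual exclusion on the counter


def deposit(times):
    """Increment the shared counter `times` times."""
    global counter
    for _ in range(times):
        # Only one thread at a time may run the read-modify-write below;
        # without the lock, concurrent updates could be lost (a race condition).
        with lock:
            counter += 1


def run(workers=4, times=10_000):
    """Start `workers` threads that all update the same counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=deposit, args=(times,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(run())
```

No message is ever sent between the workers; they cooperate purely by reading and writing the same location, which is why concurrency control becomes the main concern.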
40. IPC by RPC (remote procedure call)
• RPC is an interaction between a client and a
server
• The client invokes a procedure on the server
• The server executes the procedure and passes the
result back to the client
• The calling process is suspended (blocked) and
proceeds only after getting the result from the
server
42. Modern RPC
1. The client procedure calls the client stub in the normal way.
2. The client stub builds a message including the parameters, the name or
number of the procedure to be called, etc., and calls the local operating
system. The packaging of arguments into a network message is called
marshaling.
3. The client's OS sends the message to the remote OS via a system call to
the local kernel. To transfer the message, some protocol (either
connectionless or connection-oriented) is used.
4. The remote OS gives the message to the server stub.
5. The server stub unpacks the parameters and calls the server.
6. The server does the work and returns the result to the stub.
7. The server stub packs it in a message and calls its local OS.
8. The server's OS sends the message to the client's OS.
9. The client's OS gives the message to the client stub.
10. The stub unpacks the result and returns to the waiting client procedure.
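The ten steps can be sketched without a real network. In this illustration JSON is assumed as the marshaling format and a plain function call (`transport`) stands in for both operating systems and the wire; `PROCEDURES`, `client_stub`, `server_stub` and `add` are hypothetical names, not part of any RPC library.

```python
import json

# Procedures the server is willing to run (hypothetical example).
PROCEDURES = {"add": lambda a, b: a + b}


def server_stub(request_bytes):
    """Steps 5-7: unpack the parameters, call the server, pack the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def transport(request_bytes):
    """Steps 3-4 and 8-9: stands in for both operating systems and the network."""
    return server_stub(request_bytes)


def client_stub(proc, *args):
    """Steps 2 and 10: marshal the call, wait for the reply, unmarshal the result."""
    request_bytes = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply_bytes = transport(request_bytes)      # the caller blocks here
    return json.loads(reply_bytes.decode("utf-8"))["result"]


def add(a, b):
    """Step 1: to the caller this looks like an ordinary local procedure."""
    return client_stub("add", a, b)


print(add(2, 3))
```

Only bytes cross the `transport` boundary, so replacing it with a socket would not change the caller or the server procedure.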
43. Clock Synchronization
• In a distributed system the internal clocks of several
computers may differ from each other.
• Even when initially set accurately, real clocks will differ after
some time due to
– clock drift, caused by clocks counting time at
slightly different rates.
• In serial communication, some people use the term "clock
synchronization" merely to discuss frequency synchronization
and phase synchronization. Such "clock synchronization" is
used in synchronization in telecommunications and automatic
baud rate detection.
43Distributed Operating System(DOS)
44. Clock Synchronization
It is the process of setting all the
cooperating systems of a distributed
network to the same logical or
physical clock.
45. Synchronization in Distributed Systems
Logical and Physical Clocks
• Clock synchronization matters greatly for process
execution. But is it possible to synchronize all the clocks in a
distributed environment to a single clock?
• There are different concepts and implementations of
clock synchronization, using logical and physical clocks.
– logical clocks: provide consistent event ordering
– physical clocks: clocks whose values must not deviate from
the real time by more than a certain amount.
46. Logical and Physical Clocks
• Logical Clocks
– For many applications:
• it is sufficient that all machines agree on the same time.
• it is not essential that this time also agrees with the real time.
– E.g. in the make example, it is adequate that all machines agree that it is 10:00
even if it is really 10:02.
– Meaning: it is the internal consistency of the clocks that matters, not
whether they are particularly close to the real time.
– For these algorithms it is conventional to speak of the clocks as logical
clocks.
• Physical Clocks
– When the additional constraint is present that the clocks
• must not only be the same,
• but also must not deviate from the real time by more than a certain
amount,
then the clocks are called physical clocks.
47. The Berkeley algorithm (physical clock
synchronization)
• The Berkeley algorithm is a physical clock synchronization algorithm based on
a centralized time server.
• This algorithm is more suitable for systems where a radio clock is not present.
• Such a system has no way of making sure of the actual time other than by
maintaining a global average time as the global time.
• A time server periodically fetches the time from all the time clients, averages the
results, and then reports back to the clients the adjustment that needs to be made to
their local clocks to achieve the average.
• This algorithm highlights the fact that internal clocks may vary not only in the time
they contain but also in the clock rate.
• Often, any client whose clock differs by a value outside of a given tolerance is
disregarded when averaging the results. This prevents the overall system time
from being drastically skewed due to one erroneous clock.
48. The Berkeley algorithm
• It is an averaging algorithm
– The time daemon asks all the other machines for their clock values.
– The machines answer.
– The time daemon tells everyone how to adjust their clocks.
50. The Berkeley algorithm
• The server process in the Berkeley algorithm, called the master,
periodically polls the other (slave) processes to synchronize the physical
clocks. Generally speaking, the algorithm is:
1. A master is chosen via an election process such as the Chang and
Roberts algorithm.
2. The master polls the slaves, who reply with their time.
3. The master observes the round-trip time (RTT) of the messages and
estimates the time of each slave and its own.
4. The master then averages the clock times, ignoring any values it
receives that are far outside the values of the others.
5. Instead of sending the updated current time back to the other
processes, the master then sends out the amount (positive or
negative) by which each slave must adjust its clock. This avoids further
uncertainty due to RTT at the slave processes.
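Steps 4 and 5 can be sketched as follows. The sketch assumes the clock readings have already been corrected for round-trip delay (step 3), and it uses one possible outlier rule, distance from the median, since the slides do not specify one; the function name `berkeley_adjustments` and the `tolerance` parameter are hypothetical.

```python
from statistics import median


def berkeley_adjustments(master_time, slave_times, tolerance):
    """Return (master_adjustment, {slave: adjustment}) in the clocks' own units.

    slave_times maps a slave name to its estimated clock value; the values are
    assumed to be already corrected for round-trip delay.
    """
    readings = {"master": master_time, **slave_times}
    mid = median(readings.values())
    # Assumed outlier rule: ignore readings further than `tolerance` from the median.
    trusted = [t for t in readings.values() if abs(t - mid) <= tolerance]
    average = sum(trusted) / len(trusted)
    # Each machine receives the offset it must apply, not the absolute time.
    adjustments = {name: average - t for name, t in readings.items()}
    master_adjustment = adjustments.pop("master")
    return master_adjustment, adjustments


# Clock values in minutes relative to 3:00 (master 3:00, slaves 3:25 and 2:50).
master_adj, slave_adj = berkeley_adjustments(0, {"A": 25, "B": -10}, tolerance=60)
print(master_adj, slave_adj)
```

With these readings the average is 3:05, so the master moves forward 5 minutes, A moves back 20 minutes, and B moves forward 15 minutes.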
52. Lamport’s Algorithm
• Lamport invented a simple mechanism by which the happened-before
ordering can be captured numerically. A Lamport logical
clock is a monotonically incrementing software counter maintained in each
process.
• The algorithm is as follows:
1. A process increments its counter before each event in that process;
2. When a process sends a message, it includes its counter value with
the message;
3. On receiving a message, the receiver process sets its counter to be
greater than the maximum of its own value and the received value
before it considers the message received.
• Conceptually, this logical clock can be thought of as a clock that only
has meaning in relation to messages moving between processes.
When a process receives a message, it resynchronizes its logical
clock with that of the sender.
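The three rules map directly onto a small class. This is a minimal sketch; the class name `LamportClock` and its method names are hypothetical, and a send is treated as an event in its own right, which is one common reading of rules 1 and 2.

```python
class LamportClock:
    """A per-process logical clock following the three rules above."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Rule 1: increment the counter before each local event."""
        self.time += 1
        return self.time

    def send(self):
        """Rule 2: sending is an event; the timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """Rule 3: move past both the local value and the received value."""
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()              # local event on p, p.time == 1
stamp = p.send()      # p.time == 2, the message carries 2
q.receive(stamp)      # q.time == max(0, 2) + 1 == 3
print(p.time, q.time)
```

The receive rule guarantees that the timestamp of a receive event is larger than that of the matching send, which is what makes the counters consistent with the happened-before ordering.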
53. Example: Lamport’s Algorithm
• Three processes, each with its own clock.
The clocks run at different rates.
• Lamport’s Algorithm corrects the clocks.
Note: ts(A) < ts(B) does not imply A happened before B.
54. Name Service
• A name service stores a collection of one
or more naming contexts – sets of
bindings between textual names and
attributes for objects.
• Provides a general naming scheme for
entities (such as users and services) that
are beyond the scope of a single service.
• Major operation: resolve a name - to look
up attributes from a given name
• Other operations required: creating new
bindings, deleting bindings, listing bound
names, and adding and deleting contexts.
55. General Name Service Requirements
• Handle an arbitrary number of names
and serve an arbitrary number of
administrative organizations
• A long lifetime
• High availability
• Fault isolation
• Tolerance of mistrust
57. Name Spaces
• A name space is a collection of all valid names
recognized by a particular service
• Allow simple but meaningful names to be used
• Potentially infinite number of names
• Structured
– to allow similar subnames without clashes
– to group related names
• Allow re-structuring of name trees
– for some types of change, old programs should
continue to work
• Management of trust
59. Name Resolution
• Resolution is an iterative process whereby a
name is repeatedly presented to naming
contexts.
• The name is first presented to some initial
naming context; resolution iterates as long as
further contexts and derived names are output.
• Example 1: /etc/passwd, in which ‘etc’ is
presented to context / and ‘passwd’ is
presented to context /etc.
• Example 2: www.dcs.qmw.ac.uk, in which the
alias is resolved to another domain name
such as copper.dcs.qmw.ac.uk, which is
further resolved to produce an IP address.
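Example 1 can be sketched with nested dictionaries acting as naming contexts. The structure `ROOT`, the function `resolve`, and the attribute strings (the "inode" values) are hypothetical and serve only to show the iteration.

```python
# Each naming context is a dict of bindings: a name maps either to another
# context (a dict) or to an attribute value (here, a plain string).
ROOT = {
    "etc": {"passwd": "inode 4711", "hosts": "inode 4712"},
    "home": {"alice": {"notes.txt": "inode 9001"}},
}


def resolve(path, context=ROOT):
    """Present one name component at a time to the current context."""
    for component in path.strip("/").split("/"):
        if not isinstance(context, dict) or component not in context:
            raise KeyError(f"cannot resolve {component!r} in {path!r}")
        context = context[component]   # the output becomes the next context
    return context


print(resolve("/etc/passwd"))
```

Here 'etc' is looked up in the root context and 'passwd' in the context that lookup returns, mirroring the two presentations described in Example 1.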
60. Name Servers and Navigation
• Any name service stores a very large database.
• Data is partitioned across servers according to its
domain.
• Partitioning of the data implies that the local
name server cannot answer all the enquiries
without the help of other name servers.
• The process of locating naming data from among
more than one name server in order to resolve a
name is called navigation. Example: the iterative
navigation model (DNS)
62. Distributed File Systems
• A file system provides an abstract view of secondary storage and
is responsible for global naming, file access, and overall file
organization. These functions are handled by the name service,
the file service, and the directory service.
• The file service is the specification of what the file system offers to
its clients.
• A file server is a process that runs on some machine and helps
implement the file service.
63. Naming of Distributed Files
• Naming – mapping between logical and physical objects.
• A transparent DFS hides where in the network
the file is stored.
• Location transparency – file name does not reveal the
file’s physical storage location.
– File name denotes a specific, hidden, set of physical disk blocks.
– Could expose correspondence between component units and
machines.
• Location independence – file name does not need to be
changed when the file’s physical storage location
changes.
– Better file abstraction.
– Promotes sharing the storage space itself.
– Separates the naming hierarchy from the storage-devices
hierarchy.
64. File service implementation
• File service implementations may be based on
remote access or remote copy and may be
stateful or stateless.
66. Remote copy model
(Figure: client and server; the old file on the server is replaced by the new file)
1. File moved to client
2. Accesses are done on client
3. When client is done, file is returned to server
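The three steps can be sketched with an in-process stand-in for the server. The class `FileServer`, the function `edit_remotely`, and the file name `report.txt` are hypothetical; a real implementation would move the file over the network.

```python
class FileServer:
    """Holds whole files and only supports moving them in and out."""

    def __init__(self):
        self.files = {"report.txt": "draft"}   # hypothetical server contents

    def download(self, name):
        return self.files[name]                # step 1: whole file moves to the client

    def upload(self, name, data):
        self.files[name] = data                # step 3: the new file replaces the old one


def edit_remotely(server, name, suffix):
    """Client side: fetch the file, work on the local copy, send it back."""
    local_copy = server.download(name)
    local_copy += suffix                       # step 2: all accesses are local
    server.upload(name, local_copy)
    return local_copy


server = FileServer()
print(edit_remotely(server, "report.txt", " v2"))
```

The server is involved only at the start and at the end; everything in between happens on the client's copy.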
67. Remote copy model(2)
• A stateful server maintains information about all
clients that are using the server to access a file.
• A stateless server maintains no client
information.
– Each and every request from a client must include
very specific request information, such as file
name, operation, and exact position in the file.
– The client maintains the state information.
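A stateless read can be sketched as a pure function of its request. In this illustration `FILES`, `stateless_read` and `Client` are hypothetical names; the point is that the file position lives in the client object and is sent with every request.

```python
FILES = {"log.txt": b"alpha beta gamma"}   # hypothetical file store on the server


def stateless_read(name, offset, length):
    """Server side: every request names the file, the position, and the size."""
    return FILES[name][offset:offset + length]


class Client:
    """The client, not the server, remembers the current file position."""

    def __init__(self, name):
        self.name = name
        self.offset = 0

    def read(self, length):
        data = stateless_read(self.name, self.offset, length)
        self.offset += len(data)
        return data


c = Client("log.txt")
print(c.read(5), c.read(5))
```

Because the server keeps nothing between calls, it can restart between two reads without the client noticing, at the price of larger requests.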
68. System (Processor) Models – A
Need for Distributed Computing
• Workstation Model
• Processor Pool Model
• Hybrid Model
69. Workstation Model
• Process migration
– A user first logs on to his/her personal workstation.
– If there are idle remote workstations, a heavy job may migrate to
one of them.
• Problems:
– How to find an idle workstation
– How to migrate a job
– What if a user logs on to the remote machine
(Figure: workstations connected by a 100 Mbps LAN)
71. Processor-Pool Model
• Clients:
– They log in at one of the terminals (diskless
workstations or X terminals)
– All services are dispatched to
servers.
• Servers:
– The necessary number of processors is
allocated to each user from the pool.
• Better utilization but less interactivity
(Figure: Server 1 to Server N connected by a 100 Mbps LAN)