Abstract
Distributed computing allows groups to accomplish work that was previously not feasible
with supercomputers, due to cost or time constraints. Although the primary function of
distributed computing systems is to produce the processing power needed to complete complex
computations, distributed computing also reaches outside of the processing arena into other
areas such as network usage. When used properly, the two areas complement each other and can
produce the needed results. When used maliciously, either processing-based or network-based
distributed attacks can produce a brute force that even the best firewalls or encryption are
powerless to prevent. Distributed computing should be tamed and closely guarded against such
uses, by filtering out the invalid network packets of distributed attacks and by carefully
monitoring computer software so that distributed brute-force processing attacks cannot occur.
Table of Contents
TABLE OF CONTENTS
CHAPTER 1 INTRODUCTION
1.1 A BRIEF HISTORY OF DISTRIBUTED COMPUTING
1.2 DISTRIBUTED COMPUTING: AN OVERVIEW
CHAPTER 2 ABOUT DISTRIBUTED COMPUTING
2.1 MODELS
2.2 COMPLEXITY MEASURES
2.3 WORKING OF DISTRIBUTED COMPUTING
2.3.1 ARCHITECTURE
2.4 ALGORITHMS
CHAPTER 3 CHARACTERISTICS OF DISTRIBUTED COMPUTING
3.1 FALLACIES OF DISTRIBUTED COMPUTING
3.2 PROBLEMS RELATED TO DISTRIBUTED COMPUTING
3.3 DISTRIBUTED COMPUTING APPLICATION CHARACTERISTICS
CHAPTER 4 APPLICATIONS
4.1 TYPES OF DISTRIBUTED COMPUTING APPLICATIONS
4.2 SECURITY AND STANDARD CHALLENGES
CONCLUSION
REFERENCES
Chapter 1 INTRODUCTION
Distributed computing is a field of computer science that studies
distributed systems. A distributed system consists of multiple
autonomous computers that communicate through a computer network.
The computers interact with each other in order to achieve a common
goal.
1.1 A Brief History of Distributed Computing
During the earliest years of computing, any tasks that required large
computations and massive processing were generally left up to the
government or a handful of large companies. These entities could afford
to buy massive supercomputers and the infrastructure needed to support
them. With the price of personal computers declining rapidly, and
supercomputers still very expensive, an alternative was needed. In
1993, Donald Becker and Thomas Sterling introduced Beowulf
clustering. Although not the first example of clustering, this was the first
time that an effort was made to enable anyone to take off the shelf
computers and build a cluster of computers that could rival top
supercomputers. The concept behind clustering, in its simplest form, is
that many smaller computers can be combined in a way to make a
computing structure that could provide all of the processing power
needed, for much less money. All of the nodes of a cluster are connected
to an isolated internal network and the same switch as the serving computer
(SC). The serving computer houses the results and distributes new work
units to all of the attached nodes. Each node is a single-use computer,
allowed to only process the problem that it is given and return the results
when finished. Many of the problems that hindered the first clustering
efforts still cause problems today. One of the more expensive elements is
the dedicated internal network, or interconnects, which link all of the
nodes together to the server. Since these nodes are simple banks of
processors, security is almost entirely nonexistent, and therefore great
care is required in isolating the interconnect network from any outside
networking. An additional element contributing to problems is that of
suitable software for the clusters to run. Even though all of the nodes
work together to process chunks of a complete dataset, the software must
still be written to take advantage of the multiples of processors and the
individual resources, such as memory, that the nodes contain. To obtain a
stable and suitable software layer, it may take months or years to perfect
the software to properly process the needed results and use all of the
available power. In the end, while clustering was a major step forward for
computing power, it certainly has its problems and insecurities as a
relatively new technology. Distributed computing encompasses a much
wider scope than clustering by allowing nodes to exist anywhere in the
world and to be multipurpose, multi-function machines.
Distributed computing follows a concept similar to clustering: take a large
problem, break it into smaller units, and allow many nodes to work on
the problem in parallel. Where distributed computing strays from this
concept is by also allowing the nodes to be multifunction and
multipurpose computers that can exist anywhere in the world while being
attached to the Internet. Distributed computing can actually take on many
different orientations of the nodes, all depending on how the client
computers are connected to the Internet. In addition to this element of
flexibility, there is also a level of redundancy that does not exist in
supercomputing or clustering. With clustering and supercomputing, the
data is generally processed only once, due to the large amounts of time
that the entire project may take. In distributed computing, it is often the
case that work units may be distributed multiple times to multiple nodes.
This method serves two functions: to drastically decrease the possibilities
of processing errors, and to account for processing that is done on slower
CPUs or takes too long to return results. What makes this entire system
possible is the application of a small piece of software called a client.
This client handles the data retrieval and submission stages as well as the
code necessary to instruct the CPU how to process the work unit. Clients
vary in size, but most are less than 1-2 megabytes in size. The actual data
work units also vary in size, but most are between 250 and 400 kilobytes,
so that the hosting/node CPU can handle the process, and users on slower
internet connections can easily send and receive the data. Given the
small sizes of work units and clients, it is hard to see any
disadvantage to using distributed computing other than in the collection
and analysis of data.
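As a rough sanity check on the sizes quoted above, a quick back-of-the-envelope sketch; the link speeds used here are illustrative assumptions, not from the text:

```python
# Rough transfer-time arithmetic for the work-unit sizes quoted above.
def transfer_seconds(size_kb, link_kbit_per_s):
    """Seconds to move size_kb kilobytes over a link of the given speed."""
    return size_kb * 8 / link_kbit_per_s

# A 400 KB work unit is manageable even on a dial-up modem,
# and trivial on a broadband link.
print(round(transfer_seconds(400, 56), 1))      # ~57 s at 56 kbit/s
print(round(transfer_seconds(400, 10_000), 2))  # 0.32 s at 10 Mbit/s
```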
1.2 Distributed Computing: An Overview
Many of you are quite familiar with the power of computers. They can
record music, balance checkbooks, play games with you, process and
print your words, and open a whole encyclopaedia before your eyes. It's
not a sweeping statement to say that computers have made our lives
different. It's hard to envision a life before the age of computers. If a
single computer can change your life, why not connect several of them?
Over the past two decades, networks of computers have even further
changed the way businesses operate and the way the government
functions. From digital library card catalogs to educational Internet
sites, networks have been another great technological
advancement. Distributed computing is the next step in computer
progress, where computers are not only networked, but also smartly
distribute their workload across each computer so that they stay busy and
don't squander the electrical energy they feed on. This setup rivals even
the fastest commercial supercomputers built by companies like IBM or
Cray. When you combine the concept of distributed computing with the
tens of millions of computers connected to the Internet, you've got the
fastest computer on Earth.
Chapter 2 ABOUT DISTRIBUTED COMPUTING
Many tasks that we would like to automate by using a computer are of a
question-answer type; i.e., we would like to ask a question and have the
computer produce an answer. In theoretical computer science,
such tasks are called computational problems. Formally, a computational
problem consists of instances together with a solution for each instance.
Instances are questions that we can ask, and solutions are desired answers
to these questions.
Theoretical computer science seeks to understand which computational
problems can be solved by using a computer (computability theory) and
how efficiently (computational complexity theory). Traditionally, it is
said that a problem can be solved by using a computer if we can design
an algorithm that produces a correct solution for any given instance. Such
an algorithm can be implemented as a computer program that runs on a
general-purpose computer: the program reads a problem instance from
input, performs some computation, and produces the solution as output.
The field of concurrent and distributed computing studies similar
questions in the case of either multiple computers, or a computer that
executes a network of interacting processes.
2.1 Models
The discussion below focuses on the case of multiple computers,
although many of the issues are the same for concurrent processes
running on a single computer. Three viewpoints are commonly used:
Parallel algorithms in shared-memory model:
• All computers have access to a shared memory. The algorithm
designer chooses the program executed by each computer.
• One theoretical model is the parallel random access machine
(PRAM). However, the classical PRAM model
assumes synchronous access to the shared memory.
• A model that is closer to the behaviour of real-world
multiprocessor machines and takes into account the use of
machine instructions, such as Compare-and-swap (CAS), is that
of asynchronous shared memory. There is a wide body of work
on this model, a summary of which can be found in the
literature.
Parallel algorithms in message-passing model:
• The algorithm designer chooses the structure of the network, as
well as the program executed by each computer.
• Models such as Boolean circuits and sorting networks are used.
A Boolean circuit can be seen as a computer network: each gate
is a computer that runs an extremely simple computer program.
Similarly, a sorting network can be seen as a computer network:
each comparator is a computer.
Distributed algorithms in message-passing model:
• The algorithm designer only chooses the computer program. All
computers run the same program. The system must work
correctly regardless of the structure of the network.
• A commonly used model is a graph with one finite-state machine
per node.
In the case of distributed algorithms, computational problems are
typically related to graphs. Often the graph that describes the structure of
the computer network is the problem instance. This is illustrated in the
following example.
An example:
Consider the computational problem of finding a colouring of a given
graph G. Different fields might take the following approaches:
Centralized algorithms
• The graph G is encoded as a string, and the string is given as
input to a computer. The computer program finds a colouring of
the graph, encodes the colouring as a string, and outputs the
result.
Parallel algorithms
• Again, the graph G is encoded as a string. However, multiple
computers can access the same string in parallel. Each computer
might focus on one part of the graph and produce a colouring for
that part.
• The main focus is on high-performance computation that
exploits the processing power of multiple computers in parallel.
Distributed algorithms
• The graph G is the structure of the computer network. There is
one computer for each node of G and one communication link
for each edge of G. Initially, each computer only knows about its
immediate neighbours in the graph G; the computers must
exchange messages with each other to discover more about the
structure of G. Each computer must produce its own colour as
output.
• The main focus is on coordinating the operation of an arbitrary
distributed system.
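To make the centralized approach above concrete, here is a minimal greedy colouring sketch; the dictionary encoding of G and the example graph are illustrative choices:

```python
def greedy_colouring(adj):
    """Give each node the smallest colour not already used by a coloured
    neighbour; a valid (not necessarily optimal) colouring results."""
    colour = {}
    for v in adj:
        used = {colour[u] for u in adj[v] if u in colour}
        c = 0
        while c in used:
            c += 1
        colour[v] = c
    return colour

# A 4-cycle is 2-colourable, and greedy finds such a colouring here.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(greedy_colouring(cycle))   # {0: 0, 1: 1, 2: 0, 3: 1}
```

In the distributed setting, by contrast, each node would have to pick its colour based only on messages exchanged with its neighbours.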
While the field of parallel algorithms has a different focus than the field
of distributed algorithms, there is a lot of interaction between the two
fields. For example, the Cole–Vishkin algorithm for graph colouring was
originally presented as a parallel algorithm, but the same technique can
also be used directly as a distributed algorithm. Moreover, a parallel
algorithm can be implemented either in a parallel system (using shared
memory) or in a distributed system (using message passing). The
traditional boundary between parallel and distributed algorithms (choose
a suitable network vs. run in any given network) does not lie in the same
place as the boundary between parallel and distributed systems (shared
memory vs. message passing).
2.2 Complexity Measures
In parallel algorithms, yet another resource in addition to time and space
is the number of computers. Indeed, often there is a trade-off between the
running time and the number of computers: the problem can be solved
faster if there are more computers running in parallel. If a decision
problem can be solved in polylogarithmic time by using a polynomial
number of processors, then the problem is said to be in the class NC. The
class NC can be defined equally well by using the PRAM formalism or
Boolean circuits – PRAM machines can simulate Boolean circuits
efficiently and vice versa.
In the analysis of distributed algorithms, more attention is usually paid
to communication operations than to computational steps. Perhaps the
simplest model of distributed computing is a synchronous system where
all nodes operate in a lockstep fashion. During each communication
round, all nodes in parallel (1) receive the latest messages from their
neighbours, (2) perform arbitrary local computation, and (3) send new
messages to their neighbours. In such systems, a central complexity
measure is the number of synchronous communication rounds required to
complete the task.
This complexity measure is closely related to the diameter of the
network. Let D be the diameter of the network. On the one hand, any
computable problem can be solved trivially in a synchronous distributed
system in approximately 2D communication rounds: simply gather all
information in one location (D rounds), solve the problem, and inform
each node about the solution (D rounds).
On the other hand, if the running time of the algorithm is much smaller
than D communication rounds, then the nodes in the network must
produce their output without having the possibility to obtain information
about distant parts of the network. In other words, the nodes must make
globally consistent decisions based on information that is available in
their local neighbourhood. Many distributed algorithms are known with
the running time much smaller than D rounds, and understanding which
problems can be solved by such algorithms is one of the central research
questions of the field. Another commonly used measure is the total
number of bits transmitted in the network.
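The trivial gather-solve-broadcast strategy described earlier can be sketched as a round-by-round simulation; flooding of node ids stands in for "gathering all information", and the graph encoding is an illustrative assumption:

```python
def gather_rounds(adj):
    """Flood each node's knowledge one hop per synchronous round until
    every node knows every node id; returns the rounds used (= diameter)."""
    knowledge = {v: {v} for v in adj}
    rounds = 0
    while any(len(k) < len(adj) for k in knowledge.values()):
        snapshot = {v: set(k) for v, k in knowledge.items()}
        for v in adj:                      # one communication round:
            for u in adj[v]:               # receive from each neighbour
                knowledge[v] |= snapshot[u]
        rounds += 1
    return rounds

# A path 1-2-3-4 has diameter D = 3: gathering takes D rounds, and
# broadcasting the computed answer back out would take another D.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(gather_rounds(path))   # 3
```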
2.3 Working of Distributed Computing
In most cases today, a distributed computing architecture consists of very
lightweight software agents installed on a number of client systems, and
one or more dedicated distributed computing management servers. There
may also be requesting clients with software that allows them to submit
jobs along with lists of their required resources.
An agent running on a processing client detects when the system is idle,
notifies the management server that the system is available for
processing, and usually requests an application package. The client then
receives an application package from the server and runs the software
when it has spare CPU cycles, and sends the results back to the server.
The application may run as a screen saver, or simply in the background,
without impacting normal use of the computer. If the user of the client
system needs to run his own applications at any time, control is
immediately returned, and processing of the distributed application
package ends. This must be essentially instantaneous, as any delay in
returning control will probably be unacceptable to the user.
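The agent behaviour just described can be sketched as a simple loop. Everything server-side here (fetch_work_unit, the idle check, the summing "computation") is a hypothetical stand-in, since the text does not specify a protocol:

```python
import time

def fetch_work_unit():
    """Pretend download of a work unit from the management server."""
    return {"id": 1, "numbers": [3, 1, 4, 1, 5]}

def system_is_idle():
    """Stand-in for real idle detection on the client machine."""
    return True

def client_loop(max_units=1):
    """Process work units only while idle; yield to the user otherwise."""
    results = []
    while len(results) < max_units:
        if not system_is_idle():
            time.sleep(1)                   # return control immediately
            continue
        unit = fetch_work_unit()
        answer = sum(unit["numbers"])       # the work unit's "computation"
        results.append((unit["id"], answer))  # would be sent back upstream
    return results

print(client_loop())   # [(1, 14)]
```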
2.3.1 Architecture
Various hardware and software architectures are used for distributed
computing. At a lower level, it is necessary to interconnect multiple
CPUs with some sort of network, regardless of whether that network is
printed onto a circuit board or made up of loosely-coupled devices and
cables. At a higher level, it is necessary to interconnect processes running
on those CPUs with some sort of communication system.
Distributed programming typically falls into one of several basic
architectures or categories: client–server, 3-tier architecture, n-tier
architecture, distributed objects, loose coupling, or tight coupling.
• Client–server: Smart client code contacts the server for data
then formats and displays it to the user. Input at the client is
committed back to the server when it represents a permanent
change.
• 3-tier architecture: Three tier systems move the client
intelligence to a middle tier so that stateless clients can be used.
This simplifies application deployment. Most web applications
are 3-Tier.
• N-tier architecture: n-tier refers typically to web applications
which further forward their requests to other enterprise services.
This type of application is the one most responsible for the
success of application servers.
• Tightly coupled (clustered): refers typically to a cluster of
machines that closely work together, running a shared process
in parallel. The task is subdivided into parts that are processed
individually by each machine and then combined to produce the
final result.
• Peer-to-peer: an architecture where there is no special machine
or machines that provide a service or manage the network
resources. Instead all responsibilities are uniformly divided
among all machines, known as peers. Peers can serve both as
clients and servers.
• Space based: refers to an infrastructure that creates the illusion
(virtualization) of one single address-space. Data are
transparently replicated according to application needs.
Decoupling in time, space and reference is achieved.
Another basic aspect of distributed computing architecture is the
method of communicating and coordinating work among
concurrent processes. Through various message passing protocols,
processes may communicate directly with one another, typically in
a master/slave relationship. Alternatively, a "database-centric"
architecture can enable distributed computing to be done without
any form of direct inter-process communication, by utilizing a
shared database.
Client-Server architecture:
Advantages:
• In most cases, client–server architecture enables the roles
and responsibilities of a computing system to be distributed
among several independent computers that are known to
each other only through a network. This creates an
additional advantage to this architecture: greater ease of
maintenance. For example, it is possible to replace, repair,
upgrade, or even relocate a server while its clients remain
both unaware and unaffected by that change.
• All data is stored on the servers, which generally have far
greater security controls than most clients. Servers can
better control access and resources, to guarantee that only
those clients with the appropriate permissions may access
and change data.
• Since data storage is centralized, updates to that data are far
easier to administer in comparison to a P2P paradigm. In the
latter, data updates may need to be distributed and applied to
each peer in the network, which is both time-consuming and
error-prone, as there can be thousands or even millions of
peers.
• Many mature client–server technologies are already
available which were designed to ensure security,
friendliness of the user interface, and ease of use.
• It functions with multiple clients of different capabilities.
Disadvantages:
• As the number of simultaneous client requests to a given server
increases, the server can become overloaded. Contrast that to a
P2P network, where its aggregated bandwidth actually increases
as nodes are added, since the P2P network's overall bandwidth
can be roughly computed as the sum of the bandwidths of every
node in that network.
• The client–server paradigm lacks the robustness of a good P2P
network. Under client server, should a critical server fail, clients’
requests cannot be fulfilled. In P2P networks, resources are
usually distributed among many nodes. Even if one or more
nodes depart and abandon a downloading file, for example, the
remaining nodes should still have the data needed to complete the
download.
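The bandwidth contrast drawn above is just a sum; a tiny illustration with made-up link speeds:

```python
# P2P aggregate bandwidth grows with every node that joins,
# while a single server is capped at its own uplink.
peer_uplinks_mbps = [10, 25, 5, 100, 50]   # illustrative per-peer uplinks
server_uplink_mbps = 100                   # one server's fixed uplink

p2p_aggregate = sum(peer_uplinks_mbps)
print(p2p_aggregate)        # 190 Mbit/s, and rising as peers join
print(server_uplink_mbps)   # 100 Mbit/s, a fixed ceiling
```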
Fig 2: Client-Server Architecture
2.4 Algorithms
Various algorithms are used for distributed systems. Some of them are as
follows:
1. Bully Algorithm
2. Byzantine Fault Tolerance
3. Algorithms based on Clock synchronisation
4. Lamport Ordering
5. Algorithms based on Mutual exclusion
6. Snapshot algorithm
7. Algorithms based on Detection of process termination
8. Vector clocks
In this paper, we concentrate on the Bully election algorithm.
The bully algorithm is a method in distributed computing for
dynamically selecting a coordinator by process ID number.
The Bully algorithm was devised by Garcia-Molina in 1982. When a
process notices that the coordinator is no longer responding to requests, it
initiates an election. A process P holds an election as follows:
1) P sends an ELECTION message to all processes with higher numbers.
2) If no one responds, P wins the election and becomes coordinator.
3) If one of the higher-ups answers, it takes over. P's job is done.
At any moment, a process can get an ELECTION message from one of
its lower-numbered colleagues. When such a message arrives, the
receiver sends an OK message back to the sender to indicate that it is
alive and will take over. The receiver then holds an election, unless it is
already holding one. Eventually, all processes give up but one and that
one is the new coordinator. It announces its victory by sending all
processes a message telling them that, starting immediately, it is the new
coordinator. If a process that was previously down comes back up, it
holds an election. If it happens to be the highest-numbered process
currently running, it will win the election and will take over the
coordinator's job. Thus the biggest guy in town always wins, hence the
name "Bully Algorithm".
Algorithm:
Selects the process with the largest identifier as the coordinator:
1. When a process p detects that the coordinator is not responding to
requests, it initiates an election:
a. p sends an election message to all processes with higher
numbers
b. If nobody responds, then p wins and takes over.
c. If one of the processes answers, then p's job is done.
2. If a process receives an election message from a lower-numbered
process at any time, it:
a. Sends an OK message back.
b. Holds an election (unless it's already holding one).
3. A process announces its victory by sending all processes a
message telling them that it is the new coordinator.
4. If a process that has been down recovers, it holds an election.
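The steps above can be sketched as a small simulation. Network messages and timeouts are replaced by direct in-process calls, so this captures only the election logic, not the failure detection:

```python
def bully_election(initiator, alive):
    """Return the id that ends up coordinator when `initiator` starts an
    election among the live process ids in `alive`."""
    higher = sorted(p for p in alive if p > initiator)
    if not higher:
        return initiator           # nobody higher answered: initiator wins
    # A higher process replies OK and holds its own election in turn;
    # repeating the argument, the highest live id always wins.
    return bully_election(higher[0], alive)

alive = {1, 2, 4, 5, 6}            # the old coordinator, 7, has crashed
print(bully_election(4, alive))    # 6: the highest-numbered live process
```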
FIG 3: Election process 1 (process 4 holds an election; processes 5 and 6
respond, telling 4 to stop; now 5 and 6 hold an election)
FIG 4: Election process 2
Hence, in the Bully algorithm, the highest-numbered process always
becomes the coordinator, as explained above.
Chapter 3 CHARACTERISTICS OF DISTRIBUTED COMPUTING
So far the focus has been on designing a distributed system that solves a
given problem. A complementary research problem is studying the
properties of a given distributed system.
The halting problem is an analogous example from the field of
centralised computation: we are given a computer program and the task is
to decide whether it halts or runs forever. The halting problem is
undecidable in the general case, and naturally understanding the
behaviour of a computer network is at least as hard as understanding the
behaviour of one computer.
However, there are many interesting special cases that are decidable. In
particular, it is possible to reason about the behaviour of a network of
finite-state machines. One example is telling whether a given network of
interacting (asynchronous and non-deterministic) finite-state machines
can reach a deadlock. This problem is PSPACE-complete, i.e., it is
decidable, but it is not likely that there is an efficient (centralised, parallel
or distributed) algorithm that solves the problem in the case of large
networks.
3.1 Fallacies of Distributed Computing
In 1994, Peter Deutsch, a Sun fellow at the time, drafted seven
assumptions that architects and designers of distributed systems are
likely to make, which prove wrong in the long run, resulting in all sorts
of troubles and pains for the solution and the architects who made the
assumptions. In 1997 James Gosling added another such fallacy. The
assumptions are now collectively known as "the eight fallacies of
distributed computing." They are as follows:
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
1. The network is reliable
For one, if your application is a mission critical 365x7 kind of application, you
can just hit that failure--and Murphy will make sure it happens in the most
inappropriate moment. Nevertheless, most applications are not like that.
So what's the problem? Well, there are plenty of problems: Power
failures, someone trips on the network cord, all of a sudden clients
connect wirelessly, and so on. If hardware isn't enough--the software can
fail as well, and it does. The situation is more complicated if you
collaborate with an external partner, such as an e-commerce application
working with an external credit-card processing service. Their side of the
connection is not under your direct control. Lastly there are security
threats like DDOS attacks and the like.
2. Latency is zero
The second fallacy of Distributed Computing is the assumption that
"Latency is Zero". Latency is how much time it takes for data to move
from one place to another (versus bandwidth which is how much data we
can transfer during that time). Latency can be relatively good on a LAN--
but latency deteriorates quickly when you move to WAN scenarios or
internet scenarios. Latency is more problematic than bandwidth.
Here's a quote from a post by Ingo Rammer on latency vs. bandwidth
that illustrates this: "But I think that it's really interesting to see that the
end-to-end bandwidth increased by 1468 times within the last 11 years
while the latency (the time a single ping takes) has only been improved
tenfold. If this wouldn’t be enough, there is even a natural cap on latency.
The minimum round-trip time between two points of this earth is
determined by the maximum speed of information transmission: the
speed of light. At roughly 300,000 kilometres per second (3.6 * 10E12
tera angstrom per fortnight), it will always take at least 30 milliseconds to
send a ping from Europe to the US and back, even if the processing
would be done in real time." You may think all is okay if you only
deploy your application on LANs. However, even when you work on a
LAN with Gigabit Ethernet, you should still bear in mind that the latency
is much bigger than when accessing local memory. Assuming the latency
is zero, you can easily be tempted to believe that making a call over the
wire is almost like making a local call. This is one of the problems with
approaches like distributed objects that provide "network transparency",
luring you into making a lot of fine-grained calls to objects which are
actually remote and (relatively) expensive to call.
Taking latency into consideration means you should strive to make as few
calls as possible and, assuming you have enough bandwidth, to move as
much data as possible in each of these calls.
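The speed-of-light bound in the quote above is easy to reproduce; the Europe-US distance used here is a round illustrative figure:

```python
# Minimum round-trip time dictated by the speed of light alone.
C_KM_PER_S = 300_000        # speed of light, roughly
one_way_km = 4_500          # rough Europe to US east coast distance

rtt_ms = 2 * one_way_km * 1000 / C_KM_PER_S   # ms = km * (ms/s) / (km/s)
print(rtt_ms)               # 30.0 ms before any routing, queuing, or processing
```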
3. Bandwidth is infinite
The next Distributed Computing Fallacy is "Bandwidth Is Infinite." This
fallacy, in my opinion, is not as strong as the others. If there is one thing
that is constantly getting better in relation to networks it is bandwidth.
However, there are two forces at work to keep this assumption a fallacy.
One is that while the bandwidth grows, so does the amount of
information we try to squeeze through it. VoIP, videos, and IPTV are
some of the newer applications that take up bandwidth. Downloads,
richer UIs, and reliance on verbose formats (XML) are also at work--
especially if you are using T1 or lower lines. However, even when you
think that this 10Gbit Ethernet would be more than enough, you may be
hit with more than 3 Terabytes of new data per day (numbers from an
actual project).
The other force at work to lower bandwidth is packet loss. Bandwidth
limitations direct us to strive to limit the size of the information we send
over the wire. The main implication, then, is to consider that in the
production environment of our application there may be bandwidth
problems which are beyond our control, and to bear in mind how much
data is expected to travel over the wire.
4. The Network is Secure
Peter Deutsch introduced the Distributed Computing Fallacies back in
1994. You'd think that in the years since then "the network is
secure" would no longer be a fallacy. Unfortunately, that's not the case--
and not because the network is now secure. No one would be naive
enough to assume it is.
As an architect you might not be a security expert--but you still need to
be aware that security is needed and the implications it may have (for
instance, you might not be able to use multicast, user accounts with
limited privileges might not be able to access some networked resource
etc.)
5. Topology doesn’t change
The fifth Distributed Computing Fallacy is about network topology.
"Topology doesn't change." That's right; it doesn’t--as long as it stays in
the test lab. When you deploy an application in the wild (that is, to an
organization), the network topology is usually out of your control. The
operations team (IT) is likely to add and remove servers every once in a
while and/or make other changes to the network. Lastly there are server
and network faults which can cause routing changes. When you're talking
about clients, the situation is even worse. There are laptops coming and
going, wireless ad-hoc networks, new mobile devices. In short, topology
is changing constantly.
6. There is one administrator
The sixth Distributed Computing Fallacy is "There is one administrator".
You may be able to get away with this fallacy if you install your software
on small, isolated LANs (for instance, a single person IT "group" with no
WAN/Internet). However, for most enterprise systems the reality is much
different.
The IT group usually has different administrators, assigned according to
expertise--databases, web servers, networks, Linux, Windows, mainframes
and the like. This is the easy situation. The problem occurs
when your company collaborates with external entities (for example,
connecting with a business partner), or if your application is deployed for
Internet consumption and hosted by some hosting service and the
application consumes external services (think Mashups). In these
situations, the other administrators are not even under your control and
they may have their own agendas/rules.
7. Transport cost is zero
On to Distributed Computing Fallacy number 7--"Transport cost is zero".
There are a couple of ways you can interpret this statement, both of
which are false assumptions.
One way is that going from the application level to the transport level is
free. This is a fallacy, since we have to do marshalling (serialize
information into bits) to get data onto the wire, which both consumes
computer resources and adds to the latency. Interpreting the statement
this way emphasizes the "Latency is Zero" fallacy by reminding us that
there are additional costs (both in time and resources). The second way to
interpret the statement is that the costs (as in cash money) for setting and
running the network are free. This is also far from being true. There are
costs--costs for buying the routers, costs for securing the network, costs
for leasing the bandwidth for internet connections, and costs for
operating and maintaining the network. Someone somewhere will have
to pick up the tab and pay these costs.
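The marshalling cost mentioned above can be made concrete with the standard library. This sketch serializes a record (the record layout is made up for illustration) and measures the payload size and the CPU time the conversion takes:

```python
# Marshalling is not free: serializing application objects into bytes
# costs CPU time and usually inflates the payload. A quick sketch
# using only the standard library.
import json
import pickle
import time

record = {"id": 12345, "name": "sensor-7", "samples": list(range(1000))}

t0 = time.perf_counter()
wire = json.dumps(record).encode("utf-8")   # serialize to bytes for the wire
elapsed = time.perf_counter() - t0

print(f"JSON payload: {len(wire)} bytes, serialized in {elapsed * 1e6:.0f} µs")
print(f"pickle payload: {len(pickle.dumps(record))} bytes")
```

The exact numbers depend on the machine and the format, but the point stands: every hop from the application level to the transport level spends both time and resources.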
8. The network is homogeneous
The eighth and final Distributed Computing fallacy is "The network is
homogeneous."
Assuming this fallacy should not cause too much trouble at the lower
network levels, as IP is pretty much ubiquitous (e.g. even a specialized
bus like InfiniBand has an IP-over-IB implementation, although it may
result in suboptimal use of the non-native IP resources). It is
worthwhile to pay attention to the fact that the network is not
homogeneous at the application level.
3.2 Problems Related to Distributed Computing
Traditional computational problems take the perspective that we ask a
question, a computer (or a distributed system) processes the question for
a while, and then produces an answer and stops. However, there are also
problems where we do not want the system to ever stop. Examples of
such problems include the dining philosophers’ problem and other
similar mutual exclusion problems. In these problems, the distributed
system is supposed to continuously coordinate the use of shared
resources so that no conflicts or deadlocks occur.
There are also fundamental challenges that are unique to distributed
computing. The first example is challenges that are related to fault-
tolerance. Examples of related problems include consensus problems,
Byzantine fault tolerance, and self-stabilisation.
A lot of research is also focused on understanding the asynchronous
nature of distributed systems:
• Synchronizers can be used to run synchronous algorithms in
asynchronous systems.
• Logical clocks provide a causal happened-before ordering of
events.
• Clock synchronization algorithms provide globally consistent
physical time stamps.
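The logical-clock idea above can be sketched in a few lines: each process ticks its counter on local events, stamps outgoing messages, and on receipt advances to one past the larger of its own time and the message's timestamp, which yields the happened-before ordering.

```python
# A minimal Lamport logical clock: local events and sends increment
# the counter; a receive advances to max(local, received) + 1, so a
# message's effects are always ordered after its send.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        self.time += 1
        return self.time          # timestamp carried by the message

    def receive(self, msg_time):
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.local_event()            # a = 1
t = a.send()               # a = 2, message stamped 2
b.receive(t)               # b = max(0, 2) + 1 = 3
print(a.time, b.time)      # 2 3
```

Note that Lamport clocks give only a causal ordering, not globally consistent physical timestamps; the latter require the clock synchronization algorithms mentioned above.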
3.3 Distributed Computing Application Characteristics
Obviously not all applications are suitable for distributed computing. The
closer an application gets to running in real time, the less appropriate it
is. Even processing tasks that normally take an hour or two may not
derive much benefit if the communications among distributed systems
and the constantly changing availability of processing clients becomes a
bottleneck. Instead you should think in terms of tasks that take hours,
days, weeks, and months. Generally the most appropriate applications,
according to Entropia, consist of "loosely coupled, non-sequential tasks
in batch processes with a high compute-to-data ratio." The high compute
to data ratio goes hand-in-hand with a high compute-to-communications
ratio, as you don't want to bog down the network by sending large
amounts of data to each client, though in some cases you can do so
during off hours. Programs with large databases that can be easily parsed
for distribution are very appropriate.
Clearly, any application with individual tasks that need access to huge
data sets will be more appropriate for larger systems than individual PCs.
If terabytes of data are involved, a supercomputer makes sense as
communications can take place across the system's very high speed
backplane without bogging down the network. Server and other
dedicated system clusters will be more appropriate for other slightly less
data intensive applications. For a distributed application using numerous
PCs, the required data should fit very comfortably in the PC's memory,
with lots of room to spare.
Taking this further, United Devices recommends that the application should
have the capability to fully exploit "coarse-grained parallelism," meaning it
should be possible to partition the application into independent tasks or
processes that can be computed concurrently. For most solutions there should
not be any need for communication between the tasks except at task boundaries,
though Data Synapse allows some interprocess communications. The tasks and
small blocks of data should be such that they can be processed effectively on a
modern PC and report results that, when combined with other PCs' results,
produce coherent output. And the individual tasks should be small enough to
produce a result on these systems within a few hours to a few days.
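A minimal sketch of such coarse-grained partitioning, using Python's standard multiprocessing module (the workload here, summing squares over blocks, is just a stand-in for a real compute-heavy task): the data is split into independent blocks, each block is computed with no inter-task communication, and the per-task results are combined at the end.

```python
# Coarse-grained parallelism: partition the work into independent
# tasks, compute them concurrently, and combine the partial results
# into coherent output.
from multiprocessing import Pool

def task(block):
    """An independent unit of work: process one block of data."""
    return sum(x * x for x in block)

if __name__ == "__main__":
    data = list(range(10_000))
    # Partition into independent, coarse-grained blocks.
    blocks = [data[i:i + 1000] for i in range(0, len(data), 1000)]
    with Pool(4) as pool:
        partials = pool.map(task, blocks)   # compute blocks concurrently
    print(sum(partials))                    # combine the partial results
```

Because the tasks only communicate at their boundaries (when results are collected), the same structure works whether the workers are processes on one machine or clients scattered across a network.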
Chapter 4 APPLICATIONS
There are two main reasons for using distributed systems and distributed
computing. First, the very nature of the application may require the use
of a communication network that connects several computers. For
example, data is produced in one physical location and it is needed in
another location.
Second, there are many cases in which the use of a single computer
would be possible in principle, but the use of a distributed system is
beneficial for practical reasons. For example, it may be more cost-
efficient to obtain the desired level of performance by using a cluster of
several low-end computers, in comparison with a single high-end
computer. A distributed system can be more reliable than a non-
distributed system, as there is no single point of failure. Moreover, a
distributed system may be easier to expand and manage than a monolithic
uniprocessor system.
4.1 Types of Distributed Computing Applications
Following are some of the areas where distributed computing is being
used:
I. Telecommunication networks
• Telephone networks and cellular networks
• Computer networks such as the Internet.
• Wireless sensor networks.
• Routing algorithms.
II. Network applications:
• World Wide Web and peer-to-peer networks.
• Massively multiplayer online games and virtual reality
communities.
• Distributed databases and distributed database
management systems.
• Network file systems.
• Distributed information processing systems such as
banking systems and airline reservation systems.
III. Real-time process control:
• Aircraft control systems.
• Industrial control systems.
IV. Parallel computation:
• Scientific computing, including cluster computing and
grid computing and various volunteer computing
projects; see the list of distributed computing projects
• Distributed rendering in computer graphics.
4.2 Security and Standard Challenges
The major challenges come with increasing scale. As soon as you move outside
of a corporate firewall, security and standardization challenges become quite
significant. Most of today's vendors currently specialize in applications that stop
at the corporate firewall, though Avaki, in particular, is staking out the global
grid territory. Beyond spanning firewalls with a single platform lies the
challenge of spanning multiple firewalls and platforms, which means standards.
Most of the current platforms offer high level encryption such as Triple DES.
The application packages that are sent to PCs are digitally signed, to make sure
a rogue application does not infiltrate a system. Avaki comes with its own PKI
(public key infrastructure), for example. Identical application packages are
typically sent to multiple PCs and the results of each are compared. Any set of
results that differs from the rest becomes a security suspect. Even with
encryption, data can still be snooped when the process is running in the client's
memory, so most platforms create application data chunks so small that
it is unlikely snooping them will provide useful information. Avaki claims that
it integrates easily with different existing security infrastructures and can
facilitate the communications among them, but this is obviously a challenge for
global distributed computing.
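The redundant-execution check described above (sending identical packages to multiple PCs and flagging any result that disagrees) amounts to a majority vote. A minimal sketch, with made-up client names and results:

```python
# Redundant execution with result comparison: the same task is sent to
# several clients, and any result that disagrees with the majority is
# flagged as a security suspect.
from collections import Counter

def vote(results):
    """Return (accepted_value, suspect_client_ids) by majority vote."""
    majority_value, _ = Counter(results.values()).most_common(1)[0]
    suspects = [cid for cid, val in results.items() if val != majority_value]
    return majority_value, suspects

results = {"pc-1": 42, "pc-2": 42, "pc-3": 41}   # pc-3 disagrees
value, suspects = vote(results)
print(value, suspects)   # 42 ['pc-3']
```

Real platforms combine this with digital signatures and encryption, but the core idea is simply that independent replicas are unlikely to produce the same wrong answer.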
CONCLUSION
Distributed computing utilizes a network of many computers, each
accomplishing a portion of an overall task, to achieve a computational
result much more quickly than with a single computer. In addition to a
higher level of computing power, distributed computing also allows
many users to interact and connect openly. Different forms of distributed
computing allow for different levels of openness, with most people
accepting that a higher degree of openness in a distributed computing
system is beneficial.