Abstract
Distributed computing allows groups to accomplish work that was previously not feasible with supercomputers due to cost or time constraints. Although the primary function of distributed
computing systems is to provide the processing power needed to complete complex
computations, distributed computing also reaches beyond the processing arena into other
areas such as network usage. When used properly, the two areas complement each other and can
produce the needed results. When used maliciously, either processing-based or network-based distributed
attacks can produce a brute force that even the best firewalls or encryption are powerless to
prevent. Distributed computing should be tamed and closely guarded against such uses,
by filtering out invalid network packets from distributed attacks and by carefully
monitoring computer software to ensure that distributed brute-force attacks cannot occur.
Table of Contents
CHAPTER 1 INTRODUCTION
1.1 A Brief History of Distributed Computing
1.2 Distributed Computing: An Overview
CHAPTER 2 ABOUT DISTRIBUTED COMPUTING
2.1 Models
2.2 Complexity Measures
2.3 Working of Distributed Computing
2.3.1 Architecture
2.4 Algorithms
CHAPTER 3 CHARACTERISTICS OF DISTRIBUTED COMPUTING
3.1 Fallacies of Distributed Computing
3.2 Problems Related to Distributed Computing
3.3 Distributed Computing Application Characteristics
CHAPTER 4 APPLICATIONS
4.1 Types of Distributed Computing Applications
4.2 Security and Standard Challenges
CONCLUSION
Chapter 1 INTRODUCTION
Distributed computing is a field of computer science that studies
distributed systems. A distributed system consists of multiple
autonomous computers that communicate through a computer network.
The computers interact with each other in order to achieve a common
goal.
1.1 A Brief History of Distributed Computing
During the earliest years of computing, any tasks that required large
computations and massive processing were generally left up to the
government or a handful of large companies. These entities could afford
to buy massive supercomputers and the infrastructure needed to support
them. With the price of personal computing declining rapidly, and
supercomputers still very expensive, an alternative was needed. In
1993, Donald Becker and Thomas Sterling introduced Beowulf
clustering. Although not the first example of clustering, this was the first
time that an effort was made to enable anyone to take off-the-shelf
computers and build a cluster of computers that could rival top
supercomputers. The concept behind clustering, in its simplest form, is
that many smaller computers can be combined in a way to make a
computing structure that could provide all of the processing power
needed, for much less money. All of the nodes of a cluster are connected
to the same switch on an isolated internal network as the serving computer
(SC). The serving computer houses the results and distributes new work
units to all of the attached nodes. Each node is a single-use computer,
allowed to only process the problem that it is given and return the results
when finished. Many of the problems that hindered the first clustering
efforts still cause problems today. One of the more expensive elements is
the dedicated internal network, or interconnects, which link all of the
nodes together to the server. Since these nodes are simple banks of
processors, security is almost entirely nonexistent, and great care is
therefore required to isolate the interconnect network from any outside
networking. An additional element contributing to problems is that of
suitable software for the clusters to run. Even though all of the nodes
work together to process chunks of a complete dataset, the software must
still be written to take advantage of the multiples of processors and the
individual resources, such as memory, that the nodes contain. To obtain a
stable and suitable software layer, it may take months or years to perfect
the software to properly process the needed results and use all of the
available power. In the end, while clustering took a major step forward for
computing power, it certainly has its problems and insecurities as a
relatively new technology. Distributed computing encompasses a much
wider scope than clustering by allowing nodes to exist anywhere in the
world and to be multipurpose, multi-function machines.
Distributed computing has a similar concept as clustering: take a large
problem, break it into smaller units, and allow many nodes to work on
the problem in parallel. Where distributed computing strays from this
concept is by also allowing the nodes to be multifunction and
multipurpose computers that can exist anywhere in the world while being
attached to the Internet. Distributed computing can actually take on many
different orientations of the nodes, all depending on how the client
computers are connected to the Internet. In addition to this element of
flexibility, there is also a level of redundancy that does not exist in
supercomputing or clustering. With clustering and supercomputing, the
data is generally processed only once, due to the large amounts of time
that the entire project may take. In distributed computing, it is often the
case that work units may be distributed multiple times to multiple nodes.
This method serves two functions: to drastically decrease the possibilities
of processing errors, and to account for processing that is done on slower
CPUs or takes too long to return results. What makes this entire system
possible is the application of a small piece of software called a client.
This client handles the data retrieval and submission stages as well as the
code necessary to instruct the CPU how to process the work unit. Clients
vary in size, but most are smaller than 1-2 megabytes. The actual data
work units also vary in size, but most are between 250 and 400 kilobytes,
so that the hosting node's CPU can handle the process, and users on slower
internet connections can easily send and receive the data. Given the
small sizes of work units and clients, it is hard to see any disadvantage
to using distributed computing beyond the collection and analysis of the data.
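To make this retrieve-process-submit cycle concrete, below is a minimal sketch of the client loop, simulated in a single Python process with queues standing in for the network; the work units, the summing computation, and the three-node setup are illustrative assumptions rather than any real project's protocol.

```python
import queue
import threading

# Simulated work-unit distribution: the "server" enqueues small work
# units, and each "node" repeatedly retrieves one, processes it, and
# submits the result back -- the loop a real client runs over the network.
work = queue.Queue()
results = queue.Queue()

for unit_id in range(6):  # server side: enqueue six small work units
    work.put((unit_id, list(range(unit_id * 10, unit_id * 10 + 10))))

def node(name):
    while True:
        try:
            unit_id, data = work.get_nowait()      # retrieve a work unit
        except queue.Empty:
            return                                 # no work left
        results.put((name, unit_id, sum(data)))    # process and submit

threads = [threading.Thread(target=node, args=(f"node{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

while not results.empty():
    print(results.get())  # server side: collect the submitted results
```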
1.2 Distributed Computing: An Overview
Many of you are quite familiar with the power of computers. They can
record music, balance checkbooks, play games with you, process and
print your words, and open a whole encyclopaedia before your eyes. It is
not a sweeping statement to say that computers have made our lives
different; it is hard to envision life before the age of computers. If a
single computer can change your life, why not connect several of them?
Over the past two decades, networks of computers have further changed
the way businesses operate and the way the government functions. From
digital library card catalogs to Internet sites like ThinkQuest, networks
have been another great technological advancement. Distributed computing
is the next step in this progress: computers are not only networked, but
also intelligently distribute their workload across every machine so that
each stays busy and none squanders the electrical energy it consumes.
This setup rivals even the fastest commercial supercomputers built by
companies like IBM or Cray. When you combine the concept of distributed
computing with the tens of millions of computers connected to the
Internet, you have the fastest computer on Earth.
Figure 1: Normal vs. Distributed Computing
Chapter 2 ABOUT DISTRIBUTED
COMPUTING
Many tasks that we would like to automate by using a computer are of a
question-answer type: we would like to ask a question, and the computer
should produce an answer. In theoretical computer science,
such tasks are called computational problems. Formally, a computational
problem consists of instances together with a solution for each instance.
Instances are questions that we can ask, and solutions are desired answers
to these questions.
Theoretical computer science seeks to understand which computational
problems can be solved by using a computer (computability theory) and
how efficiently (computational complexity theory). Traditionally, it is
said that a problem can be solved by using a computer if we can design
an algorithm that produces a correct solution for any given instance. Such
an algorithm can be implemented as a computer program that runs on a
general-purpose computer: the program reads a problem instance from
input, performs some computation, and produces the solution as output.
The field of concurrent and distributed computing studies similar
questions in the case of either multiple computers, or a computer that
executes a network of interacting processes.
2.1 Models
The discussion below focuses on the case of multiple computers,
although many of the issues are the same for concurrent processes
running on a single computer. Three viewpoints are commonly used:
Parallel algorithms in shared-memory model:
• All computers have access to a shared memory. The algorithm
designer chooses the program executed by each computer.
• One theoretical model is the parallel random access machine
(PRAM). However, the classical PRAM model assumes
synchronous access to the shared memory.
• A model that is closer to the behaviour of real-world
multiprocessor machines and takes into account the use of
machine instructions, such as Compare-and-swap (CAS), is that
of asynchronous shared memory. There is a wide body of work
on this model, a summary of which can be found in the
literature.
Parallel algorithms in message-passing model:
• The algorithm designer chooses the structure of the network, as
well as the program executed by each computer.
• Models such as Boolean circuits and sorting networks are used.
A Boolean circuit can be seen as a computer network: each gate
is a computer that runs an extremely simple computer program.
Similarly, a sorting network can be seen as a computer network:
each comparator is a computer.
Distributed algorithms in message-passing model:
• The algorithm designer only chooses the computer program. All
computers run the same program. The system must work
correctly regardless of the structure of the network.
• A commonly used model is a graph with one finite-state machine
per node.
In the case of distributed algorithms, computational problems are
typically related to graphs. Often the graph that describes the structure of
the computer network is the problem instance. This is illustrated in the
following example.
An example:
Consider the computational problem of finding a colouring of a given
graph G. Different fields might take the following approaches:
Centralized algorithms
• The graph G is encoded as a string, and the string is given as
input to a computer. The computer program finds a colouring of
the graph, encodes the colouring as a string, and outputs the
result.
Parallel algorithms
• Again, the graph G is encoded as a string. However, multiple
computers can access the same string in parallel. Each computer
might focus on one part of the graph and produce a colouring for
that part.
• The main focus is on high-performance computation that
exploits the processing power of multiple computers in parallel.
Distributed algorithms
• The graph G is the structure of the computer network. There is
one computer for each node of G and one communication link
for each edge of G. Initially, each computer only knows about its
immediate neighbours in the graph G; the computers must
exchange messages with each other to discover more about the
structure of G. Each computer must produce its own colour as
output.
• The main focus is on coordinating the operation of an arbitrary
distributed system.
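To make the distributed viewpoint concrete, here is a minimal round-by-round simulation sketch of greedy colouring (not the Cole-Vishkin algorithm): each node sees only its neighbours' IDs and the colours they have already announced, and the rule that the highest undecided ID in a neighbourhood decides first is an illustrative choice.

```python
# Each node knows only its immediate neighbours; colours spread through
# the network round by round, as a synchronous message exchange would.
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}  # example instance

colour = {}  # colours announced so far
while len(colour) < len(graph):
    decided_this_round = {}
    for v, nbrs in graph.items():
        if v in colour:
            continue
        # v decides once it has the highest ID among undecided neighbours.
        if all(v > u or u in colour for u in nbrs):
            used = {colour[u] for u in nbrs if u in colour}
            c = 0
            while c in used:        # smallest colour not used nearby
                c += 1
            decided_this_round[v] = c
    colour.update(decided_this_round)  # "announce" this round's colours

print(colour)  # a proper colouring: no edge joins two equal colours
```

Adjacent nodes can never decide in the same round, so no edge ever receives two equal colours, and in every round the undecided node with the globally highest ID makes progress, so the loop terminates.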
While the field of parallel algorithms has a different focus than the field
of distributed algorithms, there is a lot of interaction between the two
fields. For example, the Cole–Vishkin algorithm for graph colouring was
originally presented as a parallel algorithm, but the same technique can
also be used directly as a distributed algorithm. Moreover, a parallel
algorithm can be implemented either in a parallel system (using shared
memory) or in a distributed system (using message passing). The
traditional boundary between parallel and distributed algorithms (choose
a suitable network vs. run in any given network) does not lie in the same
place as the boundary between parallel and distributed systems (shared
memory vs. message passing).
2.2 Complexity Measures
In parallel algorithms, yet another resource in addition to time and space
is the number of computers. Indeed, often there is a trade-off between the
running time and the number of computers: the problem can be solved
faster if there are more computers running in parallel. If a decision
problem can be solved in polylogarithmic time by using a polynomial
number of processors, then the problem is said to be in the class NC. The
class NC can be defined equally well by using the PRAM formalism or
Boolean circuits – PRAM machines can simulate Boolean circuits
efficiently and vice versa.
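In symbols, using the PRAM formulation (textbook definitions are usually stated via uniform Boolean circuits of polylogarithmic depth and polynomial size, which the equivalence above justifies):

$$\mathrm{NC} \;=\; \bigcup_{k \ge 1} \mathrm{NC}^k, \qquad \mathrm{NC}^k \;=\; \bigl\{\, L \;:\; L \text{ is decidable in } O(\log^k n) \text{ time using } n^{O(1)} \text{ parallel processors} \,\bigr\}.$$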
In the analysis of distributed algorithms, more attention is usually paid
to communication operations than to computational steps. Perhaps the
simplest model of distributed computing is a synchronous system where
all nodes operate in a lockstep fashion. During each communication
round, all nodes in parallel (1) receive the latest messages from their
neighbours, (2) perform arbitrary local computation, and (3) send new
messages to their neighbours. In such systems, a central complexity
measure is the number of synchronous communication rounds required to
complete the task.
This complexity measure is closely related to the diameter of the
network. Let D be the diameter of the network. On the one hand, any
computable problem can be solved trivially in a synchronous distributed
system in approximately 2D communication rounds: simply gather all
information in one location (D rounds), solve the problem, and inform
each node about the solution (D rounds).
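A minimal sketch of the round count behind this gather-solve-broadcast argument, using BFS levels as a stand-in for synchronous communication rounds (the example graph and the choice of node 0 as the gathering point are illustrative):

```python
from collections import deque

graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # a path with diameter 3

def flooding_rounds(root):
    # BFS level of the farthest node = synchronous rounds needed to
    # gather everything at the root (or broadcast back out from it).
    dist = {root: 0}
    q = deque([root])
    while q:
        v = q.popleft()
        for u in graph[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    return max(dist.values())

D = flooding_rounds(0)
print(f"gather: {D} rounds + broadcast: {D} rounds = about {2 * D} rounds")
```

Strictly, gathering at a node takes a number of rounds equal to that node's eccentricity, which is at most the diameter D.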
On the other hand, if the running time of the algorithm is much smaller
than D communication rounds, then the nodes in the network must
produce their output without having the possibility to obtain information
about distant parts of the network. In other words, the nodes must make
globally consistent decisions based on information that is available in
their local neighbourhood. Many distributed algorithms are known with
running times much smaller than D rounds, and understanding which
problems can be solved by such algorithms is one of the central research
questions of the field. Another commonly used measure is the total
number of bits transmitted in the network.
2.3 Working of Distributed Computing
In most cases today, a distributed computing architecture consists of very
lightweight software agents installed on a number of client systems, and
one or more dedicated distributed computing management servers. There
may also be requesting clients with software that allows them to submit
jobs along with lists of their required resources.
An agent running on a processing client detects when the system is idle,
notifies the management server that the system is available for
processing, and usually requests an application package. The client then
receives an application package from the server and runs the software
when it has spare CPU cycles, and sends the results back to the server.
The application may run as a screen saver, or simply in the background,
without impacting normal use of the computer. If the user of the client
system needs to run his own applications at any time, control is
immediately returned, and processing of the distributed application
package ends. This must be essentially instantaneous, as any delay in
returning control will probably be unacceptable to the user.
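The agent's detect-idle, process, yield-to-the-user loop can be sketched as follows. This is a minimal, Unix-flavoured illustration: the load-average probe, the idle threshold, and the slice-at-a-time package are assumptions, not any vendor's actual agent.

```python
import os
import time

def system_is_idle():
    # Rough idle test via the 1-minute load average (Unix-specific);
    # a real agent would use a platform probe or the screen-saver hook.
    return os.getloadavg()[0] < 0.5  # threshold chosen for illustration

def run_slice(package):
    # Process one small slice of the application package, so control
    # can be returned to the user almost instantaneously.
    package["progress"] += 1
    return package["progress"] >= package["size"]

package = {"progress": 0, "size": 100}  # hypothetical work package
while True:
    if system_is_idle():
        if run_slice(package):
            break        # finished: results would now go to the server
    else:
        time.sleep(5)    # user is active: stay out of the way
```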
2.3.1 Architecture
Various hardware and software architectures are used for distributed
computing. At a lower level, it is necessary to interconnect multiple
CPUs with some sort of network, regardless of whether that network is
printed onto a circuit board or made up of loosely-coupled devices and
cables. At a higher level, it is necessary to interconnect processes running
on those CPUs with some sort of communication system.
Distributed programming typically falls into one of several basic
architectures or categories: client–server, 3-tier architecture, n-tier
architecture, distributed objects, loose coupling, or tight coupling.
• Client–server: Smart client code contacts the server for data
then formats and displays it to the user. Input at the client is
committed back to the server when it represents a permanent
change.
• 3-tier architecture: Three tier systems move the client
intelligence to a middle tier so that stateless clients can be used.
This simplifies application deployment. Most web applications
are 3-tier.
• N-tier architecture: n-tier refers typically to web applications
which further forward their requests to other enterprise services.
This type of application is the one most responsible for the
success of application servers.
• Tightly coupled (clustered): refers typically to a cluster of
machines that work closely together, running a shared process
in parallel. The task is subdivided into parts that are computed
individually by each machine and then put back together to make the
final result.
• Peer-to-peer: an architecture where there is no special machine
or machines that provide a service or manage the network
resources. Instead all responsibilities are uniformly divided
among all machines, known as peers. Peers can serve both as
clients and servers.
• Space based: refers to an infrastructure that creates the illusion
(virtualization) of one single address-space. Data are
transparently replicated according to application needs.
Decoupling in time, space and reference is achieved.
Another basic aspect of distributed computing architecture is the
method of communicating and coordinating work among
concurrent processes. Through various message passing protocols,
processes may communicate directly with one another, typically in
a master/slave relationship. Alternatively, a "database-centric"
architecture can enable distributed computing to be done without
any form of direct inter-process communication, by utilizing a
shared database.
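As a concrete illustration of the client-server pattern listed above (and examined in more detail below), here is a toy exchange on localhost; the port number and the one-shot request/response protocol are assumptions made for the sketch.

```python
import socket
import threading

srv = socket.create_server(("127.0.0.1", 5000))  # the server owns the data

def serve_one():
    conn, _ = srv.accept()
    with conn:
        request = conn.recv(1024).decode()
        conn.sendall(f"data for {request!r}".encode())  # reply with data

t = threading.Thread(target=serve_one)
t.start()

# Smart client: contact the server for data, then format/display it.
with socket.create_connection(("127.0.0.1", 5000)) as c:
    c.sendall(b"record 42")
    print(c.recv(1024).decode())
t.join()
srv.close()
```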
Client-Server architecture:
Advantages:
• In most cases, client–server architecture enables the roles
and responsibilities of a computing system to be distributed
among several independent computers that are known to
each other only through a network. This creates an
additional advantage to this architecture: greater ease of
maintenance. For example, it is possible to replace, repair,
upgrade, or even relocate a server while its clients remain
both unaware and unaffected by that change.
• All data is stored on the servers, which generally have far
greater security controls than most clients. Servers can
better control access and resources, to guarantee that only
those clients with the appropriate permissions may access
and change data.
• Since data storage is centralized, updates to that data are far
easier to administer in comparison to a P2P paradigm. In the
latter, data updates may need to be distributed and applied to
each peer in the network, which is both time-consuming and
error-prone, as there can be thousands or even millions of
peers.
• Many mature client–server technologies are already
available which were designed to ensure security,
friendliness of the user interface, and ease of use.
• It works with many different clients of varying capabilities.
Disadvantages:
• As the number of simultaneous client requests to a given server
increases, the server can become overloaded. Contrast that to a
P2P network, where its aggregated bandwidth actually increases
as nodes are added, since the P2P network's overall bandwidth
can be roughly computed as the sum of the bandwidths of every
node in that network.
• The client–server paradigm lacks the robustness of a good P2P
network. Under client server, should a critical server fail, clients’
requests cannot be fulfilled. In P2P networks, resources are
usually distributed among many nodes. Even if one or more
nodes depart and abandon a downloading file, for example, the
remaining nodes should still have the data needed to complete the
download.
Fig 2: Client-Server Architecture
2.4 Algorithms
Various algorithms are used for distributed systems. Some of them are as
follows:
1. Bully Algorithm
2. Byzantine Fault Tolerance
3. Algorithms based on Clock synchronisation
4. Lamport Ordering
5. Algorithms based on Mutual exclusion
6. Snapshot algorithm
7. Algorithms based on Detection of process termination
8. Vector clocks
In this paper, we concentrate on the BULLY ELECTION ALGORITHM.
The bully algorithm is a method in distributed computing for
dynamically selecting a coordinator by process ID number.
The Bully Algorithm was devised by Garcia-Molina in 1982. When a
process notices that the coordinator is no longer responding to requests, it
initiates an election. Process P holds an election as follows:
1) P sends an ELECTION message to all processes with higher numbers.
2) If no one responds, P wins the election and becomes coordinator.
3) If one of the higher-ups answers, it takes over. P's job is done.
At any moment, a process can get an ELECTION message from one of
its lower-numbered colleagues. When such a message arrives, the
receiver sends an OK message back to the sender to indicate that it is
alive and will take over. The receiver then holds an election, unless it is
already holding one. Eventually, all processes give up but one and that
one is the new coordinator. It announces its victory by sending all
processes a message telling them that starting immediately it is the new
coordinator. If a process that was previously down comes back up, it
holds an election. If it happens to be the highest-numbered process
currently running, it will win the election and will take over the
coordinator's job. Thus the biggest guy in town always wins, hence the
name "Bully Algorithm".
Algorithm:
Selects the process with the largest identifier as the coordinator:
1. When a process p detects that the coordinator is not responding to
requests, it initiates an election:
a. p sends an election message to all processes with higher
numbers
b. If nobody responds, then p wins and takes over.
c. If one of the processes answers, then p's job is done.
2. If a process receives an election message from a lower-numbered
process at any time, it:
a. Sends an OK message back.
b. Holds an election (unless it is already holding one).
3. A process announces its victory by sending all processes a
message telling them that it is the new coordinator.
4. If a process that has been down recovers, it holds an election.
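A minimal sketch of the election logic above, as a single-process simulation: the liveness table replaces real timeouts, and the concurrent elections held by all responders are collapsed into a direct recursion on the highest responder, which yields the same winner.

```python
alive = {1: True, 2: True, 3: True, 4: True, 5: False}  # coordinator 5 is down

def election(p):
    # Step 1a: p sends ELECTION to every higher-numbered process.
    higher = [q for q in alive if q > p and alive[q]]
    if not higher:
        # Steps 1b and 3: nobody responded, so p wins and announces it.
        print(f"process {p} is the new coordinator")
        return p
    # Steps 1c and 2: a higher process answers OK and holds its own
    # election; the highest-numbered live process always ends up winning.
    return election(max(higher))

election(1)  # process 1 notices the coordinator stopped responding
```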
FIG 3: Election process 1. Process 4 holds an election; processes 5 and 6
respond, telling 4 to stop; then 5 and 6 hold an election.
FIG 4: Election process 2. The election of coordinator p2 after the failure of p4.
Hence, in the bully algorithm the highest-numbered live process always
wins the election and becomes the coordinator, as explained above.
Chapter 3 CHARACTERISTICS OF DISTRIBUTED COMPUTING
So far the focus has been on designing a distributed system that solves a
given problem. A complementary research problem is studying the
properties of a given distributed system.
The halting problem is an analogous example from the field of
centralised computation: we are given a computer program and the task is
to decide whether it halts or runs forever. The halting problem is
undecidable in the general case, and naturally understanding the
behaviour of a computer network is at least as hard as understanding the
behaviour of one computer.
However, there are many interesting special cases that are decidable. In
particular, it is possible to reason about the behaviour of a network of
finite-state machines. One example is telling whether a given network of
interacting (asynchronous and non-deterministic) finite-state machines
can reach a deadlock. This problem is PSPACE-complete, i.e., it is
decidable, but it is not likely that there is an efficient (centralised, parallel
or distributed) algorithm that solves the problem in the case of large
networks.
3.1 Fallacies of Distributed Computing
In 1994, Peter Deutsch, a Sun fellow at the time, drafted seven assumptions
that architects and designers of distributed systems are likely to make, which
prove wrong in the long run, resulting in all sorts of troubles and pains
for the solution and for the architects who made the assumptions. In 1997, James
Gosling added another such fallacy. The assumptions are now
collectively known as "The 8 fallacies of distributed computing."
They are as follows:
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
1. The network is reliable
For one, if your application is a mission-critical 365x7 kind of application, you
are bound to hit such a failure, and Murphy will make sure it happens at the most
inappropriate moment. Nevertheless, most applications are not like that.
So what's the problem? Well, there are plenty of problems: power
failures, someone trips on the network cord, all of a sudden clients
connect wirelessly, and so on. If hardware isn't enough, the software can
fail as well, and it does. The situation is more complicated if you
collaborate with an external partner, such as an e-commerce application
working with an external credit-card processing service. Their side of the
connection is not under your direct control. Lastly, there are security
threats like DDoS attacks and the like.
2. Latency is zero
The second fallacy of Distributed Computing is the assumption that
"Latency is Zero". Latency is how much time it takes for data to move
from one place to another (versus bandwidth which is how much data we
can transfer during that time). Latency can be relatively good on a LAN--
but latency deteriorates quickly when you move to WAN scenarios or
internet scenarios. Latency is more problematic than bandwidth.
Here's a quote from a post by Ingo Rammer on latency vs. bandwidth
that illustrates this: "But I think that it’s really interesting to see that the
end-to-end bandwidth increased by 1468 times within the last 11 years
while the latency (the time a single ping takes) has only been improved
tenfold. If this wouldn’t be enough, there is even a natural cap on latency.
The minimum round-trip time between two points of this earth is
determined by the maximum speed of information transmission: the
speed of light. At roughly 300,000 kilometres per second (3.6 * 10E12
tera angstrom per fortnight), it will always take at least 30 milliseconds to
send a ping from Europe to the US and back, even if the processing
would be done in real time." You may think all is okay if you only
deploy your application on LANs. However, even when you work on a
LAN with Gigabit Ethernet, you should still bear in mind that the latency
is much bigger than that of accessing local memory. Assuming that latency is
zero, you can easily be tempted to assume that making a call over the wire is
almost like making a local call. This is one of the problems with
approaches like distributed objects, which provide "network
transparency" and allure you into making a lot of fine-grained calls to objects
that are actually remote and (relatively) expensive to call.
Taking latency into consideration means you should strive to make as few
calls as possible and, assuming you have enough bandwidth, to move as
much data as possible in each of those calls.
3. Bandwidth is infinite
The next Distributed Computing Fallacy is "Bandwidth Is Infinite." This
fallacy, in my opinion, is not as strong as the others. If there is one thing
that is constantly getting better in relation to networks it is bandwidth.
However, there are two forces at work to keep this assumption a fallacy.
One is that while the bandwidth grows, so does the amount of
information we try to squeeze through it. VoIP, videos, and IPTV are
some of the newer applications that take up bandwidth. Downloads,
richer UIs, and reliance on verbose formats (XML) are also at work--
especially if you are using T1 or lower lines. However, even when you
think that 10Gbit Ethernet would be more than enough, you may be
hit with more than 3 terabytes of new data per day (numbers from an
actual project).
The other force at work to lower bandwidth is packet loss. Bandwidth
limitations direct us to strive to limit the size of the information we send
over the wire. The main implication then is to consider that in the
production environment of our application there may be bandwidth
problems which are beyond our control. And we should bear in mind
how much data is expected to travel over the wire.
4. The Network is Secure
Peter Deutsch introduced the Distributed Computing Fallacies back in
1994. You'd think that in the years since then, "the network is
secure" would no longer be a fallacy. Unfortunately, that's not the case,
and not because the network is now secure. No one would be naive
enough to assume it is.
As an architect you might not be a security expert, but you still need to
be aware that security is needed and of the implications it may have (for
instance, you might not be able to use multicast, and user accounts with
limited privileges might not be able to access some networked
resources).
5. Topology doesn’t change
The fifth Distributed Computing Fallacy is about network topology.
"Topology doesn't change." That's right; it doesn’t--as long as it stays in
the test lab. When you deploy an application in the wild (that is, to an
organization), the network topology is usually out of your control. The
operations team (IT) is likely to add and remove servers every once in a
while and/or make other changes to the network. Lastly there are server
and network faults which can cause routing changes. When you're talking
about clients, the situation is even worse. There are laptops coming and
going, wireless ad-hoc networks, new mobile devices. In short, topology
is changing constantly.
6. There is one administrator
The sixth Distributed Computing Fallacy is "There is one administrator".
You may be able to get away with this fallacy if you install your software
on small, isolated LANs (for instance, a single person IT "group" with no
WAN/Internet). However, for most enterprise systems the reality is much
different.
The IT group usually has different administrators, assigned according to
expertise--databases, web servers, networks, Linux, Windows,
mainframes, and the like. This is the easy situation. The problem occurs
when your company collaborates with external entities (for example,
connecting with a business partner), or if your application is deployed for
Internet consumption and hosted by some hosting service and the
application consumes external services (think Mashups). In these
situations, the other administrators are not even under your control and
they may have their own agendas/rules.
7. Transport cost is zero
On to Distributed Computing Fallacy number 7--"Transport cost is zero".
There are a couple of ways you can interpret this statement, both of
which are false assumptions.
One way is that going from the application level to the transport level is
free. This is a fallacy, since we have to do marshalling (serialize
information into bits) to get data onto the wire, which both consumes
computer resources and adds to the latency. Interpreting the statement
this way emphasizes the "Latency is Zero" fallacy by reminding us that
there are additional costs (both in time and resources). The second way to
interpret the statement is that the costs (as in cash money) for setting up and
running the network are zero. This is also far from being true. There are
costs--costs for buying the routers, costs for securing the network, costs
for leasing the bandwidth for internet connections, and costs for
operating and maintaining the network. Someone, somewhere,
will have to pick up the tab and pay these costs.
8. The network is homogeneous
The eighth and final Distributed Computing fallacy is "The network is
homogeneous."
Assuming this fallacy should not cause too much trouble at the lower
network levels, as IP is pretty much ubiquitous (even a specialized bus
like InfiniBand has an IP-over-IB implementation, although it may result
in suboptimal use of the non-native IP resources). It is worthwhile, however, to pay
attention to the fact that the network is not homogeneous at the application
level.
3.2 Problems Related to Distributed Computing
Traditional computational problems take the perspective that we ask a
question, a computer (or a distributed system) processes the question for
a while, and then produces an answer and stops. However, there are also
problems where we do not want the system to ever stop. Examples of
such problems include the dining philosophers’ problem and other
similar mutual exclusion problems. In these problems, the distributed
system is supposed to continuously coordinate the use of shared
resources so that no conflicts or deadlocks occur.
There are also fundamental challenges that are unique to distributed
computing. The first example is challenges that are related to fault-
tolerance. Examples of related problems include consensus problems,
Byzantine fault tolerance, and self-stabilisation.
A lot of research is also focused on understanding the asynchronous
nature of distributed systems:
• Synchronizers can be used to run synchronous algorithms in
asynchronous systems.
• Logical clocks provide a causal happened-before ordering of
events (see the sketch after this list).
• Clock synchronization algorithms provide globally consistent
physical time stamps.
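As a minimal sketch of the logical-clock item above: each process keeps a counter, increments it on every local event, stamps outgoing messages, and on receipt jumps past the received stamp, so the timestamps respect the happened-before order.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                # any local event, including a send
        self.time += 1
        return self.time

    def receive(self, msg_time):   # merge the sender's stamp on receipt
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
stamp = a.tick()        # a sends a message stamped 1
b.receive(stamp)        # b jumps to 2: the receive happens after the send
print(a.time, b.time)   # prints: 1 2
```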
3.3 Distributed Computing Application Characteristics
Obviously not all applications are suitable for distributed computing. The
closer an application gets to running in real time, the less appropriate it
is. Even processing tasks that normally take an hour or two may not
derive much benefit if the communication among distributed systems
and the constantly changing availability of processing clients become a
bottleneck. Instead, you should think in terms of tasks that take hours,
days, weeks, and months. Generally the most appropriate applications,
according to Entropia, consist of "loosely coupled, non-sequential tasks
in batch processes with a high compute-to-data ratio." The high compute
to data ratio goes hand-in-hand with a high compute-to-communications
ratio, as you don't want to bog down the network by sending large
amounts of data to each client, though in some cases you can do so
during off hours. Programs with large databases that can be easily parsed
for distribution are very appropriate.
Clearly, any application with individual tasks that need access to huge
data sets will be more appropriate for larger systems than individual PCs.
If terabytes of data are involved, a supercomputer makes sense as
communications can take place across the system's very high speed
backplane without bogging down the network. Server and other
dedicated system clusters will be more appropriate for other slightly less
data intensive applications. For a distributed application using numerous
PCs, the required data should fit very comfortably in the PC's memory,
with lots of room to spare.
Taking this further, United Devices recommends that the application should
have the capability to fully exploit "coarse-grained parallelism," meaning it
should be possible to partition the application into independent tasks or
processes that can be computed concurrently. For most solutions there should
not be any need for communication between the tasks except at task boundaries,
though DataSynapse allows some interprocess communication. The tasks and
small blocks of data should be such that they can be processed effectively on a
modern PC and report results that, when combined with other PCs' results,
produce coherent output. And the individual tasks should be small enough to
produce a result on these systems within a few hours to a few days.
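A minimal sketch of such coarse-grained partitioning using Python's standard library: the tasks below are fully independent and communicate only at the task boundary, where the partial results are combined into coherent output (the squaring workload and the chunk size are stand-ins for hours of real computation).

```python
from multiprocessing import Pool

def process_chunk(chunk):
    return sum(x * x for x in chunk)  # stand-in for a long-running task

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
    with Pool() as pool:                            # independent workers
        partials = pool.map(process_chunk, chunks)  # no cross-task talk
    print(sum(partials))                            # combine at the boundary
```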
Chapter 4 APPLICATIONS
There are two main reasons for using distributed systems and distributed
computing. First, the very nature of the application may require the use
of a communication network that connects several computers. For
example, data is produced in one physical location and it is needed in
another location.
Second, there are many cases in which the use of a single computer
would be possible in principle, but the use of a distributed system is
beneficial for practical reasons. For example, it may be more cost-
efficient to obtain the desired level of performance by using a cluster of
several low-end computers, in comparison with a single high-end
computer. A distributed system can be more reliable than a non-
distributed system, as there is no single point of failure. Moreover, a
distributed system may be easier to expand and manage than a monolithic
uniprocessor system.
4.1 Types of Distributed Computing Applications
Following are some of the areas where distributed computing is being
used:
I. Telecommunication networks
• Telephone networks and cellular networks
• Computer networks such as the Internet.
• Wireless sensor networks.
• Routing algorithms.
II. Network applications:
• World Wide Web and peer-to-peer networks.
• Massively multiplayer online games and virtual reality
communities.
• Distributed databases and distributed database
management systems.
• Network file systems.
III. Distributed information processing systems such as banking
systems and airline reservation systems:
• Real-time process control
• Aircraft control systems
• Industrial control systems
IV. Parallel computation:
• Scientific computing, including cluster computing and
grid computing and various volunteer computing
projects; see the list of distributed computing projects
• Distributed rendering in computer graphics.
4.2 Security and Standard Challenges
The major challenges come with increasing scale. As soon as you move outside
of a corporate firewall, security and standardization challenges become quite
significant. Most of today's vendors currently specialize in applications that stop
at the corporate firewall, though Avaki, in particular, is staking out the global
grid territory. Beyond spanning firewalls with a single platform lies the
challenge of spanning multiple firewalls and platforms, which means standards.
Most of the current platforms offer strong encryption such as Triple DES.
The application packages that are sent to PCs are digitally signed, to make sure
a rogue application does not infiltrate a system. Avaki comes with its own PKI
(public key infrastructure), for example. Identical application packages are
typically sent to multiple PCs and the results of each are compared. Any set of
results that differs from the rest becomes a security suspect. Even with
encryption, data can still be snooped when the process is running in the client's
memory, so most platforms create application data chunks that are so small that
it is unlikely that snooping them will provide useful information. Avaki claims that
it integrates easily with different existing security infrastructures and can
facilitate the communications among them, but this is obviously a challenge for
global distributed computing.
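The redundant-result check described above amounts to a majority vote over the values returned for the same work unit. A minimal sketch, with three-way replication and the returned values chosen for illustration:

```python
from collections import Counter

def vet_results(results):
    # results maps client id -> returned value for one work unit; any
    # client disagreeing with the majority becomes a security suspect.
    majority, _ = Counter(results.values()).most_common(1)[0]
    suspects = [client for client, v in results.items() if v != majority]
    return majority, suspects

value, suspects = vet_results({"pc1": 42, "pc2": 42, "pc3": 41})
print(value, suspects)  # 42 ['pc3']
```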
CONCLUSION
Distributed computing utilizes a network of many computers, each
accomplishing a portion of an overall task, to achieve a computational
result much more quickly than with a single computer. In addition to a
higher level of computing power, distributed computing also allows
many users to interact and connect openly. Different forms of distributed
computing allow for different levels of openness, with most people
accepting that a higher degree of openness in a distributed computing
system is beneficial.
How to Troubleshoot Apps for the Modern Connected Worker
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Seminar

  • 1. Abstract Distributed computing allows groups to accomplish work that was not feasible before with supercomputers, due to cost or time constraints. Although the primary functions of distributed computing systems is to produce needed processing power to complete complex computations, distributed computing also reaches outside of the processing arena to other areas such as network usage. When used properly, both areas complement each other and can produce needed results. When used maliciously, either processing or networking distributed attacks can produce a brute force that even the best firewalls or encryption are powerless to prevent. Distributed computing should be tamed and closely guarded against such uses through efforts to filter out invalid network packets for distributed attacks, and carefully monitoring computer software to ensure that a distributed computing processing, brute force attacks cannot occur. 1
  • 2. Table of Contents TABLE OF CONTENTS.................................................................................................2 CHAPTER 1 INTRODUCTION.....................................................................................3 1.1 A BRIEF HISTORY OF DISTRIBUTED COMPUTING...............................................................3 1.2 DISTRIBUTED COMPUTING: AN OVERVIEW..................................................................6 CHAPTER 2 ABOUT DISTRIBUTED COMPUTING.........................................................8 2.1MODELS....................................................................................................................8 2.2COMPLEXITY MEASURES.............................................................................................12 2.3 WORKING OF DISTRIBUTED COMPUTING......................................................................14 2.3.1Architecture....................................................................................................14 2.4ALGORITHMS...........................................................................................................18 CHAPTER 3 CHARACTERISTICS OF DISTRIBUTED COMPUTING..................................23 3.1FALLACIES OF DISTRIBUTED COMPUTING:.......................................................................23 3.2 PROBLEMS RELATED TO DISTRIBUTED COMPUTING........................................................28 3.3 DISTRIBUTED COMPUTING APPLICATION CHARACTERISTICS..............................................29 CHAPTER 4 APPLICATIONS ......................................................................................31 4.1TYPES OF DISTRIBUTED COMPUTING APPLICATIONS.........................................................31 4.2 SECURITY AND STANDARD CHALLENGES........................................................................32 CONCLUSION..........................................................................................................34 REFERENCES............................................................................................................35 2
government or a handful of large companies. These entities could afford to buy massive supercomputers and the infrastructure needed to support them. With the price of personal computing declining rapidly and supercomputers still very expensive, an alternative was needed.

In 1993, Donald Becker and Thomas Sterling introduced Beowulf clustering. Although not the first example of clustering, this was the first effort to enable anyone to take off-the-shelf computers and build a cluster that could rival top supercomputers. The concept behind clustering, in its simplest form, is that many smaller computers can be combined into a computing structure that provides all of the processing power needed, for much less money. All of the nodes of a cluster are connected to an isolated internal network and to the same switch as the serving computer (SC). The serving computer houses the results and distributes new work units to all of the attached nodes. Each node is a single-use computer, allowed only to process the problem that it is given and to return the results when finished.

Many of the problems that hindered the first clustering efforts still cause problems today. One of the more expensive elements is the dedicated internal network, or interconnect, which links all of the nodes together to the server. Since these nodes are simple banks of
processors, security is almost entirely nonexistent, and therefore great care is required in isolating the internal network from any outside networking. An additional problem is that of suitable software for the clusters to run. Even though all of the nodes work together to process chunks of a complete dataset, the software must still be written to take advantage of the multiple processors and the individual resources, such as memory, that the nodes contain. It may take months or years to perfect the software so that it properly produces the needed results and uses all of the available power. In the end, while clustering was a major step forward for computing power, it certainly has its problems and insecurities as a relatively new technology.

Distributed computing encompasses a much wider scope than clustering. It rests on a similar concept: take a large problem, break it into smaller units, and allow many nodes to work on the problem in parallel. Where distributed computing strays from clustering is in allowing the nodes to be multipurpose, multi-function computers that can exist anywhere in the world while being attached to the Internet. Distributed computing can take on many different orientations of the nodes, depending on how the client computers are connected to the Internet.

In addition to this flexibility, there is also a level of redundancy that does not exist in supercomputing or clustering. With clustering and supercomputing, the data is generally processed only once, due to the large amount of time that the entire project may take. In distributed computing, work units are often distributed multiple times to multiple nodes. This serves two functions: it drastically decreases the possibility of processing errors, and it accounts for processing that is done on slower CPUs or takes too long to return results.

What makes this entire system possible is a small piece of software called a client. This client handles the data retrieval and submission stages as well as the
code necessary to instruct the CPU how to process the work unit. Clients vary in size, but most are less than 1-2 megabytes. The actual work units also vary in size, but most are between 250 and 400 kilobytes, so that the hosting node's CPU can handle the processing and users on slower Internet connections can easily send and receive the data. Given the small sizes of work units and clients, it is hard to see a disadvantage to using distributed computing beyond the effort of collecting and analysing the data.
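To make the client's role concrete, the following is a minimal sketch of such a client's fetch-process-submit loop. The server address, endpoint paths, and the body of process() are hypothetical placeholders; a real project would substitute its own protocol and computation kernel.

    import time
    import urllib.request

    SERVER = "http://example.org/project"   # hypothetical project server

    def fetch_work_unit() -> bytes:
        # Download the next work unit (typically 250-400 KB).
        with urllib.request.urlopen(SERVER + "/work") as resp:
            return resp.read()

    def process(unit: bytes) -> bytes:
        # Placeholder for the project-specific computation.
        return unit[::-1]

    def submit(result: bytes) -> None:
        req = urllib.request.Request(SERVER + "/result", data=result, method="POST")
        urllib.request.urlopen(req)

    while True:
        result = process(fetch_work_unit())
        submit(result)
        time.sleep(1)   # yield the CPU between work units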
1.2 Distributed Computing: An Overview

Many of you are quite familiar with the power of computers. They can record music, balance cheque books, play games with you, process and print your words, and open a whole encyclopaedia before your eyes. It is not a sweeping statement to say that computers have made our lives different; it is hard to envision life before the age of computers. If a single computer can change your life, why not connect several of them? Over the past two decades, networks of computers have further changed the way businesses operate and the way the government functions. From digital library card catalogues to Internet sites such as ThinkQuest, networks have been another great technological advancement.

Distributed computing is the next step in this progress, where computers are not only networked, but also smartly distribute their workload across each computer so that they stay busy and do not squander the electrical energy they feed on. This setup rivals even the fastest commercial supercomputers built by companies like IBM or Cray. When you combine the concept of distributed computing with the tens of millions of computers connected to the Internet, you have the fastest computer on Earth.
Figure 1: Normal vs. Distributed Computing
Chapter 2 ABOUT DISTRIBUTED COMPUTING

Many tasks that we would like to automate by using a computer are of question-answer type: we would like to ask a question and the computer should produce an answer. In theoretical computer science, such tasks are called computational problems. Formally, a computational problem consists of instances together with a solution for each instance. Instances are questions that we can ask, and solutions are desired answers to these questions.

Theoretical computer science seeks to understand which computational problems can be solved by using a computer (computability theory) and how efficiently (computational complexity theory). Traditionally, it is said that a problem can be solved by using a computer if we can design an algorithm that produces a correct solution for any given instance. Such an algorithm can be implemented as a computer program that runs on a general-purpose computer: the program reads a problem instance from input, performs some computation, and produces the solution as output. The field of concurrent and distributed computing studies similar questions in the case of either multiple computers, or a computer that executes a network of interacting processes.

2.1 Models

The discussion below focuses on the case of multiple computers, although many of the issues are the same for concurrent processes running on a single computer. Three viewpoints are commonly used:

Parallel algorithms in the shared-memory model:
• All computers have access to a shared memory. The algorithm designer chooses the program executed by each computer.
• One theoretical model is the parallel random-access machine (PRAM). The classical PRAM model assumes synchronous access to the shared memory.
• A model that is closer to the behaviour of real-world multiprocessor machines, and that takes into account the use of machine instructions such as compare-and-swap (CAS), is that of asynchronous shared memory. There is a wide body of work on this model, a summary of which can be found in the literature.

Parallel algorithms in the message-passing model:
• The algorithm designer chooses the structure of the network, as well as the program executed by each computer.
• Models such as Boolean circuits and sorting networks are used. A Boolean circuit can be seen as a computer network: each gate is a computer that runs an extremely simple computer program. Similarly, a sorting network can be seen as a computer network: each comparator is a computer.

Distributed algorithms in the message-passing model:
• The algorithm designer only chooses the computer program. All computers run the same program. The system must work correctly regardless of the structure of the network.
• A commonly used model is a graph with one finite-state machine per node.

In the case of distributed algorithms, computational problems are typically related to graphs. Often the graph that describes the structure of the computer network is the problem instance. This is illustrated in the following example.
An example: consider the computational problem of finding a colouring of a given graph G. Different fields might take the following approaches:

Centralized algorithms:
• The graph G is encoded as a string, and the string is given as input to a computer. The computer program finds a colouring of the graph, encodes the colouring as a string, and outputs the result.

Parallel algorithms:
• Again, the graph G is encoded as a string. However, multiple computers can access the same string in parallel. Each computer might focus on one part of the graph and produce a colouring for that part.
• The main focus is on high-performance computation that exploits the processing power of multiple computers in parallel.

Distributed algorithms:
• The graph G is the structure of the computer network. There is one computer for each node of G and one communication link for each edge of G. Initially, each computer only knows about its immediate neighbours in G; the computers must exchange messages with each other to discover more about the structure of G. Each computer must produce its own colour as output.
• The main focus is on coordinating the operation of an arbitrary distributed system.
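As a point of reference for the centralized approach, here is a minimal sketch of a sequential greedy colouring; the adjacency-dictionary encoding is an illustrative choice, not part of the original problem statement. A distributed algorithm would instead run one copy of a program at every vertex and assign colours through message exchanges.

    def greedy_colouring(adj):
        # Scan vertices in order; give each the smallest colour
        # not already used by a coloured neighbour.
        colour = {}
        for v in adj:
            taken = {colour[u] for u in adj[v] if u in colour}
            c = 0
            while c in taken:
                c += 1
            colour[v] = c
        return colour

    # Example: a 4-cycle, which needs only two colours.
    G = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
    print(greedy_colouring(G))   # {0: 0, 1: 1, 2: 0, 3: 1}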
While the field of parallel algorithms has a different focus than the field of distributed algorithms, there is a lot of interaction between the two fields. For example, the Cole-Vishkin algorithm for graph colouring was originally presented as a parallel algorithm, but the same technique can also be used directly as a distributed algorithm. Moreover, a parallel algorithm can be implemented either in a parallel system (using shared memory) or in a distributed system (using message passing). The traditional boundary between parallel and distributed algorithms (choose a suitable network vs. run in any given network) does not lie in the same place as the boundary between parallel and distributed systems (shared memory vs. message passing).
2.2 Complexity Measures

In parallel algorithms, yet another resource in addition to time and space is the number of computers. Indeed, often there is a trade-off between the running time and the number of computers: the problem can be solved faster if there are more computers running in parallel. If a decision problem can be solved in polylogarithmic time by using a polynomial number of processors, then the problem is said to be in the class NC. The class NC can be defined equally well by using the PRAM formalism or Boolean circuits: PRAM machines can simulate Boolean circuits efficiently and vice versa.

In the analysis of distributed algorithms, more attention is usually paid to communication operations than to computational steps. Perhaps the simplest model of distributed computing is a synchronous system where all nodes operate in lockstep. During each communication round, all nodes in parallel (1) receive the latest messages from their neighbours, (2) perform arbitrary local computation, and (3) send new messages to their neighbours. In such systems, a central complexity measure is the number of synchronous communication rounds required to complete the task.

This complexity measure is closely related to the diameter of the network. Let D be the diameter of the network. On the one hand, any computable problem can be solved trivially in a synchronous distributed system in approximately 2D communication rounds: simply gather all information in one location (D rounds), solve the problem, and inform each node about the solution (D rounds). On the other hand, if the running time of the algorithm is much smaller than D communication rounds, then the nodes in the network must produce their output without having the possibility to obtain information about distant parts of the network. In other words, the nodes must make
globally consistent decisions based on information that is available in their local neighbourhood. Many distributed algorithms are known with a running time much smaller than D rounds, and understanding which problems can be solved by such algorithms is one of the central research questions of the field. Another commonly used measure is the total number of bits transmitted in the network.
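The lockstep round model is easy to simulate. The toy sketch below (an illustration, not a standard library routine) floods node identifiers through a network: in each round every node receives its neighbours' previous messages, merges them, and re-sends. On a path of four nodes the diameter is D = 3, and after exactly three rounds every node knows every identifier, matching the D-round information-gathering argument above.

    def run_sync_rounds(adj, state, rounds):
        for _ in range(rounds):
            outbox = {v: set(state[v]) for v in adj}   # (3) messages sent last round
            for v in adj:
                for u in adj[v]:                        # (1) receive from neighbours
                    state[v] |= outbox[u]               # (2) local computation: merge
        return state

    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}        # a path, diameter D = 3
    state = {v: {v} for v in adj}                       # each node knows only itself
    print(run_sync_rounds(adj, state, rounds=3))        # every node knows all IDs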
2.3 Working of Distributed Computing

In most cases today, a distributed computing architecture consists of very lightweight software agents installed on a number of client systems, and one or more dedicated distributed computing management servers. There may also be requesting clients with software that allows them to submit jobs along with lists of their required resources.

An agent running on a processing client detects when the system is idle, notifies the management server that the system is available for processing, and usually requests an application package. The client then receives an application package from the server, runs the software when it has spare CPU cycles, and sends the results back to the server. The application may run as a screen saver, or simply in the background, without impacting normal use of the computer. If the user of the client system needs to run his own applications at any time, control is immediately returned, and processing of the distributed application package ends. This must be essentially instantaneous, as any delay in returning control will probably be unacceptable to the user.

2.3.1 Architecture

Various hardware and software architectures are used for distributed computing. At a lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that network is printed onto a circuit board or made up of loosely coupled devices and cables. At a higher level, it is necessary to interconnect processes running on those CPUs with some sort of communication system. Distributed programming typically falls into one of several basic architectures or categories: client-server, 3-tier architecture, n-tier architecture, distributed objects, loose coupling, or tight coupling.
• Client-server: smart client code contacts the server for data, then formats and displays it to the user. Input at the client is committed back to the server when it represents a permanent change.
• 3-tier architecture: three-tier systems move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are 3-tier.
• N-tier architecture: n-tier typically refers to web applications which further forward their requests to other enterprise services. This type of application is the one most responsible for the success of application servers.
• Tightly coupled (clustered): typically refers to a cluster of machines that work closely together, running a shared process in parallel. The task is subdivided into parts that are worked on individually by each machine and then put back together to produce the final result.
• Peer-to-peer: an architecture where there is no special machine or machines that provide a service or manage the network resources. Instead, all responsibilities are uniformly divided among all machines, known as peers. Peers can serve both as clients and as servers.
• Space-based: refers to an infrastructure that creates the illusion (virtualization) of one single address space. Data are transparently replicated according to application needs. Decoupling in time, space and reference is achieved.

Another basic aspect of distributed computing architecture is the method of communicating and coordinating work among concurrent processes. Through various message-passing protocols, processes may communicate directly with one another, typically in
a master/slave relationship. Alternatively, a "database-centric" architecture can enable distributed computing to be done without any form of direct inter-process communication, by utilizing a shared database.

Client-server architecture

Advantages:
• In most cases, client-server architecture enables the roles and responsibilities of a computing system to be distributed among several independent computers that are known to each other only through a network. This creates an additional advantage of this architecture: greater ease of maintenance. For example, it is possible to replace, repair, upgrade, or even relocate a server while its clients remain both unaware of and unaffected by that change.
• All data is stored on the servers, which generally have far greater security controls than most clients. Servers can better control access and resources, to guarantee that only those clients with the appropriate permissions may access and change data.
• Since data storage is centralized, updates to that data are far easier to administer than in a P2P paradigm. In the latter, data updates may need to be distributed and applied to each peer in the network, which is both time-consuming and error-prone, as there can be thousands or even millions of peers.
• Many mature client-server technologies are already available which were designed to ensure security, friendliness of the user interface, and ease of use.
• It functions with multiple clients of different capabilities.

Disadvantages:
• As the number of simultaneous client requests to a given server increases, the server can become overloaded. Contrast that with a P2P network, whose aggregate bandwidth actually increases as nodes are added, since the P2P network's overall bandwidth can be roughly computed as the sum of the bandwidths of every node in that network.
• The client-server paradigm lacks the robustness of a good P2P network. Under client-server, should a critical server fail, clients' requests cannot be fulfilled. In P2P networks, resources are usually distributed among many nodes. Even if one or more nodes depart and abandon a downloading file, for example, the remaining nodes should still have the data needed to complete the download.

Fig 2: Client-Server Architecture
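A minimal sketch of the client-server pattern, using a throwaway local TCP service (the address, port, and the uppercasing "business logic" are illustrative only): the server holds the processing role, and the client merely sends a request and displays the reply.

    import socket
    import threading
    import time

    HOST, PORT = "127.0.0.1", 5000   # assumed free local port

    def server():
        # Accept a single client, apply the server-side logic, reply.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.bind((HOST, PORT))
            srv.listen()
            conn, _ = srv.accept()
            with conn:
                conn.sendall(conn.recv(1024).upper())

    def client(message):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
            cli.connect((HOST, PORT))
            cli.sendall(message.encode())
            return cli.recv(1024).decode()

    threading.Thread(target=server, daemon=True).start()
    time.sleep(0.2)                    # crude wait for the server to bind
    print(client("hello server"))      # -> HELLO SERVER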
2.4 Algorithms

Various algorithms are used for distributed systems. Some of them are as follows:
1. Bully algorithm
2. Byzantine fault tolerance
3. Algorithms based on clock synchronisation
4. Lamport ordering
5. Algorithms based on mutual exclusion
6. Snapshot algorithm
7. Algorithms based on detection of process termination
8. Vector clocks

In this paper, we are going to concentrate on the BULLY ELECTION ALGORITHM. The bully algorithm is a method in distributed computing for dynamically selecting a coordinator by process ID number. It was devised by Garcia-Molina in 1982. When a process notices that the coordinator is no longer responding to requests, it initiates an election. A process P holds an election as follows:
1) P sends an ELECTION message to all processes with higher numbers.
2) If no one responds, P wins the election and becomes coordinator.
3) If one of the higher-ups answers, it takes over. P's job is done.

At any moment, a process can get an ELECTION message from one of its lower-numbered colleagues. When such a message arrives, the receiver sends an OK message back to the sender to indicate that it is alive and will take over. The receiver then holds an election, unless it is already holding one. Eventually, all processes give up but one, and that one is the new coordinator. It announces its victory by sending all processes a message telling them that, starting immediately, it is the new coordinator. If a process that was previously down comes back up, it holds an election. If it happens to be the highest-numbered process currently running, it will win the election and take over the coordinator's job. Thus the biggest guy in town always wins, hence the name "bully algorithm".

In summary, the algorithm selects the process with the largest identifier as the coordinator:
1. When a process p detects that the coordinator is not responding to requests, it initiates an election:
a. p sends an election message to all processes with higher numbers.
b. If nobody responds, then p wins and takes over.
c. If one of the processes answers, then p's job is done.
2. If a process receives an election message from a lower-numbered process at any time, it:
a. Sends an OK message back.
b. Holds an election (unless it is already holding one).
3. A process announces its victory by sending all processes a message telling them that it is the new coordinator.
4. If a process that has been down recovers, it holds an election.
FIG 3: Election process 1 (process 4 holds an election; processes 5 and 6 respond, telling 4 to stop; now 5 and 6 hold an election; the election of coordinator p2 after the failure of p4)

FIG 4: Election process 2

Hence, in the bully algorithm the highest-numbered process always wins the coordinator role, as explained above.
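To make the election walkthrough concrete, below is a minimal single-threaded simulation of the bully algorithm. It is an illustrative sketch: real implementations send messages over the network and rely on timeouts to detect dead processes, and here a single higher-numbered process stands in for all the OK responders.

    class Process:
        def __init__(self, pid):
            self.pid = pid
            self.alive = True

    class BullySystem:
        def __init__(self, pids):
            self.procs = {pid: Process(pid) for pid in pids}
            self.coordinator = max(pids)

        def election(self, initiator):
            # ELECTION goes to every live, higher-numbered process.
            higher = [p.pid for p in self.procs.values()
                      if p.pid > initiator and p.alive]
            if not higher:
                # Nobody answered: the initiator wins and announces victory.
                self.coordinator = initiator
                print(f"process {initiator} is the new coordinator")
            else:
                # A higher process replies OK and holds its own election.
                responder = min(higher)
                print(f"process {responder} sends OK to process {initiator}")
                self.election(responder)

    system = BullySystem([1, 2, 3, 4, 5, 6, 7])
    system.procs[7].alive = False      # the old coordinator crashes
    system.procs[6].alive = False
    system.election(4)                 # process 4 notices and starts an election
    # -> process 5 sends OK to process 4
    # -> process 5 is the new coordinator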
Chapter 3 CHARACTERISTICS OF DISTRIBUTED COMPUTING

So far the focus has been on designing a distributed system that solves a given problem. A complementary research problem is studying the properties of a given distributed system. The halting problem is an analogous example from the field of centralised computation: we are given a computer program, and the task is to decide whether it halts or runs forever. The halting problem is undecidable in the general case, and naturally understanding the behaviour of a computer network is at least as hard as understanding the behaviour of one computer.

However, there are many interesting special cases that are decidable. In particular, it is possible to reason about the behaviour of a network of finite-state machines. One example is telling whether a given network of interacting (asynchronous and non-deterministic) finite-state machines can reach a deadlock. This problem is PSPACE-complete: it is decidable, but it is not likely that there is an efficient (centralised, parallel or distributed) algorithm that solves the problem in the case of large networks.

3.1 Fallacies of Distributed Computing

In 1994, Peter Deutsch, a Sun fellow at the time, drafted seven assumptions that architects and designers of distributed systems are likely to make, which prove wrong in the long run, resulting in all sorts of trouble and pain for the solution and the architects who made the assumptions. In 1997, James Gosling added another such fallacy. The assumptions are now collectively known as "the eight fallacies of distributed computing". They are as follows:
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

1. The network is reliable

If your application is a mission-critical, 24x7 kind of application, you may well hit a network failure, and Murphy will make sure it happens at the most inappropriate moment. Even for applications that are not like that, there are plenty of problems: power failures, someone trips on the network cord, clients suddenly connecting wirelessly, and so on. If hardware isn't enough, the software can fail as well, and it does. The situation is more complicated if you collaborate with an external partner, such as an e-commerce application working with an external credit-card processing service; their side of the connection is not under your direct control. Lastly, there are security threats such as DDoS attacks and the like.

2. Latency is zero

The second fallacy of distributed computing is the assumption that latency is zero. Latency is how much time it takes for data to move from one place to another (versus bandwidth, which is how much data we can transfer during that time). Latency can be relatively good on a LAN, but it deteriorates quickly when you move to WAN or Internet scenarios. Latency is more problematic than bandwidth. Here is a quote from a post by Ingo Rammer on latency vs. bandwidth that illustrates this: "But I think that it's really interesting to see that the end-to-end bandwidth increased by 1468 times within the last 11 years while the latency (the time a single ping takes) has only been improved tenfold. If this wouldn't be enough, there is even a natural cap on latency. The minimum round-trip time between two points of this earth is determined by the maximum speed of information transmission: the speed of light. At roughly 300,000 kilometres per second (3.6 * 10E12 tera angstrom per fortnight), it will always take at least 30 milliseconds to send a ping from Europe to the US and back, even if the processing would be done in real time." (A quick numeric check of this bound appears after fallacy 8 below.)

You may think all is okay if you only deploy your application on LANs. However, even on a LAN with Gigabit Ethernet you should still bear in mind that latency is much bigger than when accessing local memory. Assuming latency is zero, you can easily be tempted to assume that making a call over the wire is almost like making a local call. This is one of the problems with approaches like distributed objects that provide "network transparency", luring you into making a lot of fine-grained calls to objects which are actually remote and (relatively) expensive to call. Taking latency into consideration means you should strive to make as few calls as possible and, assuming you have enough bandwidth (the subject of the next fallacy), to move as much data as possible out in each of these calls.

3. Bandwidth is infinite

This fallacy, in my opinion, is not as strong as the others. If there is one thing that is constantly getting better in relation to networks, it is bandwidth. However, there are two forces at work to keep this assumption a fallacy. One is that while the bandwidth grows, so does the amount of information we try to squeeze through it. VoIP, videos, and IPTV are some of the newer applications that take up bandwidth; downloads, richer UIs, and reliance on verbose formats (XML) are also at work, especially if you are using T1 or lower lines. Even when you think that a 10-Gbit Ethernet would be more than enough, you may be hit with more than 3 terabytes of new data per day (numbers from an actual project). The other force at work to lower effective bandwidth is packet loss. Bandwidth limitations direct us to limit the size of the information we send over the wire. The main implication is to consider that in the production environment of our application there may be bandwidth problems which are beyond our control, and to bear in mind how much data is expected to travel over the wire.

4. The network is secure

You'd think that in the years since the fallacies were drafted, "the network is secure" would no longer be a fallacy. Unfortunately, that's not the case, and not because the network is now secure; no one would be naive enough to assume it is. As an architect you might not be a security expert, but you still need to be aware that security is needed and of the implications it may have (for instance, you might not be able to use multicast, user accounts with limited privileges might not be able to access some networked resource, etc.).

5. Topology doesn't change

The fifth fallacy is about network topology: "topology doesn't change". That's right; it doesn't, as long as it stays in the test lab. When you deploy an application in the wild (that is, to an organisation), the network topology is usually out of your control. The operations team (IT) is likely to add and remove servers every once in a while and/or make other changes to the network. Lastly, there are server and network faults which can cause routing changes. When you're talking about clients, the situation is even worse: there are laptops coming and going, wireless ad-hoc networks, and new mobile devices. In short, topology is changing constantly.

6. There is one administrator

You may be able to get away with this fallacy if you install your software on small, isolated LANs (for instance, a single-person IT "group" with no WAN/Internet). However, for most enterprise systems the reality is much different. The IT group usually has different administrators, assigned according to expertise: databases, web servers, networks, Linux, Windows, mainframe, and the like. That is the easy situation. The problem occurs when your company collaborates with external entities (for example, connecting with a business partner), or if your application is deployed for Internet consumption, hosted by some hosting service, and consumes external services (think mashups). In these situations, the other administrators are not even under your control, and they may have their own agendas and rules.

7. Transport cost is zero

There are a couple of ways you can interpret this statement, both of which are false assumptions. One way is that going from the application level to the transport level is free. This is a fallacy, since we have to do marshalling (serialize information into bits) to get data onto the wire, which takes both computer resources and adds to the latency. Interpreting the statement this way emphasizes the "latency is zero" fallacy by reminding us that there are additional costs, both in time and in resources. The second way to interpret the statement is that the costs (as in cash money) of setting up and running the network are free. This is also far from true. There are costs for buying the routers, costs for securing the network, costs for leasing the bandwidth for Internet connections, and costs for operating and maintaining the network. Someone, somewhere, will have to pick up the tab and pay these costs.

8. The network is homogeneous

Assuming this fallacy should not cause too much trouble at the lower network levels, as IP is pretty much ubiquitous (even a specialized bus like InfiniBand has an IP-over-IB implementation, although it may result in suboptimal use of the non-native IP resources). It is worthwhile, however, to pay attention to the fact that the network is not homogeneous at the application level.
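As a quick back-of-the-envelope check of the speed-of-light latency floor quoted under fallacy 2 (the 4,500 km one-way distance is an assumed rough figure for a Europe-US path; real routes are longer and add processing delay):

    SPEED_OF_LIGHT_KM_S = 300_000           # approximate, in vacuum
    distance_km = 4_500                     # assumed one-way Europe-US distance

    round_trip_s = 2 * distance_km / SPEED_OF_LIGHT_KM_S
    print(f"minimum round trip: {round_trip_s * 1000:.0f} ms")   # -> 30 ms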
3.2 Problems Related to Distributed Computing

Traditional computational problems take the perspective that we ask a question, a computer (or a distributed system) processes the question for a while, and then produces an answer and stops. However, there are also problems where we do not want the system ever to stop. Examples of such problems include the dining philosophers' problem and other similar mutual exclusion problems. In these problems, the distributed system is supposed to continuously coordinate the use of shared resources so that no conflicts or deadlocks occur.

There are also fundamental challenges that are unique to distributed computing. The first example is challenges related to fault tolerance; related problems include consensus problems, Byzantine fault tolerance, and self-stabilisation. A lot of research is also focused on understanding the asynchronous nature of distributed systems:

• Synchronizers can be used to run synchronous algorithms in asynchronous systems.
• Logical clocks provide a causal happened-before ordering of events (see the sketch after this list).
• Clock synchronization algorithms provide globally consistent physical time stamps.
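A minimal sketch of the best-known logical clock, Lamport's: each process keeps a counter that it increments on every local event, stamps onto every message it sends, and, on receipt, advances past the incoming timestamp. The resulting timestamps respect the happened-before ordering: if event e happened before event f, then e's timestamp is smaller.

    class LamportClock:
        def __init__(self):
            self.time = 0

        def tick(self):                  # local event (including a send)
            self.time += 1
            return self.time

        def receive(self, msg_time):     # merge rule: max(local, received) + 1
            self.time = max(self.time, msg_time) + 1
            return self.time

    a, b = LamportClock(), LamportClock()
    a.tick()                 # a's clock: 1
    t = a.tick()             # a sends a message stamped t = 2
    print(b.receive(t))      # b's clock jumps to max(0, 2) + 1 = 3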
3.3 Distributed Computing Application Characteristics

Obviously, not all applications are suitable for distributed computing. The closer an application gets to running in real time, the less appropriate it is. Even processing tasks that normally take an hour or two may not derive much benefit if communication among the distributed systems and the constantly changing availability of processing clients become a bottleneck. Instead, you should think in terms of tasks that take hours, days, weeks, and months. Generally, the most appropriate applications, according to Entropia, consist of "loosely coupled, non-sequential tasks in batch processes with a high compute-to-data ratio". A high compute-to-data ratio goes hand in hand with a high compute-to-communications ratio, as you don't want to bog down the network by sending large amounts of data to each client, though in some cases you can do so during off hours. Programs with large databases that can be easily parsed for distribution are very appropriate.

Clearly, any application with individual tasks that need access to huge data sets will be more appropriate for larger systems than for individual PCs. If terabytes of data are involved, a supercomputer makes sense, as communications can take place across the system's very high-speed backplane without bogging down the network. Server and other dedicated system clusters will be more appropriate for other, slightly less data-intensive applications. For a distributed application using numerous PCs, the required data should fit very comfortably in each PC's memory, with lots of room to spare.

Taking this further, United Devices recommends that the application should have the capability to fully exploit "coarse-grained parallelism", meaning it should be possible to partition the application into independent tasks or processes that can be computed concurrently. For most solutions there should not be any need for communication between the tasks except at task boundaries, though DataSynapse allows some inter-process communication. The tasks and small blocks of data should be such that they can be processed effectively on a modern PC and report results that, when combined with other PCs' results, produce coherent output. And the individual tasks should be small enough to produce a result on these systems within a few hours to a few days.
Chapter 4 APPLICATIONS

There are two main reasons for using distributed systems and distributed computing. First, the very nature of the application may require the use of a communication network that connects several computers: for example, data produced in one physical location may be needed in another location. Second, there are many cases in which the use of a single computer would be possible in principle, but the use of a distributed system is beneficial for practical reasons. For example, it may be more cost-efficient to obtain the desired level of performance by using a cluster of several low-end computers, in comparison with a single high-end computer. A distributed system can be more reliable than a non-distributed system, as there is no single point of failure. Moreover, a distributed system may be easier to expand and manage than a monolithic uniprocessor system.

4.1 Types of Distributed Computing Applications

Following are some of the areas where distributed computing is being used:

I. Telecommunication networks:
• Telephone networks and cellular networks.
• Computer networks such as the Internet.
• Wireless sensor networks.
• Routing algorithms.

II. Network applications:
• World Wide Web and peer-to-peer networks.
• Massively multiplayer online games and virtual reality communities.
• Distributed databases and distributed database management systems.
• Network file systems.

III. Distributed information processing systems, such as banking systems and airline reservation systems:
• Real-time process control.
• Aircraft control systems.
• Industrial control systems.

IV. Parallel computation:
• Scientific computing, including cluster computing, grid computing, and various volunteer computing projects; see the list of distributed computing projects.
• Distributed rendering in computer graphics.

4.2 Security and Standards Challenges

The major challenges come with increasing scale. As soon as you move outside of a corporate firewall, security and standardization challenges become quite significant. Most of today's vendors currently specialize in applications that stop at the corporate firewall, though Avaki, in particular, is staking out the global grid territory. Beyond spanning firewalls with a single platform lies the challenge of spanning multiple firewalls and platforms, which means standards.

Most of the current platforms offer high-level encryption such as Triple DES. The application packages that are sent to PCs are digitally signed, to make sure a rogue application does not infiltrate a system. Avaki, for example, comes with its own PKI (public key infrastructure). Identical application packages are
typically sent to multiple PCs, and the results from each are compared. Any set of results that differs from the rest becomes security-suspect. Even with encryption, data can still be snooped while the process is running in the client's memory, so most platforms create application data chunks that are so small that snooping them is unlikely to provide useful information. Avaki claims that it integrates easily with different existing security infrastructures and can facilitate the communications among them, but this is obviously a challenge for global distributed computing.
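The redundant-results check described above can be as simple as a majority vote over the values returned for the same work unit. A minimal sketch (the node names and returned values are made up):

    from collections import Counter

    def vet_results(results):
        # Take the most common returned value as the consensus and
        # flag every node that disagrees with it as suspect.
        consensus, _ = Counter(results.values()).most_common(1)[0]
        suspects = [node for node, value in results.items() if value != consensus]
        return consensus, suspects

    returned = {"node-a": 42, "node-b": 42, "node-c": 41}
    print(vet_results(returned))   # -> (42, ['node-c'])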
CONCLUSION

Distributed computing utilizes a network of many computers, each accomplishing a portion of an overall task, to achieve a computational result much more quickly than with a single computer. In addition to a higher level of computing power, distributed computing also allows many users to interact and connect openly. Different forms of distributed computing allow for different levels of openness, and most people accept that a higher degree of openness in a distributed computing system is beneficial.
REFERENCES

[1] http://www.wisegeek.com/what-is-distributed-computing.htm
[2] http://www.scribd.com/doc/6919757/BULLY-ALGORITHM
[3] http://www.extremetech.com/article2/0,2845,1153023,00.asp
[4] http://www.informatics.sussex.ac.uk/courses/dist-sys/