Java Abs Peer To Peer Design & Implementation Of A Tuple Space

PEER TO PEER DESIGN &
IMPLEMENTATION OF A
TUPLE SPACE
Coordination between nodes within distributed systems is a complex problem and
a current focus of research. It needs to take into account issues of performance,
scalability, dependability and heterogeneity.
One interesting method of coordination that can be utilised is a decentralised tuple
space layer built on top of a peer to peer network. Potentially this solution could be
more efficient, flexible, robust and scalable than other coordination
implementations.
The goal of this project is to investigate this assertion by implementing a
distributed and fully decentralised tuple space co-ordination layer on top of a peer
to peer network.
It will provide a virtual shared space that can be accessed by any computer node
within a peer to peer network regardless of its physical location. Nodes within the
network should be able to post data to the shared space and retrieve data from the
shared space based on the content of the data.
The overall goal of this project is to create a distributed system which implements
a tuple space over a peer to peer network. Tuple spaces are a major area of
research within the field of distributed computing at the present moment. Their
primary concern is the coordination of multiple heterogeneous computers in
geographically remote locations in order to achieve a common task.
Communication is achieved through the exchange of tuples in the tuple space,
rather than direct communication between nodes. This is known as asynchronous
and decoupled communication. This could be useful, for example, in a mobile
environment, where there are no guarantees of an ‘always-on’ service. It is also
concerned with achieving these interactions in a scalable, robust and efficient way.

PROJECT TERMINOLOGY
A tuple space is an example of shared associative memory that provides a
repository for bags of tuples (a tuple is a typed set of values - see figure 1.1).
Unlike physical memory where data is stored by its address, a tuple space is
associative in that tuples are stored and retrieved by their content or by their type.
An important distinction is that it is logically shared memory rather than physically
shared. This means that the tuples could be distributed over a set of nodes. The
tuple space simply provides the necessary abstraction for higher level applications.
A tuple space provides decoupled asynchronous communication between nodes in
a network i.e. for a node to communicate data to another node, it does not have to
establish a permanent connection.
Tanenbaum (2002) states that a “distributed system is a collection of independent
computers that appears to its users as a single coherent system”[1].

This definition works well within the context of the tuple space paradigm as a tuple
space provides a single entry point into a distributed system; higher level
applications do not need to concern themselves with the implementation of the
distributed system underneath.

Problem definition
Developing a tuple space over an underlying peer to peer network provides a
number of interesting challenges, namely:
• How does the tuple space decide where the tuples are stored within the system
in such a way they can reliably and efficiently be retrieved?
• How can a flexible solution be provided that will adapt well to many different
application level problems?
• How to separate the different concerns of the system into various components?

AIMS OF PROJECT
The primary aim of this project is to develop a tuple space coordination layer
and to investigate the potential robustness of this solution.
The scalability of the tuple space implementation could also be investigated.
However this is considered to be outside the scope of the project as it would need
considerable time and resources.
A secondary aim is to investigate how this implementation can be mapped on top
of a peer to peer network, more specifically a Chord open network overlay. The
final aim is to investigate how this implementation can be integrated into the Gridkit
architecture as a plug-in for the interaction framework.

Primary aims
• To investigate the role of peer to peer technology in supporting decentralised tuple
space operations.
• To determine an efficient mapping between tuple spaces and Chord like distributed
hash table data structures.
• To investigate issues of flexibility within the system: i.e. how to provide a flexible
solution to the application level without sacrificing other factors.
• To investigate issues of performance within the system.
• To investigate how multi-dimensional data can be efficiently retrieved from the tuple
space.

Secondary aims
Consideration will be given to these aims during design, implementation and evaluation of
the system. However they may not necessarily be covered in depth due to the time
constraints of the project.
• To investigate a component based approach that fits within the Gridkit middleware
architecture.
• To determine what this approach can provide in terms of configurability and re-configurability,
i.e. how the system can be adapted to the application level’s needs.
• To investigate the scalability and robustness of the solution.

Existing Tuple Space Systems
Existing tuple space systems can be classed into two different types: client-server
based and peer to peer based.
This section covers existing peer to peer based systems, the various tuple space
implementations (both client-server based and peer to peer based) currently available
and the different methods available for constructing a peer to peer tuple space
implementation. It also provides a look into the Gridkit and OpenCOM architecture.

Peer to Peer systems
To first understand the requirements of this project, a look into existing peer to peer networks and
their properties will be needed. The motivation for using peer to peer technology as a method of
developing this system will also be considered.
A peer to peer network is one in which all nodes (known as ‘peers’) in the network are equal; there
is no single point of failure. Research into peer to peer networks is focused on how to both store
and find data within the networks. There are two main schools of thought, structured and
unstructured peer to peer networks.

Gnutella
Gnutella is an open source, fully decentralised peer to peer network, originally developed by
Nullsoft. It is an example of an unstructured peer to peer network providing methods for distributed
searches.
Gnutella used a method called query flooding which, although it provided scalability, was inefficient
and did not provide guaranteed lookup results.
Gnutella also provided high fault tolerance, due to its method of sending queries out to every
active node that it is connected to. This ensures that the query propagates its way through the
network even if some connected nodes have failed.
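Query flooding can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the Gnutella wire protocol: the class name FloodSketch, the adjacency-map representation of the network and the hop limit (TTL) are all hypothetical choices made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of TTL-limited query flooding; not the Gnutella protocol itself.
class FloodSketch {
    // Returns every peer that receives a query started at 'origin' when each peer
    // forwards the query to all of its neighbours until the hop limit is used up.
    static Set<String> flood(Map<String, List<String>> neighbours, String origin, int ttl) {
        Set<String> reached = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        reached.add(origin);
        frontier.add(origin);
        for (int hop = 0; hop < ttl && !frontier.isEmpty(); hop++) {
            Deque<String> next = new ArrayDeque<>();
            for (String peer : frontier) {
                for (String n : neighbours.getOrDefault(peer, List.of())) {
                    if (reached.add(n)) { // a peer that has already seen the query drops it
                        next.add(n);
                    }
                }
            }
            frontier = next;
        }
        return reached;
    }

    public static void main(String[] args) {
        Map<String, List<String>> net = Map.of(
            "A", List.of("B", "C"),
            "B", List.of("A", "D"),
            "C", List.of("A", "D"),
            "D", List.of("B", "C", "E"),
            "E", List.of("D"));
        // E is three hops from A, so a TTL of 2 does not reach it.
        System.out.println(flood(net, "A", 2));
        System.out.println(flood(net, "A", 3));
    }
}
```

In this sketch D is reachable through both B and C, which is the redundancy behind the fault tolerance described above. The hop limit is also why a lookup is not guaranteed: data held on a peer beyond the limit is never found.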

Distributed Hash Tables (DHTs)
Distributed Hash Tables have been designed to provide efficient and guaranteed lookups and
reliable resource discovery whilst providing the scalability of solutions such as Gnutella. They work
by partitioning a set of keys and their respective values over a number of nodes within a network.
DHTs can efficiently route messages to the unique owner of a particular key. Most DHTs use
consistent hashing to map keys to nodes. For example, a key is mapped by a hash
function to an ID, then a routing mechanism is used to deliver that key to the node that is
responsible for it.
The value of that key can later be retrieved by hashing the key again, as this produces exactly the
same ID. To route messages in a DHT, each node uses a routing table containing a set of links to
nodes that are close to it; these links in turn form an overlay network. There are many different DHT
implementations, examples being Chord[3] and CAN[4].
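The key-to-node mapping described above can be sketched in a few lines. This is a simplified, single-process illustration rather than any particular DHT: the class name RingSketch, the 16-bit identifier space and the use of SHA-1 are assumptions, and a sorted map stands in for the routing step that a real overlay would perform over the network.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of consistent hashing; names and the 16-bit ID space are illustrative.
class RingSketch {
    static final int BITS = 16;
    private final TreeMap<Integer, String> ring = new TreeMap<>(); // node ID -> node name

    // Hashes a string to an ID in the range [0, 2^BITS) using SHA-1.
    static int id(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                .digest(s.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest).mod(BigInteger.ONE.shiftLeft(BITS)).intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void addNode(String name) {
        ring.put(id(name), name);
    }

    // The owner of a key is the first node at or after the key's ID, wrapping
    // around to the lowest node ID. Assumes at least one node has been added.
    String ownerOf(String key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(id(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch();
        ring.addNode("node-a");
        ring.addNode("node-b");
        ring.addNode("node-c");
        System.out.println("tuple-42 is owned by " + ring.ownerOf("tuple-42"));
    }
}
```

Because the same key always hashes to the same ID, a later lookup reaches the same owner. When a node joins, only the keys that fall between it and its predecessor change owner.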

Chord
Stoica, Morris et al (2001) presented a DHT implementation called Chord. Chord envisaged the
nodes in the overlay network as being conceptually joined in a circle using a type of doubly-linked
list. Chord provided lookups in the network using only O(log n) messages, n being the number of
nodes in the network.
Chord introduced the notion of successors and predecessors. The node whose ID equals or most
closely follows the ID of a key is responsible for providing storage for that key. If that node were to
leave the network, its keys would be moved to the next successor. This method ensures a high level of
robustness whilst at the same time minimising the load placed on nodes: the network adapts itself and
redistributes the keys as the topology of the network changes (i.e. as nodes join and leave).
Each node maintains a routing table with details of nodes logically close to it, for routing Chord
messages. This makes it practical to scale to many nodes. Figure 2.1 shows an example Chord
identifier circle with 3 nodes. Key 1 is located at node 1, key 2 is located at node 2 and key 6 is
located at node 0.
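The key placement described for the identifier circle can be checked with a short sketch. It assumes a 3-bit identifier space (IDs 0 to 7) and nodes with IDs 0, 1 and 2, an assumption chosen to be consistent with the key placements stated above. The class and method names are hypothetical. The routing table is modelled on Chord's finger table, in which entry i of node n points at the successor of n + 2^i.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of Chord-style key placement on a 3-bit identifier circle.
class ChordCircleSketch {
    static final int M = 3;          // identifier bits, so IDs run from 0 to 7
    static final int SIZE = 1 << M;

    // successor(id): the first node at or after id, wrapping past the top of the circle.
    static int successor(TreeSet<Integer> nodes, int id) {
        Integer n = nodes.ceiling(Math.floorMod(id, SIZE));
        return n != null ? n : nodes.first();
    }

    // Entry i of node n's routing table points at successor(n + 2^i), giving M entries.
    static int[] fingers(TreeSet<Integer> nodes, int n) {
        int[] table = new int[M];
        for (int i = 0; i < M; i++) {
            table[i] = successor(nodes, n + (1 << i));
        }
        return table;
    }

    public static void main(String[] args) {
        TreeSet<Integer> nodes = new TreeSet<>(List.of(0, 1, 2)); // assumed node IDs
        for (int key : new int[] {1, 2, 6}) {
            System.out.println("key " + key + " -> node " + successor(nodes, key));
        }
        System.out.println("routing table of node 0: " + Arrays.toString(fingers(nodes, 0)));
    }
}
```

Each routing table entry covers twice the distance of the previous one, which is what allows a lookup to roughly halve the remaining distance at every hop and so finish in a logarithmic number of messages.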

Motivation for using DHT peer to peer technology
Peer to peer networks have a number of interesting properties over traditional communication
models such as client-server. They can be more scalable than their client-server counterparts as
there is no single bottleneck i.e. a central server for the peers in the system to communicate with.
They also can be more robust in terms of both searching data and the storage of data. This is due
to the decentralised operation of servers and possibilities of distributed replication of files across
the network. These factors potentially provide a system with greater availability than existing
approaches.
The DHT variant can provide this functionality combined with efficient guaranteed lookups.

Existing Tuple Space Systems
Existing tuple space systems can be classed into two different types: client-server
based and peer to peer based. This section will present the motivations behind
investigating a peer to peer approach.

Linda Spaces[5]
The tuple space concept was first introduced by Gelernter (1985). It was developed with the
concept of coordination within parallel programming in mind and was designed as an extension to
existing programming languages.
It pioneered the concept of using a logical shared associative memory space to store tuples
and the use of the three tuple operations to write, read and destructively read tuples from the tuple
space. More recently the concept has been adapted for use in coordination within distributed
environments.
It also developed the concept of using ‘template tuples’ to provide lookups within the tuple space.
Template tuples can provide all or some of the values required to retrieve tuples from the tuple
space. They also specify the use of wildcard and range searches to provide flexibility for retrieving
tuples.
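The three operations and template matching can be sketched as a single-process bag of tuples. This is a simplified illustration, not Linda or any existing tuple space API: the class name and the method names write, read and take are assumptions, a null template field stands in for a wildcard, and the operations return null rather than blocking when nothing matches. Range searches and type-only matching are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical single-process sketch of write, read and destructive read.
class LocalTupleSpaceSketch {
    private final List<Object[]> bag = new ArrayList<>(); // a bag, so duplicates are allowed

    // A template matches a tuple of the same length when every non-null template
    // field equals the corresponding tuple field; a null field is a wildcard.
    static boolean matches(Object[] template, Object[] tuple) {
        if (template.length != tuple.length) {
            return false;
        }
        for (int i = 0; i < template.length; i++) {
            if (template[i] != null && !template[i].equals(tuple[i])) {
                return false;
            }
        }
        return true;
    }

    synchronized void write(Object... tuple) {
        bag.add(tuple.clone());
    }

    // Non-destructive read: returns a copy of the first match, or null if there is none.
    synchronized Object[] read(Object... template) {
        for (Object[] t : bag) {
            if (matches(template, t)) {
                return t.clone();
            }
        }
        return null;
    }

    // Destructive read: removes and returns the first match, or null if there is none.
    synchronized Object[] take(Object... template) {
        for (Iterator<Object[]> it = bag.iterator(); it.hasNext(); ) {
            Object[] t = it.next();
            if (matches(template, t)) {
                it.remove();
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        LocalTupleSpaceSketch space = new LocalTupleSpaceSketch();
        space.write("temperature", 21);
        space.write("temperature", 25);
        System.out.println(Arrays.toString(space.read("temperature", null)));
        System.out.println(Arrays.toString(space.take("temperature", 25)));
    }
}
```

The writer and the reader never refer to each other, only to the content of the tuple, which is the decoupling that the tuple space model provides.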

Client-Server based Systems
Many of the tuple space systems currently available are based on the client-server model.
JavaSpaces[6] within the Jini technology platform and TSpaces[7] from IBM are examples of this
approach.
The advantage of this model is its simplicity: it does not have the problems of coordinating the
system over a set of distributed nodes. The primary disadvantage of the client-server model is that
it provides a single point of failure and may place a high load on the server.
These two problems limit the potential scalability of such systems, something which
decentralised tuple space systems are being designed to address.

Motivation for developing a decentralised peer to peer tuple space
The previous section detailed some of the reasons for using peer to peer technology in this
project, notably the potential for greater scalability, performance and robustness. What needs
to be considered is the motivation for developing a tuple space over peer to peer technology.
The tuple space paradigm is at its most useful when used in environments with a large number of
geographically dispersed nodes which have intermittent availability. The traditional
client-server paradigm is a poor fit for such environments, as it would be difficult to enable this sort of
functionality. Peer to peer technology lends itself well to this functionality, and combined with its
other characteristics it makes for an interesting platform.

EXISTING PEER TO PEER TUPLE SPACE SYSTEMS
More recently researchers have been looking to decentralise the operation of tuple
spaces as a way of improving aspects of availability and scalability. This section
will detail some of the approaches that have been considered.

Comet - Li and Parashar (2005) present a system called Comet. Comet makes use of a Hilbert
space filling curve mapping to map tuples onto underlying Chord nodes. Space filling curves will
be described in more detail following this section; however, they basically provide a mapping from
multiple dimensions to a single dimension. The Hilbert space filling curve is a locality preserving
function in that contextually similar tuples are grouped together. This improves the performance of
the system when looking up data and performing range or wildcard searches, as similar tuples
should be grouped together on similar peers.

PeerGameSpace - Wang, Hsiao et al (2005) present a method of using ‘shortcuts’ within a
Chord network to point to various applicable tuples. However, it is not made clear how many
shortcuts would be needed to retrieve tuples in a multi-dimensional context or how efficient and
flexible range queries could be implemented. They have developed a simple peer to peer game to
run on top of this implementation.

Panda - Christian, Duarte et al (2004) present an entirely different method in the Panda system.
Panda uses a two-tier approach to storing tuples within a tuple space. Each tuple is stored in the
underlying node that is responsible for the ‘signature’ of the tuple, the signature being
the complete type of the tuple: an ordered list of the types of its fields. Inside the hash table in that
node, the tuples are stored by a key that is a hash of the ‘content’ of the data.
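The two-tier idea can be sketched as follows. This is an illustration of the description above only and does not reproduce Panda's code: the class name, the use of Java class names as the signature, and the modulo step that stands in for a DHT lookup are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of two-tier storage: node chosen by signature, entry keyed by content.
class TwoTierSketch {
    private final List<Map<Integer, List<Object[]>>> nodes = new ArrayList<>();

    TwoTierSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Tier 1: the signature is the ordered list of field types, e.g. "String,Integer".
    // Fields must be non-null in this sketch.
    static String signature(Object... tuple) {
        return Arrays.stream(tuple)
            .map(f -> f.getClass().getSimpleName())
            .collect(Collectors.joining(","));
    }

    // Stand-in for a DHT lookup: picks a node index from the hash of the signature.
    static int nodeFor(String signature, int nodeCount) {
        return Math.floorMod(signature.hashCode(), nodeCount);
    }

    // Tier 2: inside the chosen node, the tuple is filed under a hash of its content.
    void write(Object... tuple) {
        Map<Integer, List<Object[]>> table = nodes.get(nodeFor(signature(tuple), nodes.size()));
        table.computeIfAbsent(Arrays.hashCode(tuple), k -> new ArrayList<>()).add(tuple.clone());
    }

    // Exact-match read: returns a copy of a stored tuple equal to the argument, or null.
    Object[] read(Object... tuple) {
        Map<Integer, List<Object[]>> table = nodes.get(nodeFor(signature(tuple), nodes.size()));
        for (Object[] t : table.getOrDefault(Arrays.hashCode(tuple), List.of())) {
            if (Arrays.equals(t, tuple)) {
                return t.clone();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TwoTierSketch space = new TwoTierSketch(4);
        space.write("temperature", 21);
        System.out.println(signature("temperature", 21));
        System.out.println(Arrays.toString(space.read("temperature", 21)));
    }
}
```

One consequence visible in the sketch is that all tuples with the same signature land on the same node, so a template with wildcards only has to be evaluated at one node, at the cost of uneven load when one signature dominates.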

Pier - Although not a tuple space in the traditional sense, Pier presents a method of using
‘distributed joins’ and database querying techniques to look up tuples in a layer
implemented on top of a DHT. It currently uses CAN as an underlying DHT overlay. The method
given is designed to be scalable to massively distributed systems.

SPACE FILLING CURVES
The previous section detailed an approach of using space filling curves to map tuple attributes to
Chord nodes. Space filling curves[12] were first described by 19th century mathematician Peano;
examples being Hilbert Space Filling curves and Z-Order curves.
They are ‘curves whose ranges contain the entire 2-dimensional unit square’ or in the case of
many dimensions, the ‘n-dimensional hypercube’. Space filling curves have been presented as a
method for mapping multi-dimensional data (i.e. n-dimensional tuples) into 1-dimension.
This makes them incredibly useful in a distributed context, as they provide a method of mapping
tuples onto the one-dimensional identifier space of Chord nodes.
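As an illustration of such a mapping, the sketch below converts a point on a two-dimensional grid into its position along a Hilbert curve, using the widely published iterative formulation. It is not taken from Comet or any other system discussed here; the class and method names are hypothetical, and how tuple fields would be turned into grid coordinates is left open.

```java
// Hypothetical sketch: position of a grid point along a 2-dimensional Hilbert curve.
class HilbertSketch {
    // Maps (x, y) on an n-by-n grid, n being a power of two, to an index in [0, n*n).
    static int xyToIndex(int n, int x, int y) {
        int d = 0;
        for (int s = n / 2; s > 0; s /= 2) {
            int rx = (x & s) > 0 ? 1 : 0;
            int ry = (y & s) > 0 ? 1 : 0;
            d += s * s * ((3 * rx) ^ ry);
            // Rotate or flip the quadrant so that the sub-curve is oriented correctly.
            if (ry == 0) {
                if (rx == 1) {
                    x = n - 1 - x;
                    y = n - 1 - y;
                }
                int t = x;
                x = y;
                y = t;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int n = 4;
        // Prints the curve index of every cell, one grid row per line.
        for (int y = 0; y < n; y++) {
            StringBuilder row = new StringBuilder();
            for (int x = 0; x < n; x++) {
                row.append(String.format("%3d", xyToIndex(n, x, y)));
            }
            System.out.println(row);
        }
    }
}
```

Cells that are adjacent along the curve are also adjacent on the grid, which is the locality property that makes range searches over neighbouring index values worthwhile. The one-dimensional index could then be used as a key in an identifier space such as Chord's.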

Z-Order curves
Further research indicated the use of Z-Order curves for indexing and querying multi-dimensional
data in a database. This approach was pioneered by Tropf and Herzog (1981)[13]. Binary search
trees were presented as an efficient method of looking up data, supporting both exact and range
searching facilities.
More recently Chawathe, LaMarca et al (2005)[14] have proposed a method of using Z-Order
curves to map 2-dimensional geographical information tuples onto a Distributed Hash Table.
Binary prefix tries (a type of binary tree where each subsequent node searched leads to a certain
piece of data) are used to store and look up data; tuples are only stored in leaf nodes.
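A Z-Order key is obtained by interleaving the bits of the coordinates, and the leading bits of that key are what a binary prefix trie would branch on. The sketch below shows only the key derivation for two dimensions; it does not implement a trie or reproduce the cited systems, and the class and method names are hypothetical.

```java
// Hypothetical sketch of 2-dimensional Z-Order (Morton) keys and their binary prefixes.
class ZOrderSketch {
    // Interleaves the low 'bits' bits of x and y: x fills the even bit positions
    // and y fills the odd ones, giving a key of 2 * bits bits.
    static long interleave(int x, int y, int bits) {
        long z = 0;
        for (int i = 0; i < bits; i++) {
            z |= ((long) (x >> i) & 1L) << (2 * i);
            z |= ((long) (y >> i) & 1L) << (2 * i + 1);
        }
        return z;
    }

    // Renders the key as a fixed-width binary string, most significant bit first.
    static String toBinary(long z, int bits) {
        StringBuilder sb = new StringBuilder();
        for (int i = 2 * bits - 1; i >= 0; i--) {
            sb.append((z >> i) & 1L);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long a = interleave(3, 5, 3);
        long b = interleave(2, 6, 3);
        // Both points lie in the same quadrant, so their keys share the first two bits.
        System.out.println(toBinary(a, 3));
        System.out.println(toBinary(b, 3));
    }
}
```

Every two leading bits of the key select one quadrant of the remaining area, so a rectangular range query corresponds to a small set of key prefixes rather than an arbitrary set of keys.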

GRIDKIT / OPENCOM ARCHITECTURE

Gridkit
Coulson, Grace, Blair et al (2004)[16] describe a middleware called Gridkit being developed for use
within grid computing. Grid middleware acts as an intermediary between the different components
within a distributed system; therefore allowing potential grid applications to make use of this
functionality without having to understand the underlying complexity.
Gridkit is a component based architecture that supports a number of middleware services such as
interaction types, resource discovery, resource management and security. These middleware
services are built on top of an ‘open overlays’ layer which in turn abstracts over the underlying
communications layer to provide various support e.g. peer to peer communication.

The main concerns for this project are interaction types and overlays. Overlay networks are
‘virtual’ networks layered over an underlying physical network. An example of an overlay network
is the Chord distributed hash table described previously. Interaction types are built on top of the
overlay networks to provide a particular service desirable to higher level applications. Publish-
Subscribe and Tuple Spaces are examples of interaction types that could be used within the
Gridkit framework.

Component architecture
Gridkit makes use of OpenCOM v2 (2004)[17] as its component object model. The main concepts
that need to be understood are those of interfaces, receptacles and connections. Interfaces describe
a unit of service provision, receptacles describe a unit of service requirement and connections
allow for the binding between components with receptacles and interfaces. This component model
will be used within the development of this system as it will provide a method of integrating the
tuple space system within Gridkit and the existing Chord/DHT components. OpenCOM, however,
does not take into account the distribution of the system (i.e. different components situated on
different nodes); therefore the Chord layer makes use of Java RMI to provide a method of
invoking methods on components situated on different nodes.
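The relationship between interfaces, receptacles and connections can be sketched in plain Java. Every name below is hypothetical and the sketch does not use or reproduce the OpenCOM API; it only illustrates how a component that requires a service can be bound to, and later re-bound to, a component that provides it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this does not use or reproduce the OpenCOM API.

// An interface: a unit of service provision.
interface KeyValueStore {
    void put(String key, String value);
    String get(String key);
}

// A receptacle: a unit of service requirement that can be connected to a provider.
class Receptacle<T> {
    private T provider;

    // A connection binds this receptacle to a component offering the interface.
    void connect(T provider) {
        this.provider = provider;
    }

    T get() {
        if (provider == null) {
            throw new IllegalStateException("receptacle is not connected");
        }
        return provider;
    }
}

// A component that provides the KeyValueStore interface.
class InMemoryStore implements KeyValueStore {
    private final Map<String, String> data = new HashMap<>();

    public void put(String key, String value) {
        data.put(key, value);
    }

    public String get(String key) {
        return data.get(key);
    }
}

// A component that requires a KeyValueStore without naming a concrete implementation.
class TupleSpaceComponent {
    final Receptacle<KeyValueStore> store = new Receptacle<>();

    void post(String key, String value) {
        store.get().put(key, value);
    }

    String fetch(String key) {
        return store.get().get(key);
    }

    public static void main(String[] args) {
        TupleSpaceComponent tupleSpace = new TupleSpaceComponent();
        tupleSpace.store.connect(new InMemoryStore());
        tupleSpace.post("greeting", "hello");
        System.out.println(tupleSpace.fetch("greeting"));
    }
}
```

Because the requiring component only sees the interface, the provider behind the receptacle could be replaced by a different overlay component without changing the tuple space code.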

Motivation for using component architecture
There are many advantages to the use of a Gridkit/component based architecture as described
above. Namely, it creates the possibility of a configurable/reconfigurable solution, as per the aims.
The tuple space does not necessarily have to be dependent on a single overlay network (such as
Chord); a different component could be selected depending on the needs of the system. Similarly,
differing components could also be created to represent a variety of tuple space algorithms which
could be adapted to the needs of the application level.
