Dynamic System Configuration using SOA

Saxion Hogeschool Enschede

Thesis
Dynamic System Configuration using SOA

Version 1.0

Contractors:
Jeroen Rosenberg (jeroen.rosenberg@luminis.nl)
Lesley Wevers (lesley.wevers@luminis.nl)

Supervisors:
Richard van der Laan (richard.vanderlaan@luminis.nl)
Ferenc Schopbarteld (ferenc.schopbarteld@nl.thalesgroup.com)
Douwe van Twillert (d.a.vantwillert@saxion.nl)

Abstract
Thales uses a static configuration to map software components to hardware components.
When hardware fails, this mapping has to be adapted manually to restore the system. As a
result, the system is inoperative for a significant amount of time, which is not acceptable
in the mission-critical systems Thales builds. Thales feels it was not technologically able
to solve this problem in the past, but now sees an opportunity to tackle it using the
principles and patterns of service oriented architectures (SOA). To recover the system,
processes which ran on failed processing nodes can be moved to available processing nodes.
A SOA layer has been defined on top of the radar chain model to coordinate the process of
restoring the system. This SOA layer is realized using the SOA-based OSGi framework and the
R-OSGi extension.

Hengelo, December 22, 2009

Change log
Version  Date        Modifications
0.1      2009-02-09  Initial version
0.2      2009-03-13  Distributed Systems, SOA characteristics
0.3      2009-04-17  Case Study Radar Chain Case, SOA principles and patterns
0.4      2009-05-08  Introduction, Systematic Approach chapter
0.5      2009-05-25  Background, Assignment, Problem Analysis, Solution, Design and Implementation
1.0      2009-06-01  Summary, Conclusion, Retrospective


Samenvatting
This thesis was written as part of a graduation project at luminis in the client context of
Thales. Thales uses very large distributed systems to perform the complex computations involved
in processing radar signals. The mapping of software components to hardware components in this
radar chain is based on a static configuration. If components in the system fail or the
configuration changes, this mapping currently has to be adapted manually, leaving the system
inoperative for a significant period of time. This is unacceptable in the critical systems
Thales uses, such as the radar chain described above. This graduation project therefore
investigated the possibilities of service oriented architecture (SOA) for performing
(re)configuration dynamically, with a focus on the representative Thales radar chain.

In a distributed system such as the radar chain, very different problems can occur than in a
fully local system. A poorly designed distributed system can go down completely because a single
component has failed. Components should be available at all times and be affected as little as
possible by the failure of other components. These aspects have to be taken into account in the
design of a dynamically configurable distributed system. In designing the system we apply a
number of SOA principles and patterns. SOA is an architectural paradigm in software design
based on cooperating services which each perform a certain task. Several SOA patterns solve
subproblems we encounter when designing a dynamically configurable distributed system. The
lookup pattern helps to find available services; the leasing pattern makes it possible to detect
services becoming inactive; and the whiteboard pattern allows the life cycle of components to be
managed consistently. In a logical design we define a number of SOA services to make dynamic
configuration possible.

A proof-of-concept has been implemented using the SOA-based OSGi framework in combination with
R-OSGi, an extension of OSGi. OSGi offers a number of the facilities defined in the logical
design. The OSGi Module layer makes individual components insensitive to the failure of other
components; the Life cycle layer provides dynamic life cycle management of components; and the
WireAdmin provides dynamic configuration of the connections between components.

R-OSGi implements the Service Location Protocol (SLP), which makes it possible to locate and use
services on other processing nodes within a network. In addition, R-OSGi offers so-called
RemoteEvents, which can be broadcast to notify other services within a network of certain
events, such as the loss of a particular service. The logical design has been translated to a
solution within the OSGi model.

First, a simplified representation of the Thales radar chain was implemented, in which failure
scenarios could be simulated. Next, the logical design was implemented on top of the OSGi
framework, making the system dynamically configurable within a local machine. Finally, the
system was adapted using R-OSGi, so that reconfiguration was also possible in a distributed
environment.


Summary
This thesis was written as part of a graduation internship at luminis in the client context of
Thales. Thales uses very large distributed systems to perform the complicated computations
needed for processing radar signals. The mapping of software components to hardware
components in this radar chain is based on a static configuration which has to be adapted
manually in case of failures. As a result the system can be inoperative for a significant amount
of time, which is not acceptable in the critical systems Thales uses, such as the so-called radar
chain. In this project, the possibilities of service oriented architecture (SOA) for dynamic
(re)configuration have been researched, with a focus on the representative Thales radar chain.

A distributed system such as the radar chain can pose quite different problems than a fully local
system. A poorly designed distributed system could crash completely if one component fails.
Components should be available at all times and be fault-tolerant with regard to failures
of other components. These aspects have to be taken into account while designing a dynamically
configurable distributed system. In the design of the system several principles and patterns of
SOA are applied. SOA is an architectural paradigm in software design based on interoperable
services which each perform a certain task. Several subproblems we face while designing the
system are solved by SOA patterns. The lookup pattern allows finding available services; the
leasing pattern provides detection of services becoming inactive; and the whiteboard pattern
allows consistent life cycle management of components. In a logical design a set of SOA services
is defined to allow dynamic configuration.

The SOA-based OSGi framework and the R-OSGi extension have been used to implement a
proof-of-concept. OSGi provides several capabilities defined in the logical design. The OSGi
Module layer allows components to be fault-tolerant; the Life cycle layer provides dynamic life
cycle management and the WireAdmin allows dynamic wiring between components.

R-OSGi implements the Service Location Protocol (SLP), which allows finding and using remote
services within a network. Additionally, R-OSGi provides RemoteEvents, which can be
broadcast to notify other services within a network of certain events, such as the failure of a
particular service. The logical design is translated to a solution within the OSGi model.

First, a simplified model of the Thales radar chain was implemented so that failure scenarios
could be simulated. Second, the logical design was implemented on top of the OSGi framework so
that the system was dynamically configurable on a local machine. Finally, the system was adapted
using R-OSGi for integration in distributed environments.


Preface
This thesis is written as part of the documentation of the graduation internship of Jeroen
Rosenberg and Lesley Wevers. This internship is part of the study in computer science provided
by Saxion Hogescholen in Enschede, the Netherlands. The thesis work was carried out from
February 2009 to June 2009 at the Surface Radar Department of Thales Hengelo, and consisted
of one paper and a prototype developed as a proof-of-concept. This thesis marks the end of our
study.

The following people are acknowledged for their assistance: Ing. D.A. van Twillert, R. van der
Laan, F. Schopbarteld and R. van Hees. We wish to take this opportunity to express our
gratitude to Ir. J.W.M. Stroet and Mr. dr. H.J.A. Mentink for support, advice and encouragement
through ups and downs.

Hengelo, June 2009

Lesley Wevers
Jeroen Rosenberg


Contents
Samenvatting

Summary

Preface

1 Introduction
1.1 Purpose
1.2 Client and organization
1.3 Document structure

I Problem Analysis & Assignment

2 Problem analysis
2.1 Motivation
2.2 Case Study: The Thales Radar Chain
2.2.1 System context
2.2.2 System components
2.3 Problem definition
2.4 Research questions
2.5 Conclusion

3 Assignment
3.1 Goal
3.2 Study scope
3.2.1 Solution criteria
3.2.2 Outside the scope of this study
3.3 Conclusion

II Literature Study

4 Distributed Systems
4.1 Characteristics
4.2 Objectives
4.3 Challenges and Issues
4.4 Conclusion

5 Service Oriented Architectures
5.1 Overview
5.2 Principles
5.3 Patterns
5.4 Conclusion

6 OSGi
6.1 Overview
6.2 Module Layer and Fault-Tolerance
6.3 Lifecycle Layer and Dynamic Life Cycle Management
6.4 Service Layer and Service Discovery
6.5 Event Handling
6.6 Wiring of Processes
6.7 R-OSGi
6.7.1 Remote Service Discovery
6.7.2 Using Remote Services through Dynamic Proxies
6.7.3 Remote Event Handling
6.8 Conclusion

III Solution Approach & Analysis, System Design & Implementation

7 Solution approach
7.1 Analysing a naive solution
7.2 Solution proposal
7.3 Prototype considerations
7.4 Conclusion

8 Solution analysis
8.1 Scenarios
8.1.1 System instantiation
8.1.2 Restoring the system
8.2 Use-case analysis
8.2.1 Use-case diagram
8.2.2 Actors
8.2.3 Administrator use-cases
8.2.4 Configuration system use-cases
8.3 Secondary use-cases
8.4 Conclusion

9 System design
9.1 Design challenges
9.2 Service decomposition
9.3 Service interaction
9.4 Service capabilities
9.5 Service descriptions
9.6 Conclusion

10 System implementation
10.1 Process and link implementations
10.1.1 Processes
10.1.2 Links
10.1.3 Demonstration
10.2 Configuration system local implementation
10.2.1 OSGi
10.2.2 OSGi service mapping
10.2.3 OSGi bundles
10.2.4 Service implementations
10.2.5 Demonstration
10.3 R-OSGi integration
10.4 Conclusion

IV Conclusion & Recommendations

11 Conclusion
11.1 System recovery
11.2 Service oriented architectures
11.3 OSGi and R-OSGi
11.4 Final conclusion

12 Recommendations
12.1 Applying the study results to the O2 framework
12.2 Remove single points of failure
12.3 Code provisioning
12.4 Dynamic configuration

References

Glossary

Appendices

A Sequence diagrams
A.1 Service lifecycle
A.2 Mapping service start
A.3 Process service lifecycle
A.4 Processing node start
A.5 Processing node goes down
A.6 Software system specification changed

List of Figures
1 A high-level overview of the Thales radar chain.
2 A more detailed view of the software processing subsystem.
3 OSGi Framework layering
4 Use-case diagram
5 Class diagram
6 Class diagram
7 Service lifecycle
8 Mapping Service start
9 Process Service lifecycle
10 Processing node start
11 Processing node goes down
12 Software system specification changed


1 Introduction
1.1 Purpose
This document is the final report of the study we carried out during our graduation period. The
main purpose of this study is to validate the following thesis statement:

Thesis statement. The principles and patterns of service oriented architecture contribute to
implementing a system which can automatically restore the health of a software system instance
after it has become damaged due to processing nodes becoming unavailable.

1.2 Client and organization
The graduation assignment was commissioned by Thales and was carried out in close partnership
with luminis, under the supervision of Ferenc Schopbarteld from Thales and Richard van der
Laan from luminis.

luminis is a free-thinking and innovative company which offers a wide range of services in the
fields of consulting, coaching, training, application development and software engineering.
Richard van der Laan, the project supervisor on behalf of luminis, is part of the Software
Development department, one of the six cores of luminis.

Thales operates in several market segments, from marine radar to eTicketing and security.
Within Thales, the software section of the Surface Radar / Technical Unit Processing business
unit is responsible for the development of software for use in radar and optronic systems for
naval and air defense applications.

1.3 Document structure
This document has been structured into four parts which cover different aspects of this study.
Each part consists of a number of chapters discussing related matters. Below, a general overview
is provided of the document structure, including a brief description of the content of the chapters.

• Part I: Problem Analysis, Assignment & Approach

– Problem Analysis
The motivation for the problem is explained, the terminology used in this thesis is
defined, the problem is defined and the research questions are introduced.
– Assignment
The goal of the project is defined, and the scope of the study is set by stating
the solution criteria and the assumptions which have been adopted during the study.


• Part II: Literature Study

– Distributed Systems
The main characteristics and common issues of a distributed system are described.
– Service Oriented Architecture
The principles and relevant patterns of Service Oriented Architecture are detailed.
– OSGi
The OSGi framework is explained and solutions to case-related issues are provided.
Furthermore, the additional solutions provided by the R-OSGi framework are detailed.

• Part III: System Analysis, Design & Implementation

– Solution Approach
The problem is analysed and a functional solution to the problem is provided.
– Solution Analysis
Scenarios for the solution proposed in the previous chapter illustrate how the
solution works. The scenarios are then analysed using a use-case analysis.
– System Design
The design challenges posed by the use-cases are identified. A logical design of
cooperating services addresses these challenges.
– System Implementation
The mapping of OSGi and R-OSGi to the logical design is described. Additionally, the
implementation of the prototype is explained and motivated.

• Part IV: Conclusion, Recommendations & Retrospective

– Conclusion
The main research question is answered, conclusions are drawn regarding the
suitability of service oriented architecture for dynamic reconfiguration and the thesis
statement is validated.
– Recommendations
Recommendations regarding usage of service oriented architecture and future
development are made.


Part I
Problem Analysis & Assignment


2 Problem analysis
This chapter analyses the problem posed by Thales. First, the context of the problem is
explained. Next, a case which captures the essence of the problem is analysed and the associated
terminology is defined. Finally, based on the case study, the problem is defined along with
the main research question.

2.1 Motivation
The Surface Radar department of Thales has developed a generic middleware and service
framework named O2. Within the O2 framework, hardware systems and
software applications can be modelled using UML diagrams and XML. These models can be
read and validated by O2 to generate software components (C or Java) which can run on a
multitude of platforms. Besides software components, O2 is also able to generate hardware
components (VHDL) for applications demanding high performance.

For processing radar signals, Thales uses distributed systems running their O2 framework. These
systems can contain hundreds of hardware boards on which distributed O2 applications can be
run. To instantiate the system, a configuration is defined which maps O2 software components
to the available hardware boards.

While the system is operational, it is not uncommon for hardware boards to fail. If this
happens to a hardware board running a crucial O2 component, the whole system may
fail, possibly resulting in significant downtime of mission-critical systems.

To repair the system in the current situation, a hardware board needs to be replaced and the
mapping has to be adapted to match the new hardware configuration. As this process may take
a while, the system may be down for a significant period of time. This is a serious problem in
mission-critical situations, so a solution has to be found.

2.2 Case Study: The Thales Radar Chain
Thales has defined a case which captures the essence of the problem as described in the previous
section. The case is based on an existing O2-based system where the output of a physical radar
system is processed and transformed to a form which can be displayed on a radar screen. This
section defines the case and the terminology used in later chapters.

2.2.1 System context
In a radar chain, a radar system generates data which has to be displayed on a radar screen in a
form that is understandable by humans. The complete system can be divided into a number of
subsystems. Figure 1 provides a high-level overview of the radar chain, showing the subsystems
and the flow of data between them.

Figure 1: A high-level overview of the Thales radar chain.


The first subsystem is the physical radar system itself. The radar system picks up an analog
signal of electromagnetic waves and transforms this into a digital signal which can be processed
to extract relevant information.

The radar system can generate hundreds of gigabytes of data per second, all of which needs to be
processed in real time. At present this is too much data to be handled by software running on
general purpose processors. Thales has therefore chosen to reduce the data stream to a more
manageable level before moving to software processing. This first processing step is implemented
in hardware, as this allows for much higher processing rates than software implementations.

After this initial processing step, at most a few megabytes of data per second remain to be
processed, which allows further processing to be performed in the software domain. Software
processing is performed on a distributed system to distribute the workload onto multiple
processing nodes. This part of the system will be referred to as the software processing subsystem.

The final step in the radar chain is to actually do something with the processed data. The data
can, for example, be visualized and sent to a screen for an operator to view.

The problem, which is introduced later in this chapter, revolves mainly around the software
processing subsystem. The details of the other subsystems are outside the scope of this study
and will therefore not be discussed further. The next section continues by defining the
software processing subsystem in more detail.

2.2.2 System components
As noted in the previous section, this case study will primarily focus on the software processing
subsystem. This section will define the elements of which this subsystem consists. Also, the
relations between these elements will be defined, together providing a global overview of the
architecture of this subsystem.

Figure 2: A more detailed view of the software processing subsystem.

The software processing subsystem can broadly be divided into two domains, the hardware
domain containing physical hardware, and the software domain containing the software elements.



Hardware domain
The software processing subsystem’s hardware domain consists of processing nodes which are
interconnected by physical connections. A processing node is a physical hardware board capable
of running processes from the software domain.

A processing node is considered to be available if it is able to host processes, or is already
hosting processes. Otherwise, the processing node is considered to be unavailable. A processing
node may become unavailable at any time due to hardware failures or by an administrator turning
the system off.

A collection of interconnected processing nodes is called a hardware system. The topology of a
hardware system may change during operation of the system in the event of processing nodes
becoming available or unavailable.

For the purpose of this case, it can be assumed that all processing nodes within a hardware system
are able to communicate with each other at all times. Processing nodes cannot become isolated by
the network failing, and processing nodes are always connected to the network while they are in an
available state.

Software domain
The software processing subsystem’s software domain consists of the non-physical elements that
make up a system. For the purpose of this case, the software domain consists of processes and
links which together form a software system.

• Process
A process is a unit of software which can accept input on an input port and
which can produce output on an output port.

For every process in a software system, a process configuration is available. The process
configuration contains a name which uniquely identifies a process, and the name of the
runtime which implements the functionality of the process. Based on this configuration, a
process instance can be instantiated on a processing node. A process is considered to be
instantiated if a process instance exists for the process, otherwise the process is considered
to be uninstantiated. For the purpose of this case, it can be assumed that a process can
have at most one process instance at a time.

A process instance can be destroyed to make a process uninstantiated. Also, if a
processing node becomes unavailable, the processes running on it get destroyed and become
uninstantiated.

• Link
A link is a connection from the output port of one process to the input port of another
process. A link allows a data stream to be set up between two processes. The process
producing the output is defined as the producer of the link, and the process accepting
data is defined as the consumer of the link.



For every link in a software system, a link configuration is available. This link
configuration specifies the name of the producer process and the name of the consumer process.
If the producer process is instantiated, a link instance can be instantiated by making the
producer process send its output to the address of the consumer process’s input port.

If a link instance is available for a link, the link can be considered to be instantiated,
otherwise the link is uninstantiated. A link can further be considered valid or invalid. If
both the producer process and the consumer process of a link are instantiated, the link is
valid. If either process of the link is not instantiated, the link is invalid.

A link can be instantiated and invalid at the same time. In this case the producer process
of the link is still sending its output to the previous location of the consumer process, but
the consumer process is no longer instantiated. Furthermore, if an uninstantiated link is
valid, it can be instantiated to set up a data stream between the processes.

A link instance is considered to be healthy if the link it represents is valid, otherwise the
link instance is considered to be damaged. A healthy link instance becomes damaged if
the consumer process becomes uninstantiated.

A link instance can be destroyed to make a link become uninstantiated. This is
accomplished by making the producer process stop sending its output to the consumer process.
Also, a link instance gets destroyed if the producer process becomes uninstantiated.

• Software system
A collection of processes and links which make up a processing chain is defined as a
software system.

The configuration of a software system is defined by a software system specification
consisting of a collection of process configurations and link configurations.

Based on a software system specification, a software system instance can be instantiated
by instantiating processes for all process configurations, and instantiating links for all link
configurations. If a software system instance is instantiated for a software system, the
software system is considered to be instantiated, otherwise it is uninstantiated.

A software system instance is considered to be healthy if all processes and links, as defined
in the software system specification, are instantiated, and all link instances are healthy.
If this is not the case, the instance is considered to be damaged.

When a software system instance is created, at first all processes and links are
uninstantiated. This means that a software system instance always starts in a damaged state.
Bringing a software system instance from a damaged state to a healthy state is defined
as recovering the software system instance.
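
The definitions above can be made concrete with a small model. The following Java sketch is
only an illustration of the terminology, not part of the Thales system or the prototype: the
class name SoftwareSystemModel, its methods, and the way instances are tracked in sets are all
assumptions made for this example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative model of the software domain; class and method names are assumptions. */
final class SoftwareSystemModel {

    /** A link configuration: the names of the producer and the consumer process. */
    static final class Link {
        final String producer;
        final String consumer;

        Link(String producer, String consumer) {
            this.producer = producer;
            this.consumer = consumer;
        }
    }

    private final Set<String> processes;   // unique process names from the specification
    private final List<Link> links;        // link configurations from the specification
    private final Set<String> instantiatedProcesses = new HashSet<>();
    private final Set<Link> instantiatedLinks = new HashSet<>();

    SoftwareSystemModel(Set<String> processes, List<Link> links) {
        this.processes = processes;
        this.links = links;
    }

    void instantiateProcess(String name) {
        if (!processes.contains(name)) {
            throw new IllegalArgumentException("unknown process: " + name);
        }
        // A set models the assumption that a process has at most one instance.
        instantiatedProcesses.add(name);
    }

    /** Destroying a process also destroys every link instance of which it is the producer. */
    void destroyProcess(String name) {
        instantiatedProcesses.remove(name);
        instantiatedLinks.removeIf(link -> link.producer.equals(name));
    }

    /** A link instance can only be created while its producer is instantiated. */
    void instantiateLink(Link link) {
        if (!instantiatedProcesses.contains(link.producer)) {
            throw new IllegalStateException("producer not instantiated: " + link.producer);
        }
        instantiatedLinks.add(link);
    }

    boolean isInstantiated(Link link) {
        return instantiatedLinks.contains(link);
    }

    /** A link is valid if both its producer and its consumer are instantiated. */
    boolean isValid(Link link) {
        return instantiatedProcesses.contains(link.producer)
                && instantiatedProcesses.contains(link.consumer);
    }

    /** Healthy: all processes and links are instantiated and every link instance is valid. */
    boolean isHealthy() {
        if (!instantiatedProcesses.containsAll(processes)) {
            return false;
        }
        for (Link link : links) {
            if (!isInstantiated(link) || !isValid(link)) {
                return false;
            }
        }
        return true;
    }
}
```

In this model a freshly created instance with at least one process reports isHealthy() as
false, matching the statement that a software system instance starts in a damaged state.
Destroying a consumer leaves its link instantiated but invalid, while destroying a producer
removes the link instance altogether.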


2.3 Problem definition
To instantiate a software system, the processes and links as specified in a software system
specification need to be mapped onto available processing nodes. In the current situation, Thales
defines this mapping in a static configuration of hardware components and software
components. In case the configuration of software components or hardware components changes, the
mapping has to be adapted manually to match the new configuration.

While a software system instance is operational, the processing nodes it is instantiated on may
become unavailable due to hardware failures. This causes any processes running on these pro-
cessing nodes to become uninstantiated, resulting in a damaged software system instance. To
restore the software system instance back to a healthy state, its processes and links which have
become uninstantiated have to be instantiated again, and link instances which have become
damaged have to become healthy again.
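
As an illustration of the restoration step described above, the sketch below computes a new mapping in which no process remains on an unavailable node. It is a sketch under stated assumptions rather than the approach taken by Thales: the function name and the dictionary representation of the mapping are hypothetical, it is assumed that any process can run on any available node, and the least-loaded placement rule is an arbitrary choice made only to keep the example deterministic.

```python
def recover_placement(placement, available_nodes):
    """Return a new process-to-node mapping with no process on a failed node.

    placement: dict mapping process name -> node name (hypothetical representation)
    available_nodes: set of names of the nodes that are currently available
    """
    if not available_nodes:
        raise RuntimeError("no processing nodes available for recovery")

    # Processes on nodes that are still available stay where they are.
    new_placement = {p: n for p, n in placement.items() if n in available_nodes}

    # Count the load per available node so displaced processes can be spread out.
    load = {node: 0 for node in available_nodes}
    for node in new_placement.values():
        load[node] += 1

    # Re-instantiate each displaced process on the least-loaded node
    # (ties are broken by node name because the candidates are sorted).
    for process in sorted(p for p in placement if p not in new_placement):
        target = min(sorted(load), key=load.get)
        new_placement[process] = target
        load[target] += 1
    return new_placement


old = {"p1": "n1", "p2": "n2", "p3": "n2"}
new = recover_placement(old, available_nodes={"n1", "n3"})
```

In this example node `n2` has failed, so `p2` and `p3` are moved while `p1` stays on `n1`. After such a remapping, the links of the moved processes would still have to be instantiated again before the instance is healthy.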

In the current situation, a failed processing node has to be replaced and configured to perform
the tasks of the processing node it is replacing. After replacing the failed processing node, the
system becomes healthy again. Thales wants the radar systems to be more reliable in the event
of processing nodes becoming unavailable. In case of the software processing subsystem, this
means that the downtime of software system instances needs to be minimized when a processing
node becomes unavailable. To accomplish this, Thales wants a software system instance to be
able to recover automatically when processing nodes become unavailable.

The problem to be solved to get from the current situation to the desired situation can now be
defined as follows:

“How can a software system instance be automatically restored to health after it has become
damaged due to processing nodes becoming unavailable?”

Thales feels they were not technologically able to handle this problem in the past, so no work
has been done yet to solve the problem. They now see an opportunity to tackle the problem by
using the principles and patterns of service oriented architectures.

A service oriented architecture, or SOA for short, is an architectural style in which related business
processes are grouped and packaged as services which can interoperate to coordinate actions.
Over the past years, SOA has been widely adopted in the industry and, as a result, principles and
patterns have started to emerge to solve common design problems. Some of the principles and
patterns of SOA might be helpful to solve the problem.

Thales wants to know how a system which incorporates the principles and patterns of SOA can be
implemented to solve the problem just defined. The problem to be solved can now be
refined as follows:

“How can a system be implemented, based on the principles and patterns of SOA, which can
automatically restore the health of a software system instance after it has become damaged due
to processing nodes becoming unavailable?”

2.4 Research questions
Now that the problem is defined, additional sub-questions arise. Below is an overview of the
sub-questions that are answered in the upcoming chapters, grouped by the part of this document
in which they are addressed.

Part II
• Which techniques and implementations could contribute to implementing a system to solve
the problem?
• What characterizes a distributed system such as the Thales radar chain?
• What is service oriented architecture?
• How can the principles and patterns of service oriented architecture contribute to
implementing a system to solve the problem?
• Which existing implementations of service oriented architecture could con-

Part III
• How can a dynamic configurable system be realized based on the principles and patterns
of SOA?
• What kind of approach could be taken to restore the health of a damaged system?
• Which scenarios can be identified?
• Which use-cases can be identified?
• Which design challenges need to be solved?
• How can the principles and patterns of SOA be applied to solve these design challenges?
• How can a system be designed to implement the solution?
• How can the system design be implemented?

Part IV
• What conclusions can be drawn based on this study?
• In what ways does service oriented architecture contribute to solving the main problem?
• Which recommendations can be made?

2.5 Conclusion
In this chapter, first the context of the problem was defined. Next, the Thales radar chain case
was introduced in order to define the problem domain and terminology used in this document.
Finally, the problem was defined based on the Thales radar chain case.

3 Assignment
This chapter describes the assignment as given by Thales. First, the goals of this study are
defined. Next, the scope of the study is defined by specifying the solution criteria, making
assumptions about the problem domain as defined by the Thales radar chain case, and specifying
what is outside the scope of this study.

3.1 Goal
The main goal of this study is to determine how a system can be implemented, based on the
principles and patterns of SOA, to automatically restore the health of a software system instance
after it has become damaged due to processing nodes becoming unavailable.

To reach this goal, the following partial goals have been defined:

1. Determine how a software system instance can be restored to health after it has become
damaged due to processing nodes becoming unavailable.

2. Determine how SOA can contribute to implementing a system to automatically restore a
software system instance after it has become damaged due to processing nodes becoming
unavailable.

3. Design and implement a prototype of a solution based on the principles and patterns of
SOA.

3.2 Study scope
3.2.1 Solution criteria
The solution to be found must adhere to the following criteria:

1. The current architecture as described in the case study should be kept intact as much as
possible.

2. Every distinct piece of data may be processed only once per process.

3. The design and implementation must be based on the principles and patterns of SOA.

Further, the following assumptions are made:

1. Processes can run on any processing node.

2. If a processing node is available, it is always connected to all other available processing
nodes.

3. A processing node which is already running is never connected afterwards to another
running processing node.

4. Addresses of processing nodes do not change while the system is operational.

5. All processing nodes have access to all runtimes required by processes.

3.2.2 Outside the scope of this study
This study does not deal with the following aspects:

• Multiple software system specifications.

• Management of software system specifications.

• Connections to the exterior of the software processing subsystem.

• Loss of data which is processed by a software system instance.

• Handling of software failures.

• Optimizing system performance by any means.

• Removing single points of failure from the system.

• Applying the system to or integrating the system with any existing technologies.

3.3 Conclusion
In this chapter the assignment given by Thales was defined. First, the study goals were defined.
Next, the study scope was defined by specifying the solution criteria, stating the assumptions made
about the problem, and defining what is outside the scope of this study.

Part II
Literature Study

4 Distributed Systems
Processing radar signals requires many complicated computations to be performed. To accom-
plish this, Thales has distributed these computations throughout hundreds of hardware boards,
using a technique called distributed computing. Distributed computing is a form of parallel
computing and deals with both hardware and software systems containing more than one pro-
cessing element, storage element, concurrent process or program. Within distributed computing
a program is divided into parts which can run simultaneously on multiple computers within a
network. Such hardware or software systems are called distributed systems.

The subsequent sections provide a more detailed overview of distributed systems, their charac-
teristics and challenges. This chapter thereby attempts to answer the research question:

Research question. What characterizes a distributed system such as the Thales radar chain?

Firstly, the main characteristics regarding distributed computing are detailed. Secondly, (un)handled
issues of distributed systems are discussed. These topics are relevant with respect to the Thales
radar chain case.

4.1 Characteristics
A distributed system is not just another name for a network of computers. It is an application
that executes a collection of protocols to coordinate the actions of multiple processes on a
network, such that all components cooperate to perform a single task or a small set of related
tasks. Components in networked computers communicate and coordinate their actions only by
passing messages. A distributed system is built on top of a network and presents separate
components and multiple computers as if they were a single entity, providing the user (the
consumer) with whatever services are required.

The main goal of a distributed system is to connect users and resources in a transparent, open
(i.e. each subsystem is continually open to interaction with other systems), and scalable way.
Ideally this arrangement is drastically more fault-tolerant and more powerful than many combi-
nations of stand-alone computer systems.

To accomplish this goal, a few requirements have to be met:

• The system must be extremely robust. For instance, it’s unacceptable that error messages
hold up the entire system until required user input is provided.

• Plug and play capability. Additional hardware or software can be instantly added to the
system, without needing to install them.

• High compatibility. Services and devices can interact with one another without the need
for additional configuration.

• Automatic detection of new services or devices (e.g. a camera detects a newly connected
printer).

4.2 Objectives
Reliability is an important aspect in distributed computing. Because different subsystems
include heterogeneous, overlapping and possibly conflicting information (pluralism), the system
has to deal with concurrency and inconsistency. In addition, actions that have been executed and
information that has been published cannot be reverted (monotonicity).

To be truly reliable, a distributed system must have certain characteristics, which are summarized
in the listing below. [3, 4, 5] A distributed system needs to be:

Fault-tolerant It can recover from component failures without performing incorrect actions.

Highly available It can restore operations, permitting it to resume providing services even when
some components have failed.

Recoverable Failed components can restart themselves and rejoin the system, after the cause
of failure has been repaired.

Consistent The system can coordinate actions by multiple components often in the presence
of concurrency and failure. This underlies the ability of a distributed system to act like a
non-distributed system.

Scalable It can operate correctly even as some aspect of the system is scaled to a larger size
(e.g. increasing the size of the network, or the number of users).

Predictable performance The ability to provide desired responsiveness in a timely manner.

Secure The system authenticates access to data and services.

Extensible Interfaces should be cleanly separated and publicly available to enable easy
extension of existing components and the addition of new components.

Interoperable despite heterogeneity Various entities in the system must be able to interop-
erate with one another, despite differences in hardware architectures, operating systems,
communication protocols, programming languages, software interfaces, security models,
and data formats.

4.3 Challenges and Issues
Distributed systems run into problems more frequently than fully local systems. Moreover, some
problem categories, such as (potential) networking problems, are not even relevant in local
systems. Firstly, because processes and their required resources are distributed
across the network, the code or the data used by a process needs to be moved over and over
again. Moving code requires compilation and installation, while moving data requires uniformity
in data formats. Secondly, it can take a lot longer to access remote data, due to latency.
Therefore the time that it will take to complete an operation cannot be bounded in advance
(unbounded nondeterminism). Thirdly, partial failures of the network can be a huge problem if
the unavailability of a node can cause disruption of the other nodes.

The characteristics listed in the previous section are high standards, which are challenging to
achieve. Probably the most difficult challenge is that a distributed system must be able to
continue operating correctly even when components fail. Services have to be highly available
and fault-tolerant. A highly available service is one that continues to provide a possibly
degraded service despite a certain number and type of process failures and despite disconnected
operations. A fault-tolerant service is one that always behaves correctly despite up to a given
number and type of failures.

To design a distributed system with the characteristics listed in the previous section, one must
design for failure. This implies not making any assumptions about the reliability of the
components of a system. Below is a listing of the eight most commonly made (yet false)
assumptions, better known as the eight fallacies of distributed computing [1, 2].

Eight Fallacies

1. The network is reliable

2. Latency is zero

3. Bandwidth is infinite

4. The network is secure

5. Topology doesn’t change

6. There’s one administrator

7. Transport costs are zero

8. It’s a homogeneous network 1

4.4 Conclusion
This chapter focused on several aspects of distributed systems which are relevant with regard
to the Thales radar chain case. We have reviewed some important requirements and objectives
to consider when designing a distributed system. For the Thales radar chain case, robustness,
fault-tolerance and high availability are the most important among these requirements and
objectives. The last part of this chapter focused on important challenges and issues which should
be addressed in our system’s design. In particular, the eight fallacies of distributed computing
should be taken into account.

1 This fallacy was added six years later by James Gosling (inventor of Java).

5 Service Oriented Architectures
The previous chapter focused on common issues and challenges regarding distributed systems
which should be taken into account when designing a dynamically reconfigurable system. This
chapter details architectural principles and patterns that could be of use while designing such a
system. It thereby attempts to answer the following research questions:

Research question. What is service oriented architecture?

Research question. How can the principles and patterns of service oriented architecture
contribute to implementing a system to solve the problem?

The principles and patterns of service oriented architecture treated in this chapter provide the
first step to a logical design.

5.1 Overview
Service oriented architecture, or SOA for short, can essentially be defined as an architectural
paradigm in software design which is based on services which interoperate to perform a certain
task. There is no official definition of SOA, but a more elaborate one is stated by OASIS (Or-
ganization for the Advancement of Structured Information Standards):

“A paradigm for organizing and utilizing distributed capabilities that may be under the control
of different ownership domains. It provides a uniform means to offer, discover, interact with
and use capabilities to produce desired effects consistent with measurable preconditions and
expectations.”

This definition still leaves a lot of gaps to be filled if one wants to implement SOA. Studying
existing SOA implementations shows that their interpretation of most aspects of this definition
varies greatly.

This chapter will further explore the field of SOA by first looking at the characteristics which
define a SOA. The principles for a good SOA design are explained followed by patterns for solving
common design problems in SOA. Finally, the key elements of some SOA implementations are
discussed.

5.2 Principles
Services are the building blocks of SOA applications. They are an embodiment of the separation
of concerns theory which is based on the notion that large problems become easier to handle
as they are broken down into smaller problems. In a way, services in SOA are similar to classes
in object oriented programming. Using classes to break down a problem into separate concerns
works well on a small scale, but as a system gets bigger the sheer number of classes can introduce
a lot of complexity.

Services, however, break problems down at a much coarser level of granularity to solve these
complexity issues. They provide a collection of related capabilities to service consumers and
are therefore called service providers. The definition of a service does not place any limits on what kind of
capabilities a service can provide. A service could for example provide functionality for user
authentication, but it could as well provide access to hardware systems to allow service consumers
to perform computations on those systems.

Over the past years, the industry defined a common set of design principles for SOA which
should make implementing a SOA more successful. Interoperability is fundamental to every one
of these principles and therefore an expected service design characteristic. Moreover, stating
that services should exist implies stating that services should be interoperable. Each of the eight
common principles supports or contributes to interoperability in a way. These principles and
their relation to the overarching principle of interoperability will be discussed in the subsequent
sections.

Loose Coupling
Coupling implies some kind of connection or relationship between entities and thus a level of
dependency. There are numerous types of coupling involved in the design of a service within the
context of SOA, regarding service contracts, their implementation and service consumers. The
principle of loose coupling addresses the reduction (‘loosening’) of these dependencies by the
creation of a specific type of relationship within and outside of service boundaries. By making
the individual services less dependent on others, they are more accessible for different consumers
and interoperability is increased.

Loose coupling could obviously be achieved by detaching the service interface from its underlying
implementation, but the appropriate level of coupling requires that practical considerations be
balanced against various service design preferences. This includes the independent design and
evolution of a service’s logic and implementation while still guaranteeing baseline interoperability
with dependent consumers. [19]
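
Detaching the service interface from its implementation can be illustrated with a small sketch. The names below (`SignalFilter`, `ThresholdFilter`, `PassThroughFilter`, `consume`) are hypothetical and chosen only for illustration; the point is that the consumer is written against the interface alone, so either implementation can be substituted without changing the consumer.

```python
from abc import ABC, abstractmethod


class SignalFilter(ABC):
    """Service interface: the only thing a consumer is allowed to depend on."""

    @abstractmethod
    def process(self, samples):
        """Return the processed list of samples."""


class ThresholdFilter(SignalFilter):
    """One implementation: keeps only samples at or above a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def process(self, samples):
        return [s for s in samples if s >= self.threshold]


class PassThroughFilter(SignalFilter):
    """Another implementation: returns the samples unchanged."""

    def process(self, samples):
        return list(samples)


def consume(service: SignalFilter, samples):
    # The consumer only uses the interface, so the implementation behind it
    # can be exchanged or evolved independently.
    return service.process(samples)


filtered = consume(ThresholdFilter(2), [1, 2, 3])
unchanged = consume(PassThroughFilter(), [1, 2, 3])
```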

Having loosely coupled components also means having a more fault-tolerant system, because
dependencies between components are minimized. When a single component fails, the other
components can remain in an operational state. This is very important in the previously
described Thales radar chain case.

Service Contracts
A service contract communicates the purpose and capabilities of a service and describes how the
service interacts with its consumers. It could be viewed as a composition of functional meta-
data and a set of policies, such as security constraints, transport and service level agreements.
For instance, security requirements may differ when the service is consumed outside a trusted
network. Information about services is limited to what is published in service contracts.

A service contract consists of the following components:

Header section including the name, version, owner and type (e.g. process, data, etc.) of the
service. The name should indicate the functionality of the service in general terms. The
type helps to distinguish the layer in which the service resides.

Functional section contains the functional requirements, invocation means (e.g. SOAP, REST,
Event Trigger, etc.) including the URL and interface, supported operations, methods and
actions of the service. The description should be very accurate.

Non-Functional section contains security constraints and roles, service level agreement which
determines the amount of latency allowed and quality of service which determines the
allowable failure rate. Additionally, in case the service is part of a larger transaction the
means to control this should be indicated.

All services within the same repository should use a standardized format for describing a service
contract to maximize interoperability. Service contracts enable loose coupling by hiding service-
internal details from the outside world behind a facade.
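
The three sections of a service contract map naturally onto a small data structure. The sketch below is a hypothetical representation, not a standardized contract format: all field names, the example values and the `is_met_by` check are assumptions made for illustration, with the service level agreement reduced to a maximum latency and the quality of service to a maximum failure rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    name: str
    version: str
    owner: str
    service_type: str          # e.g. "process" or "data"


@dataclass(frozen=True)
class Functional:
    invocation: str            # e.g. "SOAP", "REST" or "Event Trigger"
    url: str
    operations: tuple          # names of the supported operations


@dataclass(frozen=True)
class NonFunctional:
    roles: tuple               # roles allowed to use the service
    max_latency_ms: int        # service level agreement
    max_failure_rate: float    # quality of service


@dataclass(frozen=True)
class ServiceContract:
    header: Header
    functional: Functional
    non_functional: NonFunctional

    def is_met_by(self, latency_ms, failure_rate):
        # The provider honours the contract if both measured values stay within bounds.
        nf = self.non_functional
        return latency_ms <= nf.max_latency_ms and failure_rate <= nf.max_failure_rate


# Example contract with made-up values.
contract = ServiceContract(
    Header("authentication", "1.0", "example-owner", "process"),
    Functional("REST", "http://example.invalid/auth", ("login", "logout")),
    NonFunctional(("operator",), max_latency_ms=50, max_failure_rate=0.01),
)
```

Because a consumer only ever sees such a contract, the implementation behind it stays hidden, which is what allows the contract to act as a facade.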

Abstraction
A service should never expose how it goes about its business to meet the requirements of
the contract. For example, it doesn’t matter which programming language or platform was used
to implement the service, as long as the service sticks to its side of the contract. Abstracting
service details limits all interoperation to the service contract.

By obeying these guidelines, the underlying service logic can be exchanged or evolved indepen-
dently of the components which rely on the service. This increases the long-term consistency
of interoperability.

Reusability
Reusability forms the base of key service models. The official definition for this principle states:
“Services contain and express agnostic logic and can be positioned as reusable enterprise
resources.” Individual service capabilities should be appropriately defined in relation to an agnostic
(i.e. not specific to any single business process or consumer) service context. Reusability
further requires a high level of interoperability between the service and several potential service
consumers.

Autonomy
The underlying service logic requires a certain autonomy with regard to its execution environment
and resources in order to provide its capabilities in a consistent and reliable way. Increasing this
degree of control to a significant level leads to minimization, or at least reduction, of
dependencies on shared resources. Moreover, it makes the behaviour of the service more
consistently predictable, which increases its reuse potential and thereby its attainable level of
interoperability.

Autonomy on a service level distinguishes service boundaries from one another, although the
service might still share several underlying resources. This can be illustrated, for instance, by
a wrapper service that encapsulates a legacy system which is also used independently of the
service and still shares resources with other legacy based clients.

Autonomy could also be taken one step further, such that the underlying logic is completely owned
by the service. This generally is the case when the supporting service logic has been built from
the ground up. On the one hand, this obviously is advantageous with regard to scalability.
In addition, it provides a more reliable way of countering the risk of a single point of failure
(i.e. a part of a system which, if it fails, will stop the entire system from working). This is
particularly relevant in the previously explored Thales radar chain case, which currently contains
such a single point of failure due to its static configuration. Increasing service autonomy could
decrease the mutual dependency of components of the radar chain and thereby increase the system’s
fault-tolerance. On the other hand, this implies the need to develop and deploy new service
logic, which could increase expenses and effort.

Statelessness
Services, ideally, are designed to contain state information only when this is explicitly required,
because management of this information could compromise their availability and undermine their
scalability potential. A stateless design therefore allows services to interoperate more frequently
and reliably. In such a design, the ability of the surrounding technology architecture to provide
state management delegation and deferral options should be taken into account.

Discoverability
The discoverability characteristic of a SOA is meant to help avoid the accidental creation of
services that are either redundant or implement redundant logic. Owing to the fact that each
particular service operation is meant to provide a potentially reusable piece of automation logic,
metadata that comes attached to a service must sufficiently describe the functionality offered
by its individual operations in addition to its overall purpose.

This particular characteristic is distinct from, though largely consistent with, discoverability
on an architectural level, where the term service discoverability refers to the technology
architecture’s ability to provide a mechanism of discovery (e.g. a service directory or registry).
Such a mechanism becomes part of the overall infrastructure that is meant to support the
implementation of a SOA.

On a service level, the term discoverability refers to the design of an individual service such that
its discoverability is maximized, regardless of the need for it in its surrounding implementation
environment. Even if there’s no need for a service registry, services should be designed as highly
discoverable resources by equipping them with sufficient metadata to properly communicate their
purpose and capabilities. This simply allows services to be more easily located by potential
consumers. In addition, evolutionary governance can be better managed when the service portfolio
increases in size. [10, 11]

When looking at the previously posed Thales radar chain case, service discovery could solve the
problem of checking whether all required services are still up and running. This is of particular
importance, because all services are essential and if they appear to be down, they should be
restarted in some way.

Discovery mechanism To allow service consumers to access the services they request, they need
to know how to find and access those services. To accomplish this, the so-called
Offer-Discover-Interact model can be used. This model consists of the following steps:

Offer When a service provider becomes available, it publishes its services by registering their
interfaces, so other entities can make use of them.

Discover Service consumers can find published services by using a discovery mechanism. Usually,
a consumer sends a lookup request to a service registry, which contains all available
services and provides a service interface for the consumer to ‘communicate’ with.

Interact The service consumer can now use the published services to accomplish its tasks,
through the service interface. The consumer thereby monitors the progress of the service.
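
The three steps above can be sketched with an in-memory registry. This is a minimal sketch under stated assumptions and not the API of SLP, Jini or OSGi: the `ServiceRegistry` and `EchoService` classes and the interface name `"echo"` are hypothetical, and everything runs in a single process instead of across a network.

```python
class ServiceRegistry:
    """Hypothetical in-memory registry keyed by interface name."""

    def __init__(self):
        self._services = {}

    def offer(self, interface_name, service):
        # Offer: a provider publishes a service under the interface it implements.
        self._services.setdefault(interface_name, []).append(service)

    def withdraw(self, interface_name, service):
        # A provider that goes away removes its service again.
        providers = self._services.get(interface_name, [])
        if service in providers:
            providers.remove(service)

    def discover(self, interface_name):
        # Discover: a consumer asks for all services published under an interface.
        return list(self._services.get(interface_name, []))


class EchoService:
    """Trivial example service."""

    def call(self, message):
        return "echo: " + message


registry = ServiceRegistry()
service = EchoService()
registry.offer("echo", service)            # Offer

found = registry.discover("echo")          # Discover
reply = found[0].call("hello")             # Interact

registry.withdraw("echo", service)
remaining = registry.discover("echo")      # no providers left
```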

These steps can be accomplished by using a service discovery protocol such as the Service Lo-
cation Protocol (SLP) or the one provided by the Jini framework. SLP provides a framework
which allows discovering the existence, location and configuration of networked services. Jini
is an open software architecture that enables developers to create services that are adaptable
to changes in the network. Its specification offers a standard lookup service, which can be
discovered with a simple API call once running. [15]

The following steps summarize the procedure for using the SLP or Jini lookup service:

1. The address (SLP) or a connector stub (Jini) is registered with the lookup service, possibly
together with additional attributes that qualify the connector and can be used as filters.

2. The client queries the lookup service and retrieves one or more addresses or connector
stubs that match the query.

3. Finally, the client either obtains a connector that is connected to the server identified by a
retrieved address, or connects directly to the server using the retrieved connector stub.

Composability
Composability of a service addresses its requirement to be capable of participating as an effective
composition member, regardless of whether there is a direct need to be listed in a composition.
Again, interoperability is an important precondition. In addition, succeeding in meeting the
composability requirements often depends on the extent to which services are standardized and
data exchange between them is optimized.

5.3 Patterns
As more and more software systems are developed, similar solutions are used to solve recurring
problems, which causes patterns to emerge. Just like in object-oriented design, in SOA the same
architectural problems arise over and over again. In this section we’ll take a look at a couple of
SOA patterns that are relevant to the problems we need to solve.

Lookup
The lookup pattern provides a way of finding and accessing resources, regardless of whether
they are local or distributed. [12] A resource could initially be anything, for instance a piece of
data. In the current context, a service is regarded as a resource.

Problem A fundamental problem of resource acquisition is finding the resource concerned (if
available) in the first place. Resources could be managed (i.e. added and removed)
by resource providers. Such resource providers could, for example, frequently send broadcast
messages offering available resources, so interested consumers become aware of their existence.
Conversely, consumers could send broadcast messages requesting required resources. The
consumer could then choose the offered resources it needs from all replying resource providers.

Both approaches, however, are inefficient, since lots of messages are sent across the network
(in case of a distributed system). An efficient and inexpensive solution requires [12]:

Availability A resource consumer must be aware of available resources in its environment.

Bootstrapping A resource consumer should be able to obtain an initial reference to a resource
provider that offers the resource.

Location independence Resource consumers and providers should be able to acquire and
provide a resource, respectively, regardless of whether they know each other’s locations.

Simplicity Using the solution should not place a burden on resource consumers and providers.

Solution The lookup pattern addresses this problem by using a so-called lookup service as a
mediating instance. Via this lookup service, the resource provider publishes resources along with
descriptive properties. In the same way, resource providers also register references to themselves.
Consumers can then retrieve these references, search for required resources using the
properties, and finally retrieve and use these resources. [12]

A Jini lookup service contains the service type, IDs and specific attributes of registered services.
Consumers search the lookup service for their desired service, based on type, service ID (if
they happen to know it) or specific attributes. [16]
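
Searching by type, ID or attributes amounts to matching registered items against a template in which unspecified fields act as wildcards. The sketch below illustrates that idea only; it is not the Jini API, and the `ServiceItem`, `Template` and `lookup` names as well as the example services are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServiceItem:
    """A registered service as stored by the lookup service."""
    service_id: str
    service_type: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Template:
    """Search criteria; fields left at their default match anything."""
    service_id: Optional[str] = None
    service_type: Optional[str] = None
    attributes: dict = field(default_factory=dict)

    def matches(self, item):
        if self.service_id is not None and item.service_id != self.service_id:
            return False
        if self.service_type is not None and item.service_type != self.service_type:
            return False
        # Every requested attribute must be present with the same value.
        return all(k in item.attributes and item.attributes[k] == v
                   for k, v in self.attributes.items())


def lookup(items, template):
    """Return all registered items that match the template."""
    return [item for item in items if template.matches(item)]


registered = [
    ServiceItem("id-1", "printer", {"colour": True, "floor": 2}),
    ServiceItem("id-2", "printer", {"colour": False, "floor": 1}),
    ServiceItem("id-3", "camera", {"floor": 2}),
]

printers = lookup(registered, Template(service_type="printer"))
on_floor_2 = lookup(registered, Template(attributes={"floor": 2}))
by_id = lookup(registered, Template(service_id="id-3"))
```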

Leasing
Leasing solves a lot of the problems inherent in distributed computing. Self-healing addresses
one of the primary concerns. Distributed systems should function for a long time without needing
humans to make repairs or reconfigurations. A second concern is evolvability (e.g. upgrading
the system). It is out of the question to take the system down for maintenance. Moreover,
it isn’t guaranteed every machine is reachable to be upgraded smoothly without failures. One
must be able to evolve the system incrementally.

Problem At a certain point, a resource user may lose interest in using a
resource. The resource is then needlessly held, unless the user releases it by explicitly
terminating its relationship with the provider. This not only negatively affects the performance
of resource user and provider, but may also have a degrading effect on resource availability for
other users.

A second problem could occur when dealing with distributed resource users and providers. When
the machine of the latter crashes, the resource user, being uninformed about resources becoming
unavailable, may continue to reference resources which are no longer available. [12]

Solution The primary idea behind leasing is that a lease holder must continuously prove its
interest in using some resource, which can be essentially anything, provided it was allowed access
to it in the first place. So, for every resource used by some resource user a lease is introduced.
This lease is granted by a grantor and obtained by a holder, typically the resource provider
and the resource user respectively. Additionally, the lease specifies a time duration for usage of
the ‘reserved’ resource. [12] If the lease holder fails to demonstrate interest, the lease expires
and the resource is released.

By using leases, the system ensures that failures are detected without requiring any
separate component other than the lease grantor. Leasing also ensures that stale data
is forgotten when leases expire: the system automatically cleans up after failed components,
and the services concerned are forgotten. This also provides a way to evolve parts of the
system in isolation. One is free to plug in a different version of a ‘forgotten’ service.

In a Jini system, for instance, the lookup service stores service items using a time-based
resource reservation called a lease. The grantor of the lease, the lookup service, decides whether
to accept or deny a lease request. While a lease is active, the lease holder (the service) can
cancel it, in which case the corresponding resource is also freed, or renew it. If
the lease isn’t renewed within a certain amount of time, the service is presumed to be unavailable
and will be ‘forgotten’ (i.e. the service item will be cleaned up) [16].
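
The grant, renew, cancel and expire cycle can be illustrated with a small sketch. It is not the
Jini lease API: LeaseGrantor and its methods are hypothetical names, every lease is assumed to
have the same fixed duration, and the clock is injected so that expiry can be demonstrated without
waiting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical lease grantor: holders must renew before expiry or the resource is forgotten. */
class LeaseGrantor {
    private final Map<String, Long> expiryByResource = new HashMap<>();
    private final LongSupplier clock;      // current time in milliseconds (injectable)
    private final long durationMillis;     // assumed fixed lease duration

    LeaseGrantor(LongSupplier clock, long durationMillis) {
        this.clock = clock;
        this.durationMillis = durationMillis;
    }

    /** Grants a new lease or renews an existing one for the given resource. */
    void renew(String resourceId) {
        expiryByResource.put(resourceId, clock.getAsLong() + durationMillis);
    }

    /** The holder explicitly gives up the resource. */
    void cancel(String resourceId) {
        expiryByResource.remove(resourceId);
    }

    /** Forgets all expired leases, then reports whether the resource is still leased. */
    boolean isAlive(String resourceId) {
        long now = clock.getAsLong();
        expiryByResource.values().removeIf(expiry -> expiry <= now);
        return expiryByResource.containsKey(resourceId);
    }
}
```

A crashed holder simply stops calling renew, so the grantor detects the failure by the passage of
time alone and no separate failure detector is needed.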

Proxy
The proxy pattern lets resource consumers communicate with a representative rather than with
the resource itself. This straightforward principle serves many purposes, such as providing
easier access and protection against unauthorized access. [14]

Problem Accessing a component or resource directly is often considered inappropriate.
Configuring its physical location statically is undesirable, and unrestricted
access may be inefficient or even insecure. Additional control mechanisms are needed
to ensure that access to entities proceeds in an efficient, safe and transparent² way. In addition, a
consumer should be able to access any component or resource using the same calling behaviour
and syntax.

Solution The solution to the problems stated above is to let a representative, a so-called proxy,
offer the interface of the entity concerned. This representative performs additional pre- and
postprocessing (e.g. access control, checking or making read-only copies of the original).
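
As an illustration of such preprocessing, the sketch below uses the standard
java.lang.reflect.Proxy class to put an access-control check in front of a resource. The interface
RadarTrackStore and the class ProxyFactory are hypothetical names chosen for this example, and the
allow-list of method names is an assumed, deliberately simple policy.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Set;

/** Hypothetical service interface shared by the real resource and its proxy. */
interface RadarTrackStore {
    String read(int trackId);
    void delete(int trackId);
}

class ProxyFactory {
    /** Wraps target in a proxy that rejects every method whose name is not in allowedMethods. */
    static RadarTrackStore restrict(RadarTrackStore target, Set<String> allowedMethods) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Preprocessing: access control before the call reaches the real resource.
            if (!allowedMethods.contains(method.getName())) {
                throw new SecurityException("not allowed: " + method.getName());
            }
            try {
                return method.invoke(target, args);   // delegate to the real resource
            } catch (InvocationTargetException e) {
                throw e.getCause();                   // rethrow the target's own exception
            }
        };
        return (RadarTrackStore) Proxy.newProxyInstance(
                RadarTrackStore.class.getClassLoader(),
                new Class<?>[] {RadarTrackStore.class},
                handler);
    }
}
```

The consumer calls the proxy with exactly the same syntax as the real object, which is the
uniform calling behaviour asked for in the problem statement. Note that in this sketch the
allow-list also applies to methods inherited from Object, such as toString.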

In a Jini system, each application uses services through so-called proxies. A proxy allows the
program to communicate with the service, but shields its details. Proxies are dynamically down-
loaded by the consumers of the service. This way, extension of functionality can be accomplished
on-the-fly. Proxies use the same protocol as the backend portion of the service. Consumers are
shielded from this information. All they care about is the provided functionality of the service.

One special service, the Lookup Service, keeps track of all the available services and provides
access to them. Services publish themselves by storing their specific proxy in the lookup service.
This publishing process is called join [16]. The Lookup Service now contains a so-called service
entry, which consists of a unique service id, a proxy and a number of attributes which describe
the functionality of the service. Consumers query the lookup service for available services and
the Lookup Service provides the proxy of the requested service (type).
² Full transparency can obscure cost differences between services.


Publish-subscribe
Publish-subscribe is an asynchronous messaging paradigm in which senders (publishers)
characterize messages into classes before posting them, regardless of which receivers
(subscribers), if any, will read them. Subscribers express interest in one
or more classes and only receive messages that are of interest, without knowing which publishers
posted them.

Problem In the traditional tightly coupled client-server paradigm, the client cannot post
messages to the server while the server process is not running, nor can the server receive messages
unless the client is running. This means that system components need to check whether a specific
service is up and running each time they want to send a message to it, which unnecessarily
burdens the system.

Solution The solution to the problem stated above is the decoupling of publishers and sub-
scribers, which can allow for greater scalability and a more dynamic network topology.

Distributed event-based systems use the publish-subscribe paradigm in which an event-generating
object publishes the type of events that will be available for other objects. These systems are
useful for communication between heterogeneous components and their asynchronous nature
allows publishers and subscribers to be decoupled.
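
The decoupling can be made concrete with a minimal broker sketch. MessageBroker is a hypothetical
class, message classes are represented by plain strings, and delivery is done synchronously for
brevity; a real publish-subscribe system would typically queue messages and deliver them
asynchronously.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical message broker: publishers and subscribers only know message classes. */
class MessageBroker {
    private final Map<String, List<Consumer<String>>> subscribersByClass = new HashMap<>();

    /** A subscriber expresses interest in one class of messages. */
    void subscribe(String messageClass, Consumer<String> subscriber) {
        subscribersByClass.computeIfAbsent(messageClass, k -> new ArrayList<>()).add(subscriber);
    }

    /** Delivers to the current subscribers of the class and returns how many received it. */
    int publish(String messageClass, String message) {
        List<Consumer<String>> subscribers =
                subscribersByClass.getOrDefault(messageClass, Collections.emptyList());
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(message);
        }
        return subscribers.size();
    }
}
```

A publisher can post a message even when nobody is listening; it never has to check whether a
particular receiver is up and running.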

Whiteboard pattern
The whiteboard pattern defines a central application manager to handle dependencies between
event sources and event listeners. This straightforward principle is of great importance when
dealing with dynamic behaviour of system components.

Problem The most relevant but least obvious issue with the traditional Listener pattern is
the dependency that is created between the event source and the listener. This is called the life
cycle issue: if the event source goes away, the listener must clean up any references it holds and
vice versa. This removal phase is hard to verify. It is often not handled at all in workstation
environments, where an application is started by the user, because any remaining listeners
are simply discarded when the application exits. However, when dealing with
continuously running applications in a dynamic environment, as in the Thales radar chain case,
consistent life cycle management is extremely important.

Solution Applying the whiteboard pattern solves the problem stated above. Unlike the listener
pattern, the whiteboard pattern leverages a central application manager for handling life cycle
management. Instead of having event listeners track event sources and then register themselves
with the event source, the whiteboard pattern has event listeners register themselves at a central
application manager. When the event source has an event object to deliver, it calls
all event listeners registered with this application manager. As a result, both the event source
(server) and the event listeners (application) become simpler, because they reuse the central
application manager and can delegate to it the responsibility for managing the details of the
dependencies between source and listeners.
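
A minimal sketch of this arrangement is given below. The names Whiteboard, TrackListener and
TrackSource are hypothetical and chosen for this example only; in OSGi the role of the central
application manager is played by the service registry.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener interface for track updates. */
interface TrackListener {
    void onTrack(String track);
}

/** Central registry (the "whiteboard"); sources and listeners never reference each other. */
class Whiteboard {
    private final List<TrackListener> listeners = new CopyOnWriteArrayList<>();

    void register(TrackListener listener)   { listeners.add(listener); }
    void unregister(TrackListener listener) { listeners.remove(listener); }
    List<TrackListener> current()           { return listeners; }
}

/** Event source: looks up the listeners at delivery time instead of keeping its own list. */
class TrackSource {
    private final Whiteboard whiteboard;

    TrackSource(Whiteboard whiteboard) { this.whiteboard = whiteboard; }

    void deliver(String track) {
        for (TrackListener listener : whiteboard.current()) {
            listener.onTrack(track);
        }
    }
}
```

Because the source holds no listener references of its own, a listener that disappears only has to
be removed from the whiteboard, and a source that disappears leaves nothing to clean up.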

5.4 Conclusion
This chapter explained what service oriented architecture (SOA) means. It focused on several
relevant SOA principles and patterns which could be applied when designing our system. In
particular, the lookup pattern (locating available resources), the leasing pattern (detecting services
going down) and the whiteboard pattern (consistent life cycle management) have proven to be
very useful.

By using these patterns and keeping the SOA principles in mind, a set of interoperable services
could be defined for dynamic life cycle management of system components. This way, a logical
design is defined for a dynamically reconfigurable system. The next chapter describes an existing
SOA implementation which could provide many of the required facilities.

6 OSGi
This chapter describes an existing SOA implementation named OSGi. Only the aspects of the
OSGi framework related to the Thales radar chain case are described. For each aspect, we
state which problem posed by the Thales radar chain case it solves. This chapter thereby
attempts to answer the research question:

Research question. Which existing implementations of service oriented architecture could con-

6.1 Overview
OSGi provides a service-oriented, component-based environment for developers and offers stan-
dardized ways to manage the software lifecycle. Technically, OSGi is a specification for a service
platform framework and service bundles. An OSGi implementation has to implement the frame-
work and can optionally provide service bundles which support basic functionalities such as
logging.

The OSGi Framework implements a complete and dynamic component model, which doesn’t
exist in standalone Java/VM environments. It is a service framework in which services, pack-
aged into software components called bundles, can be installed, updated and removed without
restarting the framework. Although it is intended for relatively small embedded devices, it is
widely applicable.

The OSGi framework consists of three layers (see figure 3), namely the module layer, the lifecycle
layer and the service layer. Each of these layers contributes to solving subproblems posed in the
previous chapter.

Figure 3: OSGi Framework layering

6.2 Module Layer and Fault-Tolerance
Because we are dealing with a distributed system, loose coupling between system components is
very important. The system must be fault-tolerant in such a way
that when a processing node fails, the rest of the system remains in an operational state.
Modularization is essential to accomplish this.

Modularization in the OSGi Framework is supported by the module-based class loading
policy defined by the module layer. Usually, Java applications have a flat class loader
architecture. OSGi bundles add a modularization layer to Java which allows modules to declare
shared and private class space and controls linking between modules.

A bundle is the central unit of OSGi. It’s a JAR file which contains resources such as Java code
or native libraries. Bundles are encapsulated and separated from each other by a name space
concept.

OSGi applications can consist of several bundles, each of which is loaded by (at least) one
individual private class loader. Bundles can be used by other applications running on the same
platform, but unless package exports are defined, bundle code is private. Package imports and
package exports define dependencies between bundle code and are stored as additional entries
(the Import-Package and Export-Package headers) in the JAR manifest file. Exported packages
are public and can be used to resolve the imports of other bundles that define a matching package
import. The framework resolves such an import by consulting the package database and creating
a delegation from the importing class loader to the exporting
class loader. This allows dynamic runtime linking of bundle code.
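
The resolution step can be pictured with a heavily simplified model. The classes BundleModel and
PackageResolver below are hypothetical and do not use any OSGi API; package versions and class
loaders are left out, and the first matching exporter is assumed to win.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical, heavily simplified model of a bundle's declared packages. */
class BundleModel {
    final String name;
    final Set<String> exports;   // packages made public to other bundles
    final Set<String> imports;   // packages this bundle needs from others

    BundleModel(String name, Set<String> exports, Set<String> imports) {
        this.name = name;
        this.exports = exports;
        this.imports = imports;
    }
}

class PackageResolver {
    /** Maps each imported package of importer to the name of a bundle that exports it. */
    static Map<String, String> resolve(BundleModel importer, Iterable<BundleModel> installed) {
        Map<String, String> wiring = new HashMap<>();
        for (String pkg : importer.imports) {
            for (BundleModel candidate : installed) {
                if (candidate != importer && candidate.exports.contains(pkg)) {
                    wiring.put(pkg, candidate.name);   // loading of pkg is delegated to this exporter
                    break;
                }
            }
            if (!wiring.containsKey(pkg)) {
                throw new IllegalStateException(importer.name + " cannot resolve " + pkg);
            }
        }
        return wiring;
    }
}
```

Packages that a bundle does not export never appear in any wiring, which is what keeps bundle
internals private.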

When, for instance, a service must accept every class within the framework as input, import
dependencies on packages cannot be determined at compile time. In these cases the dynamic import
mechanism (the DynamicImport-Package header) can be used by specifying a wildcard asterisk (*)
in the bundle manifest. This indicates that additional packages might be required.

6.3 Lifecycle Layer and Dynamic Life Cycle Management
The second important issue is dynamic life cycle management. This means in our case that pro-
cessing nodes running container services, as defined in the logical design in the previous chapter,
could be added to or removed from the system on-the-fly. The rest of the system should remain
in an operational state.

The lifecycle layer introduces dynamics that are normally not part of an application.
Applications or components are deployed as OSGi bundles which can be managed at runtime.
Bundles can be remotely installed, started, stopped, updated and uninstalled without
rebooting the system. The lifecycle layer relies on the module layer for class loading but adds
an API to manage the modules at run time. Each container from our logical design is implemented
as an OSGi bundle and can be inserted into the system on-the-fly.

Bundles have their own Activator class which implements the start and stop methods of the
BundleActivator interface. These methods are invoked when a bundle is started and stopped,
respectively. A so-called BundleContext object, which is passed to these methods, provides access
to the OSGi framework. In general, bundles hold a public static reference to the BundleContext
object after receiving it in the start method. This allows other classes to interact with the
framework.
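
The shape of such an activator is sketched below. To keep the example self-contained it does not
depend on the OSGi libraries: Context and Activator are stand-in interfaces that only mirror the
start/stop structure described above, and ContainerActivator is a hypothetical bundle class.

```java
/** Stand-in for the framework context handed to a bundle; NOT the real OSGi BundleContext. */
interface Context {
    void log(String message);
}

/** Stand-in mirroring the start/stop shape of an activator; NOT the real BundleActivator. */
interface Activator {
    void start(Context context) throws Exception;
    void stop(Context context) throws Exception;
}

/** Hypothetical container bundle that keeps the context for use by its other classes. */
class ContainerActivator implements Activator {
    private static volatile Context context;   // shared reference, set only while the bundle is active

    /** Lets other classes of the bundle reach the framework; null when the bundle is stopped. */
    static Context getContext() {
        return context;
    }

    @Override
    public void start(Context ctx) {
        context = ctx;
        ctx.log("container started");
    }

    @Override
    public void stop(Context ctx) {
        ctx.log("container stopped");
        context = null;                        // drop the reference so stale use is noticed early
    }
}
```

The sketch exposes the reference through a static getter rather than a public field, which is a
design choice of this example; clearing it in stop avoids using a context that is no longer valid.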

Bundles are installed by creating at least one new class loader. Uninstallation is achieved by
disposing of the private class loaders. Implicitly, all the bundle code is then removed from the
system without affecting other bundles. Private code parts of active bundles can be updated
at runtime. Exported code can only be updated when the PackageAdmin service forces the
framework to reload.

6.4 Service Layer and Service Discovery
In our design we’ve defined several services, each with different responsibilities. Services need
to be available for other services or components throughout the system, so some kind of lookup
service, as defined in the previous chapters, is required.

OSGi provides these mechanisms in the service layer. Each bundle may provide multiple services
by registering service objects using the BundleContext. A service is a Java object which can be
used by other bundles. This way, interaction between bundles is decoupled. The service layer
maintains a service registry with all provided services together with an optional set of service
attributes that can be passed to the framework during registration. The service registry makes
it possible for bundles to detect newly added or removed services.

Bundles can retrieve a service by requesting a service reference for the name of an interface,
without knowing whether a service implementing that interface actually exists on the service
platform. Service requests can also contain LDAP filter strings. These filters are matched
against the service attributes³ of a candidate.

ServiceFactories are special kinds of service provider classes. For every bundle that requests
a service, a new instance of the service object is created. However, the framework caches
instances per bundle, so a bundle gets the same instance on repeated requests. To track the
lifecycle of services, the ServiceTracker can be used. The ServiceTracker is a utility class
that tracks all services matching certain criteria.

The ServiceTracker could contribute to solving our problem of detection of services becoming
unavailable. When, for instance, a container service becomes unavailable, the ServiceTracker
detects this. This way, we could find out that a processing node and all its process runtimes
have become unavailable.
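
The idea of being told when a service appears or disappears can be sketched as follows. This is
plain Java and not the OSGi ServiceTracker API: TrackedRegistry and AvailabilityListener are
hypothetical names, and services are identified by simple strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical callback for the arrival and departure of services. */
interface AvailabilityListener {
    void added(String serviceName);
    void removed(String serviceName);
}

/** Hypothetical registry that tells trackers when a service appears or disappears. */
class TrackedRegistry {
    private final Set<String> services = new HashSet<>();
    private final List<AvailabilityListener> trackers = new ArrayList<>();

    /** Starts tracking; services that are already registered are reported immediately. */
    void track(AvailabilityListener tracker) {
        trackers.add(tracker);
        for (String service : services) {
            tracker.added(service);
        }
    }

    void register(String serviceName) {
        if (services.add(serviceName)) {
            for (AvailabilityListener tracker : trackers) tracker.added(serviceName);
        }
    }

    void unregister(String serviceName) {
        if (services.remove(serviceName)) {
            for (AvailabilityListener tracker : trackers) tracker.removed(serviceName);
        }
    }
}
```

In the radar chain case, the removed callback for a container service would be the point at which
the processes of the corresponding node are marked as needing relocation.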

6.5 Event Handling
As stated before, a ServiceTracker could be used to keep track of the status of a certain service.
Additionally, a mechanism is required to notify the rest of the system in case of state changes
of services or components.

OSGi signals state changes in the framework by events. Bundles can subscribe to certain
event types by implementing the corresponding listeners. Events related to lifecycle management
are grouped into FrameworkEvents and BundleEvents. State changes related to services fire
ServiceEvents, which are detected by the previously described ServiceTracker. Bundles can also
generate their own events.

³ Although every Comparable object can be used as an attribute, only boxed types of the eight
basic types, and Vectors and arrays containing them, can be matched unambiguously.

FrameworkEvents are fed into a so-called EventAdmin service. This service provides a generic
framework for interservice communication. Because services dynamically appear
and disappear, the EventAdmin uses the publish-subscribe pattern, which provides loose coupling.
It can be seen as a channel between sending and receiving services.

Events are published under a certain topic based on hierarchical name spaces. This topic is
stored in the property field EVENT_TOPIC. OSGi services generally use the form
fully/qualified/package/ClassName/ACTION. For instance, framework events have the topic
org/osgi/framework/FrameworkEvent/STARTED. Similar to service attributes, events can have
event properties that provide additional information about the event.

Bundles can subscribe to events by registering an EventHandler instance as a service. The
EVENT_TOPIC property is set to an array of relevant topics. To solve the problem of detecting
services becoming unavailable, a bundle which depends on a service could register for
service-down events regarding that service. Again, the asterisk (*) is the wildcard character,
which indicates that all events with a matching prefix in the topic name will be handled. As with
services, an additional LDAP-style filter string can be assigned to the EVENT_FILTER property
to narrow the scope.
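
Prefix matching on hierarchical topics can be sketched as follows. TopicMatcher is a hypothetical
helper and not part of the EventAdmin API; the matching rules stated in its comment are
assumptions made for this illustration.

```java
class TopicMatcher {
    /**
     * Returns true if topic matches pattern. Assumed rules for this sketch:
     * "*" matches every topic, "a/b/*" matches any topic below "a/b/",
     * and any other pattern must equal the topic exactly.
     */
    static boolean matches(String pattern, String topic) {
        if (pattern.equals("*")) {
            return true;
        }
        if (pattern.endsWith("/*")) {
            String prefix = pattern.substring(0, pattern.length() - 1);   // keep the trailing '/'
            return topic.startsWith(prefix) && topic.length() > prefix.length();
        }
        return pattern.equals(topic);
    }
}
```

With these rules a handler subscribed to org/osgi/framework/FrameworkEvent/* would receive the
STARTED event mentioned above, while a handler subscribed to a different package prefix would not.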

To publish events, a bundle has to retrieve the EventAdmin service and invoke sendEvent()
or postEvent() for synchronous or asynchronous delivery of events, respectively. The former
should be used with care due to the risk of deadlocks.

6.6 Wiring of Processes
In addition to (remotely) starting and stopping container services and their corresponding pro-
cess runtimes, we need a mechanism for linking the output of a process to the input of another
process. In OSGi, this can be accomplished by the use of the so-called WireAdmin Service.

The goal of the OSGi WireAdmin Service is to enable services that produce data
to send it to the services interested in that data. The data can be updated dynamically
so that the interested services regularly receive new values. The WireAdmin Service
provides configuration data (in the OSGi ConfigurationAdmin Service) through which new virtual
connections (known as wires) can be established when a new service needs to receive the data
output. Wires that are no longer needed can easily be removed. The main advantage of using the
WireAdmin Service is that it decreases the need for wired bundles to have context-specific
knowledge about the opposite party. They never need to communicate with each other directly,
only through the WireAdmin Service.
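
The concept of a wire between a producing and a consuming service can be sketched as follows.
WireManager is a hypothetical class and not the OSGi WireAdmin API; producers are identified by
plain strings and persistence of the wire configuration is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical wire manager: connects named producers to consumers without them meeting. */
class WireManager {
    /** One virtual connection from a producer to a consumer. */
    private static final class Wire {
        final String producerId;
        final Consumer<Object> consumer;

        Wire(String producerId, Consumer<Object> consumer) {
            this.producerId = producerId;
            this.consumer = consumer;
        }
    }

    private final List<Wire> wires = new ArrayList<>();

    /** Creates a wire; running the returned handle removes the wire again. */
    Runnable connect(String producerId, Consumer<Object> consumer) {
        Wire wire = new Wire(producerId, consumer);
        wires.add(wire);
        return () -> wires.remove(wire);
    }

    /** Called by a producer with a new value; forwards it over every wire attached to it. */
    void update(String producerId, Object value) {
        for (Wire wire : new ArrayList<>(wires)) {   // copy, so consumers may disconnect while notified
            if (wire.producerId.equals(producerId)) {
                wire.consumer.accept(value);
            }
        }
    }
}
```

When a process is moved to another node, only the wires have to be re-created; neither the
producing nor the consuming process needs to know where the other one runs.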

6.7 R-OSGi
OSGi solves a couple of problems related to detection and tracking of services and other system
components on a local machine. Now we need to go one step further, because we’re dealing
with a distributed system. This poses a whole lot of new problems, for instance, some services
