1. Project overview, use cases, specifications,
software development and experimental activities
RINA Workshop, Dublin, January 28th–29th 2014
Investigating RINA as an Alternative to TCP/IP
2. Agenda
• Project overview
• Use cases
– Basic scenarios (Phases 1 and 2)
– Advanced scenarios (Phases 2 and 3)
• Specifications
– Shim DIF over 802.1Q
– PDU Forwarding Table Generator
– Y2 plans
• Software development
– High level software architecture
– User-space
– Kernel-space
– Wrap-up
• Experimental activities
– Intro, goals, Y1 experimentation use case
– Testbed and results at i2CAT OFELIA island
– Testbed and results at iMinds OFELIA island
– Conclusions
3. Project at a glance
• What? Main goals
– To advance the state of the art of RINA towards an architecture reference model and specifications that are closer to enabling implementations deployable in production scenarios.
– The design and implementation of a RINA prototype on top of Ethernet will enable the experimentation and evaluation of RINA in comparison to TCP/IP.
• Who? 5 partners (from 2014), organized in 5 activities:
– WP1: Project management
– WP2: Architecture, Use cases and Requirements
– WP3: Software Design and Implementation
– WP4: Deployment into OFELIA testbed, Experimentation and Validation
– WP5: Dissemination, Standardisation and Exploitation
• Budget: total cost €1,126,660; EC contribution €870,000
• Duration: 2 years (start date: 1st January 2013)
• External Advisory Board: Juniper Networks, ATOS, Cisco Systems, Telecom Italia
4. Objectives (I)
• Enhancement of the RINA specifications
– The specification of a shim DIF over Ethernet
– The completion of the specifications that enable DIFs that provide a
level of service similar to the current Internet (low security, best-effort)
– The project use cases
• RINA Open Source Prototype for the Linux Operating System
– Targeting both the user and kernel spaces, allowing RINA to be used on top of different technologies (Ethernet, TCP, UDP, etc.)
– It will provide a solid baseline for further RINA work after the project. IRATI will set up an initial open source community around the prototype.
5. Objectives (II)
• Experimentation with RINA and comparison with TCP/IP
– IRATI will follow iterative cycles of research, design, implementation and experimentation, with the experimental results feeding back into the research of the next phase
– Experiments will collect and analyse data to compare RINA and TCP/IP in aspects such as: application API, programmability, cost of supporting multi-homing, simplicity, etc.
• Interoperability with other RINA prototypes
– The achievement of interoperability between independent
implementations is a good sign that a specification is well done and
complete.
– Current RINA prototypes target different programming platforms
(middleware vs. OS kernel) and work over different underlying
technologies (UDP/IP vs. Ethernet) compared to the IRATI prototype.
6. Objectives (III)
• Provide feedback to OFELIA
– Apart from the feedback to the OFELIA facility in terms of bug reports
and suggestions of improvements, IRATI will actively contribute to
improving the toolset used to run the facility.
– Moreover, experimentation with a non-IP based solution is an interesting use case for the OFELIA facility, since IRATI will be the first to conduct this type of experiment in the OFELIA testbed.
7. Project Outcomes
• Enhanced RINA architecture reference model and specifications, contributed to the Pouzin Society for experimentation. IRATI will focus on advancing the RINA state of the art in the following areas:
– DIFs over Ethernet
– DIFs over TCP/UDP
– DIFs for hypervisors
– Routing
– Data transfer
• Linux OS kernel implementation of the RINA prototype over Ethernet
– By the end of the project an open source community will be set up to allow the research/industrial networking community to use the prototype and/or contribute to its development
• Experimental results of the RINA prototype, compared to TCP/IP
• DIF over TCP/UDP extensions, interoperable with existing RINA prototypes
11. Basic use cases
Shim DIF over Ethernet
• Goal: to ensure that the shim DIF over Ethernet provides the required functionality. The purpose of a shim DIF is to provide a RINA interface to the capability of a legacy technology, rather than give the legacy technology the full capability of a RINA DIF.
12. Basic use cases
Turing machine DIF
• Goal: to provide a testing scenario to check that a normal DIF complies with a minimal set of functionality (the “Turing machine” DIF).
14. Advanced use cases
Introduction
• RINA applied to a hybrid cloud/network provider
– Mixed offering of connectivity (Ethernet VPN, MPLS IP VPN, Ethernet Private Line, Internet Access) + computing (Virtual Data Center)
– Scope: datacenter design, access network, wide area network
15. Advanced use cases
Modeling
[Figure: reference network model. An MPLS backbone interconnects PE routers; CE routers connect Customer 1 sites A–C and Customer 2 sites A–C; an Internet gateway reaches the public Internet and end users; two data centers attach through top-of-rack (TOR) switches to hypervisors (HV) hosting VMs.]
16. Advanced use cases
Enterprise VPN over operator’s network
Wide Area Network
• Logical separation of customers through: MPLS encapsulation, BGP-based MPLS VPNs and Virtual Routing and Forwarding (VRF)
Access network
• Use of Ethernet switching within
metro-area networks
• Logical separation of traffic
belonging to multiple customers
implemented through IEEE 802.1Q
17. Advanced use cases
Enterprise VPN over operator’s network: Applying RINA
• Backbone DIF: provides the equivalent of the MPLS network. This DIF must be able to provide flows with “virtual circuit” characteristics, equivalent to MPLS LSPs.
• Provider top-level DIF: provides IPC services to the different customers, by connecting together the CE routers. The DIF may provide different levels of service, depending on the customer’s requirements. There may be one or more of these DIFs (one per customer, one for all the provider’s customers, etc.).
• Intra customer-site DIFs: DIFs whose scope is a single customer site. Their characteristics will depend on the size and needs of the customer (e.g. could be a campus network, an enterprise network, etc.).
• Customer A DIF: can provide connectivity to all the application processes within customer A’s organization. More specialized DIFs targeting concrete application types (e.g. voice, file transfer) could be created on top.
18. Advanced use cases
Hypervisor integration: With TCP/IP
[Figure: TCP/IP hypervisor networking in the datacenter. On each hypervisor machine, VMs (192.168.1.1–.3) attach through shared memory to virtual interfaces (vif1.0–vif3.0) plugged into software bridges (SW bridge 0/1); the bridges connect via VLAN-tagged subinterfaces (eth0.2 on VLAN 2, eth1.5 on VLAN 5) to the hypervisor’s physical NICs and on to the Top of Rack switch, which carries traffic out of the DC.]
19. Advanced use cases
Hypervisor integration: With RINA
[Figure: the same setup with RINA. A green customer DIF spans the VMs on both hypervisors. Each VM reaches its hypervisor through a shim DIF for HV; the hypervisors reach each other and the TOR switch through a shim DIF over 802.1Q, which also carries traffic out of the DC (to the customer VPN or Internet gateway).]
20. Advanced use cases
VDC + Enterprise VPNs over the Internet: With TCP/IP
[Figure: VDC + enterprise VPNs over the Internet with TCP/IP. The green and blue customer premises each contain customer machines behind a switch, a NAT/gateway and a border router; both connect over the public Internet to the border router (NAT/gateway, eth0–eth3) of the datacenter premises.]
21. Advanced use cases
VDC + Enterprise VPNs over the Internet: With RINA
[Figure: the same scenario with RINA. The green customer DIF spans the customer’s VMs in the datacenter and the servers at the customer premises. Inside the DC: a shim DIF for HV (shared memory) between VMs and hypervisors, and a shim DIF over 802.1Q (VLAN 2) connecting hypervisors, TOR and the DC border router. A shim DIF over TCP/UDP crosses the public Internet between the DC border router and the customer border router. At the customer premises a shim DIF over 802.1Q (VLAN 10) runs over a layer-2 switch to the servers.]
22. Agenda
• Project overview
• Use cases
– Basic scenarios (Phases 1 and 2)
– Advanced scenarios (Phases 2 and 3)
• Specifications
– Shim DIF over 802.1Q
– PDU Forwarding Table Generator
– Y2 plans
• Software development
– High level software architecture
– User-space
– Kernel-space
– Wrap-up
• Experimental activities
– Intro, goals, Y1 experimentation use case
– Testbed and results at i2CAT OFELIA island
– Testbed and results at iMinds OFELIA island
– Y2 plans
24. Shim DIF over Ethernet
General requirements
• The task of a shim DIF is to put as small a veneer as possible over a legacy protocol to allow a RINA DIF to use it unchanged.
• The shim DIF should provide no more service or capability than the legacy protocol provides.
25. Examining the Ethernet Header
• Ethernet II: specification released by DEC, Intel,
Xerox (hence also called DIX Ethernet)
Frame layout:
  Preamble                   7 bytes
  MAC dest                   6 bytes
  MAC src                    6 bytes
  802.1Q header (optional)   4 bytes
  Ethertype                  2 bytes
  Payload                    42–1500 bytes
  FCS                        4 bytes
  Interframe gap             12 bytes
26. Ethertype
• Identifies the syntax of the encapsulated protocol
• Layers below need to know the syntax of the layer
above
• Layer violation!
27. Consequences of using an Ethertype
• Using an Ethertype also means only one flow can be distinguished between an address pair
• The MAC address doubles as the connection endpoint-id
28. Shim DIF over Ethernet
Environment
29. Address Resolution Protocol
• Resolves a network address to a hardware address
– Most ARP implementations do not conform to the standard
– The shim IPC Process assumes an RFC 826 compliant implementation
30. Usage of ARP
• Maps the application process name to a shim IPC Process address (MAC address)
– The application process name is transformed into a network protocol address by concatenating its four components:
    Process name:     My_IPC_Process
    Process instance: 1
    Entity name:      Management
    Entity instance:  2
    →  My_IPC_Process/1/Management/2
– Application registration adds an entry in the local ARP cache
• A flow allocation request results in an ARP request/reply
– Instantiates a MAC protocol machine equivalent of DTP (cf. Flow Allocator)
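As an aside, the name-to-address transformation above is plain string concatenation of the four name components. A minimal user-space sketch (the struct and function names here are hypothetical illustrations, not the IRATI API):

  #include <stdio.h>

  /* Hypothetical name tuple; field names are illustrative. */
  struct apn {
          const char *process_name;
          const char *process_instance;
          const char *entity_name;
          const char *entity_instance;
  };

  /* Flatten the tuple into the string registered as the ARP "network
   * protocol address", e.g. "My_IPC_Process/1/Management/2". */
  static int apn_to_arp_addr(const struct apn *n, char *buf, size_t len)
  {
          int ret = snprintf(buf, len, "%s/%s/%s/%s",
                             n->process_name, n->process_instance,
                             n->entity_name, n->entity_instance);
          return (ret < 0 || (size_t) ret >= len) ? -1 : 0;
  }

  int main(void)
  {
          struct apn n = { "My_IPC_Process", "1", "Management", "2" };
          char addr[128];

          if (!apn_to_arp_addr(&n, addr, sizeof(addr)))
                  printf("%s\n", addr);  /* My_IPC_Process/1/Management/2 */
          return 0;
  }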
32. PDU Forwarding Table Generator
Requirements and general choices
It’s all policy!
• Every DIF can do it its own way
• We start with a link-state routing approach
33. PDU Forwarding Table Generator
High-level view and relationship to other IPC Process components
[Figure: the PDU Forwarding Table Generator inside the IPC Process. The Resource Allocator reports events on N-1 flows to nearest neighbors (allocated, deallocated, down, up); the generator updates its knowledge of the N-1 flow state and propagates it by invoking write operations on objects at neighbor IPC Processes, sent as outgoing CDAP messages through the RIB Daemon over the layer-management N-1 flows. Symmetrically, incoming CDAP messages (“neighbor B invoked write operation on object X”) and Enrollment Task events (“enrollment completed successfully”) feed the generator, which recomputes the PDU Forwarding Table. The Relaying and Multiplexing Task looks up that table to select the output N-1 flow (data transfer) for each PDU.]
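To make the link-state choice concrete, here is a self-contained sketch of a forwarding-table recomputation driven by N-1 flow state, with a breadth-first search standing in for whatever path-computation policy a DIF would actually configure; all names and the fixed-size tables are illustrative assumptions, not IRATI data structures:

  #include <stdio.h>
  #include <string.h>

  #define MAX_NODES 8

  /* flow-state database: link_up[i][j] != 0 means an N-1 flow i<->j is up */
  static int link_up[MAX_NODES][MAX_NODES];
  /* next_hop[d] = neighbor to forward to for destination address d */
  static int next_hop[MAX_NODES];

  /* Recompute next hops from 'src' with a breadth-first search; a real
   * generator would rerun this after every flow-state change. */
  static void recompute(int src)
  {
          int parent[MAX_NODES], queue[MAX_NODES], head = 0, tail = 0;

          memset(parent, -1, sizeof(parent));
          parent[src] = src;
          queue[tail++] = src;
          while (head < tail) {
                  int u = queue[head++];
                  for (int v = 0; v < MAX_NODES; v++)
                          if (link_up[u][v] && parent[v] < 0) {
                                  parent[v] = u;
                                  queue[tail++] = v;
                          }
          }
          for (int d = 0; d < MAX_NODES; d++) {
                  if (d == src || parent[d] < 0) {
                          next_hop[d] = -1;   /* self or unreachable */
                          continue;
                  }
                  int hop = d;
                  while (parent[hop] != src)  /* walk back to the first hop */
                          hop = parent[hop];
                  next_hop[d] = hop;
          }
  }

  int main(void)
  {
          /* topology: 0-1, 1-2, 0-3 (bidirectional N-1 flows) */
          link_up[0][1] = link_up[1][0] = 1;
          link_up[1][2] = link_up[2][1] = 1;
          link_up[0][3] = link_up[3][0] = 1;

          recompute(0);
          printf("to 2 via %d\n", next_hop[2]);  /* via 1 */

          link_up[1][2] = link_up[2][1] = 0;     /* "N-1 flow down" event */
          recompute(0);
          printf("to 2 via %d\n", next_hop[2]);  /* unreachable: -1 */
          return 0;
  }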
34. Plans for Year 2
• Shim DIF for Hypervisors
– Enable communications between VMs in the same physical
machine without using the networking subsystem
• Updated shim DIF over TCP/UDP
– The current version requires manual discovery of mappings of app names to IP addresses and TCP/UDP ports; investigate the use of DNS
• Updated PDU Forwarding Table Generator
– Based on lessons learned from implementation and experimentation
• Feedback to EFCP
– Based on implementation and experimentation experience
• Faux sockets API
35. Agenda
• Project overview
• Use cases
– Basic scenarios (Phases 1 and 2)
– Advanced scenarios (Phases 2 and 3)
• Specifications
– Shim DIF over 802.1Q
– PDU Forwarding Table Generator
– Y2 plans
• Software development
– High level software architecture
– User-space
– Kernel-space
– Wrap-up
• Experimental activities
– Intro, goals, Y1 experimentation use case
– Testbed and results at i2CAT OFELIA island
– Testbed and results at iMinds OFELIA island
– Y2 plans
37. Project targets and timeline (SW)
• IRATI SW goals:
– Release 3 SW prototypes in 2 years
– Each prototype provides incremental functionalities
• 1st prototype: basic functionalities (unreliable flows), comparable to UDP/IP
• 2nd prototype: “complete” stack (reliable flows + routing), comparable to TCP/IP
• 3rd prototype: enhancements (hardened proto + RINA over IP + …)
– More product-like than prototype-like
– Glancing at extensibility, portability, performance & usability
• The SW components live in both kernel & user space
38. Problems …
• Problems are mostly SW-engineering related, under time constraints:
1. Ref-specs → HL arch
2. HL arch → detailed design
3. Detailed design → implementation, debug, integration …
• Since the IRATI stack spans user and kernel spaces…
• User-space problems (as usual):
– Memory (e.g. corruptions, leaks)
– Bad logic (e.g. faults)
– Concurrency (e.g. deadlocks, starvation)
– …
– Nothing that special (but … time consuming for sure)
39. … and problems
• Kernel-space problems are the user-space ones PLUS:
– A harsher environment, e.g.
• The develop, install & test cycle is (a lot) slower
– Huge code-base (takes a lot to compile)
– Faults in the kernel code may bring the whole host down
– Reboots are usually required to test a new “version” (at early stages)
• C is “the” language → less expressive than others in userland
• No “external libraries” …
– The kernel is “cooperative”, e.g.
• Stack & heap handling must be “careful”, e.g.
– Memory corruptions could propagate everywhere
– Different mechanics, e.g.
• Mutexes, semaphores, spinlocks, RCUs … coupled with un-interruptible sleeps
– Syscalls may sleep … but spinlocks can’t be held while “sleeping”
• No recursive locking
• Memory allocation comes in different flavours: NOWAIT, NOIO, NOFS …
– … … …
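One example of those different mechanics, for readers less used to kernel work: the allocation flavour is chosen per call site, and the sleeping rules above dictate which one is legal. An illustrative kernel-style fragment (not IRATI code):

  #include <linux/slab.h>
  #include <linux/spinlock.h>

  static DEFINE_SPINLOCK(lock);

  static void example(void)
  {
          void *a, *b;

          a = kmalloc(128, GFP_KERNEL);  /* process context: may sleep */

          spin_lock(&lock);
          b = kmalloc(128, GFP_ATOMIC);  /* under a spinlock: must not sleep */
          spin_unlock(&lock);

          kfree(a);                      /* kfree(NULL) is a no-op */
          kfree(b);
  }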
40. Outline
• Introduction
• High level software architecture
• Detailed software architecture
– Kernel space
– User space
• Wrap-up
41. Splitting the spaces: user vs kernel
Fast/slow paths → user vs kernel
• We split the “design” in different “lanes” and placed SW
components there, depending on their timing requirements
– Fast-path → stringent timings → kernel-space
– Slow-path → loose timings → user-space
• ... looking for our optimum
– fiddling with time/easiness/cost/problems/schedule/final-solution etc.
42. API & kernel
• OS Processes request services to the kernel with syscalls
– User originated (user → kernel)
– Unicast
• Modern *NIX systems extend the user/kernel communication mechanisms
– Netlink, uevent, devfs, procfs, sysfs etc.
• We wanted a “bus-like” mechanism: 1:1/N:1, user/kernel & user/user
– User OR kernel originated
– Multicast/broadcast
• We adopted syscalls and Netlink
– Syscalls (fast-path):
• Bootstrapping (*) & SDUs R/W
– Netlink (mostly slow-path):
• We introduced a RINA “family” and its related messages
[Figure: applications, the IPC Process Daemons and the IPC Manager Daemon in user space reach the kernel 1:1 via syscalls and N:1 via Netlink.]
(*) Bootstrapping needs: syscalls create kernel components which will be using Netlink functionalities later on
43. Introducing librina
• Syscalls are “wrapped” by libc (kernel abstraction)
– i.e. syscall(SYS_write, …) → write(…)
– glibc on GNU/Linux
• Changes to the syscalls → changes to glibc
– Breaking glibc could break the whole host
• Sandboxed environments are necessary
– Dependency invalidation → time-consuming compilations
– That sort of change is really hard to get approved upstream
– etc.
• We introduced librina as the initial way to overcome these problems …
– … use IRATI in a host without breaking the whole system
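The trick librina exploits is that Linux exposes new syscalls through the generic syscall(2) entry point, so no glibc change is needed; a user-space library can wrap them itself. A minimal sketch (the syscall name and number below are made-up placeholders, not IRATI’s):

  #define _GNU_SOURCE
  #include <unistd.h>
  #include <sys/syscall.h>

  /* Hypothetical syscall number; a real build would take it from the
   * patched kernel headers. */
  #define SYS_rina_sdu_write 451

  static long rina_sdu_write(int port_id, const void *sdu, unsigned long size)
  {
          /* no glibc wrapper required: go through the generic entry point */
          return syscall(SYS_rina_sdu_write, port_id, sdu, size);
  }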
44. librina
• It is more a framework/middleware than a library
– It has explicit memory allocation (no garbage collection)
– It’s event-based
– It’s threaded
• Completely abstracts the interactions with the kernel
– syscalls and Netlink
• Adds functionalities upon them
• Provides them to userland (apps & daemons)
– Static/dynamic linking (i.e. for C/C++ programs)
– Scripting-language extensions (e.g. Java)
45. librina interface
• librina contains a set of “components”:
– Internal components
– External components
• And a portable framework to build components on top, e.g.:
– Patterns: e.g. singletons, observers, factories, reactors
– Concurrency: e.g. threads, mutexes, semaphores, condition variables
– High level “objects” in its core
• FlowSpecification, QoSCube, RIBObject etc.
• Only the “external” components are “exported” as classes
46. librina core (HL) SW architecture
[Figure: librina high-level architecture. The application-facing API (common, cdap, faux-sockets, sdu-protection, ipc-process, ipc-manager, application) exposes eventPoll()/eventWait() plus operations to allocate/deallocate flows, read/write SDUs to flows and register/unregister to one or more DIFs. Core components post events (eventPost()) to an internal Event Queue; the NetlinkManager handles NetlinkSessions over libnl/libnl_genl (nl_send()/nl_recv()), while the syscall wrappers invoke syscall(SYS_*). Below the user/kernel boundary sit the RINA Netlink family and the RINA syscalls. Kernel-side operations include configuring the PDU Forwarding Table, creating/deleting EFCP instances, and allocating kernel resources to support a flow (creation, deletion, configuration).]
47. How to RAD, effectively?
• OO was the “natural” way to represent the RINA entities
• We embraced C++ as the “core” language for librina:
– Careful usage produces binaries comparable to C
– The STL reduces the dependencies
• in the plain C vs plain C++ case
– Producing C bindings is possible
– …
• There was the ALBA prototype already working …
• … and ALBA has RINABand …
• BUT that prototype is Java based …
48. Interfacing librina to other languages
• We “adopted” SWIG: the Simplified Wrapper and Interface Generator
• SWIG “automatically” generates all the code needed to connect C/C++ programs to scripting languages
– Such as Python, Java and many, many others …

  example.h:
      int fact(int n);

  example.c:
      #include "example.h"
      int fact(int n) { … }

  example.i:
      /* File: example.i */
      %module example
      %{
      #include "example.h"
      %}
      int fact(int n);

From example.i, SWIG emits a low-level wrapper (example_wrap.c), compiled by GCC together with the library into the native interface (libexample.so), and a high-level wrapper (example.py) imported from Python.
49. librina wrapping
• Wrapping “cost”:
– The wrappers (.i files) are small: ~480 LOCs
– They produce ~13.5 KLOCs of bindings → ~1/28 ratio …
• The wrappers are the only thing needed to obtain the bindings for a scripting language
– SWIG support varies with the target language, e.g.
• Java: so-so (not all data-types mapped natively)
• Python: good
• …
– Our wrappers contain only the missing data-type mappings for Java
• Java interface = C++ interface
• Bindings for other languages (e.g. Python) are expected to be straightforward
50. High level software architecture
[Figure: high-level software architecture. Third-party SW packages (applications) and rinad (Java: ipcpd, ipcmd, RINABand HL/LL) sit on top. Java code “imports” the SWIG HL wrappers (Java), which reach librina through JNI and the SWIG LL wrappers (C++); any other language X follows the same path through its native interface and the SWIG wrappers generated for it. librina exposes a C API and a C++ API over its C++ core (also usable via static/dynamic linking), and talks to the kernel via syscalls and via Netlink through libnl/libnl-gen.]
52. The Linux object model
• Linux has its “generic” object abstraction: kobject, kref and kset
– Generic enough to be applied “everywhere”, e.g. FS, HW subsystems, device drivers

  /* reference counting (explicit): garbage collection & sysfs integration */
  struct kref { atomic_t refcount; };

  /* naming & sysfs integration */
  struct kobject {
          const char           *name;
          struct list_head     entry;
          struct kobject       *parent;  /* (dynamic) [re-]parenting, loosely typed */
          struct kset          *kset;    /* objects grouping */
          struct kobj_type     *ktype;
          struct sysfs_dirent  *sd;      /* sysfs integration */
          struct kref          kref;
          unsigned int         state_initialized:1;
          unsigned int         state_in_sysfs:1;
          unsigned int         state_add_uevent_sent:1;
          unsigned int         state_remove_uevent_sent:1;
          unsigned int         uevent_suppress:1;
  };

  /* objects grouping */
  struct kset {
          struct list_head              list;
          spinlock_t                    list_lock;
          struct kobject                kobj;
          const struct kset_uevent_ops  *uevent_ops;
  };
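For reference, this is what the explicit reference counting looks like in use; an illustrative kernel-style fragment, not IRATI code:

  #include <linux/kref.h>
  #include <linux/slab.h>

  struct thing {
          struct kref kref;
          /* ... payload ... */
  };

  static void thing_release(struct kref *kref)
  {
          kfree(container_of(kref, struct thing, kref));
  }

  static struct thing *thing_create(void)
  {
          struct thing *t = kzalloc(sizeof(*t), GFP_KERNEL);
          if (t)
                  kref_init(&t->kref);   /* refcount = 1 */
          return t;
  }

  static void thing_get(struct thing *t) { kref_get(&t->kref); }
  static void thing_put(struct thing *t) { kref_put(&t->kref, thing_release); }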
53. kobjects, ksets and krefs in IRATI
• They are the way to go for embracing OOD/OOP kernel-wide
• If the design has a “limited scope” the code gets bloated with:
– Ancillary functions & data structures
– (unnecessary) Resource usage
• We don’t need/want all these functionalities (everywhere):
– Reduced (finite) number of classes
• We don’t have the needs of a “generic kernel”
– Reduced concurrency (can be missing, depending on the object)
– Object parenting is “fixed” (obj x is always bound to obj y)
• E.g. DTP/DTCP are bound to EFCP …
– Not all our objects have to be published into sysfs
– We have different lookup requirements
• No need to “look up by name” every object
– Inter-object bindings shouldn’t lose the object’s type
– …
54. Our OOP/OOD approach
• We adopted a (slightly) different OOD/OOP approach
• (almost) Each “entity” in the stack is an “object”
• All our “objects” provide a basic common interface & behavior
• They have no implicit embedded locking semantics

  /* API opaque */
  struct object_t { … };

  /* vtable (if needed) */
  struct obj_ops_t {
          result_x_t (* method_1)(object_t * o, …);
          …
          result_y_t (* method_n)(object_t * o, …);
  };

  /* static */
  int  obj_init(object_t * o, …);
  void obj_fini(object_t * o);

  /* dynamic */
  object_t * obj_create(…);     /* interruptible ctxt */
  object_t * obj_create_ni(…);  /* non-interruptible ctxt */
  int obj_destroy(object_t * o);

  /* vtable proxy (if needed) */
  int obj_<method_1>(object_t * o, …);
  ...
  int obj_<method_n>(object_t * o, …);
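A toy, user-space instance of the convention (all names illustrative): _init/_fini operate on caller-provided storage, while _create/_destroy pair them with allocation; in the kernel, the _ni variant would simply allocate with a non-sleeping flavour.

  #include <stdio.h>
  #include <stdlib.h>

  typedef struct queue { int *items; size_t size, used; } queue_t;

  static int queue_init(queue_t *q, size_t size)
  {
          q->items = calloc(size, sizeof(*q->items));
          if (!q->items) return -1;
          q->size = size; q->used = 0;
          return 0;
  }

  static void queue_fini(queue_t *q) { free(q->items); }

  static queue_t *queue_create(size_t size)   /* _init + allocation */
  {
          queue_t *q = malloc(sizeof(*q));
          if (q && queue_init(q, size)) { free(q); return NULL; }
          return q;
  }

  static int queue_destroy(queue_t *q)        /* _fini + deallocation */
  {
          if (!q) return -1;
          queue_fini(q); free(q);
          return 0;
  }

  int main(void)
  {
          queue_t *q = queue_create(16);
          printf("created: %p\n", (void *) q);
          return queue_destroy(q);
  }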
55. OOD/OOP & the framework
• This approach:
– Reduces the (overall) stack bloat
• No krefs, spinlocks, sysfs etc. where unnecessary
• Only objects requiring sysfs, debugfs and/or uevents embed a kobject
• (or the bloat is comparable, e.g. the same _init, _fini, _create and _destroy boilerplate)
– Speeds up development
– Helps debugging
– (re-)Parenting is constrained to specific objects
– No loose typing → type-checking is maintained (no casts)
– Decouples (mildly) from the underlying kernel
• With these assumptions we built our framework
– Basic components: robj, rmem, rqueue, rfifo, rref, rtimer, rwq, rmap, rbmp
– OOP facilities/patterns: factories, singletons, facades, observers, flyweights, publisher/subscribers, smart pointers, etc.
– Ownership-passing + smart-pointer memory model
56. The HL software architecture (Y1)
[Figure: the Y1 high-level software architecture. User space: third-party SW packages and rinad (ipcpd, ipcmd, RINABand HL) over the SWIG HL/LL wrappers (Java or language X) and librina (framework, C and C++ APIs, C++ core), reaching the kernel via syscalls and via Netlink (libnl/libnl-gen). Kernel space: a personality mux/demux over the core (KIPCM, RNL, IPCP Factories, framework), the KFA, and the IPC Processes: the normal IPC Process (PFT, RMT, EFCP), shim-eth-vlan (with RINA-ARP) and shim-dummy.]
57. The API exposed to user-space:
KIPCM + RNL
• Kernel interface = syscalls + Netlink messages
• KIPCM:
– Manages the syscalls
• Syscalls: a small, well-defined set of calls (#8):
– IPCs: ipc_create and ipc_destroy
– Flows: allocate_port and deallocate_port
– SDUs: sdu_read, sdu_write, mgmt_sdu_read and mgmt_sdu_write
• RNL:
– Manages the Netlink part
• Abstracts message reception, sending, parsing & crafting
• Netlink: #36 message types (with dynamic attributes):
– assign_to_dif_req, assign_to_dif_resp, dif_reg_notif, dif_unreg_notif…
• Partitioning:
– Syscalls → KIPCM → “fast-path” (read and write SDUs)
– Netlink → RNL → “slow-path” (mostly conf and mgmt)
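Spelled out as a hypothetical C header (only the eight call names come from the design; the signatures here are assumptions for illustration):

  #include <stddef.h>
  #include <sys/types.h>

  /* IPC Process lifecycle */
  int     ipc_create(const char *name, unsigned short ipcp_id, const char *type);
  int     ipc_destroy(unsigned short ipcp_id);

  /* Flow (port) lifecycle */
  int     allocate_port(unsigned short ipcp_id, const char *app_name);
  int     deallocate_port(int port_id);

  /* Data transfer (fast-path) */
  ssize_t sdu_read(int port_id, void *buf, size_t len);
  ssize_t sdu_write(int port_id, const void *buf, size_t len);

  /* Layer-management SDUs */
  ssize_t mgmt_sdu_read(unsigned short ipcp_id, void *buf, size_t len);
  ssize_t mgmt_sdu_write(unsigned short ipcp_id, int port_id,
                         const void *buf, size_t len);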
58. KIPCM & KFA
• The KIPCM:
– Counterpart of the IPC Manager in user-space
– Manages the lifecycle of the IPC Processes and of the KFA
– Abstracts IPC Process instances
• Same API for all the IPC Processes, regardless of the type
• Maps: ipc-process-id → ipc-process-instance
• The KFA manages ports and flows
– Ports
• Flow handler and ID
• Port ID Manager
– Flows
• Maps: port-id → ipc-process-instance
– Top: user interface; bottom: IPC Processes (maps)
• Both “bind” the kernel stack:
– syscalls
– Netlink
• When the KIPCM calls the KFA to inject/get SDUs:
– N-IPCP → EFCP → RMT → PDU-FWD → Shim IPC Process
• They are the initial point where “recursion” is transformed into “iteration”
[Figure: KIPCM and KFA at the top of the kernel stack, facing user space; below, the normal IPCP (EFCP, RMT, PDU-FWD-T) over the shim IPCP on the IN/OUT data paths.]
59. The RINA Netlink Layer (RNL)
• Integrates Netlink in the SW framework
– Hides all the configuration, generation and destruction of Netlink sockets and
messages from the user
– Defines a Generic Netlink family (NETLINK_RINA) and its messages
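From user space, talking to a Generic Netlink family like NETLINK_RINA follows the usual libnl-3 pattern. A minimal sketch; the family name string, command and attribute IDs below are placeholders, not the real RNL definitions, and error handling is omitted for brevity:

  #include <netlink/netlink.h>
  #include <netlink/genl/genl.h>
  #include <netlink/genl/ctrl.h>

  /* Placeholder command/attribute IDs, not the real RNL ones */
  #define RINA_CMD_ASSIGN_TO_DIF_REQ 1
  #define RINA_ATTR_IPCP_ID          1

  int main(void)
  {
          struct nl_sock *sk = nl_socket_alloc();
          struct nl_msg *msg;
          int family;

          genl_connect(sk);                        /* bind a genl socket  */
          family = genl_ctrl_resolve(sk, "rina");  /* family name assumed */

          msg = nlmsg_alloc();
          genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family, 0, 0,
                      RINA_CMD_ASSIGN_TO_DIF_REQ, 1);
          nla_put_u16(msg, RINA_ATTR_IPCP_ID, 1);  /* one dynamic attribute */

          nl_send_auto(sk, msg);                   /* fire the request */
          nlmsg_free(msg);
          nl_socket_free(sk);
          return 0;
  }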
60. The IPC Process Factories
• They are used by IPC Processes to publish/unpublish their availability
– Publish: x = kipcm_ipcp_factory_register(…, char * name, …)
– Unpublish: kipcm_ipcp_factory_unregister(x)
• The factory name is the way the KIPCM can look for a specific IPC Process type
– It’s published into sysfs too
• There are two “major” types of IPC Processes:
– Normal
– Shims
61. The IPC Process Factories Interface
• Factory operations are the same for both types
• Upon registration
– A factory publishes its hooks:
    .init      → x_init
    .fini      → x_fini
    .create    → x_create
    .destroy   → x_destroy
    .configure → x_configure
• Upon user request (ipc_create)
– The KIPCM creates a particular IPC Process instance:
1. Looks for the correct factory (by name)
2. Calls the .create “method”
3. The factory returns a “compliant” IPC Process object
4. The KIPCM binds that object into its data model
• Upon un-registration
– The factory triggers the “destruction” of all the IPC Processes it “owns”
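A sketch of what such a hook table could look like in C; the struct layout and signatures are assumptions, only the hook names and the registration call come from the design:

  struct ipcp_instance;  /* opaque kernel-side object */

  /* Hypothetical hook table published by each factory */
  struct ipcp_factory_ops {
          int  (*init)(void);
          void (*fini)(void);
          struct ipcp_instance *(*create)(unsigned short ipcp_id);
          int  (*destroy)(struct ipcp_instance *inst);
          int  (*configure)(struct ipcp_instance *inst, const void *cfg);
  };

  /* Stubbed hooks standing in for a shim's real implementation */
  static int  eth_init(void)                        { return 0; }
  static void eth_fini(void)                        { }
  static struct ipcp_instance *eth_create(unsigned short id)
  { (void) id; return 0; }
  static int  eth_destroy(struct ipcp_instance *i)  { (void) i; return 0; }
  static int  eth_configure(struct ipcp_instance *i, const void *c)
  { (void) i; (void) c; return 0; }

  static const struct ipcp_factory_ops shim_eth_vlan_ops = {
          .init      = eth_init,
          .fini      = eth_fini,
          .create    = eth_create,
          .destroy   = eth_destroy,
          .configure = eth_configure,
  };
  /* registered with kipcm_ipcp_factory_register(…, "shim-eth-vlan", …),
   * per the publish call on the previous slide */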
62. IPC Process Instances
• The .create provided to the factories returns an IPC Process “object”
• There are two “major” types of IPC Processes:
– Normal
– Shims
• Regardless of its type
– The interface is the same
– Each IPC Process implements its “core” code:
• Shim IPC Process: each shim IPC Process provides its own implementation
• Normal IPC Process: the stack provides one implementation for all of them
63. IPC Process Instances Interface
• The IPC Process “object” comprises:
– instance_data
– instance_ops
• The IPC Process Interface is the same for all types, but each type decides which ops it will support
– Some are specific to normal or shim IPC Processes, a few are common to both

  instance_ops (shim IPC Process):
    .application_register   = x_application_register
    .application_unregister = x_application_unregister
    .assign_to_dif          = x_assign_to_dif
    .sdu_write              = x_sdu_write
    .flow_allocate_request  = shim_allocate_request
    .flow_allocate_response = shim_allocate_response
    .flow_deallocate        = shim_deallocate

  instance_ops (normal IPC Process):
    .connection_create         = normal_connection_create
    .connection_update         = normal_connection_update
    .connection_destroy        = normal_connection_destroy
    .connection_create_arrived = normal_connection_arrived
    .pft_add                   = normal_pft_add
    .pft_remove                = normal_pft_remove
    .pft_dump                  = normal_pft_dump

– They support similar functionalities (except the PFT’s)
– How they translate into ops depends on the type
66. Shim IPC Processes
• The shims are the “lowest” components in the kernel space
• They have two interfaces:
– NB (northbound): the same for each shim, represented by hooks published into the KIPCM factories
– SB (southbound): depends on the technology
• There are currently 2 shims:
– shim-dummy:
• Confined to a single host (“loopback”)
• Used for debugging & testing the stack
– shim-eth-vlan:
• As defined in the spec, runs over 802.1Q
71. Introduction to the user space framework
[Figure: user-space framework. The IPC Manager Daemon (main logic, IDD, RIB & RIB Daemon, management agent), each IPC Process Daemon (layer management: enrollment, flow allocation, resource allocation, PDU Forwarding Table generation, RIB & RIB Daemon) and the applications (application logic) all link librina, and reach the kernel through system calls, Netlink sockets and sysfs.]
• IPC Manager Daemon: broker between apps & IPC Processes, central point of management in the system
• IPC Process Daemon: implements the layer management components of an IPC Process
• librina: abstracts out the communication details between daemons and the kernel
73. The IPC Process and IPC Manager Daemons
• IPC Manager Daemon
– Manages the IPC Process lifecycle
– Broker between applications and IPC Processes
– Local management agent
– DIF Allocator client (to search for applications not available through local DIFs)
• IPC Process Daemon
– Layer management components of the IPC Process:
• RIB Daemon, RIB
• CDAP parsers/generators
• CACEP
• Enrollment
• Flow Allocation
• Resource Allocation
• PDU Forwarding Table Generation
• Security Management
74. IPC Manager Daemon
[Figure: IPC Manager Daemon (Java) internals. A main event loop blocks on EventProducer.eventWait() and dispatches to the IPC Manager core classes (IPC Process Manager, Flow Manager, Application Registration Manager), which call the librina proxy classes (IPC Process Factory, IPC Process, Application Manager) and return operation results. A Command Line Interface server thread accepts CLI sessions over a local TCP connection; a Bootstrapper reads the configuration file at startup (configuration classes). Message, model and event classes travel through the SWIG high-level Java wrappers, JNI and the SWIG low-level C++ wrappers down to librina (C++), which talks to the kernel via system calls and Netlink messages.]
75. IPC Process Daemon
[Figure: IPC Process Daemon (Java) internals. A main event loop blocks on EventProducer.eventWait() and dispatches to the layer management function classes (Enrollment Task, Flow Allocator, Resource Allocator, Registration Manager, Forwarding Table Generator), which operate through the RIB Daemon and the Resource Information Base (RIB), and call the IPCManager or KernelIPCProcess proxies. Supporting classes (Delimiter, CDAP parser, Encoder) handle CDAP traffic: a CDAP message reader thread pulls management SDUs via KernelIPCProcess.readMgmtSDU() and hands them to RIBDaemon.cdapMessageReceived(), while outgoing messages flow through RIBDaemon.sendCDAPMessage() and KernelIPCProcess.writeMgmtSDU(). Message, model and event classes travel through the SWIG Java wrappers, JNI and the SWIG C++ wrappers down to librina (C++), and to the kernel via system calls and Netlink messages.]
76. Example workflow : IPC Process creation
• The IPC Manager reads a configuration file with instructions on the IPC Processes it has to create at startup
– Or the system administrator can request creation through the local console (a CLI session over a local TCP connection)
• The configuration file also instructs the IPC Manager to register the IPC Process in one or more N-1 DIFs, and to make it a member of a DIF
Workflow (NL = Netlink message):
1. Create IPC Process (syscall)
2. Fork (syscall)
3. Initialize librina
4. When completed, notify the IPC Manager (NL)
5. IPC Process initialized (NL)
6. Register app request (NL)
7. Register app response (NL)
8. Notify IPC Process registered (NL)
9. Assign to DIF request (NL)
10. Update state and forward to kernel (NL)
11. Assign to DIF request (NL)
12. Assign to DIF response (NL)
13. Assign to DIF response (NL)
77. Example workflow : Flow allocation
• An application requests a flow to another application, without specifying what DIF to use
Workflow (NL = Netlink message):
1. Allocate Flow Request (NL), from Application A to the IPC Manager Daemon
2. Check app permissions
3. Decide what DIF to use
4. Forward request to the adequate IPC Process Daemon
5. Allocate Flow Request (NL)
6. Request port-id (syscall)
7. Create connection request (NL)
8. On create connection response (NL), write CDAP message to N-1 port (syscall)
9. On getting an incoming CDAP message response (syscall), update connection (NL)
10. On getting update connection response (NL), reply to IPC Manager (NL)
11. Allocate Flow Request Result (NL)
12. Forward response to app
13. Allocate Flow Request Result (NL)
14. Read data from the flow (syscall) or write data to the flow (syscall)
79. Y1: Where we are / What do we have…
• 9 months, ~3700 commits and ~214 KLOCs later …
– ~27 KLOCs in the kernel
– ~87 KLOCs in librina (hand-written)
– ~35 KLOCs in librina (automatically generated)
– ~65 KLOCs in rinad
• … the project released its 1st prototype (internal release):
– User and kernel space components providing unreliable flow functionalities
– We have the building/configuration/development frameworks
– A testing framework
• A testing application (RINABand, compile-time)
• A regression framework (ad-hoc, run-time)
• We’re actively working on the 2nd prototype
80. Y2: Plans …
• Prototype 2:
– Reliable flows support
– Shim DIF for HV
• Same schema as shim-dummy/shim-eth-vlan in prototype 1
– Complete routing
– Public release as FOSS (July 2014)
• Prototype 3:
– Shim DIF over TCP/UDP
• Same schema as prototype 2
– Faux sockets API via:
1. FI: function interposition (dynamic linking)
2. SCI: system call interposition (static linking)
81. Agenda
• Project overview
• Use cases
– Basic scenarios (Phases 1 and 2)
– Advanced scenarios (Phases 2 and 3)
• Specifications
– Shim DIF over 802.1Q
– PDU Forwarding Table Generator
– Y2 plans
• Software development
– High level software architecture
– User-space
– Kernel-space
– Wrap-up
• Experimental activities
– Intro, goals, Y1 experimentation use case
– Testbed and results at i2CAT OFELIA island
– Testbed and results at iMinds OFELIA island
– Conclusions
84. IRATI experimentation in a nutshell
[Figure: experimentation phases I–III and the testbeds used in each: OFELIA, iLab.t, EXPERIMENTA and PSOC.]
86. Available Tools
• RINABand
– Test application for RINA
– Java (user space)
– Requires multiple flows between two APs:
• 1 control flow
• N data flows
[Figure: RINABand and RINABandClient, each with a Control AE and a Data AE, communicating over a DIF.]
• Echo server/client
– Test parameters: number and size of SDUs to be sent
– Ping-like operation
– The test completes when either all the SDUs have been sent and received, or when more than a certain interval of time elapses without receiving an SDU
– Client and server report statistics:
• the number of transmitted and received SDUs
• the time the test lasted
– Single flow between two APs
87. First Phase Prototype capabilities
• Capabilities
– Decision to focus on the shim-eth-vlan
– Supports only a single flow between two APs (one Ethertype → a single flow per MAC address pair)
• Impact on experiments
– Could not use RINABand
– Rely on the echo server/client application
89. First phase use case
90. Single flow echo/bw test
• Validate stack / prototype 1
• Validate Ethernet transparency
• Measure goodput
91. Multiple flow echo/bw validation
• Validate multiple IPC processes
• Measure goodput
92. Concurrent RINA and IP
• Validate concurrency of the IP and RINA stacks
• Measure goodput
93. Presented by Leonardo Bergesio
FIRST PHASE RESULTS @ I2CAT
94. i2CAT OFELIA Island, EXPERIMENTA
• Experiment == slice
• FlowSpace:
– Arbitrary topology
– Partition of the vectorial space of OF header fields
– Slicing by VLANs
• VMs to be used as end points or controllers
• Perfect match:
– slice VLAN ↔ shim DIF over Ethernet
95. Workflow I
• Access island using OCF. Create or access your
project/slice
96. Workflow II
• Select FlowSpace Topology and slice VLAN/s (DIFs)
97. Workflow III
• Create VMs Nodes and OpenFlow Controller
99. Single flow
• Packets are sent over the Ethernet/VLAN bridge
• Goodput roughly 60% of link capacity (iperf tested)
(Project: IRATIbasicusecase, slice: multivlanslice)
100. Multiple flows
• Flows to a shared server (B & C to D) achieved half the throughput of the single flow (A to B)
101. Concurrency between IP and RINA stack
UDP results:
  Time interval    90 s
  Nº of datagrams  554915
  Data sent        778 MB
  BW               75.5 Mbps
102. FIRST PHASE RESULTS @ IMINDS
115. Conclusions from phase I experimentation
• The IRATI stack and shim DIF are running
• ~60% goodput in comparison to iperf
• No major performance problems
• When running concurrently, the IRATI stack takes precedence over the IP stack
– our stack doesn’t lose a packet from syscalls to the devices layer
• ARP in the shim DIF should not reuse the 0x0806 Ethertype, because of incompatibility with existing implementations
• Registration to the shim DIF over Ethernet should be explicit
116. Thanks for your attention!
Questions?
Editor’s notes
The shim DIF over Ethernet wraps an Ethernet layer with the RINA API and presents it to the layer above as if it was a regular DIF (usually with restricted capabilities; very seldom current technologies provide a fully-formed layer). The only intended user of an Ethernet shim DIF is a normal IPC Process, as discussed in the shim DIF specification.
A shim DIF over Ethernet maps to a VLAN. The DIF name is the VLAN name. The shim DIF only supports one class of service: unreliable. ARP can be used to map upper layer IPC Process names to shim DIF addresses (MAC addresses). It spans a single Ethernet segment.
The librina package contains all the IRATI stack libraries that have been introduced to abstract from the user all the kernel interactions (such as syscalls and Netlink details). Librina provides its functionalities to user-space RINA programs via scripting language extensions or statically/dynamically linkable libraries (i.e. for C/C++ programs). Librina is more a framework/middleware than a library: it has its own memory model (explicit, no garbage collection), its execution model is event-driven and it uses concurrency mechanics (its own threads) to do part of its work. Rinad instead, contains the IPC Manager and IPC Process daemons as well as a testing application (RINABand). The IPC Manager is the core of IPC Management in the system, acting both as the manager of IPC Processes and a broker between applications and IPC Processes (enforcing access rights, mapping flow allocation or application registration requests to the right IPC Processes, etc.). IPC Process Daemons implement the layer management components of an IPC Process (enrollment, flow allocation, PDU Forwarding table generation or distributed resource allocation functions). For more details on the rationale behind this high-level architecture, interested readers might refer to the relevant sections in D2.1 [3]. Rinad also provides a couple of example/utility applications that serve two purposes: i) provide an example of how an application uses librina and ii) allow testing/experimentation with the IRATI stack by measuring some properties of the IPC service as perceived by the application (flow allocation time, goodput in terms of bytes read/write per second or mean delay)
Model classes: These classes model objects that abstract different concepts related to the services provided by librina, such as: application names, flow specifications, RIB objects, neighbours and connections. Model classes contain information on the modelled objects, but do not provide operations to perform actions other than updating or reading the object’s state.

Proxy classes: These classes model ‘active entities’ within librina, meaning that they provide operations to perform actions on these entities. These actions result in the invocation of librina internals, either to send a Netlink message to another user-space process or the kernel, or to invoke a system call. For instance, librina-application provides an ‘IPCManager’ proxy class that allows an application process to request the allocation or deallocation of flows to the IPC Manager Daemon. Another example can be found in the ‘IPC Process’ class available at librina-ipcmanager: this proxy class allows the IPC Manager daemon to invoke operations on the user-space or kernel components of an IPC Process.

Event classes: librina is event-based. Invocations of proxy class operations that cause the emission of a Netlink message return right away, without waiting for the Netlink message response. The response will later be obtained as one of the events received through the EventConsumer class. Event classes are the ones that encapsulate the information of the different events, discriminated by event type. Examples of events include results of flow allocation/deallocation operations or results of application registration/unregistration operations, just to name a few.

EventProducer: This class allows librina users to access the events originated from the responses to the operations requested through the proxy classes. The event producer provides blocking, non-blocking and time-bounded blocking operations to retrieve pending events.

The librina core components process two types of inputs: operations invoked via proxy classes at the API level, or Netlink messages received via the Netlink socket bound to librina – created at initialization time. Operations invoked via proxy classes can follow two processing paths that result either in the invocation of a system call or in the generation of a Netlink message. In the former case processing is very simple: invocations of proxy operations are mapped to system call wrappers that make the required system call to the kernel (such as readsdu, writesdu, createipcprocess or allocateportid). The latter case involves more processing, as explained in the following:

Concurrency classes: Concurrency classes provide an object-oriented wrapper to the OS threading functionalities. They are internally used by librina, but also exposed to librina users in case they want to use them as a way of avoiding external dependencies or intermixing different threading libraries (as is the case of the IPC Manager and IPC Process daemons).

Message classes: These classes provide an object-oriented model of the different Netlink messages that can be sent or received by librina. The basic message class ‘BaseNetlinkMessage’ models all the information required to generate/parse the header of a Netlink message, including the Netlink header (source port-id, destination port-id and sequence number), the Generic Netlink family header (family and operation-code) and the RINA family header (source and destination IPC Process ids).
The different message classes extend the base class by modelling the information that is sent/received as Netlink message attributes in the different messages. NetlinkManager: This class provides an object-oriented wrapper of the functions available at the libnl/libgnl libraries (these libraries provide functions to generate, parse, send and receive Netlink messages). The wrapping is partial since only the functionality required by librina has been wrapped. In the ‘output path’ the NetlinkManager takes a message class, generates a buffer, adds the NL message header to the buffer, passes the message class and the buffer to the NL formatter classes (which will add NL attributes to the buffer) and finally passes the buffer to libnl to send the message. In the ‘input path’ – upon calling the blocking ‘getMessage’ operation – the IPC Manager blocks until libnl returns a buffer containing a NL message, then it parses the header, requests the NL parser classes to parse the NL attributes and return the appropriate message class, and returns. NetlinkMessage Parsers/Formatters: The goal of these classes is either to generate the attributes of a NL message based on the contents of a message class (formatting role) or to create and initialize a message class based on the attributes of a NL message (parsing role). In order to ensure that all the NL messages are received in a timely fashion, librina-core has an internal thread that is continuously calling the blocking NetlinkManager ‘getMessage’ operation. When the operation returns the thread converts the resulting Message class to an Event class, and puts the Event class to an internal events queue. When a librina user calls the EventConsumer to retrieve an event, the EventConsumer tries to retrieve an element from the events queue by invoking the eventPoll (non-blocking), eventWait (blocking) or eventTimedWait (blocking but time-bounded) operation. All librina components use an internal lightweight logging framework instead of an external one in order to minimize librina dependencies, since the goal is to facilitate deploying it within several OS/Linux systems.
The IPC Manager Daemon is primarily responsible for managing the RINA stack in the system. It manages the IPC Process lifecycle, acts as the local management agent for the system and is the broker between applications and IPC Processes (filtering the IPC resources available to the different applications in the system). As introduced in section 2.2.2, the first phase prototype of the IPC Manager has been developed in Java, leveraging part of the Alba prototype codebases. Moreover, the current IPC Manager Daemon is not a complete implementation, since it does not implement the local management agent yet (therefore the RINA stack cannot be managed through a centralized DIF Management System).

The IPC Process Daemon performs the layer management functions of a single IPC Process. It is therefore “half” of the IPC Process application, while the other half – dealing with data-transfer and data-transfer-control related tasks – is located in the kernel. Layer management operations are more complex and do not have such stringent performance requirements as data transfer operations, therefore locating them in user-space is a logical choice, as introduced in D2.1.
Figure 20 shows a schema of the detailed IPC Manager Daemon software design. It is a Java OS process that leverages the operations provided by the librina API through the wrappers generated by SWIG and the Java Native Interface (JNI). Concretely, librina-ipc-manager provides the following proxy classes to the IPC Manager Daemon: IPC Process Factory, which enables the creation, destruction and enumeration of the different types of IPC Processes supported by the system; IPC Process, which allows the IPC Manager to request operations on IPC Processes such as assignment to DIFs, configuration updates, enrolment, registrations of applications or allocations/deallocations of flows; and Application Manager, which provides operations to inform applications about the results of pending requests such as allocation of flows or registrations of applications.

When the IPC Manager Daemon initializes, it reads a configuration file from a well-known location. This configuration file provides default values for system parameters, describes configurations of well-known DIFs and controls the behaviour of the IPC Manager bootstrap process. The latter is achieved by specifying: the IPC Processes that have to be created at system start-up, including their name and type; for each IPC Process to be created, the names of the N-1 DIFs where the IPC Process has to be registered (if any); and for each IPC Process to be created, the name of the DIF that the IPC Process is a member of (if any). If the IPC Process is assigned to a DIF it will be initialized with an address and all the other information required to start operating as a member of that DIF (DIF-wide constants, policies, credentials, etc.).

When the bootstrapping phase is over, the IPC Manager main thread starts executing the event loop forever. The event loop continuously polls librina’s EventProducer (in blocking mode) to get the events resulting from Netlink request messages sent by applications or IPC Processes. When an event happens, the event loop checks its type and delegates the processing of the event to one of the specialized core classes: Flow Manager (flow related events), Application Registration Manager (application-registration related events) or IPC Process Manager (IPC Process lifecycle management related events). The processing performed by these core classes will typically result in the invocation of one of the operations provided by the librina-ipc-process proxy classes previously described in this section.

Local system administrators can interact with the IPC Manager through a Command Line Interface (CLI), accessible via telnet. This console provides a number of commands that allow system administrators to query the status of the RINA stack in the system, as well as performing actions that modify its configuration (such as creating/destroying IPC Processes, assigning them to DIFs, etc.). The IPC Manager supports the CLI console through a dedicated thread that listens at the console port; only one console session at a time is supported at the moment.

The current IPC Manager has leveraged the following Alba components, adapting them to the environment of the IRATI stack: the configuration file format, parsing libraries and model classes (the configuration file uses JSON, the JavaScript Object Notation); the Command Line Interface server thread and related parsing classes; and the bootstrapping process.
Figure 21 depicts the detailed software design of the IPC Process Daemon. The first phase prototype follows the same approach taken with the IPC Manager Daemon design and implementation: leveraging the Alba stack as much as possible in order to provide a simple but complete enough implementation of the IPC Process Daemon. Therefore the IPC Process Daemon is also a Java OS process that builds on the APIs exposed by librina through SWIG and JNI. The librina proxy classes described below are the more relevant to the IPC Process Daemon operation: IPC Manager. Allows the IPC Process Daemon to communicate with the IPC Manager Daemon, mainly to inform the latter about the results of requested operations; but also to notify about incoming flow requests or flows that have been deallocated. Kernel IPC Process. Provides operations to enable the IPC Process Daemon to communicate with the data-transfer/data-transfer-control related functions of the IPC Process in the kernel. The APIs allow the IPC Process Daemon to modify the kernel IPC Process configuration, to manage the setup and teardown EFCP connections or to modify the PDU forwarding table. IPC Process Daemons are instantiated and destroyed by the IPC Manager Daemon. When the IPC Process Daemon has completed is initialization, the main thread starts executing the event loop. Such a loop is implemented by continuously polling the EventProducer for new events (in blocking mode) and processing them when they arrive. The event processing is delegated to the classes implementing the different layer management functions: Enrollment Task, Resource Allocator, Registration Manager, Flow Allocator and PDU Forwarding Table Generator. Processing performed by these classes typically involves two types of actions: Local actions resulting in communications with the Kernel IPC Process or the IPC Manager, achieved via the librina proxy classes. Remote actions resulting in communications with peer IPC Process Daemons, achieved via the RIB Daemon. The RIB Daemon is an internal component of the IPC Process that provides an abstract, object- oriented schema of all the IPC Process state information. This schema, known as the Resource Information Base or RIB, allows IPC Processes to modify the state of their peers by performing operations on one or more of the RIB objects. The Common Distributed Application Protocol (CDAP) is the application protocol used to exchange the remote RIB operation requests and responses between peer IPC Processes. This protocol allows six remote operations to be performed over RIB objects: create, delete, read, write, start and stop. The objects that are the target of the operation are identified by the following attributes: Object class. Uniquely identifies a certain type of objects. Object name. Uniquely identifies the instance of an object of a certain class. The object class + object name tuple uniquely identify an object within the RIB. Object instance. A shorthand for object class + object name, to uniquely identify an object within the RIB. Scope. Indicates the number of ‘levels’ of the RIB affected by the operation, starting at the specified object (object class + name or instance). This allows a single operation to target multiple objects at once. Filter. Provides a predicate that evaluates to ‘true’ or ‘false’ based on the value of the object attributes. This allows further discriminating to what objects the operation has to be applied. More information about the RIB, RIB Daemon and CDAP can be found at D2.1 [3]. 
CDAP is implemented as a library that provides a CDAPSessionManager class, which manages one or more CDAP sessions. The CDAPSession class implements the logic of the CDAP protocol state machine as defined in the CDAP specification [33]. CDAP can be encoded in multiple ways, but the IRATI stack follows the approach adopted by the other current RINA prototypes and uses Google Protocol Buffers (GPB) [31]. This decision makes interoperability possible and also brings the benefits of GPB: efficient encoding, and a proven, mature and scalable technology with good-quality open source parsers and generators available.

In addition to the information about the operation and the identity of the targeted objects, CDAP messages can also transport the actual values of those objects. Therefore the object values also need to be encoded in binary format. Again, GPB is the initial encoding format chosen, although others are also possible (ASN.1, XML, JSON, etc.). Object encoding is implemented by the Encoding support library, which provides an encoding-format-neutral interface. This allows several encoding implementations to be plugged in and out, with the one to use specified at configuration time.

The RIB is implemented as a map of object managers, indexed by object names; current RINA implementations have adopted the convention of making object names unique within the RIB as a simplifying assumption. Each object manager wraps a piece of state information (for example Flows, Application Registrations, QoS Cubes, the PDU Forwarding Table, etc.) with the RIBObject interface. This interface abstracts the six operations provided by CDAP: create, delete, read, write, start and stop. When a remote CDAP message reaches the IPC Process Daemon, the message is handed to the RIB Daemon component. The RIB Daemon retrieves the object manager associated with the targeted object name from the RIB map and invokes the requested CDAP operation. The role of the object manager is to translate each CDAP operation into the appropriate actions on the layer management function classes.

The layer management function classes use the RIB Daemon when they have to invoke a remote operation on a peer IPC Process. The RIB Daemon provides operations to send CDAP messages to a neighbour IPC Process identified by its application process name. When such an operation is called, the RIB Daemon internally fetches the port-id of the underlying N-1 flow that allows the IPC Process to communicate with the given neighbour, encodes the CDAP message and requests the kernel to write the encoded CDAP message as an SDU to that N-1 flow.
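The following minimal Java sketch illustrates the dispatch mechanism described above: a RIBObject interface abstracting the six CDAP operations, and a RIB map indexed by object name that routes each incoming CDAP message to the appropriate object manager. All class and method names are hypothetical; the actual IRATI code differs.

import java.util.HashMap;
import java.util.Map;

// Hypothetical CDAP message carrying the opcode and the target object name.
class CDAPMessage {
    enum OpCode { CREATE, DELETE, READ, WRITE, START, STOP }
    private final OpCode opCode;
    private final String objName;
    CDAPMessage(OpCode opCode, String objName) {
        this.opCode = opCode;
        this.objName = objName;
    }
    OpCode getOpCode() { return opCode; }
    String getObjName() { return objName; }
}

// Abstracts the six CDAP operations over a piece of IPC Process state.
interface RIBObject {
    void create(CDAPMessage m);
    void delete(CDAPMessage m);
    void read(CDAPMessage m);   // a real implementation would send a reply
    void write(CDAPMessage m);
    void start(CDAPMessage m);
    void stop(CDAPMessage m);
}

public class RIBDaemonSketch {
    // Object names are assumed unique within the RIB, so the name alone
    // can serve as the map key.
    private final Map<String, RIBObject> rib = new HashMap<>();

    public void addRIBObject(String objectName, RIBObject manager) {
        rib.put(objectName, manager);
    }

    // Entry point for remote CDAP messages delivered to the IPC Process Daemon.
    public void cdapMessageReceived(CDAPMessage message) {
        RIBObject manager = rib.get(message.getObjName());
        if (manager == null) {
            return; // a real implementation would answer with a CDAP error
        }
        switch (message.getOpCode()) {
            case CREATE: manager.create(message); break;
            case DELETE: manager.delete(message); break;
            case READ:   manager.read(message);   break;
            case WRITE:  manager.write(message);  break;
            case START:  manager.start(message);  break;
            case STOP:   manager.stop(message);   break;
        }
    }
}

Keeping the name-to-manager mapping in a single map exploits the convention, noted above, that object names are unique within the RIB.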