1. Project overview, use cases, specifications,
software development and experimental activities
RINA Workshop, Dublin, January 28th–29th 2014
Investigating RINA as an Alternative to TCP/IP
2. Agenda
• Project overview
• Use cases
– Basic scenarios (Phases 1 and 2)
– Advanced scenarios (Phases 2 and 3)
• Specifications
– Shim DIF over 802.1Q
– PDU Forwarding Table Generator
– Y2 plans
• Software development
– High level software architecture
– User-space
– Kernel-space
– Wrap-up
• Experimental activities
– Intro, goals, Y1 experimentation use case
– Testbed and results at i2CAT OFELIA island
– Testbed and results at iMinds OFELIA island
– Conclusions
3. Project at a glance
• What? Main goals
– To advance the state of the art of RINA towards an architecture reference model and specifications that are closer to enabling implementations deployable in production scenarios.
– The design and implementation of a RINA prototype on top of Ethernet will enable the experimentation and evaluation of RINA in comparison to TCP/IP.
• Who? 5 partners (from 2014), organized in 5 activities:
– WP1: Project management
– WP2: Architecture, Use cases and Requirements
– WP3: Software Design and Implementation
– WP4: Deployment into OFELIA testbed, Experimentation and Validation
– WP5: Dissemination, Standardisation and Exploitation
• Budget: total cost €1,126,660; EC contribution €870,000
• Duration: 2 years (start date: 1st January 2013)
• External Advisory Board: Juniper Networks, ATOS, Cisco Systems, Telecom Italia
4. Objectives (I)
• Enhancement of the RINA specifications
– The specification of a shim DIF over Ethernet
– The completion of the specifications that enable DIFs that provide a
level of service similar to the current Internet (low security, best-effort)
– The project use cases
• RINA Open Source Prototype for the Linux Operating System
– Targeting both the user and kernel spaces, allowing RINA to be used on top of different technologies (Ethernet, TCP, UDP, etc.)
– It will provide a solid baseline for further RINA work after the project. IRATI will set up an initial open source community around the prototype.
5. Objectives (II)
• Experimentation with RINA and comparison with TCP/IP
– IRATI will follow iterative cycles of research, design, implementation and experimentation, with the experimental results feeding back into the research of the next phase
– Experiments will collect and analyse data to compare RINA and TCP/IP in aspects such as: application API, programmability, cost of supporting multi-homing, simplicity, etc.
• Interoperability with other RINA prototypes
– The achievement of interoperability between independent
implementations is a good sign that a specification is well done and
complete.
– Current RINA prototypes target different programming platforms
(middleware vs. OS kernel) and work over different underlying
technologies (UDP/IP vs. Ethernet) compared to the IRATI prototype.
6. Objectives (III)
• Provide feedback to OFELIA
– Apart from the feedback to the OFELIA facility in terms of bug reports
and suggestions of improvements, IRATI will actively contribute to
improving the toolset used to run the facility.
– Moreover, experimentation with a non-IP based solution is an interesting use case for the OFELIA facility, since IRATI will be the first to conduct this type of experiment in the OFELIA testbed.
7. Project Outcomes
• Enhanced RINA architecture reference model and specifications, contributed to the Pouzin Society for experimentation. IRATI will focus on advancing the RINA state of the art in the following areas:
– DIFs over Ethernet
– DIFs over TCP/UDP
– DIFs for hypervisors
– Routing
– Data transfer
• Linux OS kernel implementation of the RINA prototype over Ethernet
– By the end of the project an open source community will be set up to allow the research/industrial networking community to use the prototype and/or contribute to its development
• Experimental results of the RINA prototype, compared to TCP/IP
• DIF over TCP/UDP extensions, interoperable with existing RINA prototypes
11. Basic use cases
Shim DIF over Ethernet
• Goal: to ensure that the shim DIF over Ethernet provides the required functionality. The purpose of a shim DIF is to provide a RINA interface to the capability of a legacy technology, rather than give the legacy technology the full capability of a RINA DIF.
12. Basic use cases
Turing machine DIF
• Goal: to provide a testing scenario to check that a normal DIF complies with a minimal set of functionality (the “Turing machine” DIF).
14. Advanced use cases
Introduction
• RINA applied to a hybrid cloud/network provider
– Mixed offering of connectivity (Ethernet VPN, MPLS IP VPN, Ethernet Private Line, Internet Access) + computing (Virtual Data Center)
– Scope: datacenter design, access network, wide area network
15. Advanced use cases
Modeling
[Figure: reference network model. An MPLS backbone interconnects PE routers; CE routers connect Customer 1 sites A–C and Customer 2 sites A–C; an Internet gateway reaches the public Internet and end users; two data centers attach through top-of-rack (TOR) switches to hypervisors (HV) hosting VMs.]
16. Advanced use cases
Enterprise VPN over operator’s network
Wide Area Network
• Logical separation of customers through: MPLS encapsulation, BGP-based MPLS VPNs and Virtual Routing and Forwarding (VRF)
Access network
• Use of Ethernet switching within
metro-area networks
• Logical separation of traffic
belonging to multiple customers
implemented through IEEE 802.1Q
17. Advanced use cases
Enterprise VPN over operator’s network: Applying RINA
• Backbone DIF: provides the equivalent of the MPLS network. This DIF must be able to provide flows with “virtual circuit” characteristics, equivalent to MPLS LSPs.
• Provider top-level DIF: provides IPC services to the different customers, by connecting together the CE routers. The DIF may provide different levels of service, depending on the customer’s requirements. There may be one or more of these DIFs (one per customer, one for all the provider’s customers, etc.).
• Intra customer-site DIFs: DIFs whose scope is a single customer site. Their characteristics will depend on the size and needs of the customer (e.g. could be a campus network, an enterprise network, etc.).
• Customer A DIF: can provide connectivity to all the application processes within customer A’s organization. More specialized DIFs targeting concrete application types (e.g. voice, file transfer) could be created on top.
18. Advanced use cases
Hypervisor integration: With TCP/IP
[Figure: TCP/IP hypervisor networking in the datacenter. On each hypervisor machine, VMs (192.168.1.1–.3) attach through shared memory to virtual interfaces (vif1.0–vif3.0) plugged into software bridges (SW bridge 0/1); the bridges connect via VLAN-tagged subinterfaces (eth0.2 on VLAN 2, eth1.5 on VLAN 5) to the hypervisor’s physical NICs and on to the Top of Rack switch, which carries traffic out of the DC.]
19. Advanced use cases
Hypervisor integration: With RINA
[Figure: the same setup with RINA. A green customer DIF spans the VMs on both hypervisors. Each VM reaches its hypervisor through a shim DIF for HV; the hypervisors reach each other and the TOR switch through a shim DIF over 802.1Q, which also carries traffic out of the DC (to the customer VPN or Internet gateway).]
20. Advanced use cases
VDC + Enterprise VPNs over the Internet: With TCP/IP
[Figure: VDC + enterprise VPNs over the Internet with TCP/IP. The green and blue customer premises each contain customer machines behind a switch, a NAT/gateway and a border router; both connect over the public Internet to the border router (NAT/gateway, eth0–eth3) of the datacenter premises.]
21. Advanced use cases
VDC + Enterprise VPNs over the Internet: With RINA
[Figure: the same scenario with RINA. The green customer DIF spans the customer’s VMs in the datacenter and the servers at the customer premises. Inside the DC: a shim DIF for HV (shared memory) between VMs and hypervisors, and a shim DIF over 802.1Q (VLAN 2) connecting hypervisors, TOR and the DC border router. A shim DIF over TCP/UDP crosses the public Internet between the DC border router and the customer border router. At the customer premises a shim DIF over 802.1Q (VLAN 10) runs over a layer-2 switch to the servers.]
22. Agenda
• Project overview
• Use cases
– Basic scenarios (Phases 1 and 2)
– Advanced scenarios (Phases 2 and 3)
• Specifications
– Shim DIF over 802.1Q
– PDU Forwarding Table Generator
– Y2 plans
• Software development
– High level software architecture
– User-space
– Kernel-space
– Wrap-up
• Experimental activities
– Intro, goals, Y1 experimentation use case
– Testbed and results at i2CAT OFELIA island
– Testbed and results at iMinds OFELIA island
– Y2 plans
24. Shim DIF over Ethernet
General requirements
• The task of a shim DIF is to put as small a veneer as possible over a legacy protocol to allow a RINA DIF to use it unchanged.
• The shim DIF should provide no more service or capability than the legacy protocol provides.
25. Examining the Ethernet Header
• Ethernet II: specification released by DEC, Intel,
Xerox (hence also called DIX Ethernet)
Frame layout:
  Preamble                   7 bytes
  MAC dest                   6 bytes
  MAC src                    6 bytes
  802.1Q header (optional)   4 bytes
  Ethertype                  2 bytes
  Payload                    42–1500 bytes
  FCS                        4 bytes
  Interframe gap             12 bytes
26. Ethertype
• Identifies the syntax of the encapsulated protocol
• Layers below need to know the syntax of the layer
above
• Layer violation!
27. Consequences of using an Ethertype
• Using an Ethertype also means only one flow can be distinguished between an address pair
• The MAC address doubles as the connection endpoint-id
28. Shim DIF over Ethernet
Environment
29. Address Resolution Protocol
• Resolves a network address to a hardware address
– Most ARP implementations do not conform to the standard
– The shim IPC Process assumes an RFC 826 compliant implementation
30. Usage of ARP
• Maps the application process name to a shim IPC Process address (MAC address)
– The application process name is transformed into a network protocol address by concatenating its four components:
    Process name:     My_IPC_Process
    Process instance: 1
    Entity name:      Management
    Entity instance:  2
    →  My_IPC_Process/1/Management/2
– Application registration adds an entry in the local ARP cache
• A flow allocation request results in an ARP request/reply
– Instantiates a MAC protocol machine equivalent of DTP (cf. Flow Allocator)
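As an aside, the name-to-address transformation above is plain string concatenation of the four name components. A minimal user-space sketch (the struct and function names here are hypothetical illustrations, not the IRATI API):

  #include <stdio.h>

  /* Hypothetical name tuple; field names are illustrative. */
  struct apn {
          const char *process_name;
          const char *process_instance;
          const char *entity_name;
          const char *entity_instance;
  };

  /* Flatten the tuple into the string registered as the ARP "network
   * protocol address", e.g. "My_IPC_Process/1/Management/2". */
  static int apn_to_arp_addr(const struct apn *n, char *buf, size_t len)
  {
          int ret = snprintf(buf, len, "%s/%s/%s/%s",
                             n->process_name, n->process_instance,
                             n->entity_name, n->entity_instance);
          return (ret < 0 || (size_t) ret >= len) ? -1 : 0;
  }

  int main(void)
  {
          struct apn n = { "My_IPC_Process", "1", "Management", "2" };
          char addr[128];

          if (!apn_to_arp_addr(&n, addr, sizeof(addr)))
                  printf("%s\n", addr);  /* My_IPC_Process/1/Management/2 */
          return 0;
  }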
32. PDU Forwarding Table Generator
Requirements and general choices
It’s all policy!
• Every DIF can do it its own way
• We start with a link-state routing approach
33. PDU Forwarding Table Generator
High-level view and relationship to other IPC Process components
[Figure: the PDU Forwarding Table Generator inside the IPC Process. The Resource Allocator reports events on N-1 flows to nearest neighbors (allocated, deallocated, down, up); the generator updates its knowledge of the N-1 flow state and propagates it by invoking write operations on objects at neighbor IPC Processes, sent as outgoing CDAP messages through the RIB Daemon over the layer-management N-1 flows. Symmetrically, incoming CDAP messages (“neighbor B invoked write operation on object X”) and Enrollment Task events (“enrollment completed successfully”) feed the generator, which recomputes the PDU Forwarding Table. The Relaying and Multiplexing Task looks up that table to select the output N-1 flow (data transfer) for each PDU.]
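To make the link-state choice concrete, here is a self-contained sketch of a forwarding-table recomputation driven by N-1 flow state, with a breadth-first search standing in for whatever path-computation policy a DIF would actually configure; all names and the fixed-size tables are illustrative assumptions, not IRATI data structures:

  #include <stdio.h>
  #include <string.h>

  #define MAX_NODES 8

  /* flow-state database: link_up[i][j] != 0 means an N-1 flow i<->j is up */
  static int link_up[MAX_NODES][MAX_NODES];
  /* next_hop[d] = neighbor to forward to for destination address d */
  static int next_hop[MAX_NODES];

  /* Recompute next hops from 'src' with a breadth-first search; a real
   * generator would rerun this after every flow-state change. */
  static void recompute(int src)
  {
          int parent[MAX_NODES], queue[MAX_NODES], head = 0, tail = 0;

          memset(parent, -1, sizeof(parent));
          parent[src] = src;
          queue[tail++] = src;
          while (head < tail) {
                  int u = queue[head++];
                  for (int v = 0; v < MAX_NODES; v++)
                          if (link_up[u][v] && parent[v] < 0) {
                                  parent[v] = u;
                                  queue[tail++] = v;
                          }
          }
          for (int d = 0; d < MAX_NODES; d++) {
                  if (d == src || parent[d] < 0) {
                          next_hop[d] = -1;   /* self or unreachable */
                          continue;
                  }
                  int hop = d;
                  while (parent[hop] != src)  /* walk back to the first hop */
                          hop = parent[hop];
                  next_hop[d] = hop;
          }
  }

  int main(void)
  {
          /* topology: 0-1, 1-2, 0-3 (bidirectional N-1 flows) */
          link_up[0][1] = link_up[1][0] = 1;
          link_up[1][2] = link_up[2][1] = 1;
          link_up[0][3] = link_up[3][0] = 1;

          recompute(0);
          printf("to 2 via %d\n", next_hop[2]);  /* via 1 */

          link_up[1][2] = link_up[2][1] = 0;     /* "N-1 flow down" event */
          recompute(0);
          printf("to 2 via %d\n", next_hop[2]);  /* unreachable: -1 */
          return 0;
  }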
34. Plans for Year 2
• Shim DIF for Hypervisors
– Enable communications between VMs in the same physical
machine without using the networking subsystem
• Updated shim DIF over TCP/UDP
– The current version requires manual discovery of mappings of app names to IP addresses and TCP/UDP ports; investigate the use of DNS
• Updated PDU Forwarding Table Generator
– Based on lessons learned from implementation and experimentation
• Feedback to EFCP
– Based on implementation and experimentation experience
• Faux sockets API
35. Agenda
• Project overview
• Use cases
– Basic scenarios (Phases 1 and 2)
– Advanced scenarios (Phases 2 and 3)
• Specifications
– Shim DIF over 802.1Q
– PDU Forwarding Table Generator
– Y2 plans
• Software development
– High level software architecture
– User-space
– Kernel-space
– Wrap-up
• Experimental activities
– Intro, goals, Y1 experimentation use case
– Testbed and results at i2CAT OFELIA island
– Testbed and results at iMinds OFELIA island
– Y2 plans
37. Project targets and timeline (SW)
• IRATI SW goals:
– Release 3 SW prototypes in 2 years
– Each prototype provides incremental functionalities
• 1st prototype: basic functionalities (unreliable flows), comparable to UDP/IP
• 2nd prototype: “complete” stack (reliable flows + routing), comparable to TCP/IP
• 3rd prototype: enhancements (hardened proto + RINA over IP + …)
– More product-like than prototype-like
– Glancing at extensibility, portability, performance & usability
• The SW components live in both kernel & user space
38. Problems …
• Problems are mostly SW-engineering related, under time constraints:
1. Ref-specs → HL arch
2. HL arch → detailed design
3. Detailed design → implementation, debug, integration …
• Since the IRATI stack spans user and kernel spaces…
• User-space problems (as usual):
– Memory (e.g. corruptions, leaks)
– Bad logic (e.g. faults)
– Concurrency (e.g. deadlocks, starvation)
– …
– Nothing that special (but … time consuming for sure)
39. … and problems
• Kernel-space problems are the user-space ones PLUS:
– A harsher environment, e.g.
• The develop, install & test cycle is (a lot) slower
– Huge code-base (takes a lot to compile)
– Faults in the kernel code may bring the whole host down
– Reboots are usually required to test a new “version” (at early stages)
• C is “the” language → less expressive than others in userland
• No “external libraries” …
– The kernel is “cooperative”, e.g.
• Stack & heap handling must be “careful”, e.g.
– Memory corruptions could propagate everywhere
– Different mechanics, e.g.
• Mutexes, semaphores, spinlocks, RCUs … coupled with un-interruptible sleeps
– Syscalls may sleep … but spinlocks can’t be held while “sleeping”
• No recursive locking
• Memory allocation comes in different flavours: NOWAIT, NOIO, NOFS …
– … … …
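One example of those different mechanics, for readers less used to kernel work: the allocation flavour is chosen per call site, and the sleeping rules above dictate which one is legal. An illustrative kernel-style fragment (not IRATI code):

  #include <linux/slab.h>
  #include <linux/spinlock.h>

  static DEFINE_SPINLOCK(lock);

  static void example(void)
  {
          void *a, *b;

          a = kmalloc(128, GFP_KERNEL);  /* process context: may sleep */

          spin_lock(&lock);
          b = kmalloc(128, GFP_ATOMIC);  /* under a spinlock: must not sleep */
          spin_unlock(&lock);

          kfree(a);                      /* kfree(NULL) is a no-op */
          kfree(b);
  }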
40. Outline
• Introduction
• High level software architecture
• Detailed software architecture
– Kernel space
– User space
• Wrap-up
41. Splitting the spaces: user vs kernel
Fast/slow paths → user vs kernel
• We split the “design” in different “lanes” and placed SW
components there, depending on their timing requirements
– Fast-path → stringent timings → kernel-space
– Slow-path → loose timings → user-space
• ... looking for our optimum
– fiddling with time/easiness/cost/problems/schedule/final-solution etc.
42. API & kernel
• OS Processes request services to the kernel with syscalls
– User originated (user → kernel)
– Unicast
• Modern *NIX systems extend the user/kernel communication mechanisms
– Netlink, uevent, devfs, procfs, sysfs etc.
• We wanted a “bus-like” mechanism: 1:1/N:1, user/kernel & user/user
– User OR kernel originated
– Multicast/broadcast
• We adopted syscalls and Netlink
– Syscalls (fast-path):
• Bootstrapping (*) & SDUs R/W
– Netlink (mostly slow-path):
• We introduced a RINA “family” and its related messages
[Figure: applications, the IPC Process Daemons and the IPC Manager Daemon in user space reach the kernel 1:1 via syscalls and N:1 via Netlink.]
(*) Bootstrapping needs: syscalls create kernel components which will be using Netlink functionalities later on
43. Introducing librina
• Syscalls are “wrapped” by libc (kernel abstraction)
– i.e. syscall(SYS_write, …) → write(…)
– glibc on GNU/Linux
• Changes to the syscalls → changes to glibc
– Breaking glibc could break the whole host
• Sandboxed environments are necessary
– Dependency invalidation → time-consuming compilations
– That sort of change is really hard to get approved upstream
– etc.
• We introduced librina as the initial way to overcome these problems …
– … use IRATI in a host without breaking the whole system
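The trick librina exploits is that Linux exposes new syscalls through the generic syscall(2) entry point, so no glibc change is needed; a user-space library can wrap them itself. A minimal sketch (the syscall name and number below are made-up placeholders, not IRATI’s):

  #define _GNU_SOURCE
  #include <unistd.h>
  #include <sys/syscall.h>

  /* Hypothetical syscall number; a real build would take it from the
   * patched kernel headers. */
  #define SYS_rina_sdu_write 451

  static long rina_sdu_write(int port_id, const void *sdu, unsigned long size)
  {
          /* no glibc wrapper required: go through the generic entry point */
          return syscall(SYS_rina_sdu_write, port_id, sdu, size);
  }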
44. librina
• It is more a framework/middleware than a library
– It has explicit memory allocation (no garbage collection)
– It’s event-based
– It’s threaded
• Completely abstracts the interactions with the kernel
– syscalls and Netlink
• Adds functionalities upon them
• Provides them to userland (apps & daemons)
– Static/dynamic linking (i.e. for C/C++ programs)
– Scripting-language extensions (e.g. Java)
45. librina interface
• librina contains a set of “components”:
– Internal components
– External components
• And a portable framework to build components on top, e.g.:
– Patterns: e.g. singletons, observers, factories, reactors
– Concurrency: e.g. threads, mutexes, semaphores, condition variables
– High level “objects” in its core
• FlowSpecification, QoSCube, RIBObject etc.
• Only the “external” components are “exported” as classes
46. librina core (HL) SW architecture
[Figure: librina high-level architecture. The application-facing API (common, cdap, faux-sockets, sdu-protection, ipc-process, ipc-manager, application) exposes eventPoll()/eventWait() plus operations to allocate/deallocate flows, read/write SDUs to flows and register/unregister to one or more DIFs. Core components post events (eventPost()) to an internal Event Queue; the NetlinkManager handles NetlinkSessions over libnl/libnl_genl (nl_send()/nl_recv()), while the syscall wrappers invoke syscall(SYS_*). Below the user/kernel boundary sit the RINA Netlink family and the RINA syscalls. Kernel-side operations include configuring the PDU Forwarding Table, creating/deleting EFCP instances, and allocating kernel resources to support a flow (creation, deletion, configuration).]
47. How to RAD, effectively?
• OO was the “natural” way to represent the RINA entities
• We embraced C++ as the “core” language for librina:
– Careful usage produces binaries comparable to C
– The STL reduces the dependencies
• in the plain C vs plain C++ case
– Producing C bindings is possible
– …
• There was the ALBA prototype already working …
• … and ALBA has RINABand …
• BUT that prototype is Java based …
48. Interfacing librina to other languages
• We “adopted” SWIG: the Simplified Wrapper and Interface Generator
• SWIG “automatically” generates all the code needed to connect C/C++ programs to scripting languages
– Such as Python, Java and many, many others …

  example.h:
      int fact(int n);

  example.c:
      #include "example.h"
      int fact(int n) { … }

  example.i:
      /* File: example.i */
      %module example
      %{
      #include "example.h"
      %}
      int fact(int n);

From example.i, SWIG emits a low-level wrapper (example_wrap.c), compiled by GCC together with the library into the native interface (libexample.so), and a high-level wrapper (example.py) imported from Python.
49. librina wrapping
• Wrapping “cost”:
– The wrappers (.i files) are small: ~480 LOCs
– They produce ~13.5 KLOCs of bindings → ~1/28 ratio …
• The wrappers are the only thing needed to obtain the bindings for a scripting language
– SWIG support varies with the target language, e.g.
• Java: so-so (not all data-types mapped natively)
• Python: good
• …
– Our wrappers contain only the missing data-type mappings for Java
• Java interface = C++ interface
• Bindings for other languages (e.g. Python) are expected to be straightforward
50. High level software architecture
[Figure: high-level software architecture. Third-party SW packages (applications) and rinad (Java: ipcpd, ipcmd, RINABand HL/LL) sit on top. Java code “imports” the SWIG HL wrappers (Java), which reach librina through JNI and the SWIG LL wrappers (C++); any other language X follows the same path through its native interface and the SWIG wrappers generated for it. librina exposes a C API and a C++ API over its C++ core (also usable via static/dynamic linking), and talks to the kernel via syscalls and via Netlink through libnl/libnl-gen.]
52. The Linux object model
• Linux has its “generic” object abstraction: kobject, kref and kset
– Generic enough to be applied “everywhere”, e.g. FS, HW subsystems, device drivers

  /* reference counting (explicit): garbage collection & sysfs integration */
  struct kref { atomic_t refcount; };

  /* naming & sysfs integration */
  struct kobject {
          const char           *name;
          struct list_head     entry;
          struct kobject       *parent;  /* (dynamic) [re-]parenting, loosely typed */
          struct kset          *kset;    /* objects grouping */
          struct kobj_type     *ktype;
          struct sysfs_dirent  *sd;      /* sysfs integration */
          struct kref          kref;
          unsigned int         state_initialized:1;
          unsigned int         state_in_sysfs:1;
          unsigned int         state_add_uevent_sent:1;
          unsigned int         state_remove_uevent_sent:1;
          unsigned int         uevent_suppress:1;
  };

  /* objects grouping */
  struct kset {
          struct list_head              list;
          spinlock_t                    list_lock;
          struct kobject                kobj;
          const struct kset_uevent_ops  *uevent_ops;
  };
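For reference, this is what the explicit reference counting looks like in use; an illustrative kernel-style fragment, not IRATI code:

  #include <linux/kref.h>
  #include <linux/slab.h>

  struct thing {
          struct kref kref;
          /* ... payload ... */
  };

  static void thing_release(struct kref *kref)
  {
          kfree(container_of(kref, struct thing, kref));
  }

  static struct thing *thing_create(void)
  {
          struct thing *t = kzalloc(sizeof(*t), GFP_KERNEL);
          if (t)
                  kref_init(&t->kref);   /* refcount = 1 */
          return t;
  }

  static void thing_get(struct thing *t) { kref_get(&t->kref); }
  static void thing_put(struct thing *t) { kref_put(&t->kref, thing_release); }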
53. kobjects, ksets and krefs in IRATI
• They are the way to go for embracing OOD/OOP kernel-wide
• If the design has a “limited scope” the code gets bloated with:
– Ancillary functions & data structures
– (unnecessary) Resource usage
• We don’t need/want all these functionalities (everywhere):
– Reduced (finite) number of classes
• We don’t have the needs of a “generic kernel”
– Reduced concurrency (can be missing, depending on the object)
– Object parenting is “fixed” (obj x is always bound to obj y)
• E.g. DTP/DTCP are bound to EFCP …
– Not all our objects have to be published into sysfs
– We have different lookup requirements
• No need to “look up by name” every object
– Inter-object bindings shouldn’t lose the object’s type
– …
54. Our OOP/OOD approach
• We adopted a (slightly) different OOD/OOP approach
• (almost) Each “entity” in the stack is an “object”
• All our “objects” provide a basic common interface & behavior
• They have no implicit embedded locking semantics

  /* API opaque */
  struct object_t { … };

  /* vtable (if needed) */
  struct obj_ops_t {
          result_x_t (* method_1)(object_t * o, …);
          …
          result_y_t (* method_n)(object_t * o, …);
  };

  /* static */
  int  obj_init(object_t * o, …);
  void obj_fini(object_t * o);

  /* dynamic */
  object_t * obj_create(…);     /* interruptible ctxt */
  object_t * obj_create_ni(…);  /* non-interruptible ctxt */
  int obj_destroy(object_t * o);

  /* vtable proxy (if needed) */
  int obj_<method_1>(object_t * o, …);
  ...
  int obj_<method_n>(object_t * o, …);
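A toy, user-space instance of the convention (all names illustrative): _init/_fini operate on caller-provided storage, while _create/_destroy pair them with allocation; in the kernel, the _ni variant would simply allocate with a non-sleeping flavour.

  #include <stdio.h>
  #include <stdlib.h>

  typedef struct queue { int *items; size_t size, used; } queue_t;

  static int queue_init(queue_t *q, size_t size)
  {
          q->items = calloc(size, sizeof(*q->items));
          if (!q->items) return -1;
          q->size = size; q->used = 0;
          return 0;
  }

  static void queue_fini(queue_t *q) { free(q->items); }

  static queue_t *queue_create(size_t size)   /* _init + allocation */
  {
          queue_t *q = malloc(sizeof(*q));
          if (q && queue_init(q, size)) { free(q); return NULL; }
          return q;
  }

  static int queue_destroy(queue_t *q)        /* _fini + deallocation */
  {
          if (!q) return -1;
          queue_fini(q); free(q);
          return 0;
  }

  int main(void)
  {
          queue_t *q = queue_create(16);
          printf("created: %p\n", (void *) q);
          return queue_destroy(q);
  }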
55. OOD/OOP & the framework
• This approach:
– Reduces the (overall) stack bloat
• No krefs, spinlocks, sysfs etc. where unnecessary
• Only objects requiring sysfs, debugfs and/or uevents embed a kobject
• (or the bloat is comparable, e.g. the same _init, _fini, _create and _destroy boilerplate)
– Speeds up development
– Helps debugging
– (re-)Parenting is constrained to specific objects
– No loose typing → type-checking is maintained (no casts)
– Decouples (mildly) from the underlying kernel
• With these assumptions we built our framework
– Basic components: robj, rmem, rqueue, rfifo, rref, rtimer, rwq, rmap, rbmp
– OOP facilities/patterns: factories, singletons, facades, observers, flyweights, publisher/subscribers, smart pointers, etc.
– Ownership-passing + smart-pointer memory model
56. The HL software architecture (Y1)
[Figure: the Y1 high-level software architecture. User space: third-party SW packages and rinad (ipcpd, ipcmd, RINABand HL) over the SWIG HL/LL wrappers (Java or language X) and librina (framework, C and C++ APIs, C++ core), reaching the kernel via syscalls and via Netlink (libnl/libnl-gen). Kernel space: a personality mux/demux over the core (KIPCM, RNL, IPCP Factories, framework), the KFA, and the IPC Processes: the normal IPC Process (PFT, RMT, EFCP), shim-eth-vlan (with RINA-ARP) and shim-dummy.]
57. The API exposed to user-space:
KIPCM + RNL
• Kernel interface = syscalls + Netlink messages
• KIPCM:
– Manages the syscalls
• Syscalls: a small, well-defined set of calls (#8):
– IPCs: ipc_create and ipc_destroy
– Flows: allocate_port and deallocate_port
– SDUs: sdu_read, sdu_write, mgmt_sdu_read and mgmt_sdu_write
• RNL:
– Manages the Netlink part
• Abstracts message reception, sending, parsing & crafting
• Netlink: #36 message types (with dynamic attributes):
– assign_to_dif_req, assign_to_dif_resp, dif_reg_notif, dif_unreg_notif…
• Partitioning:
– Syscalls → KIPCM → “fast-path” (read and write SDUs)
– Netlink → RNL → “slow-path” (mostly conf and mgmt)
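Spelled out as a hypothetical C header (only the eight call names come from the design; the signatures here are assumptions for illustration):

  #include <stddef.h>
  #include <sys/types.h>

  /* IPC Process lifecycle */
  int     ipc_create(const char *name, unsigned short ipcp_id, const char *type);
  int     ipc_destroy(unsigned short ipcp_id);

  /* Flow (port) lifecycle */
  int     allocate_port(unsigned short ipcp_id, const char *app_name);
  int     deallocate_port(int port_id);

  /* Data transfer (fast-path) */
  ssize_t sdu_read(int port_id, void *buf, size_t len);
  ssize_t sdu_write(int port_id, const void *buf, size_t len);

  /* Layer-management SDUs */
  ssize_t mgmt_sdu_read(unsigned short ipcp_id, void *buf, size_t len);
  ssize_t mgmt_sdu_write(unsigned short ipcp_id, int port_id,
                         const void *buf, size_t len);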
58. KIPCM & KFA
• The KIPCM:
– Counterpart of the IPC Manager in user-space
– Manages the lifecycle of the IPC Processes and of the KFA
– Abstracts IPC Process instances
• Same API for all the IPC Processes, regardless of the type
• Maps: ipc-process-id → ipc-process-instance
• The KFA manages ports and flows
– Ports
• Flow handler and ID
• Port ID Manager
– Flows
• Maps: port-id → ipc-process-instance
– Top: user interface; bottom: IPC Processes (maps)
• Both “bind” the kernel stack:
– syscalls
– Netlink
• When the KIPCM calls the KFA to inject/get SDUs:
– N-IPCP → EFCP → RMT → PDU-FWD → Shim IPC Process
• They are the initial point where “recursion” is transformed into “iteration”
[Figure: KIPCM and KFA at the top of the kernel stack, facing user space; below, the normal IPCP (EFCP, RMT, PDU-FWD-T) over the shim IPCP on the IN/OUT data paths.]
59. The RINA Netlink Layer (RNL)
• Integrates Netlink in the SW framework
– Hides all the configuration, generation and destruction of Netlink sockets and
messages from the user
– Defines a Generic Netlink family (NETLINK_RINA) and its messages
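From user space, talking to a Generic Netlink family like NETLINK_RINA follows the usual libnl-3 pattern. A minimal sketch; the family name string, command and attribute IDs below are placeholders, not the real RNL definitions, and error handling is omitted for brevity:

  #include <netlink/netlink.h>
  #include <netlink/genl/genl.h>
  #include <netlink/genl/ctrl.h>

  /* Placeholder command/attribute IDs, not the real RNL ones */
  #define RINA_CMD_ASSIGN_TO_DIF_REQ 1
  #define RINA_ATTR_IPCP_ID          1

  int main(void)
  {
          struct nl_sock *sk = nl_socket_alloc();
          struct nl_msg *msg;
          int family;

          genl_connect(sk);                        /* bind a genl socket  */
          family = genl_ctrl_resolve(sk, "rina");  /* family name assumed */

          msg = nlmsg_alloc();
          genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family, 0, 0,
                      RINA_CMD_ASSIGN_TO_DIF_REQ, 1);
          nla_put_u16(msg, RINA_ATTR_IPCP_ID, 1);  /* one dynamic attribute */

          nl_send_auto(sk, msg);                   /* fire the request */
          nlmsg_free(msg);
          nl_socket_free(sk);
          return 0;
  }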
60. The IPC Process Factories
• They are used by IPC Processes to publish/unpublish their availability
– Publish: x = kipcm_ipcp_factory_register(…, char * name, …)
– Unpublish: kipcm_ipcp_factory_unregister(x)
• The factory name is the way the KIPCM can look for a specific IPC Process type
– It’s published into sysfs too
• There are two “major” types of IPC Processes:
– Normal
– Shims
61. The IPC Process Factories Interface
• Factory operations are the same for both types
• Upon registration
– A factory publishes its hooks:
    .init      → x_init
    .fini      → x_fini
    .create    → x_create
    .destroy   → x_destroy
    .configure → x_configure
• Upon user request (ipc_create)
– The KIPCM creates a particular IPC Process instance:
1. Looks for the correct factory (by name)
2. Calls the .create “method”
3. The factory returns a “compliant” IPC Process object
4. The KIPCM binds that object into its data model
• Upon un-registration
– The factory triggers the “destruction” of all the IPC Processes it “owns”
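A sketch of what such a hook table could look like in C; the struct layout and signatures are assumptions, only the hook names and the registration call come from the design:

  struct ipcp_instance;  /* opaque kernel-side object */

  /* Hypothetical hook table published by each factory */
  struct ipcp_factory_ops {
          int  (*init)(void);
          void (*fini)(void);
          struct ipcp_instance *(*create)(unsigned short ipcp_id);
          int  (*destroy)(struct ipcp_instance *inst);
          int  (*configure)(struct ipcp_instance *inst, const void *cfg);
  };

  /* Stubbed hooks standing in for a shim's real implementation */
  static int  eth_init(void)                        { return 0; }
  static void eth_fini(void)                        { }
  static struct ipcp_instance *eth_create(unsigned short id)
  { (void) id; return 0; }
  static int  eth_destroy(struct ipcp_instance *i)  { (void) i; return 0; }
  static int  eth_configure(struct ipcp_instance *i, const void *c)
  { (void) i; (void) c; return 0; }

  static const struct ipcp_factory_ops shim_eth_vlan_ops = {
          .init      = eth_init,
          .fini      = eth_fini,
          .create    = eth_create,
          .destroy   = eth_destroy,
          .configure = eth_configure,
  };
  /* registered with kipcm_ipcp_factory_register(…, "shim-eth-vlan", …),
   * per the publish call on the previous slide */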
62. IPC Process Instances
• The .create provided to the factories returns an IPC Process “object”
• There are two “major” types of IPC Processes:
– Normal
– Shims
• Regardless of its type
– The interface is the same
– Each IPC Process implements its “core” code:
• Shim IPC Process: each shim IPC Process provides its own implementation
• Normal IPC Process: the stack provides one implementation for all of them
63. IPC Process Instances Interface
• The IPC Process “object” comprises:
– instance_data
– instance_ops
• The IPC Process Interface is the same for all types, but each type decides which ops it will support
– Some are specific to normal or shim IPC Processes, a few are common to both

  instance_ops (shim IPC Process):
    .application_register   = x_application_register
    .application_unregister = x_application_unregister
    .assign_to_dif          = x_assign_to_dif
    .sdu_write              = x_sdu_write
    .flow_allocate_request  = shim_allocate_request
    .flow_allocate_response = shim_allocate_response
    .flow_deallocate        = shim_deallocate

  instance_ops (normal IPC Process):
    .connection_create         = normal_connection_create
    .connection_update         = normal_connection_update
    .connection_destroy        = normal_connection_destroy
    .connection_create_arrived = normal_connection_arrived
    .pft_add                   = normal_pft_add
    .pft_remove                = normal_pft_remove
    .pft_dump                  = normal_pft_dump

– They support similar functionalities (except the PFT’s)
– How they translate into ops depends on the type
66. Shim IPC Processes
• The shims are the “lowest” components in the kernel space
• They have two interfaces:
– NB (northbound): the same for each shim, represented by hooks published into the KIPCM factories
– SB (southbound): depends on the technology
• There are currently 2 shims:
– shim-dummy:
• Confined to a single host (“loopback”)
• Used for debugging & testing the stack
– shim-eth-vlan:
• As defined in the spec, runs over 802.1Q
71. Introduction to the user space framework
[Figure: user-space framework. The IPC Manager Daemon (main logic, IDD, RIB & RIB Daemon, management agent), each IPC Process Daemon (layer management: enrollment, flow allocation, resource allocation, PDU Forwarding Table generation, RIB & RIB Daemon) and the applications (application logic) all link librina, and reach the kernel through system calls, Netlink sockets and sysfs.]
• IPC Manager Daemon: broker between apps & IPC Processes, central point of management in the system
• IPC Process Daemon: implements the layer management components of an IPC Process
• librina: abstracts out the communication details between daemons and the kernel
73. The IPC Process and IPC Manager Daemons
• IPC Manager Daemon
– Manages the IPC Process lifecycle
– Broker between applications and IPC Processes
– Local management agent
– DIF Allocator client (to search for applications not available through local DIFs)
• IPC Process Daemon
– Layer management components of the IPC Process:
• RIB Daemon, RIB
• CDAP parsers/generators
• CACEP
• Enrollment
• Flow Allocation
• Resource Allocation
• PDU Forwarding Table Generation
• Security Management
74. IPC Manager Daemon
[Figure: IPC Manager Daemon (Java) internals. A main event loop blocks on EventProducer.eventWait() and dispatches to the IPC Manager core classes (IPC Process Manager, Flow Manager, Application Registration Manager), which call the librina proxy classes (IPC Process Factory, IPC Process, Application Manager) and return operation results. A Command Line Interface server thread accepts CLI sessions over a local TCP connection; a Bootstrapper reads the configuration file at startup (configuration classes). Message, model and event classes travel through the SWIG high-level Java wrappers, JNI and the SWIG low-level C++ wrappers down to librina (C++), which talks to the kernel via system calls and Netlink messages.]
75. IPC Process Daemon
[Figure: IPC Process Daemon (Java) internals. A main event loop blocks on EventProducer.eventWait() and dispatches to the layer management function classes (Enrollment Task, Flow Allocator, Resource Allocator, Registration Manager, Forwarding Table Generator), which operate through the RIB Daemon and the Resource Information Base (RIB), and call the IPCManager or KernelIPCProcess proxies. Supporting classes (Delimiter, CDAP parser, Encoder) handle CDAP traffic: a CDAP message reader thread pulls management SDUs via KernelIPCProcess.readMgmtSDU() and hands them to RIBDaemon.cdapMessageReceived(), while outgoing messages flow through RIBDaemon.sendCDAPMessage() and KernelIPCProcess.writeMgmtSDU(). Message, model and event classes travel through the SWIG Java wrappers, JNI and the SWIG C++ wrappers down to librina (C++), and to the kernel via system calls and Netlink messages.]
76. Example workflow : IPC Process creation
• The IPC Manager reads a configuration file with instructions on the IPC Processes it has to create at startup
– Or the system administrator can request creation through the local console (a CLI session over a local TCP connection)
• The configuration file also instructs the IPC Manager to register the IPC Process in one or more N-1 DIFs, and to make it a member of a DIF
Workflow (NL = Netlink message):
1. Create IPC Process (syscall)
2. Fork (syscall)
3. Initialize librina
4. When completed, notify the IPC Manager (NL)
5. IPC Process initialized (NL)
6. Register app request (NL)
7. Register app response (NL)
8. Notify IPC Process registered (NL)
9. Assign to DIF request (NL)
10. Update state and forward to kernel (NL)
11. Assign to DIF request (NL)
12. Assign to DIF response (NL)
13. Assign to DIF response (NL)
77. Example workflow : Flow allocation
• An application requests a flow to another application, without specifying what DIF to use
Workflow (NL = Netlink message):
1. Allocate Flow Request (NL), from Application A to the IPC Manager Daemon
2. Check app permissions
3. Decide what DIF to use
4. Forward request to the adequate IPC Process Daemon
5. Allocate Flow Request (NL)
6. Request port-id (syscall)
7. Create connection request (NL)
8. On create connection response (NL), write CDAP message to N-1 port (syscall)
9. On getting an incoming CDAP message response (syscall), update connection (NL)
10. On getting update connection response (NL), reply to IPC Manager (NL)
11. Allocate Flow Request Result (NL)
12. Forward response to app
13. Allocate Flow Request Result (NL)
14. Read data from the flow (syscall) or write data to the flow (syscall)
79. Y1: Where we are / What do we have…
• 9 months, ~3700 commits and ~214 KLOCs later …
– ~27 KLOCs in the kernel
– ~87 KLOCs in librina (hand-written)
– ~35 KLOCs in librina (automatically generated)
– ~65 KLOCs in rinad
• … the project released its 1st prototype (internal release):
– User and kernel space components providing unreliable flow functionalities
– We have the building/configuration/development frameworks
– A testing framework
• A testing application (RINABand, compile-time)
• A regression framework (ad-hoc, run-time)
• We’re actively working on the 2nd prototype
80. Y2: Plans …
• Prototype 2:
– Reliable flows support
– Shim DIF for HV
• Same schema as shim-dummy/shim-eth-vlan in prototype 1
– Complete routing
– Public release as FOSS (July 2014)
• Prototype 3:
– Shim DIF over TCP/UDP
• Same schema as prototype 2
– Faux sockets API via:
1. FI: function interposition (dynamic linking)
2. SCI: system call interposition (static linking)
81. Agenda
• Project overview
• Use cases
– Basic scenarios (Phases 1 and 2)
– Advanced scenarios (Phases 2 and 3)
• Specifications
– Shim DIF over 802.1Q
– PDU Forwarding Table Generator
– Y2 plans
• Software development
– High level software architecture
– User-space
– Kernel-space
– Wrap-up
• Experimental activities
– Intro, goals, Y1 experimentation use case
– Testbed and results at i2CAT OFELIA island
– Testbed and results at iMinds OFELIA island
– Conclusions
84. IRATI experimentation in a nutshell
[Figure: experimentation phases I–III and the testbeds used in each: OFELIA, iLab.t, EXPERIMENTA and PSOC.]
86. Available Tools
• RINABand
– Test application for RINA
– Java (user space)
– Requires multiple flows between two APs:
• 1 control flow
• N data flows
[Figure: RINABand and RINABandClient, each with a Control AE and a Data AE, communicating over a DIF.]
• Echo server/client
– Test parameters: number and size of SDUs to be sent
– Ping-like operation
– The test completes when either all the SDUs have been sent and received, or when more than a certain interval of time elapses without receiving an SDU
– Client and server report statistics:
• the number of transmitted and received SDUs
• the time the test lasted
– Single flow between two APs
87. First Phase Prototype capabilities
• Capabilities
– Decision to focus on the shim-eth-vlan
– Supports only a single flow between two APs (one Ethertype → a single flow per MAC address pair)
• Impact on experiments
– Could not use RINABand
– Rely on the echo server/client application
89. First phase use case
90. Single flow echo/bw test
• Validate stack / prototype 1
• Validate Ethernet transparency
• Measure goodput
91. Multiple flow echo/bw validation
• Validate multiple IPC processes
• Measure goodput
92. Concurrent RINA and IP
• Validate concurrency of the IP and RINA stacks
• Measure goodput
93. Presented by Leonardo Bergesio
FIRST PHASE RESULTS @ I2CAT
94. i2CAT OFELIA Island, EXPERIMENTA
• Experiment == slice
• FlowSpace:
– Arbitrary topology
– Partition of the vectorial space of OF header fields
– Slicing by VLANs
• VMs to be used as end points or controllers
• Perfect match:
– slice VLAN ↔ shim DIF over Ethernet
95. Workflow I
• Access island using OCF. Create or access your
project/slice
96. Workflow II
• Select FlowSpace Topology and slice VLAN/s (DIFs)
97. Workflow III
• Create VMs Nodes and OpenFlow Controller
99. Single flow
• Packets are sent over the Ethernet/VLAN bridge
• Goodput roughly 60% of link capacity (iperf tested)
(Project: IRATIbasicusecase, slice: multivlanslice)
100. Multiple flows
• Flows to a shared server (B & C to D) achieved half the throughput of the single flow (A to B)
101. Concurrency between IP and RINA stack
UDP results:
  Time interval    90 s
  Nº of datagrams  554915
  Data sent        778 MB
  BW               75.5 Mbps
102. FIRST PHASE RESULTS @ IMINDS
115. Conclusions from phase I experimentation
• The IRATI stack and shim DIF are running
• ~60% goodput in comparison to iperf
• No major performance problems
• When running concurrently, the IRATI stack takes precedence over the IP stack
– our stack doesn’t lose a packet from syscalls to the devices layer
• ARP in the shim DIF should not reuse the 0x0806 Ethertype, because of incompatibility with existing implementations
• Registration to the shim DIF over Ethernet should be explicit
116. Thanks for your attention!
Questions?
Editor’s notes
The shim DIF over Ethernet wraps an Ethernet layer with the RINA API and presents it to the layer above as if it was a regular DIF (usually with restricted capabilities; very seldom current technologies provide a fully-formed layer). The only intended user of an Ethernet shim DIF is a normal IPC Process, as discussed in the shim DIF specification.
A shim DIF over Ethernet maps to a VLAN. The DIF name is the VLAN name. The shim DIF only supports one class of service: unreliable. ARP can be used to map upper layer IPC Process names to shim DIF addresses (MAC addresses). It spans a single Ethernet segment.
The librina package contains all the IRATI stack libraries that have been introduced to abstract from the user all the kernel interactions (such as syscalls and Netlink details). Librina provides its functionalities to user-space RINA programs via scripting language extensions or statically/dynamically linkable libraries (i.e. for C/C++ programs). Librina is more a framework/middleware than a library: it has its own memory model (explicit, no garbage collection), its execution model is event-driven and it uses concurrency mechanics (its own threads) to do part of its work. Rinad instead, contains the IPC Manager and IPC Process daemons as well as a testing application (RINABand). The IPC Manager is the core of IPC Management in the system, acting both as the manager of IPC Processes and a broker between applications and IPC Processes (enforcing access rights, mapping flow allocation or application registration requests to the right IPC Processes, etc.). IPC Process Daemons implement the layer management components of an IPC Process (enrollment, flow allocation, PDU Forwarding table generation or distributed resource allocation functions). For more details on the rationale behind this high-level architecture, interested readers might refer to the relevant sections in D2.1 [3]. Rinad also provides a couple of example/utility applications that serve two purposes: i) provide an example of how an application uses librina and ii) allow testing/experimentation with the IRATI stack by measuring some properties of the IPC service as perceived by the application (flow allocation time, goodput in terms of bytes read/write per second or mean delay)
Model classes: These classes model objects that abstract different concepts related to the services provided by librina, such as: application names, flow specifications, RIB objects, neighbours and connections. Model classes contain information on the modelled objects, but do not provide operations to perform actions other than updating or reading the object’s state.

Proxy classes: These classes model ‘active entities’ within librina, meaning that they provide operations to perform actions on these entities. These actions result in the invocation of librina internals, either to send a Netlink message to another user-space process or the kernel, or to invoke a system call. For instance, librina-application provides an ‘IPCManager’ proxy class that allows an application process to request the allocation or deallocation of flows to the IPC Manager Daemon. Another example can be found in the ‘IPC Process’ class available at librina-ipcmanager: this proxy class allows the IPC Manager daemon to invoke operations on the user-space or kernel components of an IPC Process.

Event classes: librina is event-based. Invocations of proxy class operations that cause the emission of a Netlink message return right away, without waiting for the Netlink message response. The response will later be obtained as one of the events received through the EventConsumer class. Event classes are the ones that encapsulate the information of the different events, discriminated by event type. Examples of events include results of flow allocation/deallocation operations or results of application registration/unregistration operations, just to name a few.

EventProducer: This class allows librina users to access the events originated from the responses to the operations requested through the proxy classes. The event producer provides blocking, non-blocking and time-bounded blocking operations to retrieve pending events.

The librina core components process two types of inputs: operations invoked via proxy classes at the API level, or Netlink messages received via the Netlink socket bound to librina – created at initialization time. Operations invoked via proxy classes can follow two processing paths that result either in the invocation of a system call or in the generation of a Netlink message. In the former case processing is very simple: invocations of proxy operations are mapped to system call wrappers that make the required system call to the kernel (such as readsdu, writesdu, createipcprocess or allocateportid). The latter case involves more processing, as explained in the following:

Concurrency classes: Concurrency classes provide an object-oriented wrapper to the OS threading functionalities. They are internally used by librina, but also exposed to librina users in case they want to use them as a way of avoiding external dependencies or intermixing different threading libraries (as is the case of the IPC Manager and IPC Process daemons).

Message classes: These classes provide an object-oriented model of the different Netlink messages that can be sent or received by librina. The basic message class ‘BaseNetlinkMessage’ models all the information required to generate/parse the header of a Netlink message, including the Netlink header (source port-id, destination port-id and sequence number), the Generic Netlink family header (family and operation-code) and the RINA family header (source and destination IPC Process ids).
The different message classes extend the base class by modelling the information that is sent/received as Netlink message attributes in the different messages. NetlinkManager: This class provides an object-oriented wrapper of the functions available at the libnl/libgnl libraries (these libraries provide functions to generate, parse, send and receive Netlink messages). The wrapping is partial since only the functionality required by librina has been wrapped. In the ‘output path’ the NetlinkManager takes a message class, generates a buffer, adds the NL message header to the buffer, passes the message class and the buffer to the NL formatter classes (which will add NL attributes to the buffer) and finally passes the buffer to libnl to send the message. In the ‘input path’ – upon calling the blocking ‘getMessage’ operation – the IPC Manager blocks until libnl returns a buffer containing a NL message, then it parses the header, requests the NL parser classes to parse the NL attributes and return the appropriate message class, and returns. NetlinkMessage Parsers/Formatters: The goal of these classes is either to generate the attributes of a NL message based on the contents of a message class (formatting role) or to create and initialize a message class based on the attributes of a NL message (parsing role). In order to ensure that all the NL messages are received in a timely fashion, librina-core has an internal thread that is continuously calling the blocking NetlinkManager ‘getMessage’ operation. When the operation returns the thread converts the resulting Message class to an Event class, and puts the Event class to an internal events queue. When a librina user calls the EventConsumer to retrieve an event, the EventConsumer tries to retrieve an element from the events queue by invoking the eventPoll (non-blocking), eventWait (blocking) or eventTimedWait (blocking but time-bounded) operation. All librina components use an internal lightweight logging framework instead of an external one in order to minimize librina dependencies, since the goal is to facilitate deploying it within several OS/Linux systems.
The IPC Manager Daemon is primarily responsible for managing the RINA stack in the system. It manages the IPC Process lifecycle, acts as the local management agent for the system and is the broker between applications and IPC Processes (filtering the IPC resources available to the different applications in the system). As introduced in section 2.2.2, the first phase prototype of the IPC Manager has been developed in Java, leveraging part of the Alba prototype codebases. Moreover, the current IPC Manager Daemon is not a complete implementation, since it does not implement the local management agent yet (therefore the RINA stack cannot be managed through a centralized DIF Management System).

The IPC Process Daemon performs the layer management functions of a single IPC Process. It is therefore “half” of the IPC Process application, while the other half – dealing with data-transfer and data-transfer-control related tasks – is located in the kernel. Layer management operations are more complex and do not have such stringent performance requirements as data transfer operations, therefore locating them in user-space is a logical choice, as introduced in D2.1.
Figure 20 shows a schema of the detailed IPC Manager Daemon software design. It is a Java OS process that leverages the operations provided by the librina API through the wrappers generated by SWIG and the Java Native Interface (JNI). Concretely, librina-ipc-manager provides the following proxy classes to the IPC Manager Daemon: IPC Process Factory, which enables the creation, destruction and enumeration of the different types of IPC Processes supported by the system; IPC Process, which allows the IPC Manager to request operations on IPC Processes such as assignment to DIFs, configuration updates, enrolment, registrations of applications or allocations/deallocations of flows; and Application Manager, which provides operations to inform applications about the results of pending requests such as allocation of flows or registrations of applications.

When the IPC Manager Daemon initializes, it reads a configuration file from a well-known location. This configuration file provides default values for system parameters, describes configurations of well-known DIFs and controls the behaviour of the IPC Manager bootstrap process. The latter is achieved by specifying: the IPC Processes that have to be created at system start-up, including their name and type; for each IPC Process to be created, the names of the N-1 DIFs where the IPC Process has to be registered (if any); and for each IPC Process to be created, the name of the DIF that the IPC Process is a member of (if any). If the IPC Process is assigned to a DIF it will be initialized with an address and all the other information required to start operating as a member of that DIF (DIF-wide constants, policies, credentials, etc.).

When the bootstrapping phase is over, the IPC Manager main thread starts executing the event loop forever. The event loop continuously polls librina’s EventProducer (in blocking mode) to get the events resulting from Netlink request messages sent by applications or IPC Processes. When an event happens, the event loop checks its type and delegates the processing of the event to one of the specialized core classes: Flow Manager (flow related events), Application Registration Manager (application-registration related events) or IPC Process Manager (IPC Process lifecycle management related events). The processing performed by these core classes will typically result in the invocation of one of the operations provided by the librina-ipc-process proxy classes previously described in this section.

Local system administrators can interact with the IPC Manager through a Command Line Interface (CLI), accessible via telnet. This console provides a number of commands that allow system administrators to query the status of the RINA stack in the system, as well as performing actions that modify its configuration (such as creating/destroying IPC Processes, assigning them to DIFs, etc.). The IPC Manager supports the CLI console through a dedicated thread that listens at the console port; only one console session at a time is supported at the moment.

The current IPC Manager has leveraged the following Alba components, adapting them to the environment of the IRATI stack: the configuration file format, parsing libraries and model classes (the configuration file uses JSON, the JavaScript Object Notation); the Command Line Interface server thread and related parsing classes; and the bootstrapping process.
Figure 21 depicts the detailed software design of the IPC Process Daemon. The first phase prototype follows the same approach taken with the IPC Manager Daemon design and implementation: leveraging the Alba stack as much as possible in order to provide a simple but complete enough implementation of the IPC Process Daemon. Therefore the IPC Process Daemon is also a Java OS process that builds on the APIs exposed by librina through SWIG and JNI. The librina proxy classes described below are the more relevant to the IPC Process Daemon operation: IPC Manager. Allows the IPC Process Daemon to communicate with the IPC Manager Daemon, mainly to inform the latter about the results of requested operations; but also to notify about incoming flow requests or flows that have been deallocated. Kernel IPC Process. Provides operations to enable the IPC Process Daemon to communicate with the data-transfer/data-transfer-control related functions of the IPC Process in the kernel. The APIs allow the IPC Process Daemon to modify the kernel IPC Process configuration, to manage the setup and teardown EFCP connections or to modify the PDU forwarding table. IPC Process Daemons are instantiated and destroyed by the IPC Manager Daemon. When the IPC Process Daemon has completed is initialization, the main thread starts executing the event loop. Such a loop is implemented by continuously polling the EventProducer for new events (in blocking mode) and processing them when they arrive. The event processing is delegated to the classes implementing the different layer management functions: Enrollment Task, Resource Allocator, Registration Manager, Flow Allocator and PDU Forwarding Table Generator. Processing performed by these classes typically involves two types of actions: Local actions resulting in communications with the Kernel IPC Process or the IPC Manager, achieved via the librina proxy classes. Remote actions resulting in communications with peer IPC Process Daemons, achieved via the RIB Daemon. The RIB Daemon is an internal component of the IPC Process that provides an abstract, object- oriented schema of all the IPC Process state information. This schema, known as the Resource Information Base or RIB, allows IPC Processes to modify the state of their peers by performing operations on one or more of the RIB objects. The Common Distributed Application Protocol (CDAP) is the application protocol used to exchange the remote RIB operation requests and responses between peer IPC Processes. This protocol allows six remote operations to be performed over RIB objects: create, delete, read, write, start and stop. The objects that are the target of the operation are identified by the following attributes: Object class. Uniquely identifies a certain type of objects. Object name. Uniquely identifies the instance of an object of a certain class. The object class + object name tuple uniquely identify an object within the RIB. Object instance. A shorthand for object class + object name, to uniquely identify an object within the RIB. Scope. Indicates the number of ‘levels’ of the RIB affected by the operation, starting at the specified object (object class + name or instance). This allows a single operation to target multiple objects at once. Filter. Provides a predicate that evaluates to ‘true’ or ‘false’ based on the value of the object attributes. This allows further discriminating to what objects the operation has to be applied. More information about the RIB, RIB Daemon and CDAP can be found at D2.1 [3]. 
CDAP is implemented as a library that provides a CDAPSessionManager class, which manages one or more CDAP sessions. The CDAPSession class implements the logic of the CDAP protocol state machine as defined in the CDAP specification [33]. CDAP can be encoded in multiple ways, but the IRATI stack follows the approach adopted by the other current RINA prototypes and uses Google Protocol Buffers (GPB) [31]. This decision makes interoperability possible and also brings the benefits of GPB: efficient encoding, and a proven, mature and scalable technology with good-quality open source parsers and generators available.

In addition to the information about the operation and the identity of the targeted objects, CDAP messages can also transport the actual values of those objects. Therefore the object values also need to be encoded in binary format. Again, GPB is the initial encoding format chosen, although others are also possible (ASN.1, XML, JSON, etc.). Object encoding is implemented by the Encoding support library, which provides an encoding-format-neutral interface. This allows several encoding implementations to be plugged in and out, with the one to use specified at configuration time.

The RIB is implemented as a map of object managers, indexed by object names; current RINA implementations have adopted the convention of making object names unique within the RIB as a simplifying assumption. Each object manager wraps a piece of state information (for example Flows, Application Registrations, QoS Cubes, the PDU Forwarding Table, etc.) with the RIBObject interface. This interface abstracts the six operations provided by CDAP: create, delete, read, write, start and stop. When a remote CDAP message reaches the IPC Process Daemon, the message is handed to the RIB Daemon component. The RIB Daemon retrieves the object manager associated with the targeted object name from the RIB map and invokes the requested CDAP operation. The role of the object manager is to translate each CDAP operation into the appropriate actions on the layer management function classes.

The layer management function classes use the RIB Daemon when they have to invoke a remote operation on a peer IPC Process. The RIB Daemon provides operations to send CDAP messages to a neighbour IPC Process identified by its application process name. When such an operation is called, the RIB Daemon internally fetches the port-id of the underlying N-1 flow that allows the IPC Process to communicate with the given neighbour, encodes the CDAP message and requests the kernel to write the encoded CDAP message as an SDU to that N-1 flow.
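The following minimal Java sketch illustrates the dispatch mechanism described above: a RIBObject interface abstracting the six CDAP operations, and a RIB map indexed by object name that routes each incoming CDAP message to the appropriate object manager. All class and method names are hypothetical; the actual IRATI code differs.

import java.util.HashMap;
import java.util.Map;

// Hypothetical CDAP message carrying the opcode and the target object name.
class CDAPMessage {
    enum OpCode { CREATE, DELETE, READ, WRITE, START, STOP }
    private final OpCode opCode;
    private final String objName;
    CDAPMessage(OpCode opCode, String objName) {
        this.opCode = opCode;
        this.objName = objName;
    }
    OpCode getOpCode() { return opCode; }
    String getObjName() { return objName; }
}

// Abstracts the six CDAP operations over a piece of IPC Process state.
interface RIBObject {
    void create(CDAPMessage m);
    void delete(CDAPMessage m);
    void read(CDAPMessage m);   // a real implementation would send a reply
    void write(CDAPMessage m);
    void start(CDAPMessage m);
    void stop(CDAPMessage m);
}

public class RIBDaemonSketch {
    // Object names are assumed unique within the RIB, so the name alone
    // can serve as the map key.
    private final Map<String, RIBObject> rib = new HashMap<>();

    public void addRIBObject(String objectName, RIBObject manager) {
        rib.put(objectName, manager);
    }

    // Entry point for remote CDAP messages delivered to the IPC Process Daemon.
    public void cdapMessageReceived(CDAPMessage message) {
        RIBObject manager = rib.get(message.getObjName());
        if (manager == null) {
            return; // a real implementation would answer with a CDAP error
        }
        switch (message.getOpCode()) {
            case CREATE: manager.create(message); break;
            case DELETE: manager.delete(message); break;
            case READ:   manager.read(message);   break;
            case WRITE:  manager.write(message);  break;
            case START:  manager.start(message);  break;
            case STOP:   manager.stop(message);   break;
        }
    }
}

Keeping the name-to-manager mapping in a single map exploits the convention, noted above, that object names are unique within the RIB.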