2. Outline
The Infrastructure
• Heterogeneous resources as a service
• Data Bonanza
• Virtual Research Environment
• Software platform
iMarine Catalogue
• StatsCube
• GeosCube
• BiolCube
• ConnectCube
I-MARINE EXTENDED BOARD
3. Distinguishing capabilities of the iMarine e-infrastructure and its enabling software
THE INFRASTRUCTURE
4. Concepts
The initiative
(the visionary leadership)
The e-infrastructure
(the operational platform)
The system
(the enabling software system)
6. Infrastructure: key characteristics
• Efficient and tailored storage technologies
• Computational environments dealing with the volume of the data
• Elastic management of the resources, monitoring, alerting, recovery
• Collaborative environment to support scientific communities
• Rich portfolio of applications to perform access, validation, enriching, processing, sharing, and mash-up of data
7. Infrastructure: Storage as Service
• Secure
• Fault-tolerant
• Replication
• Open source RDBMS
• Up to 1 TB data
Virtual Workspace
Relational Databases
45 TB Currently Used
Spatial Database
Large and Active data storage
• ISO 19115/19139 Metadata
• Catalogue
• Scalability and high availability
• Across sites
8. Infrastructure: Computing as Service
330 Cores Currently Allocated
Hadoop
• MapReduce
Statistical Manager
• Analysis/clustering/modeling
R clusters
• Windows and Linux
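As a rough illustration of the MapReduce model named on this slide, the sketch below counts observation records per species in plain Python. It does not use Hadoop; the record layout, the sample data, and all function names are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def map_record(record):
    """Map step: emit one (species, 1) pair per observation record."""
    yield record["species"], 1


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum all values emitted for one key."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_record(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


# Hypothetical observation records, invented for this example.
observations = [
    {"species": "Thunnus albacares"},
    {"species": "Katsuwonus pelamis"},
    {"species": "Thunnus albacares"},
]
counts = map_reduce(observations)
print(counts)
```

On a real cluster the map and reduce functions would run in parallel on separate nodes; here they run sequentially so that the data flow is easy to follow.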
9. Infrastructure: Management as Service
Operation
• Machine-readable SLAs
• Machine-readable monitoring, auditing, billing, reporting, and notification
• Machine-readable resource/performance capabilities description
Trust
• Privacy, governance, and attribution
• Security, trusted network
10. Infrastructure: Collaborative Environment
The Social Portal offers users a familiar view of what is happening in their VREs
A single place to
• Get status and updates from the applications and users they are interested in;
• Get notifications about messages, job completion, newly generated products, etc.
11. Infrastructure: Collaborative Environment
The Social Portal offers users a familiar view of what is happening in their VREs
A single place to
• Manage all the portal extensions.
[Screenshot labels: Workspace, Messages, Notifications Page, Search in your Workspace, Home, Social]
12. Infrastructure: Collaborative Environment
The Social Portal offers users a familiar view of what is happening in their VREs
A single place to
• Manage, store, and preserve data
• Share data
15. Data Bonanza
Statistical: SDMX *
- FAO CodeLists
- IRD CodeLists
- FAO Global Aquaculture Production
- FAO Global Capture Production
- FAO Global Production
- Eurostat
- …
Biodiversity: DarwinCore / ISO19139
- >35 M Observations (OBIS)
- ≈ 120 K Observed Species (OBIS)
- ≈ 500 K Taxa (WoRMS)
- >600 K Scientific Names (ITIS)
- >12 K Species Distribution Maps (AquaMaps)
- ≈ 600 Species Extent (FAO)
- … FishBase, SeaLifeBase
- … CoL, GBIF
Geospatial: ISO19139 (OGC W*S)
- > 300 variables
- 10 years Chemical and Physical variables in 2D space: Ice concentration and velocity, Chlorophyll, Oxygen, Nitrate, Phosphate, Phytoplankton as carbon, Salinity, Temperature, …
- On-demand Chemical and Physical variables in 3D space: Apparent Oxygen Utilization, Dissolved Oxygen, Salinity, Temperature, …
16. Not Only Access
• Access
– Retrieval of geospatial data as space/time-varying phenomena
– Direct fine-grained access at the feature and feature-property level
• Validation
– User-defined quality and dissemination level
• Enriching
– Generation of metadata, exploitation of reference data, linking to environmental datasets
• Processing
– Analysis and mining exploiting statistical frameworks such as R, Weka, and RapidMiner
• Sharing
– User-driven process to decide how other agents (human / machine) can access information
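One way to picture access at the feature-property level is an OGC WFS 2.0 GetFeature request that asks for selected properties only. The sketch below builds such a request URL; it assumes the data are exposed through a WFS endpoint, and the endpoint address, feature type name, and property names are hypothetical placeholders, not actual iMarine identifiers.

```python
from urllib.parse import urlencode


def build_getfeature_url(endpoint, type_name, properties, bbox=None, count=None):
    """Build a WFS 2.0 GetFeature URL restricted to selected properties.

    bbox is an optional sequence of coordinates (optionally followed by
    a CRS identifier); count caps the number of features returned.
    """
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        # Ask only for the listed properties instead of whole features.
        "propertyName": ",".join(properties),
    }
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    if count is not None:
        params["count"] = str(count)
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint, layer, and property names.
url = build_getfeature_url(
    "https://example.org/geoserver/wfs",
    "demo:sea_surface",
    ["salinity", "temperature"],
    bbox=(40.0, -10.0, 80.0, 10.0),
    count=100,
)
print(url)
```

The function only assembles the request; sending it and parsing the response are left out on purpose.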
17. Features Clustering with StatsCube
Presence Points (FishBase + OBIS)
Density Based Clustering: DBSCAN (with outliers)
Other methods are also available …
K-Means
X-Means
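To make the density-based clustering step concrete, here is a minimal pure-Python sketch of DBSCAN applied to presence points. It is not the StatsCube implementation: the sample coordinates, the `eps` and `min_pts` values, and the use of plain Euclidean distance on (longitude, latitude) pairs are all assumptions made for illustration.

```python
import math

NOISE = -1  # label given to outliers


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical presence points as (longitude, latitude) pairs.
presence_points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (50.0, 50.0),
]
labels = dbscan(presence_points, eps=1.5, min_pts=3)
print(labels)
```

Unlike K-Means, this approach does not need the number of clusters in advance and labels isolated points as outliers instead of forcing them into a cluster.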
20. Not Only Access, Validation, Enriching, Processing, Sharing
• It is always possible to save the discovered data in various standard formats
• It is always possible to collaborate with coworkers through a dedicated workspace
• Mash up data across diverse domains
– Accessing statistical datasets in SDMX, geo-referencing them, describing them in ISO19139, and making them available via OGC W*S standard protocols
– Accessing species observation datasets in DwC, analysing their distribution trend via R, and projecting them in geographical space
– Accessing species taxonomies in DwCA and publishing them as reference data in SDMX
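A small sketch of one such mash-up step: reading species observations expressed with Darwin Core terms and turning them into a GeoJSON FeatureCollection that a map client could display. The CSV content is invented for this example, and the function name is an assumption; only the Darwin Core term names (scientificName, decimalLatitude, decimalLongitude, eventDate) and the GeoJSON structure come from the respective standards.

```python
import csv
import io
import json

# Invented sample records using Darwin Core term names as column headers.
DWC_CSV = """scientificName,decimalLatitude,decimalLongitude,eventDate
Thunnus albacares,-4.62,55.45,2012-03-14
Thunnus albacares,,,2012-03-15
Katsuwonus pelamis,3.20,73.22,2012-05-02
"""


def dwc_to_geojson(csv_text):
    """Convert Darwin Core occurrence rows to a GeoJSON FeatureCollection.

    Rows without usable coordinates are skipped.
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row["decimalLatitude"])
            lon = float(row["decimalLongitude"])
        except ValueError:
            continue  # missing or malformed coordinates
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {
                "scientificName": row["scientificName"],
                "eventDate": row["eventDate"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


collection = dwc_to_geojson(DWC_CSV)
print(json.dumps(collection, indent=2))
```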
21. Data Bonanza: a common vision
Integrate and harmonize cross-disciplinary data and information across information systems and workflows to support evidence-based decision making
iMarine is implementing this vision through the adoption of Standards, the identification of common Methods and the implementation of Tools which enable integration and harmonization.
22. Is this enough?
• An ecosystem of participatory data e-Infrastructures
• Regulated by policies
• Enabled by standards
• Promoting not only access but mash-up of heterogeneous data
User-centric
23. User-Centric View
User-centric view of an ecosystem of participatory data e-Infrastructures to
• Cope with the overwhelming amount of data and capacities
• Promote re-use of data
• Encourage sharing of resulting products
User-centric and workflow-oriented
24. Virtual Research Environment
iMarine is user-centric and workflow-oriented thanks to the gCube VRE technology
A Virtual Research Environment (VRE) is
• a distributed and dynamically created environment
• where a subset of data, services, computational, and storage resources
• regulated by tailored policies
• are assigned to a subset of users via interfaces
• for a limited timeframe
• at little or no cost for the providers of the participatory data e-infrastructures
L. Candela, D. Castelli, P. Pagano (2013) Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12
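The VRE definition above can be read as a small data structure: a named set of resources, a set of users, and a validity window. The sketch below models only that reading. It is not the gCube data model; the class, its fields, and the sample names are hypothetical, and tailored policies are reduced to a simple membership check.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VirtualResearchEnvironment:
    """Hypothetical model of a VRE: resources assigned to users for a period."""
    name: str
    resources: frozenset   # identifiers of data, services, compute, storage
    users: frozenset       # identifiers of the users the VRE is assigned to
    valid_from: datetime
    valid_until: datetime

    def is_active(self, at):
        """True while the VRE's limited timeframe includes the given instant."""
        return self.valid_from <= at < self.valid_until

    def can_access(self, user, resource, at):
        """Simplified policy: active VRE, known user, assigned resource."""
        return self.is_active(at) and user in self.users and resource in self.resources


# Hypothetical VRE, users, and resources.
vre = VirtualResearchEnvironment(
    name="demo-vre",
    resources=frozenset({"obis-observations", "r-cluster"}),
    users=frozenset({"alice", "bob"}),
    valid_from=datetime(2013, 1, 1),
    valid_until=datetime(2013, 7, 1),
)
print(vre.can_access("alice", "r-cluster", datetime(2013, 3, 1)))
```

Making the class immutable reflects the idea that a VRE is created dynamically with a fixed scope; changing the scope would mean creating a new environment.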
25. Software Platform
Flexible
Software platform to abstract over differences in location, protocols, and models by
• scaling no less than the interfaced resources,
• keeping failures partial and temporary,
• reacting to and recovering from a large number of potential issues.
Feature-rich
Storage, Discovery, Indexing, Search, Execution, …
It turns resources and technologies into a utility by offering single registration, monitoring, and access facilities
27. iMarine Exploitation models
Service
Data hosting
Infrastructure
Unlimited users, Infrastructure support, helpdesk, back-up, security
Validation (records)
Workspace
Hardware
Default Processing (<1MB)
Social Tool
Community Management
Storage 1TB
Cloud Resources
Validation (Datasets)
Custom Data Resources
Custom Processing (> 1MB)
Spatial Data integration
User Management
Large and Active Storage
Unlimited VREs
Hour/Day
Month
Year
28. Concept map of the products
I-MARINE OFFER
29. Application Bundles
• Management and interpretation of biological and ecological data in the environment
• Complete full life-cycle data framework, from observational data to aggregated data repositories enriched with validation and analytical tools
• Storage and interpretation of geospatially explicit information, including WPS processing
• Flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities
A BUNDLE is a set of services and technologies grouped according to a family of related tasks for achieving a common objective