Comparing Sidecar-less Service Mesh from Cilium and Istio
Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview
1. Big Data Europe : Empowering
Communities with Data Technologies
Dr Hajira Jabeen, University of Bonn
Apache Big Data Europe - Seville
2. Structure
◎Evolution of BDE Architecture
◎BDE Stack
◎Beyond the State of the Art
o Workflows (Support Layer)
o User Interface
o Data Lake (Ontario)
o Semantics Analytics Stack
2
10. Developing a component
◎Base Docker images
o Serve as a template for a (Big Data) technology
o Easily extendable custom algorithm/data
◎Published components
o Responsibilities divided b/w partners
o Image repositories on GitHub
o Automated builds on DockerHub
o Documentation on BDE Wiki
10
11. Deploying a Big Data Stack
◎Stack
collection of communicating components
to solve a specific problem
◎Described in Docker Compose
o Component configuration
o Application topology
◎Orchestrator required for initialization process
o Components may depend on each other
o Components may require manual intervention
11
12. User Interfaces
◎Target: Facilitate use of the platform
o User Interface Adaption
◎Available interfaces
o Workflow UIs
❖ Workflow Builder
❖ Workflow Monitor
o Swarm UI
o Integrator UI
12
19. Orchestration
◎ Goal: flexible composition of Big Data pipelines
◎ Microservice cooperation using shared
vocabulary
◎ Uses HTTP requests from a web based frontend
◎ Meta information stored in a triple store
◎ Microservice uses triple store as service
backend
19
20. Semantic Data Lake (Ontario)
◎Repository of data in its raw format
o Structured, semi-structured, unstructured
◎Schema-less
o No schema is defined on write, it is defined only
on read
◎Open to any kind of processing
20
22. Semantic Data Lake (Ontario)
◎Add a Semantic layer on top of the source
datasets
o Semantic data is handled as-is
o Non-Semantic data is semantically lifted using
existing ontology terms
22
25. SANSA: Motivation
◎ Abundant machine readable structured
information is available (e.g. in RDF)
o Across SCs, e.g. Life Science Data (OpenPhacts)
o General: DBpedia, Google knowledge graph
o Social graphs: Facebook, Twitter
◎ Need for scalable querying, inference and
machine learning
o Link prediction
o Knowledge base completion
o Predictive analytics
25
28. SANSA: Read Write Layer
◎Ingest RDF and OWL data in different
formats using Jena / OWL API style
interfaces
◎Represent data in multiple formats (e.g. RDD,
Data Frames, GraphX, Tensors)
◎Allow transformation among these formats
◎Compute dataset statistics and apply
functions to URIs, literals, subjects, objects →
Distributed LODStats
28
29. SANSA: Query Layer
◎To make generic queries efficient and fast
using:
o Intelligent indexing
o Splitting strategies
o Distributed Storage
o Distributed/ Federated Querying
◎Early work in progress: query evaluation
(SPARQL-to-SQL approaches, Virtual Views)
◎Provision of W3C SPARQL compliant
29
30. SANSA: Inference Layer
◎W3C Standards for Modelling: RDFS and
OWL
◎Parallel in-memory inference via rule-
based forward chaining
◎Beyond state of the art: dynamically
build a rule dependency graph for a rule
set
◎→ Adjustable performance levels
30
31. SANSA: ML Layer
◎Distributed Machine Learning (ML) algorithms
that work on RDF data and make use of its
structure / semantics
◎Work in Progress:
o Tensor Factorization for e.g. KB completion (testing
stage)
o Simple spatiotemporal analytics (idea stage)
o Graph Clustering (testing stage)
o Association rule mining (evaluation stage)
o Semantic Decision trees (idea stage)
31
34. BDE vs Hadoop distributions
Hortonworks Cloudera MapR Bigtop BDE
File System HDFS HDFS NFS HDFS HDFS
Installation Native Native Native Native lightweight
virtualization
Plug & play components (no
rigid schema)
no no no no yes
High Availability Single failure
recovery (yarn)
Single failure
recovery (yarn)
Self healing, mult.
failure rec.
Single failure
recovery (yarn)
Multiple Failure
recovery
Cost Commercial Commercial Commercial Free Free
Scaling Freemium Freemium Freemium Free Free
Addition of custom
components
Not easy No No No Yes
Integration testing yes yes yes yes --
Operating systems Linux Linux Linux Linux All
Management tool Ambari Cloudera manager MapR Control
system
- Docker swarm UI+
Custom
34
35. BDE vs Hadoop distributions
◎BDE is not built on top of existing distributions
◎Targets
o Communities
o Research institutions
◎Bridges scientists and open data
◎Multi Tier research efforts towards Smart
Data
35