SlideShare ist ein Scribd-Unternehmen logo
1 von 35
Downloaden Sie, um offline zu lesen
Big Data Europe : Empowering
Communities with Data Technologies
Dr Hajira Jabeen, University of Bonn
Apache Big Data Europe - Seville
Structure
◎Evolution of BDE Architecture
◎BDE Stack
◎Beyond the State of the Art
o Workflows (Support Layer)
o User Interface
o Data Lake (Ontario)
o Semantics Analytics Stack
2
Architectural design 1
Architectural design 2
4
Architectural design 3 (released)
5
Architecture for SC 7
6
Stack
Supported Components
Search/indexing Data processing
Apache Solr Apache Spark
Data acquisition Apache Flink
Apache Flume Semantic Components
Message passing Strabon
Apache Kafka Sextant
Data storage GeoTriples
Hue Silk
Apache Cassandra SEMAGROW
ScyllaDB LIMES
Apache Hive 4Store
Postgis OpenLink Virtuoso
7
User profiles
8
Platform installation
◎Manual installation guide
◎Using Docker Machine
o On local machine (VirtualBox)
o In cloud (AWS, DigitalOcean, Azure)
o Bare metal
◎Screencasts
9
Developing a component
◎Base Docker images
o Serve as a template for a (Big Data) technology
o Easily extendable custom algorithm/data
◎Published components
o Responsibilities divided b/w partners
o Image repositories on GitHub
o Automated builds on DockerHub
o Documentation on BDE Wiki
10
Deploying a Big Data Stack
◎Stack
collection of communicating components
to solve a specific problem
◎Described in Docker Compose
o Component configuration
o Application topology
◎Orchestrator required for initialization process
o Components may depend on each other
o Components may require manual intervention
11
User Interfaces
◎Target: Facilitate use of the platform
o User Interface Adaption
◎Available interfaces
o Workflow UIs
❖ Workflow Builder
❖ Workflow Monitor
o Swarm UI
o Integrator UI
12
BDE Workflow Builder
13
BDE Workflow Monitor
14
Swarm UI
15
Integrator UI
16
Beyond the State of the Art
17
Smart Big Data
Increase Big Data value by adding meaning to it!
18
Orchestration
◎ Goal: flexible composition of Big Data pipelines
◎ Microservice cooperation using shared
vocabulary
◎ Uses HTTP requests from a web based frontend
◎ Meta information stored in a triple store
◎ Microservice uses triple store as service
backend
19
Semantic Data Lake (Ontario)
◎Repository of data in its raw format
o Structured, semi-structured, unstructured
◎Schema-less
o No schema is defined on write, it is defined only
on read
◎Open to any kind of processing
20
Data Lake
21
Source: Big Data @ Microsoft - Raghu Ramakrishnan
Semantic Data Lake (Ontario)
◎Add a Semantic layer on top of the source
datasets
o Semantic data is handled as-is
o Non-Semantic data is semantically lifted using
existing ontology terms
22
Ontario: Architecture
23
Translate and execute Query via
Source-specific Access Method
Decompose to Source-specific
Entities
Decompose SPARQL Query
Semantic Analytics Stack
(SANSA)
24
SANSA: Motivation
◎ Abundant machine readable structured
information is available (e.g. in RDF)
o Across SCs, e.g. Life Science Data (OpenPhacts)
o General: DBpedia, Google knowledge graph
o Social graphs: Facebook, Twitter
◎ Need for scalable querying, inference and
machine learning
o Link prediction
o Knowledge base completion
o Predictive analytics
25
Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview
SANSA Stack
27
SANSA: Read Write Layer
◎Ingest RDF and OWL data in different
formats using Jena / OWL API style
interfaces
◎Represent data in multiple formats (e.g. RDD,
Data Frames, GraphX, Tensors)
◎Allow transformation among these formats
◎Compute dataset statistics and apply
functions to URIs, literals, subjects, objects →
Distributed LODStats
28
SANSA: Query Layer
◎To make generic queries efficient and fast
using:
o Intelligent indexing
o Splitting strategies
o Distributed Storage
o Distributed/ Federated Querying
◎Early work in progress: query evaluation
(SPARQL-to-SQL approaches, Virtual Views)
◎Provision of W3C SPARQL compliant
29
SANSA: Inference Layer
◎W3C Standards for Modelling: RDFS and
OWL
◎Parallel in-memory inference via rule-
based forward chaining
◎Beyond state of the art: dynamically
build a rule dependency graph for a rule
set
◎→ Adjustable performance levels
30
SANSA: ML Layer
◎Distributed Machine Learning (ML) algorithms
that work on RDF data and make use of its
structure / semantics
◎Work in Progress:
o Tensor Factorization for e.g. KB completion (testing
stage)
o Simple spatiotemporal analytics (idea stage)
o Graph Clustering (testing stage)
o Association rule mining (evaluation stage)
o Semantic Decision trees (idea stage)
31
Thank you
32
jabeen@iai.uni-bonn.de
33
BDE vs Hadoop distributions
Hortonworks Cloudera MapR Bigtop BDE
File System HDFS HDFS NFS HDFS HDFS
Installation Native Native Native Native lightweight
virtualization
Plug & play components (no
rigid schema)
no no no no yes
High Availability Single failure
recovery (yarn)
Single failure
recovery (yarn)
Self healing, mult.
failure rec.
Single failure
recovery (yarn)
Multiple Failure
recovery
Cost Commercial Commercial Commercial Free Free
Scaling Freemium Freemium Freemium Free Free
Addition of custom
components
Not easy No No No Yes
Integration testing yes yes yes yes --
Operating systems Linux Linux Linux Linux All
Management tool Ambari Cloudera manager MapR Control
system
- Docker swarm UI+
Custom
34
BDE vs Hadoop distributions
◎BDE is not built on top of existing distributions
◎Targets
o Communities
o Research institutions
◎Bridges scientists and open data
◎Multi Tier research efforts towards Smart
Data
35

Weitere ähnliche Inhalte

Was ist angesagt?

ER 2016 Tutorial
ER 2016 TutorialER 2016 Tutorial
ER 2016 TutorialRim Moussa
 
Release webinar: Sansa and Ontario
Release webinar: Sansa and OntarioRelease webinar: Sansa and Ontario
Release webinar: Sansa and OntarioBigData_Europe
 
Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019
Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019
Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019Codemotion
 
Ismis2014 dbaas expert
Ismis2014 dbaas expertIsmis2014 dbaas expert
Ismis2014 dbaas expertRim Moussa
 
Hacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsHacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsArangoDB Database
 
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Miguel Pérez Colino
 
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsGenerating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsChristophe Debruyne
 
polystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfpolystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfRim Moussa
 
ARIADNE: progress in the first nine month
ARIADNE: progress in the first nine monthARIADNE: progress in the first nine month
ARIADNE: progress in the first nine monthariadnenetwork
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataGiorgos Santipantakis
 
Semantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cSemantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cMartin Toshev
 
TPC-H analytics' scenarios and performances on Hadoop data clouds
TPC-H analytics' scenarios and performances on Hadoop data cloudsTPC-H analytics' scenarios and performances on Hadoop data clouds
TPC-H analytics' scenarios and performances on Hadoop data cloudsRim Moussa
 
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Sergio Fernández
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
 
Protecting privacy in practice
Protecting privacy in practiceProtecting privacy in practice
Protecting privacy in practiceLars Albertsson
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with GoJames Tan
 
Introduction to DGraph - A Graph Database
Introduction to DGraph - A Graph DatabaseIntroduction to DGraph - A Graph Database
Introduction to DGraph - A Graph DatabaseKnoldus Inc.
 
Xldb2011 tue 1120_youtube_datawarehouse
Xldb2011 tue 1120_youtube_datawarehouseXldb2011 tue 1120_youtube_datawarehouse
Xldb2011 tue 1120_youtube_datawarehouseliqiang xu
 

Was ist angesagt? (20)

ER 2016 Tutorial
ER 2016 TutorialER 2016 Tutorial
ER 2016 Tutorial
 
Release webinar: Sansa and Ontario
Release webinar: Sansa and OntarioRelease webinar: Sansa and Ontario
Release webinar: Sansa and Ontario
 
Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019
Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019
Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019
 
Ismis2014 dbaas expert
Ismis2014 dbaas expertIsmis2014 dbaas expert
Ismis2014 dbaas expert
 
Bicod2017
Bicod2017Bicod2017
Bicod2017
 
Hacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsHacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge Graphs
 
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
 
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsGenerating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
 
polystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfpolystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdf
 
ARIADNE: progress in the first nine month
ARIADNE: progress in the first nine monthARIADNE: progress in the first nine month
ARIADNE: progress in the first nine month
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Semantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cSemantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12c
 
TPC-H analytics' scenarios and performances on Hadoop data clouds
TPC-H analytics' scenarios and performances on Hadoop data cloudsTPC-H analytics' scenarios and performances on Hadoop data clouds
TPC-H analytics' scenarios and performances on Hadoop data clouds
 
ISNCC 2017
ISNCC 2017ISNCC 2017
ISNCC 2017
 
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Protecting privacy in practice
Protecting privacy in practiceProtecting privacy in practice
Protecting privacy in practice
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with Go
 
Introduction to DGraph - A Graph Database
Introduction to DGraph - A Graph DatabaseIntroduction to DGraph - A Graph Database
Introduction to DGraph - A Graph Database
 
Xldb2011 tue 1120_youtube_datawarehouse
Xldb2011 tue 1120_youtube_datawarehouseXldb2011 tue 1120_youtube_datawarehouse
Xldb2011 tue 1120_youtube_datawarehouse
 

Andere mochten auch

Oozie workflow using HUE 2.2
Oozie workflow using HUE 2.2Oozie workflow using HUE 2.2
Oozie workflow using HUE 2.2Uday Vakalapudi
 
データ活用を効率化するHadoop WebUIと権限管理改善事例
データ活用を効率化するHadoop WebUIと権限管理改善事例データ活用を効率化するHadoop WebUIと権限管理改善事例
データ活用を効率化するHadoop WebUIと権限管理改善事例Masahiro Kiura
 
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop MeetupIntegrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetupgethue
 
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016Cloudera Japan
 
Hadoopの標準GUI HUEの最新情報
Hadoopの標準GUI HUEの最新情報Hadoopの標準GUI HUEの最新情報
Hadoopの標準GUI HUEの最新情報Cloudera Japan
 
Cloud Native Hadoop #cwt2016
Cloud Native Hadoop #cwt2016Cloud Native Hadoop #cwt2016
Cloud Native Hadoop #cwt2016Cloudera Japan
 
大規模データに対するデータサイエンスの進め方 #CWT2016
大規模データに対するデータサイエンスの進め方 #CWT2016大規模データに対するデータサイエンスの進め方 #CWT2016
大規模データに対するデータサイエンスの進め方 #CWT2016Cloudera Japan
 
データ分析グループの組織編制とその課題 マーケティングにおけるKPI設計の失敗例 ABテストの活用と、機械学習の導入 #CWT2016
データ分析グループの組織編制とその課題 マーケティングにおけるKPI設計の失敗例 ABテストの活用と、機械学習の導入 #CWT2016データ分析グループの組織編制とその課題 マーケティングにおけるKPI設計の失敗例 ABテストの活用と、機械学習の導入 #CWT2016
データ分析グループの組織編制とその課題 マーケティングにおけるKPI設計の失敗例 ABテストの活用と、機械学習の導入 #CWT2016Tokoroten Nakayama
 

Andere mochten auch (8)

Oozie workflow using HUE 2.2
Oozie workflow using HUE 2.2Oozie workflow using HUE 2.2
Oozie workflow using HUE 2.2
 
データ活用を効率化するHadoop WebUIと権限管理改善事例
データ活用を効率化するHadoop WebUIと権限管理改善事例データ活用を効率化するHadoop WebUIと権限管理改善事例
データ活用を効率化するHadoop WebUIと権限管理改善事例
 
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop MeetupIntegrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup
 
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
 
Hadoopの標準GUI HUEの最新情報
Hadoopの標準GUI HUEの最新情報Hadoopの標準GUI HUEの最新情報
Hadoopの標準GUI HUEの最新情報
 
Cloud Native Hadoop #cwt2016
Cloud Native Hadoop #cwt2016Cloud Native Hadoop #cwt2016
Cloud Native Hadoop #cwt2016
 
大規模データに対するデータサイエンスの進め方 #CWT2016
大規模データに対するデータサイエンスの進め方 #CWT2016大規模データに対するデータサイエンスの進め方 #CWT2016
大規模データに対するデータサイエンスの進め方 #CWT2016
 
データ分析グループの組織編制とその課題 マーケティングにおけるKPI設計の失敗例 ABテストの活用と、機械学習の導入 #CWT2016
データ分析グループの組織編制とその課題 マーケティングにおけるKPI設計の失敗例 ABテストの活用と、機械学習の導入 #CWT2016データ分析グループの組織編制とその課題 マーケティングにおけるKPI設計の失敗例 ABテストの活用と、機械学習の導入 #CWT2016
データ分析グループの組織編制とその課題 マーケティングにおけるKPI設計の失敗例 ABテストの活用と、機械学習の導入 #CWT2016
 

Ähnlich wie Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview

Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriDemi Ben-Ari
 
BigDataEurope @BDVA Summit2016 1: The BDE Platform
BigDataEurope @BDVA Summit2016 1: The BDE PlatformBigDataEurope @BDVA Summit2016 1: The BDE Platform
BigDataEurope @BDVA Summit2016 1: The BDE PlatformBigData_Europe
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...Gezim Sejdiu
 
ACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data FramesACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data FramesWes McKinney
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
ICWE2017 BigDataEurope
ICWE2017 BigDataEuropeICWE2017 BigDataEurope
ICWE2017 BigDataEuropeBigData_Europe
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overviewBigData_Europe
 
Data Integration Solutions Created By Koneksys
Data Integration Solutions Created By KoneksysData Integration Solutions Created By Koneksys
Data Integration Solutions Created By KoneksysKoneksys
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...Felix Gessert
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
BDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical OverviewBDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical OverviewBigData_Europe
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overviewMartin Zapletal
 
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...BigData_Europe
 
Mongo db transcript
Mongo db transcriptMongo db transcript
Mongo db transcriptfoliba
 
Apache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsApache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsOleg Magazov
 
Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
 Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference KeynoteKingsley Uyi Idehen
 
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان دادهمعرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان دادهWeb Standards School
 

Ähnlich wie Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview (20)

Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
BigDataEurope @BDVA Summit2016 1: The BDE Platform
BigDataEurope @BDVA Summit2016 1: The BDE PlatformBigDataEurope @BDVA Summit2016 1: The BDE Platform
BigDataEurope @BDVA Summit2016 1: The BDE Platform
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
 
ACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data FramesACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data Frames
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
ICWE2017 BigDataEurope
ICWE2017 BigDataEuropeICWE2017 BigDataEurope
ICWE2017 BigDataEurope
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overview
 
Data Integration Solutions Created By Koneksys
Data Integration Solutions Created By KoneksysData Integration Solutions Created By Koneksys
Data Integration Solutions Created By Koneksys
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
BDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical OverviewBDE-BDVA Webinar: BDE Technical Overview
BDE-BDVA Webinar: BDE Technical Overview
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overview
 
Spark Workshop
Spark WorkshopSpark Workshop
Spark Workshop
 
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
 
Mongo db transcript
Mongo db transcriptMongo db transcript
Mongo db transcript
 
Apache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsApache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and Basics
 
Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
 Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
 
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان دادهمعرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
 

Kürzlich hochgeladen

Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceMartin Humpolec
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfAnna Loughnan Colquhoun
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 

Kürzlich hochgeladen (20)

Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your Salesforce
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdf
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 

Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview

  • 1. Big Data Europe : Empowering Communities with Data Technologies Dr Hajira Jabeen, University of Bonn Apache Big Data Europe - Seville
  • 2. Structure ◎Evolution of BDE Architecture ◎BDE Stack ◎Beyond the State of the Art o Workflows (Support Layer) o User Interface o Data Lake (Ontario) o Semantics Analytics Stack 2
  • 5. Architectural design 3 (released) 5
  • 7. Supported Components Search/indexing Data processing Apache Solr Apache Spark Data acquisition Apache Flink Apache Flume Semantic Components Message passing Strabon Apache Kafka Sextant Data storage GeoTriples Hue Silk Apache Cassandra SEMAGROW ScyllaDB LIMES Apache Hive 4Store Postgis OpenLink Virtuoso 7
  • 9. Platform installation ◎Manual installation guide ◎Using Docker Machine o On local machine (VirtualBox) o In cloud (AWS, DigitalOcean, Azure) o Bare metal ◎Screencasts 9
  • 10. Developing a component ◎Base Docker images o Serve as a template for a (Big Data) technology o Easily extendable custom algorithm/data ◎Published components o Responsibilities divided b/w partners o Image repositories on GitHub o Automated builds on DockerHub o Documentation on BDE Wiki 10
  • 11. Deploying a Big Data Stack ◎Stack collection of communicating components to solve a specific problem ◎Described in Docker Compose o Component configuration o Application topology ◎Orchestrator required for initialization process o Components may depend on each other o Components may require manual intervention 11
  • 12. User Interfaces ◎Target: Facilitate use of the platform o User Interface Adaption ◎Available interfaces o Workflow UIs ❖ Workflow Builder ❖ Workflow Monitor o Swarm UI o Integrator UI 12
  • 17. Beyond the State of the Art 17
  • 18. Smart Big Data Increase Big Data value by adding meaning to it! 18
  • 19. Orchestration ◎ Goal: flexible composition of Big Data pipelines ◎ Microservice cooperation using shared vocabulary ◎ Uses HTTP requests from a web based frontend ◎ Meta information stored in a triple store ◎ Microservice uses triple store as service backend 19
  • 20. Semantic Data Lake (Ontario) ◎Repository of data in its raw format o Structured, semi-structured, unstructured ◎Schema-less o No schema is defined on write, it is defined only on read ◎Open to any kind of processing 20
  • 21. Data Lake 21 Source: Big Data @ Microsoft - Raghu Ramakrishnan
  • 22. Semantic Data Lake (Ontario) ◎Add a Semantic layer on top of the source datasets o Semantic data is handled as-is o Non-Semantic data is semantically lifted using existing ontology terms 22
  • 23. Ontario: Architecture 23 Translate and execute Query via Source-specific Access Method Decompose to Source-specific Entities Decompose SPARQL Query
  • 25. SANSA: Motivation ◎ Abundant machine readable structured information is available (e.g. in RDF) o Across SCs, e.g. Life Science Data (OpenPhacts) o General: DBpedia, Google knowledge graph o Social graphs: Facebook, Twitter ◎ Need for scalable querying, inference and machine learning o Link prediction o Knowledge base completion o Predictive analytics 25
  • 28. SANSA: Read Write Layer ◎Ingest RDF and OWL data in different formats using Jena / OWL API style interfaces ◎Represent data in multiple formats (e.g. RDD, Data Frames, GraphX, Tensors) ◎Allow transformation among these formats ◎Compute dataset statistics and apply functions to URIs, literals, subjects, objects → Distributed LODStats 28
  • 29. SANSA: Query Layer ◎To make generic queries efficient and fast using: o Intelligent indexing o Splitting strategies o Distributed Storage o Distributed/ Federated Querying ◎Early work in progress: query evaluation (SPARQL-to-SQL approaches, Virtual Views) ◎Provision of W3C SPARQL compliant 29
  • 30. SANSA: Inference Layer ◎W3C Standards for Modelling: RDFS and OWL ◎Parallel in-memory inference via rule- based forward chaining ◎Beyond state of the art: dynamically build a rule dependency graph for a rule set ◎→ Adjustable performance levels 30
  • 31. SANSA: ML Layer ◎Distributed Machine Learning (ML) algorithms that work on RDF data and make use of its structure / semantics ◎Work in Progress: o Tensor Factorization for e.g. KB completion (testing stage) o Simple spatiotemporal analytics (idea stage) o Graph Clustering (testing stage) o Association rule mining (evaluation stage) o Semantic Decision trees (idea stage) 31
  • 33. 33
  • 34. BDE vs Hadoop distributions Hortonworks Cloudera MapR Bigtop BDE File System HDFS HDFS NFS HDFS HDFS Installation Native Native Native Native lightweight virtualization Plug & play components (no rigid schema) no no no no yes High Availability Single failure recovery (yarn) Single failure recovery (yarn) Self healing, mult. failure rec. Single failure recovery (yarn) Multiple Failure recovery Cost Commercial Commercial Commercial Free Free Scaling Freemium Freemium Freemium Free Free Addition of custom components Not easy No No No Yes Integration testing yes yes yes yes -- Operating systems Linux Linux Linux Linux All Management tool Ambari Cloudera manager MapR Control system - Docker swarm UI+ Custom 34
  • 35. BDE vs Hadoop distributions ◎BDE is not built on top of existing distributions ◎Targets o Communities o Research institutions ◎Bridges scientists and open data ◎Multi Tier research efforts towards Smart Data 35