IATECH MÁLAGA
BIGOWL: Using Semantics to
Develop Big Data Analytics
Solutions
José Manuel García Nieto
jnieto@lcc.uma.es
Outline
• Introduction
• Concepts and Background
• Current practices in Big Data analytics
• Semantic modelling
• Overall approach
• Validation: Case studies
• Discussions
• Conclusions
Introduction
• Motivation
• Gartner’s report: An emerging challenge in Big
Data is to construct data-driven intelligent
applications that capture and inject domain
knowledge in the analytical processes, including
context and using a standardized format
• Context refers to all the relevant (meta)-information
that supports the analysis and helps to interpret its
results
• This will facilitate the integration (in a standardized
way) with third parties’ data, algorithms, business
intelligence (BI) and visualization services
Introduction
• Motivation
• The use of semantics as contextual information will
enhance the analytical power of the algorithms, as well as the
reuse of single components in data analytics workflows
(Ristoski & Paulheim, 2016)
• The development of ways to make the domain knowledge
explicit and usable is needed to improve the data processing
and analysis tasks
• Semantic Web technology can be used to annotate not
only the knowledge domain of the data, but also the analytics’
meta-data (Keet, Ławrynowicz et al., 2015)
Introduction
• Motivation
• Companies have already realised this
• Public administration (European Commission) too:
https://www.big-data-europe.eu/ http://www.bigdataocean.eu/
http://www.semagrow.eu/
• In academia we aim to go one step further
• Semantic Web technologies can be used to annotate not only the knowledge
domain of the data, but also the analytics’ meta-data, including algorithms’ parameters,
input variables, tuning experiences, expected behaviours and taxonomies
• This will facilitate the proper reuse and composition of Big Data
analytics
• It will also enhance the quality of consumed and produced
data
Introduction
• Hypothesis:
The semantic annotation of Big Data sources, components and algorithms can
act as a link to capture and incorporate the domain knowledge to guide and
enhance the analytical processes
• In addition, the semantic annotation can provide the background for
reasoning methods based on axiomatic and rule logic recommendations
Introduction
• Proposal:
• Semantic model: ontology-driven approach to support knowledge
management in Big Data analytics workflows
• The proposed ontology is called BIGOWL (BIG data analytics OWL 2 ontology),
which acts as a formal schema for the representation and consolidation of
knowledge in Big Data analytics
• Incorporating knowledge in turn benefits algorithmic performance, by
taking part in operator design, parameter selection, human interaction and
decision-making strategies
Concepts and Background
• Different sites and people will talk about everything from artificial
intelligence to natural language processing to linked data and the Semantic
Web
• What are they all?
• How do they relate to each other?
• How do they relate to you?
The Semantic Web, Web 3.0, the Linked Data Web, the Web of Data…whatever
you call it, the Semantic Web represents the next major evolution in connecting
information
Concepts and Background
• How is the “Semantic Web” Different?
• The word semantic itself implies meaning or understanding
• Unlike relational databases or the World Wide Web itself,
the Semantic Web is concerned with the
meaning of data and not its structure
Concepts and Background
• What Standards Apply to the Semantic Web?
• Mainly 4 technical standards:
• An Ontology provides a formal representation of the real world
• It defines an explicit description of concepts in a domain of discourse
(classes or concepts), properties of each concept describing various
features and attributes of the concept (properties) and restrictions on
properties
• RDF (Resource Description Framework): The data modelling language for the
Semantic Web. All Semantic Web information is stored and represented in
RDF
• SPARQL (SPARQL Protocol and RDF Query Language): The query language of the
Semantic Web. It is specifically designed to query data across various systems
• OWL (Web Ontology Language): The schema language, or knowledge
representation (KR) language, of the Semantic Web
• OWL enables you to define concepts composably so that these concepts
can be reused as much and as often as possible
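As a rough illustration of the RDF data model and of what a single SPARQL triple pattern does, the sketch below stores facts as (subject, predicate, object) tuples and matches a pattern containing variables. It is a toy stand-in written in plain Python, not an RDF library; the `ex:` identifiers and the `match` helper are assumptions made up for this example.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple, as in RDF.
# All "ex:" identifiers are hypothetical and exist only for this example.
TRIPLES = {
    ("ex:Movie", "rdf:type", "owl:Class"),
    ("ex:Theatre", "rdf:type", "owl:Class"),
    ("ex:movie1", "rdf:type", "ex:Movie"),
    ("ex:movie1", "ex:title", "Some Movie"),
    ("ex:theatre1", "rdf:type", "ex:Theatre"),
    ("ex:movie1", "ex:playsIn", "ex:theatre1"),
}


def match(pattern, triples=TRIPLES):
    """Return one dict of variable bindings per triple matching the pattern.

    Terms starting with "?" are variables, mimicking a single SPARQL
    triple pattern such as `?m rdf:type ex:Movie`.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


if __name__ == "__main__":
    print(match(("?m", "rdf:type", "ex:Movie")))
    print(match(("ex:movie1", "ex:playsIn", "?t")))
```

A real deployment would use an RDF store and a SPARQL endpoint; the point here is only that data, schema terms and queries all share the same triple shape.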
Concepts and Background
• What Standards Apply to the Semantic
Web?
• Imagine two relational tables of two
different databases: movies and cinema
rooms
• Imagine a program can automatically query
your Web site and any other site that has
movie scheduling information in order to
show a complete view in one place
The goal of Linked Data is to publish structured
data in such a way that it can be easily
consumed and combined with other Linked
Data
Concepts and Background
• How is the “Semantic Web” Different?
• Linked Data is the Semantic Web realized via four best practice principles
• Use URIs as names for things.
• An example of a URI is any URL
• Use HTTP URIs so that people can look up those names
• When someone looks up a URI, provide useful information, using the standards such as
RDF* and SPARQL
• Include links to other URIs so that they can discover more things
Concepts and Background
• How is the “Semantic Web”
Different?
• Once all the rows of our tables
have been uniquely identified,
made dereferenceable through
HTTP, and described with RDF, the
last step is providing links
between different rows across
different tables
• The main aim here is to make
explicit those links that were
implicit before shifting to the
Linked Data approach. In our
example, movies would be linked
to the theatres in which they are
playing
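The steps above (identify each row with a URI, describe it, then link rows across tables) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `http://example.org` namespace, the column names and the `room_id` foreign key are all hypothetical, and plain tuples stand in for RDF triples.

```python
# Hypothetical rows from two separate relational tables.
movies = [{"id": 1, "title": "Movie A", "room_id": 10}]
rooms = [{"id": 10, "name": "Room One"}]

BASE = "http://example.org"  # placeholder namespace, not a real data set


def uri(kind, row_id):
    """Give a table row a unique HTTP URI."""
    return f"{BASE}/{kind}/{row_id}"


def to_triples(movie_rows, room_rows):
    """Describe every row as triples and make cross-table links explicit."""
    triples = []
    for m in movie_rows:
        subject = uri("movie", m["id"])
        triples.append((subject, f"{BASE}/vocab/title", m["title"]))
        # The implicit foreign key becomes an explicit link between two URIs.
        triples.append((subject, f"{BASE}/vocab/playsIn", uri("room", m["room_id"])))
    for r in room_rows:
        triples.append((uri("room", r["id"]), f"{BASE}/vocab/name", r["name"]))
    return triples


if __name__ == "__main__":
    for triple in to_triples(movies, rooms):
        print(triple)
```

Because the link target is itself a URI, a consumer holding only the movie description can follow it to the room description without knowing the original table layout.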
Concepts and Background
• How is the “Semantic Web” Different?
• Once our tables have been so published, the Linked Data rules do
their magic: people across the Web can start referencing and
consuming the data in our rows easily
• If we go further and link from our movies to popular external data
sets such as Wikipedia and IMDb, then we make it even easier for
people and computers to consume our data and combine it with
other data
• This provides our data with context
Current practices in Big Data analytics
• In current Big Data
technology ecosystems,
when facing a specific
data analytic task, it is
usual to rely on
existing tools
Current practices in Big Data analytics
• Besides technological or commercial aspects, current Big Data platforms still follow
the common procedure when facing data analytics tasks (ACM-SIGKDD, 2014), which
comprises typical steps of classical KDD:
• data collection,
• data transformation,
• data mining,
• pattern evaluation, and
• knowledge presentation (Visualization)
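To make the five KDD steps concrete, here is a minimal sketch that chains them as plain Python functions. The input strings, the frequency count used as a stand-in for "data mining" and the `min_support` threshold are illustrative assumptions, not part of any platform mentioned in the slides.

```python
from collections import Counter


def collect():
    # Data collection: stand-in for reading from a real source.
    return [" spark", "Hadoop", "spark ", "flink", "HADOOP", "spark"]


def transform(raw):
    # Data transformation: normalise case and whitespace.
    return [record.strip().lower() for record in raw]


def mine(records):
    # Data mining: a frequency count plays the role of the discovered patterns.
    return Counter(records)


def evaluate(patterns, min_support=2):
    # Pattern evaluation: keep patterns seen at least `min_support` times.
    return {key: count for key, count in patterns.items() if count >= min_support}


def present(patterns):
    # Knowledge presentation: plain-text report, most frequent first.
    ordered = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{key}: {count}" for key, count in ordered]


def run_pipeline():
    return present(evaluate(mine(transform(collect()))))


if __name__ == "__main__":
    print(run_pipeline())
```

Each function consumes exactly what the previous one produces, which is the kind of input/output contract that a semantic description of workflow components would need to capture.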
Semantic modelling
• The semantic model: BIGOWL
• Ontological scheme driving the whole process of Big Data analytics
• It is the terminological box (TBox) that defines the vocabulary, with concepts and properties (relationships), in the
domain of Big Data analysis
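A TBox can be pictured as a class hierarchy plus properties with a domain and a range. The sketch below is a plain-Python approximation under loud assumptions: the class and property names are hypothetical and are NOT taken from the published BIGOWL ontology, and domain and range are used here as a simple check, whereas an OWL reasoner treats them as inference axioms.

```python
# Hypothetical vocabulary in the spirit of a TBox; not the actual BIGOWL terms.
SUBCLASS_OF = {
    "Algorithm": "Component",
    "DataSource": "Component",
    "Classifier": "Algorithm",
}

# property name -> (domain class, range class)
PROPERTIES = {
    "hasComponent": ("Workflow", "Component"),
    "hasParameter": ("Algorithm", "Parameter"),
}


def is_subclass_of(cls, ancestor):
    """Walk up the hierarchy; every class counts as a subclass of itself."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def property_applies(prop, subject_cls, object_cls):
    """Check that subject and object classes fit the property's domain and range."""
    domain, range_ = PROPERTIES[prop]
    return is_subclass_of(subject_cls, domain) and is_subclass_of(object_cls, range_)


if __name__ == "__main__":
    print(is_subclass_of("Classifier", "Component"))
    print(property_applies("hasParameter", "DataSource", "Parameter"))
```

In practice the vocabulary would be written in OWL 2 and loaded by an ontology toolkit; the dictionaries above only show the kind of information a TBox holds.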
Overall approach
Validation: Case studies
• Case study 1: Streaming processing of New York City traffic open-data
• Dynamic version of the bi-objective Traveling Salesman Problem (TSP), to minimize the
“travel time” and the “distance” needed to cover certain routing points in an urban area
• Open Data API provided by the
New York City Department of
Transportation
• Updates traffic information
several times per minute
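The bi-objective formulation can be illustrated with a small evaluation function and a Pareto-dominance test. This is a sketch under stated assumptions: the four-point instance and its matrices are invented for illustration (they are not New York City data), and the direct matrix update merely imitates a streaming traffic update.

```python
def evaluate_tour(tour, distance, travel_time):
    """Return (total distance, total travel time) of a closed tour."""
    total_distance = 0.0
    total_time = 0.0
    for i, a in enumerate(tour):
        b = tour[(i + 1) % len(tour)]  # wrap around to close the tour
        total_distance += distance[a][b]
        total_time += travel_time[a][b]
    return total_distance, total_time


def dominates(f, g):
    """Pareto dominance for minimisation: f is no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(f, g)) and any(x < y for x, y in zip(f, g))


# Hypothetical symmetric instance with four routing points.
distance = [
    [0, 1, 5, 1],
    [1, 0, 1, 5],
    [5, 1, 0, 1],
    [1, 5, 1, 0],
]
travel_time = [
    [0, 4, 1, 4],
    [4, 0, 4, 1],
    [1, 4, 0, 4],
    [4, 1, 4, 0],
]

if __name__ == "__main__":
    tour_a, tour_b = [0, 1, 2, 3], [0, 2, 1, 3]
    fa = evaluate_tour(tour_a, distance, travel_time)
    fb = evaluate_tour(tour_b, distance, travel_time)
    print(fa, fb, dominates(fa, fb), dominates(fb, fa))

    # Simulated traffic update: the link between points 0 and 2 becomes congested.
    travel_time[0][2] = travel_time[2][0] = 10
    fb = evaluate_tour(tour_b, distance, travel_time)
    print(fa, fb, dominates(fa, fb))
```

Before the update neither tour dominates the other (one is shorter, the other faster); after it, the first tour dominates. This is why a dynamic problem has to re-evaluate its solutions whenever new traffic data arrives.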
• Analyser: Multi-objective metaheuristic NSGA-II provided in jMetalSP, which allows
parallel processing of evaluation functions in an Apache Spark environment
• Workflow
• Ontology definition of this workflow
• Semantic annotation and querying
Validation: Case studies
• Case study 2: academic problem of Iris flower classification
• Classification algorithm: decision tree J48
• UCI Repository
• For materialization, two different approaches have been used in this case:
• the well-known data mining library Weka and
• the BigML SaaS API for analysis in the cloud
• Ontology definition of this workflow
• Analytic workflow
• Semantic annotation and querying
Validation: Case studies
• Case study 3: Reasoning
• SWRL rules perform semantic reasoning jobs mainly devoted to checking the correctness of
workflows, i.e., discovering those components and tasks with (non-)compatible
connectivity of inputs/outputs, execution orders, data domains, data formats, data types,
etc.
• SWRL rules are then evaluated by the reasoner after classifying Big Data components in
accordance with axioms
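The kind of correctness check described above can be approximated in plain Python. The sketch below is an analogue of a compatibility rule, not SWRL and not a rule from BIGOWL: the `Component` fields, the format labels and the example workflow are all hypothetical, and only one criterion (data format of connected inputs/outputs) is checked.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Component:
    """Hypothetical workflow component annotated with its data formats."""
    name: str
    input_format: Optional[str]   # None for a pure data source
    output_format: Optional[str]  # None for a pure sink


def incompatible_links(workflow: List[Component]) -> List[Tuple[str, str]]:
    """Return (producer, consumer) pairs whose output and input formats differ.

    Rule analogue: if A is connected to B and A's output format is not
    B's input format, then the link A -> B is flagged as incompatible.
    """
    problems = []
    for producer, consumer in zip(workflow, workflow[1:]):
        if producer.output_format != consumer.input_format:
            problems.append((producer.name, consumer.name))
    return problems


if __name__ == "__main__":
    workflow = [
        Component("source", None, "csv"),
        Component("cleaner", "csv", "table"),
        Component("classifier", "arff", "model"),  # expects a different format
    ]
    print(incompatible_links(workflow))
```

The same annotations could drive a recommendation step: filtering candidate components by whether their input format matches the current output format yields the list of components that can legally be linked next.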
Conclusions
• Experience in the case studies revealed that the BIGOWL approach is useful
for integrating domain knowledge concerning a specific analytic
problem
• Consequently, the integrated knowledge is used to guide the
design of Big Data analytics workflows, by recommending the next
components to be linked and supporting final validation
Research agenda
• A first phase will provide automatic facilities for ontology population, and hence enrich
the semantic approach
• To generate new and heterogeneous use cases of analytics workflows that would lead
us to find and solve new possible deficiencies, as well as to enrich the knowledge
base
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 

Bigowl aitech

  • 1. IATECH MÁLAGA BIGOWL: Using Semantics to Develop Big Data Analytics Solutions José Manuel García Nieto jnieto@lcc.uma.es
  • 2. Outline • Introduction • Concepts and Background • Current practices in Big Data analytics • Semantic modelling • Overall approach • Validation: Case studies • Discussions • Conclusions
  • 3. Introduction • Motivation • Gartner’s report: An emerging challenge in Big Data is to construct data-driven intelligent applications that capture and inject domain knowledge into the analytical processes, including context and using a standardized format • Context refers to all the relevant (meta-)information that supports the analysis and helps to interpret its results • This will facilitate the integration (in a standardized way) with third parties’ data, algorithms, business intelligence (BI) and visualization services
  • 4. Introduction • Motivation • The use of semantics as contextual information will enhance the analytical power of the algorithms, as well as the reuse of single components in data analytics workflows (Ristoski & Paulheim, 2016) • Ways to make the domain knowledge explicit and usable are needed to improve the data processing and analysis tasks • Semantic Web technology can be used to annotate not only the knowledge domain of the data, but also the analytics’ meta-data (Keet, Ławrynowicz et al., 2015)
  • 5. Introduction • Motivation • Companies have already realised
  • 6. Introduction • Motivation • Companies have already realised
  • 7. Introduction • Motivation • Companies have already realised
  • 8. Introduction • Motivation • Companies have already realised • Administration (European Commission) too
  • 9. Introduction • Motivation • Companies have already realised • Administration (European Commission) too • https://www.big-data-europe.eu/ http://www.bigdataocean.eu/ http://www.semagrow.eu/
  • 10. Introduction • Motivation • Companies have already realised • Administration (European Commission) too • In academia we aim to go one step further
  • 11. Introduction • Motivation • In academia we aim to go one step further • Semantic Web technologies can be used to annotate not only the knowledge domain of the data, but also the analytics’ meta-data, including: algorithms’ parameters, input variables, tuning experiences, expected behaviours and taxonomies • This will facilitate the proper reuse and composition of Big Data analytics • It will also enhance the quality of consumed and produced data
  • 12. Introduction • Hypothesis: The semantic annotation of Big Data sources, components and algorithms can act as a link to capture and incorporate the domain knowledge to guide and enhance the analytical processes • In addition, the semantic annotation can provide the background for reasoning methods based on axiomatic and rule logic recommendations
  • 13. Introduction • Proposal: • Semantic model: ontology-driven approach to support knowledge management in Big Data analytics workflows • The proposed ontology is called BIGOWL (BIG data analytics OWL 2 ontology), which acts as a formal schema for the representation and consolidation of knowledge in Big Data analytics • Knowledge incorporation is in turn beneficial for efficient algorithmic performance, by taking part in operator design, parameter selection, and human-interactive and decision-making strategies
  • 14. Concepts and Background • Different sites and people will talk about everything from artificial intelligence to natural language processing to linked data and the Semantic Web • What are they all? • How do they relate to each other? • How do they relate to you?
  • 15. Concepts and Background • Different sites and people will talk about everything from artificial intelligence to natural language processing to linked data and the Semantic Web • What are they all? • How do they relate to each other? • How do they relate to you? • The Semantic Web, Web 3.0, the Linked Data Web, the Web of Data: whatever you call it, the Semantic Web represents the next major evolution in connecting information
  • 16. Concepts and Background • How is the “Semantic Web” Different? • The word semantic itself implies meaning or understanding • The Semantic Web is concerned with the meaning of data rather than its structure (as in relational databases or the World Wide Web itself)
  • 17. Concepts and Background • What Standards Apply to the Semantic Web? • Mainly 4 technical standards: • An Ontology provides a formal representation of the real world • It defines an explicit description of concepts in a domain of discourse (classes or concepts), properties of each concept describing various features and attributes of the concept (properties) and restrictions on properties • RDF (Resource Description Framework): The data modelling language for the Semantic Web. All Semantic Web information is stored and represented in RDF • SPARQL (SPARQL Protocol and RDF Query Language): The query language of the Semantic Web. It is specifically designed to query data across various systems • OWL (Web Ontology Language): The schema language, or knowledge representation (KR) language, of the Semantic Web • OWL enables you to define concepts composably so that these concepts can be reused as much and as often as possible
  • 18. Concepts and Background • What Standards Apply to the Semantic Web? • Imagine two relational tables of two different databases: movies and cinema rooms • Imagine a program that can automatically query your Web site and any other site that has movie scheduling information in order to show a complete view in one place • The goal of Linked Data is to publish structured data in such a way that it can be easily consumed and combined with other Linked Data
  • 19. Concepts and Background • How is the “Semantic Web” Different? • Linked Data is the Semantic Web realized via four best practice principles • Use URIs as names for things • An example of a URI is any URL • Use HTTP URIs so that people can look up those names • When someone looks up a URI, provide useful information, using standards such as RDF* and SPARQL • Include links to other URIs so that they can discover more things
  • 20. Concepts and Background • How is the “Semantic Web” Different? • Once all the rows of our tables have been uniquely identified, made dereferenceable through HTTP, and described with RDF, the last step is providing links between different rows across different tables • The main aim here is to make explicit those links that were implicit before shifting to the Linked Data approach. In our example, movies would be linked to the theatres in which they are playing
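The movies and theatres example can be sketched in plain Python as a set of (subject, predicate, object) triples. This is only an illustration of the idea, not RDF tooling: the `example.org` URIs, the `title`, `name` and `playingAt` properties, and the helper functions `match` and `movies_playing_at` are all hypothetical names chosen for this sketch.

```python
# Hypothetical base URIs; example.org is reserved for illustration.
MOVIES = "http://example.org/movies/"
THEATRES = "http://example.org/theatres/"
VOCAB = "http://example.org/vocab#"

# Each former table row becomes a resource named by an HTTP URI and is
# described by (subject, predicate, object) triples.
triples = {
    (MOVIES + "1", VOCAB + "title", "Movie One"),
    (MOVIES + "2", VOCAB + "title", "Movie Two"),
    (THEATRES + "a", VOCAB + "name", "Theatre A"),
    (THEATRES + "b", VOCAB + "name", "Theatre B"),
    # Explicit links between rows of the two former tables.
    (MOVIES + "1", VOCAB + "playingAt", THEATRES + "a"),
    (MOVIES + "2", VOCAB + "playingAt", THEATRES + "a"),
    (MOVIES + "2", VOCAB + "playingAt", THEATRES + "b"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def movies_playing_at(theatre_uri):
    """Follow playingAt links to a theatre, then look up each movie title."""
    titles = []
    for movie, _, _ in match(p=VOCAB + "playingAt", o=theatre_uri):
        titles.extend(t[2] for t in match(s=movie, p=VOCAB + "title"))
    return sorted(titles)
```

Because the link between a movie and a theatre is itself a triple, the same pattern-matching helper answers questions that previously required knowing how the two tables were joined.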
  • 21. Concepts and Background • How is the “Semantic Web” Different? • Once our tables have been so published, the Linked Data rules do their magic: people across the Web can start referencing and consuming the data in our rows easily • If we go further and link from our movies to popular external data sets such as Wikipedia and IMDb, then we make it even easier for people and computers to consume our data and combine it with other data • This provides our data with context
  • 22. Current practices in Big Data analytics • In current Big Data technology ecosystems, when facing a specific data analytics task, it is usual to rely on already existing tools
  • 23. Current practices in Big Data analytics • Besides technological or commercial aspects, current Big Data platforms still follow the common procedure when facing data analytics tasks (ACM-SIGKDD, 2014), which comprises the typical steps of classical KDD: • data collection, • data transformation, • data mining, • pattern evaluation, and • knowledge presentation (visualization)
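The five KDD steps listed above can be read as a chain of functions, each consuming the output of the previous one. The following is a toy sketch under that assumption; the sample readings, the `min_support` threshold and all function names are hypothetical and do not come from any particular platform.

```python
from collections import Counter


def collect():
    # Data collection: hypothetical raw readings, some of them malformed.
    return ["12", "15", "n/a", "15", "40", "15", ""]


def transform(raw):
    # Data transformation: keep only the values that parse as integers.
    return [int(x) for x in raw if x.strip().isdigit()]


def mine(values):
    # Data mining: count how often each value occurs.
    return Counter(values)


def evaluate(patterns, min_support=2):
    # Pattern evaluation: keep values seen at least min_support times.
    return {v: n for v, n in patterns.items() if n >= min_support}


def present(patterns):
    # Knowledge presentation: a plain-text summary instead of a chart.
    return [f"value {v} occurred {n} times" for v, n in sorted(patterns.items())]


def run_pipeline():
    # The output of each KDD step is the input of the next one.
    return present(evaluate(mine(transform(collect()))))
```

Each step only works if it receives the kind of data the previous step produces, which is the sort of implicit contract that semantic annotation of components aims to make explicit.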
  • 24. Semantic modelling • The semantic model: BIGOWL • Ontological scheme driving the whole process of Big Data analytics • It is the terminological box (TBox) that defines the vocabulary with concepts and properties in the domain of Big Data analysis
  • 25. Semantic modelling • The semantic model: BIGOWL • It is the terminological box (TBox) that defines the vocabulary with concepts and properties (relationships) in the domain of Big Data analysis
  • 26. Semantic modelling • The semantic model: BIGOWL • It is the terminological box (TBox) that defines the vocabulary with concepts and properties in the domain of Big Data analysis
  • 27. Overall approach
  • 28. Validation: Case studies • Case study 1: Streaming processing of New York City traffic open data • Dynamic version of the bi-objective Traveling Salesman Problem (TSP), which minimizes the “travel time” and the “distance” needed to cover certain routing points in an urban area • Open Data API provided by the New York City Department of Transportation • Updates traffic information several times per minute
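To make the bi-objective formulation concrete, the sketch below evaluates a closed tour against two cost matrices and compares two tours by Pareto dominance. It is a minimal illustration, not the jMetalSP implementation used in the case study: the 4-point matrices are made-up values, and `evaluate_route` and `dominates` are hypothetical helper names. In the dynamic setting, the `travel_time` matrix would be refreshed whenever the traffic feed publishes an update.

```python
def evaluate_route(route, distance, travel_time):
    """Return (total_distance, total_travel_time) for a closed tour."""
    total_d = 0.0
    total_t = 0.0
    for i, a in enumerate(route):
        b = route[(i + 1) % len(route)]  # wrap around to close the tour
        total_d += distance[a][b]
        total_t += travel_time[a][b]
    return total_d, total_t


def dominates(u, v):
    """Pareto dominance for minimisation: no worse everywhere, better somewhere."""
    return (all(x <= y for x, y in zip(u, v))
            and any(x < y for x, y in zip(u, v)))


# Hypothetical symmetric cost matrices for four routing points.
distance = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
travel_time = [
    [0, 9, 2, 2],
    [9, 0, 3, 2],
    [2, 3, 0, 9],
    [2, 2, 9, 0],
]

short_but_slow = evaluate_route([0, 1, 2, 3], distance, travel_time)
long_but_fast = evaluate_route([0, 2, 1, 3], distance, travel_time)
```

Neither of the two tours dominates the other: one is shorter and the other is faster. A multi-objective optimizer therefore returns a set of such trade-off solutions rather than a single best route.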
  • 29. Validation: Case studies • Case study 1: Streaming processing of New York City traffic open data • Analyser: Multi-objective metaheuristic NSGA-II provided in jMetalSP, which allows parallel processing of evaluation functions in an Apache Spark environment
  • 30. Validation: Case studies • Case study 1: Streaming processing of New York City traffic open data • Workflow
  • 31. Validation: Case studies • Case study 1: Streaming processing of New York City traffic open data • Ontology definition of this workflow
  • 32. Validation: Case studies • Case study 1: Streaming processing of New York City traffic open data • Workflow
  • 33. Validation: Case studies • Case study 1: Streaming processing of New York City traffic open data • Semantic annotation and querying
  • 34. Validation: Case studies • Case study 2: academic problem of Iris flower classification • Classification algorithm: decision tree J48 • UCI Repository
  • 35. Validation: Case studies • Case study 2: academic problem of Iris flower classification • For materialization, two different approaches have been used in this case: • the well-known data mining library Weka and • the BigML SaaS API for cloud-based analysis
  • 36. Validation: Case studies • Case study 2: academic problem of Iris flower classification • Ontology definition of this workflow
  • 37. Validation: Case studies • Case study 2: academic problem of Iris flower classification • Analytic workflow
  • 38. Validation: Case studies • Case study 2: academic problem of Iris flower classification • Semantic annotation and querying
  • 39. Validation: Case studies • Case study 3: Reasoning • SWRL rules perform semantic reasoning jobs mainly devoted to checking the correctness of workflows, i.e., to discovering those components and tasks with (non-)compatible connectivity of inputs/outputs, execution orders, data domains, data formats, data types, etc. • SWRL rules are then evaluated by the reasoner after classifying Big Data components in accordance with axioms
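The kind of correctness check described above can be approximated in plain Python to show what such a rule looks for. This is not SWRL and does not use the BIGOWL vocabulary: the `Component` class, its `consumes` and `produces` fields, the format labels and the sample workflow are all hypothetical, and only the data-format compatibility of adjacent components is checked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    name: str
    consumes: Optional[str]  # expected input format; None for a data source
    produces: Optional[str]  # emitted output format; None for a final sink


def incompatible_links(workflow):
    """Return (producer, consumer) name pairs whose formats do not match."""
    problems = []
    for producer, consumer in zip(workflow, workflow[1:]):
        if producer.produces != consumer.consumes:
            problems.append((producer.name, consumer.name))
    return problems


# Hypothetical linear workflow with one deliberate mismatch:
# the parser emits a table but the optimiser expects CSV.
workflow = [
    Component("TrafficStream", None, "json"),
    Component("Parser", "json", "table"),
    Component("Optimiser", "csv", "pareto_front"),
    Component("Plot", "pareto_front", None),
]
```

Calling `incompatible_links(workflow)` flags the Parser to Optimiser link. In an ontology-based setting the same condition would be stated declaratively as a rule and evaluated by a reasoner over the annotated components, rather than hard-coded in a loop.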
  • 40. Validation: Case studies • Case study 3: Reasoning • SWRL rules to check correctness of workflows
  • 41. Conclusions • Experience in the case studies revealed that the BIGOWL approach is useful when integrating domain knowledge concerning a specific analytic problem • Consequently, the integrated knowledge is used for guiding the design of Big Data analytics workflows, by recommending the next components to be linked and supporting final validation
  • 42. Research agenda • First phase: to provide automatic facilities for ontology population, and hence to enrich the semantic approach • To generate new and heterogeneous use cases of analytics workflows that would lead us to find and solve new possible deficiencies, as well as to enrich the knowledge base
  • 43. IATECH MÁLAGA BIGOWL: Using Semantics to Develop Big Data Analytics Solutions José Manuel García Nieto jnieto@lcc.uma.es