Open Data for Agriculture
Intro to Big Data
29/11/2013
Athens, Greece
Joint offering by

Supported by EU projects

Antonis Koukourikos
NCSR “Demokritos”
Presentation Outline
• What is Big Data?
• Semantic Web Technologies

• What Semantic Web brings into the picture

Part 1

WHAT IS BIG DATA?
Big Data Is…

Data whose scale, diversity, and complexity
require new architecture, techniques, algorithms,
and analytics to manage it and extract value and
hidden knowledge from it

Big Data Sources
• Biomedical Information

• Sensor Data
• Logs
• E-mails
• Satellite images
• Audio and Video Streams
• Social Networks

Big Data Challenges – “The Three Vs”
• Volume
• Variety
• Velocity

…or is it 4…?
• Veracity

…or is it 6…?
• Visualization
• Value

Big Data demand…
• Storage
– Impractical or impossible to use centralized storage; calls for distribution and federation
– Indexing is a problem in itself

• Computational power
– For discovering
– For searching / retrieving
– For joining

• Human effort and expertise
– Querying can become complex
– Are you sure you exploit all this information?
Part 2

SEMANTIC WEB TECHNOLOGIES
The Syntactic and the Semantic Web
• The World Wide Web represents information
using natural language, graphics, multimedia...
– Humans can process and combine this information easily
– However, machines are ignorant!

• The Semantic Web is a Web with a meaning
– A web of data that is understandable by the
machines

Semantic Web Technologies
• Common formats for integration and combination of data
drawn from diverse sources, whereas the original Web
mainly concentrated on the interchange of documents.
• For defining
– RDFS http://www.w3.org/TR/rdf-schema/
– OWL http://www.w3.org/TR/owl2-overview/

• For describing
– RDF http://www.w3.org/RDF/

• For querying
– SPARQL http://www.w3.org/TR/2013/REC-sparql11-query-20130321/
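
To make the "describing" and "querying" roles concrete, here is a minimal sketch in plain Python. It is a toy stand-in, not an RDF library or a SPARQL engine: data is held as subject/predicate/object triples, and a query is a list of triple patterns in which terms starting with "?" are variables. All names under "ex:" and the functions match() and query() are hypothetical, invented for illustration.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical example data: every "ex:" name is invented for illustration.
TRIPLES: List[Triple] = [
    ("ex:wheat", "rdf:type", "ex:Crop"),
    ("ex:maize", "rdf:type", "ex:Crop"),
    ("ex:wheat", "ex:grownIn", "ex:Greece"),
    ("ex:maize", "ex:grownIn", "ex:Spain"),
]


def match(pattern: Triple, triples: List[Triple]) -> Iterator[Dict[str, str]]:
    """Yield variable bindings for one triple pattern ('?x' terms are variables)."""
    for triple in triples:
        binding: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must keep the same value everywhere in the pattern.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding


def query(patterns: List[Triple], triples: List[Triple]) -> List[Dict[str, str]]:
    """Join several triple patterns, in the spirit of a SPARQL basic graph pattern."""
    solutions: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            # Substitute variables that are already bound, then match the rest.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(bound, triples):
                extended.append({**solution, **binding})
        solutions = extended
    return solutions


if __name__ == "__main__":
    for row in query([("?c", "rdf:type", "ex:Crop"),
                      ("?c", "ex:grownIn", "?where")], TRIPLES):
        print(row)
```

The two patterns share the variable ?c, so the result only contains crops for which both facts are known.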

What SW can do
• Handle heterogeneity
• Handle evolution / variability
• Elicit inferred knowledge

• Volume is still the challenge
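
"Inferred knowledge" can be illustrated with a small sketch, assuming two RDFS-style rules only: subclass transitivity and propagation of rdf:type to superclasses. The function infer() and all "ex:" names are hypothetical; a real reasoner covers many more rules. The repeated passes over the whole triple set also hint at why volume remains the challenge.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Return the input plus every triple derivable by the two rules above."""
    closure = set(triples)
    while True:
        # If (s, p, o) is a type or subclass statement and o has a parent class,
        # the same statement also holds for the parent.
        derived = {
            (s, p, parent)
            for (s, p, o) in closure if p in (SUBCLASS, TYPE)
            for (child, p2, parent) in closure if p2 == SUBCLASS and child == o
        }
        if derived <= closure:  # fixed point: nothing new was derived
            return closure
        closure |= derived


# Hypothetical asserted facts, invented for illustration.
ASSERTED: Set[Triple] = {
    ("ex:DurumWheat", SUBCLASS, "ex:Wheat"),
    ("ex:Wheat", SUBCLASS, "ex:Crop"),
    ("ex:sample42", TYPE, "ex:DurumWheat"),
}

if __name__ == "__main__":
    for triple in sorted(infer(ASSERTED) - ASSERTED):
        print(triple)
```

Nothing in the asserted data says that ex:sample42 is a crop; that statement only appears after inference.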

Part 3

WHAT SEMANTIC WEB BRINGS IN THE BIG DATA PICTURE
Moving Forward with “Old” Technologies
[Figure: two infrastructure diagrams, captioned “NOW (2012) case of agricultural infrastructures” and “2015 (AgINFRA) case of agricultural infrastructures”.
Labelled elements: OAI-PMH service providers #1…#n with schemas #1…#n; harvester; aggregated XML repository; indexers; AGRIS AP, IEEE LOM and DC schemas; SPARQL endpoints for data sources #1…#n; common schema; RDF triple store; web portals (Open AGRIS (FAO), AgLR/GLN (ARIADNE), Organic.Edunet (UAH), VOA3R (UAH), ...).
Callouts: “How many? Is it feasible?” and “Big Data problem!”]

What Semantic Web can bring into the picture
• One Data Access Point for the entire Data Cloud
– Enabling Service-Data level agreements with Data providers
• Application-level Vocabularies / Thesauri / Ontologies
– Enabling different application facets for different communities of users over the SAME data pool
• Going beyond existing Distributed Triple Store Implementations
– Link Heterogeneous but Semantically Connected Data
– Index Extremely Large Information Volumes (Peta Sizes)
– Improve Information Retrieval response
• Data (+Metadata) physically stored in Data Provider
– No need for harvesting
• Vocabularies / Thesauri / Ontologies of the Data Provider's choice
– No need for aligning according to common schemas
[Figure: SemaGrow architecture diagram overlaid on this slide (Client, SemaGrow SPARQL endpoint, Query Decomposer, Resource Selector, Query Manager, Federated endpoint Wrapper, SPARQL endpoints of Data Sources #1…#n).]

The SemaGrow Solution
• Use POWDER to mass-annotate large subspaces
– Exploit naming convention regularities to compress
the indexes used by the system

• Partition triple patterns in the original query
• Annotate each fragment with an ordered list of
data sources most likely to contain relevant data
• Distribute and transform the query fragments
• Collect and align the results
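
The steps above can be sketched as a toy federated query in plain Python. This is an illustration under stated assumptions, not the SemaGrow implementation: the sources are in-memory lists rather than SPARQL endpoints, per-predicate triple counts stand in for the instance statistics, and the query transformation step (schema mappings) is left out. SOURCES, STATS, rank_sources(), evaluate() and federated_query() are hypothetical names.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data sources, standing in for remote SPARQL endpoints.
SOURCES: Dict[str, List[Triple]] = {
    "source_1": [("ex:wheat", "ex:yield", "3.1"), ("ex:maize", "ex:yield", "5.6")],
    "source_n": [("ex:wheat", "ex:soil", "ex:clay"), ("ex:wheat", "ex:yield", "3.1")],
}


def predicate_counts(triples: List[Triple]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for _, predicate, _ in triples:
        counts[predicate] = counts.get(predicate, 0) + 1
    return counts


# Assumed stand-in for instance statistics: triples per predicate per source.
STATS = {name: predicate_counts(triples) for name, triples in SOURCES.items()}


def rank_sources(pattern: Triple) -> List[str]:
    """Ordered list of sources most likely to hold data for the pattern."""
    scored = [(stats.get(pattern[1], 0), name) for name, stats in STATS.items()]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for count, name in ranked if count > 0]


def evaluate(pattern: Triple, triples: List[Triple]) -> List[Dict[str, str]]:
    """Match one pattern against one source ('?x' terms are variables)."""
    results = []
    for triple in triples:
        binding: Dict[str, str] = {}
        if all(term == value
               or (term.startswith("?") and binding.setdefault(term, value) == value)
               for term, value in zip(pattern, triple)):
            results.append(binding)
    return results


def federated_query(patterns: List[Triple]) -> List[Dict[str, str]]:
    solutions: List[Dict[str, str]] = [{}]
    for pattern in patterns:                      # partition: one fragment per pattern
        fragment: List[Dict[str, str]] = []
        for name in rank_sources(pattern):        # annotate: ordered candidate sources
            for binding in evaluate(pattern, SOURCES[name]):  # distribute
                if binding not in fragment:       # collect, dropping duplicates
                    fragment.append(binding)
        solutions = [                             # align on shared variables
            {**solution, **binding}
            for solution in solutions
            for binding in fragment
            if all(solution.get(var, val) == val for var, val in binding.items())
        ]
    return solutions


if __name__ == "__main__":
    print(federated_query([("?crop", "ex:yield", "?y"),
                           ("?crop", "ex:soil", "?soil")]))
```

Ranking by statistics means a source that holds nothing for a pattern is never contacted, which is the point of keeping compact summaries of each source.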

The POWDER W3C Recommendation
• Exploits natural groupings of URIs to annotate all
resources in a subset of the URI space
• Regular expression based grouping

• Allows properties and their values to be
associated with an arbitrary number of subjects
within a fully-defined semantic framework
• POWDER Description Resources: http://www.w3.org/TR/powder-dr/
• POWDER Formal Semantics: http://www.w3.org/TR/powder-formal/
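
The idea of regular expression based grouping can be sketched in a few lines of Python. This is not POWDER syntax and does not use any POWDER tooling; it only shows how one rule can describe every URI in a subspace without listing the URIs. The RULES table, the example.org URIs, the "ex:" properties and describe() are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rules: a regular expression over URIs and the property/value
# pairs that apply to every URI it matches.
RULES: List[Tuple[str, Dict[str, str]]] = [
    (r"^http://example\.org/soil/", {"ex:topic": "soil", "ex:source": "source_1"}),
    (r"^http://example\.org/crops/\d{4}/", {"ex:topic": "crop yield"}),
]


def describe(uri: str) -> Dict[str, str]:
    """Collect the properties of every rule whose pattern matches the URI."""
    description: Dict[str, str] = {}
    for pattern, properties in RULES:
        if re.search(pattern, uri):
            description.update(properties)
    return description


if __name__ == "__main__":
    print(describe("http://example.org/soil/plot/17"))
    print(describe("http://example.org/crops/2012/wheat"))
```

Two rules describe an unbounded number of resources here, which is the kind of compression of indexes that naming convention regularities make possible.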

The SemaGrow Stack
• Integrates the components in order to offer a single
SPARQL endpoint that federates a number of
heterogeneous data sources
• Targets the federation of independently provided
data sources

SemaGrow Architecture
[Figure: SemaGrow architecture diagram.
Labelled components: Client; SemaGrow SPARQL endpoint; Query Decomposition; Query Decomposer; Resource Discovery; Resource Selector; Query Pattern Discovery Service; Data Source(s) Selector; Data Summaries Endpoint; Data Summaries SPARQL endpoint; POWDER Inference Layer; P-Store; Query Manager; Query Transformation Service; Query Results Merger; Federated Endpoint Wrapper; SPARQL endpoints of Data Sources #1…#n.
Labelled flows and data: query; query patterns; equivalent patterns with semantic proximity; candidate source(s) list; instance statistics; load info; reactivity parameters; schema mappings; query fragments with target sources (#1…#n); transformed query and transformed schema; query requests #1…#n; SPARQL queries and query results; query results schema.]

Use Cases (DLO)

Heterogeneous Data Collections & Streams
• Big data:
– Sensor data: soil data, weather
– GIS data: land usage, forest and natural resources management data
– Historical data: crop yield, economic data
– Forecasts: climate change models

• Problem:
– Combine heterogeneous sources to analyze past food production and forecast future trends
– Cannot clone and translate: large scale, live data streams
– Cannot immediately and directly effect radical re-design of all sensing and processing currently in place
3rd Plenary & ESG Meeting

21/10/2013
Use Cases (FAO)

Reactive Data Analysis
• Big data:
– Document collections: past experiences, analysis and research results
– Databases: climate conditions and crop yield observations, economic data (land and food prices)

• Problem:
– Retrieving complete and accurate information to compile reports
• Raw data and reports, scientific publications, etc.
– Wastes human resources that could analyze data and synthesize useful knowledge and advice for food production
• Too much time spent cross-relating responses from different sources
– Too many different organizations and processes rely on the different schemas to make re-design viable
– Cloning is inefficient: large and constantly updated stores
Use Cases (AK)

Reactive Resource Discovery
• Big data:
– Multimedia content about agriculture and biodiversity

• Problem:
– Real-time retrieval of relevant content
– Used to compile educational activities
– Schema heterogeneity:
• Different providers (Organic.Edunet, Europeana, VOA3R, etc.)
– Too many different organizations and processes rely on the different schemas to make re-design viable
– Cloning is inefficient: large and constantly updated stores
Project Info
• SemaGrow: Data intensive techniques to boost the real-time performance of global agricultural data infrastructures
• FP7-ICT-2011.4.4 (Intelligent Information Management)

No.  Name                                          Country
1    Universidad de Alcala
2    NCSR “Demokritos”
3    Universita Degli Studi di Roma Tor Vergata
4    Semantic Web Company
5    Institut Za Fiziku
6    Stichting Dienst Landbouwkundig Onderzoek
7    Food and Agriculture Organization of the UN
8    Agroknow Technologies
Thank you!

Antonis Koukourikos
NCSR “Demokritos”
kukurik@iit.Demokritos.gr

More Related Content

What's hot

FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows Carole Goble
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeVince Smith
 
FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIRDOM
 
Towards a Unified PageRank for DBpedia and Wikidata
Towards a Unified PageRank for DBpedia and WikidataTowards a Unified PageRank for DBpedia and Wikidata
Towards a Unified PageRank for DBpedia and WikidataAndreas Thalhammer
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Carole Goble
 
GeoChronos: An On-line Collaborative Platform for Earth Observation Scientists
GeoChronos: An On-line Collaborative Platform for Earth Observation ScientistsGeoChronos: An On-line Collaborative Platform for Earth Observation Scientists
GeoChronos: An On-line Collaborative Platform for Earth Observation ScientistsGeoChronos
 
Cloudflow – A Framework for MapReduce Pipeline Development in Biomedical Rese...
Cloudflow – A Framework for MapReduce Pipeline Development in Biomedical Rese...Cloudflow – A Framework for MapReduce Pipeline Development in Biomedical Rese...
Cloudflow – A Framework for MapReduce Pipeline Development in Biomedical Rese...Lukas Forer
 
FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...Carole Goble
 
NCI Cancer Research Data Commons - Overview
NCI Cancer Research Data Commons - OverviewNCI Cancer Research Data Commons - Overview
NCI Cancer Research Data Commons - Overviewimgcommcall
 
Introduction to PANGAEA & EURO-BASIN Data Management, by Janine Felden
Introduction to PANGAEA & EURO-BASIN Data Management, by Janine FeldenIntroduction to PANGAEA & EURO-BASIN Data Management, by Janine Felden
Introduction to PANGAEA & EURO-BASIN Data Management, by Janine FeldenDTU - Technical University of Denmark
 
Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Carole Goble
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.FAIRDOM
 
Open Science: how to serve the needs of the researcher?
Open Science: how to serve the needs of the researcher? Open Science: how to serve the needs of the researcher?
Open Science: how to serve the needs of the researcher? Carole Goble
 
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)OpenAIRE
 
Gianluca Correndo, Simon Crowle, Juri Papay and Michael Boniface | Enhancing ...
Gianluca Correndo, Simon Crowle, Juri Papay and Michael Boniface | Enhancing ...Gianluca Correndo, Simon Crowle, Juri Papay and Michael Boniface | Enhancing ...
Gianluca Correndo, Simon Crowle, Juri Papay and Michael Boniface | Enhancing ...semanticsconference
 
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)OpenAIRE
 
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...semanticsconference
 
Near Duplicate Detection for Medical Imaging Data Warehouse Construction
Near Duplicate Detection for Medical Imaging Data Warehouse ConstructionNear Duplicate Detection for Medical Imaging Data Warehouse Construction
Near Duplicate Detection for Medical Imaging Data Warehouse ConstructionPradeeban Kathiravelu, Ph.D.
 
Going for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataGoing for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataEDINA, University of Edinburgh
 

What's hot (20)

FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIR data and model management for systems biology.
FAIR data and model management for systems biology.
 
Towards a Unified PageRank for DBpedia and Wikidata
Towards a Unified PageRank for DBpedia and WikidataTowards a Unified PageRank for DBpedia and Wikidata
Towards a Unified PageRank for DBpedia and Wikidata
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
 
GeoChronos: An On-line Collaborative Platform for Earth Observation Scientists
GeoChronos: An On-line Collaborative Platform for Earth Observation ScientistsGeoChronos: An On-line Collaborative Platform for Earth Observation Scientists
GeoChronos: An On-line Collaborative Platform for Earth Observation Scientists
 
Cloudflow – A Framework for MapReduce Pipeline Development in Biomedical Rese...
Cloudflow – A Framework for MapReduce Pipeline Development in Biomedical Rese...Cloudflow – A Framework for MapReduce Pipeline Development in Biomedical Rese...
Cloudflow – A Framework for MapReduce Pipeline Development in Biomedical Rese...
 
FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...
 
Citizen Science Open Data
Citizen Science Open DataCitizen Science Open Data
Citizen Science Open Data
 
NCI Cancer Research Data Commons - Overview
NCI Cancer Research Data Commons - OverviewNCI Cancer Research Data Commons - Overview
NCI Cancer Research Data Commons - Overview
 
Introduction to PANGAEA & EURO-BASIN Data Management, by Janine Felden
Introduction to PANGAEA & EURO-BASIN Data Management, by Janine FeldenIntroduction to PANGAEA & EURO-BASIN Data Management, by Janine Felden
Introduction to PANGAEA & EURO-BASIN Data Management, by Janine Felden
 
Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.
 
Open Science: how to serve the needs of the researcher?
Open Science: how to serve the needs of the researcher? Open Science: how to serve the needs of the researcher?
Open Science: how to serve the needs of the researcher?
 
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
Research data discovery in OpenAIRE (Presentation by Paolo Manghi at DI4R2018)
 
Gianluca Correndo, Simon Crowle, Juri Papay and Michael Boniface | Enhancing ...
Gianluca Correndo, Simon Crowle, Juri Papay and Michael Boniface | Enhancing ...Gianluca Correndo, Simon Crowle, Juri Papay and Michael Boniface | Enhancing ...
Gianluca Correndo, Simon Crowle, Juri Papay and Michael Boniface | Enhancing ...
 
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)
 
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
 
Near Duplicate Detection for Medical Imaging Data Warehouse Construction
Near Duplicate Detection for Medical Imaging Data Warehouse ConstructionNear Duplicate Detection for Medical Imaging Data Warehouse Construction
Near Duplicate Detection for Medical Imaging Data Warehouse Construction
 
Going for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataGoing for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial Metadata
 

Viewers also liked

Khemjira Plongsawai- My Portfolio forblog
Khemjira Plongsawai- My Portfolio forblogKhemjira Plongsawai- My Portfolio forblog
Khemjira Plongsawai- My Portfolio forblogKhemjira_P
 
12m start 2012
12m start 201212m start 2012
12m start 2012mapple2012
 
Jim Ziegler: Alpha Dawg Prosperity and Productivity
Jim Ziegler: Alpha Dawg Prosperity and ProductivityJim Ziegler: Alpha Dawg Prosperity and Productivity
Jim Ziegler: Alpha Dawg Prosperity and ProductivitySean Bradley
 
Scott Pechstein: No Thanks, I'm just looking
Scott Pechstein: No Thanks, I'm just looking Scott Pechstein: No Thanks, I'm just looking
Scott Pechstein: No Thanks, I'm just looking Sean Bradley
 
El conillet Ramonet
El conillet RamonetEl conillet Ramonet
El conillet RamonetAngymor3
 
Curriculum Vitae - Loutfy H. Madkour (2)
Curriculum Vitae - Loutfy H. Madkour (2)Curriculum Vitae - Loutfy H. Madkour (2)
Curriculum Vitae - Loutfy H. Madkour (2)Al Baha University
 
Inv pres q42014_final
Inv pres q42014_finalInv pres q42014_final
Inv pres q42014_finalCNOServices
 
Loctite Soluções 2012
Loctite Soluções 2012Loctite Soluções 2012
Loctite Soluções 2012mapple2012
 
Review articles bio inspired algorithms
Review articles bio inspired algorithmsReview articles bio inspired algorithms
Review articles bio inspired algorithmsJean Carlo Machado
 
Assessment and evaluation
Assessment and evaluationAssessment and evaluation
Assessment and evaluationOly Galvan
 
Limitações do HTML no Desenvolvimento de Jogos Multiplataforma
Limitações do HTML no Desenvolvimento de Jogos MultiplataformaLimitações do HTML no Desenvolvimento de Jogos Multiplataforma
Limitações do HTML no Desenvolvimento de Jogos MultiplataformaJean Carlo Machado
 

Viewers also liked (20)

质量练习
质量练习质量练习
质量练习
 
Khemjira Plongsawai- My Portfolio forblog
Khemjira Plongsawai- My Portfolio forblogKhemjira Plongsawai- My Portfolio forblog
Khemjira Plongsawai- My Portfolio forblog
 
نجّار . . وأعظم
نجّار . . وأعظمنجّار . . وأعظم
نجّار . . وأعظم
 
The Legacy of Alexander
The Legacy of AlexanderThe Legacy of Alexander
The Legacy of Alexander
 
12m start 2012
12m start 201212m start 2012
12m start 2012
 
Jim Ziegler: Alpha Dawg Prosperity and Productivity
Jim Ziegler: Alpha Dawg Prosperity and ProductivityJim Ziegler: Alpha Dawg Prosperity and Productivity
Jim Ziegler: Alpha Dawg Prosperity and Productivity
 
Scott Pechstein: No Thanks, I'm just looking
Scott Pechstein: No Thanks, I'm just looking Scott Pechstein: No Thanks, I'm just looking
Scott Pechstein: No Thanks, I'm just looking
 
Anthony Alagona
Anthony AlagonaAnthony Alagona
Anthony Alagona
 
El conillet Ramonet
El conillet RamonetEl conillet Ramonet
El conillet Ramonet
 
Curriculum Vitae - Loutfy H. Madkour (2)
Curriculum Vitae - Loutfy H. Madkour (2)Curriculum Vitae - Loutfy H. Madkour (2)
Curriculum Vitae - Loutfy H. Madkour (2)
 
Фестивали цветов в Европе
Фестивали цветов в ЕвропеФестивали цветов в Европе
Фестивали цветов в Европе
 
Inv pres q42014_final
Inv pres q42014_finalInv pres q42014_final
Inv pres q42014_final
 
Trabajo 8
Trabajo 8Trabajo 8
Trabajo 8
 
Vhip 2011
Vhip 2011Vhip 2011
Vhip 2011
 
Loctite Soluções 2012
Loctite Soluções 2012Loctite Soluções 2012
Loctite Soluções 2012
 
Review articles bio inspired algorithms
Review articles bio inspired algorithmsReview articles bio inspired algorithms
Review articles bio inspired algorithms
 
Marketing mgt
Marketing mgtMarketing mgt
Marketing mgt
 
Popovich behaviorism
Popovich behaviorismPopovich behaviorism
Popovich behaviorism
 
Assessment and evaluation
Assessment and evaluationAssessment and evaluation
Assessment and evaluation
 
Limitações do HTML no Desenvolvimento de Jogos Multiplataforma
Limitações do HTML no Desenvolvimento de Jogos MultiplataformaLimitações do HTML no Desenvolvimento de Jogos Multiplataforma
Limitações do HTML no Desenvolvimento de Jogos Multiplataforma
 

Similar to Introduction to Big data

Seminaire bigdata23102014
Seminaire bigdata23102014Seminaire bigdata23102014
Seminaire bigdata23102014Raja Chiky
 
Ticer summer school_24_aug06
Ticer summer school_24_aug06Ticer summer school_24_aug06
Ticer summer school_24_aug06SayDotCom.com
 
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data ResourcesApplication of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data ResourcesPistoia Alliance
 
Linked Energy Data Generation
Linked Energy Data GenerationLinked Energy Data Generation
Linked Energy Data GenerationFilip Radulovic
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
Metadata for Research Objects
Metadata for Research ObjectsMetadata for Research Objects
Metadata for Research Objectsseanb
 
Recognising data sharing
Recognising data sharingRecognising data sharing
Recognising data sharingJisc RDM
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...Carole Goble
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Dan Taylor
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchBlue BRIDGE
 
Jisc Research Data Shared Service - Spring Update
Jisc Research Data Shared Service - Spring UpdateJisc Research Data Shared Service - Spring Update
Jisc Research Data Shared Service - Spring UpdateJisc RDM
 
Delivering biodiversity knowledge in the information age
Delivering biodiversity knowledge in the information ageDelivering biodiversity knowledge in the information age
Delivering biodiversity knowledge in the information ageVince Smith
 
Vince smith-delivering biodiversity knowledge in the information age-notext
Vince smith-delivering biodiversity knowledge in the information age-notextVince smith-delivering biodiversity knowledge in the information age-notext
Vince smith-delivering biodiversity knowledge in the information age-notextVince Smith
 
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...giuseppe_futia
 
Industry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraftIndustry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraftRuleML
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsVivien Bonazzi
 

Similar to Introduction to Big data (20)

Seminaire bigdata23102014
Seminaire bigdata23102014Seminaire bigdata23102014
Seminaire bigdata23102014
 
Linked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter HaaseLinked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter Haase
 
Ticer summer school_24_aug06
Ticer summer school_24_aug06Ticer summer school_24_aug06
Ticer summer school_24_aug06
 
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data ResourcesApplication of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
 
Linked Energy Data Generation
Linked Energy Data GenerationLinked Energy Data Generation
Linked Energy Data Generation
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
Metadata for Research Objects
Metadata for Research ObjectsMetadata for Research Objects
Metadata for Research Objects
 
Recognising data sharing
Recognising data sharingRecognising data sharing
Recognising data sharing
 
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and ApplicationsSemantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative research
 
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
 
Jisc Research Data Shared Service - Spring Update
Jisc Research Data Shared Service - Spring UpdateJisc Research Data Shared Service - Spring Update
Jisc Research Data Shared Service - Spring Update
 
Delivering biodiversity knowledge in the information age
Delivering biodiversity knowledge in the information ageDelivering biodiversity knowledge in the information age
Delivering biodiversity knowledge in the information age
 
Vince smith-delivering biodiversity knowledge in the information age-notext
Vince smith-delivering biodiversity knowledge in the information age-notextVince smith-delivering biodiversity knowledge in the information age-notext
Vince smith-delivering biodiversity knowledge in the information age-notext
 
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
 
Industry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraftIndustry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraft
 
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-researchUc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data Commons
 


Introduction to Big data

  • 1. Open Data for Agriculture Intro to Big Data 29/11/2013 Athens, Greece Joint offering by Supported by EU projects
  • 2. Intro to Big Data Antonis Koukourikos NCSR “Demokritos”
  • 3. Presentation Outline • What is Big Data? • Semantic Web Technologies • What Semantic Web brings into the picture Slide 3 of 25
  • 4. Part 1 WHAT IS BIG DATA?
  • 5. Big Data Is… Data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it Slide 5 of 25
  • 6. Big Data Sources • Biomedical Information • Sensor Data • Logs • E-mails • Satellite images • Audio and Video Streams • Social Networks Slide 6 of 25
  • 7. Big Data Challenges – “The Three Vs”: Volume, Variety, Velocity …or is it 4…? Veracity …or is it 6…? Visualization, Value Slide 7 of 25
  • 8. Big Data demands… • Storage – Impractical or impossible to use centralized storage • Distribution • Federation – Indexing is a problem in itself • Computational power – For discovering – For searching / retrieving – For joining • Human effort and expertise – Querying can become complex – Are you sure you exploit all this information? Slide 8 of 25
  • 9. Part 2 SEMANTIC WEB TECHNOLOGIES
  • 10. The Syntactic and the Semantic Web • The World Wide Web represents information using natural language, graphics, multimedia... – Humans can process and combine this information easily – However, machines are ignorant! • The Semantic Web is a Web with a meaning – A web of data that is understandable by machines Slide 10 of 25
  • 11. Semantic Web Technologies • Common formats for integration and combination of data drawn from diverse sources, whereas the original Web mainly concentrated on the interchange of documents. • For defining – RDFS http://www.w3.org/TR/rdf-schema/ – OWL http://www.w3.org/TR/owl2-overview/ • For describing – RDF http://www.w3.org/RDF/ • For querying – SPARQL http://www.w3.org/TR/2013/REC-sparql11-query-20130321/ Slide 11 of 25
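To make the "describing" and "querying" roles on slide 11 concrete, here is a toy sketch in plain Python. It is not RDF or SPARQL syntax and does not use any RDF library: the triples, the `http://example.org/` URIs and the `match` helper are all hypothetical, chosen only to show data stored as subject-predicate-object triples and queried with a triple pattern containing a variable.

```python
# Toy illustration only: real systems use an RDF library and a SPARQL engine.
# All URIs and property names below are hypothetical examples.
EX = "http://example.org/"

# "Describing": each fact is a (subject, predicate, object) triple.
triples = {
    (EX + "wheat", EX + "type", EX + "Crop"),
    (EX + "maize", EX + "type", EX + "Crop"),
    (EX + "wheat", EX + "yieldTonnesPerHa", "3.4"),
}


def match(pattern, store):
    """Return variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in sorted(store):  # sorted only to make the output deterministic
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within the pattern must bind to one value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# "Querying": roughly what a pattern like { ?s type Crop } asks for.
crops = [b["?s"] for b in match(("?s", EX + "type", EX + "Crop"), triples)]
```

With these made-up triples, `crops` holds the two subjects typed as `Crop`. An actual deployment would express the data in RDF and the pattern in SPARQL, as defined in the specifications linked on the slide.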
  • 12. What SW can do • Handle heterogeneity • Handle evolution / variability • Elicit inferred knowledge • Volume is still the challenge Slide 12 of 25
  • 13. Part 3 WHAT SEMANTIC WEB BRINGS IN THE BIG DATA PICTURE
  • 14. Moving Forward with “Old” Technologies [Diagram with two panels: “NOW (2012): Case of agricultural infrastructures” and “2015 (AgINFRA): Case of agricultural infrastructures”. Labels: OAI-PMH Service Provider #1 … #n; Schema #1 … #n; Harvester; Aggregated XML Repository; Indexer (AGRIS AP Schema, IEEE LOM Schema, DC Schema, ...); Web Portals: Open AGRIS (FAO), AgLR/GLN (ARIADNE), Organic.Edunet (UAH), VOA3R (UAH), ...; SPARQL endpoint (Data Source #1 … #n); Common Schema; RDF Triple Store; SPARQL endpoint. Annotations: “How Many? Is it feasible?”, “BigData Problem!”] Slide 14 of 25
  • 15. What Semantic Web can bring into the picture • One Data Access Point for the entire Data Cloud – Enabling Service-Data level agreements with Data providers • Application-level Vocabularies / Thesauri / Ontologies – Enabling different application facets for different communities of users over the SAME data pool • Going beyond existing Distributed Triple Store Implementations – Link Heterogeneous but Semantically Connected Data – Index Extremely Large Information Volumes (Peta Sizes) – Improve Information Retrieval response • Data (+Metadata) physically stored in Data Provider – No need for harvesting • Vocabularies / Thesauri / Ontologies of Data Provider choice – No need for aligning according to common schemas [Diagram: SemaGrow architecture (Client, SemaGrow SPARQL endpoint, Resource Discovery, Query Decomposition, Query Manager, Federated endpoint Wrapper, SPARQL endpoints of Data Sources #1 … #n); see slide 19] Slide 15 of 25
  • 16. The SemaGrow Solution • Use POWDER to mass-annotate large subspaces – Exploit naming convention regularities to compress the indexes used by the system • Partition triple patterns in the original query • Annotate each fragment with an ordered list of data sources most likely to contain relevant data • Distribute and transform the query fragments • Collect and align the results Slide 16 of 25
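The steps listed on slide 16 can be sketched as a small, self-contained Python program. This is a hypothetical illustration, not SemaGrow's code: the two in-memory "sources", the property names, and the use of per-predicate triple counts as a stand-in for instance statistics are all assumptions. The query transformation step (schema mappings) is left out for brevity.

```python
# Hypothetical sketch of the slide's steps; not SemaGrow's actual implementation.
from collections import Counter

# Two made-up data sources, each a set of (subject, predicate, object) triples.
SOURCES = {
    "soil_db": {("field1", "soilType", "clay"), ("field2", "soilType", "sand")},
    "yield_db": {
        ("field1", "cropYield", "3.4"),
        ("field2", "cropYield", "2.1"),
        ("field3", "cropYield", "4.0"),
    },
}

# Per-source predicate counts: an assumed, simplified form of instance statistics.
STATS = {name: Counter(p for _, p, _ in triples) for name, triples in SOURCES.items()}


def rank_sources(pattern):
    """Order sources by how many triples they hold for the pattern's predicate."""
    predicate = pattern[1]
    scored = [(STATS[name][predicate], name) for name in SOURCES]
    return [name for count, name in sorted(scored, reverse=True) if count > 0]


def evaluate(pattern, triples):
    """Match one triple pattern against one source; '?x' terms are variables."""
    bindings = []
    for triple in sorted(triples):
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            bindings.append(binding)
    return bindings


def join(left, right):
    """Combine bindings that agree on every shared variable."""
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]


def federated_query(patterns):
    results = [{}]
    for pattern in patterns:                        # partition into triple patterns
        fragment_results = []
        for name in rank_sources(pattern):          # ordered list of likely sources
            fragment_results += evaluate(pattern, SOURCES[name])  # distribute
        results = join(results, fragment_results)   # collect and align
    return results


rows = federated_query([("?f", "soilType", "clay"), ("?f", "cropYield", "?y")])
```

Here the first pattern is answered by `soil_db` and the second by `yield_db`, and the join keeps only the field present in both. A real federation engine would also weigh load and semantic proximity when ranking sources, as the architecture slide suggests.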
  • 17. The POWDER W3C Recommendation • Exploits natural groupings of URIs to annotate all resources in a subset of the URI space • Regular expression based grouping • Allows properties and their values to be associated with an arbitrary number of subjects within a fully-defined semantic framework • POWDER Description Resources: http://www.w3.org/TR/powder-dr/ • POWDER Formal Semantics: http://www.w3.org/TR/powder-formal/ Slide 17 of 25
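The idea of annotating every resource in a URI subspace through regular-expression grouping can be illustrated with a minimal Python sketch. It does not use POWDER's actual document syntax; the rules, URIs and property names are invented for illustration only.

```python
import re

# Hypothetical rules: each pairs a URI regular expression with properties that
# are taken to hold for every resource whose URI matches it.
RULES = [
    (re.compile(r"^http://example\.org/soil/"),
     {"topic": "soil", "source": "soil_db"}),
    (re.compile(r"^http://example\.org/yield/\d{4}/"),
     {"topic": "crop yield", "source": "yield_db"}),
]


def describe(uri):
    """Collect the properties of every rule whose pattern matches the URI."""
    properties = {}
    for pattern, rule_properties in RULES:
        if pattern.search(uri):
            properties.update(rule_properties)
    return properties


description = describe("http://example.org/yield/2012/field1")
```

Two short rules describe an unbounded number of resources, which is why exploiting naming regularities can keep a source index compact: the index stores one rule per URI group rather than one entry per resource.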
  • 18. The SemaGrow Stack • Integrates the components in order to offer a single SPARQL endpoint that federates a number of heterogeneous data sources • Targets the federation of independently provided data sources Slide 18 of 25
  • 19. SemaGrow Architecture [Diagram. Components: Client; SemaGrow SPARQL endpoint; Resource Discovery (Resource Selector, Query Pattern Discovery Service, Data Source(s) Selector); Query Decomposition (Query Decomposer); Query Manager; Query Transformation Service; Query Results Merger; Federated Endpoint Wrapper; POWDER Inference Layer; P-Store; Data Summaries Endpoint; SPARQL endpoint (Data Source #1 … #n). Data and control labels: query, query patterns, equivalent patterns, candidate source(s) list, query fragment with target source (#1 … #n), transformed query, query request #1 … #n, query results, instance statistics, load info, semantic proximity, reactivity parameters, schema mappings, data summaries, Ctrl] Slide 19 of 25
  • 20. Use Cases (DLO): Heterogeneous Data Collections & Streams • Big data: – Sensor data: soil data, weather – GIS data: land usage, forest and natural resources management data – Historical data: crop yield, economic data – Forecasts: climate change models • Problem: – Combine heterogeneous sources to analyze past food production and forecast future trends – Cannot clone and translate: large scale, live data streams – Cannot immediately and directly effect radical re-design of all sensing and processing currently in place 3rd Plenary & ESG Meeting 21/10/2013 Slide 24 of 25
  • 21. Use Cases (FAO): Reactive Data Analysis • Big data: – Document collections: past experiences, analysis and research results – Databases: climate conditions and crop yield observations, economic data (land and food prices) • Problem: – Retrieving complete and accurate information to compile reports • Raw data and reports, scientific publications, etc. – Wastes human resources that could analyze data and synthesize useful knowledge and advice for food production • Too much time spent cross-relating responses from different sources – Too many different organizations and processes rely on the different schemas to make re-design viable – Cloning is inefficient: large and constantly updated stores
  • 22. Use Cases (AK): Reactive Resource Discovery • Big data: – Multimedia content about agriculture and biodiversity • Problem: – Real-time retrieval of relevant content – Used to compile educational activities – Schema heterogeneity: • Different providers (Organic.Edunet, Europeana, VOA3R, etc.) – Too many different organizations and processes rely on the different schemas to make re-design viable – Cloning is inefficient: large and constantly updated stores
  • 23. Project Info • SemaGrow: Data intensive techniques to boost the real-time performance of global agricultural data infrastructures • FP7-ICT-2011.4.4 (Intelligent Information Management) • Partners (No., Name): 1. Universidad de Alcala; 2. NCSR “Demokritos”; 3. Universita Degli Studi di Roma Tor Vergata; 4. Semantic Web Company; 5. Institut Za Fiziku; 6. Stichting Dienst Landbouwkundig Onderzoek; 7. Food and Agriculture Organization of the UN; 8. Agroknow Technologies
  • 24. Thank you! Antonis Koukourikos NCSR “Demokritos” kukurik@iit.Demokritos.gr