Linked Data at globo.com
Tatiana Al-Chueyr and Rodrigo D. A. Senra
{tatiana.martins, rodrigo.senra}@corp.globo.com
Semantic Team
semantica@corp.globo.com
Semantic Team
Andréia Bustamante
Ícaro Medeiros
Tatiana Al-Chueyr
Rodrigo Senra
Contributors
Franklin Amorim
João Caros Mendes
Alberto Beloni
André Nicodemus
BROADCAST MOVIES PAY TV INTERNET
EVENTS MUSIC
PUBLISHING
NEW VENTURES NEWSPAPER RADIO NETWORK
Motivation
● Cross-link content from different web products
[Images: soccer player, politician, celebrity]
Isabella Nardoni was killed on March 29, 2008, in the North Zone of São Paulo (Photo: Reproduction)
Isabella de Oliveira Nardoni, aged 5, was killed on the night of March 29, 2008. The forensic investigation concluded that the girl was thrown from the sixth floor of the building where her father, Alexandre Nardoni, her stepmother, Anna Carolina Jatobá, and the couple's two young children lived, in Vila Isolina Mazzei, in the North Zone of São Paulo.
Isabella's grave becomes a place of visitation in SP; the Nardoni couple is in prison.
The Isabella Nardoni Case
Juliana Cardilli
G1 SP
RDF
FOAF
GEO
Dublin Core
SKOS
Semantic markup in web pages
Motivation
Recommend annotations to information Producer
Motivation
Suggest related content to information Consumer
Motivation
Outcomes
● Flexible ways to organize content
● Ease of finding related issues
● Explicit relations derived from annotated content
● Up-to-date topic pages with little editorial effort
● Linking content across different web products
● Seamless navigation leading to flow state
Status Quo
Used by the main web products of Globo.com
linking, among others:
○ 18,485 organizations
○ 82,386 people
○ 9,129 places
○ 1,000,000+ annotated news
from August 2010 to May 2013
Legacy Architecture
[Diagram: five CDA/CMA pairs, triple store, search engine, ontology]
Problems
Poor data management
○ direct, unmanaged access to the triple store
○ difficulty sharing data (distributed DBs)
○ need to re-sync the triple store and the search engine index
○ scalability of the triple store
○ high entropy in distributed ontology engineering
Problems
Ontology Engineering
● Product-driven (past): G1 (news), GE (sports), EGO (gossip), TVG (tv)
● Domain-driven (current): Person, Organization, Place, Music, Politics, Programme, Education, Sports
● Layers: Base, Upper
Possible Solution
Upper Ontology
Problems
Semantic as a library
○ many different versions in production
○ programming language dependent
○ steep learning curve for RDF/OWL/SPARQL
Next Step
Create an open semantic data management platform
● Scalable
● Mobile and Web friendly
● Interconnect Globo's data with external data sources
● Automate content extraction (including NER)
Brainiak
Linked Data RESTful API
Legacy Architecture
[Diagram: four CDA/CMA pairs, triple store, search engine, ontology]
[Diagram: one CMA and four CDAs, the Brainiak API, triple store, search engine]
Under Development
Requirements
● Indirect usage of SPARQL
● Programming language independent
● Data management with quality
● Finer-grained authorization and authentication
● Isolate applications from triplestore
● Improve triplestore performance
task: list all sports teams
SPARQL query
DEFINE input:inference <http://data.globo.com/ruleset>
SELECT ?uri ?label
FROM <http://data.globo.com/sports/>
WHERE
{
  ?uri a <http://data.globo.com/sports/Team> ;
       rdfs:label ?label .
}
LIMIT 10
OFFSET 0
Brainiak query
GET /sports/Team
[Screenshots: SPARQL response, Brainiak response]
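To make the correspondence between the two forms concrete, here is a minimal sketch of how a `/<context>/<Class>` path could be expanded into the SPARQL shown for the same task. The function names `build_list_query` and `translate` are hypothetical, the `http://data.globo.com` base is taken from the query on the slide, and the Virtuoso `DEFINE input:inference` pragma is left out; this is an illustration, not Brainiak's actual code.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def build_list_query(graph_uri, class_uri, limit=10, offset=0):
    """Return a SPARQL SELECT listing labelled instances of class_uri."""
    return (
        f"PREFIX rdfs: <{RDFS}>\n"
        "SELECT ?uri ?label\n"
        f"FROM <{graph_uri}>\n"
        "WHERE {\n"
        f"  ?uri a <{class_uri}> ;\n"
        "       rdfs:label ?label .\n"
        "}\n"
        f"LIMIT {int(limit)}\n"
        f"OFFSET {int(offset)}"
    )


def translate(path, base="http://data.globo.com"):
    """Map a '/<context>/<Class>' path to the equivalent list query."""
    context, class_name = path.strip("/").split("/")
    graph_uri = f"{base}/{context}/"  # the context doubles as the named graph
    return build_list_query(graph_uri, graph_uri + class_name)


query = translate("/sports/Team")
print(query)
```

The client only has to know the path; graph names, prefixes and paging defaults stay on the server side.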
Brainiak concepts
● Instance
● Collection (set of instances from a given Class)
● Schema (the Class definition)
● Context
[Diagram: context "place"; classes Country, State, City; instances Brazil, Japan]
Real example
GET /place
GET /place/Country
GET /place/Country/_schema
GET /place/Country/Brazil
resource URL → /place/Country/Brazil
context (graph) → http://semantica.globo.com/place/
class → http://semantica.globo.com/place/Country
instance → http://semantica.globo.com/place/Country/Brazil
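The mapping from resource URL to context, class and instance URIs can be sketched as a small function. The name `resolve` is hypothetical and the base `http://semantica.globo.com` is the one used in the example above; special segments such as `_schema` are not handled in this sketch.

```python
BASE = "http://semantica.globo.com"  # base URI taken from the example above


def resolve(path, base=BASE):
    """Derive context, class and instance URIs from a resource path."""
    parts = [segment for segment in path.split("/") if segment]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 path segments, got {len(parts)}")
    uris = {"context": f"{base}/{parts[0]}/"}
    if len(parts) >= 2:
        uris["class"] = uris["context"] + parts[1]
    if len(parts) == 3:
        uris["instance"] = f"{uris['class']}/{parts[2]}"
    return uris


brazil = resolve("/place/Country/Brazil")
print(brazil)
```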
URI Conventions
Legacy URIs
Convention
/place/River
context (graph) → http://semantica.globo.com/place/
class → http://semantica.globo.com/place/River
Overridden
/place/River?graph_uri=http://dbpedia.org/resource/classes#&class_uri=dbpedia:River
context (graph) → http://dbpedia.org/resource/classes#
class → http://dbpedia.org/ontology/River
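A minimal sketch of the convention-versus-override rule follows, assuming that `graph_uri` and `class_uri` simply replace the conventional values when present. The names `expand` and `resolve_collection` are hypothetical, and the `dbpedia:` prefix mapping is inferred from the example above. Note that a literal `#` inside a query string would be read as the start of a URL fragment, so the sketch percent-encodes the parameters with `urlencode`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://semantica.globo.com"
PREFIXES = {"dbpedia": "http://dbpedia.org/ontology/"}  # inferred from the slide


def expand(value, prefixes=PREFIXES):
    """Expand 'prefix:Name' to a full URI; leave anything else untouched."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return value


def resolve_collection(url, base=BASE):
    """Return context and class URIs, honouring query-string overrides."""
    split = urlsplit(url)
    context, class_name = split.path.strip("/").split("/")
    params = parse_qs(split.query)
    graph = params.get("graph_uri", [f"{base}/{context}/"])[0]
    cls = params.get("class_uri", [f"{base}/{context}/{class_name}"])[0]
    return {"context": expand(graph), "class": expand(cls)}


overrides = urlencode({
    "graph_uri": "http://dbpedia.org/resource/classes#",
    "class_uri": "dbpedia:River",
})
overridden = resolve_collection("/place/River?" + overrides)
conventional = resolve_collection("/place/River")
print(overridden)
print(conventional)
```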
Hypermedia
● Flexibility and programmatic adaptation
● Semantic affordances
● Client has to understand what is consumed
● "Hypermedia APIs are not fully baked yet"
Brainiak hypermedia graph
[Diagram: link relations among the context, collection, schema and instance resources: self, instances, inCollection, item, describedBy, create, replace, delete]
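As an illustration of how a client might use such link relations, the sketch below builds a list of links for one instance and looks one up by its `rel`. Which relation hangs off which resource, the `rel`/`method`/`href` layout, and the helper names `instance_links` and `follow` are all assumptions made from the diagram labels, not Brainiak's documented response format.

```python
def instance_links(context, class_name, instance_id):
    """Build hypothetical hypermedia links for a single instance."""
    collection = f"/{context}/{class_name}"
    instance = f"{collection}/{instance_id}"
    return [
        {"rel": "self", "method": "GET", "href": instance},
        {"rel": "describedBy", "method": "GET", "href": f"{collection}/_schema"},
        {"rel": "inCollection", "method": "GET", "href": collection},
        {"rel": "replace", "method": "PUT", "href": instance},
        {"rel": "delete", "method": "DELETE", "href": instance},
    ]


def follow(links, rel):
    """Return the first link with the given relation name."""
    for link in links:
        if link["rel"] == rel:
            return link
    raise KeyError(rel)


links = instance_links("place", "Country", "Brazil")
print(follow(links, "describedBy"))
```

A client written this way depends on relation names rather than on hard-coded URL patterns, which is the flexibility the Hypermedia slide refers to.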
Services
● List Contexts
● List Collections
● Get a Schema
● List Prefixes
● Status of Services
Instances
● Create
● Retrieve
● Delete
● Edit
● List
Features
● JSON-Schema
● JSON-LD
● REST (OPTIONS, GET, PUT, POST, DELETE)
● Python + Tornado
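One plausible pairing of the instance services with the HTTP methods listed above is sketched here. The mapping follows common REST practice (POST to a collection to create, PUT to an instance to edit) and is an assumption, as are the `INSTANCE_OPERATIONS` table and the `request_line` helper.

```python
# Assumed mapping of instance services to HTTP method and path template.
INSTANCE_OPERATIONS = {
    "list": ("GET", "/{context}/{cls}"),
    "create": ("POST", "/{context}/{cls}"),
    "retrieve": ("GET", "/{context}/{cls}/{instance}"),
    "edit": ("PUT", "/{context}/{cls}/{instance}"),
    "delete": ("DELETE", "/{context}/{cls}/{instance}"),
}


def request_line(operation, **parts):
    """Render 'METHOD /path' for an operation on a given resource."""
    method, template = INSTANCE_OPERATIONS[operation]
    return f"{method} {template.format(**parts)}"


print(request_line("list", context="sports", cls="Team"))
print(request_line("edit", context="place", cls="Country", instance="Brazil"))
```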
Brainiak query
GET /sports/Team
[Screenshots: Brainiak response]
task: retrieve all superclasses of a class
SPARQL query
SELECT DISTINCT ?class
WHERE {
  <http://data.globo.com/place/City> rdfs:subClassOf ?class OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
  ?class a owl:Class .
}
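The transitive walk over `rdfs:subClassOf` can be pictured with a small in-memory sketch. The `SUBCLASS_OF` table below is illustrative only (the chain ordering is an assumption, not the real ontology), and `superclasses` is a hypothetical helper. As I understand the Virtuoso option, `t_min (0)` also returns the starting class itself, so the sketch includes it.

```python
# Illustrative hierarchy only; the real ontology is not shown in the slides.
SUBCLASS_OF = {
    "place:City": ["place:GeopoliticalDivision"],
    "place:GeopoliticalDivision": ["place:Place"],
    "place:Place": ["upper:Object"],
}


def superclasses(cls, subclass_of):
    """Return cls followed by every class reachable via subClassOf."""
    seen = []
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cycles and repeated parents
            continue
        seen.append(current)
        stack.extend(subclass_of.get(current, []))
    return seen


chain = superclasses("place:City", SUBCLASS_OF)
print(chain)
```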
task: retrieve all properties of a group of classes
SPARQL query
SELECT DISTINCT ?predicate ?predicate_graph ?predicate_comment ?type ?range ?title ?range_graph ?range_label ?super_property
WHERE {
  {
    GRAPH ?predicate_graph { ?predicate rdfs:domain ?domain_class } .
  } UNION {
    GRAPH ?predicate_graph { ?predicate rdfs:domain ?blank } .
    ?blank a owl:Class .
    ?blank owl:unionOf ?enumeration .
    OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
    OPTIONAL { ?list_node rdf:first ?domain_class } .
  }
  FILTER (?domain_class IN (
    <http://data.globo.com/place/City>,
    <http://data.globo.com/place/GeopoliticalDivision>,
    <http://data.globo.com/place/Place>,
    <http://data.globo.com/upper/Object>,
    <http://data.globo.com/upper/Substance>,
    <http://data.globo.com/upper/ConcreteEntity>,
    <http://data.globo.com/upper/Entity>))
  { ?predicate rdfs:range ?range . }
  UNION {
    ?predicate rdfs:range ?blank .
    ?blank a owl:Class .
    ?blank owl:unionOf ?enumeration .
    OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
    OPTIONAL { ?list_node rdf:first ?range } .
  }
  FILTER (!isBlank(?range))
  ?predicate rdfs:label ?title .
  ?predicate rdf:type ?type .
  OPTIONAL { ?predicate rdfs:subPropertyOf ?super_property } .
  FILTER (?type in (owl:ObjectProperty, owl:DatatypeProperty)) .
  FILTER(langMatches(lang(?title), "en") OR langMatches(lang(?title), "")) .
  OPTIONAL { ?predicate rdfs:comment ?predicate_comment }
  FILTER(langMatches(lang(?predicate_comment), "en") OR langMatches(lang(?predicate_comment), "")) .
  OPTIONAL {
    GRAPH ?range_graph {
      ?range rdfs:label ?range_label .
      FILTER(langMatches(lang(?range_label), "en") OR langMatches(lang(?range_label), "")) .
    }
  }
}
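The class list in the `FILTER (?domain_class IN (...))` clause is the kind of fragment a server would generate rather than write by hand. A minimal sketch, assuming the list of class URIs is already known; `domain_filter` is a hypothetical helper, and the character check is a simple guard against malformed input, not a full IRI validator.

```python
def domain_filter(class_uris, variable="?domain_class"):
    """Render a SPARQL FILTER restricting variable to the given class URIs."""
    if not class_uris:
        raise ValueError("at least one class URI is required")
    for uri in class_uris:
        # Reject characters that would break out of the <...> delimiters.
        if any(ch in uri for ch in '<>" \n\t'):
            raise ValueError(f"unsafe character in URI: {uri!r}")
    uri_list = ", ".join(f"<{uri}>" for uri in class_uris)
    return f"FILTER ({variable} IN ({uri_list}))"


clause = domain_filter([
    "http://data.globo.com/place/City",
    "http://data.globo.com/place/Place",
])
print(clause)
```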
task: retrieve the cardinalities of all properties of a certain class
SPARQL query
SELECT DISTINCT ?predicate ?min ?max ?range ?enumerated_value ?enumerated_value_label
WHERE {
  <http://data.globo.com/place/City> rdfs:subClassOf ?s OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
  ?s owl:onProperty ?predicate .
  OPTIONAL { ?s owl:minQualifiedCardinality ?min } .
  OPTIONAL { ?s owl:maxQualifiedCardinality ?max } .
  OPTIONAL {
    { ?s owl:onClass ?range }
    UNION { ?s owl:onDataRange ?range }
    UNION { ?s owl:allValuesFrom ?range }
    OPTIONAL { ?range owl:oneOf ?enumeration } .
    OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
    OPTIONAL { ?list_node rdf:first ?enumerated_value } .
    OPTIONAL {
      ?enumerated_value rdfs:label ?enumerated_value_label .
    } .
  }
}
Brainiak query
GET /place/City/_schema
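Since the deck lists JSON-Schema among Brainiak's features, one way to picture the `_schema` response is as cardinality rows folded into a JSON-Schema-style object. The sketch below is an assumption about how that could work, not Brainiak's actual output: the row layout, the rule "minimum of 1 means required, maximum of 1 means single value", the `to_json_schema` helper and the sample predicates are all hypothetical, and every value is typed as a plain string for brevity.

```python
def to_json_schema(title, rows):
    """Fold (predicate, min, max) rows into a JSON-Schema-style dict."""
    schema = {"title": title, "type": "object", "properties": {}}
    required = []
    for row in rows:
        minimum = int(row.get("min") or 0)
        maximum = row.get("max")
        value = {"type": "string"}  # simplification: no range handling
        if maximum is not None and int(maximum) == 1:
            prop = value  # at most one value: plain scalar
        else:
            prop = {"type": "array", "items": value, "minItems": minimum}
            if maximum is not None:
                prop["maxItems"] = int(maximum)
        schema["properties"][row["predicate"]] = prop
        if minimum >= 1:
            required.append(row["predicate"])
    if required:
        schema["required"] = required
    return schema


# Hypothetical rows, shaped like the cardinality query's result bindings.
rows = [
    {"predicate": "rdfs:label", "min": "1", "max": "1"},
    {"predicate": "ex:alias"},
]
schema = to_json_schema("http://data.globo.com/place/City", rows)
print(schema)
```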
Next steps
● SEO (automatic schema.org)
● Improved annotator (DBpedia Spotlight)
● Richer content relationships (inference)
● Link to open data (e.g. DBpedia, dados.gov.br)
Stay tuned
@brainiak_api
... will soon be released as an open source project!
Semantic Team
semantica@corp.globo.com
globo.com
Thank you for your attention!

Semantic day 2013 linked data at globo.com