Linked Data at the BBC
Augustine Kwanashie
Outline
•  Introduction
•  APIs and Tools
•  Validation and Data Quality
•  Performance & Resilience Measures
•  Summary
Introduction
Tagging BBC Content
<http://www.bbc.co.uk/things/2b7ba3ca-32ca…>
a cwork:CreativeWork, cwork:NewsItem ;
cwork:title "Pep Guardiola…" ;
cms:locator <urn:bbc:cps:asset:748947894> ;
cwork:language "en-gb" ;
cwork:primaryFormat cwork:TextualFormat ;
prov:dateCreated "2017-04-07T21:39:23+00:00" .
Tagging BBC Content
<http://www.bbc.co.uk/things/4bdbf2-d1ad…>
a core:Organisation, sport:SportingOrganisation ;
core:label "Manchester City"@en-gb ;
core:sameAs <http://www.wikidata.org/entity/Q50602> ;
sport:competesIn <http://www.bbc../things/f1eb4771…> ;
sport:discipline <http://www.bbc../things/ba6e1118…> ;
sport:hasHome <http://www.bbc../things/0710009f…> .
Tagging BBC Content
Article
_______
_______
Manchester City
<http://www.bbc.co.uk/things/4bdbf21d-d1ad-... >
Video
_______
_______ Stream
_______
_______
about
about
Tagging BBC Content
<http://www.bbc.co.uk/things/2b7ba3ca-32ca…>
a cwork:CreativeWork, cwork:NewsItem ;
cwork:title "In the future I will be better…" ;
cwork:about <http://www.bbc../things/4bdbf2-d1ad…> .
<http://www.bbc.co.uk/things/4bdbf2-d1ad…>
a core:Organisation, sport:SportingOrganisation ;
core:label "Manchester City"@en-gb ;
core:sameAs <http://www.wikidata.org/entity/Q50602> .
Tagging helps to:
o  Group/aggregate content.
o  Enhance content discovery.
o  Enhance navigation.
o  Improve personalisation and recommendation.
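The grouping that tagging enables can be sketched in a few lines of Python. This is only an illustration: the identifiers (`cwork:1`, `things:mancity`, and so on) are made-up placeholders, and the flat list of pairs stands in for `cwork:about` triples that would really live in the triple store.

```python
from collections import defaultdict

# Hypothetical (creative work, tagged Thing) pairs standing in for cwork:about triples.
ABOUT = [
    ("cwork:1", "things:mancity"),
    ("cwork:2", "things:mancity"),
    ("cwork:3", "things:technology"),
]

def group_by_tag(pairs):
    """Return a mapping from each Thing to the creative works tagged with it."""
    grouped = defaultdict(list)
    for work, thing in pairs:
        grouped[thing].append(work)
    return dict(grouped)

by_tag = group_by_tag(ABOUT)
```

Every article, video or stream tagged with the same Thing ends up in one bucket, which is what an aggregation page or a recommendation step would start from.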
APIs and Tools
Landscape
Infrastructure
APIs and Tools
Clients
Tagging a BBC Article
CMS
Content Store
Writer API
Triple Store
Core API
Website
Content
API
LDM
Journalists
Journalists
Content
Metadata
GraphDB Setup
Jolokia
Workbench
Sesame
Tomcat
GraphDB master node
Httpd
Jolokia
Workbench
Sesame
Tomcat
GraphDB worker nodes
Httpd
S3 for storing backups
Opsworks for management
CloudWatch for monitoring
Custom APIs vs. SPARQL endpoints
SPARQL endpoints
Custom APIs
o  Can ensure performance.
o  Ideal for rarely changing use-cases.
o  Can validate writes.
o  Complete flexibility with queries.
o  Ideal for varied/changing use-cases.
Write APIs
Validation
Applies set of validation rules
Security
Authenticate via SSL certificate whitelists
Content-Types
Accepts Turtle, RDF+XML
Persistence
Writes asynchronously to triplestore
PUT: https://ldp-writer.int.api.bbci.co.uk/creative-works
Content-Type: text/turtle
Body:
<http://www.bbc.co.uk/things/2b7ba3ca-32ca…>
a cwork:CreativeWork, cwork:NewsItem ;
cwork:title "Pep Guardiola…" ;
cwork:about <http://www.bbc../things/4bd…> .
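A client-side view of this write could look like the sketch below. It only builds the request and does not send it. The URL and `Content-Type` come from the slide; the function name `build_put` is hypothetical, the body keeps the slide's elided IRI, and the client-certificate setup the API requires is left out.

```python
import urllib.request

# URL and media type are taken from the slide.
WRITER_URL = "https://ldp-writer.int.api.bbci.co.uk/creative-works"

def build_put(turtle: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying a Turtle document."""
    return urllib.request.Request(
        WRITER_URL,
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle"},
    )

# The IRI is elided on the slide and is kept elided here.
body = "<http://www.bbc.co.uk/things/2b7ba3ca-32ca…> a cwork:CreativeWork ."
request = build_put(body)
```

Because persistence is asynchronous, a caller should expect an acknowledgement rather than confirmation that the triples are already queryable.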
Read APIs
Filters & Mixins
Restrict returned data by type, domain, etc.
Search
Full-text search on labels.
Content-Types
Produces Trig, JSON+LD, JSON, HTML
Security
Authenticate via SSL certificates
GET: https://things.api.bbc.com/things
?type = core:Person
&label_search = Theresa
&mixin = pol
Accept: json+ld
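The query string for such a read can be assembled as below. The endpoint and parameter names (`type`, `label_search`, `mixin`) are taken from the slide; the helper `things_url` is a hypothetical name, and repeating `mixin` once per value is an assumption based on the later example that passes two mixins.

```python
from urllib.parse import urlencode

THINGS_URL = "https://things.api.bbc.com/things"  # from the slide

def things_url(thing_type, label_search, mixins):
    """Build the query URL; each mixin becomes its own repeated parameter."""
    params = [("type", thing_type), ("label_search", label_search)]
    params += [("mixin", m) for m in mixins]
    return THINGS_URL + "?" + urlencode(params)

url = things_url("core:Person", "Theresa", ["pol"])
```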
Documenting APIs
<urn:api:things:documentation> {
<urn:api:things:get-multiple:covered-by> a api:Filter ;
api:collectionFormat "multi"^^xsd:string ;
api:description "Filter for Things with a matching bbc:coveredBy relationship."^^xsd:string ;
api:in "query"^^xsd:string ;
api:name "covered_by"^^xsd:string ;
api:required "false"^^xsd:boolean ;
api:type "array"^^xsd:string .
}
Validation and Data Quality
ThingGraphs
context:123 {
things:635 a core:Thing, sport:SportingOrganisation ;
core:label "Manchester City"@en-gb ;
core:sameAs <http://www.wikidata.org/entity/Q50602> .
context:123 a prov:ThingGraph ;
prov:managedBy cms:LDM ;
prov:provider <mailto:augustine@bbc.co.uk> ;
prov:provided "2014-08-20T10:47:42+00:00"^^xsd:dateTime	.
}
Multiple ThingGraphs for single Thing
context:01 {
things:635 core:preferredLabel "Manchester City" .
context:01 prov:managedBy cms:LDM .
}
context:02 {
things:635 sport:competesIn things:834 .
context:02 prov:managedBy cms:LDM .
}
context:03 {
things:635 biz:listing things:856 .
context:03 prov:managedBy cms:NewsIDM .
}
Thing Response
Status: 200
Content-Type: application/trig
Body:
things:635 {
things:635 core:preferredLabel "Manchester City" ;
sport:competesIn things:834 ;
core:label "Manchester City" ;
biz:listing things:856 .
}
GET: https://things.int.api.bbc.com/things
?mixin = sport
&mixin = biz
Accept: application/trig
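How several ThingGraphs collapse into one response can be sketched as follows. The triples are the ones from the slides; the function `thing_response` is hypothetical, and the rule that the `core` domain is always returned while mixins add further domains is an assumption read off the example response, not a documented behaviour.

```python
# ThingGraphs from the slide, as context -> list of (subject, predicate, object).
THING_GRAPHS = {
    "context:01": [("things:635", "core:preferredLabel", '"Manchester City"')],
    "context:02": [("things:635", "sport:competesIn", "things:834")],
    "context:03": [("things:635", "biz:listing", "things:856")],
}

def thing_response(thing, mixins, graphs=THING_GRAPHS):
    """Merge every triple about `thing` whose predicate prefix was requested.

    Assumption: the core domain is always returned, mixins add further domains.
    """
    wanted = {"core", *mixins}
    merged = []
    for triples in graphs.values():
        for s, p, o in triples:
            if s == thing and p.split(":", 1)[0] in wanted:
                merged.append((s, p, o))
    return merged

sport_only = thing_response("things:635", ["sport"])
```

The caller never sees which context a triple came from; provenance stays in the store while the response is a single graph named after the Thing.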
Some Validation Rules
Cannot delete a Thing that is used to tag a CreativeWork
things:635 core:preferredLabel "Manchester City" ;
sport:competesIn things:834 .
DELETE: https://ldp-writer.bbc.com
?guid=things:635
cwork:345 a cwork:CreativeWork ;
tagging:about things:635 .
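The delete rule amounts to a reference check before the write is accepted. A minimal sketch, assuming the tagging triples are available as an in-memory list (the names `TAGS` and `can_delete` are hypothetical):

```python
# Hypothetical in-memory view of tagging triples: (creative work, predicate, Thing).
TAGS = [("cwork:345", "tagging:about", "things:635")]

def can_delete(thing, tags=TAGS):
    """A Thing may be deleted only if no CreativeWork is tagged with it."""
    return all(obj != thing for _, _, obj in tags)
```

With the data above, the `DELETE` for `things:635` would be rejected because `cwork:345` still points at it.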
Some Validation Rules
Cannot update a ThingGraph managed by another CMS
context:02 {
things:635 sport:competesIn things:834 .
context:02 prov:managedBy cms:LDM .
}
PUT: https://ldp-writer.bbc.com	
?guid=things:635
X-ManagedBy: VIVO
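The ownership rule can be sketched the same way. The mapping below mirrors `context:02 prov:managedBy cms:LDM`; comparing it with the `X-ManagedBy` header value, and allowing writes to graphs that do not exist yet, are assumptions made for the illustration.

```python
# Which CMS manages each ThingGraph, as in `context:02 prov:managedBy cms:LDM`.
MANAGED_BY = {"context:02": "LDM"}

def can_update(context, requesting_cms, managed_by=MANAGED_BY):
    """Allow the write only when the caller manages the graph (or it is new)."""
    owner = managed_by.get(context)
    return owner is None or owner == requesting_cms
```

Here a `PUT` arriving with `X-ManagedBy: VIVO` fails the check for `context:02`, while the same request from LDM passes.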
Managing Things
Managing Breaking Changes
things:635 a core:Thing, core:Theme;
core:label "Technology" ;
core:sameAs dbpedia:Technology .
things:635 a core:Thing, core:Theme;
core:label "Technology"@en-gb,
"Technologies"@fr,
"Tecnología"@es ;
core:sameAs dbpedia:Technology .
Managing Breaking Changes
things:635 a core:Thing, core:Theme;
core:label "Technology" ;
core:sameAs dbpedia:Technology .
things:635 a core:Thing, core:Theme;
trans-01:label "Technology"@en-gb,
"Technologies"@fr,
"Tecnología"@es ;
core:label "Technology" ;
core:sameAs dbpedia:Technology .
1 Add transition triples
Managing Breaking Changes
things:635 a core:Thing, core:Theme;
core:label "Technology" ;
core:sameAs dbpedia:Technology .
things:635 a core:Thing, core:Theme;
trans-01:label "Technology"@en-gb,
"Technologies"@fr,
"Tecnología"@es ;
core:label "Technology"@en-gb,
"Technologies"@fr,
"Tecnología"@es ;
core:sameAs dbpedia:Technology .
2 Align to new schema
Managing Breaking Changes
things:635 a core:Thing, core:Theme;
trans-01:label "Technology"@en-gb,
"Technologies"@fr,
"Tecnología"@es ;
core:label "Technology"@en-gb,
"Technologies"@fr,
"Tecnología"@es ;
core:sameAs dbpedia:Technology .
3 Remove transition triples
things:635 a core:Thing, core:Theme;
core:label "Technology"@en-gb,
"Technologies"@fr,
"Tecnología"@es ;
core:sameAs dbpedia:Technology .
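The three migration steps can be sketched on a toy representation where a Thing is a mapping from predicate to a set of values. The function names are hypothetical, and the reading that clients move to the new shape while both predicates coexist is an interpretation of the slides.

```python
OLD = {"core:label": {'"Technology"'}}
NEW_LABELS = {'"Technology"@en-gb', '"Technologies"@fr', '"Tecnología"@es'}

def add_transition(thing, labels):
    """Step 1: publish the new shape under a transition predicate."""
    return {**thing, "trans-01:label": set(labels)}

def align(thing):
    """Step 2: copy the transition values onto the real predicate."""
    return {**thing, "core:label": set(thing["trans-01:label"])}

def remove_transition(thing):
    """Step 3: drop the transition predicate once it is no longer needed."""
    return {p: v for p, v in thing.items() if p != "trans-01:label"}

step1 = add_transition(OLD, NEW_LABELS)
step2 = align(step1)
step3 = remove_transition(step2)
```

At no point does `core:label` disappear, so a consumer of the old shape keeps working until step 2, and a consumer of the new shape can start as early as step 1.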
Tagging out of Context
core:label "The Presidents of the United States of America";
core:disambiguationHint "Music Group";
Hard to identify and prevent!
Managing Duplicate Things
context:123 {
things:635 a core:Thing, sport:Team;
core:label "Manchester City"@en-gb ;
core:sameAs dbpedia:Manchester_City_FC ;
sport:managedBy things:6372 .
context:123 prov:provided "2014-08-20" .
}
context:124 {
things:636 a core:Thing, sport:Team;
core:label "Man City"@en-gb ;
core:sameAs <http://www.wikidata.org/234> .
context:124 prov:provided "2017-02-12" .
}
2800 CreativeWorks tagged
46 CreativeWorks tagged
Managing Duplicate Things
context:123 {
things:635 a core:Thing, sport:Team;
core:label "Manchester City"@en-gb ;
core:sameAs <http://www.wikidata.org/234>,
dbpedia:Manchester_City_FC ;
sport:managedBy things:6372 .
context:123 prov:provided "2014-08-20" .
}
2846 CreativeWorks tagged
Merge script to:
1 Switch tags from things:636 to things:635
2 Compare all triples for both Things
3 Update things:635 with consolidated triples
4 Delete things:636
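The four merge steps can be sketched as below. This is not the BBC's script: `merge_things` is a hypothetical function, and carrying over only `core:sameAs` from the duplicate is an assumption chosen because it reproduces the consolidated Thing shown on the slide (the label "Man City" is not kept there).

```python
def merge_things(keep, drop, triples, tags, union_predicates=("core:sameAs",)):
    """Merge the duplicate `drop` into `keep`, following the four steps."""
    # 1. Switch tags from the duplicate to the surviving Thing.
    tags = [(work, keep if thing == drop else thing) for work, thing in tags]
    # 2. Compare triples: pick those of the duplicate that should carry over.
    moved = {(keep, p, o) for s, p, o in triples if s == drop and p in union_predicates}
    # 3 and 4. Consolidate onto `keep` and leave out every triple about `drop`.
    merged = {t for t in triples if t[0] != drop} | moved
    return merged, tags

TRIPLES = {
    ("things:635", "core:label", '"Manchester City"@en-gb'),
    ("things:635", "core:sameAs", "dbpedia:Manchester_City_FC"),
    ("things:636", "core:label", '"Man City"@en-gb'),
    ("things:636", "core:sameAs", "<http://www.wikidata.org/234>"),
}
TAGS = [("cwork:1", "things:635"), ("cwork:2", "things:636")]
merged, retagged = merge_things("things:635", "things:636", TRIPLES, TAGS)
```

Retagging first matters: with the validation rule that a tagged Thing cannot be deleted, step 4 would be rejected if any CreativeWork still pointed at `things:636`.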
Performance
Load-balance reads and writes separately
Master node
WriteELB
Worker nodes
ReadELB1
ReadELB2
Asynchronous writes
Clients Write API Write Pipeline
POST
POST
202
202
Write ELB
POST
200
CREATED
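The asynchronous write flow can be sketched with an in-process queue. The queue and list below are stand-ins for the write pipeline and the triplestore, and the function names are hypothetical; the only detail taken from the slide is that the client gets a 202 before the write reaches the store.

```python
import queue

write_queue = queue.Queue()   # stands in for the write pipeline
store = []                    # stands in for the triplestore

def handle_write(document):
    """API side: enqueue the write and acknowledge immediately with 202."""
    write_queue.put(document)
    return 202

def drain():
    """Pipeline side: apply queued writes to the store in arrival order."""
    while not write_queue.empty():
        store.append(write_queue.get())

status = handle_write('things:635 core:label "Manchester City" .')
pending_before = len(store)
drain()
```

The client's latency is decoupled from the triplestore's: a slow or briefly unavailable store delays `drain`, not the 202.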
Optimise SPARQL queries
SELECT ?subject ?predicate ?object WHERE {
?subject ?predicate ?object .
{
SELECT ?subject WHERE {
OPTIONAL {
?subject prov:createdBy ?created .
}
}
GROUP BY (?subject)
HAVING BOUND (?subject)
}
}
SELECT ?subject ?predicate ?object WHERE {
?subject ?predicate ?object .
{
SELECT ?subject WHERE {
?subject prov:createdBy ?created .
}
GROUP BY (?subject)
}
}
Load-test against future demand
100% increase in the number of CreativeWorks by 2019
60% increase in 99th-percentile response times by 2019
21m requests to the CreativeWorks API daily
94m triples in Triplestore
Monitor Everything!
Auto-scaling on API Instances
1 ELB sends metrics
2 Instances send metrics
3 Alarms trigger autoscaling action
4 New instance is created
5 Instance is added to pool
Caching Responses
Resilience
Queue-based write pipeline
Queued writes across multiple clusters
Writer API Consumer
Primary GraphDB
Cluster
Consumer
Replica GraphDB
Cluster
Event-based write pipeline
Event-based writes improve resilience
Writer API Consumer
Replica GraphDB
Cluster
Primary GraphDB
Cluster
Event store
API
RDS
Notification
Topics
Backup and Recovery
26 GB per backup
20 mins recovery time
16 full backups per day
Opsworks recipes to:
o  Switch Primary and Replica cluster roles.
o  Schedule backups.
o  Restore backup to cluster.
S3:
o  Stores backups by date/time.
o  Retires old backups to Glacier.
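The backup figures imply a schedule and a daily storage volume, worked out below. The sketch assumes the 16 backups are evenly spaced and that every backup is about the same size; neither is stated on the slide.

```python
BACKUP_SIZE_GB = 26      # per backup, from the slide
BACKUPS_PER_DAY = 16     # from the slide

# Minutes between consecutive backups if they are evenly spaced.
interval_minutes = 24 * 60 / BACKUPS_PER_DAY
# Volume written to S3 per day before old backups are retired to Glacier.
daily_volume_gb = BACKUP_SIZE_GB * BACKUPS_PER_DAY
```

Under those assumptions a backup is taken every 90 minutes and roughly 416 GB is stored per day, which is why retiring old backups to Glacier matters.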
Replacing Triplestore Clusters
Read ELB
Write ELB
Primary: Cluster 1
Replica: Cluster 2 Cluster 3
1 Create new cluster and load data
Replacing Triplestore Clusters
Read ELB
Write ELB
Primary: Cluster 1
Replica: Cluster 3 Cluster 2
2 Swap new cluster with replica
3 Delete old replica cluster
Replacing Triplestore Clusters
Read ELB
Write ELB
Primary: Cluster 3
Replica: Cluster 1
4 Swap new cluster with primary
5 Repeat steps 1 - 4
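One pass through steps 1 to 4 can be traced with a small function. It models only the role changes, not the data load or the load-balancer updates, and `replace_clusters` is a hypothetical name.

```python
def replace_clusters(primary, replica, new):
    """Walk steps 1-4 once; return the (primary, replica) roles after each swap."""
    history = []
    # 1. `new` has been created and loaded with data (not modelled here).
    # 2. Swap the new cluster with the replica. 3. The old replica is deleted.
    replica, retired = new, replica
    history.append((primary, replica))
    # 4. Swap the new cluster with the primary; the old primary becomes the replica.
    primary, replica = replica, primary
    history.append((primary, replica))
    return history, retired

history, retired = replace_clusters("Cluster 1", "Cluster 2", "Cluster 3")
```

After one pass the old primary is still in service as the replica, so step 5 (repeat) is what eventually replaces it as well.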
Responding to incidents
CPU Utilisation > 90% for 5 mins?
CloudWatch Alarm!!!
Severity: - Warning?
- Critical?
Action: - Email to Dev team?
- Notify 24/7 support?
- Trigger Autoscaling action?
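The alarm condition and the choice of action can be sketched as below. The threshold and duration come from the slide; treating the input as one CPU sample per minute is an assumption, and the routing table is hypothetical, since the slide poses severity and action as open questions rather than fixed policy.

```python
def alarm_breached(cpu_samples, threshold=90.0, periods=5):
    """True when the last `periods` one-minute samples all exceed `threshold`."""
    recent = cpu_samples[-periods:]
    return len(recent) == periods and all(s > threshold for s in recent)

# Hypothetical routing table mapping severity to actions.
ACTIONS = {
    "warning": ["email dev team"],
    "critical": ["notify 24/7 support", "trigger autoscaling"],
}

def respond(cpu_samples, severity):
    """Return the actions to take, or nothing if the alarm has not fired."""
    return ACTIONS[severity] if alarm_breached(cpu_samples) else []
```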
Summary
Opening up BBC Things
Main points
o  Separating content from metadata
o  APIs powered by Linked Data
o  Monitoring and reacting to incidents
o  Performance for present and future
Thanks...
Augustine Kwanashie
Connections.TechSupport@bbc.co.uk
www.bbc.co.uk/things
Bonus slides…
Filters and Mixins
http://www.bbc.co.uk/things/4bdbf2-d1a1
http://www.bbc.co.uk/things/4bdbf2-d1a2
http://www.bbc.co.uk/things/4bdbf2-d1a3
http://www.bbc.co.uk/things/4bdbf2-d1a4
http://www.bbc.co.uk/things/4bdbf2-d1a5
http://www.bbc.co.uk/things/4bdbf2-d1a6
Filter by type = core:Person Mixin = sport, core
things:4bdbf2-d1b1
a core:Thing, sport:Team ;
core:title "Manchester City" ;
biz:listedIn <http://www.londonstockexchange> ;
sport:managedBy things:4bdbf2-d1a1;
biz:tradingAs "Manchester City PLC" .
Swagger Docs
Handling Data
Scala libraries to enable easy RDF manipulation
Trig
Turtle
etc.
Connections-RDF
o  Import/Export
o  Create triples
o  Compare Graphs
o  Navigate Graphs
o  Manage Datasets
Trig
Turtle
etc.
Handling Data
RDF DSL in Scala
val rdfGraph = (
Iri("http://…") >> Rdf.`type` >>> Core.Thing
>> Sport.`type` >>> Sport.Organisation
>> BBC.coveredBy >>> Iri("urn:bbc:news")
>> Core.label >>> "Manchester City"
)
val label = (rdfGraph Core.label).get[String]
Some Validation Rules
things:635 core:preferredLabel "Manchester City" ;
cms:locator <urn:bbc:cps:asset:39715040>,
<urn:bbc:cps:asset:39715040> .
Thing locators must be unique
things:635 core:preferredLabel "Manchester City" ;
cms:locator <urn:bbc:cps:asset:01>,
<urn:bbc:cps:asset:02> .
<urn:bbc:cps:asset:01> a cms:CPSLocator .
<urn:bbc:cps:asset:02> a cms:CPSLocator .
Locator Types must be unique
Some Validation Rules
things:635 cms:locator <urn:bbc:cps:asset:01> .
things:636 cms:locator <urn:bbc:cps:asset:01> .
Multiple Things with the same locator
things:635 cms:sameAs dbpedia:01 .
things:636 cms:sameAs dbpedia:01 .
Multiple Things with the same sameAs
things:635 core:label "Manchester City" ;
rdf:type owl:Class .
Blacklisted URIs present
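The two "multiple Things with the same value" rules share one shape, sketched below. `shared_values` is a hypothetical helper, and `things:637` is an extra made-up Thing added so that the example contains a non-conflicting locator.

```python
from collections import defaultdict

def shared_values(triples, predicate):
    """Return values of `predicate` that are attached to more than one Thing."""
    owners = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            owners[o].add(s)
    return {value: things for value, things in owners.items() if len(things) > 1}

TRIPLES = [
    ("things:635", "cms:locator", "<urn:bbc:cps:asset:01>"),
    ("things:636", "cms:locator", "<urn:bbc:cps:asset:01>"),
    ("things:637", "cms:locator", "<urn:bbc:cps:asset:02>"),  # made-up, no conflict
]
conflicts = shared_values(TRIPLES, "cms:locator")
```

Calling the same function with `cms:sameAs` would cover the second rule; a write that makes the result non-empty would be rejected.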
Ordering Thing Updates Correctly
create:1
update:2
update:3
delete:4
Document
Writer
Primary GraphDB
Cluster
1 Fetch events from Event store
2 Execute task on Triplestore (only if task id is newer)
3 Errors? Put on Retry queue
4 Fetch and process events from Retry queue
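The "only if task id is newer" check and the retry queue can be sketched together. The event tuples, the `versions` mapping and the `execute` callback are hypothetical stand-ins for the event store, the per-Thing task id record and the triplestore write.

```python
def apply_events(events, versions, execute):
    """Run each (task_id, thing, action) event unless a newer task already ran.

    Returns the events that failed, which form the retry queue.
    """
    retry_queue = []
    for task_id, thing, action in events:
        if task_id <= versions.get(thing, 0):
            continue  # stale: an equal or newer task was already applied
        try:
            execute(thing, action)
            versions[thing] = task_id
        except Exception:
            retry_queue.append((task_id, thing, action))
    return retry_queue

applied = []
versions = {}
# update:2 arrives after update:3, so it must be skipped.
events = [(1, "things:635", "create"), (3, "things:635", "update"), (2, "things:635", "update")]
failed = apply_events(events, versions, lambda thing, action: applied.append(action))
```

Events taken from the retry queue go through the same check, so a retried write that has since been superseded is dropped rather than overwriting newer data.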
Search: Creating an Index
INSERT DATA {
luc:index luc:setParam "uris" .
luc:include luc:setParam "literals" .
luc:includePredicates luc:setParam "core:label rdf:label core:shortLabel" .
luc:moleculeSize luc:setParam "1" .
luc:labelIndex luc:createIndex "true" .
}
Search: Creating an Index
:manutd
:fclub
:manc
type
label Football Club
label
Manchester United
locatedIn label Manchester
locatedIn
:uk label United Kingdom
RDF Module for :manutd
RDF Module for :manc
Search: Full & Incremental Re-index
INSERT DATA {
luc:labelIndex luc:addToIndex <http://www.bbc.co.uk/things/2b7ba3ca-32ca…> .
}
Run incremental re-index after each Thing update
INSERT DATA {
luc:labelIndex luc:updateIndex _:b1 .
}
Run full re-index once daily
Search: Full Text Search Query
SELECT ?thing ?score WHERE {
?thing a tagging:TagConcept .
?thing luc:score ?score .
?thing luc:labelIndex " (Manchester OR *Manchester OR *Manchester*) " .
}
o  Index available during the re-index process
Searching logs
service="triple-store" and env="live" and "Error"
Instance logs
S3 bucket
CloudWatch
Logs

Weitere ähnliche Inhalte

Ähnlich wie Linked data at the BBC

Behavior Driven Development and Automation Testing Using Cucumber
Behavior Driven Development and Automation Testing Using CucumberBehavior Driven Development and Automation Testing Using Cucumber
Behavior Driven Development and Automation Testing Using Cucumber
KMS Technology
 

Ähnlich wie Linked data at the BBC (20)

Bringing JAMStack to the Enterprise
Bringing JAMStack to the EnterpriseBringing JAMStack to the Enterprise
Bringing JAMStack to the Enterprise
 
Optimizing a React application for Core Web Vitals
Optimizing a React application for Core Web VitalsOptimizing a React application for Core Web Vitals
Optimizing a React application for Core Web Vitals
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
 
Saving Money by Optimizing Your Cloud Add-On Infrastructure
Saving Money by Optimizing Your Cloud Add-On InfrastructureSaving Money by Optimizing Your Cloud Add-On Infrastructure
Saving Money by Optimizing Your Cloud Add-On Infrastructure
 
Supercharging your Organic CTR
Supercharging your Organic CTRSupercharging your Organic CTR
Supercharging your Organic CTR
 
Learning to rank search results
Learning to rank search resultsLearning to rank search results
Learning to rank search results
 
MongoDB.local Dallas 2019: Pissing Off IT and Delivery: A Tale of 2 ODS's
MongoDB.local Dallas 2019: Pissing Off IT and Delivery: A Tale of 2 ODS'sMongoDB.local Dallas 2019: Pissing Off IT and Delivery: A Tale of 2 ODS's
MongoDB.local Dallas 2019: Pissing Off IT and Delivery: A Tale of 2 ODS's
 
Digital Analytic & SEO Acceleration
Digital Analytic & SEO AccelerationDigital Analytic & SEO Acceleration
Digital Analytic & SEO Acceleration
 
Criteo Infrastructure (Platform) Meetup
Criteo Infrastructure (Platform) MeetupCriteo Infrastructure (Platform) Meetup
Criteo Infrastructure (Platform) Meetup
 
Supercharge your app with Cloud Functions for Firebase
Supercharge your app with Cloud Functions for FirebaseSupercharge your app with Cloud Functions for Firebase
Supercharge your app with Cloud Functions for Firebase
 
Going Headless with Craft CMS 3.3
Going Headless with Craft CMS 3.3Going Headless with Craft CMS 3.3
Going Headless with Craft CMS 3.3
 
Behavior Driven Development and Automation Testing Using Cucumber
Behavior Driven Development and Automation Testing Using CucumberBehavior Driven Development and Automation Testing Using Cucumber
Behavior Driven Development and Automation Testing Using Cucumber
 
Developing Well-Architected Android Apps with AWS (MOB302) - AWS re:Invent 2018
Developing Well-Architected Android Apps with AWS (MOB302) - AWS re:Invent 2018Developing Well-Architected Android Apps with AWS (MOB302) - AWS re:Invent 2018
Developing Well-Architected Android Apps with AWS (MOB302) - AWS re:Invent 2018
 
How to recover from an unsuccessful SEO relaunch by activating your data (SMX...
How to recover from an unsuccessful SEO relaunch by activating your data (SMX...How to recover from an unsuccessful SEO relaunch by activating your data (SMX...
How to recover from an unsuccessful SEO relaunch by activating your data (SMX...
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
 
The Magic of LINE 購物 Testing
The Magic of LINE 購物 TestingThe Magic of LINE 購物 Testing
The Magic of LINE 購物 Testing
 
Front End Development for Backend Developers - GIDS 2019
Front End Development for Backend Developers - GIDS 2019Front End Development for Backend Developers - GIDS 2019
Front End Development for Backend Developers - GIDS 2019
 
Novedades de MongoDB 3.6
Novedades de MongoDB 3.6Novedades de MongoDB 3.6
Novedades de MongoDB 3.6
 
Best Practices for Architecting a Pragmatic Web API.
Best Practices for Architecting a Pragmatic Web API.Best Practices for Architecting a Pragmatic Web API.
Best Practices for Architecting a Pragmatic Web API.
 

Mehr von Connected Data World

The years of the graph: The future of the future is here
The years of the graph: The future of the future is hereThe years of the graph: The future of the future is here
The years of the graph: The future of the future is here
Connected Data World
 
In Search of the Universal Data Model
In Search of the Universal Data ModelIn Search of the Universal Data Model
In Search of the Universal Data Model
Connected Data World
 
Graph Realities
Graph RealitiesGraph Realities
Graph Realities
Connected Data World
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
Connected Data World
 
Elegant and Scalable Code Querying with Code Property Graphs
Elegant and Scalable Code Querying with Code Property GraphsElegant and Scalable Code Querying with Code Property Graphs
Elegant and Scalable Code Querying with Code Property Graphs
Connected Data World
 

Mehr von Connected Data World (20)

Systems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van HarmelenSystems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van Harmelen
 
Graph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora LassilaGraph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora Lassila
 
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
 
How to get started with Graph Machine Learning
How to get started with Graph Machine LearningHow to get started with Graph Machine Learning
How to get started with Graph Machine Learning
 
Graphs in sustainable finance
Graphs in sustainable financeGraphs in sustainable finance
Graphs in sustainable finance
 
The years of the graph: The future of the future is here
The years of the graph: The future of the future is hereThe years of the graph: The future of the future is here
The years of the graph: The future of the future is here
 
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
 
From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3
 
In Search of the Universal Data Model
In Search of the Universal Data ModelIn Search of the Universal Data Model
In Search of the Universal Data Model
 
Graph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph DatabaseGraph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph Database
 
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
 
Graph Realities
Graph RealitiesGraph Realities
Graph Realities
 
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
 
Semantic similarity for faster Knowledge Graph delivery at scale
Semantic similarity for faster Knowledge Graph delivery at scaleSemantic similarity for faster Knowledge Graph delivery at scale
Semantic similarity for faster Knowledge Graph delivery at scale
 
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
 
Schema, Google & The Future of the Web
Schema, Google & The Future of the WebSchema, Google & The Future of the Web
Schema, Google & The Future of the Web
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
Elegant and Scalable Code Querying with Code Property Graphs
Elegant and Scalable Code Querying with Code Property GraphsElegant and Scalable Code Querying with Code Property Graphs
Elegant and Scalable Code Querying with Code Property Graphs
 
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
 
Graph for Good: Empowering your NGO
Graph for Good: Empowering your NGOGraph for Good: Empowering your NGO
Graph for Good: Empowering your NGO
 

Kürzlich hochgeladen

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Kürzlich hochgeladen (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Linked data at the BBC

  • 1. Linked Data at the BBC Augustine Kwanashie
  • 2. Outline •  Introduction •  APIs and Tools •  Validation and Data Quality •  Performance & Resilience Measures •  Summary
  • 4. Tagging BBC Content <http://www.bbc.co.uk/things/2b7ba3ca-32ca…> a cwork:CreativeWork, cwork:NewsItem ; cwork:title “Pep Guardiola…” ; cms:locator <urn:bbc:cps:asset:748947894> ; cwork:language “en-gb” ; cwork:primaryFormat cwork:TextualFormat ; prov:dateCreated "2017-04-07T21:39:23+00:00” .
  • 5. Tagging BBC Content <http://www.bbc.co.uk/things/4bdbf2-d1ad…> a core:Organisation, sport:SportingOrganisation ; core:label "Manchester City"@en-gb ; core:sameAs <http://www.wikidata.org/entity/Q50602> ; sport:competesIn <http://www.bbc../things/f1eb4771…> ; sport:discipline <http://www.bbc../things/ba6e1118…> ; sport:hasHome <http://www.bbc../things/0710009f…> .
  • 6. Tagging BBC Content Article _______ _______ Manchester City <http://www.bbc.co.uk/things/4bdbf21d-d1ad-... > Video _______ _______ Stream _______ _______ about about
  • 7. Tagging BBC Content <http://www.bbc.co.uk/things/2b7ba3ca-32ca…> a cwork:CreativeWork, cwork:NewsItem ; cwork:title ”In the future I will be better…" ; cwork:about <http://www.bbc../things/4bdbf2-d1ad…> . <http://www.bbc.co.uk/things/4bdbf2-d1ad…> a core:Organisation, sport:SportingOrganisation ; core:label "Manchester City"@en-gb ; core:sameAs <http://www.wikidata.org/entity/Q50602> .
  • 8. Tagging helps to: o  Group/aggregate content. o  Enhance content discovery. o  Enhance navigation. o  Improve personalisation and recommendation.
  • 11. Tagging a BBC Article CMS Content Store Writer API Triple Store Core API Website Content API LDM Journalists Journalists Content Metadata
  • 12. GraphDB Setup Jolokia Workbench Sesame Tomcat GraphDB master node Httpd Jolokia Workbench Sesame Tomcat GraphDB worker nodes Httpd S3 for storing backups Opsworks for management CloudWatch for monitoring
  • 13. Custom APIs vs. SPARQL endpoints SPARQL endpoints Custom APIs o  Can ensure performance. o  Ideal for rarely changing use-cases. o  Can validate writes. o  Complete flexibility with queries. o  Ideal for varied/changing use-cases.
  • 14. Write APIs Validation Applies set of validation rules Security Authenticate via SSL certificate whitelists Content-Types Accepts Turtle, RDF+XML Persistence Writes asynchronously to triplestore PUT: https://ldp-writer.int.api.bbci.co.uk/crea3ve-works Content-Type: text/turtle Body: <http://www.bbc.co.uk/things/2b7ba3ca-32ca…> a cwork:CreativeWork, cwork:NewsItem ; cwork:title "Pep Guardiola…" ; cwork:about <http://www.bbc../things/4bd…> .
  • 15. Read APIs Filters & Mixins Restrict returned data by type, domain, etc. Search Full-text search on labels. Content-Types Produces Trig, JSON+LD, JSON, HTML Security Authenticate via SSL certificatesGET: https://things.api.bbc.com/things ?type = core:Person &label_search = Theresa &mixin = pol Accept: json+ld
  • 16. Documenting APIs <urn:api:things:documentation> { <urn:api:things:get-multiple:covered-by> a api:Filter ; api:collectionFormat "multi"^^xsd:string ; api:description "Filter for Things with a matching bbc:coveredBy relationship."^^xsd:string ; api:in "query"^^xsd:string ; api:name "covered_by"^^xsd:string ; api:required "false"^^xsd:boolean ; api:type "array"^^xsd:string . }
  • 18. ThingGraphs context:123 { things:635 a core:Thing, sport:SportingOrganisation ; core:label "Manchester City"@en-gb ; core:sameAs <http://www.wikidata.org/entity/Q50602> . context:123 a prov:ThingGraph ; prov:managedBy cms:LDM ; prov:provider <mailto:augustine@bbc.co.uk> ; prov:provided "2014-08-20T10:47:42+00:00"^^xsd:dateTime . }
  • 19. Multiple ThingGraphs for single Thing context:01 { things:635 core:preferredLabel "Manchester City" . context:01 prov:managedBy cms:LDM . } context:02 { things:635 sport:competesIn things:834 . context:02 prov:managedBy cms:LDM . } context:03 { things:635 biz:listing things:856 . context:03 prov:managedBy cms:NewsIDM . }
  • 20. Thing Response GET: https://things.int.api.bbc.com/things?mixin=sport&mixin=biz Accept: application/trig Status: 200 Content-Type: application/trig Body: things:635 { things:635 core:preferredLabel "Manchester City" ; sport:competesIn things:834 ; core:label "Manchester City" ; biz:listing things:856 . }
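The last two slides suggest that a Thing response is assembled by merging triples from every ThingGraph about that Thing and keeping only the requested domains. The following is a sketch of that idea, not the API's actual implementation: it assumes that a mixin selects predicates by namespace prefix and that `core` triples are always returned, neither of which the slides state explicitly.

```python
# Each ThingGraph is modelled as a set of (subject, predicate, object) triples.
thing_graphs = {
    "context:01": {("things:635", "core:preferredLabel", "Manchester City")},
    "context:02": {("things:635", "sport:competesIn", "things:834")},
    "context:03": {("things:635", "biz:listing", "things:856")},
}


def thing_response(subject, mixins, graphs, always=("core",)):
    """Merge triples about `subject` across all graphs, keeping the
    always-included domains plus the requested mixin domains."""
    wanted = set(always) | set(mixins)
    merged = set()
    for triples in graphs.values():
        for s, p, o in triples:
            if s == subject and p.split(":", 1)[0] in wanted:
                merged.add((s, p, o))
    return merged


sport_only = thing_response("things:635", ["sport"], thing_graphs)
sport_and_biz = thing_response("things:635", ["sport", "biz"], thing_graphs)
```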
  • 21. Some Validation Rules Rule: cannot delete a Thing that is used to tag a CreativeWork. Existing data: things:635 core:preferredLabel "Manchester City" ; sport:competesIn things:834 . cwork:345 a cwork:CreativeWork ; tagging:about things:635 . Rejected request: DELETE: https://ldp-writer.bbc.com?guid=things:635
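As a rough illustration of this rule, the check below refuses a delete while any triple still tags a CreativeWork with the Thing. The in-memory triple list and the function name `can_delete` are assumptions made for the example; a real service would query the triplestore instead.

```python
def can_delete(thing, triples):
    """A Thing may be deleted only if no triple uses it as a tagging:about object."""
    return not any(p == "tagging:about" and o == thing for _s, p, o in triples)


# Data from the slide, flattened into (subject, predicate, object) tuples.
store = [
    ("things:635", "core:preferredLabel", "Manchester City"),
    ("things:635", "sport:competesIn", "things:834"),
    ("cwork:345", "rdf:type", "cwork:CreativeWork"),
    ("cwork:345", "tagging:about", "things:635"),
]

delete_635_allowed = can_delete("things:635", store)  # still used as a tag
delete_834_allowed = can_delete("things:834", store)  # not used as a tag
```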
  • 22. Some Validation Rules Cannot update a ThingGraph managed by another CMS context:02 { things:635 sport:competesIn things:834 . context:02 prov:managedBy cms:LDM . } PUT: https://ldp-writer.bbc.com ?guid=things:635 X-ManagedBy: VIVO
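A sketch of the ownership check this slide implies: the write is allowed only when the caller identified by `X-ManagedBy` matches the graph's `prov:managedBy` value. How the header value maps onto the `cms:` URI is not shown on the slide; comparing against the local name after the prefix is an assumption.

```python
def may_update(graph_managed_by, x_managed_by):
    """Allow the write only when the caller manages the target ThingGraph.

    Assumes the X-ManagedBy header carries the local name of the cms: URI.
    """
    local_name = graph_managed_by.split(":", 1)[-1]
    return local_name == x_managed_by


vivo_allowed = may_update("cms:LDM", "VIVO")  # the request on the slide
ldm_allowed = may_update("cms:LDM", "LDM")
```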
  • 24. Managing Breaking Changes Before: things:635 a core:Thing, core:Theme; core:label "Technology" ; core:sameAs dbpedia:Technology . After: things:635 a core:Thing, core:Theme; core:label "Technology"@en-gb, "Technologies"@fr, "Tecnología"@es ; core:sameAs dbpedia:Technology .
  • 25. Managing Breaking Changes Step 1: add transition triples. Before: things:635 a core:Thing, core:Theme; core:label "Technology" ; core:sameAs dbpedia:Technology . After: things:635 a core:Thing, core:Theme; trans-01:label "Technology"@en-gb, "Technologies"@fr, "Tecnología"@es ; core:label "Technology" ; core:sameAs dbpedia:Technology .
  • 26. Managing Breaking Changes Step 2: align to new schema. Before: things:635 a core:Thing, core:Theme; core:label "Technology" ; core:sameAs dbpedia:Technology . After: things:635 a core:Thing, core:Theme; trans-01:label "Technology"@en-gb, "Technologies"@fr, "Tecnología"@es ; core:label "Technology"@en-gb, "Technologies"@fr, "Tecnología"@es ; core:sameAs dbpedia:Technology .
  • 27. Managing Breaking Changes Step 3: remove transition triples. Before: things:635 a core:Thing, core:Theme; trans-01:label "Technology"@en-gb, "Technologies"@fr, "Tecnología"@es ; core:label "Technology"@en-gb, "Technologies"@fr, "Tecnología"@es ; core:sameAs dbpedia:Technology . After: things:635 a core:Thing, core:Theme; core:label "Technology"@en-gb, "Technologies"@fr, "Tecnología"@es ; core:sameAs dbpedia:Technology .
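The three migration steps can be sketched as pure functions over a simplified Thing, modelled here as a dict from predicate to a list of values. The function names and the dict model are assumptions for illustration; the predicates and labels are the ones on the slides.

```python
def add_transition(thing, tagged_labels):
    """Step 1: publish the new-style labels under a temporary predicate."""
    out = dict(thing)
    out["trans-01:label"] = list(tagged_labels)
    return out


def align_to_new_schema(thing):
    """Step 2: copy the transition values onto the real predicate."""
    out = dict(thing)
    out["core:label"] = list(out["trans-01:label"])
    return out


def remove_transition(thing):
    """Step 3: drop the temporary predicate."""
    out = dict(thing)
    del out["trans-01:label"]
    return out


old = {"core:label": ["Technology"], "core:sameAs": ["dbpedia:Technology"]}
tagged = ['"Technology"@en-gb', '"Technologies"@fr', '"Tecnología"@es']

step1 = add_transition(old, tagged)
step2 = align_to_new_schema(step1)
step3 = remove_transition(step2)
```

After step 1 the old `core:label` value is still intact, so existing readers keep working while the new shape is already available under the transition predicate.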
  • 28. Tagging out of Context core:label "The Presidents of the United States of America"; core:disambiguationHint "Music Group"; Hard to identify and prevent!
  • 29. Managing Duplicate Things context:123 { things:635 a core:Thing, sport:Team; core:label "Manchester City"@en-gb ; core:sameAs dbpedia:Manchester_City_FC ; sport:managedBy things:6372 . context:123 prov:provided "2014-08-20" . } (2800 CreativeWorks tagged) context:124 { things:636 a core:Thing, sport:Team; core:label "Man City"@en-gb ; core:sameAs <http://www.wikidata.org/234> . context:124 prov:provided "2017-02-12" . } (46 CreativeWorks tagged)
  • 30. Managing Duplicate Things Merge script to: 1 Switch tags from things:636 to things:635. 2 Compare all triples for both Things. 3 Update things:635 with consolidated triples. 4 Delete things:636. Result: context:123 { things:635 a core:Thing, sport:Team; core:label "Manchester City"@en-gb ; core:sameAs <http://www.wikidata.org/234>, dbpedia:Manchester_City_FC ; sport:managedBy things:6372 . context:123 prov:provided "2014-08-20" . } (2846 CreativeWorks tagged)
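The four merge steps can be sketched as follows. This is not the BBC merge script: the data model (dicts of predicate to value sets, tags as pairs) is an assumption, and so is the consolidation policy, which is inferred from the merged result on the slide (the `core:sameAs` links are combined, while the surviving Thing keeps its own label).

```python
MULTI_VALUED = {"core:sameAs"}  # assumption: predicates whose values are unioned


def merge_things(keep, drop, things, tags):
    """Return (things, tags) with `drop` merged into `keep`; inputs are not mutated."""
    # 1. Switch tags from the duplicate to the surviving Thing.
    new_tags = [(work, keep if thing == drop else thing) for work, thing in tags]
    # 2-3. Compare both Things and consolidate their triples.
    merged = {pred: set(vals) for pred, vals in things[keep].items()}
    for pred, vals in things[drop].items():
        if pred not in merged or pred in MULTI_VALUED:
            merged.setdefault(pred, set()).update(vals)
    # 4. Delete the duplicate.
    new_things = {k: v for k, v in things.items() if k != drop}
    new_things[keep] = merged
    return new_things, new_tags


things = {
    "things:635": {"core:label": {"Manchester City"},
                   "core:sameAs": {"dbpedia:Manchester_City_FC"}},
    "things:636": {"core:label": {"Man City"},
                   "core:sameAs": {"http://www.wikidata.org/234"}},
}
tags = [("cwork:1", "things:635"), ("cwork:2", "things:636")]  # hypothetical works
merged_things, merged_tags = merge_things("things:635", "things:636", things, tags)
```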
  • 32. Load-balance reads and writes separately Master node WriteELB Worker nodes ReadELB1 ReadELB2
  • 33. Asynchronous writes Clients Write API Write Pipeline POST POST 202 202 Write ELB POST 200 CREATED
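A toy model of the asynchronous write path: the API acknowledges a write with 202 as soon as it is queued, and a separate pipeline step persists it later. The class and function names are hypothetical, and an in-memory deque and list stand in for the real queue and triplestore.

```python
from collections import deque


class WriteApi:
    """Accepts writes and queues them instead of persisting them inline."""

    def __init__(self):
        self.pending = deque()

    def post(self, document):
        self.pending.append(document)
        return 202  # Accepted: queued, not yet persisted


def run_pipeline(api, triplestore):
    """Drain the queue into the store and return the number of documents written."""
    written = 0
    while api.pending:
        triplestore.append(api.pending.popleft())
        written += 1
    return written


api = WriteApi()
store = []
status = api.post('things:635 core:label "Manchester City" .')
stored_before = len(store)  # nothing is persisted at acknowledgement time
written = run_pipeline(api, store)
```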
  • 34. Optimise SPARQL queries Before: SELECT ?subject ?predicate ?object WHERE { ?subject ?predicate ?object . { SELECT ?subject WHERE { OPTIONAL { ?subject prov:createdBy ?created . } } GROUP BY (?subject) HAVING BOUND (?subject) } } After: SELECT ?subject ?predicate ?object WHERE { ?subject ?predicate ?object . { SELECT ?subject WHERE { ?subject prov:createdBy ?created . } GROUP BY (?subject) } }
  • 35. Load-test against future demand 100% increase in the number of CreativeWorks by 2019. 60% increase in 99th-percentile response times by 2019. 21m requests to the CreativeWorks API daily. 94m triples in the triplestore.
  • 37. Auto-scaling on API Instances 1 ELB sends metrics. 2 Instances send metrics. 3 Alarms trigger autoscaling action. 4 New instance is created. 5 Instance is added to pool.
  • 40. Queue-based write pipeline Queued writes across multiple clusters Writer API Consumer Primary GraphDB Cluster Consumer Replica GraphDB Cluster
  • 41. Event-based write pipeline Event-based writes improve resilience. Writer API Consumer Replica GraphDB Cluster Primary GraphDB Cluster Event store API RDS Notification Topics
  • 42. Backup and Recovery 26GB per backup. 20 mins recovery time. 16 full backups per day. Opsworks recipes to: o  Switch Primary and Replica cluster roles. o  Schedule backups. o  Restore backup to cluster. S3: o  Stores backups by date/time. o  Retires old backups to Glacier.
  • 43. Replacing Triplestore Clusters Read ELB Write ELB Primary: Cluster 1 Replica: Cluster 2 Cluster 3 1 Create new cluster and load data
  • 44. Replacing Triplestore Clusters Read ELB Write ELB Primary: Cluster 1 Replica: Cluster 3 Cluster 2 2 Swap new cluster with replica 3 Delete old replica cluster
  • 45. Replacing Triplestore Clusters Read ELB Write ELB Primary: Cluster 3 Replica: Cluster 1 4 Swap new cluster with primary 5 Repeat steps 1 - 4
  • 46. Responding to incidents CPU Utilisation > 90% for 5 mins? CloudWatch Alarm!!! Severity: - Warning? - Critical? Action: - Email to Dev team? - Notify 24/7 support? - Trigger Autoscaling action?
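The alarm condition on this slide (CPU above 90% for 5 minutes) can be expressed as a small predicate. This is a plain-Python illustration of the threshold logic, not CloudWatch configuration; it assumes one utilisation sample per minute, and the function name is hypothetical.

```python
def cpu_alarm(samples, threshold=90.0, minutes=5):
    """Return True when the last `minutes` samples all exceed `threshold`.

    `samples` holds one CPU utilisation percentage per minute, oldest first.
    """
    recent = samples[-minutes:]
    return len(recent) == minutes and all(s > threshold for s in recent)


sustained = cpu_alarm([50, 95, 96, 97, 98, 99])  # five minutes above 90%
dipped = cpu_alarm([95, 96, 97, 98, 80])         # the last minute dropped
```

Requiring every sample in the window to breach the threshold avoids alarming on short spikes, which matters when the alarm can page support or trigger autoscaling.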
  • 48. Opening up BBC Things
  • 52. Main points o  Separating content from metadata o  APIs powered by Linked Data o  Monitoring and reacting to incidents o  Performance for present and future
  • 55. Filters and Mixins http://www.bbc.co.uk/things/4bdbf2-d1a1 http://www.bbc.co.uk/things/4bdbf2-d1a2 http://www.bbc.co.uk/things/4bdbf2-d1a3 http://www.bbc.co.uk/things/4bdbf2-d1a4 http://www.bbc.co.uk/things/4bdbf2-d1a5 http://www.bbc.co.uk/things/4bdbf2-d1a6 Filter by type = core:Person Mixin = sport, core things:4bdbf2-d1b1 a core:Thing, sport:Team ; core:title "Manchester City" ; biz:listedIn <http://www.londonstockexchange> ; sport:managedBy things:4bdbf2-d1a1; biz:tradingAs "Manchester City PLC" .
  • 57. Handling Data Scala libraries to enable easy RDF manipulation. Connections-RDF (TriG, Turtle, etc. in and out): o  Import/Export o  Create triples o  Compare Graphs o  Navigate Graphs o  Manage Datasets
  • 58. Handling Data RDF DSL in Scala val rdfGraph = ( Iri("http://…") >> Rdf.`type` >>> Core.Thing >> Sport.`type` >>> Sport.Organisation >> BBC.coveredBy >>> Iri("urn:bbc:news") >> Core.label >>> "Manchester City" ) val label = (rdfGraph Core.label).get[String]
  • 59. Some Validation Rules Thing locators must be unique: things:635 core:preferredLabel "Manchester City" ; cms:locator <urn:bbc:cps:asset:39715040>, <urn:bbc:cps:asset:39715040> . Locator Types must be unique: things:635 core:preferredLabel "Manchester City" ; cms:locator <urn:bbc:cps:asset:01>, <urn:bbc:cps:asset:02> . <urn:bbc:cps:asset:01> a cms:CPSLocator . <urn:bbc:cps:asset:02> a cms:CPSLocator .
  • 60. Some Validation Rules Multiple Things with the same locator: things:635 cms:locator <urn:bbc:cps:asset:01> . things:636 cms:locator <urn:bbc:cps:asset:01> . Multiple Things with the same sameAs: things:635 cms:sameAs dbpedia:01 . things:636 cms:sameAs dbpedia:01 . Blacklisted URIs present: things:635 core:label "Manchester City" ; rdf:type owl:Class .
  • 61. Ordering Thing Updates Correctly Events: create:1 update:2 update:3 delete:4. Components: Document Writer, Primary GraphDB Cluster. 1 Fetch events from Event store. 2 Execute task on Triplestore (only if task id is newer). 3 Errors? Put on Retry queue. 4 Fetch and process events from Retry queue.
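A sketch of the ordering rule on this slide: a task is applied only if its id is newer than the last one applied for the same Thing, and failures go to a retry queue. The event tuple shape, the `last_seen` bookkeeping and the function name are assumptions; the real pipeline reads from an event store and writes to GraphDB.

```python
from collections import deque


def process_events(events, last_seen, apply):
    """Apply events in arrival order, skipping stale ones.

    events: iterable of (thing, task_id, payload) tuples.
    last_seen: dict of thing -> highest task id applied so far (updated in place).
    Returns a deque of events that failed and should be retried.
    """
    retry = deque()
    for thing, task_id, payload in events:
        if task_id <= last_seen.get(thing, 0):
            continue  # an equal or newer task has already been applied
        try:
            apply(thing, payload)
            last_seen[thing] = task_id
        except Exception:
            retry.append((thing, task_id, payload))
    return retry


applied = []
last_seen = {}
# update:2 arrives late, after update:3 has already been applied.
events = [("things:635", 1, "create"),
          ("things:635", 3, "update-3"),
          ("things:635", 2, "update-2")]
retry = process_events(events, last_seen, lambda thing, payload: applied.append(payload))
```

Events taken from the retry queue can be passed through the same function, so a retried task that has since been superseded is skipped rather than overwriting newer data.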
  • 62. Search: Creating an Index INSERT DATA { luc:index luc:setParam "uris" . luc:include luc:setParam "literals" . luc:includePredicates luc:setParam "core:label rdf:label core:shortLabel" . luc:moleculeSize luc:setParam "1" . luc:labelIndex luc:createIndex "true" . }
  • 63. Search: Creating an Index [Diagram: :manutd has type :fclub (label "Football Club"), label "Manchester United" and locatedIn :manc; :manc has label "Manchester" and locatedIn :uk; :uk has label "United Kingdom". Outlines mark the RDF Module for :manutd and the RDF Module for :manc.]
  • 64. Search: Full & Incremental Re-index Run incremental re-index after each Thing update: INSERT DATA { luc:labelIndex luc:addToIndex <http://www.bbc.co.uk/things/2b7ba3ca-32ca…> . } Run full re-index once daily: INSERT DATA { luc:labelIndex luc:updateIndex _:b1 . }
  • 65. Search: Full Text Search Query SELECT ?thing ?score WHERE { ?thing a tagging:TagConcept . ?thing luc:score ?score . ?thing luc:labelIndex " (Manchester OR *Manchester OR *Manchester*) " . } o  Index available during the re-index process
  • 66. Searching logs service="triple-store" and env="live" and "Error" Instance logs S3 bucket CloudWatch Logs