Overview of the RDF graph database-as-a-service (GraphDB-based) on the Self-Service Semantic Suite (S4)
http://s4.ontotext.com
Presentation for the AKSW Group of the University of Leipzig
1. RDF Database-as-a-Service with S4
Marin Dimitrov, CTO of Ontotext
Apr 27th, 2015
RDF DBaaS with S4 / AKSW Colloquium, Apr 2015
2. Contents
• Self-Service Semantic Suite (S4)
• RDF DBaaS on AWS
• Demo
3. About Ontotext
• Provides products & solutions for content enrichment and metadata management
  – 70 employees, headquarters in Sofia (Bulgaria)
  – Sales presence in London, Washington & Boston
• Major clients and industries
  – Media & Publishing
  – Health Care & Life Sciences
  – Cultural Heritage & Digital Libraries
  – Government
  – Education
5. What is S4?
• On-demand capabilities for text analytics, content enrichment and metadata management
  – Text analytics for news, life sciences and social media
  – RDF graph database as-a-service
  – Access to large open knowledge graphs
• Available anytime, anywhere
  – Simple RESTful services
• Simple, pay-per-use pricing
  – No upfront commitments
6. S4 benefits
• Enables quick prototyping
  – Instantly available, no provisioning & operations required
  – Focus on building applications, don't worry about infrastructure
• Free tier
  – Even bigger free quotas for research groups & projects
• Easy to start, shorter learning curve
  – Various add-ons, SDKs and demo code
• Based on enterprise semantic technology by Ontotext
7. Text analytics with S4
• Text analytics services
  – News annotation
  – News categorisation
  – Biomedical
  – Twitter
• Entity linking & disambiguation
  – Mappings to DBpedia & GeoNames instances
  – Mappings to biomedical data sources (LinkedLifeData)
• HTML, MS Word, XML, plain text input
• Simple JSON output
9. Self-managed RDF DB in the Cloud
• Available from AWS Marketplace
• Variety of hardware configurations
  – 2 to 8 CPU cores / 8 to 61 GB RAM
  – IOPS performance & encryption (EBS)
• Manage large data volumes
• Pay-per-hour pricing
10. Fully managed RDF DB in the Cloud
• Low-cost DBaaS available 24/7
• Ideal for small & moderate data volumes
• Instantly deploy new databases when needed
• Zero administration: automated operations, maintenance & upgrades
• Users pay only for the actual database utilisation
  – Number of triples stored + number of queries per month
11. Knowledge graphs with S4
• SPARQL query endpoint to the FactForge knowledge graph
  – 500 million entities / 5 billion triples
• Key LOD datasets integrated
  – DBpedia, Freebase, GeoNames, WordNet
  – Dublin Core, SKOS, PROTON ontologies and vocabularies
12. Knowledge graphs with S4
• (available soon)
• Knowledge Graph bundles
  – DBpedia, Wikidata, GeoNames, …
  – GraphDB RDF database (self-managed @ AWS)
  – 3rd-party interactive data exploration tool (faceted search, data navigation, dynamic charts)
• Get instant & reliable access to KGs without dealing with provisioning, data import, maintenance, …
13. S4 for developers
• Java & C# SDKs
• Sample code
  – Java, C#, NodeJS, JavaScript, Python, PHP, Groovy
  – Curl examples for the most impatient
• GATE & UIMA plugins
• Firefox & Chrome add-ons
• Online documentation
14. Research projects using S4
• DaPaaS & ProDataMarket
  – Goal: Open Data / Linked Data publishing & hosting
  – S4 role: scalable Linked Data hosting infrastructure
• KConnect
  – Goal: semantic annotation, search & analytics for healthcare data
  – S4 role: scalable text analytics & RDF data management infrastructure
16. Requirements
• Elastic
  – Dynamically adapt to data & query volumes
• High availability & resilience
  – No single points of failure, "graceful degradation" of performance upon failures
• Cost efficient
  – Cost-aware architecture
  – Key aspect for Open Data scenarios like DaPaaS & ProDataMarket
• Isolation of the multi-tenant databases
• Fair use of shared resources
17. RDF DBaaS options on S4
• Micro DB
  – Up to 1M triples
  – FREE, available now
• Extra Small DB (10M triples)
• Small DB (50M)
• Medium DB (250M)
• Large DB (1B)
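The tier list above can be read as a simple capacity lookup. The following is a minimal sketch, assuming each figure on the slide is an inclusive upper bound on stored triples; the function name `smallest_tier` is hypothetical and not part of any S4 API.

```python
# Tier capacities as listed on the slide (maximum number of triples).
# Assumption: each limit is inclusive ("up to").
TIERS = [
    ("Micro", 1_000_000),
    ("Extra Small", 10_000_000),
    ("Small", 50_000_000),
    ("Medium", 250_000_000),
    ("Large", 1_000_000_000),
]


def smallest_tier(triple_count: int) -> str:
    """Return the name of the smallest listed tier that fits triple_count."""
    if triple_count < 0:
        raise ValueError("triple_count must be non-negative")
    for name, capacity in TIERS:
        if triple_count <= capacity:
            return name
    raise ValueError("dataset exceeds the largest listed tier (1B triples)")
```

For example, `smallest_tier(30_000_000)` picks the Small tier, since 30M triples exceed the Extra Small limit but fit within 50M.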
18. Implementation
• AWS based
  – Storage, compute, load balancing, integration services…
• Ontotext GraphDB for the database instances
• OpenRDF REST services
• Docker for containerisation
• Network-attached volumes (EBS) for data storage
• A DBaaS on S4 is…
  – A GraphDB instance
  – Running within a Docker container
  – With a private EBS data volume
19. S4 DBaaS architecture
• Routing nodes
  – Expose OpenRDF RESTful services to apps
  – Access control & quota checks
  – Forward client requests to the proper data node
  – Temporarily queue requests when necessary
• Data nodes
  – Multiple Docker containers (GDB+EBS) per node
• Coordinator (single)
  – Distribute DB initialisation / creation tasks to data nodes
• Management Console
21. Normal operations
• CRUD
  – Router node receives a request
  – Routes it to the proper data node & container
  – Receives a response, forwards it back to the client app
• Routing updates
  – Data nodes push notifications via SNS: "heartbeats" plus changes regarding the hosted DBs (if any)
  – Each routing node receives the notifications (via SNS) and updates its routing tables
  – Coordinator also receives notifications, learns which DBs are operational / down for maintenance
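The routing-update step can be illustrated with a small in-memory routing table. This is only a sketch under stated assumptions: the slides do not describe the notification payload, so here each heartbeat is assumed to carry the full set of DBs a data node currently hosts, and the names `RoutingTable`, `apply_heartbeat` and `lookup` are hypothetical. Delivery via SNS is left out entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoutingTable:
    # Maps a database id to the id of the data node currently hosting it.
    db_to_node: dict[str, str] = field(default_factory=dict)

    def apply_heartbeat(self, node_id: str, hosted_dbs: set[str]) -> None:
        """Replace what is known about node_id with the DBs it now reports."""
        # Drop DBs this node previously hosted but no longer reports.
        stale = [
            db for db, node in self.db_to_node.items()
            if node == node_id and db not in hosted_dbs
        ]
        for db in stale:
            del self.db_to_node[db]
        # Record (or overwrite) the route for every DB the node reports.
        for db in hosted_dbs:
            self.db_to_node[db] = node_id

    def lookup(self, db_id: str) -> Optional[str]:
        """Return the hosting node id, or None if no route is known yet."""
        return self.db_to_node.get(db_id)
```

Treating each heartbeat as a full snapshot keeps the update idempotent: a router that misses one notification converges again on the next one.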
22. Failure case #1 – data node crash
[Diagram: REST apps and 3rd-party RDF tools; quota & access control; routers; data nodes; coordinator; EBS; SNS; Docker repository; numbered callouts 1 to 3]
23. Recovery from a data node crash
[Diagram: REST apps and 3rd-party RDF visualisation; quota & access control; routers; data nodes; coordinator; EBS; SNS; Docker repository; Auto Scaling; numbered callouts 1 to 7]
24. Failure case #2 – router crash & recovery
[Diagram: REST apps and 3rd-party RDF tools; quota & access control; routers; data nodes; coordinator; EBS; SNS; Docker repository; Auto Scaling; numbered callouts 1 to 8]
25. Recovery from a router crash
• (open connections from client apps to the node are terminated)
• Auto-scaler starts a new router node
  – New router subscribes to SNS for heartbeats & updates
• Load balancer starts sending new client requests to the new router
  – Router puts them in the local queue (if the routing table is still incomplete)
• Heartbeats from data nodes are received
  – Routing information is now complete
  – Router starts sending the queued requests to data nodes
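The queue-then-flush behaviour of a freshly started router can be sketched as follows. This is an illustration, not the S4 implementation: the class name `RecoveringRouter` and its methods are hypothetical, forwarding is simulated by appending to a list, and requests are retried per DB as soon as a route for that DB is known (the slides only say that queued requests are sent once routing information is complete).

```python
from collections import deque


class RecoveringRouter:
    """A router that queues requests until heartbeats fill its routing table."""

    def __init__(self):
        self.routes = {}        # db id -> data node id, learned from heartbeats
        self.pending = deque()  # (db id, request) pairs waiting for a route
        self.sent = []          # (node id, request) pairs; stands in for forwarding

    def handle(self, db_id, request):
        """Forward the request if a route is known, otherwise queue it."""
        node = self.routes.get(db_id)
        if node is None:
            self.pending.append((db_id, request))
        else:
            self.sent.append((node, request))

    def on_heartbeat(self, node_id, hosted_dbs):
        """Record routes from a heartbeat, then retry queued requests in order."""
        for db in hosted_dbs:
            self.routes[db] = node_id
        still_waiting = deque()
        while self.pending:
            db_id, request = self.pending.popleft()
            node = self.routes.get(db_id)
            if node is None:
                still_waiting.append((db_id, request))  # route still unknown
            else:
                self.sent.append((node, request))
        self.pending = still_waiting
```

Queued requests keep their arrival order per DB, so a client that issued an update followed by a query during recovery sees them applied in that order.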
26. Failure case #3 – coordinator crash & recovery
[Diagram: REST apps and 3rd-party RDF tools; quota & access control; routers; data nodes; coordinator; EBS; SNS; Docker repository; Auto Scaling; numbered callouts 1 to 6, with callout 1 labelled "Create DB"]
27. Failure case #3 – coordinator crash & recovery
• Routers can route requests to data nodes as usual
  – … but new DBs temporarily cannot be created
  – … and data nodes with free container slots can't get info on DBs waiting for initialisation
• AWS Auto-scaler starts a new Coordinator node
  – Coordinator reads a list of all registered DBs from the metadata store & subscribes to SNS
• Coordinator starts receiving heartbeats & updates from data nodes
  – … learns which DBs are operational / pending
  – … and resumes distributing new / pending DB initialisation tasks to the data nodes with free slots
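The coordinator's recovery logic amounts to comparing the registered DBs against what the heartbeats report. The sketch below is a simplification under stated assumptions: the function `recover_coordinator` and its input shapes are hypothetical, the metadata store and SNS are replaced by plain Python collections, and tasks are assigned eagerly to nodes in name order, whereas the slides suggest data nodes request work when they have free slots.

```python
def recover_coordinator(registered_dbs, heartbeats, free_slots):
    """Work out DB state after a coordinator restart.

    registered_dbs: list of DB ids read from the metadata store
    heartbeats:     dict of node id -> set of DB ids the node reports as hosted
    free_slots:     dict of node id -> number of free container slots
    Returns (operational, assignments, still_pending).
    """
    # Every registered DB reported by some data node counts as operational.
    reported = set()
    for hosted in heartbeats.values():
        reported |= set(hosted)
    operational = reported & set(registered_dbs)

    # Registered DBs nobody reports are pending initialisation.
    pending = [db for db in registered_dbs if db not in operational]

    # Hand pending DBs to nodes that still have free container slots.
    slots = dict(free_slots)
    assignments = {}
    for db in pending:
        node = next((n for n in sorted(slots) if slots[n] > 0), None)
        if node is None:
            break  # no capacity left; remaining DBs stay pending
        assignments[db] = node
        slots[node] -= 1

    still_pending = [db for db in pending if db not in assignments]
    return operational, assignments, still_pending
```

Because all inputs come from the metadata store and fresh heartbeats, the new coordinator needs no state from the crashed one.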
28. Composite failure & recovery
• Combination of coordinator + data node + routing node crash: same as #1 + #2 + #3
• Routers depend on data nodes
• Data nodes depend on Coordinator
• Coordinator does not depend on other nodes
  – No incoming heartbeats means all DBs are down
  – Start distributing DB initialisation tasks whenever a request comes from a working data node
  – Eventually, all data nodes are up, DBs initialised, heartbeats & routing updates start coming
  – … and routers can start routing client requests
29. Management interface
[Screenshot callouts: DB size (Micro, XS, S, M, or L); I/O performance; R/O access to Open Data services or open knowledge graphs]
30. Management interface
[Screenshot callouts: DBaaS endpoint; DB details summary; backup, export, change settings, delete; run a test query]
31. Roadmap
• Gradually introduce XS, S, M and L instances
• Integration with the GraphDB Workbench management UI
• LDF based containers
• Multi-datacenter deployment
• Replication across datacenters (single master)
32. More Details
• "On-demand Text Analytics and Metadata Management with S4" (ESaaSA @ CLOSER'2015)
• "Text Analytics and Linked Data Management As-a-Service with S4" (Wasabi @ ESWC'2015)
• "Low-cost Open Data As-a-Service in the Cloud" (SemDev @ ESWC'2015)
34. Demo scenario
• (create an account & generate an API key pair)
• Create a new DB
• Create a new repository in the DB
  – via the REST API / OpenRDF Java SDK / curl
  – … or via UI tools like the OpenRDF Workbench
• Import sample data (REST / OpenRDF Workbench)
• Run a query through the public SPARQL endpoint
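The last demo step, running a SPARQL query over REST, might look roughly like the sketch below. It follows the OpenRDF Sesame HTTP protocol convention of sending a `query` parameter to a repository URL. Several things are assumptions rather than facts from the slides: the repository URL shown in the usage note is a placeholder, the helper name `build_sparql_request` is hypothetical, and passing the API key pair as HTTP Basic credentials is a guess at how authentication works.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(repository_url, query, api_key, key_secret):
    """Build (but do not send) a GET request for a SPARQL SELECT query."""
    # Sesame-style repositories accept the query text as a URL parameter.
    url = repository_url + "?" + urlencode({"query": query})
    # Assumption: the API key pair is sent as HTTP Basic credentials.
    credentials = f"{api_key}:{key_secret}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "Authorization": "Basic " + token,
        },
    )
```

A request built with, say, `build_sparql_request("https://example.org/repositories/demo", "SELECT * WHERE { ?s ?p ?o } LIMIT 5", key, secret)` could then be sent with `urllib.request.urlopen` and the JSON result parsed with the `json` module.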
35. Demo data – Universities in Saxony
36. #1 Create a database
47. Key takeaways
• S4 provides an enterprise RDF DBaaS
• Resilient design, high availability
• Instantly available whenever needed, easy to use, OpenRDF REST services
• Zero administration: automated operations, maintenance & upgrades
• Free DBs up to 1M triples (even more for research teams & projects)
• Check out http://s4.ontotext.com