Stardog 1.1: Easier, Smarter, Faster RDF Database
1. stardog.com
Stardog 1.1
An Easier, Smarter,
Faster RDF Database
Michael Grove, Clark & Parsia LLC
mike@clarkparsia.com
@mikegrovesoft, @stardog_db, @candp
2. stardog.com
About C&P
• We build semantic technology tools for enterprise solutions
• Proud bootstrappers since 2005
• Offices in DC and Cambridge, MA
• Government & enterprise customers
3. stardog.com
What is Stardog?
• a pure Java RDF database
• full-service, feature rich
• focus on query performance
• standards compliant
• scalable (up first, out next)
4. stardog.com
History
• Development started summer 2010
• Stardog 0.5 alpha - 2 May 2011
• Stardog 1.0 final - 19 June 2012
• Total of 32 releases, ~500 tickets, 100s of emails on the mailing list
• Stardog 1.0.7 presently
• Stardog 1.1 real soon now...
6. stardog.com
What is easy?
• What’s “easy” in an RDF database?
• Configuration
• Maintenance
• User Experience
• i.e., rationally predictable
• Easier for whom? Not a simple question.
7. stardog.com
Configuration
• Convention, not configuration
• “Quick Start” is the shortest page in the docs
• 4 steps to querying
• Predictable, sane defaults throughout
• Adapted to Java, Unix, Semtech cultures
• Culture is key to convention
• Very good (!) documentation
8. stardog.com
Maintenance
• Nothing is easier than doing nothing
• RDF & OWL are ideally schema flexible
• Job scheduler: search, indexes, etc.
• Data migration tools since before 1.0
• Multi-tenancy, online & offline DBs
• Just add data...Automatic data quality*
• NoSQL == Anti-jobs program for DBAs
9. stardog.com
Except that...
• Every DB has to be admin’d & maintained
• Matter of degree, not kind
• Stardog Enterprise Server Management
• audit logging
• JMX monitoring
• web console
• online backups (coming soon!)
10. stardog.com
User Experience
• Client-server & Embeddable
• Jena, Sesame, SNARL, HTTP
• SPARQL query simplifications
• ACID transactions
• Idiomatic Java & Unix interfaces
• Great CLI & shell…
• Windows has gotten much better! :>
• Rich security model
12. stardog.com
Okay...that’s BS.
• “Smarter” is market speak
• But Stardog 1.1 has rich feature set
• Reasoning, including UDR
• Integrity Constraint Validation (ICV)
• Semantic Search
• Security
• Spring
• Linked Data Platform
13. stardog.com
Reasoning
• OWL 2 DL, QL, EL, and RL
• Query-time, no materialization
• Only pay for what you eat
• Embarrassingly parallel in part
• Pellet 3 embedded for OWL 2 DL schema reasoning only
• Very flexible re: NGs & schemas
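To make "query-time, no materialization" concrete, here is a minimal sketch in plain Python. It assumes a toy subclass hierarchy and toy instance data (all names are hypothetical), and it is not Stardog's actual algorithm: it only illustrates the general idea of expanding the query through the schema instead of writing inferred triples into the store.

```python
# Hypothetical schema: child class -> set of direct parent classes.
SUBCLASS_OF = {
    "Manager": {"Employee"},
    "Employee": {"Person"},
}

# Hypothetical instance data: only explicitly asserted types are stored.
TYPES = {
    "alice": {"Manager"},
    "bob": {"Employee"},
    "carol": {"Person"},
}


def subclasses_of(cls):
    """Return cls plus every class that is transitively a subclass of it."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for child, parents in SUBCLASS_OF.items():
            if current in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def instances_of(cls):
    """Answer '?x a cls' by expanding the query, leaving the data untouched."""
    wanted = subclasses_of(cls)
    return sorted(ind for ind, asserted in TYPES.items() if asserted & wanted)
```

Calling `instances_of("Person")` finds all three individuals even though only `carol` is asserted to be a `Person`; the cost of the inference is paid only by queries that ask for it.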
14. stardog.com
User-defined Rules
• New in 1.1!
• Using SWRL syntax
• Including all SWRL builtins
• Which are also available to SPARQL
• Recently added new individual builtin
• Create new individuals in your rules
• Beware of non-termination!
• Executed at query time like everything else
15. stardog.com
ICV?
• Integrity Constraint Validation
• Automated data quality
• Closed world semantics
• Transactional
• High-level & declarative
• ICs can be OWL, SWRL, or SPARQL
16. stardog.com
Example...
Only employees who are US citizens can
work on projects that receive funding from a
US government agency.
Class:
Project and
(receivesFundsFrom some USGovAgency)
SubClassOf:
inverse(worksOn) only
(Employee and nationality value "US")
More examples: http://stardog.com/docs/
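A rough illustration of what closed-world validation means for the constraint above, written in plain Python over a hand-made set of triples. The data, the triple encoding, and the `violations` helper are all hypothetical; this is not Stardog's ICV API, only a sketch of the semantics. The point to notice is that a worker with no recorded nationality counts as a violation, whereas open-world reasoning would treat the nationality as merely unknown.

```python
# Hypothetical data as (subject, predicate, object) triples.
TRIPLES = {
    ("projA", "type", "Project"),
    ("projA", "receivesFundsFrom", "nasa"),
    ("nasa", "type", "USGovAgency"),
    ("alice", "type", "Employee"),
    ("alice", "nationality", "US"),
    ("alice", "worksOn", "projA"),
    ("bob", "type", "Employee"),
    ("bob", "worksOn", "projA"),  # no nationality recorded for bob
}


def violations(triples):
    """Return (worker, project) pairs that break the constraint."""

    def objects(subject, predicate):
        return {o for (s, p, o) in triples if s == subject and p == predicate}

    def is_a(subject, cls):
        return (subject, "type", cls) in triples

    bad = []
    for worker, predicate, project in triples:
        if predicate != "worksOn" or not is_a(project, "Project"):
            continue
        funded = any(
            is_a(funder, "USGovAgency")
            for funder in objects(project, "receivesFundsFrom")
        )
        if not funded:
            continue
        # Closed world: missing information is treated as a failure.
        ok = is_a(worker, "Employee") and "US" in objects(worker, "nationality")
        if not ok:
            bad.append((worker, project))
    return sorted(bad)
```

With this data, `violations(TRIPLES)` reports `("bob", "projA")`; adding the triple `("bob", "nationality", "US")` clears it.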
17. stardog.com
Semantic Search
• Uses Waldo, our deep adaptation of Lucene
• Text index from RDF literals
• Search for resources or literals
• Integrated with SPARQL query evaluation
• Auto-managed search indexes
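The idea of a text index built from RDF literals can be sketched with a tiny inverted index. This is an assumption-laden toy in plain Python, not Waldo or Lucene: the sample literals, `build_index`, and `search` are hypothetical names, and tokenization is a simple lowercase word split.

```python
import re
from collections import defaultdict

# Hypothetical (subject, predicate, literal) triples.
LITERALS = [
    ("doc1", "title", "Faster RDF database"),
    ("doc2", "title", "Semantic search with Lucene"),
    ("doc3", "comment", "An RDF search example"),
]


def build_index(literal_triples):
    """Map each lowercase token to the set of subjects whose literals contain it."""
    index = defaultdict(set)
    for subject, _predicate, text in literal_triples:
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(subject)
    return index


def search(index, query):
    """Return subjects whose literals contain every term of the query."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)
```

Here `search(build_index(LITERALS), "rdf search")` resolves to the resource `doc3`, which a SPARQL engine could then join against the rest of a query.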
18. stardog.com
Security
• Rich security model
• Based on standard RBAC model
• Applies at database-level
• Will extend to Named Graphs in 1.x
• Easy CLI admin tools (& Java API)
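For readers unfamiliar with RBAC, the following is a minimal sketch of a database-level check in plain Python. The users, roles, database name, and the `is_allowed` function are hypothetical and do not reflect Stardog's CLI or Java API; the sketch only shows the standard shape of the model, where users hold roles and roles hold permissions on a database.

```python
# Hypothetical role -> set of (action, database) permissions.
ROLE_PERMISSIONS = {
    "reader": {("read", "sales_db")},
    "admin": {("read", "sales_db"), ("write", "sales_db")},
}

# Hypothetical user -> set of roles.
USER_ROLES = {
    "alice": {"admin"},
    "bob": {"reader"},
}


def is_allowed(user, action, database):
    """A user may act on a database if any of the user's roles grants it."""
    return any(
        (action, database) in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

Because permissions here are keyed by database, extending the model to Named Graphs would mean making the resource part of the permission finer-grained.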
19. stardog.com
Spring
• Love it or not, Spring isn’t going away
• Support Batch, Data Import, etc.
• Open Source: http://github.com/clark-parsia/spring-stardog
• Developed by an early adopter who needed it; supported/maintained by C&P
20. stardog.com
Linked Data
• Stardog fills a hole in our Linked Data Platform
• HTML5, pure JS, client side web framework (based on backbone.js)
• Linked Data publishing suite
• Stardog Linked Data Catalog...Enterprise Linked Data management app
22. stardog.com
Finally...
• Now we can talk about something that’s objective, context-free, and measurable
• Yes!
• But no… #include <std_disclaim.h>
• Your data & your queries are the only things that really matter
23. stardog.com
That said...
• Two de facto benchmarks for SPARQL:
• BSBM, OLTP-style, query mixes per hour (QMpH; queries per hour = QMpH · 25)
• SP2B, OLAP-style (torture test), set of queries within a timeout, T, at a data size D
24. stardog.com
SP2B
• Stardog completes SP2B at 5M, 10M, and 25M (except q5a)
• No other RDF database completes > 5M. (As of the most recent report. Things change.)
• Considerable performance differential
• Pushing this out to 100M+ in 1.x
25. stardog.com
BSBM
• A throughput test, primarily. Not necessarily simple queries
• On modest machine, 255 clients, 10M triples, we sustain 7m queries per hour (277k QMpH)
• At 100M, 255 clients, sustain 3m queries per hour (125k QMpH)
• Among the top 2 or 3 RDF DBs for BSBM performance
• We will tackle BSBM BI next...
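The two figures quoted per result (queries per hour and QMpH) are related by the size of a query mix. The sketch below assumes 25 queries per mix, which is what the "· 25" factor on the benchmark slide suggests; the constant and function names are illustrative only.

```python
# Assumption: one BSBM query mix contains 25 queries.
QUERIES_PER_MIX = 25


def queries_per_hour(qmph):
    """Convert query mixes per hour (QMpH) into individual queries per hour."""
    return qmph * QUERIES_PER_MIX


at_10m_triples = queries_per_hour(277_000)   # 6,925,000, quoted as "7m"
at_100m_triples = queries_per_hour(125_000)  # 3,125,000, quoted as "3m"
```

Under that assumption the rounded "7m" and "3m" figures are consistent with the stated QMpH values.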
26. stardog.com
Data Loading
• Two indexing modes
• Triples only indexing
• Faster loading, slower NG query
• Up to 250,000 triples per second
• Quads indexing
• Slower loading, faster NG query
• Up to 150,000 triples per second
• More improvements coming in the future
• Customized RDF parser
• Will look at user-defined index subsets
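As a back-of-the-envelope reading of the loading rates above, the following sketch estimates best-case load times for the two indexing modes. The 100M-triple dataset size is an arbitrary example, and because the quoted rates are "up to" figures, the results are lower bounds rather than expected times.

```python
# Peak rates quoted above, in triples per second.
TRIPLES_ONLY_RATE = 250_000
QUADS_RATE = 150_000


def load_seconds(triple_count, triples_per_second):
    """Best-case load time in seconds at a sustained peak rate."""
    return triple_count / triples_per_second


dataset = 100_000_000  # hypothetical dataset size
triples_only = load_seconds(dataset, TRIPLES_ONLY_RATE)  # 400.0 seconds
quads = load_seconds(dataset, QUADS_RATE)                # about 667 seconds
```

The trade-off is then roughly four and a half extra minutes of loading per 100M triples in exchange for faster named graph queries.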
27. stardog.com
What’s new in 1.1
• Aforementioned user defined rules
• But most notably, SPARQL 1.1
• Our most requested feature in a survey
• Oh, we also made it faster
28. stardog.com
SPARQL 1.1
• Latest revision of the SPARQL query language
• Put off implementing until spec finalized
• It’s still in flux, but we decided to go for it
• Adds useful new features to SPARQL
• Aggregates, grouping, sub-query, negation
• Oh, and the entailment regimes
29. stardog.com
SPARQL 1.1
• Rewrite of query planner & engine for 1.0.5
• Changes needed to support SPARQL 1.1
• Tested by users for the past 3 releases
• With great power comes great responsibility...
• New features are not without cost
• Query planning & optimization more crucial than ever
• Majority of development time
34. stardog.com
Feature Rich
• Support for RDFS, OWL 2 profiles (EL, RL, QL) & OWL 2 DL via schema-only queries
• Semantic Search
• ICV
• Transactions
• Rich security model
• Support for major APIs
• Jena & Sesame, and our own SNARL
• SPARQL HTTP protocol, Graph Store protocol
• Also includes a CLI & Shell environment