1) The document provides a brief history of databases from punch cards and tapes for record keeping in the 1960s-1970s to modern relational and NoSQL databases.
2) It discusses how current databases struggle with complex data and queries that artificial intelligence systems require, noting that a "hyper-relational database" like Grakn is needed to power AI.
3) Grakn is presented as a knowledge base and database that can represent complex domains, perform real-time inference, and enable automated large-scale analytics through its query language.
GRAKN.AI: The Hyper-Relational Database for Knowledge-Oriented Systems
1. THE DATABASE FOR AI
Join our community at grakn.ai/community
The Hyper-Relational Database for
Knowledge-Oriented Systems
By Haikal Pribadi
Founder and CEO of GRAKN.AI
2. Follow us @GraknLabs
A BRIEF HISTORY OF DATABASES
[Timeline, 1960 to 2030. Technologies: punch cards and tapes. Use cases: record keeping.]
3. Follow us @GraknLabs
A BRIEF HISTORY OF DATABASES
[Timeline, 1960 to 2030. Technologies: punch cards and tapes, navigational databases. Use cases: record keeping. Labels: SCALE.]
4. Follow us @GraknLabs
A BRIEF HISTORY OF DATABASES
[Timeline, 1960 to 2030. Technologies: punch cards and tapes, navigational databases. Use cases: record keeping, Business Intelligence (BI). Labels: SCALE.]
“Wouldn’t it be nice if you could express the question at a higher level and let the system figure out how to do the navigation?”
Edgar F. Codd, Inventor of Relational Databases
5. Follow us @GraknLabs
RELATIONAL DB WAS INVENTED TO SOLVE COMPLEXITY
[Timeline, 1960 to 2030. Technologies: punch cards and tapes, navigational databases, relational/SQL databases. Use cases: record keeping, Business Intelligence (BI). Labels: SCALE, COMPLEXITY.]
“Wouldn’t it be nice if you could express the question at a higher level and let the system figure out how to do the navigation?”
Edgar F. Codd, Inventor of Relational Databases
6. Follow us @GraknLabs
A BRIEF HISTORY OF DATABASES
[Timeline, 1960 to 2030. Technologies: punch cards and tapes, navigational databases, relational/SQL databases. Use cases: record keeping, Business Intelligence (BI), web applications. Labels: SCALE, COMPLEXITY.]
7. Follow us @GraknLabs
A BRIEF HISTORY OF DATABASES
[Timeline, 1960 to 2030. Technologies: punch cards and tapes, navigational databases, relational/SQL databases, NoSQL & NewSQL databases. Use cases: record keeping, Business Intelligence (BI), web applications. Labels: SCALE, COMPLEXITY.]
8. Follow us @GraknLabs
A BRIEF HISTORY OF DATABASES
[Timeline, 1960 to 2030. Technologies: punch cards and tapes, navigational databases, relational/SQL databases, NoSQL & NewSQL databases. Use cases: record keeping, Business Intelligence (BI), web applications, Artificial Intelligence (AI). Labels: SCALE, COMPLEXITY.]
9. Follow us @GraknLabs
AI SYSTEMS PROCESS KNOWLEDGE THAT IS TOO COMPLEX FOR CURRENT DATABASES
[Timeline, 1960 to 2030. Technologies: punch cards and tapes, navigational databases, relational/SQL databases, NoSQL & NewSQL databases, followed by a question mark (?). Use cases: record keeping, Business Intelligence (BI), web applications, Artificial Intelligence (AI). Labels: SCALE, COMPLEXITY.]
10. Follow us @GraknLabs
WHAT RELATIONAL DID FOR BI, IS WHAT GRAKN WILL DO FOR AI
[Timeline, 1960 to 2030. Technologies: punch cards and tapes, navigational databases, relational/SQL databases, NoSQL & NewSQL databases. Use cases: record keeping, Business Intelligence (BI), web applications, Artificial Intelligence (AI). Labels: SCALE, COMPLEXITY.]
11. Follow us @GraknLabs
What is the problem with complex data?
Too complex to model: current modelling techniques are based only on binary relationships. Result: could not model complex domains.
Too complex to query: current languages only allow you to query for explicitly stored data. Result: could not simplify verbose queries.
Too expensive analytics: automated distributed algorithms (BSP) are expensive and not reusable. Result: could not reuse analytics algorithms.
DB QLs are too low-level: no strong abstraction over low-level constructs and complex relationships. Result: difficult to work with complex data.
12. Follow us @GraknLabs
GRAKN.AI is a hyper-relational database for knowledge-oriented systems
i.e.
GRAKN.AI is a knowledge base
Knowledge Storage System: novel knowledge representation system based on hypergraph theory
Knowledge Inference: OLTP reasoning engine
Knowledge Analytics: OLAP distributed analytics
13. Follow us @GraknLabs
What is a hyper-relational database?
Hyper-expressive schema: flexible Entity-Relationship concept-level schema to build knowledge models. Benefit: model complex domains.
Real-time inference: automated deductive reasoning of data points during runtime (OLTP). Benefit: derive implicit facts and simplify queries.
Analytics as a Language: automated distributed algorithms (BSP) as a language (OLAP). Benefit: automated large-scale analytics.
High-level query language: strong abstraction over low-level constructs and complex relationships. Benefit: easier to work with complex data.
14. Follow us @GraknLabs
THE HYPER-EXPRESSIVE SCHEMA
A knowledge base needs to be able to model the real world and all the
type hierarchies, hyper-relationships and rules contained within it.
15. Follow us @GraknLabs
Schema Example: Basic Model
[Diagram: Person and Company each have (has) a Name. Person plays the Employee role and Company plays the Employer role. Employment relates Employee and Employer.]
16. Follow us @GraknLabs
Schema Example: Type-Hierarchy
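[Diagram: the basic model (Person and Company each have a Name; Person plays Employee, Company plays Employer; Employment relates Employee and Employer) extended with a type hierarchy: Customer is a subtype (sub) of Person and Startup is a subtype (sub) of Company.]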
17. Follow us @GraknLabs
Schema Example: Type-Hierarchy
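[Diagram: the type-hierarchy model (Customer sub Person, Startup sub Company; Person plays Employee, Company plays Employer; Employment relates Employee and Employer) extended with a Marriage relation that relates the Husband and Wife roles, each of which is connected to a type by a plays edge.]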
18. Follow us @GraknLabs
Valid Data Insertion
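[Diagram: Alice and Bob are joined by a marriage (mar) as wife and husband. Two employment (emp) relations link them to the employers IBM and Grakn. Instance types shown: customer, person, startup.]
✓ Write commit success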
19. Follow us @GraknLabs
Invalid Data Insertions: [intelligent] Schema Constraints are Back!
[Diagram: a marriage (mar) with Charlie (person) as husband and Apple (company) as wife.]
✗ Write commit fails
✗ Invalid relationship
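The accept/reject behaviour on the two insertion slides can be sketched in a few lines of Python. This is an illustration only, not Grakn's API or Graql syntax: the dictionaries and the `validate` function are hypothetical names, the schema follows the slides (with the assumption that only Person plays Husband and Wife), and the instance types assigned to Alice, Bob and Grakn are guesses from the diagram.

```python
# Hypothetical in-memory schema mirroring the slides; not Grakn's API.
SUPERTYPE = {"customer": "person", "startup": "company"}  # "sub" edges
PLAYS = {
    "person": {"employee", "husband", "wife"},  # assumption: Person plays both marriage roles
    "company": {"employer"},
}
RELATES = {
    "employment": {"employee", "employer"},
    "marriage": {"husband", "wife"},
}

# Assumed instance types, read off the insertion diagrams.
INSTANCE_TYPES = {
    "Alice": "customer",
    "Bob": "person",
    "Grakn": "startup",
    "Charlie": "person",
    "Apple": "company",
}


def roles_of(type_name):
    """Collect the roles a type may play, including inherited ones."""
    roles = set()
    while type_name is not None:
        roles |= PLAYS.get(type_name, set())
        type_name = SUPERTYPE.get(type_name)
    return roles


def validate(relation, role_players, types=INSTANCE_TYPES):
    """Return the schema violations for one relation instance.

    role_players maps a role name to the instance that plays it.
    An empty list means the write would be accepted.
    """
    errors = []
    for role, player in role_players.items():
        if role not in RELATES[relation]:
            errors.append(f"{relation} does not relate {role}")
        elif role not in roles_of(types[player]):
            errors.append(f"{player} ({types[player]}) cannot play {role}")
    return errors
```

With these assumptions, `validate("marriage", {"wife": "Alice", "husband": "Bob"})` returns an empty list because a customer inherits the roles of a person, while `validate("marriage", {"husband": "Charlie", "wife": "Apple"})` reports that a company cannot play the wife role.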
22. Follow us @GraknLabs
Rule Example: Transitive Relationship
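[Diagram: Kings Cross (ward) is located in (loc) London (city), and London is located in (loc) the UK (country). A third loc relation connects Kings Cross to the UK.]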
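A transitive rule of this kind can be sketched as a small fixed-point computation. The Python below is an illustration under assumptions, not Grakn's reasoning engine or rule syntax: the function name `infer_transitive` and the tuple encoding of located-in facts are hypothetical.

```python
def infer_transitive(facts):
    """Close a set of (inner, outer) located-in pairs under transitivity.

    Repeats the rule "if a is in b and b is in d, then a is in d"
    until no new pair can be derived.
    """
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Explicitly stored facts from the slide.
stored = {("Kings Cross", "London"), ("London", "UK")}

# Facts that exist only because the rule derived them.
inferred = infer_transitive(stored) - stored
```

Here `inferred` contains the single pair `("Kings Cross", "UK")`, which was never stored explicitly.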
23. Follow us @GraknLabs
Rule Example: Simple Business Rule
[Diagram: Schedule A and Schedule B on a shared timeline, in the order A start, B start, A end, B end.]
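The slide does not spell out the rule itself. One plausible reading, given the ordering A start, B start, A end, B end, is a rule that derives when two schedules overlap. The Python sketch below works under that assumption; the function name `overlaps` and the example times are hypothetical.

```python
from datetime import datetime


def overlaps(a_start, a_end, b_start, b_end):
    """Two intervals overlap when each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


# Hypothetical times matching the order on the slide:
# A start, B start, A end, B end.
a_start = datetime(2017, 1, 1, 9, 0)
b_start = datetime(2017, 1, 1, 10, 0)
a_end = datetime(2017, 1, 1, 11, 0)
b_end = datetime(2017, 1, 1, 12, 0)

conflict = overlaps(a_start, a_end, b_start, b_end)
```

With these times `conflict` is `True`; moving B's start to after A's end makes it `False`.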
24. Follow us @GraknLabs
THE INFERENCE OLTP LANGUAGE
A knowledge-oriented query language should not only be able to
retrieve explicitly stored data, but also implicitly derived information.
25. Follow us @GraknLabs
Complex Query Example
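[Diagram: Alice (full-time employee), Bob (part-time employee) and Charlie (temporary employee) are linked by three drive relations to the vehicles AB123 (bus), BC234 (van) and CD345 (truck). Three travel relations link the vehicles to locations. Kings Cross (ward), London (city) and UK (country) are connected by loc relations.]
Who are all the drivers that will be arriving in the UK?
The query would be very long and complex in SQL, NoSQL or even graph databases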
26. Follow us @GraknLabs
Complex Query Example: Type and Relationship Inference
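[Diagram: Alice (full-time employee), Bob (part-time employee) and Charlie (temporary employee) are linked by three drive relations to the vehicles AB123 (bus), BC234 (van) and CD345 (truck). Three travel relations link the vehicles to locations. Kings Cross (ward), London (city) and UK (country) are connected by loc relations.]
Who are all the drivers that will be arriving in the UK?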
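To make the idea concrete, here is a minimal Python sketch of how type inference and relationship inference could combine to answer the question. It is not Graql and not how Grakn evaluates queries. Every pairing in the fact tables is an assumption, since the diagram does not survive in the transcript: who drives which vehicle, where each vehicle travels, and the common supertype `employee` are all hypothetical.

```python
# Hypothetical facts; the pairings are assumptions, not taken from the slide.
TYPE_OF = {
    "Alice": "full-time employee",
    "Bob": "part-time employee",
    "Charlie": "temporary employee",
}
SUPERTYPE = {
    "full-time employee": "employee",
    "part-time employee": "employee",
    "temporary employee": "employee",
}
DRIVES = {"Alice": "AB123", "Bob": "BC234", "Charlie": "CD345"}
TRAVELS_TO = {"AB123": "Kings Cross", "BC234": "London", "CD345": "UK"}
LOCATED_IN = {"Kings Cross": "London", "London": "UK"}


def is_a(instance, wanted_type):
    """Type inference: walk up the hierarchy from the instance's direct type."""
    current = TYPE_OF.get(instance)
    while current is not None:
        if current == wanted_type:
            return True
        current = SUPERTYPE.get(current)
    return False


def is_within(place, region):
    """Relationship inference: follow loc edges upward from place."""
    while place is not None:
        if place == region:
            return True
        place = LOCATED_IN.get(place)
    return False


def drivers_arriving_in(region, person_type="employee"):
    """Everyone of the given type who drives a vehicle travelling into region."""
    return sorted(
        person
        for person, vehicle in DRIVES.items()
        if is_a(person, person_type) and is_within(TRAVELS_TO[vehicle], region)
    )
```

Under these assumptions `drivers_arriving_in("UK")` returns all three people, even though only one vehicle is stored as travelling to the UK directly, and `drivers_arriving_in("London")` returns Alice and Bob.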
27. Follow us @GraknLabs
THE ANALYTICS OLAP LANGUAGE
Large-scale analytics is like teenage sex: everyone talks about it,
nobody really knows how to do it, everyone thinks everyone else is
doing it, so everyone claims they are doing it too.
At the end of the day, very few people know how to code it.
28. Follow us @GraknLabs
Example of a Distributed Analytics Algorithm
Connected Component: a clustering algorithm (pseudocode)
For each vertex V,
  Superstep 1:
    V sends its own id via both outgoing and incoming edges
    V sets its own id as cluster label
  Do superstep n:
    For every received message m of V, compare it to its current cluster label L:
      If m > L, set the label to m;
    If the cluster label has not changed in this superstep, vote to halt;
    Else, send the new cluster label via all edges;
Global operation:
  While not every vertex votes to halt, and n < N, do another superstep n + 1.
An efficient implementation of this algorithm is about 200 lines of code in Java
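The pseudocode can be simulated on a single machine to see how the labels converge. The Python sketch below is not a distributed implementation and not the Java code the slide refers to. It runs the supersteps in a loop, and the function name `connected_components` and the `max_supersteps` parameter (standing in for N) are hypothetical. It assumes vertex ids are comparable and that every vertex appears in at least one edge.

```python
def connected_components(edges, max_supersteps=100):
    """Label every vertex with the largest vertex id in its component."""
    # Messages travel along both directions of every edge.
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Superstep 1: each vertex takes its own id as label and sends it out.
    label = {v: v for v in neighbours}
    inbox = {v: [label[n] for n in neighbours[v]] for v in neighbours}

    # Supersteps 2..N: adopt the largest label received, then forward it.
    for _ in range(max_supersteps):
        next_inbox = {v: [] for v in neighbours}
        any_active = False
        for v, messages in inbox.items():
            best = max(messages, default=label[v])
            if best > label[v]:
                label[v] = best
                any_active = True
                for n in neighbours[v]:
                    next_inbox[n].append(best)
            # Otherwise the vertex votes to halt and sends nothing.
        if not any_active:
            break  # every vertex voted to halt
        inbox = next_inbox
    return label
```

For the edges `[(1, 2), (2, 3), (4, 5)]` the result maps vertices 1, 2 and 3 to label 3 and vertices 4 and 5 to label 5, so vertices that share a label belong to the same component.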
30. Follow us @GraknLabs
Graql Distributed Analytics Queries
And we’ll continue to add more algorithms into the language, such as PageRank, K-Core, Triangle Count, Density, Cliques, Centrality, and so on
32. Follow us @GraknLabs
What makes Grakn a Knowledge Base?
GRAKN
Grakn is the distributed knowledge base to store complex data. It contains a knowledge representation system built on top of distributed computing technology stacks.
GRAQL
Graql is a query language that uses machine reasoning to interpret complex relationships and retrieve implicitly derived knowledge from Grakn. It has a reasoning and analytics engine.
Reasoning Engine: real-time inference for OLTP
Analytics Engine: distributed analytics for OLAP
Knowledge Representation System: novel approach based on hypergraph theory
Automated Reasoning OLTP query language: interprets complex relationships and infers implicit information
Guarantees logical integrity, like SQL: real-time validation of data with respect to a more expressive schema constraint
Distributed Analytics OLAP query language: interprets complex relationships and infers implicit information
Expressive Knowledge Representation System: contains types, subtypes, hyper-relations, rules and instances
High Scale of Relationships, like Graph DBs: relationships are first-class citizens and easy to query without joins
Scales Horizontally, like NoSQL: scaling by sharding and replication, with linear query throughput
33. Follow us @GraknLabs
Wait, why do we need a knowledge base?
“For a computer to pass a Turing Test, it needs to possess: Natural Language Processing, Knowledge Representation, Automated Reasoning and Machine Learning”
Peter Norvig (Research Director, Google) and Stuart J. Russell (CS Professor, UC Berkeley), “Artificial Intelligence: A Modern Approach”, 1994
34. Follow us @GraknLabs
The Architecture of Cognition
COGNITION is "the mental action or process of acquiring knowledge and understanding through thought, experience, and the senses."
Natural Language Processing: comprehension and production of language (communication)
Automated Reasoning: reasoning, problem solving, logical deduction, and decision making
Knowledge Representation: expression, conceptualisation, memory and understanding
Machine Learning: judgment and evaluation, to adapt to new circumstances and to detect and extrapolate new patterns
Knowledge Acquisition: information retrieval and natural language understanding over user data, enterprise data, financial data, web data, etc.
35. Follow us @GraknLabs
The Architecture of Cognition
COGNITION is "the mental action or process of acquiring knowledge and understanding through thought, experience, and the senses."
Natural Language Processing: comprehension and production of language (communication)
Knowledge Base: storage of knowledge (i.e. complex information), retrieval of explicitly stored data, and derivation of new conclusions
Machine Learning: judgment and evaluation, to adapt to new circumstances and to detect and extrapolate new patterns
Knowledge Acquisition: information retrieval and natural language understanding over user data, enterprise data, financial data, web data, etc.
36. Follow us @GraknLabs
THE ARCHITECTURE OF A COGNITIVE SYSTEM
Natural Language Processing
Knowledge Base
Machine Learning
Knowledge Acquisition
38. Follow us @GraknLabs
GRAKN IS ENABLING DEVELOPMENTS OF AI IN FINANCE & LIFE SCIENCE
FINANCIAL MARKET KNOWLEDGE BASE: building a financial market knowledge base by aggregating information about real-world events to predict the price movements of different asset classes
CROP SCIENCE KNOWLEDGE BASE: building a crop science knowledge base from data on half a million field crop trials to understand the performance of different crop varietals and strains
HUMAN GENOMICS KNOWLEDGE BASE: building a life science knowledge base by aggregating public and proprietary bio datasets to drive scientific discovery in the field of human genomics
39. Follow us @GraknLabs
VALUE TO AI: BE THE UNIFIED REPRESENTATION OF KNOWLEDGE
Inference of low-level patterns and
automation of analytics algorithms
Machine translation for parsed
query interpretation
Expressive and extensible
knowledge model
INPUT SYSTEMS
e.g. Information Retrieval, Entity Extraction,
Natural Language Understanding
LEARNING SYSTEMS
e.g. Neural Networks, Bayesian Networks,
Kernel Machines, Genetic Programming
OUTPUT SYSTEMS
e.g. Natural Language Query,
Natural Language Generation