The Synergy Between the Object Database, Graph Database, Cloud Computing and NoSQL Paradigms
1. ICOODB 2010 - Frankfurt, Germany
The Synergy Between the Object Database, Graph Database, Cloud Computing and NoSQL Paradigms
Leon Guzenda - Objectivity, Inc.
Leon Guzenda at ICOODB 2010 © Objectivity, Inc. 2010
2. AGENDA
• Historical Overview
• Inherent Advantages of ODBMSs
• Technology Evolution
• Leveraging Technologies
• Graph Databases
• Summary
4. The ODBMS Players
[Timeline chart, 1986 to 2010, showing:]
• Matisse
• VBase (Ontologic) → Ontos
• GemStone (Servio Logic → Servio → GemStone)
• Objectivity/DB
• InfiniteGraph
• ObjectStore (Object Design → Excelon → Progress)
• GBase (Graphael)
• Versant Object Database (Versant)
• O2 (→ Ardent → Ascential)
• UniSQL
• Poet → FastObjects
• db4objects
• Berkeley DB (Sleepycat → Oracle)
Note that many of these companies were founded earlier, e.g. Objectivity, Inc. was founded in June 1988.
5. ODBMS Evolution
• 1980s
  - "Performance, Performance, Performance!"
  - Primarily scientific and engineering applications
• 1990s
  - Reliability and Scalability
  - New languages and Operating Systems
  - Large deployments in the scientific domain
• 2000s
  - Ease of use and instrumentation
  - Query languages
  - Performance and scalability
  - Grids and Clouds
  - Embedded systems, government and more...
DATA MANIPULATION: Applications tended to generate data and relationships.
RELATIONSHIP ANALYTICS: Applications ingest and correlate data and relationships.
7. Faster Navigation
PROBLEM: Find all of the Suspects linked to a chosen Incident.
Relational solution (Incident_Table, Join_Table and Suspect_Table, each reached via B-Trees):
• N * 2 B-Tree lookups
• N * 2 logical reads
Objectivity/DB solution (Incident linked directly to its Suspects):
• 1 B-Tree lookup
• 1 + N logical reads
Significantly faster navigation of relationships
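The lookup counts above can be reproduced with a small counting sketch. Everything here is an illustrative assumption (plain Python dicts and lists stand in for tables, indices and object references; none of it is Objectivity/DB's API), and the cost model is simply the one stated on the slide.

```python
class Stats:
    """Counts the two costs named on the slide."""
    def __init__(self):
        self.btree_lookups = 0
        self.logical_reads = 0


def relational_find(incident_id, join_table, suspect_table, stats):
    """Join-table navigation: two index lookups and two reads per linked suspect."""
    suspects = []
    for row in join_table:
        if row["incident_id"] != incident_id:
            continue
        stats.btree_lookups += 2  # Join_Table index + Suspect_Table index
        stats.logical_reads += 2  # join row + suspect row
        suspects.append(suspect_table[row["suspect_id"]])
    return suspects


def object_find(incident_id, incident_index, stats):
    """Reference navigation: one index lookup, then follow stored references."""
    stats.btree_lookups += 1      # locate the Incident
    stats.logical_reads += 1      # read the Incident
    incident = incident_index[incident_id]
    suspects = []
    for suspect in incident["suspects"]:
        stats.logical_reads += 1  # read each linked Suspect directly
        suspects.append(suspect)
    return suspects


# Hypothetical data: one incident (id 7) linked to N = 3 suspects.
suspect_table = {1: "Ann", 2: "Bob", 3: "Cy"}
join_table = [{"incident_id": 7, "suspect_id": s} for s in suspect_table]
incident_index = {7: {"suspects": list(suspect_table.values())}}

rel, obj = Stats(), Stats()
rel_result = relational_find(7, join_table, suspect_table, rel)
obj_result = object_find(7, incident_index, obj)
print(rel.btree_lookups, rel.logical_reads)  # 2N and 2N
print(obj.btree_lookups, obj.logical_reads)  # 1 and 1 + N
```

Both functions return the same suspects; only the counted work differs, and the gap grows with N.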
8. Lower Query Latency
Relational:
O/R Mapping → Send Request → Receive & Interpret Request → Optimize → Access Indices → Read Data → Qualify → Loop → Create View → Return Result
Objectivity/DB:
Initialize Iterator → Start Loop → Access Indices → Qualify → Open Object → Loop
Qualified objects are returned as soon as they are found.
[Horizontal axis: Time]
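The difference between building a complete result first and returning qualified objects as they are found can be sketched with a Python generator. This is a minimal illustration, not Objectivity/DB code: `query_as_view`, `query_as_iterator` and `scan` are hypothetical names, and the `log` lists only record how many objects were examined before the first result arrived.

```python
def query_as_view(objects, predicate):
    """Qualify everything, build the full result, then return it."""
    return [obj for obj in objects if predicate(obj)]


def query_as_iterator(objects, predicate):
    """Yield each qualified object as soon as it is found."""
    for obj in objects:
        if predicate(obj):
            yield obj


def scan(count, log):
    """Stand-in data source that records every object it hands out."""
    for i in range(count):
        log.append(i)
        yield i


def ends_in_three(x):
    return x % 10 == 3


log_view, log_iter = [], []
first_from_view = query_as_view(scan(1000, log_view), ends_in_three)[0]
first_from_iter = next(query_as_iterator(scan(1000, log_iter), ends_in_three))

print(first_from_view, len(log_view))  # whole data set examined first
print(first_from_iter, len(log_iter))  # stops at the first match
```

The caller of the iterator version can start work on the first object while the rest of the scan is still pending.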
10. Grids and Clouds
• 1996 - CERN started looking for a DBMS for the LHC.
• RD45 team verified that a distributed ODBMS could handle complexity, performance and Petabyte+ scale.
• Led to grid deployments at SLAC and Brookhaven.
• Meanwhile, developers migrated from CORBA to SOA.
• Then, our grid and SOA experience made the migration to cloud environments very easy.
11. The "NoSQL" Movement
• Some web application developers found RDBMSs too restrictive.
• They were dealing with:
  - Huge parallel ingest streams
  - Applications that scan or navigate rather than query
  - Unconventional transaction models.
• So... they re-invented the wheel!
  - Sharding (Hadoop and Big Table)
  - Key-Value tables (Big Table)
  - Dynamo...
12. Sharding
Splits large tables into groups of related rows and puts them onto separate servers. They each have a schema.
• Big tables can be split
• Small "hot" tables may be replicated
• The client has to figure out which server to use.
• Data has to be compatible with the client hardware, OS and language.
[Diagram: Client → Split/Combine → Servers for Africa & Antarctica, Americas, Australasia and Asia, etc., each with Schema 1 and a Product Catalog.]
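A minimal sketch of the client-side work that sharding implies, under stated assumptions: the class `ShardedCatalogClient` and its method names are hypothetical, each "server" is just an in-memory dict, the big table is an order list split by region, and the small "hot" table is the product catalog copied to every server.

```python
class ShardedCatalogClient:
    """Client that routes, replicates and combines across regional servers."""

    def __init__(self, regions):
        # One stand-in "server" per region, all with the same schema.
        self.servers = {r: {"catalog": {}, "orders": []} for r in regions}

    def replicate_product(self, sku, description):
        """Small hot table: write the row to every server."""
        for server in self.servers.values():
            server["catalog"][sku] = description

    def add_order(self, region, order):
        """Big table: the client picks the one server that owns the row."""
        self.servers[region]["orders"].append(order)

    def orders_for(self, region):
        return list(self.servers[region]["orders"])

    def all_orders(self):
        """Split/Combine: ask every server, then merge the partial results."""
        combined = []
        for region in sorted(self.servers):
            combined.extend(self.servers[region]["orders"])
        return combined


client = ShardedCatalogClient(
    ["Africa & Antarctica", "Americas", "Australasia and Asia"]
)
client.replicate_product("sku-1", "Widget")
client.add_order("Americas", {"sku": "sku-1", "qty": 2})
client.add_order("Australasia and Asia", {"sku": "sku-1", "qty": 5})
print(client.orders_for("Americas"))
print(len(client.all_orders()))
```

Note that the routing knowledge lives entirely in the client, which is the burden the slide points out.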
13. Hadoop's HDFS
• Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications.
• HDFS creates multiple replicas of 64+ Megabyte data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations.
• Each data block is replicated 3 times - twice on the same rack and once on another rack.
• The client needs to know the name of the block to be operated on.
• The block is copied to the client for processing.
[Diagram: Client (HDFS client) → Name Node (Directory) → Data Nodes holding Data blocks 1a, 1b, 1c, 2, 3 and 4.]
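The replica rule and the name node's directory role can be sketched as follows. This is a toy model of the description above, not Hadoop code: `place_replicas` and the `NameNode` class are hypothetical, racks are plain dicts, and placement is deterministic purely to keep the example short.

```python
def place_replicas(racks, first_rack):
    """Return three (rack, node) locations: two on first_rack, one elsewhere."""
    other_rack = next(r for r in racks if r != first_rack)
    node_a, node_b = racks[first_rack][:2]
    return [
        (first_rack, node_a),
        (first_rack, node_b),
        (other_rack, racks[other_rack][0]),
    ]


class NameNode:
    """Directory mapping each block name to the data nodes that hold it."""

    def __init__(self, racks):
        self.racks = racks
        self.directory = {}

    def add_block(self, block_name, first_rack):
        self.directory[block_name] = place_replicas(self.racks, first_rack)

    def locate(self, block_name):
        # The client must supply the block name to find its replicas.
        return self.directory[block_name]


racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
name_node = NameNode(racks)
name_node.add_block("block-1", "rack1")
locations = name_node.locate("block-1")
print(locations)
```

Because every lookup goes through the one directory, this model also makes the single point of failure discussed later easy to see.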
15. Objectivity/DB
• Federated object-oriented database platform
• Single Logical View across distributed persistent objects
• Eliminates the OO language to database mapping layer
• Ultra-fast object navigation
• Customizable, distributed query engine
• Dynamic schema changes and low administration overheads
• Interoperability across multiple languages and platforms
• High Availability
• Grid and cloud environment enabled
16. Distributed Architecture
[Diagram: a Client (Application, Objectivity/DB, Smart Cache) connected to Simple, Distributed Servers (Lock Servers, Query Server, Data Server).]
Enhances scalability and availability
17. Parallel Query Engine
[Diagram: the Application (Objectivity/DB, Smart Cache, PQE Task Splitter) connected to Lock Servers, Query Servers with Filter/Gateway components, and a Data Server.]
• The Task Splitter aims queries at specific databases and containers.
• Filters can run complex qualification methods.
• Gateways can access other databases or search engines.
Replaceable components for smarter optimization
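The split, filter and combine flow can be sketched with Python's standard thread pool. This is an illustration of the general idea only: `split_tasks` and `parallel_query` are hypothetical names, containers are in-memory lists, and the real Parallel Query Engine's interfaces are not shown in the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def split_tasks(containers, container_filter):
    """Task splitter: aim the query only at the containers that matter."""
    return [name for name in containers if container_filter(name)]


def parallel_query(containers, container_filter, qualify):
    """Scan the targeted containers in parallel and merge the matches."""
    targets = split_tasks(containers, container_filter)

    def scan(name):
        # Filter step: run the qualification method on each object.
        return [obj for obj in containers[name] if qualify(obj)]

    with ThreadPoolExecutor() as pool:
        partial_results = list(pool.map(scan, targets))  # keeps target order
    return [obj for part in partial_results for obj in part]


# Hypothetical containers of call and email records.
containers = {
    "calls_2009": [{"caller": "ann", "secs": 30}, {"caller": "bob", "secs": 600}],
    "calls_2010": [{"caller": "ann", "secs": 900}],
    "emails_2010": [{"caller": "cy", "secs": 0}],
}
long_calls = parallel_query(
    containers,
    lambda name: name.startswith("calls_"),
    lambda obj: obj["secs"] > 300,
)
print([c["caller"] for c in long_calls])
```

Passing the splitter predicate and the qualification function as arguments mirrors the "replaceable components" point: either can be swapped without touching the scan loop.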
18. Objectivity/DB Advantages
Fully Distributed with Client-Side Smart Caching
Highly efficient storage and navigation of relationships
Flexible Clustering
Scalable Collections
Customizable Parallel Query Engine
Quorum-based Replication and High Availability
Flexible, Multi-mode Indexing
Fully Interoperable Across Platforms and Languages
19. Objectivity/DB 10.1
• User-replaceable Parallel Query Engine search agents
• Page level and partial backups
• Eclipse RCP
• Visual Studio 2010 support
• Mac OS X support
20. Graph Databases
21. The Link Hunter
22. Graph Databases
• Nodes are represented as Vertices
• Relationships are represented as Edges
• Edges may be weighted
• Both are regular objects
  - Properties
  - Methods
  - Inheritance
[Diagram: Vertex and Edge]
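A minimal sketch of what "both are regular objects" can look like in code. The classes below (`Vertex`, `Edge`, `Person`, `Call`) are hypothetical Python stand-ins chosen for illustration; they are not the InfiniteGraph API.

```python
class Vertex:
    """A node: carries properties and knows its outgoing edges."""

    def __init__(self, **properties):
        self.properties = properties
        self.edges = []

    def neighbors(self):
        return [edge.target for edge in self.edges]


class Edge:
    """A relationship: connects two vertices and may carry a weight."""

    def __init__(self, source, target, weight=1.0, **properties):
        self.source = source
        self.target = target
        self.weight = weight
        self.properties = properties
        source.edges.append(self)


class Person(Vertex):
    """Inheritance: a vertex type with its own method."""

    def label(self):
        return self.properties["name"]


class Call(Edge):
    """Inheritance: an edge type with its own method."""

    def is_long(self):
        return self.properties.get("seconds", 0) > 300


ann, bob = Person(name="Ann"), Person(name="Bob")
call = Call(ann, bob, weight=2.5, seconds=420)
print(bob.label(), call.weight, call.is_long())
```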
23. InfiniteGraph...
• Dedicated Graph API
• Easy to use and deploy
• Java now, C# soon...
• Built on Objectivity/DB
• Fully interoperable
24. ...InfiniteGraph...
• Create and update graph structures
• Traversal with constraints:
  - Return only designated Edge types
  - Return if not an excluded Edge type
  - Direction matches intent
  - Properties do not match a provided predicate
  - Maximum path depth has been reached...
• Optional Event notification
  - Target vertex was found
  - Path is being abandoned
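The constraint and notification ideas above can be sketched as a small depth-limited traversal. This is a generic illustration under assumptions, not the InfiniteGraph API: the function `traverse`, its parameters and the adjacency-list graph are all hypothetical, only outgoing edges are followed (one possible reading of "direction matches intent"), and the two callbacks play the role of the optional events.

```python
def traverse(graph, props, start, target, *, edge_types=None,
             excluded=frozenset(), max_depth=3,
             predicate=lambda properties: True,
             on_found=None, on_abandoned=None):
    """Walk outgoing edges from start, pruning paths that break a constraint."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        vertex = path[-1]
        if vertex == target:
            if on_found:
                on_found(path)                    # event: target vertex found
            continue
        if len(path) - 1 >= max_depth:
            if on_abandoned:
                on_abandoned(path, "max depth")   # event: path abandoned
            continue
        for edge_type, nxt in graph.get(vertex, []):
            if nxt in path:
                continue                          # do not revisit a vertex
            if edge_types is not None and edge_type not in edge_types:
                continue                          # not a designated edge type
            if edge_type in excluded:
                continue                          # excluded edge type
            if not predicate(props.get(nxt, {})):
                if on_abandoned:
                    on_abandoned(path + [nxt], "predicate")
                continue
            stack.append(path + [nxt])


# Hypothetical graph: vertex -> list of (edge type, target vertex).
graph = {
    "a": [("calls", "b"), ("emails", "c")],
    "b": [("calls", "d")],
    "c": [("calls", "d")],
    "d": [],
}
props = {}
found, abandoned = [], []
traverse(graph, props, "a", "d", edge_types={"calls"},
         on_found=found.append,
         on_abandoned=lambda path, why: abandoned.append((path, why)))
print(found)
```

With only "calls" edges designated, the route through "c" is never explored because it starts with an "emails" edge.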
25. ...InfiniteGraph
• Search and Query Processing
  - Supports Objectivity/DB and Lucene indexing
  - Key and range queries
  - Full text searching
  - Regular expression search of string-based keys
• Path Finding
  - Start and end vertices
  - Maximum depth
  - Vertex type inclusion and exclusion lists
  - Edge type list
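Key, range and regular-expression queries over string keys can be illustrated with a sorted in-memory index. This is a generic sketch using only Python's standard `bisect` and `re` modules; the class `SortedKeyIndex` is hypothetical and says nothing about how Objectivity/DB or Lucene indexes are actually implemented.

```python
import bisect
import re


class SortedKeyIndex:
    """Sorted string keys supporting key, range and regex lookups."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)
        self._keys = sorted(self._mapping)

    def get(self, key):
        """Key query: exact match, or None if absent."""
        return self._mapping.get(key)

    def range(self, low, high):
        """Range query: all keys k with low <= k <= high, in order."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._keys[lo:hi]

    def regex(self, pattern):
        """Regular expression search over the string keys."""
        compiled = re.compile(pattern)
        return [k for k in self._keys if compiled.search(k)]


# Hypothetical vertices keyed by name.
index = SortedKeyIndex({"alice": 1, "bob": 2, "carol": 3, "dave": 4})
print(index.get("bob"))
print(index.range("b", "d"))
print(index.regex(r"^c|e$"))
```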
26. InfiniteGraph Licensing
• 60-day free trial
• Free GoGrid cloud development and deployment environment for qualified startups and non-profits
  - Licenses must be procured after a pre-agreed annual revenue level is reached
• Open Source framework licenses
• Standard commercial licenses
27. Case Studies
28. Objectivity/DB & Hadoop...
HDFS Weaknesses
• HDFS cannot be directly mounted by an existing operating system
  - Getting data into and out of the HDFS file system, an action that often needs to be performed before and after executing a job, can be inconvenient.
  - A filesystem in userspace has been developed to address this problem, at least for Linux and some other Unix systems.
• It moves 64+ MB blocks and doesn't support POSIX file operations
• Replicating data three times is costly.
  - However, there is a version that uses a parity block, decreasing the physical storage requirements from 3x to around 2.2x.
• The Name Node is a single point of failure
  - If the name node goes down, the filesystem is offline. When it comes back up the name node must replay all outstanding operations. This replay process can take over half an hour for a big cluster.
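To put the 3x and 2.2x figures in concrete terms, here is a short worked calculation. The 100 TB data set is a made-up example value and `physical_storage` is a hypothetical helper; only the two overhead factors come from the slide.

```python
def physical_storage(logical_tb, overhead_factor):
    """Physical capacity needed to hold logical_tb of user data."""
    return logical_tb * overhead_factor


logical_tb = 100  # assumed example size, in terabytes

triple_replication = physical_storage(logical_tb, 3.0)
parity_variant = physical_storage(logical_tb, 2.2)
saved = triple_replication - parity_variant

print(f"3x replication: {triple_replication:.0f} TB")
print(f"parity variant: {parity_variant:.0f} TB")
print(f"difference:     {saved:.0f} TB")
```

Under these assumptions the parity variant needs roughly 80 TB less raw storage for every 100 TB of data.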
29. ...Objectivity/DB & Hadoop...
[Diagram labels: Objectivity/DB + HDFS; Client; OOFS PAGE SERVER ("AMS PAGE SERVER") with Security, Cache and an HDFS client; 64+ MB Container file/blocks.]
• Problem: HDFS works best when transferring 64+ MB blocks of data. Objectivity/DB works best with 16-64KB blocks.
• Solution: Implement a memory cache for the HDFS blocks.
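One way to picture the proposed memory cache is a small least-recently-used block cache that serves page-sized reads. This is a sketch under assumptions, not the actual page server: `HdfsBlockCache` and `fake_fetch` are hypothetical, the fetch function stands in for an HDFS block read, the page size is assumed to divide the block size evenly, and the demo uses tiny sizes so it runs instantly.

```python
from collections import OrderedDict


class HdfsBlockCache:
    """Serves small pages out of large blocks held in an LRU memory cache."""

    def __init__(self, fetch_block, block_size=64 * 1024 * 1024,
                 page_size=64 * 1024, capacity=4):
        if block_size % page_size != 0:
            raise ValueError("page_size must divide block_size evenly")
        self.fetch_block = fetch_block
        self.block_size = block_size
        self.page_size = page_size
        self.capacity = capacity
        self._blocks = OrderedDict()  # block number -> bytes, oldest first

    def read_page(self, page_no):
        block_no, offset = divmod(page_no * self.page_size, self.block_size)
        if block_no in self._blocks:
            self._blocks.move_to_end(block_no)       # mark as recently used
        else:
            self._blocks[block_no] = self.fetch_block(block_no)
            if len(self._blocks) > self.capacity:
                self._blocks.popitem(last=False)     # evict least recently used
        return self._blocks[block_no][offset:offset + self.page_size]


fetches = []


def fake_fetch(block_no):
    """Stand-in for one large block transfer."""
    fetches.append(block_no)
    return bytes([block_no]) * 64


# Tiny sizes for the demo: 64-byte blocks, 16-byte pages, room for 2 blocks.
cache = HdfsBlockCache(fake_fetch, block_size=64, page_size=16, capacity=2)
pages = [cache.read_page(p) for p in range(5)]
print(fetches)  # block numbers actually transferred
```

Four of the five page reads are served from the same cached block, so only two large transfers happen.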
30. ...Objectivity/DB & Hadoop
• Removes the single point of failure by using an Objectivity/DB Name Node Federation (replicated).
• Could provide other services too, such as content tagging and connectivity.
[Diagram: Client (HDFS client) → Objectivity/DB Name Node FD, replicated as Name Node FD' → Data Nodes holding Data blocks 1a, 1b, 1c, 2, 3 and 4.]
31. InfiniteGraph and Cassandra
• Apache Cassandra provides a structured key-value store with eventual consistency.
• It is a resilient, distributed DBMS.
• The prototype uses social network data.
• It extracts data from Cassandra, then finds the shortest paths between people.
[Diagram: Application → InfiniteGraph → Cassandra]
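The "extract, then find shortest paths" step can be sketched with a breadth-first search over key-value rows. This is a generic illustration, not the prototype's code: the `rows` dict is a hypothetical stand-in for data already extracted from the key-value store (person → list of contacts), and no Cassandra or InfiniteGraph calls are made.

```python
from collections import deque


def shortest_path(rows, start, goal):
    """Breadth-first search: returns a fewest-hops path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        person = path[-1]
        if person == goal:
            return path
        for contact in rows.get(person, ()):
            if contact not in seen:
                seen.add(contact)
                queue.append(path + [contact])
    return None


# Hypothetical extracted rows: key = person, value = people they link to.
rows = {
    "ann": ["bob", "cy"],
    "bob": ["dee"],
    "cy": ["dee"],
    "dee": ["eve"],
    "eve": [],
}
print(shortest_path(rows, "ann", "eve"))
print(shortest_path(rows, "eve", "ann"))
```

Breadth-first search is used here because every link counts as one hop; weighted edges would call for a different algorithm.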
32. Summary
33. NoSQL?
• All of these features could have been obtained from "Commercial Off The Shelf" ODBMSs:
  - Unique object/document IDs
  - Sharding
  - Shared-Nothing
  - Fully distributed
  - No lock and novel transaction modes
  - Iterators and fast, predictable traversals
  - Fast scans
  - Optimization for random access
  - In Memory Database configurability
  - Flexible object clustering
  - Effectivity (data in a relationship)
  - Geospatial and multi-dimensional indexing
  - Hash table (key-value) lookups
  - Hyperspace == single logical view of a federation
  - Multi-way replication
  - High Availability
  - Text searching
34. Summary
• We need to re-examine the reasons that the NoSQL movement didn't just use ODBMSs.
• The NoSQL movement helps strengthen our argument that RDBMSs aren't always the best choice.
• Graph DBMSs can supplement RDBMS, NoSQL and ODBMS technologies.
• The best Graph DBMSs are built on ODBMSs.
• ODBMSs are here to stay.
35. Questions?
• objectivity.com - White papers, downloads etc.
• EMAIL: info @ objectivity.com
• infinitegraph.com - White papers, downloads etc.
• EMAIL: info @ infinitegraph.com
• Presenter: leon @ objectivity.com
If they give you paper with lines on, write sideways!