Computational Research Division Lawrence Berkeley National Laboratory Dan Gunter
Introduction
Terminology: NOSQL and “Schemaless”
NOSQL past and present
[Timeline: Pre-RDBMS, RDBMS era, NOSQL]
Pre-relational structured storage systems
Computer Systems News, 11/28/83
The relational model
[Figure: a relation R (table), named by a relation variable (table name). Its heading is an unordered set of attributes A1 ... An (columns); each tuple (row) holds Value1 ... Valuen, and the tuples are unordered.]
Recent NOSQL database products
Columnar or Extensible record: Google BigTable, HBase, Cassandra, HyperTable
Document Store: SimpleDB, CouchDB, MongoDB, Lotus Domino, Mnesia
Graph DB: Neo4j, FlockDB, InfiniteGraph
Key/Value Store: Memcached, Redis, Tokyo Cabinet, Dynamo, Project Voldemort, Dynomite, Riak
Why NOSQL?
CAP Theorem
All robust distributed systems live here
Forfeit partition-tolerance: single-site databases, cluster databases, LDAP
Forfeit availability: distributed databases w/pessimistic locking, majority protocols
Forfeit consistency: Coda, web caching, DNS, Dynamo
CAP, ACID, and BASE
[Table: ACID vs. BASE]
Pioneers
These implementations are not publicly available, but the distributed-system techniques that they integrated to build huge databases have been imitated, to a greater or lesser extent, by every implementation that followed.
Google BigTable
BigTable’s Data Model
“Google’s Bigtable is essentially a massive, distributed 3-D spreadsheet. It doesn’t do SQL, there is limited support for atomic transactions, nor does it support the full relational database model. In short, in these and other areas, the Google team made design trade-offs to enable the scalability and fault-tolerance Google apps require.”
- Robin Harris, StorageMojo (blog), 2006-09-08
[Figure: row “com.cnn.www” with columns “contents:”, “anchor:cnnsi.com”, and “anchor:my.look.ca”. The contents column holds three versions of “<html>...” at timestamps t3, t5, and t6; the anchor cells hold “CNN” and “CNN.com”.]
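The “3-D spreadsheet” description can be sketched in a few lines of Python, assuming the data model is a sparse map from (row, column, timestamp) to value. The names `TinyTable`, `put`, `get`, `scan`, and `reverse_domain` are hypothetical and are not Bigtable's API; this is only an illustration of the shape of the data.

```python
from collections import defaultdict


def reverse_domain(host):
    """Turn 'www.cnn.com' into 'com.cnn.www' so sorted rows group by domain."""
    return ".".join(reversed(host.split(".")))


class TinyTable:
    """Toy (row, column, timestamp) -> value map. Hypothetical, not the Bigtable API."""

    def __init__(self):
        # row key -> column name -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (the latest if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def scan(self, prefix):
        """Return row keys that start with `prefix`, in sorted order."""
        return [r for r in sorted(self._rows) if r.startswith(prefix)]
```

Storing the three versions of the page under `reverse_domain("www.cnn.com")` and column `"contents:"` at timestamps 3, 5, and 6 would then let `get(row, "contents:", 4)` return the version written at timestamp 3.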
Tablets and SSTables
Use of Bloom Filters to optimize lookups
[Figure: a Bloom filter bit array for the set { x, y, z }, with a lookup of w]
w is not in { x, y, z } because it hashes to one position with a 0.
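The membership test in the figure can be sketched as follows. This is a minimal Bloom filter under stated assumptions: the bit-array size, the number of hash functions, and the trick of salting SHA-256 with an index are illustrative choices, not what BigTable actually uses.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch; sizes and hashing scheme are assumptions."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(("%d:%s" % (i, item)).encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # A single 0 bit proves absence; all 1s only means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))
```

A lookup that returns False can skip the disk read entirely; a lookup that returns True may still be a false positive, so the underlying store has to be checked.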
Chubby and Paxos
[Figure: five Chubby servers, each with its own DB; one of them is the master]
Each “DB” is a replica.
Each server runs on its own host.
Google tends to run 5 servers, with only one being the “master” at any one time.
What about CAP?
Amazon’s Dynamo
Dynamo data partitioning and replication
[Figure: hash ring using consistent hashing. Three host “nodes” map to 4, 4, and 3 virtual nodes on the ring. Labels: item hashes to this spot; coordinator node; replicas.]
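A rough sketch of the ring in the figure, assuming each virtual node owns the keys between its predecessor and itself, and that replicas are the next distinct hosts clockwise from the coordinator. The class name `HashRing`, the `host#index` naming of virtual nodes, and the use of SHA-256 are assumptions made for illustration only.

```python
import bisect
import hashlib


def ring_hash(text):
    """Map a string to a position on the ring (a 64-bit integer here)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hashing sketch with virtual nodes; not Dynamo's actual code."""

    def __init__(self, vnodes_per_host):
        # vnodes_per_host: e.g. {"hostA": 4, "hostB": 4, "hostC": 3}.
        # A more powerful host can be given more virtual nodes.
        self._ring = sorted(
            (ring_hash("%s#%d" % (host, i)), host)
            for host, count in vnodes_per_host.items()
            for i in range(count)
        )
        self._positions = [pos for pos, _ in self._ring]
        self._num_hosts = len(vnodes_per_host)

    def preference_list(self, key, n=3):
        """Return up to n distinct hosts: the coordinator first, then replicas."""
        start = bisect.bisect_left(self._positions, ring_hash(key))
        hosts = []
        for step in range(len(self._ring)):
            host = self._ring[(start + step) % len(self._ring)][1]
            if host not in hosts:
                hosts.append(host)
            if len(hosts) == min(n, self._num_hosts):
                break
        return hosts
```

Walking past virtual nodes that belong to an already chosen host is what keeps the replicas on distinct physical machines.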
Eventual consistency and sloppy quorum
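For a strict quorum over N replicas, choosing read and write counts with R + W > N guarantees that every read set shares at least one node with every write set. The brute-force check below is only an illustration of that arithmetic (the function name is hypothetical). With a sloppy quorum, R and W count healthy nodes, whose membership can change between the write and the read, so this guarantee does not carry over unchanged.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every read set of size r shares a node with every write set of size w."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )
```

For example, N=3 with R=2 and W=2 always overlaps, while N=3 with R=1 and W=1 does not.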
Replica synchronization with Merkle trees
For Dynamo, the “data” are the keys stored in a given virtual node.
Each node is a hash of its children.
If two top hashes match, then the trees are the same.
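The comparison of top hashes can be sketched as below. This is a simplified illustration: the choice of SHA-256, sorting the keys, and duplicating the last hash on odd-sized levels are assumptions, not details taken from Dynamo.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).hexdigest()


def merkle_levels(keys):
    """Build a Merkle tree bottom-up; returns the levels from leaves to root."""
    level = [_h(k.encode("utf-8")) for k in sorted(keys)] or [_h(b"")]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            # Odd number of hashes: pair the last one with itself.
            level = level + [level[-1]]
        # Each parent is the hash of its two children.
        level = [
            _h((level[i] + level[i + 1]).encode("ascii"))
            for i in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels


def merkle_root(keys):
    return merkle_levels(keys)[-1][0]
```

Two replicas whose roots match can skip synchronization; when the roots differ, comparing child hashes level by level narrows the search to the subtrees that actually diverge.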
Infrastructure (at scale) is fractal
The Gold Rush
Columnar or Extensible record: Google BigTable, HBase, Cassandra, HyperTable
Document Store: SimpleDB, CouchDB, MongoDB, Lotus Domino, Mnesia
Graph DB: Neo4j, FlockDB, InfiniteGraph
Key/Value Store: Memcached, Redis, Tokyo Cabinet, Dynamo, Project Voldemort, Dynomite, Riak, Hibari
Key/Value Store
Memcached, Redis, Tokyo Cabinet, Dynamo, Project Voldemort, Dynomite, Riak, Hibari
Project Voldemort
Type: Key/Value Store
License: Apache 2.0
Language: Java
Company: LinkedIn
Web: project-voldemort.com
Riak
Example: Map/reduce with the Python API
    result = self.client.add(bucket.get_name()).map("Riak.mapValuesJson").reduce("Riak.reduceSum").run()
Type: Key/Value Store
License: Open-Source
Language: Erlang
Company: Basho
Web: wiki.basho.com/display/RIAK/Riak/
Hibari
Type: Key/Value Store
License: Open-Source
Language: Erlang
Company: Gemini Mobile
Web: sourceforge.net/projects/hibari/
Columnar or Extensible record
Google BigTable, HBase, Cassandra, HyperTable
Cassandra
Type: Extensible column store
License: Apache 2.0
Language: Java
Company: Apache Software Foundation
Web: cassandra.apache.org
Document Store
SimpleDB, CouchDB, MongoDB, Lotus Domino, Mnesia
CouchDB
Type: Document store
License: Apache 2.0
Language: Erlang
Company: Apache Software Foundation
Web: couchdb.org
MongoDB
http://www.slideshare.net/mongodb/mongodb-replica-sets
Type: Document store
License: GPL
Language: C++
Company: 10gen
Web: mongodb.org
Mnesia
Type: Document store
License: EPL*
Language: Erlang
Company: Ericsson
Web: www.erlang.org
Papers: http://www.erlang.se/publications/mnesia_overview.pdf
* Mozilla Public License modified to conform with laws of Sweden (more herring)
Why do we care about Mnesia / OTP?
    females() ->
        F = fun() ->
                Q = query [E.name || E <- table(employee),
                                     E.sex = female] end,
                mnemosyne:eval(Q)
            end,
        mnesia:transaction(F).
Erlang query for “all females” in company*
* I know, but it’s not my example. This is right out of the manual.
Comparison of MongoDB and CouchDB
Database          Inserts/sec
MongoDB           16,000
CouchDB           70
CouchDB, batch    1,800
Schemaless data modeling http://labs.mudynamics.com/2010/04/01/why-nosql-is-bad-for-startups/
Example from distributed monitoring
All of these are data modeling “anti-patterns” for relational DBs.
What’s wrong with EAV?
What about queries?
SQL vs. M/R and other models
Conclusions

Schemaless Databases

Editor's Notes

  1. This talk really comes out of my attempt to orient myself in this space. Background is in monitoring distributed systems, concerned with scalable collection and data analysis. But also want to know what I can use for semi-structured data “in the small”.
  2. Where it applies, the distinction between relatively fixed schemas and dynamic ones is more technically significant than what query syntax is used to access the data, as has been shown by a number of products that provided a dialect of SQL as an alternative query language either alongside or on top of their native syntax.
  3. PICK -- MultiValue (aka PICK) databases are developed at TRW in 1965.
     M[umps] -- According to a comment from Scott Jones, M[umps] is developed at Mass General Hospital in 1966. It is a programming language that incorporates a hierarchical database with B+ tree storage.
     IBM IMS -- IBM IMS, a hierarchical database, is developed with Rockwell and Caterpillar for the Apollo space program in 1966.
     ISM -- InterSystems develops the ISM product family, succeeded by the Open M product, all M[umps] implementations.
     ANSI M -- M[umps] is approved as an ANSI standard language in 1977.
     AT&T DBM -- In 1979 Ken Thompson creates DBM, which is released by AT&T. At its core it is a file-based hash.
     TDBM -- TDBM supports atomic transactions.
     NDBM -- NDBM was the Berkeley version of DBM, supporting having multiple databases open at the same time.
     SDBM -- SDBM is another clone of DBM, mainly for licensing reasons.
     GT.M -- GT.M is the first version of a key-value store with a focus on high-performance transaction processing. It is open sourced in 2000.
     BerkeleyDB -- BerkeleyDB is created at Berkeley in the transition from 4.3BSD to 4.4BSD. Sleepycat Software is started as a company in 1996 when Netscape needed new features for BerkeleyDB. It is later acquired by Oracle, which still sells and maintains BerkeleyDB.
     Lotus Domino -- Lotus Notes, or rather its server part, Lotus Domino, which really is a document database, has its initial release in 1989 and is now sold by IBM. It has evolved a lot from the early versions and is now a full office and collaboration suite.
     GDBM -- GDBM is the GNU project clone of DBM.
     Mnesia -- Mnesia is developed by Ericsson as a soft real-time database to be used in telecom. It is relational in nature but does not use SQL as its query language, but rather Erlang itself.
     Caché -- InterSystems Caché launched in 1997 and is a hybrid, so-called post-relational database. It has object interfaces, SQL, PICK/MultiValue, and direct manipulation of data structures. It is an M[umps] implementation.
     Metakit -- Metakit is started in 1997 and is probably the first document-oriented database. It supports smaller datasets than the ones in vogue nowadays.
     Neo4j -- The graph database Neo4j is started in 2000.
     db4o -- db4o, an object database for Java and .NET, is started in 2000.
     QDBM -- QDBM is a re-implementation of DBM with better performance, by Mikio Hirabayashi.
     Memcached -- Memcached is started in 2003 by Danga to power LiveJournal. Memcached isn't really a database since it's memory-only, but there is soon a version with file storage called memcachedb.
     Infogrid graph DB -- The Infogrid graph database is started as closed source in 2005 and open sourced in 2008.
     CouchDB -- CouchDB is started in 2005 and provides a document database inspired by Lotus Notes. The project moves to the Apache Foundation in 2008.
     Google BigTable -- Google BigTable is started in 2004 and the research paper is released in 2006.
     JackRabbit -- JackRabbit is started in 2006 as an implementation of JSR 170 and 283.
     Tokyo Cabinet -- Tokyo Cabinet is a successor to QDBM (also by Mikio Hirabayashi), started in 2006.
     Dynamo -- The research paper on Amazon Dynamo is released in 2007.
     MongoDB -- The document database MongoDB is started in 2007 as part of an open-source cloud computing stack, with its first standalone release in 2009.
     Cassandra -- Facebook open-sources the Cassandra project in 2008.
     Voldemort -- Project Voldemort is a replicated database with no single point of failure. Started in 2008.
     Dynomite -- Dynomite is a Dynamo clone written in Erlang.
     Terrastore -- Terrastore is a scalable, elastic document store started in 2009.
     Redis -- Redis is a persistent key-value store started in 2009.
     Riak -- Riak is another Dynamo-inspired database, started in 2009.
     HBase -- HBase is a BigTable clone for the Hadoop project, while Hypertable is another BigTable-type database, also from 2009.
     Vertexdb -- Vertexdb, another graph database, is started in 2009.
     Term: NOSQL -- Eric Evans of Rackspace, a committer on the Cassandra project, introduces the term NoSQL, often used in the sense of “Not only SQL”, to describe the surge of new projects and products.
  4. Both of these systems are still used. An open-source version of M, called GT.M, is available (since 2000). M is still used by the US Dept of Veterans Affairs, and also by Ameritrade (Caché: 12B transactions a day), ING Direct, and others in the financial industry. The IBM IMS system is still very actively used today, in particular for the US Federal Reserve. According to Wikipedia, odds are good your ATM transaction hits an IMS database. Chinese banks have purchased IMS technology. IMS includes a separate “transaction management” (TM) system.
  5. E. F. Codd’s seminal 1970 paper, “A Relational Model of Data for Large Shared Data Banks”, laid out a solid mathematical basis for databases, in contrast to the hierarchical and network models of the time. Relational algebra, an offshoot of first-order logic, provided a declarative means of reasoning about the data that did not depend on the implementation. SQL is “loosely based” on relational algebra.
  6. This taxonomy will be explored in more detail later. The point for now is that there are several different types of datastores and a number of examples of each and, referring back to the timeline, most of these implementations have appeared in the past few years.
  7. Corporations (once again) found themselves at the forefront of systems research. But what was that research? (Read on..)
  8. If nothing else, you will be able to refer to the “CAP theorem” the next time your networked demo breaks.
  9. In his talk, Brewer said “there is almost no work in this area”. I think that the existence of scalable (schemaless) database systems is proof that this has changed.
  10. Pictured is Parliament, pioneers of funk!
  11. Trivia: what major movie was about producing a script called “Chubby Rain”?
  12. Example of a BigTable that stores web pages (directly out of the paper). The row names are reversed URLs (so sorted rows tend to group things by the same domain). There are two column families, “contents” and “anchor”. In this example, each anchor cell has one version, and the contents column has 3.
  13. Paxos is an old and well-known algorithm. The Chubby “Database” is really a set of directories with small “lockfiles”. Each tablet server gets one Chubby directory, and each of its tablets is a lockfile.
  14. These core services included the Amazon e-commerce shopping cart.
  15. Each virtual node is responsible for keys between itself and its predecessor on the ring. The mapping of a single node to a variable number of virtual nodes on the hash ring accounts for heterogeneity (host “power”) in the system.
  16. The quorum is “sloppy” because R and W refer to the number of healthy nodes, which may change between the write and subsequent read of the key.
  17. (Who knows what this is?) The picture is a close-up of a vegetable: the “Chou Romanesco” cauliflower.
  18. Particularly appropriate analogy because of the industry’s tendency to rush towards shiny new technologies! Following sections will examine each of these categories and walk through one publicly available product (or more) for each. With the exception of graph databases, which I simply haven’t taken the time to grok yet.
  19. Both Voldemort and the next database, Riak, claim they were “inspired” by the early Dynamo paper
  20. In the diagram, the green nodes are head; orange middle; red are tails. The white arrows are write requests, grey read requests, and red are (all) replies.
  21. Developed by former engineers from BigTable and Dynamo projects, in heavy use at Facebook. For consistency level: Zero = totally asynchronous; Any = 1 node, including hinted handoff; Quorum = R/2+1, where R = number of replicas. Reads at Zero or Any don’t make sense: Zero = no data, Any = wrong node; can’t do read-repairs, just the handed-off version.
  22. Has a nice Web UI called “Futon”. Yes, everything is a reclining furniture pun.
  23. Obviously, this is at best a micro-benchmark. YCSB stands for Yahoo! Cloud Serving Benchmark
  24. I won’t attempt to actually cover Map/Reduce, and don’t know Erlang. Instead: what impact do these databases have on data modeling efforts?