The document discusses NoSQL databases and their different classes, including column stores, document stores, and key-value stores. It provides examples of column store databases BigTable and HBase, and notes that document stores like CouchDB allow data to be stored without a predefined schema. The document also discusses object databases and their advantages over relational databases in avoiding the object-relational impedance mismatch.
1. 2 December 2005
Introduction to Databases
NoSQL Databases
Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
http://www.beatsigner.com
2. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 2May 19, 2017
NoSQL Databases
Recently, the term NoSQL databases has been
introduced for different non-RDBMS solutions
non-relational, horizontally scalable, distributed, ...
often ACID properties not fully guaranteed
- e.g. eventual consistency
many solutions driven by web application requirements
different classes of NoSQL solutions
- object databases (db4o, ObjectStore, Objectivity, Versant, ...)
- column stores (BigTable, HBase, ...)
- document stores (CouchDB, MongoDB, ...)
- key-value (tuple) stores (Membase, Redis, ...)
- graph databases (Neo4j, …)
- XML databases (Tamino, BaseX, ...)
- ...
Column Stores
Solutions for large scale distributed storage systems
very large "tables" with billions of rows and millions of columns
petabytes of data across thousands of servers
BigTable
distributed storage solution for structured data used by Google
HBase
distributed open source database (similar to BigTable)
part of the Apache Hadoop project
use MapReduce framework for processing
- map step
• master node divides problem into subproblems and delegates them to child nodes
- reduce step
• master node integrates solutions of subproblems
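The two steps can be illustrated with a small single-process word count. This is only a sketch under stated assumptions: the class `WordCountSketch` and its method names are hypothetical, the child nodes are simulated by plain method calls, and no Hadoop or HBase API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration of the map and reduce steps (not the Hadoop API).
final class WordCountSketch {

    // Map step: one subproblem (a chunk of text) is turned into partial word counts.
    static Map<String, Integer> map(String chunk) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String word : chunk.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                partial.merge(word, 1, Integer::sum);
            }
        }
        return partial;
    }

    // Reduce step: the partial results of all subproblems are integrated.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    // Divides the input into chunks, "delegates" each chunk and combines the results.
    static Map<String, Integer> count(List<String> chunks) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (String chunk : chunks) {
            partials.add(map(chunk)); // would run on a child node in a real cluster
        }
        return reduce(partials);
    }
}
```

In a real cluster the chunks would be processed in parallel on different servers; the sequential loop here only keeps the example small.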
Document Stores
Data no longer stored in tables
Each record (document) might have a different format
(number and size of fields)
Apache CouchDB is an example of a free
and open source document-oriented database
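The idea that each document may have its own set of fields can be sketched with plain Java maps. The class `DocumentStoreSketch`, the document ids and the field values are hypothetical and chosen only for illustration; this is an in-memory stand-in and not the CouchDB API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a document store (not the CouchDB API).
final class DocumentStoreSketch {
    private final Map<String, Map<String, Object>> documents = new LinkedHashMap<>();

    // Stores a document under an id; no schema is checked.
    void put(String id, Map<String, Object> document) {
        documents.put(id, document);
    }

    Map<String, Object> get(String id) {
        return documents.get(id);
    }

    static DocumentStoreSketch example() {
        DocumentStoreSketch store = new DocumentStoreSketch();
        // Two documents with a different number and kind of fields.
        store.put("cd-1", Map.of("title", "Album A", "length", 2700));
        store.put("cd-2", Map.of("title", "Album B",
                                 "artists", List.of("Artist X", "Artist Y"),
                                 "label", "Label Z"));
        return store;
    }
}
```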
Impedance Mismatch Revisited
Combination of SQL with a host language
mix of declarative and procedural programming paradigms
two completely different data models
different set of data types
Interfacing with SQL is not straightforward
data has to be converted between host language and SQL due to
the impedance mismatch
~30% of the code and effort is used for this conversion!
The problem gets even worse if we would like to use an
object-oriented host language
two approaches to deal with the problem
- object databases (object-oriented databases)
- object-relational databases
Impedance Mismatch Revisited ...
Note that it would be easier to use the SQL AVG aggregate function
public float getAverageCDLength() {
  float result = 0.0f;
  // try-with-resources closes the connection, statement and result set
  try (Connection conn = this.openConnection();
       Statement s = conn.createStatement();
       ResultSet set = s.executeQuery("SELECT length FROM CD")) {
    int i = 0;
    // sum up the length of every CD on the client side
    while (set.next()) {
      result += set.getInt(1);
      i++;
    }
    // avoid a division by zero if the CD table is empty
    return (i == 0) ? 0 : result / i;
  } catch (SQLException e) {
    System.out.println("Calculation of average length failed.");
    return 0;
  }
}
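For comparison, the aggregation can be left to the DBMS. The following is a minimal sketch, assuming the same CD table with a length column as in the example above; the class name `AverageLengthSketch` and the way the connection is passed in are assumptions made for illustration.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch only: the CD table and its length column are taken from the slide example;
// the class name and the connection handling are assumptions.
final class AverageLengthSketch {
    static final String AVG_QUERY = "SELECT AVG(length) FROM CD";

    // Lets the DBMS compute the average instead of iterating over all rows in Java.
    static float averageCDLength(Connection conn) throws SQLException {
        try (Statement s = conn.createStatement();
             ResultSet set = s.executeQuery(AVG_QUERY)) {
            // AVG over zero rows yields SQL NULL, which getFloat reports as 0
            return set.next() ? set.getFloat(1) : 0;
        }
    }
}
```

Only a single value is transferred from the database, but the result still has to be converted from an SQL type to a Java type, so the impedance mismatch remains.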
Object Databases
ODBMSs use the same data model as object-oriented
programming languages
no object-relational impedance mismatch (due to uniform model)
An object database combines the features of an object-
oriented language and a DBMS (language binding)
treat data as objects
- object identity
- attributes and methods
- relationships between objects
extensible type hierarchy
- inheritance, overloading and overriding as well as customised types
declarative query language
Persistent Programming Languages
Several approaches have been proposed to make
transient programming language objects persistent
persistence by class
- declare that a class is persistent
- all objects of a persistent class are persistent whereas objects of
non-persistent classes are transient
- not very flexible; we would like to have persistent and transient objects
from a single class
- many ODBMSs provide a mechanism to make classes persistence-capable
persistence by creation
- introduce new syntax to create persistent objects
- object is either persistent or transient depending on how it was created
persistence by marking
- mark objects as persistent after creation but before the program terminates
Persistent Programming Languages ...
persistence by reachability
- one or more objects are explicitly declared as persistent objects (root objects)
- all the other objects are persistent if they are reachable from a root object via
a sequence of one or more references
- easy to make entire data structures persistent
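Persistence by reachability amounts to a graph traversal that starts at the root objects. The sketch below assumes a hypothetical `GraphObject` class with an explicit list of references; a real ODBMS would follow the fields of arbitrary objects instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical object with outgoing references (stands in for arbitrary application objects).
final class GraphObject {
    final String name;
    final List<GraphObject> references = new ArrayList<>();
    GraphObject(String name) { this.name = name; }
}

final class ReachabilitySketch {
    // Returns all objects reachable from the given roots, including the roots themselves.
    static Set<GraphObject> persistentObjects(List<GraphObject> roots) {
        // identity-based set: objects are distinguished by identity, not by value
        Set<GraphObject> reachable = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<GraphObject> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            GraphObject current = pending.pop();
            if (reachable.add(current)) {           // first visit (also handles cycles)
                pending.addAll(current.references);
            }
        }
        return reachable;
    }
}
```

Every object in the returned set would be stored; objects that are not reachable from any root remain transient.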
ObjectStore Example
Persistence by reachability via specific database roots
Persistence-capable classes
post-processor makes specific classes persistence-capable
Persistence-aware classes
can access and manipulate persistent objects (but are not persistent themselves)
Three states after a persistent object has been loaded
hollow: proxy with load on demand (lazy loading)
active: loaded in memory and flag set to clean
stale: no longer valid (e.g. after a commit)
Person ariane = new Person("Ariane Peeters");
db.createRoot("Persons", ariane);
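The three states listed above can be pictured as a small state machine around a lazily loaded object. The following sketch is not the ObjectStore API: the class `PersistentRef`, its `State` enum and the `commit` method are hypothetical names, and the database read is replaced by a `Supplier`.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the hollow/active/stale states; this is not the ObjectStore API.
final class PersistentRef<T> {
    enum State { HOLLOW, ACTIVE, STALE }

    private final Supplier<T> loader; // stands in for a database read
    private T value;
    private State state = State.HOLLOW;

    PersistentRef(Supplier<T> loader) { this.loader = loader; }

    State state() { return state; }

    // First access loads the object on demand (hollow -> active).
    T get() {
        if (state == State.STALE) {
            throw new IllegalStateException("object is no longer valid");
        }
        if (state == State.HOLLOW) {
            value = loader.get();
            state = State.ACTIVE;
        }
        return value;
    }

    // Simulates a commit: the in-memory copy must not be used any more.
    void commit() {
        value = null;
        state = State.STALE;
    }
}
```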
ObjectStore Example ...
Post processing
(1) compile all source files
(2) post-process the class files to generate annotated versions of
the class files
(3) run the post-processed main class
javac *.java
osjcfp -dest . -inplace *.class
java mainClass
ODBMS History
First generation ODBMS
1984
- George P. Copeland and David Maier,
Making Smalltalk a Database System,
SIGMOD 1984
1986
- G-Base (Graphael, F)
1987
- GemStone (Servio Corporation, USA)
1988
- Vbase (Ontologic)
- Statice (Symbolics)
ODBMS History ...
Second generation ODBMS
1989
- Ontos (Ontos)
- ObjectStore (Object Design)
- Objectivity (Objectivity)
- Versant ODBMS (Versant Object Technology)
1989
- The Object-Oriented Database System Manifesto
Third generation ODBMS
1990
- Orion/Itasca (Microelectronics and Computer Technology Corporation, USA)
- O2 (Altaïr, F)
- Zeitgeist (Texas Instruments)
ODBMS History ...
Further developments
1991
- foundation of the Object Database Management Group (ODMG)
1993
- ODMG 1.0 standard
1996
- PJama (Persistent Java)
1997
- ODMG 2.0 standard
1999
- ODMG 3.0 standard
2001
- db4o (database for objects)
...
The Object-Oriented Database Manifesto
Malcolm Atkinson, François Bancilhon,
David DeWitt, Klaus Dittrich,
David Maier and Stanley Zdonik,
The Object-Oriented Database
System Manifesto, 1989
The Object-Oriented Database Manifesto ...
The Object-Oriented Database System Manifesto by
Atkinson et al. was an attempt to define object-oriented
databases
defines 13 mandatory features that an
object-oriented database system must have
- 8 object-oriented system features
- 5 DBMS features
optional features
- multiple inheritance, type checking, versions, ...
open features
- points where the designer can make a number of choices
The Object-Oriented Database Manifesto ...
Object-oriented system features
complex objects
- complex objects built from simple ones by constructors (e.g. set, tuple and list)
- constructors must be orthogonal
object identity
- two objects can be identical (same object) or equal (same value)
encapsulation
- distinction between interface (public) and implementation (private)
types and classes
- type defines common features of a set of objects
- class as a container for objects of the same type
type and class hierarchies
overriding, overloading and late binding
The Object-Oriented Database Manifesto ...
computational completeness
- should be possible to express any computable function using the DML
extensibility
- set of predefined types
- no difference in usage of system and user-defined types
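Two of the object-oriented features listed above, object identity and overriding with late binding, map directly to Java. The classes `Person`, `Professor` and `IdentitySketch` below are hypothetical and serve only as an illustration.

```java
import java.util.Objects;

// Hypothetical classes used only to illustrate identity, equality and late binding.
class Person {
    final String name;
    Person(String name) { this.name = name; }

    String role() { return "person"; }

    // Two persons are equal (same value) if they have the same name.
    @Override public boolean equals(Object o) {
        return o instanceof Person && ((Person) o).name.equals(name);
    }
    @Override public int hashCode() { return Objects.hash(name); }
}

class Professor extends Person {
    Professor(String name) { super(name); }
    @Override String role() { return "professor"; } // overriding
}

final class IdentitySketch {
    static boolean identical(Object a, Object b) { return a == b; }   // same object
    static boolean equal(Object a, Object b) { return a.equals(b); }  // same value
}
```

Calling `role()` on a variable of type `Person` that refers to a `Professor` returns "professor", since the method is selected at run time (late binding).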
DBMS features
persistence
- orthogonal persistence (persistence capability does not depend on the type)
secondary storage management
- index management, data clustering, data buffering, access path selection and
query optimisation
concurrency
- atomicity, consistency, isolation and durability (ACID)
- serialisability of operations
The Object-Oriented Database Manifesto ...
recovery
- in case of hardware or software failures, the system should recover
ad hoc query facility
- high-level declarative query language
The OODBMS Manifesto led to discussions and
reactions from the RDBMS community
Third-Generation Database System Manifesto, Stonebraker et al.
The Third Manifesto, Darwen and Date
Issues not addressed in the manifesto
database evolution
constraints
object roles
...
Object Data Management Group (ODMG)
Object Database Management
Group (ODMG) was founded in 1991
by Rick Cattell
standardisation body including all major
ODBMS vendors
Defines a standard to increase the portability
across different ODBMS products
Object Model
Object Definition Language (ODL)
Object Query Language (OQL)
language bindings
- C++, Smalltalk and Java bindings
ODMG Object Model
ODMG object model is based on the OMG object model
Basic modelling primitives
object: unique identifier
literal: no identifier
An object's state is defined by the values it carries for a
set of properties (attributes or relationships)
An object's behaviour is defined by the set of operations
that can be executed
Objects and literals are categorised by their type
(common properties and common behaviour)
Object Definition Language (ODL) Example
[Figure: class diagram]
classes: Assistant, Professor, Employee, Salary, Lecture, Exercise, Session, Course, StudentI, Student
relationships and their inverses: teaches/isTaughtBy, leads/isLeadBy, hasPrerequisites/isPrerequisiteFor, attends/isAttendedBy, hasSessions/isSessionOf
legend: one-to-one, one-to-many, many-to-many, is-a, extends
Object Databases
Many ODBMS also implement a versioning mechanism
Many operations are performed by using a navigational
rather than a declarative interface
following pointers
In addition, an object query language (OQL) can be used
to retrieve objects in a declarative way
some systems (e.g. db4o) also support native queries
Faster access than RDBMS for many tasks
no join operations required
However, object databases lack a formal mathematical
foundation!
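The difference between navigational access and a declarative query expressed in the host language can be sketched in plain Java. No ODBMS API is used here: the classes `Course`, `Student` and `AccessSketch` are hypothetical, and the "database" is just an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical domain classes; no ODBMS API is used.
final class Course {
    final String title;
    final List<Student> attendees = new ArrayList<>();
    Course(String title) { this.title = title; }
}

final class Student {
    final String name;
    final int year;
    Student(String name, int year) { this.name = name; this.year = year; }
}

final class AccessSketch {
    // Navigational access: follow the references from a course to its students.
    static List<String> attendeeNames(Course course) {
        List<String> names = new ArrayList<>();
        for (Student s : course.attendees) {
            names.add(s.name);
        }
        return names;
    }

    // Declarative style: state the condition and let the "database" find the matches.
    static List<Student> query(List<Student> extent, Predicate<Student> condition) {
        return extent.stream().filter(condition).collect(Collectors.toList());
    }
}
```

A call such as `AccessSketch.query(students, s -> s.year > 2)` states the condition in the host language itself, which is the idea behind native queries; no join is needed in `attendeeNames` because the course already holds references to its students.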
Object-Relational Mapping (ORM)
"Automatic" mapping of object-oriented model to
relational database
developer has to deal less with persistence-related programming
Hibernate
mapping of Java types to SQL types
generates the required SQL statements behind the scenes
standalone framework
Java Persistence API (JPA)
part of the Enterprise JavaBeans (EJB) 3.0 standard
use annotations to define mapping
javax.persistence package
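How an annotation-based mapping can drive SQL generation may be sketched with a toy mapper. To keep the example self-contained, the annotations `Table` and `Column` are defined locally and only mimic the idea of the javax.persistence annotations; they are hypothetical, as are the classes `CD` and `MiniMapper`, and neither Hibernate nor JPA works exactly like this.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical annotations that only mimic the idea of javax.persistence mappings.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface Table { String name(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Column { String name(); }

@Table(name = "CD")
final class CD {
    @Column(name = "title") String title;
    @Column(name = "length") int length;
}

final class MiniMapper {
    // Derives a parameterised INSERT statement from the annotations of a class
    // (assumes that the class carries a @Table annotation).
    static String insertStatement(Class<?> type) {
        List<String> columns = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column != null) {
                columns.add(column.name());
            }
        }
        Collections.sort(columns); // reflection does not guarantee the field order
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + type.getAnnotation(Table.class).name()
             + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }
}
```

For the `CD` class the mapper produces `INSERT INTO CD (length, title) VALUES (?, ?)`, a statement the developer never has to write by hand.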
Object-Relational Databases
The object-relational data model extends the relational
data model
introduces complex data types
object-oriented features
extended version of SQL to deal with the richer type system
Complex data types
new collection types including multisets and arrays
attributes are no longer restricted to atomic values (1NF) but can also
contain collections
nest and unnest operations for collection type attributes
ER concepts such as composite attributes or multivalued
attributes can be directly represented in the object-relational data
model
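The effect of the nest and unnest operations can be illustrated outside of SQL with Java collections. This is only an analogy under stated assumptions: the class `NestSketch` is hypothetical, a "relation" is modelled as a map from a title to a collection-valued keyword attribute, and the flat form is a list of (title, keyword) pairs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical analogy for nest/unnest on a collection-valued attribute.
final class NestSketch {
    // Unnest: one (title, keyword) pair per element of the collection-valued attribute.
    static List<String[]> unnest(Map<String, List<String>> books) {
        List<String[]> flat = new ArrayList<>();
        books.forEach((title, keywords) ->
            keywords.forEach(keyword -> flat.add(new String[] {title, keyword})));
        return flat;
    }

    // Nest: group the flat pairs back into one collection per title.
    static Map<String, List<String>> nest(List<String[]> flat) {
        Map<String, List<String>> books = new LinkedHashMap<>();
        for (String[] pair : flat) {
            books.computeIfAbsent(pair[0], t -> new ArrayList<>()).add(pair[1]);
        }
        return books;
    }
}
```

The flat list corresponds to a relation in first normal form, while the map corresponds to a relation with a collection type attribute.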
Object-Relational Databases ...
Since SQL:1999 we can define user-defined types
Type inheritance can be used for inheriting attributes of
user-defined types
Object vs. Object-Relational Databases
Object databases
complex datatypes
tight integration with an object-oriented programming language
(persistent programming language)
high performance
Object-relational databases
complex datatypes
powerful query languages
good protection of data from programming errors
Homework
Study the following chapters of the
Database System Concepts book
chapter 22
- sections 22.1-22.11
- Object-based Databases
Exercise 11
Transaction Management
References
A. Silberschatz, H. Korth and S. Sudarshan,
Database System Concepts (Sixth Edition),
McGraw-Hill, 2010
Malcolm Atkinson, François Bancilhon, David DeWitt,
Klaus Dittrich, David Maier and Stanley Zdonik,
The Object-Oriented Database System Manifesto, 1989
Eric Redmond and Jim Wilson,
Seven Databases in Seven Weeks: A Guide to
Modern Databases and the NoSQL Movement,
Pragmatic Bookshelf, May 2012, ISBN-13: 978-1934356920