Is emergence of NoSQL killed RDBMS and SQL? This slide discusses what is NoSQL and it's history. This also discusses briefly about polyglot persistence.
Multiple time frame trading analysis -brianshannon.pdf
To SQL or NoSQL, that is the question
1. To SQL or
NoSQL?
That is the
question
Krishnakumar S
D E V C O N Kochi
28th March 2015
2. History of Database Systems
1960’s : Hierarchical and Network (IMS, CODASYL etc.)
1970’s : Beginning of theory of relational model of database
1980’s : Rise of RDBMS and SQL
1990’s : Spreadsheets and MySQL; evolution of web
2000’s : Large enterprise & open source; Google & Amazon
2010’s : Emergence of NoSQL systems
2020’s : NewSQL?
CAP
theorem
3. RDBMS
Strong foundation – Relational Model
Highly Structured – rows, columns, data types
Structured Query Language - standardized
ACID properties – all or nothing
Joins – new views from relationships
4. RDBMS – Weakness
Joins – Not scalable
Transactions – Read & write operations will
be slow because of locking resources
Fixed definitions – Difficult to work with
highly variable data
Document integration – difficult create reports
based on structured & unstructured data
9. What drives to NoSQL?
Velocity Agility
Volume
Variability
10. Any existing solution?
• Data partitioning
• Replication
• Clustering
• Query distribution
• Load balancing
• Consistency/Syncing
• Latency/Concurrency
• Network bottle neck
• Multiple data centers
• Distributed backups
• Node failures
• Voting algorithms for failure detection
• Administration of many systems
• Monitoring
RDBMS is scalable only if designed & administered correctly (Period)
11. NoSQL! What is in a name?
1998 :
• Carlo Strozzi developed a open-source relational database “Strozzi NoSQL”
• Database stores tables as ASCII files; tuples as tab separated values
• It doesn’t use SQL as query language – so given the name “NoSQL”
• Instead it used UNIX shell script and pipeline to retrieve data
Irony! A relational database is named as NoSQL!
2009 :
• Johan Oskarsson organized a meetup of people developing open-source,
distributed, non relational databases on June 11, 2009
• He wanted a simple twitter hash tag for the meetup; quick, memorable, & helps
Google search
• Eric Evans come up with the name NoSQL, for the single meetup
12. NoSQL! What is in a name?
• The name is negative
• The name does not describe the purpose of their meet up
• The name does not define the new database system
• But; the name just satisfied the twitter tag! And caught on like wildfire
What does it stands for!
• “No to SQL”? Not exactly
• “Not Only SQL”? Then what about SQL Server, Oracle etc.?
The answer is “You don’t worry about what it stands for!
13. NoSQL
• The NoSQL is a movement
• The NoSQL is an ecosystem for future database technology
• NoSQL is an accidental neologism. There is no prescriptive definition
Characteristics of NoSQL
• Not using the relational model
• Running well in clusters
• Open-source
• Built for 21st century web estates
• Schemaless
The most important result of NoSQL movement is; Polyglot Persistence
Theorems Ahead!
14. Brewer’s CAP theorem
• In 2000, Eric Brewer presented the CAP principle as conjuncture
• In 2002, Seth Gilbert & Nancy Lynch published a formal proof and rendered
the principle as CAP theorem
There are three essential system requirements necessary for the successful
design, implementation, and deployment of applications in distributed
computing
1. Consistency
2. Availability
3. Partition Tolerance
In majority of instances, a distributed system can only guarantee any two, not
all three
15. Brewer’s CAP theorem
Consistency refers to whether a system operates fully or not. Do all nodes
within a cluster see all the data they are supposed to? This is the same
idea presented in ACID
Availability means just as it sounds. Is the given service or system available
when requested? Does each request get a response outside of failure or
success?
Partition Tolerance represents the fact that a given system continues to
operate even under circumstances of data loss or system failure. A single
node failure should not cause the entire system to collapse.
In large scale, distributed, non relational systems, they need availability and
partition tolerance, so consistency suffers and ACID collapses
16. Brewer’s CAP theorem
Pick any two
CA AP
CP
RDBMS’s
SQL Server
Oracle
MySQL etc.
Availability
Each client can always read and
write
Consistency
All clients always have he same
view
of data
Partition
Tolerance
The system works well despite
physical
Network partitions
Bigtable, MongoDB, BerkleyDB, MemcacheDB, Hbase etc
Cassandra
CouchDB
Dynamo
Voldemort
17. BASE
Basically Available : states that the system does guarantee the availability
of the data as regards CAP Theorem; there will be a response to any
request. But, that response could still be ‘failure’ to obtain the requested
data or the data may be in an inconsistent or changing state
Soft state : The state of the system could change over time, so even during
times without input there may be changes going on due to ‘eventual
consistency,’ thus the state of the system is always ‘soft.’
Eventual Consistency : The system will eventually become consistent once
it stops receiving input. The data will propagate to everywhere it should
sooner or later, but the system will continue to receive input and is not
checking the consistency of every transaction before it moves onto the
next one
It’s OK to use stale data; it’s OK to give approximate answers.
18. NoSQL Data Architecture Patterns
Key-Value
key value
key value
key value
key value
Column-Family
Graph Document
19. Key-Value
Key-Value
key value
key value
key value
key value
Keys used to access opaque
blobs of data
Values can contain any type of
data (images, video)
Pros: scalable, simple API (put,
get, delete)
Cons: no way to query based on
the content of the value
20. Column family
Column-Family
Key includes a row, column
family and column name
Store versioned blobs in one
large table
Queries can be done on rows,
column families and column
names
Pros: Good scale out
Cons: Can not query blob
content, row and column designs
are critical
21. Graph Store
Graph Data is stored in a series of nodes
and properties
Queries are really graph traversals
Ideal when relationships between
data is key:
e.g. social networks
Pros: fast network search, works
with public linked data sets
Cons: Poor scalability when graphs
don't fit into RAM, specialized query
language
22. Document Store
Document Data stored in nested hierarchies
Logical data remains stored
together as a unit
Any item in the document can be
queried
Pros: No object-relational
mapping layer, ideal for search
Cons: Complex to implement,
incompatible with SQL
25. Polyglot Persistence
Different database systems are designed to solve different problems
Using single database engine for all the requirements leads to non-
performant solutions
The solution is polyglot persistence; a hybrid approach to data
persistence
27. References
• Making Sense of NoSQL – Dan McCreary and Ann Kelly
• NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot
Persistence - Pramod J. Sadalage and Martin Fowler
• Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and
Polyglot Persistence - John Sharp, Douglas McMurtry, Andrew
Oakley, Mani Subramanian, Hanzhong Zhang