Polyglot Persistence
Choosing the right persistence option for the task at hand
SoftUni Team
Stamo Petkov
Software University
http://softuni.bg
Who am I?
Stamo Petkov
Information Services JSC (Информационно обслужване АД)
Microsoft Technologies Department
s.g.petkov@is-bg.com
stamo.petkov@gmail.com
https://github.com/stamo
http://www.stamopetkov.eu
http://bg.linkedin.com/in/stamopetkov
https://www.facebook.com/stamo.petkov
@stamo_petkov
Table of Contents
1. What does “Polyglot Persistence” mean?
2. Why do we need it?
3. What options do we have?
   RDBMS
   Document stores
   Key-value pairs
   BLOB storage
   Table storage
   Graph DBs
   Message Queues
4. Conclusions
The origins of Polyglot Persistence
“Polyglot Persistence is all about choosing the right persistence option for the task at hand.” (Scott Leberknight, 2008)
Gained popularity in 2011 with Martin Fowler’s diagram of a “Retailer’s Web Application”
Makes even more sense with rapidly emerging cloud technologies
Data production
“Every two days now we create as much information as we did from the dawn of civilization up until 2003. That’s something like five exabytes of data.” (Eric Schmidt, 4 August 2010)
One minute on the Internet
Learn more at http://www.domo.com
Persistence storage scalability
Scalability and performance
Vertical scaling: pros and cons
Horizontal scaling: pros and cons
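Horizontal scaling usually means spreading the data across several nodes instead of buying one bigger machine. A minimal sketch, assuming simple hash-based partitioning (hash the key, take it modulo the node count); the node names and the `node_for`/`distribute` helpers are hypothetical and not taken from any particular product:

```python
import hashlib

# Hypothetical cluster of three storage nodes.
NODES = ["node-a", "node-b", "node-c"]


def node_for(key: str, nodes: list[str] = NODES) -> str:
    """Pick the node that owns `key` using hash-mod-N partitioning."""
    # md5 gives a stable hash across processes; Python's built-in hash() is salted.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def distribute(keys: list[str], nodes: list[str] = NODES) -> dict[str, list[str]]:
    """Group keys by the node that would store them."""
    placement: dict[str, list[str]] = {n: [] for n in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    users = [f"user:{i}" for i in range(9)]
    for node, owned in distribute(users).items():
        print(node, owned)
```

With plain hash-mod-N, adding a node changes the owner of most keys, which is why many horizontally scaled stores use consistent hashing or range partitioning instead.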
Relational Database Management Systems
Oracle, SQL Server, Azure SQL, PostgreSQL, MySQL
Relational databases have been around for over four decades, which counts for a lot in the IT world.
Well-known language: SQL was developed in the early 1970s and standardized in 1986
Simple relational model
Solid theoretical basis and normalization rules
Large pool of existing expertise
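To illustrate the relational model and normalization, here is a small sketch using Python's built-in `sqlite3` module. SQLite stands in for any RDBMS, and the `customers`/`orders` schema is a made-up example: each customer is stored once, orders reference it through a foreign key, and a JOIN recombines the data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two normalized tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total       REAL NOT NULL
        );
        """
    )
    conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                     [(1, "Alice"), (2, "Bob")])
    conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                     [(1, 10.0), (1, 25.5), (2, 7.25)])
    conn.commit()
    return conn


def totals_per_customer(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Join the two tables and sum order totals per customer."""
    return conn.execute(
        """
        SELECT c.name, SUM(o.total)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(totals_per_customer(build_demo_db()))  # [('Alice', 35.5), ('Bob', 7.25)]
```

Because the customer name lives in one row only, renaming a customer is a single UPDATE, and the foreign key rejects orders for customers that do not exist.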
NoSQL
NoSQL is read as “Not Only SQL”
It is not an SQL slayer but an SQL companion
Mostly open source
Horizontal scalability
Schema-less
MapReduce
Very fast for adding new data and for simple operations/queries
CAP theorem
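MapReduce splits a job into three stages: map, shuffle (group the intermediate pairs by key) and reduce. A single-process sketch of the classic word-count example; real frameworks run each stage in parallel across many machines, and the function names here are illustrative only:

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over all documents."""
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["SQL and NoSQL", "NoSQL is not only SQL"]))
```

Map and reduce work on independent keys, which is what lets a cluster split the work across nodes.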
Key-value pairs
Riak, Redis, Berkeley DB, Oracle NoSQL DB
Store associative arrays (dictionaries, hashes)
Treat the data as a single opaque collection, which may have different fields for every record
Can store data in RAM or on HDD/SSD
Use far less memory than an RDBMS
Ideal for caching or temporary storage
Complex consistency model
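Because a key-value store treats each value as opaque and looks it up only by key, it maps naturally onto a hash table. A minimal in-memory sketch with per-key expiry for the caching/temporary-storage use case; the `TTLStore` class is hypothetical (stores such as Redis offer expiry natively):

```python
import time
from typing import Any, Callable, Optional


class TTLStore:
    """Tiny in-memory key-value store where entries can expire."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: dict[str, tuple[Any, Optional[float]]] = {}
        self._clock = clock  # injectable so expiry can be tested deterministically

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        """Store `value` under `key`; with `ttl`, it expires after `ttl` seconds."""
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str, default: Any = None) -> Any:
        """Return the value for `key`, or `default` if it is missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return default
        return value

    def delete(self, key: str) -> None:
        """Remove `key` if present."""
        self._data.pop(key, None)


if __name__ == "__main__":
    store = TTLStore()
    store.set("session:42", {"user": "alice"}, ttl=30)
    print(store.get("session:42"))
```

The value is never inspected by the store; only the key is used for lookup, which is what keeps this model simple and fast.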
Document stores
MongoDB, DocumentDB, CouchDB…
Store documents in JSON, XML, YAML, BSON, etc.
REST API
Designed for horizontal scaling and Big Data processing
MapReduce framework
JavaScript-friendly, allowing full-stack JavaScript development
Rapid development
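Unlike a key-value store, a document store understands the structure of what it stores, so it can query fields inside a document. An in-memory sketch that keeps JSON documents of different shapes in one collection. The `DocumentCollection` class and its `find` method are hypothetical; only the `_id` field name follows MongoDB's convention:

```python
import json
import uuid
from typing import Any

_MISSING = object()  # marks a field path that does not exist in a document


class DocumentCollection:
    """Schema-less collection: each document may have different fields."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}  # document id -> serialized JSON text

    def insert(self, doc: dict[str, Any]) -> str:
        """Store a copy of `doc` as JSON and return its id (generated if absent)."""
        doc_id = str(doc.get("_id") or uuid.uuid4().hex)
        self._docs[doc_id] = json.dumps({**doc, "_id": doc_id})
        return doc_id

    def find(self, **criteria: Any) -> list[dict[str, Any]]:
        """Return documents whose fields equal all criteria.

        Nested fields use '__', e.g. find(address__city="Sofia").
        """
        results = []
        for raw in self._docs.values():
            doc = json.loads(raw)
            if all(self._lookup(doc, path) == expected
                   for path, expected in criteria.items()):
                results.append(doc)
        return results

    @staticmethod
    def _lookup(doc: dict[str, Any], path: str) -> Any:
        """Follow a '__'-separated path into nested dictionaries."""
        value: Any = doc
        for part in path.split("__"):
            if not isinstance(value, dict) or part not in value:
                return _MISSING
            value = value[part]
        return value


if __name__ == "__main__":
    people = DocumentCollection()
    people.insert({"name": "Ivan", "address": {"city": "Sofia"}})
    people.insert({"name": "Maria", "tags": ["admin"]})  # different fields, no schema change
    print(people.find(address__city="Sofia"))
```

A real document store would index such fields instead of scanning every document on each query.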
Table storage
Apache Cassandra, Azure Table Storage, Apache HBase…
Store highly available, semi-structured data in flexible datasets
Designed for Big Data: store petabytes of data at reasonable cost
No single point of failure (in Cassandra, every node in the cluster has the same role)
MapReduce support
Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications
Fault-tolerant: support replication, failover and disaster recovery
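Table and wide-column stores typically address a row by a partition key, which decides where the row lives, and a row key, which orders rows inside the partition; Azure Table Storage, for example, names these `PartitionKey` and `RowKey`. An in-memory sketch of that addressing scheme (the `TableStore` class is hypothetical and ignores replication and distribution):

```python
import bisect
from collections import defaultdict
from typing import Any, Optional


class TableStore:
    """Rows grouped by partition key, kept sorted by row key within a partition."""

    def __init__(self) -> None:
        self._row_keys: dict[str, list[str]] = defaultdict(list)  # sorted row keys per partition
        self._rows: dict[tuple[str, str], dict[str, Any]] = {}    # (pk, rk) -> properties

    def upsert(self, partition_key: str, row_key: str, **properties: Any) -> None:
        """Insert or replace the row identified by (partition_key, row_key)."""
        if (partition_key, row_key) not in self._rows:
            bisect.insort(self._row_keys[partition_key], row_key)  # keep order for range scans
        self._rows[(partition_key, row_key)] = dict(properties)

    def get(self, partition_key: str, row_key: str) -> Optional[dict[str, Any]]:
        """Point lookup by the full key."""
        return self._rows.get((partition_key, row_key))

    def range(self, partition_key: str, start: str, end: str) -> list[tuple[str, dict[str, Any]]]:
        """Rows with start <= row_key < end, read from a single partition."""
        keys = self._row_keys.get(partition_key, [])
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return [(rk, self._rows[(partition_key, rk)]) for rk in keys[lo:hi]]


if __name__ == "__main__":
    readings = TableStore()
    readings.upsert("sensor-1", "2015-10-01T09:00", temp=20.0)
    readings.upsert("sensor-1", "2015-10-01T10:00", temp=21.5)
    print(readings.range("sensor-1", "2015-10-01", "2015-10-02"))
```

Choosing the partition key is the main design decision: queries that stay within one partition are cheap, while queries spanning all partitions have to touch every node.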
Graph DBs
Neo4j, Titan, ArangoDB, Apache Giraph…
Everything is stored as either a node, an edge or an attribute
Each node and edge can have any number of attributes
Facebook used Giraph, with some performance improvements, to analyze one trillion edges using 200 machines in 4 minutes
Use cases: real-time recommendations, social networks, graph-based search
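To show why the node/edge model suits recommendations, here is an in-memory sketch of a small property graph and a "friends of friends" query. The `Graph` class and the sample data are hypothetical; graph databases such as Neo4j express this kind of traversal in their own query languages:

```python
from collections import Counter, defaultdict
from typing import Any


class Graph:
    """Minimal property graph: nodes and undirected edges, each with attributes."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict[str, Any]] = {}
        self.edges: dict[str, dict[str, dict[str, Any]]] = defaultdict(dict)

    def add_node(self, node_id: str, **attrs: Any) -> None:
        self.nodes[node_id] = attrs

    def add_edge(self, a: str, b: str, **attrs: Any) -> None:
        # Undirected: store the same attribute dict in both directions.
        self.edges[a][b] = attrs
        self.edges[b][a] = attrs

    def neighbours(self, node_id: str) -> set[str]:
        return set(self.edges.get(node_id, {}))

    def recommend_friends(self, node_id: str) -> list[tuple[str, int]]:
        """Suggest people two hops away, ranked by number of mutual friends."""
        friends = self.neighbours(node_id)
        counts = Counter()
        for friend in friends:
            for candidate in self.neighbours(friend):
                if candidate != node_id and candidate not in friends:
                    counts[candidate] += 1
        # Most mutual friends first; ties broken alphabetically for stable output.
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    g = Graph()
    for person in ["ana", "bob", "cid", "dan", "eve"]:
        g.add_node(person)
    g.add_edge("ana", "bob", since=2012)
    g.add_edge("ana", "cid")
    g.add_edge("bob", "dan")
    g.add_edge("cid", "dan")
    g.add_edge("cid", "eve")
    print(g.recommend_friends("ana"))
```

The query only follows edges out of the starting node, so its cost depends on the size of the neighbourhood rather than on the total size of the graph; in an RDBMS the same question typically needs self-joins over a relationship table.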
Binary Large Object storage
MongoDB (GridFS), Azure Blob Storage, Azure File Storage
Store petabytes of highly available data
Serve content to web or mobile applications
Power big data analytics
Stream video and audio
Perform secure backup and disaster recovery
Cost-effective
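Large binary objects are usually stored as fixed-size pieces rather than as one value; MongoDB's GridFS, for instance, splits a file into chunks (255 kB by default) plus a metadata record. An in-memory sketch of that idea with a checksum to verify reassembly. The `BlobStore` class is hypothetical and is not the GridFS API:

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 255 * 1024  # mirrors GridFS's default chunk size


@dataclass
class BlobMeta:
    name: str
    length: int
    chunk_count: int
    sha256: str


@dataclass
class BlobStore:
    """Stores blobs as numbered chunks plus one metadata record per blob."""
    chunk_size: int = CHUNK_SIZE
    meta: dict[str, BlobMeta] = field(default_factory=dict)
    chunks: dict[tuple[str, int], bytes] = field(default_factory=dict)

    def put(self, name: str, data: bytes) -> BlobMeta:
        """Split `data` into chunks and record its length and checksum."""
        old = self.meta.get(name)
        if old is not None:  # drop chunks of a previous version
            for i in range(old.chunk_count):
                del self.chunks[(name, i)]
        count = 0
        for offset in range(0, len(data), self.chunk_size):
            self.chunks[(name, count)] = data[offset:offset + self.chunk_size]
            count += 1
        info = BlobMeta(name, len(data), count, hashlib.sha256(data).hexdigest())
        self.meta[name] = info
        return info

    def get(self, name: str) -> bytes:
        """Reassemble the chunks in order and verify the checksum."""
        info = self.meta[name]
        data = b"".join(self.chunks[(name, i)] for i in range(info.chunk_count))
        if hashlib.sha256(data).hexdigest() != info.sha256:
            raise ValueError(f"checksum mismatch for {name!r}")
        return data
```

Chunking lets a store spread one large file over many nodes and serve byte ranges (for example, when streaming video) without loading the whole object.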
Message Queues
RabbitMQ, IronMQ, Azure Queue Storage
Asynchronous communications protocol
Can raise events or be accessed directly by clients
Messages may be kept in memory, written to disk, or even committed to a DBMS
Allow building decoupled components
With Azure Queue Storage, resources can be scaled dynamically based on queue length
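The decoupling that message queues provide can be sketched in-process with Python's thread-safe `queue.Queue`: the producer only knows the queue, not the consumer, and each side works at its own pace. This is an illustration only; brokers such as RabbitMQ add persistence, acknowledgements and delivery across processes and machines. The `produce`, `consume` and `run_pipeline` functions are hypothetical:

```python
import queue
import threading

STOP = object()  # sentinel telling the consumer that no more messages will arrive


def produce(q: "queue.Queue[object]", orders: list[str]) -> None:
    """Publish one message per order, then the stop sentinel."""
    for order_id in orders:
        q.put({"type": "order_placed", "order_id": order_id})
    q.put(STOP)


def consume(q: "queue.Queue[object]", processed: list[str]) -> None:
    """Process messages until the sentinel arrives."""
    while True:
        message = q.get()
        try:
            if message is STOP:
                return
            processed.append(message["order_id"])  # e.g. send e-mail, update stock, etc.
        finally:
            q.task_done()


def run_pipeline(orders: list[str]) -> list[str]:
    """Run one producer and one consumer connected only through the queue."""
    q: "queue.Queue[object]" = queue.Queue(maxsize=100)  # bounded: producer waits if consumer lags
    processed: list[str] = []
    consumer = threading.Thread(target=consume, args=(q, processed))
    consumer.start()
    produce(q, orders)
    consumer.join()
    return processed


if __name__ == "__main__":
    print(run_pipeline(["o1", "o2", "o3"]))
```

Either side can be replaced or scaled (for example, more consumer threads reading the same queue) without changing the other.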
The DB-Engines Ranking
Rank (Oct 2015) | Rank (Sep 2015) | Rank (Oct 2014) | DBMS | Database model | Score (Oct 2015) | Change vs. Sep 2015 | Change vs. Oct 2014
1 | 1 | 1 | Oracle | Relational DBMS | 1466.95 | +3.58 | -4.95
2 | 2 | 2 | MySQL | Relational DBMS | 1278.96 | +1.21 | +15.99
3 | 3 | 3 | Microsoft SQL Server | Relational DBMS | 1123.23 | +25.40 | -96.37
4 | 4 | 5 | MongoDB | Document store | 293.27 | -7.30 | +52.86
5 | 5 | 4 | PostgreSQL | Relational DBMS | 282.13 | -4.05 | +24.41
6 | 6 | 6 | DB2 | Relational DBMS | 206.81 | -2.33 | -0.86
7 | 7 | 7 | Microsoft Access | Relational DBMS | 141.83 | -4.17 | +0.19
8 | 8 | 10 | Cassandra | Wide column store | 129.01 | +1.41 | +43.30
9 | 9 | 8 | SQLite | Relational DBMS | 102.67 | -4.99 | +7.71
10 | 10 | 12 | Redis | Key-value store | 98.80 | -1.86 | +19.42
Conclusions
If your data is relational in nature, use an RDBMS
If your data is relatively constant in size and fits in tables, use an RDBMS
Don’t be afraid to experiment with new persistence options, but think twice before putting them into production
Use in-memory data stores for temporary data where possible
Prefer BLOB storage when dealing with large files
Consider using cloud infrastructure
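Putting the guidelines together, a polyglot application routes each kind of data to the store that suits it. A minimal sketch, assuming three stand-ins: SQLite (Python's built-in `sqlite3`) for relational order data, a dictionary with expiry for temporary session data, and a dictionary of bytes for large files. The `ShopBackend` class and its method names are hypothetical; in production each stand-in would be a real RDBMS, in-memory store and BLOB service:

```python
import sqlite3
import time
from typing import Any, Optional


class ShopBackend:
    """One service, three persistence options, each chosen for its data."""

    def __init__(self) -> None:
        # Relational data (orders) -> RDBMS
        self.sql = sqlite3.connect(":memory:")
        self.sql.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total REAL NOT NULL)"
        )
        # Temporary data (sessions) -> in-memory key-value store with expiry
        self.sessions: dict[str, tuple[dict[str, Any], float]] = {}
        # Large files (product images) -> BLOB storage stand-in
        self.blobs: dict[str, bytes] = {}

    def start_session(self, session_id: str, data: dict[str, Any], ttl: float = 1800) -> None:
        self.sessions[session_id] = (data, time.monotonic() + ttl)

    def get_session(self, session_id: str) -> Optional[dict[str, Any]]:
        entry = self.sessions.get(session_id)
        if entry is None or time.monotonic() >= entry[1]:
            self.sessions.pop(session_id, None)  # expired or unknown session
            return None
        return entry[0]

    def place_order(self, customer: str, total: float) -> int:
        with self.sql:  # commits on success, rolls back on error
            cur = self.sql.execute(
                "INSERT INTO orders (customer, total) VALUES (?, ?)", (customer, total)
            )
        return cur.lastrowid

    def orders_total(self, customer: str) -> float:
        row = self.sql.execute(
            "SELECT COALESCE(SUM(total), 0.0) FROM orders WHERE customer = ?", (customer,)
        ).fetchone()
        return row[0]

    def upload_image(self, name: str, data: bytes) -> None:
        self.blobs[name] = data

    def download_image(self, name: str) -> bytes:
        return self.blobs[name]
```

Keeping each store behind its own methods means one of them (for example, the session store) can later be swapped for a different product without touching the others.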
License
This course (slides, examples, labs, videos, homework, etc.)
is licensed under the "Creative Commons Attribution-
NonCommercial-ShareAlike 4.0 International" license
Attribution: this work may contain portions from
Free Trainings @ Software University
Software University Foundation – softuni.org
Software University – High-Quality Education, Profession and Job for Software Developers
softuni.bg
Software University @ Facebook
facebook.com/SoftwareUniversity
Software University @ YouTube
youtube.com/SoftwareUniversity
Software University Forums – forum.softuni.bg
Speaker notes
Vertical scaling pros
Less power consumption than running multiple servers
Lower cooling costs than scaling horizontally
Generally less challenging to implement
Lower licensing costs
Vertical scaling cons
PRICE, PRICE, PRICE
Greater risk of a hardware failure causing bigger outages
Generally severe vendor lock-in and limited future upgradeability
Horizontal scaling pros
Much cheaper than scaling vertically
Easier to achieve fault tolerance
Easy to upgrade
Horizontal scaling cons
Higher licensing fees
Bigger footprint in the data center
Higher utility costs (electricity and cooling)
Edgar F. Codd, “A Relational Model of Data for Large Shared Data Banks”, 1970
In June 1979, Relational Software, Inc. introduced the first commercially available implementation of SQL, Oracle V2 (Version 2), for VAX computers
In theoretical computer science, the CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:[1][2][3]
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it succeeded or failed)
Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)
Oracle NoSQL Database is built on top of Berkeley DB. It adds a layer of services for distributed environments, providing distributed, highly available key/value storage suited for large-volume, latency-sensitive applications. The latest version of Oracle NoSQL Database adds a table structure.
MongoDB doesn’t have a native REST API
DocumentDB currently doesn’t support MapReduce
Stages of MapReduce: Map, Shuffle, Reduce
Azure File Storage: fully managed file shares that use the standard SMB 3.0 protocol.
Share data across on-premises and cloud servers
Migrate file share-based applications to the cloud with no code changes