NoSQL Architecture Overview
OVER 400 CUSTOMERS TRUST THEIR DATABASES TO RDX
RDX Insights Series Presentation – Introduction to NoSQL Architectures
Chris Foot
VP DB Technologies
RDX
March 23, 2017
Video recording of this presentation can be found on RDX’s YouTube Channel: https://lnkd.in/g96cbUV
www. .com
NoSQL Product Offering Analysis
NoSQL Competitors
Document
• Pairs a key with a complex data structure called a document
• Records are not required to have a uniform structure
• MongoDB, CouchDB, DynamoDB, Couchbase, MarkLogic
Wide-Column
• A record can have billions of columns
• Tables are collections of columns, rather than rows
• Column names and record keys are not fixed
• Cassandra, Bigtable, HBase, Accumulo
Key-Value
• All items are stored as indexed key-value pairs
• Redis, Riak, Memcached, Oracle NoSQL, DynamoDB
Graph
• Stores nodes (data elements) with relationships
• Interconnected, strong relationships
• Neo4j, DataStax Cassandra, Titan, ArangoDB
In-Memory
• Operations performed in memory
• Lightning-fast reads and writes
• Often use Key-Value or Wide-Column as the data store
• Redis, Memcached, Oracle TimesTen, SAP HANA
[Diagram labels: In-Memory DB, Persistent DB]
RDBMS and NoSQL will Merge
NoSQL vendors’ desire to increase market share will drive them to compete directly with relational product manufacturers
Vendors will add RDBMS-like functionality that allows their products to be more widely adopted. Those that don’t will quickly lose market share to those that do
The larger relational vendors will attempt to co-opt any NoSQL technology that challenges their dominant role in the industry
As they identify offerings as tangible threats, their strategy will be to ensure that the technologies used by those vendors become a component of, not a replacement for, their traditional database products
[Diagram: Relational DBMS and NoSQL DBMS converge into a General Purpose DBMS]
Unstructured Data Examples
NoSQL Adoption Drivers - Modern Applications
Single View
Sensor Data
Biometrics
Radiology
Videos, Images
Weather Data
Catalogs
Content Management
Geospatial
Social Data
• IDC: Unstructured data is growing at the rate of 62% per year
• IDC: By 2022, 93% of all data in the digital universe will be unstructured
• Gartner: Data volume is set to grow 800% over the next 5 years and 80% of it will reside as unstructured data
NoSQL Adoption Drivers – Horizontal Scaling
NoSQL architectures leverage horizontal scalability to cost-effectively handle large volumes of data and/or users
[Diagram: horizontal vs. vertical scaling]
Relational and NoSQL Parallel Adoption Drivers
Hierarchical and Network Databases – IMS and CODASYL/Network
Logical and physical layers were entirely dependent upon each other. Both data storage and data navigation were rigidly defined. Programs were required to follow the prebuilt paths to navigate through the stored data
Early Releases of DB2
• Flexibility
• Separate logical and physical layers - schema
• Set vs row processing
• Ease of use
• SQL language was intuitive
• Poor performance
• Crude locking, transaction management and limited features
Early Releases of Oracle
• Flexibility
• Easy to use
• Lower Total Cost of Ownership (support, product costs)
• Low cost commodity hardware (as in it didn’t need a mainframe)
• Crude locking, transaction management and limited features
Early Releases of NoSQL
• Flexibility
• Easy to use
• Lower Total Cost of Ownership (support, product costs)
• Faster application development
• Architected to scale horizontally for availability and performance
• Crude locking, transaction management and limited features
“Niche implementations, crude technically, will never become popular, no features - no future”.
Pretty much…. “Your career is going to be toast.”
ACID vs BASE
ACID (Relational)
• Atomicity: All operations in a single transaction succeed or fail as a group. No partial operations
• Consistency: The database is never in an inconsistent state
• Isolation: Transactions do not interfere with one another. Contentious data access is handled by the database to make the transactions appear to run sequentially
• Durability: Transactions are permanent in the presence of failures
BASE (NoSQL)
• Basic Availability: The system is able to tolerate a partial failure (loss of a single node, for example)
• Soft State: The state of the system is in flux and may change over time because of eventual consistency
• Eventual Consistency: As data is added to the system, it is gradually replicated across all nodes. Data may be inconsistent in the short term but will eventually become consistent
Distributed Tradeoff
• The application is given a greater responsibility for data management in systems that don’t follow ACID
• Leads to complex application code when strong consistency is needed across replicated nodes
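The eventual consistency behavior described above can be sketched in a few lines of Python. This is a toy model, not any product's replication protocol: the class names, the three node names and the explicit replicate() step are all assumptions made for illustration.

```python
from collections import deque


class Replica:
    """One node holding its own copy of the data (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Writes land on one replica and are copied to the others later."""

    def __init__(self, replica_names):
        self.replicas = [Replica(name) for name in replica_names]
        self.pending = deque()  # replication work that has not been applied yet

    def write(self, key, value):
        # Acknowledge as soon as the first replica has the value.
        self.replicas[0].data[key] = value
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)

    def replicate(self):
        # Drain the queue; afterwards every replica agrees.
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


store = EventuallyConsistentStore(["node1", "node2", "node3"])
store.write("balance", 100)
stale = store.read(2, "balance")   # node3 has not caught up yet
store.replicate()
fresh = store.read(2, "balance")   # node3 now returns the written value
```

Between write() and replicate() the system is in the "soft state" from the slide: a reader on node3 sees old data, and the application has to decide whether that is acceptable.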
CAP Theorem Distributed Systems – Pick C or A
• Consistency: All clients see the same data
• Availability: All clients can read and write
• Partition Tolerance: System continues to work during network partitions
CP: MongoDB, Redis, BigTable, HBase, MemcacheDB
CA: Oracle, SQL Server, MySQL…
AP: Cassandra, Riak, CouchDB, DynamoDB
[Diagram: CAP triangle with Consistency, Availability and Partition Tolerance at its corners]
CAP Theorem
AVAILABILITY: Both sides of the partition allow updates while data synchronization is broken. The system is available, but data is inconsistent due to lack of synchronization
CONSISTENCY: One side of the partition allows updates and the other prevents them. Data is in sync because only one node allows updates. The system is unavailable to one group of users
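The two partition outcomes above can be sketched as a small Python function. This is a simplified model under stated assumptions: the "majority" and "minority" side labels and the function names are hypothetical, and real systems decide which side may accept writes through elections or quorums.

```python
def handle_writes(mode, writes):
    """Apply writes issued on each side of a network partition.

    mode "AP": every side keeps accepting updates.
    mode "CP": only the majority side accepts updates.
    writes: list of (side, value) where side is "majority" or "minority".
    Returns (accepted, rejected).
    """
    accepted, rejected = [], []
    for side, value in writes:
        if mode == "CP" and side == "minority":
            rejected.append((side, value))   # unavailable to this group of users
        else:
            accepted.append((side, value))
    return accepted, rejected


def has_conflict(accepted):
    # Updates accepted on both sides cannot be synchronized during the partition.
    return len({side for side, _ in accepted}) > 1


writes = [("majority", "v1"), ("minority", "v2")]
ap_accepted, ap_rejected = handle_writes("AP", writes)
cp_accepted, cp_rejected = handle_writes("CP", writes)
```

In AP mode nothing is rejected but has_conflict(ap_accepted) is true; in CP mode the minority write is rejected and no conflict can arise.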
Why Did RDX Choose MongoDB?
Business Drivers
• Industry analyst evaluations
• Customer use cases and recommendations
• Largest commercial investment in any database vendor
• Popularity
• 10 million+ software downloads
• 1,000 partners
• 2,000 customers
• 1/3 of the Fortune 100
• Robust training available
• Strong open source community
• Excellent partnership support
Technical Drivers
• Wide scope of potential application
• Low TCO
• Combines capabilities of relational databases with next generation NoSQL technologies
• Schemaless, flexible data model
• Nonstructured data support
• Easily accommodates large data volumes
• Rich query capability
• Strong, tunable consistency model
• Elastic, horizontal scalability
• Easily configurable system resiliency
• Vendor provided database support
Craigslist, New York Times, Verizon, Viacom, AstraZeneca, MTV, Google, Genentech, Adobe, GAP, Cisco, MetLife, Facebook, Expedia, eBay, Edmunds, Washington Post, AOL, ADP, Forbes, Intuit, The Weather Channel, Carfax…
MongoDB Features
• Multiple storage engines
  - WiredTiger
  - InMemory
  - Encrypted
  - Third-Party
  - MMAPv1
• Indexing
  - Unique – Enforces uniqueness on user defined and Object ID fields
  - Partial – Documents are only indexed if they meet a filter expression
  - Sparse – Documents are only indexed if the field is populated
  - Compound – Index on multiple fields
  - Multikey – Indexes on arrays
  - TTL – Allows documents to be purged based on time
  - Text Search
  - Hashed – Indexes a hash of the field’s value
• Easily ingests large, nonstructured data elements
  - Decomposes large video files and images into smaller components and rebuilds them using pointers during retrieval
• Document validation rules enforce data validity
  - Enforce checks on document structure, data types, data ranges and the presence of mandatory fields
  - DBAs can apply data governance standards, while developers maintain the benefits of a flexible schema
• Automatic failover with no application redirects to new primary required
• Driver support for all common programming languages
• Data compression
• Tunable consistency model
• BI Connector allows MongoDB to act as a data source for SQL based BI analytics platforms
• LDAP, Kerberos, Windows AD, x.509 authentication
• DML, DCL, DDL audit logging
• FIPS compliance and data encryption
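The kind of checks that document validation rules perform (mandatory fields, data types, data ranges) can be sketched in plain Python. This is not MongoDB syntax: in MongoDB the rules are attached to the collection as a validator, while the RULES dictionary, the field names and the validate() helper below are hypothetical and exist only to show the idea.

```python
# Hypothetical rules for an insurance policy document.
RULES = {
    "required": ["policy_id", "premium"],
    "types": {"policy_id": str, "premium": (int, float)},
    "ranges": {"premium": (0, 1_000_000)},
}


def validate(doc, rules=RULES):
    """Return a list of rule violations; an empty list means the document passes."""
    errors = []
    for field in rules["required"]:
        if field not in doc:
            errors.append(f"missing field: {field}")
    for field, expected in rules["types"].items():
        if field in doc and not isinstance(doc[field], expected):
            errors.append(f"wrong type: {field}")
    for field, (low, high) in rules["ranges"].items():
        value = doc.get(field)
        if isinstance(value, (int, float)) and not low <= value <= high:
            errors.append(f"out of range: {field}")
    return errors


ok = validate({"policy_id": "P-1", "premium": 250.0, "notes": "extra fields are fine"})
bad = validate({"premium": -5})
```

Only the governed fields are checked, so documents remain free to carry additional fields, which is how validation and a flexible schema can coexist.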
Rigid vs Dynamic Schemas
Relational Tables and Rows (Predescribed)
• Schema design performed before application is developed
• Schema must be built before inserting data
• Enforces data structure – rows cannot deviate from the predefined schema
• Schema design based on storage
• Schema alterations require database and application changes to be coordinated
• Normalization process is critical
MongoDB Collections and Documents (Self-Describing)
• No schema required before inserting data
• Schema is created as each document is inserted
• Documents in a collection can have different schemas (sets of fields)
• Schema design based on application usage
• Schemas can evolve iteratively during application life-cycle
• Higher dependency on application layer for data integrity
• Normalization not as important
Flexible Schemas
Insurance Policy Document Collection: AUTO, LIFE, HOME, EQUIPMENT, CYBER
Collections do *not* enforce document structure.
You do not predefine document schemas. The schema is defined during initial document insertion. Data types are selected by MongoDB based on the data being inserted
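A minimal sketch of what a flexible schema looks like in practice, using plain Python dictionaries as stand-ins for documents. The field names and values are invented for illustration; only the policy types come from the slide.

```python
import json

# A hypothetical "policies" collection: each document carries its own fields.
policies = [
    {"type": "AUTO", "holder": "A. Smith", "vin": "VIN-0001", "drivers": 2},
    {"type": "LIFE", "holder": "B. Jones", "beneficiaries": ["C. Jones"], "term_years": 20},
    {"type": "HOME", "holder": "D. Lee", "address": {"city": "Pittsburgh", "state": "PA"}},
]


def fields_by_type(documents):
    """Map each policy type to the sorted list of fields its document uses."""
    return {doc["type"]: sorted(doc) for doc in documents}


schemas = fields_by_type(policies)
print(json.dumps(schemas, indent=2))
```

Each document describes itself: the AUTO, LIFE and HOME policies share only the type and holder fields, and nested values such as the address need no separate table.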
Agile Development Features
FASTER, BETTER, LEANER
• Schemaless architecture
• Flexible data model = easy schema changes
• Drivers for all major programming languages
• Ability to store all types of data
• Flexible JSON document format
• Rich content using GridFS
• Simple system provisioning
• Scale vertically and horizontally
• Pluggable storage engines
• Easy replication setup
Automatic Sharding
Automatic Data Distribution - Sharded Cluster
[Diagram: Shard 1, Shard 2 … Shard N. Each logical shard is a replica set made up of one primary and two secondaries, each on its own physical server. Cluster metadata includes data location, shards, # of chunks…. The cluster is horizontally scalable]
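The idea of automatic data distribution can be sketched with a hashed shard key. This is a deliberate simplification: MongoDB splits data into chunks of shard key ranges and tracks them in the cluster metadata, whereas the sketch routes directly with a modulo. The shard names and the shard_for() helper are hypothetical.

```python
import hashlib

SHARDS = ["shard1", "shard2", "shard3"]  # hypothetical shard names


def shard_for(shard_key_value, shards=SHARDS):
    """Pick a shard from a hash of the shard key (modulo routing, for illustration)."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# Route 1,000 hypothetical customer documents and record where each one lands.
placement = {}
for customer_id in range(1000):
    placement.setdefault(shard_for(customer_id), []).append(customer_id)

counts = {shard: len(ids) for shard, ids in placement.items()}
```

Because the hash spreads keys roughly evenly, adding shards spreads the same data over more servers, which is what makes the cluster horizontally scalable.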
Replica Sets
[Diagram: multi datacenter cluster holding one collection across three shards, with a config server at each site and a BI Connector]
• Primaries – Primary 1 (Display), Primary 2 (Display), Primary 3 (Display); Priority 1, Votes 1
• Site 2 – Display and Batch: Sec 1.1 (Display), Sec 2.1 (Batch), Sec 3.1 (Batch); Priority 1, Votes 1
• Site 3 – Batch and DR: Sec 1.2 (Batch), Sec 2.2 (Batch), Sec 3.2 (Delayed); Priority 0, Votes 1
Global Data Distribution
Read Global/Write Local
Primary
Secondary
Secondary
Videos and Images – Unstructured Data
• Store files larger than 16MB, e.g. video, images
  - Load chunks without reading entire file into memory
• Atomically sync files with their metadata
• Shard and distribute around the cluster
[Diagram: the driver’s GridFS API splits doc.jpg into a metadata document in fs.files and numbered data chunks in fs.chunks]
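How a file is split into a metadata document plus numbered chunks can be sketched without a database. The field names (filename, length, chunkSize, files_id, n, data) follow the fs.files and fs.chunks collections; the tiny chunk size, the use of the filename as the files_id, and the two helper functions are assumptions for illustration (GridFS uses 255 kB chunks by default and an ObjectId as the file id).

```python
CHUNK_SIZE = 4  # tiny on purpose so the split is easy to see


def store_file(filename, payload, chunk_size=CHUNK_SIZE):
    """Split a payload into one fs.files-style document and fs.chunks-style documents."""
    files_doc = {"filename": filename, "length": len(payload), "chunkSize": chunk_size}
    chunks = [
        {"files_id": filename, "n": n, "data": payload[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(payload), chunk_size))
    ]
    return files_doc, chunks


def read_file(chunks):
    """Rebuild the payload by concatenating the chunks in sequence order."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


files_doc, chunks = store_file("doc.jpg", b"0123456789")
rebuilt = read_file(chunks)
```

Because every chunk is an ordinary document, a reader can fetch a single chunk by its sequence number instead of loading the whole file into memory.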
Cassandra
Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo and the log-structured storage engine from Google's BigTable.
Apple, Sony, Walmart, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, Weather Channel, CERN, Constant Contact, Macy’s, Expedia
• Fault Tolerant
• Data Durability
• Data Center Aware
• High Performance
• Decentralized
• Horizontal Scalability
• Elastic Architecture
BIG Data, High # Concurrent Users
• Apple - 75,000 nodes storing over 10 PB of data
• Netflix - 2,500 nodes, 420 TB, over 1 trillion requests per day
• Chinese search engine Easou - 270 nodes, 300 TB, over 800 million requests per day
• eBay - 100 nodes, 250 TB
DataStax/Cassandra Features
• Multi-model storage
  - Key Value NoSQL
  - Tabular NoSQL
  - JSON/Document NoSQL
  - Graph
• Very high “linear” scalability
• Automatic data distribution amongst nodes
• Multi-data center replication
• CQL Access Language
  - SQL “like” language
• Tunable consistency model
• Strong node fault detection and recovery
• Writes to Memtables in RAM
• Materialized views
• Advanced replication allows multiple clusters to be synchronized
• OpsCenter – browser based administration and monitoring toolset
• Driver support for all common programming languages
• In-Memory option allows parts (or all) of database to reside in RAM
• Tiered storage
• Interface to Spark (in-memory)
  - Data stream processing
  - Access to Spark SQL (more robust than CQL)
• Security
  - End to end encryption
  - AD, LDAP, Kerberos support
Cassandra/DataStax Cassandra Cluster
[Diagram: a Cassandra cluster spanning a West Coast datacenter and an East Coast datacenter. In each datacenter, replication copies the data for which Node 1 is primary to Node 2 and Node 3; each datacenter also has a Node 4]
Cassandra/DataStax
• Keyspace - A keyspace is a logical container for data tables and indexes. It can be compared to an Oracle schema or a SQL Server database. Keyspaces define how the data is replicated amongst the nodes
• Table - A collection of columns fetched by a row. Columns are ordered by name
• Column - Supports different data types and consists of a name, value and timestamp
• Primary Key - Uniquely identifies a row occurrence in a Cassandra table
• Partition Key - The partition key identifies which node in the cluster will store the row. It is responsible for data distribution across the nodes
• Clustering Key - Orders rows based on the column’s value
• Data Center - A collection of related nodes in a Cassandra cluster
• Snitch - Determines which datacenters and racks nodes belong to. Snitches inform Cassandra about the network topology so that requests are routed efficiently, and allow Cassandra to distribute replicas by grouping machines into datacenters and racks
• Partitioner - A hashing algorithm that generates a hash value token from the partition key. The token is the value used to distribute the data across the various nodes in the cluster. The partitioner’s goal is to assign equal portions of data to each node. Each node in a Cassandra cluster becomes responsible for storing a range of hash values
• Gossip - A peer-to-peer communications mechanism that identifies and shares node information (state and location) with all nodes in the Cassandra cluster
Cassandra/DataStax Decentralized Storage
Partitioners are hashing algorithms that generate tokens from partition keys
Each node in a Cassandra cluster is responsible for a range of tokens (hash keys)
All nodes can accept reads and writes
The partitioner distributes data amongst nodes
First column of primary key becomes partition key
Can use multiple columns as primary key, partition key
Also able to cluster columns to order data
PRIMARY KEY (emp_id)
PRIMARY KEY (emp_id, dept_id)) WITH CLUSTERING ORDER BY (dept_id DESC)
PRIMARY KEY (emp_id, dept_id)
TOKEN   RANGE
0       0-25
26      26-50
51      51-75
76      76-100
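The token table above can be turned into a small routing sketch. The 0 to 100 ring and its four ranges come from the slide; the node names, the MD5 hash and the helper functions are assumptions for illustration (Cassandra's default partitioner is Murmur3 over a far larger token range).

```python
import bisect
import hashlib

# Upper bound of each node's token range, mirroring the 0-100 ring on the slide.
RING = [(25, "node1"), (50, "node2"), (75, "node3"), (100, "node4")]
UPPER_BOUNDS = [upper for upper, _ in RING]


def token_for(partition_key):
    """Hash the partition key to a token between 0 and 100."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 101


def node_for_token(token):
    """Find the node whose token range contains the token."""
    return RING[bisect.bisect_left(UPPER_BOUNDS, token)][1]


def node_for(partition_key):
    return node_for_token(token_for(partition_key))


owner = node_for("emp-1001")  # hypothetical emp_id value
```

Any node can run this calculation for itself, which is why every node can accept a read or write and forward it to the owner of the token.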
Cassandra/DataStax Tunable Consistency
Read and write consistency levels are separate from row replication settings. The replication factor determines how many copies are eventually written, while the tunable consistency level determines how many replicas must respond before the client gets its response
[Diagram: consistency vs. latency]
Read Consistency
• ALL - Returns the record after all replicas have responded. The read operation will fail if a replica does not respond.
• QUORUM - Returns the record after a quorum of replicas from all datacenters has responded.
• LOCAL_QUORUM - Returns the record after a quorum of replicas in the same datacenter as the coordinator has responded. Avoids latency of inter-datacenter communication.
• ONE - Returns a response from the closest replica, as determined by the snitch. By default, a read repair runs in the background to make the other replicas consistent.
• TWO - Returns the most recent data from two of the closest replicas.
• THREE - Returns the most recent data from three of the closest replicas.
• LOCAL_ONE - Returns a response from the closest replica in the local datacenter.
• SERIAL - Allows reading the current (and possibly uncommitted) state of data without proposing a new addition or update. If a SERIAL read finds an uncommitted transaction in progress, it will commit the transaction as part of the read.
• LOCAL_SERIAL - Same as SERIAL, but confined to the datacenter. Similar to LOCAL_QUORUM.
Write Consistency
• ALL - A write must be written to the commit log and memtable on all replica nodes in the cluster for that partition.
• EACH_QUORUM - Strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in each datacenter.
• QUORUM - A write must be written to the commit log and memtable on a quorum of replica nodes across all datacenters.
• LOCAL_QUORUM - Strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in the same datacenter as the coordinator. Avoids latency of inter-datacenter communication.
• ONE - A write must be written to the commit log and memtable of at least one replica node.
• TWO - A write must be written to the commit log and memtable of at least two replica nodes.
• THREE - A write must be written to the commit log and memtable of at least three replica nodes.
• LOCAL_ONE - A write must be sent to, and successfully acknowledged by, at least one replica node in the local datacenter.
• ANY - A write must be written to at least one node. If all replica nodes for the given partition key are down, the write can still succeed after a hinted handoff has been written. If all replica nodes are down at write time, an ANY write is not readable until the replica nodes for that partition have recovered.
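The arithmetic behind the levels above is short enough to sketch. A quorum is a majority of the replicas, and a read is guaranteed to see the latest acknowledged write when the replicas read plus the replicas written exceed the replication factor. The sketch assumes a single datacenter and covers only the counting levels; the function names are hypothetical.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


# Number of replicas that must respond for each level (single datacenter assumed).
LEVELS = {
    "ONE": lambda rf: 1,
    "TWO": lambda rf: 2,
    "THREE": lambda rf: 3,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def replicas_required(level, replication_factor):
    return LEVELS[level](replication_factor)


def is_strongly_consistent(read_level, write_level, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


quorum_pair = is_strongly_consistent("QUORUM", "QUORUM", 3)
fast_pair = is_strongly_consistent("ONE", "ONE", 3)
```

With a replication factor of 3, QUORUM reads and writes overlap on at least one replica, while ONE and ONE trade that guarantee for the lowest latency.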
Relational vs Cassandra NoSQL – Data Modeling
In relational systems, administrators model the data
In Cassandra, administrators design schemas that are
based on query patterns
Cassandra/DataStax Modeling
Cassandra – YOU DESIGN SCHEMAS BASED ON QUERY PATTERNS THEN DATA RELATIONSHIPS
Maximization of Denormalization
Cassandra/DataStax recommendation = 1 table per query
You are prebuilding answers to unique requests for data!
Overcome data duplication by leveraging extremely fast write performance
• Determine queries accessing data FIRST, then design the data models
• No concept of foreign keys
• No concept of join operations
• Prepare data for fast reads by writing pre-built result sets
• Attempt to minimize reads from multiple partitions
• Cassandra prefers INSERTs over UPDATEs and DELETEs
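The "one table per query" approach can be sketched with two Python dictionaries standing in for two denormalized tables. The table names, columns and sample rows are hypothetical; the point is that one logical insert writes a pre-built answer for each query.

```python
from collections import defaultdict

employees_by_id = {}                    # answers "show me employee X"
employees_by_dept = defaultdict(list)   # answers "who works in department Y?"


def insert_employee(emp_id, name, dept_id):
    """Write the same row to every query table instead of joining at read time."""
    row = {"emp_id": emp_id, "name": name, "dept_id": dept_id}
    employees_by_id[emp_id] = row
    employees_by_dept[dept_id].append(row)


insert_employee(1, "Ada", "ENG")
insert_employee(2, "Grace", "ENG")
insert_employee(3, "Edgar", "OPS")

# Each read touches exactly one "table" and one key, with no join.
eng_names = [row["name"] for row in employees_by_dept["ENG"]]
employee_3 = employees_by_id[3]
```

The data is duplicated on purpose: the extra writes are cheap, and each read is served from a single partition.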
Redis
Tumblr, Uber, Coinbase, Flickr, Hulu, Craigslist, Alibaba, Digg
• In-Memory, Key-Value Database
  - Dump to disk is configurable
  - Database handles swapping
  - All data can live in memory but key caching is required
  - 1 million keys = 160 MB
  - 10 million keys = 1.6 GB
• Atomic operations
• Master-slave replication
  - Scalability
  - Redundancy
• Slaves
  - Can’t respond to queries during initial sync
  - Automatically reconnect and resync after outage
• Journal file
  - Every write is logged
  - Commands replayed when server is started
  - Configurable – can choose between 2 settings
  - Eventually consistent - “Speed”
  - Immediately consistent - “Safety”
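The journal file behavior (every write is logged, commands are replayed at startup) can be sketched as follows. This is a toy model, not Redis's file format: the JournaledStore class, the in-memory journal list and the three supported commands are assumptions for illustration.

```python
def apply_command(state, command):
    """Apply one logged command to the in-memory state."""
    op, key, *args = command
    if op == "SET":
        state[key] = args[0]
    elif op == "DEL":
        state.pop(key, None)
    elif op == "INCR":
        state[key] = int(state.get(key, 0)) + 1


class JournaledStore:
    def __init__(self, journal=None):
        self.journal = list(journal or [])
        self.state = {}
        for command in self.journal:    # replay the journal on startup
            apply_command(self.state, command)

    def execute(self, *command):
        self.journal.append(command)    # log the write first
        apply_command(self.state, command)


live = JournaledStore()
live.execute("SET", "user:1", "alice")
live.execute("INCR", "visits")
live.execute("INCR", "visits")
live.execute("DEL", "user:1")

# Simulate a restart: a new server rebuilds memory from the journal alone.
restarted = JournaledStore(journal=live.journal)
```

How often the journal is flushed to disk is the "speed" versus "safety" setting: flushing on every write loses nothing on a crash, while flushing periodically is faster but can lose the most recent commands.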
Redis Features
• Not a replacement for relational databases but can be used as their “front end”
• Lightning-fast read and write access
• Single-threaded architecture – does not exploit multiple CPUs/cores
• Does not support unit-of-work rollback
• Optimistic locking – data contention (race) will cause transaction failure
• Redis Clusters
  - Not able to guarantee strong consistency amongst nodes
  - Able to add/remove nodes in a Redis cluster
• Partitioning allows data to be split and stored in multiple Redis instances. Each instance contains a subset of keys
  - Range partitioning
  - Hash partitioning
• Can be used as a data store or a pure cache
  - When used as a cache, can be configured as an LRU (gets rid of old data to make way for new)
  - Sensor data
• Redis RDB persistence and backups
  - Redis snapshots at specified time intervals = a full database backup
  - Move RDB files to other storage
• Write operations in memory can be logged to Append Only Files (AOF)
  - appendfsync parameter allows administrator to configure log writes
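The LRU cache behavior mentioned above (old data is evicted to make way for new) can be sketched with an ordered dictionary. The LRUCache class and its capacity of two entries are hypothetical; Redis itself evicts by memory limit and uses an approximated LRU rather than the exact ordering shown here.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` keys and evicts the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used key


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")       # touching "a" makes "b" the oldest entry
cache.set("c", 3)    # cache is full, so "b" is evicted
```

A cache miss returns None, at which point the application would fall back to the database behind the cache and store the result for next time.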
Neo4j
Walmart, eBay, Cisco, Adobe, CrunchBase, Pitney Bowes, CareerBuilder, TomTom, ConocoPhillips, National Geographic, CenturyLink, Glassdoor, Zephyr Health, Gamesys, Telenor
• Highly scalable, native graph database
• Enterprise and community editions
• Store, manage, analyze, and use data within the context of connections, like the circles and lines drawn on whiteboards
• More than 1 million downloads
• Understanding data relationships is also key to understanding dependencies, uncovering cascading impacts, and predicting behavior
• Access language allows you to traverse relationships in a much simpler and easier to understand way than relational SQL
SQL – Dozens of lines
Cypher – Couple of lines
Neo4j Features
• Provides graphical browser utility to better visualize relationships
• Import data from different sources using rules
• Cypher is another SQL “like” language
• Properties are key-value pairs
  - Nodes with properties (node is data, not server)
  - Named relationships with properties
  - Key – string
  - Value – individual data types or array
• Path – connecting relationships, which you traverse using an API
• Schemaless
• Easily able to store unstructured data
• Easily able to store large volumes of data
• Full support for ACID Transactions
• Full indexing capabilities
• Constraint capabilities
  - Unique
  - Exists (like a Foreign Key with no parent delete rules)
Example query: Find Sushi Restaurants in New York that my friends like
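The example query can be sketched as a two-hop traversal over a tiny property graph. This is plain Python rather than Cypher, and every node, relationship and property in the sample data is invented for illustration.

```python
# Relationships: node -> list of (relationship name, target node).
graph = {
    "me": [("FRIEND", "alice"), ("FRIEND", "bob")],
    "alice": [("LIKES", "sushi_one"), ("LIKES", "pizza_place")],
    "bob": [("LIKES", "sushi_two")],
}

# Node properties are key-value pairs.
properties = {
    "sushi_one": {"cuisine": "Sushi", "city": "New York"},
    "sushi_two": {"cuisine": "Sushi", "city": "Boston"},
    "pizza_place": {"cuisine": "Pizza", "city": "New York"},
}


def neighbors(node, relationship):
    """Follow one named relationship out of a node."""
    return [target for rel, target in graph.get(node, []) if rel == relationship]


def liked_by_friends(person, cuisine, city):
    """Traverse person -FRIEND-> friend -LIKES-> restaurant and filter by properties."""
    results = []
    for friend in neighbors(person, "FRIEND"):
        for place in neighbors(friend, "LIKES"):
            props = properties[place]
            if props["cuisine"] == cuisine and props["city"] == city and place not in results:
                results.append(place)
    return results


matches = liked_by_friends("me", "Sushi", "New York")
```

The query follows relationships directly from node to node, which is the work a relational database would express as several joins.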
Neo4j Graph Examples
Master Data
Management
Graph Based
Search
Recommendations
NoSQL vs Relational
Relational DBMS
Strengths
• ACID
• Transaction management
• Sophisticated locking and latching
• Power of the SQL Language – Two-phase commits, foreign key constraints, joins, subqueries, integrated aggregations, complex business rule enforcement
• Product maturity
• Robust utilities
• Vendor support
• Most vendors have robust cloud strategies
• Strong third-party software provider adoption (applications, tools and utilities)
Weaknesses
• Product purchase/support costs
• Scalability can be complex and expensive
• Data normalization can impact performance
• Schemas are not flexible
• Not all data fits neatly into rows and columns
• Geographic distribution can be complex
NoSQL vs Relational
NoSQL DBMS
Strengths
• Dynamic schema flexibility
• Faster development times
• Total cost of ownership
• Easily stores semi, non and fully structured data
• Horizontal and vertical scalability
• Geographic replication and data distribution
• Easier to achieve high performance accessing large volumes of data
• Custom tailor environment to data storage and processing needs
• Cost effective clustering
Weaknesses
• Crude transaction management and locking mechanisms (BASE vs ACID)
• Limited cloud offerings
• Vendor support (or lack thereof)
• Data is often denormalized leading to duplicate updates
• Weak access languages
• No inherent data integrity enforcement mechanisms
NoSQL vs Relational
Relational DBMS | NoSQL DBMS
Transactions – COMPLEX | Transactions – SIMPLE
Data – STRUCTURED AND STATIC | Data – FULL/SEMI/NON STRUCTURED DYNAMIC
Data Velocity – MODERATE TO HIGH | Data Velocity – HIGH to ASTRONOMICAL
Data Locations – FEWER THE BETTER | Data Locations – MANY LOCATIONS
Data Volumes – MAINTAIN BY PURGING | Data Volumes – RETAIN FOREVER
Data Availability – CLUSTER, LOG SHIPPING | Data Availability – INHERENT ARCHITECTURE
Data Performance – FOCUS ON READS | Data Performance – FOCUS ON READS/WRITES
Questions and Additional Information
cfoot@rdx.com
Next Month’s Presentation – Evaluating and Selecting Cloud
Database Management Systems
The RDX Report
Is NoSQL the Natural Progression of DB Technology?, Cloud’s Hidden Impact on
IT Support, SQL Server 2016 Licensing Best Practices, The Rise of Corporate
Ransomware
LinkedIn
Selecting Cloud DBMS, NoSQL Architectures, Database Security Series,
Improving Customer Service
20 YEARS OF SERVICE DELIVERY EXPERIENCE
Selecting a SQL Server Cloud Platform - IaaS, Amazon RDS or Azure SQL DB?
 
Migrating On-Premises DBs to Cloud Systems
Migrating On-Premises DBs to Cloud SystemsMigrating On-Premises DBs to Cloud Systems
Migrating On-Premises DBs to Cloud Systems
 
Introduction to Azure SQL DB
Introduction to Azure SQL DBIntroduction to Azure SQL DB
Introduction to Azure SQL DB
 
BI in the Cloud - Microsoft Power BI Overview and Demo
BI in the Cloud - Microsoft Power BI Overview and DemoBI in the Cloud - Microsoft Power BI Overview and Demo
BI in the Cloud - Microsoft Power BI Overview and Demo
 
Secrets for Successful Regulatory Compliance Projects
Secrets for Successful Regulatory Compliance ProjectsSecrets for Successful Regulatory Compliance Projects
Secrets for Successful Regulatory Compliance Projects
 
Rising Interest in Open Source Relational Databases
Rising Interest in Open Source Relational DatabasesRising Interest in Open Source Relational Databases
Rising Interest in Open Source Relational Databases
 
RDX Insights Presentation - Microsoft Business Intelligence
RDX Insights Presentation - Microsoft Business IntelligenceRDX Insights Presentation - Microsoft Business Intelligence
RDX Insights Presentation - Microsoft Business Intelligence
 
Cloud's Hidden Impact on IT Support Organizations
Cloud's Hidden Impact on IT Support OrganizationsCloud's Hidden Impact on IT Support Organizations
Cloud's Hidden Impact on IT Support Organizations
 
Evaluating Cloud Database Offerings
Evaluating Cloud Database OfferingsEvaluating Cloud Database Offerings
Evaluating Cloud Database Offerings
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

NoSQL Architecture Overview

  • 1. NoSQL Architecture Overview. RDX Insights Series Presentation – Introduction to NoSQL Architectures. Chris Foot, VP DB Technologies, RDX. March 23, 2017. Over 400 customers trust their databases to RDX. A video recording of this presentation can be found on RDX’s YouTube channel: https://lnkd.in/g96cbUV
  • 2. NoSQL Product Offering Analysis
  • 3. NoSQL Competitors (persistent and in-memory databases)
      • Document
          • Pairs a key with a complex data structure called a document
          • Records are not required to have a uniform structure
          • MongoDB, CouchDB, DynamoDB, Couchbase, MarkLogic
      • Wide-Column
          • A record can have billions of columns
          • Tables are collections of columns, rather than rows
          • Column names and record keys are not fixed
          • Cassandra, Bigtable, HBase, Accumulo
      • Key-Value
          • All items are stored as indexed key-value pairs
          • Redis, Riak, Memcached, Oracle NoSQL, DynamoDB
      • Graph
          • Stores nodes (data elements) with relationships
          • Interconnected, strong relationships
          • Neo4j, Datastax Cassandra, Titan, ArangoDB
      • In-Memory
          • Operations performed in memory
          • Lightning-fast read/write
          • Often use Key-Value or Wide-Column as the data store
          • Redis, Memcached, Oracle TimesTen, SAP HANA
  • 4. RDBMS and NoSQL will Merge (Relational DBMS + NoSQL DBMS → General Purpose DBMS)
      • NoSQL vendors' desire to increase market share will drive them to compete directly with relational product manufacturers
      • Vendors will add RDBMS-like functionality that allows their products to be more widely adopted. Those that don’t will quickly lose market share to those that do
      • The larger relational vendors will attempt to co-opt any NoSQL technology that challenges their dominant role in the industry
      • As they identify offerings as tangible threats, their strategy will be to ensure that the technologies used by those vendors become a component of, not a replacement for, their traditional database products
  • 6. NoSQL Adoption Drivers - Modern Applications
      • Single View, Sensor Data, Biometrics, Radiology, Videos, Images, Weather Data, Catalogs, Content Management, Geospatial, Social Data
      • IDC: Unstructured data is growing at the rate of 62% per year
      • IDC: By 2022, 93% of all data in the digital universe will be unstructured
      • Gartner: Data volume is set to grow 800% over the next 5 years and 80% of it will reside as unstructured data
  • 7. NoSQL Adoption Drivers – Horizontal Scaling (horizontal vs vertical). NoSQL architectures leverage horizontal scalability to cost-effectively handle large volumes of data and/or users
  • 8. Relational and NoSQL Parallel Adoption Drivers
      • Hierarchical and Network Databases – IMS and CODASYL/Network
          • Logical and physical layers entirely dependent upon each other
          • Both data storage and data navigation were rigidly defined
          • Programs were required to follow the prebuilt paths to navigate through the stored data
      • Early Releases of DB2
          • Flexibility: separate logical and physical layers (schema), set vs row processing
          • Ease of use: the SQL language was intuitive
          • Poor performance
          • Crude locking, transaction management and limited features
      • Early Releases of Oracle
          • Flexibility
          • Easy to use
          • Lower Total Cost of Ownership (support, product costs)
          • Low-cost commodity hardware (as in, it didn’t need a mainframe)
          • Crude locking, transaction management and limited features
      • Early Releases of NoSQL
          • Flexibility
          • Easy to use
          • Lower Total Cost of Ownership (support, product costs)
          • Faster application development
          • Architected to scale horizontally for availability and performance
          • Crude locking, transaction management and limited features
      • The reaction each time: “Niche implementations, crude technically, will never become popular, no features - no future”. Pretty much…. “Your career is going to be toast.”
  • 9. ACID vs BASE
      • ACID (Relational)
          • Atomicity: All operations in a single transaction succeed or fail as a group. No partial operations
          • Consistency: The database is never in an inconsistent state
          • Isolation: Transactions do not interfere with one another. Contentious data access is handled by the database to make the transactions appear to run sequentially
          • Durability: Transactions are permanent in the presence of failures
      • BASE (NoSQL)
          • Basic Availability: The system is able to tolerate a partial failure (loss of a single node, for example)
          • Soft State: The state of the system is in flux and may change over time because of eventual consistency
          • Eventual Consistency: As data is added to the system, it is gradually replicated across all nodes. Data may be inconsistent in the short term but will eventually become consistent
      • Distributed Tradeoff
          • The application is given a greater responsibility for data management in systems that don’t follow ACID
          • Leads to complex application code when strong consistency is needed across replicated nodes
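The "soft state" and "eventual consistency" bullets on slide 9 can be illustrated with a toy model. This is a minimal sketch, not any product's replication protocol; the `Replica` class and the `write` and `replicate` helpers are hypothetical names introduced here for illustration.

```python
class Replica:
    """A toy replica holding a key-value map (hypothetical, for illustration)."""

    def __init__(self):
        self.data = {}


def write(primary, pending, key, value):
    """Apply a write on the primary and queue it for later replication."""
    primary.data[key] = value
    pending.append((key, value))


def replicate(pending, replicas):
    """Drain the queue so every replica converges on the primary's state."""
    while pending:
        key, value = pending.pop(0)
        for replica in replicas:
            replica.data[key] = value


primary, secondary = Replica(), Replica()
pending = []

write(primary, pending, "balance", 100)
stale_read = secondary.data.get("balance")   # replication has not run yet
replicate(pending, [secondary])
fresh_read = secondary.data.get("balance")   # replicas have converged
```

Between the write and the replication step, a reader of the secondary sees old data. That window is the part the application has to tolerate or work around when it needs strong consistency.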
  • 10. CAP Theorem: Distributed Systems – Pick C or A
      • Consistency: All clients see the same data
      • Availability: All clients can read and write
      • Partition Tolerance: System continues to work during network partitions
      • CP: MongoDB, Redis, Bigtable, HBase, MemcacheDB
      • CA: Oracle, SQL Server, MySQL…
      • AP: Cassandra, Riak, CouchDB, DynamoDB
  • 11. CAP Theorem: behavior while a partition blocks data synchronization
      • AVAILABILITY: Both sides of the partition allow updates. The system is available, but data is inconsistent due to lack of synchronization
      • CONSISTENCY: One side allows updates and the other prevents them. Data is in sync because only one node allows updates. The system is unavailable to one group of users
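The two outcomes on slide 11 can be sketched as a single decision function. This is a simplified illustration under assumed names (`handle_write`, the `"AP"` and `"CP"` mode strings); real systems decide this through replication and election protocols rather than a flag.

```python
def handle_write(on_primary_side, partitioned, mode):
    """Decide whether a node accepts a write.

    on_primary_side: True if the node can still reach the primary/majority.
    partitioned:     True while the network partition is in effect.
    mode:            "AP" favors availability, "CP" favors consistency.
    """
    if not partitioned:
        return "accepted"
    if mode == "AP":
        # Both sides keep taking updates, so copies may diverge.
        return "accepted"
    # CP: only one side accepts updates; the other is unavailable for writes.
    return "accepted" if on_primary_side else "rejected"
```

With `mode="AP"` both sides answer "accepted" and the data can diverge. With `mode="CP"` the minority side answers "rejected", which is the "unavailable to one group of users" case.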
  • 12. Why Did RDX Choose MongoDB?
      • Business Drivers
          • Industry analyst evaluations
          • Customer use cases and recommendations
          • Largest commercial investment in any database vendor
          • Popularity: 10 million+ software downloads, 1,000 partners, 2,000 customers, 1/3 of the Fortune 100
          • Robust training available
          • Strong open source community
          • Excellent partnership support
      • Technical Drivers
          • Wide scope of potential application
          • Low TCO
          • Combines capabilities of relational databases with next generation NoSQL technologies
          • Schemaless, flexible data model
          • Nonstructured data support
          • Easily accommodates large data volumes
          • Rich query capability
          • Strong, tunable consistency model
          • Elastic, horizontal scalability
          • Easily configurable system resiliency
          • Vendor provided database support
      • Users: Craigslist, New York Times, Verizon, Viacom, AstraZeneca, MTV, Google, Genentech, Adobe, GAP, Cisco, MetLife, Facebook, Expedia, eBay, Edmunds, Washington Post, AOL, ADP, Forbes, Intuit, The Weather Channel, Carfax…..
  • 13. MongoDB Features
      • Multiple storage engines
          • WiredTiger
          • InMemory
          • Encrypted
          • Third-Party
          • MMAPv1
      • Indexing
          • Enforce uniqueness on user defined and Object ID fields
          • Partial – Only indexed if they meet filter expression
          • Sparse – Only indexed if field is populated
          • Compound – Multiple column index
          • Multikey – Indexes on arrays
          • TTL – Allow documents to be purged based on time
          • Text Search
          • Hash – Creates random values
      • Easily ingests large, nonstructured data elements
          • Decomposes large video files and images into smaller components and rebuilds them using pointers during retrieval
          • Document validation rules enforce data validity
          • Enforce checks on document structure, data types, data ranges and the presence of mandatory fields
          • DBAs can apply data governance standards, while developers maintain the benefits of a flexible schema
      • Automatic failover with no application redirects to new primary required
      • Driver support for all common programming languages
      • Data compression
      • Tunable consistency model
      • BI Connector allows MongoDB to act as data source for SQL based BI analytics platforms
      • LDAP, Kerberos, Windows AD, x.509 authentication
      • DML, DCL, DDL audit logging
      • FIPS compliance and data encryption
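The document validation bullets on slide 13 (checks on structure, data types, data ranges and mandatory fields) can be sketched in plain Python. This is not MongoDB's validator syntax; the `validate` function and the `rules` layout are assumptions made up for this illustration.

```python
def validate(doc, rules):
    """Return a list of rule violations for one document (empty if valid)."""
    errors = []
    for field, rule in rules.items():
        if field not in doc:
            if rule.get("required"):
                errors.append(f"{field}: missing")
            continue
        value = doc[field]
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: wrong type")
            continue
        low, high = rule.get("range", (None, None))
        if low is not None and not (low <= value <= high):
            errors.append(f"{field}: out of range")
    return errors


# Hypothetical governance rules for an insurance policy document.
rules = {
    "policy_id": {"type": str, "required": True},
    "premium": {"type": (int, float), "range": (0, 100000)},
}

ok = validate({"policy_id": "A1", "premium": 500, "notes": "extra field"}, rules)
bad = validate({"premium": -5}, rules)
```

Fields that no rule mentions (such as `notes`) pass untouched, which mirrors the point that governance rules and a flexible schema can coexist.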
  • 14. Rigid vs Dynamic Schemas
      • Relational Tables and Rows (predescribed)
          • Schema design performed before application is developed
          • Schema must be built before inserting data
          • Enforces data structure – rows cannot deviate from the predefined schema
          • Schema design based on storage
          • Schema alterations require database and application changes to be coordinated
          • Normalization process is critical
      • MongoDB Collections and Documents (self-describing)
          • No schema required before inserting data
          • Schema is created as each document is inserted
          • Documents in a collection can have different schemas (sets of fields)
          • Schema design based on application usage
          • Schemas can evolve iteratively during application life-cycle
          • Higher dependency on application layer for data integrity
          • Normalization not as important
  • 15. Flexible Schemas. Insurance Policy Document Collection: AUTO, LIFE, HOME, EQUIPMENT, CYBER. Collections do *not* enforce document structure. You do not predefine document schemas. The schema is defined during initial document insertion. Data types are selected by MongoDB based on the data being inserted
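The insurance policy collection on slide 15 can be pictured as a list of dictionaries that share a collection but not a field set. This is a sketch only; the field names and values below are invented for illustration and are not from the presentation.

```python
# One "collection" holding documents with different sets of fields.
policies = [
    {"_id": 1, "type": "AUTO", "vin": "hypothetical-vin", "drivers": ["A", "B"]},
    {"_id": 2, "type": "LIFE", "beneficiary": "C", "term_years": 20},
    {"_id": 3, "type": "HOME", "address": {"city": "Pittsburgh"}},
]


def fields_by_type(docs):
    """Map each policy type to the type-specific fields its document carries."""
    return {
        doc["type"]: sorted(k for k in doc if k not in ("_id", "type"))
        for doc in docs
    }


schemas = fields_by_type(policies)
```

Each document describes itself, so adding a new policy type means inserting a document with new fields rather than altering a table definition first.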
  • 16. Agile Development Features (faster, better, leaner)
      • Schemaless architecture
      • Flexible data model = easy schema changes
      • Drivers for all major programming languages
      • Ability to store all types of data
      • Flexible JSON document format
      • Rich content using GridFS
      • Simple system provisioning
      • Scale vertically and horizontally
      • Pluggable storage engines
      • Easy replication setup
  • 17. Automatic Sharding. Automatic data distribution across a horizontally scalable sharded cluster: Shard 1, Shard 2 … Shard N. Each logical shard is a replica set of one primary physical server and two secondary physical servers (replicas). Cluster metadata includes data location, shards, # of chunks….
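The idea of automatic data distribution on slide 17 can be sketched with a hash of the shard key. This is a deliberate simplification, assuming plain modulo placement; it does not model chunks, cluster metadata or rebalancing, and `shard_for` and `distribute` are hypothetical helper names.

```python
import hashlib


def shard_for(shard_key, num_shards):
    """Pick a shard number from a stable hash of the shard key."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard that would store them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


placement = distribute([f"customer-{n}" for n in range(1000)], 3)
```

Because the hash is stable, the same key always routes to the same shard, so a router only needs the key to find the data.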
  • 18. Replica Sets (multi-datacenter cluster with BI Connector and three config servers)
      • Primaries: Primary 1 Display, Primary 2 Display, Primary 3 Display (Priority 1, Votes 1)
      • Site 2 – Display and Batch: Sec 1.1 Display, Sec 2.1 Batch, Sec 3.1 Batch (Priority 1, Votes 1)
      • Site 3 – Batch and DR: Sec 1.2 Batch, Sec 2.2 Batch, Sec 3.2 Delayed (Priority 0, Votes 1)
  • 19. Global Data Distribution. Read Global/Write Local: one primary, two secondaries
  • 20. Videos and Images – Unstructured Data
      • Store files larger than 16MB, e.g. video and images
          • Load chunks without reading the entire file into memory
      • Atomically sync files with their metadata
      • Shard and distribute around the cluster
      • Flow: doc.jpg → Driver / GridFS API → fs.files (doc.jpg metadata) and fs.chunks (doc.jpg chunk 1, …)
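The split between `fs.files` and `fs.chunks` on slide 20 can be sketched without a database. This is an illustrative model, not the GridFS driver API; the helper names are hypothetical and the document fields are a reduced assumption of what a metadata document and a chunk document carry.

```python
def split_into_chunks(file_id, payload, chunk_size):
    """Return one metadata document plus an ordered list of chunk documents."""
    files_doc = {"_id": file_id, "length": len(payload), "chunkSize": chunk_size}
    chunks = [
        {"files_id": file_id, "n": n, "data": payload[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(payload), chunk_size))
    ]
    return files_doc, chunks


def reassemble(chunks):
    """Rebuild the original bytes by following the chunk sequence numbers."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


payload = b"x" * 1000
files_doc, chunks = split_into_chunks("doc.jpg", payload, 256)
restored = reassemble(list(reversed(chunks)))
```

Because each chunk is its own small document, a reader can fetch a single chunk by its sequence number instead of loading the whole file into memory.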
  • 21. Cassandra (big data, high number of concurrent users). Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo and the log-structured storage engine from Google's Bigtable.
      • Fault Tolerant
      • Data Durability
      • Data Center Aware
      • High Performance
      • Decentralized
      • Horizontal Scalability
      • Elastic Architecture
      • Apple - 75,000 nodes storing over 10 PB of data
      • Netflix - 2,500 nodes, 420 TB, over 1 trillion requests per day
      • Chinese search engine Easou - 270 nodes, 300 TB, over 800 million requests per day
      • eBay - 100 nodes, 250 TB
      • Users: Apple, Sony, Walmart, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, Weather Channel, CERN, Constant Contact, Macy’s, Expedia
  • 22. Datastax/Cassandra Features
      • Multi-model storage
          • Key Value NoSQL
          • Tabular NoSQL
          • JSON/Document NoSQL
          • Graph
      • Very high “linear” scalability
      • Automatic data distribution amongst nodes
      • Multi-data center replication
      • CQL Access Language
          • SQL “like” language
      • Tunable consistency model
      • Strong node fault detection and recovery
      • Writes to Memtables in RAM
      • Materialized views
      • Advanced replication allows multiple clusters to be synchronized
      • OpsCenter – browser based administration and monitoring toolset
      • Driver support for all common programming languages
      • In-Memory option allows parts (or all) of database to reside in RAM
      • Tiered storage
      • Interface to Spark (in-memory)
          • Data stream processing
          • Access to Spark SQL (more robust than CQL)
      • Security
          • End to end encryption
          • AD, LDAP, Kerberos support
  • 23. Cassandra Cluster (Cassandra/DataStax). Replication between a West Coast datacenter and an East Coast datacenter: Node 1 holds the primary copy, Node 2 and Node 3 hold copies of Node 1's data, and Node 4 holds none
  • 24. Cassandra/DataStax terminology
      • Keyspace - A keyspace is a logical container for data tables and indexes. It can be compared to an Oracle schema or a SQL Server database. Keyspaces define how the data is replicated amongst the nodes
      • Table - A collection of columns fetched by a row. Columns are ordered by name
      • Column - Supports different data types and consists of a name, value and timestamp
      • Primary Key - Uniquely identifies a row occurrence in a Cassandra table
      • Partition Key - The partition key identifies which node in the cluster will store the row. It is responsible for data distribution across the nodes
      • Clustering Key - Orders rows based on the column’s value
      • Data Center - A collection of related nodes in a Cassandra cluster
      • Snitch - Determines which datacenters and racks nodes belong to. Snitches inform Cassandra about the network topology so that requests are routed efficiently, and they allow Cassandra to distribute replicas by grouping machines into datacenters and racks
      • Partitioner - A hashing algorithm that generates a hash value token from the partition key. The token is the value used to distribute the data across the various nodes in the cluster. The partitioner’s goal is to assign equal portions of data to each node. Each node in a Cassandra cluster becomes responsible for storing a range of hash values
      • Gossip - A peer-to-peer communications mechanism that identifies and shares node information (state and location) with all nodes in the Cassandra cluster
  • 25. Cassandra/DataStax Decentralized Storage
      • Partitioners are hashing algorithms that generate tokens from partition keys
      • Each node in a Cassandra cluster is responsible for a range of tokens (hash keys)
      • First column of primary key becomes partition key
      • Can use multiple columns as primary key, partition key
      • Also able to cluster columns to order data
      • Examples: PRIMARY KEY (emp_id); PRIMARY KEY (emp_id, dept_id); PRIMARY KEY (emp_id, dept_id) WITH CLUSTERING ORDER BY (dept_loc))
      • Partitioner token ranges: token 0 covers 0-25, token 26 covers 26-50, token 51 covers 51-75, token 76 covers 76-100
      • All nodes can accept reads and writes
      • Distributes data amongst nodes
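The token ranges on slide 25 (0-25, 26-50, 51-75, 76-100) can be turned into a small lookup. This is a sketch under stated assumptions: the node names are hypothetical, and the MD5-modulo hash is a stand-in chosen only to land in the slide's 0-100 range, since real partitioners use a far larger token space.

```python
import bisect
import hashlib

# Upper bound of each token range from the slide, and the node that owns it.
RANGE_UPPER_BOUNDS = [25, 50, 75, 100]
NODES = ["node1", "node2", "node3", "node4"]   # hypothetical node names


def token_for(partition_key):
    """Hash a partition key to a token in 0..100 (toy token space)."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % 101


def node_for_token(token):
    """Find the node whose range contains the token."""
    return NODES[bisect.bisect_left(RANGE_UPPER_BOUNDS, token)]


def node_for_key(partition_key):
    """Route a row to a node using only its partition key."""
    return node_for_token(token_for(partition_key))
```

Any node can compute this mapping for itself, which is why every node can accept reads and writes and forward them to the owner of the range.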
  • 26. Cassandra/DataStax Tunable Consistency. Read and write consistency levels are different from row replication settings. The replication factor affects how many copies are eventually written, whereas tunable consistency trades consistency against latency for fast client response
      • Read Consistency (level: description)
          • ALL: Returns the record after all replicas have responded. The read operation will fail if a replica does not respond.
          • QUORUM: Returns the record after a quorum of replicas from all datacenters has responded.
          • LOCAL_QUORUM: Returns the record after a quorum of replicas in the same datacenter as the coordinator has responded. Avoids latency of inter-datacenter communication.
          • ONE: Returns a response from the closest replica, as determined by the snitch. By default, a read repair runs in the background to make the other replicas consistent.
          • TWO: Returns the most recent data from two of the closest replicas.
          • THREE: Returns the most recent data from three of the closest replicas.
          • LOCAL_ONE: Returns a response from the closest replica in the local datacenter.
          • SERIAL: Allows reading the current (and possibly uncommitted) state of data without proposing a new addition or update. If a SERIAL read finds an uncommitted transaction in progress, it will commit the transaction as part of the read.
          • LOCAL_SERIAL: Same as SERIAL, but confined to the datacenter. Similar to LOCAL_QUORUM.
      • Write Consistency (level: description)
          • ALL: A write must be written to the commit log and memtable on all replica nodes in the cluster for that partition.
          • EACH_QUORUM: Strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in each datacenter.
          • QUORUM: A write must be written to the commit log and memtable on a quorum of replica nodes across all datacenters.
          • LOCAL_QUORUM: Strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in the same datacenter as the coordinator. Avoids latency of inter-datacenter communication.
          • ONE: A write must be written to the commit log and memtable of at least one replica node.
          • TWO: A write must be written to the commit log and memtable of at least two replica nodes.
          • THREE: A write must be written to the commit log and memtable of at least three replica nodes.
          • LOCAL_ONE: A write must be sent to, and successfully acknowledged by, at least one replica node in the local datacenter.
          • ANY: A write must be written to at least one node. If all replica nodes for the given partition key are down, the write can still succeed after a hinted handoff has been written. If all replica nodes are down at write time, an ANY write is not readable until the replica nodes for that partition have recovered.
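The interplay between replication factor and consistency level on slide 26 comes down to simple arithmetic. The sketch below assumes the usual majority definition of a quorum and the common overlap rule (a read is guaranteed to touch at least one replica holding the latest write when read replicas plus write replicas exceed the replication factor); the function names are hypothetical.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3
strong = reads_see_latest_write(rf, quorum(rf), quorum(rf))   # QUORUM + QUORUM
fast = reads_see_latest_write(rf, 1, 1)                       # ONE + ONE
```

With a replication factor of 3, QUORUM writes and QUORUM reads overlap on at least one replica, while ONE and ONE trade that guarantee for lower latency.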
  • 27. Relational vs Cassandra NoSQL – Data Modeling. In relational systems, administrators model the data. In Cassandra, administrators design schemas that are based on query patterns
  • 28. Cassandra/DataStax Modeling. In Cassandra, you design schemas based on query patterns first, then data relationships. Maximization of denormalization: the Cassandra/Datastax recommendation is 1 table per query. You are prebuilding answers to unique requests for data! Overcome data duplication by leveraging extremely fast write performance
      • Determine queries accessing data FIRST, then design the data models
      • No concept of foreign keys
      • No concept of join operations
      • Prepare data for fast reads by writing pre-built result sets
      • Attempt to minimize reads from multiple partitions
      • Cassandra prefers INSERTs over UPDATEs and DELETEs
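The "1 table per query" recommendation on slide 28 can be pictured with two dictionaries that hold the same rows under different keys. This is a sketch, not CQL; the table names and `insert_user` helper are invented for illustration.

```python
# One "table" per query pattern, each keyed by what that query looks up.
users_by_id = {}
users_by_email = {}


def insert_user(user_id, email, name):
    """Write the same row to every table that answers a query about it."""
    row = {"user_id": user_id, "email": email, "name": name}
    users_by_id[user_id] = row
    users_by_email[email] = dict(row)   # deliberate duplicate copy


insert_user(1, "a@example.com", "Ann")
insert_user(2, "b@example.com", "Bob")

found_by_id = users_by_id[2]["name"]
found_by_email = users_by_email["a@example.com"]["user_id"]
```

Each read is a single key lookup with no join. The cost is an extra write per table, which is the trade the slide describes as leaning on fast write performance.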
  • 29. Redis
      • In-Memory, Key-Value Database
      • Dumps to disk are configurable
      • Database handles swapping
      • All data can live in memory but key caching is required
          • 1 Million Keys = 160 MB
          • 10 Million Keys = 1.6 GB
      • ATOMIC Operations
      • Master-slave replication
          • Scalability
          • Redundancy
      • Slaves
          • Can’t respond to queries during initial sync
          • Automatically reconnect and resync after outage
      • Journal file
          • Every write is logged
          • Commands replayed when server is started
          • Configurable – Can choose between 2 settings
              • Eventually consistent - “Speed”
              • Immediately consistent - “Safety”
      • Users: Tumblr, Uber, Coinbase, Flickr, Hulu, Craigslist, Alibaba, Digg
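The journal file behavior on slide 29 (every write is logged, commands are replayed at startup) can be sketched as follows. This is a toy model with a hypothetical tuple-based command format, not the Redis file format or command parser.

```python
def apply_command(store, command):
    """Apply one logged command to an in-memory key-value store."""
    op, key, *args = command
    if op == "SET":
        store[key] = args[0]
    elif op == "DEL":
        store.pop(key, None)


def replay(journal):
    """Rebuild the in-memory state by replaying the journal from the start."""
    store = {}
    for command in journal:
        apply_command(store, command)
    return store


journal = [("SET", "a", "1"), ("SET", "b", "2"), ("DEL", "a"), ("SET", "b", "3")]
state_after_restart = replay(journal)
```

Replaying in order reproduces the state the server had before it stopped, which is why the log only needs to record writes.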
  • 30. Redis Features
      • Not a replacement for relational databases but can be used as their “front end”
      • Lightning-fast read and write access
      • Single threaded architecture – does not exploit multiple CPUs/cores
      • Does not support unit-of-work rollback
      • Optimistic locking – data contention (race) will cause transaction failure
      • Redis Clusters
          • Not able to guarantee strong consistency amongst nodes
          • Able to add/remove nodes in a Redis cluster
      • Partitioning allows data to be split and stored in multiple Redis instances. Each instance contains a subset of keys
          • Range partitioning
          • Hash partitioning
      • Can be used as a data store or a pure cache
          • When used as a cache, can be configured as an LRU (gets rid of old data to make way for new)
          • Sensor data
      • Redis RDB persistence and backups
          • Redis snapshots at specified time intervals = a full database backup
          • Move RDB files to other storage
      • Write operations in memory can be logged to Append Only Files (AOF)
          • The appendfsync parameter allows the administrator to configure log writes
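The LRU cache behavior mentioned on slide 30 ("gets rid of old data to make way for new") can be sketched with the standard library. This is a strict LRU for illustration, with a hypothetical `LRUCache` class; it is not a description of how Redis selects keys for eviction internally.

```python
from collections import OrderedDict


class LRUCache:
    """A fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # drop the least recently used


cache = LRUCache(2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")        # "a" is now the most recently used key
cache.set("c", 3)     # capacity exceeded, so "b" is evicted
```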
  • 31. Neo4j
      • Highly scalable, native graph database
      • Enterprise and community editions
      • Store, manage, analyze, and use data within the context of connections, like the circles and lines drawn on whiteboards
      • More than 1 million downloads
      • Understanding data relationships is also key to understanding dependencies, uncovering cascading impacts, and predicting behavior
      • The access language lets you traverse relationships in a much simpler, easier to understand way than relational SQL: dozens of lines of SQL vs a couple of lines of Cypher
      • Users: Walmart, eBay, Cisco, Adobe, CrunchBase, Pitney Bowes, CareerBuilder, TomTom, ConocoPhillips, National Geographic, CenturyLink, Glassdoor, Zephyr Health, Gamesys, Telenor
  • 32. Neo4j Features
      • Provides graphical browser utility to better visualize relationships
      • Import data from different sources using rules
      • Cypher is another SQL “like” language
      • Properties are key-value pairs
          • Nodes with properties (node is data, not server)
          • Named relationships with properties
          • Key – string
          • Value – individual data types or array
      • Path – connecting relationships, which you traverse using an API
      • Schemaless
      • Easily able to store unstructured data
      • Easily able to store large volumes of data
      • Full support for ACID Transactions
      • Full indexing capabilities
      • Constraint capabilities
          • Unique
          • Exists (like a Foreign Key with no parent delete rules)
      • Example query: Find Sushi Restaurants in New York that my friends like
  • 33. Neo4j Graph Examples: Master Data Management, Graph Based Search, Recommendations
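The example query on slide 32 ("Find Sushi Restaurants in New York that my friends like") is a two-hop traversal over nodes with properties and named relationships. The sketch below models it in plain Python rather than Cypher; every node name, property and relationship type is invented for illustration.

```python
# Nodes carry properties as key-value pairs.
nodes = {
    "me": {"label": "Person"},
    "ann": {"label": "Person"},
    "bob": {"label": "Person"},
    "sushi_one": {"label": "Restaurant", "cuisine": "Sushi", "city": "New York"},
    "pasta_two": {"label": "Restaurant", "cuisine": "Italian", "city": "New York"},
    "sushi_three": {"label": "Restaurant", "cuisine": "Sushi", "city": "Boston"},
}

# Named relationships as (source, type, target) triples.
relationships = [
    ("me", "FRIEND_OF", "ann"),
    ("me", "FRIEND_OF", "bob"),
    ("ann", "LIKES", "sushi_one"),
    ("bob", "LIKES", "pasta_two"),
    ("bob", "LIKES", "sushi_three"),
]


def targets(source, rel_type):
    """Follow one relationship type outward from a node."""
    return [t for s, r, t in relationships if s == source and r == rel_type]


def sushi_liked_by_friends(person, city):
    """Traverse person -> friends -> liked restaurants, filtering on properties."""
    found = set()
    for friend in targets(person, "FRIEND_OF"):
        for place in targets(friend, "LIKES"):
            props = nodes[place]
            if props.get("cuisine") == "Sushi" and props.get("city") == city:
                found.add(place)
    return sorted(found)


result = sushi_liked_by_friends("me", "New York")
```

The query reads as a walk along relationships, which is the shape a graph access language expresses directly and a relational schema would express through several joins.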
  • 34. NoSQL vs Relational: Relational DBMS
      • Strengths
          • ACID
          • Transaction management
          • Sophisticated locking and latching
          • Power of the SQL Language – Two-phase commits, foreign key constraints, joins, subqueries, integrated aggregations, complex business rule enforcement
          • Product maturity
          • Robust utilities
          • Vendor support
          • Most vendors have robust cloud strategies
          • Strong third-party software provider adoption (applications, tools and utilities)
      • Weaknesses
          • Product purchase/support costs
          • Scalability can be complex and expensive
          • Data normalization can impact performance
          • Schemas are not flexible
          • Not all data fits neatly into rows and columns
          • Geographic distribution can be complex
  • 35. NoSQL vs Relational: NoSQL DBMS
      • Strengths
          • Dynamic schema flexibility
          • Faster development times
          • Total cost of ownership
          • Easily stores semi, non and fully structured data
          • Horizontal and vertical scalability
          • Geographic replication and data distribution
          • Easier to achieve high performance accessing large volumes of data
          • Custom tailor environment to data storage and processing needs
          • Cost effective clustering
      • Weaknesses
          • Crude transaction management and locking mechanisms (BASE vs ACID)
          • Limited cloud offerings
          • Vendor support (or lack thereof)
          • Data is often denormalized leading to duplicate updates
          • Weak access languages
          • No inherent data integrity enforcement mechanisms
  • 36. NoSQL vs Relational (Relational DBMS vs NoSQL DBMS)
      • Transactions: COMPLEX vs SIMPLE
      • Data: STRUCTURED AND STATIC vs FULL/SEMI/NON STRUCTURED, DYNAMIC
      • Data Velocity: MODERATE TO HIGH vs HIGH TO ASTRONOMICAL
      • Data Locations: FEWER THE BETTER vs MANY LOCATIONS
      • Data Volumes: MAINTAIN BY PURGING vs RETAIN FOREVER
      • Data Availability: CLUSTER, LOG SHIPPING vs INHERENT ARCHITECTURE
      • Data Performance: FOCUS ON READS vs FOCUS ON READS/WRITES
  • 37. Questions and Additional Information
      • Contact: cfoot@rdx.com
      • Next Month’s Presentation – Evaluating and Selecting Cloud Database Management Systems
      • The RDX Report: Is NoSQL the Natural Progression of DB Technology?, Cloud’s Hidden Impact on IT Support, SQL Server 2016 Licensing Best Practices, The Rise of Corporate Ransomware
      • LinkedIn: Selecting Cloud DBMS, NoSQL Architectures, Database Security Series, Improving Customer Service
      • 20 years of service delivery experience