September, 2014 
NoSQL – A Quick Tour
Objectives
What is NoSQL
How is data growing ?
Challenges
What’s the solution ?
NoSQL Features and Types
Eventual Consistency
High-level Overview of some popular NoSQL DBs
NoSQL – Not mandatory
Popular Jargons
Useful Links
What is NoSQL
Usually expanded as “Not only SQL”; covers any data store that does not fall under the SQL (relational) category
A new way to store and retrieve data (especially sparse and unstructured data) for modern high-volume real-time web traffic, batch processing, and analytics
Not an alternative to the RDBMS but a parallel concept
A “Shared Nothing” architecture as opposed to the shared architecture of a traditional RDBMS. A shared nothing (SN) architecture is a distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. More specifically, none of the nodes share memory or disk storage. The term was coined by Michael Stonebraker at the University of California, Berkeley, in his 1986 paper “The Case for Shared Nothing.”
Can be applied to technologies which don’t use SQL and relational mapping of data
Promotes storage of huge volumes of data with efficient retrieval, and supports normal CRUD operations
Not a strict follower of ACID due to its inherent nature
Works with “Big Data” to support high-volume data processing
Comes in many product flavors, often born from engineering work at web giants such as Twitter, Facebook, LinkedIn, Google, and Yahoo
Data can be stored in four basic kinds of store: key/value pair, column-family, document, and graph
What is NoSQL Continued..
Most of these stores do not expect predefined schemas, normalization, or strict consistency
The primary purpose of NoSQL is fast, efficient storage and processing of constantly growing data, without the constraints of a relational database, on a scalable architecture that supports future data growth without compromising performance
First generation and second generation of NoSQL
Mantra of NoSQL: “Getting an answer quickly is more important than getting a correct answer”
If you can’t split it, you can’t scale it
—Randy Shoup, Distinguished Architect, eBay
How is data growing ?
How is data growing ? Continued.. 
There are two concepts, Big User and Big Data, along with Cloud Computing
Big User – The number of users on the web is growing rapidly, and they access many kinds of data: personal information, social data such as tweets, likes, blogs, click streams, comments, and follows, geolocation data, log files, and system-generated, user-generated, and sensor-generated data. The growth in users cannot be predicted, especially with the advance of the mobile space
Big Data – The number of data sources has increased tremendously, and the actual data has grown exponentially. Data is no longer measured in gigabytes but in much larger units (tera/peta/exa/zetta/yottabytes)
Cloud Computing – More and more data is stored in the cloud, and access to that data should be fast
The number of concurrent users skyrocketed as applications increasingly became accessible via the web (and later on mobile devices) 
 The amount of data collected and processed soared as it became easier and increasingly valuable to capture all kinds of data 
 The amount of unstructured or semi-structured data exploded and its use became integral to the value and richness of applications
Challenges 
[Figure: an application server layer in front of a traditional RDBMS that scales vertically (add CPU, add RAM, shared disk; “Oh No! I am loaded”), compared with an application server layer in front of a NoSQL DB spread across five commodity servers.]
Challenges Continued.. 
Need to scale out (i.e. sharding) without compromising latency and performance (keeping low latency and high throughput) 
Need to avoid scaling up with costly and complex high-end servers 
Should be easy to scale out with low-cost commodity servers without any application downtime 
Data is not only structured; the majority of social networking data is unstructured, so there is a need to support schema-less data structures 
There are many cases where transactions are not the prime concern and costly, heavy writes are not needed 
Should support easy and quick replication as well as failover
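The scale-out idea above can be sketched as a hash-based shard lookup. This is only a minimal illustration under stated assumptions, not how any particular product routes keys: the server names and the `shard_for` helper are hypothetical.

```python
import hashlib

# Hypothetical pool of commodity servers; the names are illustrative only.
SERVERS = ["server-0", "server-1", "server-2", "server-3", "server-4"]


def shard_for(key: str, servers: list[str]) -> str:
    """Pick a server for a key: hash the key, then take it modulo the pool size."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


for user in ["anand", "scott", "sachin"]:
    print(user, "->", shard_for(user, SERVERS))
```

Plain modulo sharding remaps most keys whenever the pool size changes, which is one reason many stores prefer consistent hashing.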
What’s the solution ? 
Real-time and batch processing of analytical and operational data with second-generation NoSQL stores like Couchbase 
The picture depicts a scenario where both batch processing and real-time data are handled by a combination of Hadoop and Couchbase.
NoSQL Features and Types 
Early adopter: Google’s Bigtable serves workloads ranging from throughput-sensitive batch processing to latency-sensitive online queries. It is used in Google Earth, Finance, Orkut, Analytics, etc., and is based on the concept of a column-family data store. 
There are four main types of NoSQL databases, as described below. 
Document Database 
MongoDB and Couchbase (a merger of CouchDB and Membase technology) 
Its CouchDB lineage started as an Apache Incubator project; development continued at Couchbase Inc. 
Implemented in Erlang and C, with a JavaScript execution environment 
Used by Apple, BBC, and many others.
NoSQL Features and Types Continued.. 
Key/Value pair and Eventual Consistency datastore 
Redis, Membase, Voldemort, and Cassandra 
Cassandra supports key/value pairs and eventual consistency (based on Amazon’s Dynamo). Developed by Facebook and implemented in Java. Clients are available in Java, PHP, Python, Grails, Ruby, and .NET. Used by Facebook, Twitter, Paddy Power, GitHub, and many others. 
Redis supports key-value pairs and is a distributed in-memory as well as persistent storage system. Started by Salvatore Sanfilippo in 2009 as an independent project. Implemented in C. Clients are available in PHP, Java, Ruby, Python, C++, etc. Used by Craigslist. 
Sorted Column Family datastore 
HBase (developed based on Google's Bigtable) 
Created by Powerset and donated to Apache. Implemented in Java. Used by Facebook, Yahoo!, and many others. Access methods include JRuby, Java, Thrift, REST, and ProtoBuf. 
Graph Database 
Neo4J, FlockDB 
Neo4J was developed in 2003 and implemented in Java. Accessed through REST and Gremlin interfaces. Used by Box.com and ThoughtWorks. 
There are other types of NoSQL databases available as well. 
XML Database 
Object Database 
Grid and Cloud Database 
Multimodel Database
Eventual Consistency 
Consistency – Data should be consistent across user transactions, and every client should have the same view of the data 
Availability – The system should always be available, and users should always be able to read as well as write 
Partition tolerance – The system should tolerate network partitions and keep working in a distributed environment 
Eventual Consistency implies BASE (Basically Available, Soft state, Eventually consistent) 
Brewer’s CAP Theorem 
Succinctly put, Brewer’s Theorem states that in systems that are distributed or scaled out it’s impossible to achieve all three (Consistency, Availability, and Partition Tolerance) at the same time. You must make trade-offs and sacrifice at least one in favor of the other two. 
ACID                                BASE
Strong consistency                  Weak consistency – stale data OK
Isolation                           Availability first
Focus on “commit”                   Best effort
Nested transactions                 Approximate answers OK
Availability?                       Aggressive (optimistic)
Conservative (pessimistic)          Simpler!
Difficult evolution (e.g. schema)   Faster
                                    Easier evolution
Eventual Consistency – HAPPY GO LUCKY 
An example of booking flights for two close friends attending a conference. There is only one ticket left. 
[Figure: two good friends, Anand and Scott, book on bookflighttickets.com, which is served by a data center in Asia and a data center in the US. The data sync-up between the two data centers is down. Labels: “Will the tickets be booked ?”, “Booking done”.]
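The booking scenario can be played out in a few lines. This is a toy sketch, assuming Anand books through the Asia data center and Scott through the US one (the slide does not say which friend uses which); the `DataCenter` class and `reconcile` helper are hypothetical names.

```python
class DataCenter:
    """One replica of the booking data; it only sees its own local copy."""

    def __init__(self, name: str, seats_left: int) -> None:
        self.name = name
        self.seats_left = seats_left
        self.bookings: list[str] = []

    def book(self, passenger: str) -> bool:
        # The decision uses local state only, because the sync link is down.
        if self.seats_left > 0:
            self.seats_left -= 1
            self.bookings.append(passenger)
            return True
        return False


def reconcile(a: DataCenter, b: DataCenter, capacity: int) -> list[str]:
    """Once sync is back, merge the bookings and report who exceeds capacity."""
    merged = a.bookings + b.bookings
    return merged[capacity:]


asia = DataCenter("Asia", seats_left=1)
us = DataCenter("US", seats_left=1)

# Both friends see "one ticket left", and both bookings succeed locally.
anand_ok = asia.book("Anand")
scott_ok = us.book("Scott")
oversold = reconcile(asia, us, capacity=1)
print(anand_ok, scott_ok, oversold)
```

Both bookings are accepted while the link is down, and the conflict only shows up at reconciliation. That is the price of choosing availability over consistency here.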
Brewer’s Theorem was conjectured by Eric Brewer and presented by him (www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf) as a keynote address at the ACM Symposium on the Principles of Distributed Computing (PODC) in 2000. 
Brewer’s ideas on CAP developed as a part of his work at UC Berkeley and at Inktomi. 
A look at Distributed Update and Replication 
Eventual Consistency – Distributed Environment 
[Figure: nodes A and B, each holding value V0 (then V1), connected by replication. Labels: Writes, Reads.]
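A minimal sketch of the situation in the figure, assuming writes land on node A and reads are served from node B (the figure labels do not state this explicitly). The `ReplicatedPair` class is a hypothetical illustration of asynchronous replication, not any product's API.

```python
from collections import deque


class ReplicatedPair:
    """Two copies of one value: A takes writes, B serves reads, replication is lazy."""

    def __init__(self, initial: str) -> None:
        self.a = initial
        self.b = initial
        self._pending: deque[str] = deque()  # updates not yet applied on B

    def write(self, value: str) -> None:
        self.a = value
        self._pending.append(value)  # queued for replication, returns at once

    def read(self) -> str:
        return self.b  # may be stale until replicate() has run

    def replicate(self) -> None:
        while self._pending:
            self.b = self._pending.popleft()


pair = ReplicatedPair("V0")
pair.write("V1")
stale = pair.read()   # B has not caught up yet
pair.replicate()
fresh = pair.read()   # B now matches A
print(stale, fresh)
```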
Theorem of Two Options and Three Alternatives 
SYNCHRONOUS SYNCHRONIZATION 
A single transaction; consistency is of prime importance 
ASYNCHRONOUS SYNCHRONIZATION 
Can be achieved, but if synchronization fails there is no way to know when it happened 
Option 1: Consistency takes precedence and availability may be compromised, given that the system supports partition tolerance (CP) 
Option 2: Availability takes precedence and consistency may be compromised, given that the system supports partition tolerance (AP) 
Option 3: Both consistency and availability take precedence and the system is not partition-tolerant (CA)
A SMALL TALK 
COUCHBASE (DOCUMENT) 
REDIS (KEY-VALUE PAIR) 
CASSANDRA (COLUMN-FAMILY AND EVENTUAL CONSISTENCY) 
NEO4J (GRAPH) 
HOW WILL WE GO ? 
A SMALL OVERVIEW 
BASIC FEATURES
Features of Couchbase 
A high-level overview of Couchbase is given below. 
1. Stores data in JSON or binary format in the data store (each item is called a document) 
2. Supports basic CRUD operations like get, set, and delete. Uses MVCC to keep reads and writes non-blocking 
3. Provides a strong layer of in-memory caching and automatically persists data to the file system to support a strong failover mechanism 
4. Uses Buckets to logically group the physical resources of a cluster, with options such as per-bucket memory and replication rules 
5. Each bucket is divided into 1024 logical partitions called vBuckets, and a cluster map is used to locate a document in the cluster 
6. The vBucket is the lowest denominator for locating a document in a cluster, through a hash of each document’s identifier 
7. Asynchronous storing of data to disk and replication of data to other servers in the cluster, as well as across data centers through the XDCR feature 
8. Very efficient and easy management of a distributed cluster (horizontal scaling), also known as “Scale Out” 
9. Integrates seamlessly with the memcached protocol 
10. Rebalancing of data (documents) when the cluster changes (adding or removing nodes and updating the cluster map with the new location of documents) 
11. A database-index-like feature through Views for faster access to indexed data 
12. Secured access to the Couchbase server through the SASL mechanism 
13. Supports both optimistic locking (through the compare-and-swap, or CAS, mechanism) and pessimistic locking (through explicit locking) 
14. An asynchronous, listener-based approach for manipulating data through the Future interface
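The vBucket lookup described above can be sketched as follows. This is an illustration under assumptions rather than Couchbase’s actual routine: it uses `zlib.crc32` as a stand-in hash, takes the partition count as a parameter (1024 is assumed as the default), and builds a simple round-robin cluster map. The node names and helper functions are hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed default partition count per bucket


def vbucket_for(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Hash the document key to a vBucket id (stand-in hash, not the exact one)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_cluster_map(nodes: list[str], num_vbuckets: int = NUM_VBUCKETS) -> list[str]:
    """Assign every vBucket to a node; round-robin is an assumption for the sketch."""
    return [nodes[vb % len(nodes)] for vb in range(num_vbuckets)]


def locate(key: str, cluster_map: list[str]) -> str:
    """Find the node for a document: key -> vBucket -> node via the cluster map."""
    return cluster_map[vbucket_for(key, len(cluster_map))]


before = build_cluster_map(["node-a", "node-b"])
after = build_cluster_map(["node-a", "node-b", "node-c"])  # after a rebalance
print(vbucket_for("user::1001"), locate("user::1001", before), locate("user::1001", after))
```

The key-to-vBucket step never changes. Rebalancing only rewrites the cluster map, so clients need a fresh map rather than a new hash function.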
How does Couchbase work ?
More Overview 
Replication Process 
The smart client writes the data to the server’s object-managed cache 
The document is submitted to the intra-cluster replication queue for replication to other servers 
The document is placed on the disk write queue asynchronously; the data is written to disk once the disk queue is flushed 
The data is replicated to other clusters through XDCR once it has been persisted to disk, and is eventually indexed for searching 
Major Components 
Data Manager 
Object Managed Cache (server warm-up, checkpoint, TAP replicator, backfill, resident item ratio, NRU, ejection, item pager) 
Storage Engine (including compaction) 
Query Engine 
Indexes can be created and queried for JSON documents 
Secondary indexes are created through Views and Design documents 
Cluster Manager (orchestration node) 
The Heartbeat watchdog 
The process manager 
The configuration manager
Features of REDIS 
A high-level overview of Redis is given below. 
1. Started in 2009, Redis (Remote Dictionary Server) is a distributed key-value database. It is an in-memory system with very fast reads and writes; fundamentally, an advanced version of memcached. 
2. Its creator, Salvatore Sanfilippo, describes it as a “Data Structure Store”, capable of storing complex data structures as values (Set, List, Hash, Sorted Set, bitmaps, etc.) in addition to plain strings. 
3. Apart from being a data structure store, it also works as a blocking queue (or stack) and a Publish-Subscribe system 
4. A powerful command-line interface (CLI) and a rich API for clients. 
5. An expiry policy can be set on each key-value pair so that the data set does not grow unbounded 
6. Provides an option to save data to disk, which is unusual for a key-value system that primarily operates in memory. Snapshots can be taken at intervals based on criteria such as the number of changed keys. 
7. Additional protection of data through an append-only file that logs each write, to guard against server crashes 
8. By default, Redis does not provide strong security of its own, so it is better to protect it with a firewall or SSH. 
9. Supports master-slave replication, but not multi-master scaling or an intelligent failover system 
10. Client-managed cluster support through consistent hashing rather than on the server side 
11. Provides probabilistic determination of the non-existence of data through Bloom filters (managing a sequence of bits) 
12. Uses a special data structure called “Simple Dynamic Strings” (SDS) to store all data internally 
13. Uses its own virtual memory management to locate data on disk
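The per-key expiry in item 5 can be illustrated without a running server. The sketch below is a plain-Python imitation of the idea, not Redis code: `ExpiringStore` and its injectable `clock` are hypothetical, and keys are only checked for expiry when they are read. In Redis itself the same effect is requested with commands such as `EXPIRE` or `SET` with the `EX` option.

```python
import time


class ExpiringStore:
    """A tiny in-memory key-value store where each key may carry a time-to-live."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None) -> None:
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazy expiry: drop the key when it is next read
            return None
        return value


# A fake clock keeps the example deterministic.
now = [0.0]
store = ExpiringStore(clock=lambda: now[0])
store.set("session:42", "anand", ttl=10)
store.set("config", "permanent")
first = store.get("session:42")
now[0] = 10.0  # ten seconds later
second = store.get("session:42")
print(first, second, store.get("config"))
```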
Replication in REDIS 
[Figure: a Master Node replicating to three Slave Nodes (two labeled Read-Write, one labeled Read-only). Other labels: Redis Server, Redis Client/Smart Client, Disk, Periodic Save, Non-Blocking Synchronization.]
Redis uses hash slots to bucket data elements across nodes, so that data is sharded across nodes for fault tolerance. When a node is added to or removed from the cluster, Redis maintains the link between the old and new location of the data through its internal node-to-node communication (ping-pong, in Redis terms), which is based on a binary protocol. Under the hood, Redis also uses a Gossip-based protocol among the nodes to track the status of each node and take the necessary action when a node goes down or stops responding. A smart Redis client can connect directly to the node in the cluster that holds the data, instead of connecting to an arbitrary node. 
[Figure: nodes A, B, C, D, and E exchanging Gossip messages.]
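The hash-slot bucketing can be sketched in a few lines. Redis Cluster maps each key to one of 16384 slots using CRC16 (the XMODEM variant) of the key, and hashes only the part inside `{...}` when the key contains a non-empty hash tag. The `key_slot` helper below is a hypothetical name, and Python’s `binascii.crc_hqx` with an initial value of 0 is used as that CRC16.

```python
import binascii

HASH_SLOTS = 16384  # number of hash slots in Redis Cluster


def key_slot(key: bytes) -> int:
    """Map a key to a hash slot: CRC16 of the key (or of its hash tag) mod 16384."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]  # hash only the non-empty tag between braces
    return binascii.crc_hqx(key, 0) % HASH_SLOTS


for name in [b"user:1000", b"{user1000}.following", b"{user1000}.followers"]:
    print(name.decode(), "->", key_slot(name))
```

Keys that share a hash tag land in the same slot, which is what lets a smart client send multi-key operations to a single node.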
An Example of REDIS 
Redis suits data that needs simple key-value caching together with richer, key-based querying. 
[Figure labels: Analytics, Caching, Search Engine, Messaging Broker] 
Good use cases 
1. Get a list of cities under a zip code around the world 
2. Get a list of books by ISBN code, where each book is associated with multiple “tagging” words 
3. Build a sub-system that browses through a catalog to find product data 
4. Use a broker to collect data from multiple sources, such as managing centralized log content 
Not so good use cases 
1. Every bit of data is very precious 
2. A multiple master-master setup and failover are needed 
3. ACID transactions are highly desired 
4. Relational data is of prime importance
Features of APACHE CASSANDRA 
A high-level overview of Cassandra is given below. 
“If I had asked people what they wanted, they would have said faster horses.” – Henry Ford 
1. Influenced by Amazon’s Dynamo for its distributed design and Google’s Bigtable for its data model, Cassandra is a hybrid datastore supporting both column-family and key-value data with eventual consistency 
2. Cassandra was developed by Facebook and is a sparse multi-dimensional hashtable 
3. Supports secondary indexes in addition to the index on the row key 
4. Supports a powerful command-line interface (CLI) as well as Thrift-based multi-language drivers for clients 
5. Runs in a decentralized mode where every node is identical, unlike a master-slave topology 
6. A tunably consistent system rather than a purely eventually consistent one (an always-writable system) 
7. Uses the Gossip protocol with hinted handoff for peer-to-peer communication across nodes 
8. Uses anti-entropy to synchronize (replicate) data across multiple nodes to the latest version 
9. Uses compaction to merge large data files for better space management 
10. Uses a Bloom filter to check whether an element is present in a set 
11. Uses a concept called “Tombstone” for soft deletes. The data is physically deleted during compaction. 
12. Uses “Staged Event-Driven Architecture” (SEDA) for highly efficient parallel processing 
13. Uses three separate structures (commit log, memtable, and SSTable) to store and manage data during a write operation 
14. Uses a concept called “Read Repair” to update outdated values on any node
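Items 9, 11, and 13 fit together, and a toy model may make the flow clearer. The sketch below is a deliberately simplified, single-node imitation and not Cassandra’s implementation: the `TinyStore` class, the flush threshold, and the immediate purge of tombstones at compaction are all assumptions made for the example.

```python
TOMBSTONE = object()  # marker written in place of a value on delete


class TinyStore:
    """Write path in miniature: commit log, then memtable, flushed to SSTables."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory table of recent writes
        self.sstables = []     # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value) -> None:
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key) -> None:
        self.write(key, TOMBSTONE)  # soft delete: record a tombstone

    def flush(self) -> None:
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        for table in [self.memtable] + self.sstables[::-1]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self) -> None:
        merged = {}
        for table in self.sstables:  # oldest first, so newer values overwrite
            merged.update(table)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.sstables = [live] if live else []


db = TinyStore(memtable_limit=2)
db.write("a", 1)
db.write("b", 2)   # the memtable is full, so it is flushed to an SSTable
db.delete("a")
db.write("c", 3)   # second flush; the tombstone for "a" is now on disk
tables_before = len(db.sstables)
db.compact()
print(tables_before, db.sstables, db.read("a"), db.read("b"))
```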
CASSANDRA – Column Family 
Suppose a customer has personal information as well as an address. Two column families can then be created, one for personal data and the other for the address. 
In a column-family store, data is identified by the row key. The difference from an RDBMS is that each row can have its own set of column-family data. Where some columns would be null, nothing is stored at all, unlike RDBMS tables, which consume additional space for them.
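The customer example can be pictured as nested maps: row key, then column family, then column. This is only a mental model in plain Python, with hypothetical row keys and values, not a Cassandra client call.

```python
# Row key -> column family -> column -> value. All names and values are made up.
customers = {
    "cust-001": {
        "personal": {"name": "Anand", "email": "anand@example.com"},
        "address": {"city": "Pune", "country": "India"},
    },
    "cust-002": {
        # No address is known, so the family is simply absent; no nulls are stored.
        "personal": {"name": "Scott"},
    },
}


def get_column(store, row_key, family, column, default=None):
    """Look up one column; missing rows, families, or columns yield the default."""
    return store.get(row_key, {}).get(family, {}).get(column, default)


print(get_column(customers, "cust-001", "address", "city"))
print(get_column(customers, "cust-002", "address", "city"))
```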
CASSANDRA – An Example 
Cassandra is a popular datastore for many large-scale web applications. The following criteria describe some of its important use cases. 
High volume writes like tweets from Twitter or comments from Facebook 
Don’t need strong consistency 
High throughput for Writes 
Consistency can be controlled
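“Consistency can be controlled” usually comes down to how many replicas must acknowledge a write (W) and how many are consulted on a read (R) out of N copies. In Dynamo-style systems, a read is guaranteed to overlap the latest write when R + W > N. The helpers below are a hypothetical illustration of that arithmetic, not a driver API.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


N = 3
print(quorum(N))                                          # majority of 3 replicas
print(reads_see_latest_write(N, w=quorum(N), r=quorum(N)))  # quorum writes and reads
print(reads_see_latest_write(N, w=1, r=1))                # fastest, may be stale
```

Writing and reading with a single replica gives the high write throughput listed above, while quorum reads and writes trade some latency for consistency.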
Features of Neo4J 
A high level overview of Neo4J is mentioned below. 
1. A graph database (based on mathematical graph modeling) for representing relationships across multiple entities. Developed by Neo Technology. 
2. It is built on the concepts of nodes, relationships, properties (key/value pairs), and labels 
3. Uses its own query language, named “Cypher”, for performing CRUD operations 
4. A very high-performing NoSQL database for storing and retrieving connected data 
5. A graph is a way to maintain multi-dimensional relations among entities 
6. Highly applicable to social networking applications such as social graphs and recommendations 
7. The Neo4J site provides a rich REPL (read-eval-print loop) web interface for running queries as well as performing administrative work. 
8. It is ACID compliant and provides high availability and master-slave replication across multiple nodes 
9. Provides an easy client interface through REST and Gremlin 
10. Provides fast lookup through Lucene 
[Figure: a sample graph with the nodes Sachin, Gaurav, Bikram, Grapes, and Java, connected by “friend”, “likes”, and “eats” relationships.]
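The node-and-relationship model can be tried out in plain Python before reaching for a graph database. The sketch below reuses the names from the sample graph, but the specific edges are assumptions, since the slide text does not say who is connected to whom. Relationships are treated as directed, and `build_index` and `friends_of_friends` are hypothetical helpers rather than Neo4J calls.

```python
from collections import defaultdict

# (source node, relationship type, target node); the edges are assumed.
RELATIONSHIPS = [
    ("Sachin", "friend", "Gaurav"),
    ("Gaurav", "friend", "Bikram"),
    ("Sachin", "likes", "Java"),
    ("Gaurav", "likes", "Java"),
    ("Bikram", "eats", "Grapes"),
]


def build_index(relationships):
    """Index outgoing relationships by (node, type) for quick traversal."""
    index = defaultdict(list)
    for source, rel_type, target in relationships:
        index[(source, rel_type)].append(target)
    return index


def friends_of_friends(index, person):
    """Two-hop traversal: friends of my friends who are not already my friends."""
    direct = index.get((person, "friend"), [])
    result = []
    for friend in direct:
        for candidate in index.get((friend, "friend"), []):
            if candidate != person and candidate not in direct and candidate not in result:
                result.append(candidate)
    return result


index = build_index(RELATIONSHIPS)
print(friends_of_friends(index, "Sachin"))
print(index[("Gaurav", "likes")])
```

A two-hop traversal like this is the kind of recommendation query that graph databases are designed to answer quickly.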
NoSQL – Not mandatory 
NoSQL is not a replacement for SQL 
Generally, NoSQL is not a good fit for applications where: 
Strong consistency is needed 
Correctness of data is more important than availability of data 
Transactional context is more important than analytical processing 
Data is structured and maintained through an object-relational hierarchy 
A legacy database must be supported 
Future of Databases 
The future of databases lies in combining relational and NoSQL databases according to need. Pramod J. Sadalage and Martin Fowler describe this concept of Polyglot Persistence in their well-known book “NoSQL Distilled”.
Popular Jargons 
The following are some of the most common and important terms used in the NoSQL world. 
Sharding or Horizontal Scaling 
Quorum 
Gossip Protocol 
Hinted Handoff 
Read Repair 
Consistent Hashing 
Merkle Tree
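Of the terms above, consistent hashing is the easiest to show in code. The sketch below is a generic illustration with virtual nodes, not taken from any specific product; the `HashRing` class, the node names, and the choice of MD5 as the ring hash are assumptions.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Position of a string on the ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring where each node owns many virtual points."""

    def __init__(self, nodes, replicas: int = 100) -> None:
        self._ring = sorted(
            (ring_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key, wrapping around.
        index = bisect.bisect(self._points, ring_hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"key-{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(len(moved), "of", len(keys), "keys moved after adding node-d")
```

Only keys that now belong to the new node change owner, and everything else stays where it was. Plain modulo sharding does not have this property.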
Useful Links 
http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf 
Professional NoSQL by Shashank Tiwari 
NoSQL Databases by Christof Strauch 
Couchbase Server Under the Hood from Couchbase Inc. 
http://www.slideshare.net/Muratakal/rdbms-vs-nosql-15797058 
http://www.nosql-database.org/ 
http://www.youtube.com/watch?v=MmL9Lq6WbSY 
http://incubator.apache.org/thrift/ 
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html 
http://www.youtube.com/watch?v=uMxZ4RI6sCQ 
http://planetcassandra.org/apache-cassandra-use-cases/ 
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf 
Seven Databases in Seven Weeks by Eric Redmond and Jim R. Wilson

"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

NoSQL Basics - A Quick Tour

  • 1. September, 2014 NoSQL – A Quick Tour
  • 2. Objectives: What is NoSQL; How is data growing?; Challenges; What’s the solution?; NoSQL Features and Types; Eventual Consistency; High-level Overview of some popular NoSQL DBs; NoSQL – Not mandatory; Popular Jargons; Useful Links
  • 3. What is NoSQL. Usually read as “Not Only SQL”: any data store that does not fall under the SQL (relational) category. A new way to store and retrieve data (especially sparse and unstructured data) for modern high-volume real-time web traffic, batch processing, and analytics. Not a replacement for the RDBMS but a parallel concept. A “Shared Nothing” architecture as opposed to the “Shared Architecture” of an RDBMS. (A shared nothing (SN) architecture is a distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. More specifically, none of the nodes share memory or disk storage. The term was first coined by Michael Stonebraker at the University of California at Berkeley in his 1986 paper “The Case for Shared Nothing.”) The label covers technologies that don’t use SQL or a relational mapping of data. Promotes storage of huge volumes of data with efficient retrieval, and supports normal CRUD operations. Not a strict follower of ACID, due to its inherent nature. Works with “Big Data” to support high-volume data processing. Comes in a variety of products, many of which grew out of research at web and social networking giants like Twitter, Facebook, LinkedIn, Google, Yahoo, etc. Data can be stored in four basic kinds of database: key/value pair, column-family, document, and graph.
  • 4. What is NoSQL Continued.. Most of these stores or databases don’t expect pre-defined schemas, normalization, or strict consistency. The primary purpose of NoSQL is fast and efficient storage and processing of constantly growing data without the constraints of a relational database, on a scalable architecture that supports future data growth without compromising performance. First generation and second generation of NoSQL. Mantra of NoSQL: “Getting an answer quickly is more important than getting a correct answer.” “If you can’t split it, you can’t scale it” (Randy Shoup, Distinguished Architect, eBay).
  • 5. How is data growing ?
  • 6. How is data growing ? Continued.. There are two concepts, Big Users and Big Data, along with Cloud Computing. Big Users: the number of users accessing the web is growing rapidly, and they access many kinds of data, such as personal information, social data (tweets, likes, blogs, click streams, comments, follows), geo-location data, log files, and system-generated, user-generated, and sensor-generated data. The growth in the number of users can’t be predicted, especially given the advancement of the mobile space. Big Data: the number of data sources has increased tremendously, and the actual volume of data has increased exponentially. Data is no longer measured in gigabytes but in much larger units (tera/peta/exa/zetta/yottabytes). Cloud Computing: more and more data is stored in the cloud, and access to it should be fast. The number of concurrent users skyrocketed as applications increasingly became accessible via the web (and later on mobile devices). The amount of data collected and processed soared as it became easier and increasingly valuable to capture all kinds of data. The amount of unstructured or semi-structured data exploded, and its use became integral to the value and richness of applications.
  • 7. Challenges [Figure: an application server layer in front of a traditional RDBMS with a shared disk, scaled vertically by adding CPU and RAM (“Oh No! I am loaded”), contrasted with an application server layer in front of a NoSQL DB spread across five commodity servers.]
  • 8. Challenges Continued.. Need to scale out (i.e. sharding) without compromising latency and performance (keeping low latency and high throughput). Need to avoid scaling up with costly and complex high-end servers. Should scale out easily on low-cost commodity servers without any application downtime. Data is not only structured; the majority of social networking data is unstructured, so there is a need to support schema-less data structures. There are many cases where transactions are not of prime concern and costly, heavy writes are not needed. Should support easy and quick replication as well as failover.
  • 9. What’s the solution ? Real-time and batch processing of analytical and operational data by second-generation NoSQL stores like Couchbase. The picture depicts a scenario where both batch processing and real-time data are handled by a combination of Hadoop and Couchbase.
  • 10. NoSQL Features and Types. Early adopter: Google’s Bigtable is used for workloads ranging from throughput-sensitive batch processing to latency-sensitive online queries. It is used in Google Earth, Finance, Orkut, Analytics, etc., and is based on the concept of a column-family data store. There are four main types of NoSQL databases, as described below. Document Database: MongoDB and Couchbase (an amalgamation of CouchDB and Membase). CouchDB started as an Apache Incubator project, and the merged product is developed by Couchbase Inc. Implemented in Erlang and C, with a JavaScript execution environment. Used by Apple, BBC, and many others.
  • 11. NoSQL Features and Types Continued.. Key/Value pair and Eventual Consistency datastores: Redis, Membase, Voldemort, and Cassandra. Cassandra supports key/value pairs and eventual consistency (based on Amazon’s Dynamo). Developed by Facebook and implemented in Java. Clients are available in Java, PHP, Python, Grails, Ruby, and .NET. Used by Facebook, Twitter, Paddy Power, GitHub, and many others. Redis is a key-value store offering in-memory as well as persistent storage. Started by Salvatore Sanfilippo in 2009 as an independent project. Implemented in C. Clients are available in PHP, Java, Ruby, Python, C++, etc. Used by Craigslist. Sorted Column Family datastore: HBase (modeled on Google's Bigtable). Created by Powerset and donated to Apache. Implemented in Java. Used by Facebook, Yahoo!, and many others. Access methods include JRuby, Java, Thrift, REST, ProtoBuf, etc. Graph Database: Neo4J, FlockDB. Neo4J was developed in 2003 and is implemented in Java. Accessed through REST and Gremlin interfaces. Used by Box.com and ThoughtWorks. Other types of NoSQL databases are available as well: XML Database, Object Database, Grid and Cloud Database, and Multimodel Database.
  • 12. Eventual Consistency. Consistency: data should be consistent across user transactions, and every client should have the same view of the data. Availability: the system should always be available, and users should always be able to read as well as write. Partition tolerance: the system should tolerate network partitions and keep working in a distributed environment. Eventual consistency implies BASE (Basically Available, Soft state, Eventually consistent). Brewer’s CAP Theorem: succinctly put, Brewer’s Theorem states that in systems that are distributed or scaled out it’s impossible to achieve all three (Consistency, Availability, and Partition Tolerance) at the same time. You must make trade-offs and sacrifice at least one in favor of the other two. ACID versus BASE. ACID: strong consistency; isolation; focus on “commit”; nested transactions; availability?; conservative (pessimistic); difficult evolution (e.g. schema). BASE: weak consistency (stale data OK); availability first; best effort; approximate answers OK; aggressive (optimistic); simpler!; faster; easier evolution.
  • 13. Eventual Consistency – HAPPY GO LUCKY. An example of booking flights for two close friends attending a conference, when there is only one ticket left. [Figure labels: Data Center (Asia), Data Center (US), Anand, Scott, good friends, bookflighttickets.com, data sync-up, sync down, “Will the tickets be booked?”, “Booking done”.]
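The booking scenario on slide 13 can be sketched as two replicas that keep accepting writes while the sync link between them is down. This is a minimal illustration, not how any particular product works: the `Replica` class, the `sync` merge rule, and the assignment of Anand to the Asia replica and Scott to the US replica are all assumptions made for the example.

```python
# Hypothetical sketch: two replicas of a seat counter that accept writes
# independently while the link between them is down.

class Replica:
    def __init__(self, name, seats_left):
        self.name = name
        self.seats_left = seats_left
        self.bookings = []

    def book(self, passenger):
        """Accept the booking if this replica still believes a seat is free."""
        if self.seats_left > 0:
            self.seats_left -= 1
            self.bookings.append(passenger)
            return True
        return False


def sync(a, b, capacity):
    """Merge bookings once the link recovers and count any overbooking."""
    merged = sorted(set(a.bookings) | set(b.bookings))
    for replica in (a, b):
        replica.bookings = list(merged)
        replica.seats_left = max(capacity - len(merged), 0)
    overbooked = max(len(merged) - capacity, 0)
    return merged, overbooked


asia = Replica("asia", seats_left=1)
us = Replica("us", seats_left=1)

# Sync is down: each side still sees one free seat, so both bookings succeed.
anand_ok = asia.book("Anand")
scott_ok = us.book("Scott")

# When sync comes back, the replicas converge and the conflict surfaces.
merged, overbooked = sync(asia, us, capacity=1)
print(anand_ok, scott_ok, merged, overbooked)
```

Both replicas stayed available during the partition, and the price is a conflict that has to be resolved after the fact (for example by rebooking one passenger).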
  • 14. Eventual Consistency – Distributed Environment. Brewer’s Theorem was conjectured by Eric Brewer and presented by him (www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf) as a keynote address at the ACM Symposium on the Principles of Distributed Computing (PODC) in 2000. Brewer’s ideas on CAP developed as a part of his work at UC Berkeley and at Inktomi. A look at distributed update and replication. [Figure labels: nodes A and B, each holding V0 (V1); Writes; Reads; Replication.]
  • 15. Theorem of Two Options and Three Alternatives. SYNCHRONOUS SYNCHRONIZATION: a single transaction; consistency is of prime importance. ASYNCHRONOUS SYNCHRONIZATION: can be achieved, but if synchronization fails, there is no way to know when it happened. Option 1: Let Consistency take precedence; Availability may be compromised, given that the system supports partition tolerance (CP). Option 2: Let Availability take precedence; Consistency may be compromised, given that the system supports partition tolerance (AP). Option 3: Let both Consistency and Availability take precedence; the system is not partition-tolerant (CA).
  • 16. A Small Talk: Couchbase (document), Redis (key-value pair), Cassandra (column-family and eventual consistency), Neo4J (graph). How will we go? A small overview, then basic features.
  • 17. Features of Couchbase. A high-level overview of Couchbase is given below. 1. Stores data in JSON or binary format in the data store (each item is called a document). 2. Supports basic CRUD operations like get, set, delete, etc. Uses MVCC to keep reads and writes non-blocking. 3. Provides a strong layer of in-memory caching and automatically persists data to the file system to support a strong failover mechanism. 4. Uses Buckets to group the physical resources of a cluster logically, with options such as a memory quota and a replication rule for each bucket. 5. Each bucket is divided into 1024 logical partitions called vBuckets, and a cluster map is used to locate a document in the cluster. 6. The vBucket is the lowest common denominator for locating a document in a cluster, through a hash of each document's identifier. 7. Stores data to disk asynchronously and replicates it to other servers in the cluster, as well as across data centers through the XDCR feature. 8. Very efficient and easy management of a distributed cluster (horizontal scaling), also known as “Scale Out”. 9. Integrates seamlessly with the memcached protocol. 10. Rebalances data (documents) when the cluster changes (nodes are added or removed and the cluster map is updated with the new location of documents). 11. Offers a database-index-like feature through Views for faster access to indexed data. 12. Secures access to the Couchbase server through the SASL mechanism. 13. Supports both optimistic locking (through the compare-and-swap, or CAS, mechanism) and pessimistic locking (through explicit locks). 14. Offers an asynchronous, listener-based approach for manipulating data through the Future interface.
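The vBucket lookup described on slide 17 (hash the document ID, pick a partition, consult the cluster map) can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the partition count of 1024 and the plain `crc32 % N` hash are assumed here, the node names are made up, and the hash used by real Couchbase clients differs in detail.

```python
import zlib

NUM_VBUCKETS = 1024  # assumption: partition count per bucket


def vbucket_id(doc_id: str) -> int:
    """Map a document ID to a vBucket with a simplified CRC32-based hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(nodes):
    """Assign vBuckets to nodes round-robin (illustrative placement only)."""
    return [nodes[i % len(nodes)] for i in range(NUM_VBUCKETS)]


def locate(doc_id, cluster_map):
    """Return the node that owns the vBucket for this document."""
    return cluster_map[vbucket_id(doc_id)]


cluster_map = build_cluster_map(["node-a", "node-b", "node-c"])
print(vbucket_id("user::42"), locate("user::42", cluster_map))

# Rebalancing: the document keeps its vBucket ID; only the map changes.
rebalanced_map = build_cluster_map(["node-a", "node-b", "node-c", "node-d"])
print(vbucket_id("user::42"), locate("user::42", rebalanced_map))
```

The point of the indirection is that adding or removing a node only rewrites the cluster map; the key-to-vBucket mapping never changes, so clients just need the updated map.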
  • 19. More Overview. Replication Process: The smart client writes the data to the server's object-managed cache. The document is submitted to the intra-cluster replication queue for replication to other servers. The document is added to the disk write queue asynchronously and is written to disk once the disk queue is flushed. The data is replicated to other clusters through XDCR once it is persisted to disk, and is eventually indexed for searching. Major Components: Data Manager: Object Managed Cache (server warm-up, checkpoint, TAP replicator, backfill, resident item ratio, NRU, ejection, item pager); Storage Engine (compaction); Query Engine (indexes can be created and queried for JSON documents; secondary indexes are created through Views and Design documents). Cluster Manager (orchestration node): the Heartbeat watchdog, the Process Manager, the Configuration Manager.
  • 20. Features of REDIS. A high-level overview of Redis is given below. 1. Started in 2009, Redis (Remote Dictionary Server) is a key-value pair database. It keeps data in memory for very fast reads and writes; fundamentally, an advanced version of memcached. 2. Its creator, Salvatore Sanfilippo, describes it as a “Data Structure Store”: apart from plain strings, it can store complex data structures such as Sets, Lists, Hashes, Sorted Sets, and bitmaps as the values held under keys. 3. Apart from being a data structure store, it also works as a blocking queue (or stack) and a Publish-Subscribe system. 4. A powerful command-line interface (CLI) and a rich API for clients. 5. An expiry policy can be set for each key-value pair so that the data does not grow unbounded. 6. Provides an option to save data to disk, unusual for a key-value system that primarily operates in memory. Snapshots can be taken at intervals based on criteria such as the number of changed keys. 7. Additional data protection through an append-only file that records each write, guarding against server crashes. 8. By default, Redis doesn’t provide strong security of its own, so it is better to protect it with a firewall or an SSH tunnel. 9. Supports master-slave replication, but not multi-master scaling or an intelligent failover system. 10. Cluster support is managed by the client through consistent hashing rather than on the server side. 11. Supports probabilistic determination of the non-existence of data through Bloom filters (managing a sequence of bits). 12. Uses a special data structure called “Simple Dynamic String” (SDS) to store all data internally. 13. Uses its own Virtual Memory management to locate data on disk.
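Item 11 on slide 20 mentions Bloom filters, which answer "definitely not present" or "possibly present" using only a bit array. The following is a small standalone sketch of that idea in plain Python, not Redis code; the class name, the bit-array size, and the use of SHA-256 to derive bit positions are choices made for this example.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means the item was never added; True means "possibly added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
for isbn in ["isbn:0001", "isbn:0002", "isbn:0003"]:
    seen.add(isbn)

print(seen.might_contain("isbn:0002"))   # an added item is always reported
print(BloomFilter().might_contain("x"))  # an empty filter reports nothing
```

A store can consult such a filter before touching disk: a negative answer is certain, so the expensive lookup is skipped, while a positive answer still needs to be confirmed.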
  • 21. Replication in REDIS. [Figure: a master node replicating to slave nodes (two read-write, one read-only) with non-blocking synchronization; the Redis server periodically saves to disk; a Redis client/smart client connects to the server.] Redis uses hash slots to bucket data elements across nodes, so that data is sharded across nodes for fault tolerance. When a node is added to or removed from the cluster, Redis maintains the linkage of data from the old node to the new one through its internal node-to-node communication (ping-pong, in Redis’s terms), which is based on a binary protocol. Under the hood, Redis also uses a Gossip-based protocol among the nodes to track the status of each node and take the necessary action when a node goes down or stops responding. A smart client can connect directly to the right node in the cluster to find the data, instead of connecting to an arbitrary node. [Figure: nodes A, B, C, D, and E exchanging gossip messages.]
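The hash-slot idea on slide 21 can be sketched as follows. The slides do not give the details; this example assumes the Redis Cluster scheme of 16384 slots chosen by a CRC16 (XMODEM variant) of the key, and uses Python's `binascii.crc_hqx`, which computes that CRC when started from 0. The three-node slot ranges are illustrative, and hash tags (`{...}` in keys) are ignored.

```python
import binascii

NUM_SLOTS = 16384  # assumption: slot count used by Redis Cluster

# Illustrative assignment of slot ranges to three nodes.
SLOT_RANGES = {
    "A": range(0, 5461),
    "B": range(5461, 10923),
    "C": range(10923, NUM_SLOTS),
}


def hash_slot(key: str) -> int:
    """CRC16 (XMODEM) of the key, reduced to a slot number."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % NUM_SLOTS


def node_for_key(key: str) -> str:
    """What a smart client does: find the node that owns the key's slot."""
    slot = hash_slot(key)
    for node, slots in SLOT_RANGES.items():
        if slot in slots:
            return node
    raise LookupError(f"slot {slot} is not assigned to any node")


for key in ["user:1000", "cart:77", "session:abc"]:
    print(key, hash_slot(key), node_for_key(key))
```

Moving data between nodes then means reassigning slot ranges, not rehashing keys, which is what lets a client cache the slot map and go straight to the right node.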
  • 22. An Example of REDIS. Redis suits data that needs simple, cache-style key-value access together with richer querying based on the keys. Typical uses: analytics, caching, search engine, messaging broker. 1. Get a list of cities under a zip code around the world. 2. Get a list of books based on ISBN code, where each book is associated with multiple “tagging” words. 3. Build a sub-system which can browse through a catalog system to find product data. 4. Use a broker to collect data from multiple sources, such as managing centralized log content. Not so good use cases: 1. Every bit of data is very precious. 2. A multiple master-master setup and failover are needed. 3. ACID transactions are highly desired. 4. Relational data is of prime importance.
  • 23. Features of APACHE CASSANDRA. A high-level overview of Cassandra is given below. “If I had asked people what they wanted, they would have said faster horses.” – Henry Ford. 1. Influenced by Amazon’s Dynamo for its distributed design and Google’s Bigtable for its data model, Cassandra is a hybrid datastore supporting both column-family and key-value data with Eventual Consistency. 2. Cassandra was developed by Facebook, and it’s a sparse multi-dimensional hashtable. 3. Supports secondary indexes apart from the index on the row key. 4. Supports a powerful command-line interface (CLI) as well as Thrift-based multi-language drivers for client communication. 5. Runs in a decentralized mode in which every node is identical, unlike a master-slave topology. 6. Consistency is tunable rather than fixed at eventual consistency (an always-writable system). 7. Uses the Gossip protocol with hinted handoff for peer-to-peer communication across nodes. 8. Uses anti-entropy to keep replicas on multiple nodes synchronized with the latest version of the data. 9. Uses compaction to merge large data files for better space management. 10. Uses a Bloom filter to check whether an element may be present in a map. 11. Uses a concept called “Tombstone” for soft deletes. The data is physically deleted during compaction. 12. Uses “Staged Event-Driven Architecture” (SEDA) for highly efficient parallel processing. 13. Uses three separate structures (commit log, memtable, and SSTable) to store and manage data during a write operation. 14. Uses a concept called “Read Repair” to update outdated values on any node.
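Items 9, 11, and 13 on slide 23 (commit log, memtable, SSTable, tombstones, compaction) fit together roughly as in the toy store below. This is a single-node sketch for illustration only, not Cassandra's implementation: `TinyStore`, the flush threshold, and the use of in-memory dicts in place of on-disk files are all assumptions of the example.

```python
TOMBSTONE = object()  # marker written in place of a value on delete


class TinyStore:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory table holding the newest writes
        self.sstables = []     # flushed, immutable tables, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write for durability
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. spill to an SSTable when full

    def delete(self, key):
        self.put(key, TOMBSTONE)  # soft delete: write a tombstone

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then SSTables newest first.
        for table in [self.memtable] + self.sstables[::-1]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Merge SSTables oldest to newest, then drop tombstoned keys for good.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.sstables = [live] if live else []


store = TinyStore(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)      # memtable is full, so it is flushed to an SSTable
store.delete("a")      # tombstone shadows the older value of "a"
store.put("c", 3)      # second flush
store.compact()        # "a" is physically removed here
print(store.get("a"), store.get("b"), store.get("c"), len(store.sstables))
```

Writes never modify existing files, which is why they are cheap; the cost is paid later by compaction, which is also the moment a deleted value actually disappears.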
  • 24. CASSANDRA – Column Family. Suppose a customer has personal information as well as an address. Two column families can be created, one for personal data and the other for the address. In a column-family store, data is identified by the row key. The difference from an RDBMS is that each row can have its own set of columns within a column family. Where a column has no value for a row, nothing is stored for it, unlike RDBMS tables, where null values consume additional space.
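The customer example on slide 24 can be pictured as nested maps: row key, then column family, then column. The sketch below uses plain Python dicts to show the sparse layout; the customer names, column names, and helper functions are invented for illustration.

```python
# row key -> column family -> column -> value (all sample data is made up)
customers = {
    "cust-001": {
        "personal": {"name": "Asha", "email": "asha@example.com"},
        "address": {"city": "Pune", "zip": "411001"},
    },
    "cust-002": {
        # No email and no address: those cells are simply absent, not NULL.
        "personal": {"name": "Ravi"},
    },
}


def get_column(row_key, family, column, default=None):
    """Look up one cell by row key, column family, and column name."""
    return customers.get(row_key, {}).get(family, {}).get(column, default)


def stored_cells(row_key):
    """Count the cells actually stored for a row."""
    return sum(len(columns) for columns in customers.get(row_key, {}).values())


print(get_column("cust-001", "address", "city"))
print(get_column("cust-002", "address", "city"))
print(stored_cells("cust-001"), stored_cells("cust-002"))
```

The second row stores one cell where a fixed relational schema would carry four columns, three of them null.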
  • 25. CASSANDRA – An Example. Cassandra is a popular datastore for many large-scale web applications. The following criteria describe some of its important use cases: high-volume writes, like tweets from Twitter or comments from Facebook; no need for strong consistency; high write throughput; consistency that can be controlled.
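"Consistency can be controlled" is usually explained with the replica-overlap rule from Dynamo-style systems: with N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to overlap when R + W > N. The slides do not spell this out, so the helper names below are illustrative only.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


N = 3
settings = {
    "write ONE, read ONE": (1, 1),
    "write QUORUM, read QUORUM": (quorum(N), quorum(N)),
    "write ALL, read ONE": (N, 1),
}
for label, (w, r) in settings.items():
    print(label, reads_see_latest_write(N, w, r))
```

Lower W and R values favor latency and availability; raising them until the sets overlap buys strong reads at the cost of waiting for more replicas.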
  • 26. Features of Neo4J. A high-level overview of Neo4J is given below. 1. A graph database (based on mathematical graph modeling) for representing relationships across multiple entities. Developed by Neo Technology. 2. Built on the concepts of nodes, relationships, properties (key/value pairs), and labels. 3. Its own query language, named “Cypher”, for performing CRUD operations. 4. A very high-performing NoSQL database for storing and retrieving connected data. 5. A graph is a way to maintain multi-dimensional relations among entities. 6. Highly applicable to social networking applications such as social graphs, recommendations, etc. 7. The Neo4J site provides a rich REPL (read-eval-print loop) web interface for running queries as well as performing administrative work. 8. ACID compliant, and provides high availability and master-slave replication across multiple nodes. 9. Provides an easy client interface through REST and Gremlin. 10. Provides fast lookup through Lucene. [Figure: example graph with nodes Sachin, Gaurav, Bikram, Grapes, and Java connected by friend, likes, and eats relationships.]
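The node-and-relationship model on slide 26 can be sketched in a few lines of plain Python. This is not Neo4J or Cypher; it is an adjacency-list illustration using the names from the slide's figure. The figure's exact edges are not recoverable from the transcript, so the specific relationships below are assumptions.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        self.labels = {}                 # node name -> label
        self.edges = defaultdict(list)   # node -> [(relationship, target)]

    def add_node(self, name, label):
        self.labels[name] = label

    def relate(self, source, relationship, target):
        self.edges[source].append((relationship, target))

    def neighbours(self, source, relationship):
        return [t for rel, t in self.edges[source] if rel == relationship]


def recommend(graph, person):
    """Things liked by a person's friends that the person does not like yet."""
    own = set(graph.neighbours(person, "likes"))
    suggestions = []
    for friend in graph.neighbours(person, "friend"):
        for item in graph.neighbours(friend, "likes"):
            if item not in own and item not in suggestions:
                suggestions.append(item)
    return suggestions


g = Graph()
for person in ["Sachin", "Gaurav", "Bikram"]:
    g.add_node(person, "Person")
g.add_node("Grapes", "Food")
g.add_node("Java", "Language")

# Assumed edges, for illustration only.
g.relate("Sachin", "friend", "Gaurav")
g.relate("Gaurav", "friend", "Bikram")
g.relate("Sachin", "eats", "Grapes")
g.relate("Gaurav", "likes", "Java")
g.relate("Bikram", "likes", "Grapes")

print(recommend(g, "Sachin"), recommend(g, "Gaurav"))
```

A friend-of-a-friend recommendation is a two-hop traversal here; in a relational schema the same question needs self-joins, which is the kind of query graph databases are built to make cheap.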
  • 27. NoSQL – Not mandatory. NoSQL is not a replacement for SQL. Generally, NoSQL is not a good fit for applications where: strong consistency is needed; correctness of data is more important than availability of data; transactional context is more important than analytical processing; data is structured and maintained through an object-relational hierarchy; a legacy database needs to be supported. Future of Databases: the future of databases lies in combining relational and NoSQL databases based on the need. Pramod J. Sadalage and Martin Fowler describe the concept of Polyglot Persistence in their famous book “NoSQL Distilled”.
  • 28. Popular Jargons. The following terms are some of the most common and important jargon used in the NoSQL world: Sharding or Horizontal Scaling; Quorum; Gossip Protocol; Hinted Handoff; Read Repair; Consistent Hashing; Merkle Tree.
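Of the terms on slide 28, consistent hashing is the one most easily shown in code. The sketch below is a generic hash ring with virtual nodes, not the implementation of any product named in the slides; the class name, the choice of SHA-256, and the count of 100 virtual nodes per server are assumptions of the example.

```python
import bisect
import hashlib


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        # Each server appears at many positions to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["server-a", "server-b", "server-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add("server-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Only the keys that now belong to the new server move; with plain `hash(key) % number_of_servers`, changing the server count would remap almost every key.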
  • 29. Useful Links: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf ; Professional NoSQL by Shashank Tiwari ; NoSQL Databases by Christof Strauch ; Couchbase Server Under the Hood from Couchbase Inc. ; http://www.slideshare.net/Muratakal/rdbms-vs-nosql-15797058 ; http://www.nosql-database.org/ ; http://www.youtube.com/watch?v=MmL9Lq6WbSY ; http://incubator.apache.org/thrift/ ; http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html ; http://www.youtube.com/watch?v=uMxZ4RI6sCQ ; http://planetcassandra.org/apache-cassandra-use-cases/ ; http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf ; Seven Databases in Seven Weeks by Eric Redmond and Jim R. Wilson