2. Agenda Communicating Knowledge
New Challenges for RDBMS
Introduction to NoSQL
MongoDB Sharding
3. Relational DBMS
Since 1970
Use SQL to manipulate data
Easy to use
Easy to integrate with other systems
Excellent for management applications
(accounting, reservations, staff management,
etc.)
4. ACID Properties of RDBMS
Transactions in an RDBMS satisfy these four properties
Atomic: “all or nothing”; when a transaction is
executed, it either succeeds completely or has no effect
Consistent: data moves from one correct state to
another correct state
Isolated: two concurrent transactions will not
become entangled with each other
Durable: once a transaction has succeeded, the
change will not be lost
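As an illustration of atomicity, here is a minimal sketch using Python's built-in sqlite3 module. The `account` table, the `transfer` helper, and the balances are assumptions made for this example; they are not part of the original slides.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts atomically; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM account"))

# Hypothetical starting data: two accounts in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 3000), ("B", 2000)])
conn.commit()
```

A failed transfer leaves both balances untouched because the first UPDATE is rolled back together with the rest of the transaction.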
5. What is the problem with RDBMS?
Schemas aren't designed for sparse data
Normalization creates a lot of tables
Joins can be prohibitively expensive
Most importantly, databases are simply not
designed to be distributed.
6. An Example of a Distributed DB
A banking system consisting of four branches in
four different cities. Each branch maintains
its accounts locally
Account = (account-number, branch, balance)
One single site that maintains information about
branches
Branch = (branch-name, city, assets)
7. An Example of a Distributed DB
[Diagram: the client asks a transaction coordinator to transfer $1000 from account A ($3000, at Bank A) to account B ($2000, at Bank B)]
Clients want all-or-nothing transactions
Transfer either happens or not at all
8. An Example of a Distributed DB
Simple solution
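[Diagram: message flow between client, transaction coordinator, bank A, and bank B]
client -> coordinator: start
coordinator -> bank A: A=A-1000
coordinator -> bank B: B=B+1000
coordinator -> client: done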
What can go wrong?
A does not have enough money
B’s account no longer exists
B has crashed
Coordinator crashes
9. An Example of a Distributed DB
Two-phase Commit Protocol (2PC)
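[Diagram: message flow between client, transaction coordinator, bank A, and bank B]
client -> coordinator: start
coordinator -> bank A, bank B: prepare
bank A -> coordinator: rA
bank B -> coordinator: rB
coordinator -> bank A, bank B: outcome
coordinator -> client: result
Locked: participants hold their locks from prepare until the outcome arrives
Coordinator decides:
If rA==yes && rB==yes
    outcome = “commit”
else
    outcome = “abort”
B commits upon receiving “commit”
Loss of availability and higher latency!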
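The protocol can be sketched in a few lines of Python. This is a simplified in-memory model, not a production implementation: the `Bank` class, its `alive` flag, and the `transfer` coordinator function are hypothetical names, and coordinator crashes and timeouts are deliberately left out.

```python
class Bank:
    """One participant holding a single account balance (toy model)."""

    def __init__(self, balance, alive=True):
        self.balance = balance
        self.alive = alive
        self.pending = None  # change staged while the account is locked

    def prepare(self, delta):
        """Phase 1: vote yes only if the change can be applied, then lock."""
        if not self.alive or self.pending is not None or self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def finish(self, outcome):
        """Phase 2: apply the staged change on commit, drop it on abort."""
        if outcome == "commit" and self.pending is not None:
            self.balance += self.pending
        self.pending = None


def transfer(bank_a, bank_b, amount):
    """Coordinator: collect both votes, then send one outcome to both banks."""
    r_a = bank_a.prepare(-amount)
    r_b = bank_b.prepare(+amount)
    outcome = "commit" if r_a and r_b else "abort"
    bank_a.finish(outcome)
    bank_b.finish(outcome)
    return outcome
```

Between `prepare` and `finish` a participant refuses any other transaction on the account, which is where the availability and latency cost comes from.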
10. Schemas vs. Schema-free
Use tables to represent real objects
Join operations are expensive and difficult to
execute when scaling out horizontally
Name  | Surname | Home              | Mobile | Telephone | Office | Marital Status | ...
Quang | Nguyen  | null              | 398    | null      | null   | null           | null
Cuong | Trinh   | Nguyen Dinh Chieu | 999    | 555       | null   | null           | null
...   | ...     | ...               | ...    | ...       | ...    | ...            | ...
user: {
  name: quang,
  surname: Nguyen,
  mobile: 398
}
user: {
  name: Cuong,
  surname: Trinh,
  Home: Nguyen Dinh Chieu,
  mobile: 999,
  Telephone: 555
}
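To make the contrast concrete, the sketch below stores the same two users as schema-free documents and then projects them onto a fixed set of columns. The column list and the `to_row` helper are assumptions for illustration; the field values come from the slide.

```python
# Fixed relational schema (column names are illustrative).
COLUMNS = ["name", "surname", "home", "mobile", "telephone", "office",
           "marital_status"]

# Schema-free: each document stores only the fields it actually has.
documents = [
    {"name": "quang", "surname": "Nguyen", "mobile": 398},
    {"name": "Cuong", "surname": "Trinh", "home": "Nguyen Dinh Chieu",
     "mobile": 999, "telephone": 555},
]

def to_row(doc):
    """Project a document onto the fixed schema, padding gaps with None."""
    return tuple(doc.get(col) for col in COLUMNS)

rows = [to_row(doc) for doc in documents]

# Cells the table must carry as NULL versus fields the documents store.
null_cells = sum(value is None for row in rows for value in row)
stored_fields = sum(len(doc) for doc in documents)
```

With sparse data the table spends a large share of its cells on nulls, while the documents carry only the fields that exist.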
12. Information amount is growing fast
In 2010, the amount of information created and
replicated exceeded one zettabyte (a trillion
gigabytes) for the first time. In 2011, it surpassed 1.8 zettabytes
13. Google: BigTable
Web Indexing
Google Earth
YouTube
Google Books
Google Mail
High Scalability
High Availability
14. Amazon: DynamoDB
RDBMS doesn’t fit requirements
Tens of thousands of servers around the world
10 million customers
High Reliability
High Availability
15. Facebook: Cassandra, HBase
People
More than 800 million active users
More than 50% of active users log on to
Facebook on any given day
Average user has 130 friends
Activity
More than 900 million objects that people interact with
(pages, groups, events and community pages)
On average, more than 250 million photos are
uploaded per day
Messaging system including chat, wall posts,
and email has 135+ billion messages per month
High Scalability
High Availability
16. Twitter
High Availability
17. CAP Theorem
It is impossible for a distributed computer system
to simultaneously provide all three of the following
guarantees
Consistency: all nodes see the same data at the
same time
Availability: every request receives a response
about whether it was successful or failed
Partition Tolerance: the system continues to
operate despite arbitrary message loss
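You can have at most two of the three. Because network partitions cannot be
ruled out in a distributed system, the practical choice is between consistency
and availability, and many large web systems choose availability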
18. Consistency Level
Strong (Sequential): After the update
completes any subsequent access will return the
updated value.
Weak (weaker than Sequential): The system
does not guarantee that subsequent accesses
will return the updated value.
Eventual: All updates will propagate throughout
all of the replicas in a distributed system, but
this may take some time. Eventually, all replicas
will be consistent.
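The difference between a stale read and eventual convergence can be seen in a toy model. The `ReplicatedRegister` class below is a hypothetical sketch: one value, several replicas, and an explicit `propagate` step standing in for network delivery.

```python
class ReplicatedRegister:
    """A single value copied to several replicas (toy model)."""

    def __init__(self, replicas, value=None):
        self.values = [value] * replicas
        self.backlog = []  # (replica index, value) updates not yet delivered

    def write(self, value):
        """Apply the update on replica 0 and queue it for the others."""
        self.values[0] = value
        self.backlog.extend((i, value) for i in range(1, len(self.values)))

    def read(self, replica):
        """Weak read: returns whatever that replica currently holds."""
        return self.values[replica]

    def propagate(self, steps=1):
        """Deliver up to `steps` queued updates, oldest first."""
        for _ in range(min(steps, len(self.backlog))):
            i, value = self.backlog.pop(0)
            self.values[i] = value

    def converged(self):
        """True once nothing is queued and all replicas agree."""
        return not self.backlog and len(set(self.values)) == 1
```

A read from a replica that has not yet received the update returns the old value; once every queued update is delivered, all replicas agree.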
19. What is NoSQL
Stands for Not Only SQL
Class of non-relational data storage systems
Usually do not require a fixed table schema nor
do they use the concept of joins
Most NoSQL offerings relax one or more of the
ACID properties
NoSQL !=
21. NoSQL Features
Key/Value stores or “the big hash table”
Amazon S3 (Dynamo)
Memcached
Schema-less, which comes in multiple flavors
Document-based (MongoDB, CouchDB)
Column-based (Cassandra, HBase)
Graph-based (Neo4j)
22. Key/Value
Advantages
Very fast
Very scalable
Simple model
Able to distribute horizontally
Disadvantages
Many data structures (objects) can't be easily
modeled as key value pairs
23. Schema-less
Advantages
Schema-less data model is richer than key/value
pairs
Eventual consistency
Many are distributed
Still provide excellent performance and scalability
Disadvantages
no ACID transactions
26. Introduction to MongoDB
MongoDB is a document-oriented database
Key -> Document
Structured Document
Schema-free
Key = quang
user: {
  name: quang,
  surname: Nguyen,
  mobile: 398
}
Key = cuong
user: {
  name: Cuong,
  surname: Trinh,
  Home: Nguyen Dinh Chieu,
  mobile: 999,
  Telephone: 555
}
27. Introduction to MongoDB
Query = quang
Result count: 1
user: {
  name: quang,
  surname: Nguyen,
  mobile: 398
}
Query = cuong
Result count: 1
user: {
  name: Cuong,
  surname: Trinh,
  Home: Nguyen Dinh Chieu,
  mobile: 999,
  Telephone: 555
}
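The lookup on this slide can be imitated in plain Python. This is a stand-in for a document store, not the MongoDB API: the `find` helper is hypothetical, and the case-insensitive string match is an assumption made so that the query "cuong" finds the stored name "Cuong".

```python
users = [
    {"name": "quang", "surname": "Nguyen", "mobile": 398},
    {"name": "Cuong", "surname": "Trinh", "home": "Nguyen Dinh Chieu",
     "mobile": 999, "telephone": 555},
]

def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion.

    Documents lacking a queried field simply do not match; there is no
    schema to consult. Strings are compared ignoring case (an assumption).
    """
    def same(stored, wanted):
        if isinstance(stored, str) and isinstance(wanted, str):
            return stored.lower() == wanted.lower()
        return stored == wanted

    return [doc for doc in collection
            if all(key in doc and same(doc[key], value)
                   for key, value in criteria.items())]
```

For example, `find(users, name="quang")` returns one document, and a query on a field that only some documents have, such as `telephone`, works without any schema change.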
28. Features of MongoDB
Indexing
Stored JavaScript
Aggregation
File Storage
Make Scaling out easier
Scaling out vs. Scaling up
Scaling out is done automatically, balanced across a
cluster
29. Some applications of MongoDB
Large scale application
Archiving and event logging
Document and Content Management Systems
foursquare uses MongoDB to store venues
and user "check-ins" into venues, sharding
the data over more than 25 machines on
Amazon EC2
Craigslist uses MongoDB to archive billions
of records
Disney built a common set of tools and
APIs for all games within the Interactive
Media Group, using MongoDB as a
common object repository to persist state
information
31. Introduction to Cassandra
Column Family: a logical division that associates
similar data, e.g., a User Column Family and a Hotel
Column Family.
Row oriented: unlike in a relational model, a row
doesn’t need to have the same columns as other
rows in its column family.
Schema-Free
33. Features of Cassandra
Distributed and Decentralized
Unlike systems in which some nodes must be set up as masters in order to
organize other nodes set up as slaves, every Cassandra node is identical
So there is no single point of failure
High Availability & Fault Tolerance
You can replace failed nodes in the cluster with no
downtime, and you can replicate data to multiple data
centers to offer improved local performance and
prevent downtime if one data center experiences a
catastrophe such as fire or flood.
Tunable Consistency
It allows you to easily decide the level of consistency
you require, in balance with the level of availability
34. Features of Cassandra
Elastic Scalability
Elastic scalability refers to a special property of
horizontal scalability. It means that your cluster can
seamlessly scale up and scale back down.
35. Some Applications of Cassandra
Large Deployments
Lots of Writes, Statistics, and Analysis
Geographical Distribution
Facebook used Cassandra to power Inbox
Search, with over 200 nodes deployed
Twitter announced it is planning to use
Cassandra because it can be run on large
server clusters and is capable of taking in
very large amounts of data at a time
AppScale uses Cassandra as a back-end
for Google App Engine applications
37. Neo4j – Graph Database
Data is stored as a Graph/Network
Nodes and relationships with properties
Schema-free
[Diagram: a graph in which people nodes and Company nodes are connected by KNOWS, WORKS, and OWNS relationships]
people: {name: quang, surname: Nguyen}
people: {name: Cuong, surname: Trinh, hobbies: uncountable}
people: {name: Thanh, surname: Nguyen}
Company: {name: Saltlux, Vietnam, Area: SearchEngine}
Company: {name: TechMaster, area: IT Education, founded: 2011}
Company: {name: Fami, area: Furniture}
38. Neo4j – Graph Database
Find all persons who KNOW a friend who
KNOWS someone called “Larry Ellison”
SELECT ?person WHERE {
?person neo4j:KNOWS ?friend .
?friend neo4j:KNOWS ?foe .
?foe neo4j:name "Larry Ellison" .
}
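The same two-hop pattern can be sketched over an in-memory edge set in Python. The `Graph` class and `knows_friend_of` function are hypothetical names for this example, nodes are identified by their name only, and any edges loaded into the graph are sample data, not taken from the slides.

```python
class Graph:
    """Directed, typed relationships between named nodes (toy model)."""

    def __init__(self):
        self.edges = set()  # (source, relationship type, target)

    def relate(self, source, rel_type, target):
        self.edges.add((source, rel_type, target))

    def sources(self, target, rel_type):
        """All nodes that have a `rel_type` relationship pointing at target."""
        return {s for s, r, t in self.edges if r == rel_type and t == target}


def knows_friend_of(graph, name):
    """People who KNOW a friend who KNOWS `name` (two hops backwards)."""
    people = set()
    for friend in graph.sources(name, "KNOWS"):
        people |= graph.sources(friend, "KNOWS")
    return people
```

The traversal follows relationships directly instead of joining tables, which is why this kind of query suits a graph database.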
39. Features of Neo4j
Disk-based
Fully transactional like a real database (ACID is
satisfied)
Scale-up, massive scalability. Neo4j can handle
graphs of several billion nodes/ relationships/
properties on a single machine.
No sharding
40. Some Applications of Neo4j
Ideal for any application that relies on the
relationships between records
Social Networks
Recommendations
42. Some Considerations
What if you want to store a larger volume of data, or
access it at a higher rate, than a single
server can handle?
When more servers are added, what are the
dependencies between servers?
Can your application cope if one server or a subset
of servers crashes?
What if communication has problems?
43. What is sharding
Sharding is the method MongoDB uses to split a
large collection across several servers (called a
cluster)
MongoDB does almost everything automatically;
MongoDB lets your application grow easily,
robustly, and naturally
Making the cluster “invisible”
Making the cluster always available for reads and
writes
Let the cluster grow easily
44. A Shard
A shard is one or more servers in a cluster that
are responsible for some subset of the data
A shard can consist of many servers. If there is
more than one server in a shard, each server
has an identical copy of the subset of the data
[Diagram: a shard made of several servers, each holding the same data “abc”]
45. Distributing Data – One range per shard
One range per shard
Shard 1: [“a”, “f”) | Shard 2: [“f”, “n”) | Shard 3: [“n”, “t”) | Shard 4: [“t”, “{”)
Data movement issue: moving [“c”, “f”) from Shard 1 to Shard 2
Before: Shard 1: [“a”, “f”) | Shard 2: [“f”, “n”) | Shard 3: [“n”, “t”) | Shard 4: [“t”, “{”)
After:  Shard 1: [“a”, “c”) | Shard 2: [“c”, “n”) | Shard 3: [“n”, “t”) | Shard 4: [“t”, “{”)
46. Distributing Data – One range per shard
Data has to be moved across the cluster
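Start: 500 GB | 500 GB | 300 GB | 300 GB
Move 100 GB from shard 1 to shard 2: 400 GB | 600 GB | 300 GB | 300 GB
Move 200 GB from shard 2 to shard 3: 400 GB | 400 GB | 500 GB | 300 GB
Move 100 GB from shard 3 to shard 4: 400 GB | 400 GB | 400 GB | 400 GB
Total: 400 GB of data movement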
47. Distributing Data – One range per shard
It’s worse when a new shard is added
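Start: 500 GB | 500 GB | 500 GB | 500 GB | 0 GB (new shard)
Moves: 100 GB from shard 1 to 2, 200 GB from shard 2 to 3, 300 GB from shard 3 to 4, 400 GB from shard 4 to 5
End: 400 GB | 400 GB | 400 GB | 400 GB | 400 GB
Total: 1 TB of data movement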
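The totals on these two slides follow from simple arithmetic: with one contiguous range per shard, data can only shift between neighbouring shards, so every surplus cascades down the line. A small sketch, with hypothetical function names, that reproduces the numbers:

```python
def cascade_moves(sizes_gb):
    """Amount shifted across each boundary between neighbouring shards so
    that every shard ends at the average size.

    A positive entry moves data rightwards (shard i to shard i + 1), a
    negative entry moves it leftwards.
    """
    target = sum(sizes_gb) / len(sizes_gb)
    moves, surplus = [], 0.0
    for size in sizes_gb[:-1]:
        surplus += size - target  # excess held by all shards up to this one
        moves.append(surplus)
    return moves


def total_moved(sizes_gb):
    """Total volume transferred, counting each hop separately."""
    return sum(abs(m) for m in cascade_moves(sizes_gb))
```

For the four-shard example this gives moves of 100, 200, and 100 GB (400 GB in total); with the empty fifth shard it gives 100, 200, 300, and 400 GB (1 TB in total), even though only 400 GB actually needs to end up on the new shard.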
48. Distributing Data – Multi range shards
Each shard can contain multiple ranges. Each
range of data is called a chunk.
Before:
Shard 1: 500 GB, [“a”, “f”)
Shard 2: 500 GB, [“f”, “n”)
Shard 3: 300 GB, [“n”, “t”)
Shard 4: 300 GB, [“t”, “{”)
Move the 100 GB chunk [“d”, “f”) from shard 1 to shard 3, and the 100 GB chunk [“j”, “n”) from shard 2 to shard 4
After:
Shard 1: 400 GB, [“a”, “d”)
Shard 2: 400 GB, [“f”, “j”)
Shard 3: 400 GB, [“n”, “t”); [“d”, “f”)
Shard 4: 400 GB, [“t”, “{”); [“j”, “n”)
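Once a shard may hold several ranges, routing a key needs a chunk table rather than one range per shard. The sketch below assumes the final layout from this slide, with shard 3 also owning ["d", "f") and shard 4 also owning ["j", "n"). The `CHUNKS` table and `shard_for` function are illustrative names, not MongoDB internals.

```python
from bisect import bisect_right

# (lower bound, owning shard), sorted by lower bound. Each chunk ends where
# the next one starts; the last chunk ends at "{".
CHUNKS = [("a", 1), ("d", 3), ("f", 2), ("j", 4), ("n", 3), ("t", 4)]
LOWER_BOUNDS = [low for low, _ in CHUNKS]

def shard_for(key):
    """Find the shard owning the chunk whose range [low, high) contains key."""
    if not "a" <= key < "{":
        raise KeyError(key)
    # The rightmost chunk whose lower bound is <= key is the one containing it.
    index = bisect_right(LOWER_BOUNDS, key) - 1
    return CHUNKS[index][1]
```

Moving a chunk only changes one entry's owner in the table, so rebalancing 200 GB here touches two chunks instead of cascading data through every shard.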
49. Sharding a collection
The shard key is used for chunk ranges.
The shard key can be of any type; values of different types are ordered as:
null < numbers < strings < objects < arrays < binary data <
ObjectIds < boolean < dates < regular expression
MongoDB first creates a single (-∞, +∞) chunk for a
collection
As we add more data, MongoDB splits
existing chunks to create new ones
Every chunk range must be distinct and must not
overlap with any other chunk range
Because data movement is resource-consuming, a chunk
is only 200 MB by default
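A rough sketch of chunk splitting, under loud assumptions: shard keys are numbers, a chunk's size limit is modelled as a maximum number of keys rather than megabytes, and an overfull chunk is split at its median key. `ChunkedCollection` is a hypothetical class, not how MongoDB is implemented.

```python
from bisect import bisect_right, insort

NEG_INF, POS_INF = float("-inf"), float("inf")

class ChunkedCollection:
    """Keeps sorted numeric shard keys and the lower bounds of its chunks."""

    def __init__(self, max_keys_per_chunk=4):
        self.max_keys = max_keys_per_chunk  # stand-in for the size limit
        self.bounds = [NEG_INF]             # one initial chunk: (-inf, +inf)
        self.keys = []

    def ranges(self):
        """Chunk ranges as (low, high) pairs, each meaning [low, high)."""
        return list(zip(self.bounds, self.bounds[1:] + [POS_INF]))

    def keys_in(self, low, high):
        return [k for k in self.keys if low <= k < high]

    def insert(self, key):
        insort(self.keys, key)
        i = bisect_right(self.bounds, key) - 1
        low, high = self.ranges()[i]
        members = self.keys_in(low, high)
        if len(members) > self.max_keys:
            mid = members[len(members) // 2]
            if mid > low:  # identical keys cannot be split apart
                self.bounds.insert(i + 1, mid)  # [low, mid) and [mid, high)
```

Because each new bound is inserted between its neighbours, the ranges stay contiguous and never overlap, and every key belongs to exactly one chunk.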
50. Balancing
MongoDB automatically moves chunks from one
shard to another in order to
keep the data evenly distributed and
minimize the data movement. To trigger balancing, a shard must have at
least 9 more chunks than the least populous shard
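The threshold rule can be sketched as follows. Only the trigger (a gap of at least 9 chunks) comes from the slide; that the balancer then keeps moving one chunk at a time until the counts differ by at most one is an assumption of this sketch, as are the function and shard names.

```python
def balance(chunk_counts, threshold=9):
    """Return (new chunk counts, number of chunk moves).

    Nothing happens unless the fullest shard has at least `threshold` more
    chunks than the emptiest one. Once triggered, chunks move one at a time
    from the fullest to the emptiest shard until counts differ by at most 1.
    """
    counts = dict(chunk_counts)

    def gap():
        return max(counts.values()) - min(counts.values())

    moves = 0
    if gap() < threshold:
        return counts, moves
    while gap() > 1:
        fullest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        counts[fullest] -= 1
        counts[emptiest] += 1
        moves += 1
    return counts, moves
```

The threshold keeps small imbalances from causing constant migrations, which matters because every chunk move consumes network and disk resources.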
51. Choose a Sharding Key
Avoid a low-cardinality shard key
Example: a continent field with values “Asia”, “Australia”, “Europe”, “North
America”, or “South America”
MongoDB can’t split these chunks any further! The
chunks will just keep getting bigger and bigger.
An ascending key does not work as well as we
expect.
Example: using a timestamp as the shard key
Everything is added to the last chunk
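The hot spot caused by an ascending key is easy to reproduce. In this sketch the chunk bounds, the key ranges, and the `chunk_hits` helper are all made up for illustration: four chunks already cover keys 0 to 399, then 100 new keys arrive.

```python
from bisect import bisect_right
from collections import Counter
import random

def chunk_hits(lower_bounds, keys):
    """Count how many keys fall into each chunk, given sorted lower bounds."""
    hits = Counter()
    for key in keys:
        hits[bisect_right(lower_bounds, key) - 1] += 1
    return hits

# Four chunks; the first bound stands for minus infinity.
bounds = [float("-inf"), 100, 200, 300]

# Timestamp-like keys: every new key is larger than all existing ones.
ascending = range(400, 500)

# Keys spread over the whole existing key space (seeded for repeatability).
rng = random.Random(0)
scattered = [rng.randrange(0, 400) for _ in range(100)]
```

`chunk_hits(bounds, ascending)` puts all 100 inserts into the last chunk, so a single shard takes every write, while the scattered keys are spread across the chunks.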
52. Choose a Sharding Key
Random Shard key
Waste of index
So, we want to choose a shard key with nice
data locality, but not so local that we end up with
a hot spot.
53. When to shard
In general, you should start with a nonsharded
setup and convert it to a sharded one if and
when you need to. Shard when you:
Run out of disk space on your current machine.
Want to write data faster than a single process can
handle.
Want to keep a larger proportion of data in memory to
improve performance.
Tell the story of RDBMS
Why RDBMS is popular
What is the problem of RDBMS
Why we need the features of NoSQL
MongoDB is very good at real-time inserts, updates, and queries. It provides scalability and replication, which are necessary for the real-time data stores of large web sites.
Scalability is an architectural feature of a system that can continue serving a greater number of requests with little degradation in performance. Vertical scaling (simply adding more hardware capacity and memory to your existing machine) is the easiest way to achieve this. Horizontal scaling means adding more machines that have all or some of the data on them so that no one machine has to bear the entire burden of serving requests. But then the software itself must have an internal mechanism for keeping its data in sync with the other nodes in the cluster.