2. Agenda
• Why noSQL?
• noSQL World
• ACID vs CAP
• DynamoDB – What is it?
• DynamoDB Architecture
• Conditional Writes
• Strongly consistent vs Eventually consistent
• Provisioned throughput
• Query vs Scan
• DynamoDB parameters
• Secondary Index
• Item Collections
• Limitations
• AWS CLI
• Table operations
3. Why noSQL ?
• Data Velocity – huge volumes of data generated in a short time
• Data Variety – structured, semi-structured and unstructured
• Data Volume – terabytes and petabytes of data
• Dynamic Schema – flexible data model
• Auto-sharding – horizontal scalability
• Continuous Availability
• Integrated Caching
• Replication – automatic replication to support high availability
• Dynamic provisioned throughput
6. noSQL world
• Key-value Stores
  – Properties: simplest noSQL database; every item is a set of key-value pairs
  – Examples: DynamoDB (Amazon), SimpleDB (Amazon), Voldemort (LinkedIn), MemcacheDB (Zynga), Azure (Microsoft), Redis (VMware)
• Column Family Stores
  – Properties: optimized for queries over large datasets; contains key-value pairs where a key is mapped to a set of columns; in analogy with an RDBMS, a column family is a "table"
  – Examples: HBase (Powerset), Cassandra (Facebook), BigTable (Google)
• Document Databases
  – Properties: pair each key with a complex data structure called a "document"; documents can contain many different key-value pairs
  – Examples: CouchDB, MongoDB (10gen)
• Graph Databases
  – Properties: store information about networks, e.g. social connections
  – Examples: InfiniteGraph, Neo4j (Neo Technology), InfoGrid, FlockDB (Twitter)
7. ACID vs CAP
ACID properties relate to relational databases
• Atomicity
  – Requires each transaction to be all or nothing
• Consistency
  – Brings the database from one valid state to another
• Isolation
  – Takes care of concurrent execution and avoids overwriting
• Durability
  – Committed transactions remain so in the event of power loss, crashes etc.
8. ACID vs CAP
The CAP theorem relates to distributed (noSQL) databases
(pick any two)
• Consistency
  – All nodes see the same data at the same time
• Availability
  – A guarantee that every request receives a response about whether it was successful or failed
• Partition tolerance
  – The system continues to operate despite arbitrary message loss or failure of part of the system
10. Amazon DynamoDB – What is it ?
• Fully managed noSQL database service on AWS
• Data model in the form of tables
• Data stored in the form of items (name–value attributes)
• Automatic scaling
  – Provisioned throughput
  – Storage scaling
  – Distributed architecture
• Easy administration
• Monitoring of tables using CloudWatch
• Integration with EMR (Elastic MapReduce)
  – Analyze data and store in S3
11. Amazon DynamoDB – What is it ?
[Diagram: a Table contains Items (64KB max each); each Item is a set of key=value Attributes]
• Primary key (mandatory for every table)
  – Hash or Hash + Range
• Data model in the form of tables
• Data stored in the form of items (name–value attributes)
• Secondary indexes for improved performance
  – Local secondary index
  – Global secondary index
• Scalar data types (number, string etc.) or multi-valued data types (sets)
12. DynamoDB Architecture
• True distributed architecture
• Data is spread across hundreds of servers called storage nodes
• Hundreds of servers form a cluster in the form of a "ring"
• A client application can connect using one of two approaches
  – Routing through a load balancer
  – A client library that reflects Dynamo's partitioning scheme and can determine which storage host to connect to
• Advantage of the load balancer – no Dynamo-specific code needed in the client application
• Advantage of the client library – saves one network hop to the load balancer
• Synchronous replication is not achievable given Amazon's high availability and scalability requirements
• DynamoDB is designed to be an "always writable" storage solution
• Allows multiple versions of data on multiple storage nodes
• Conflict resolution happens during reads, NOT during writes
  – Syntactic conflict resolution
  – Semantic conflict resolution
13. DynamoDB Architecture - Partitioning
• Data is partitioned over multiple hosts called storage nodes (ring)
• Uses consistent hashing to dynamically partition data across storage hosts
• Two problems associated with consistent hashing
  – Hashing of storage hosts can cause imbalance of data and load
  – Consistent hashing treats every storage host as having the same capacity
[Ring diagram – initial situation: keys 3 and 4 on host A, key 1 on host B, key 2 on host C]
14. DynamoDB Architecture - Partitioning
[Ring diagram – situation after C left and D joined: key 4 on A, key 1 on B, keys 2 and 3 on D]
15. DynamoDB Architecture - Partitioning
• Solution – virtual hosts (tokens) mapped to physical hosts
• The number of virtual hosts for a physical host depends on the capacity of the physical host
• Virtual nodes are function-mapped to physical nodes
[Ring diagram: virtual hosts A, B, C, D, B, C, H, I, J around the ring, each mapped to a physical host]
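The virtual-node idea above can be sketched in a few lines. This is an illustrative toy, not DynamoDB's actual implementation: each physical host is given a number of virtual nodes proportional to its capacity, and a key is owned by the first virtual node clockwise from the key's hash.

```python
# Toy consistent-hash ring with virtual nodes (illustrative sketch only;
# host names and vnode counts are made up for the example).
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self):
        self._ring = []  # sorted list of (hash_position, physical_host)

    def add_host(self, host: str, vnodes: int):
        # more capacity -> more virtual nodes -> larger share of the ring
        for i in range(vnodes):
            bisect.insort(self._ring, (_hash(f"{host}#vnode{i}"), host))

    def remove_host(self, host: str):
        self._ring = [(h, n) for (h, n) in self._ring if n != host]

    def lookup(self, key: str) -> str:
        # first virtual node clockwise from the key's hash position
        idx = bisect.bisect(self._ring, (_hash(key), chr(0x10FFFF)))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing()
ring.add_host("A", vnodes=8)  # bigger host gets more virtual nodes
ring.add_host("B", vnodes=4)
ring.add_host("C", vnodes=4)
owner_before = ring.lookup("user#42")
ring.remove_host("C")         # C's ranges move to clockwise successors
owner_after = ring.lookup("user#42")
```

Removing a host only remaps the keys that host owned; all other keys keep their owners, which is the point of consistent hashing.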
16. DynamoDB Architecture – Membership Change
• Gossip protocol used for node membership
• Every second, each storage node randomly contacts a peer to bilaterally reconcile persisted membership histories
• In doing so, membership changes are spread and an eventually consistent membership view is formed
• When a new member joins, adjacent nodes adjust their object and replica ownership
• When a member leaves, adjacent nodes adjust their object and replica ownership and redistribute the keys owned by the leaving member
[Ring diagram: nodes A through H on the ring, replication factor = 3]
19. DynamoDB Architecture – Replication
• Each data item is replicated N times (N = replication factor)
• The storage node in charge of storing key K replicates it to its N−1 clockwise successors
• Every node has a "preference list" – a mapping of which key belongs where, based on consistent hashing
• The preference list skips virtual nodes and contains only distinct physical nodes
[Ring diagram: the hash of key K lands on virtual host A; the preference list for K is A, B, D – distinct physical hosts]
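The preference-list construction above can be sketched directly: walk the ring clockwise from the key's position and collect hosts, skipping any virtual node whose physical host is already in the list. A minimal sketch (the ring layout and names are invented for the example):

```python
# Sketch: build a preference list of n distinct physical hosts by walking
# the ring clockwise from the key's hash position (illustrative only).
def preference_list(ring, key_hash, n):
    """ring: list of (position, physical_host) tuples sorted by position."""
    distinct = len({host for _, host in ring})
    result = []
    # index of the first virtual node clockwise from the key's position
    start = next((i for i, (pos, _) in enumerate(ring) if pos >= key_hash), 0)
    i = start
    while len(result) < min(n, distinct):
        host = ring[i % len(ring)][1]
        if host not in result:  # skip vnodes of hosts already collected
            result.append(host)
        i += 1
    return result

# A and B each own two virtual nodes; the walk from position 25 visits
# A(30), C(40), B(50) and skips the duplicate A(… ) and B(…) vnodes.
ring = [(10, "A"), (20, "B"), (30, "A"), (40, "C"), (50, "B"), (60, "D")]
plist = preference_list(ring, key_hash=25, n=3)
```

The skip step is what makes the list contain only distinct physical nodes, matching the last bullet above.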
20. DynamoDB Architecture – Data Versioning
• Eventual consistency – data propagates asynchronously
• It is possible to have multiple versions on different storage nodes
• If the latest data is not available on a storage node, a client may update an old version of the data
• Conflict resolution is done using "vector clocks"
• A vector clock is metadata added to data when using get() or put()
• A vector clock is effectively a list of (node, counter) pairs
• One vector clock is associated with every version of every object
• By examining vector clocks, one can determine whether two versions have a causal ordering or are parallel branches
• Clients pass vector clock information while reading or writing data in the form of a "context"
• Dynamo tries to resolve conflicts among multiple versions using syntactic reconciliation
• If syntactic reconciliation does not work, the client has to resolve the conflict using semantic reconciliation
• Applications should be designed to acknowledge multiple versions and should have logic to reconcile them
22. DynamoDB Architecture – Data Versioning
Shopping-cart example (the client writes through storage nodes Sx, Sy, Sz over time):
• Client adds 2 items via Sx → version [Sx,1]
• Client adds 1 item via Sx → version [Sx,2]
  – Syntactic reconciliation, handled by DynamoDB: [Sx,2] > [Sx,1], so [Sx,1] can be discarded
• Client adds/deletes 1 item via Sy → version [Sx,2][Sy,1]
• Concurrently, client adds 3 items via Sz → version [Sx,2][Sz,1]
  – Semantic reconciliation, handled by the client application: is [Sx,2][Sy,1] > [Sx,2][Sz,1]? Neither dominates; the versions are parallel branches
• (Also shown: adding 1 item via Sz on top of [Sx,1] yields [Sx,1][Sz,1]; syntactic reconciliation applies, since [Sx,1][Sz,1] > [Sx,1])
• Client reconciles the branches and writes the final cart (5 items) via Sx → version [Sx,3][Sy,1][Sz,1]
If every counter in the first object's vector clock is less than or equal to the corresponding counter in the second vector clock, then the first is an ancestor of the second and can be forgotten
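The ancestor rule above is a simple counter comparison. A minimal sketch (node names follow the example; this is our own illustration, not Dynamo's code):

```python
# Vector-clock comparison: v1 is an ancestor of v2 iff every counter in v1
# is <= the corresponding counter in v2. Missing nodes count as 0.
def descends(v2: dict, v1: dict) -> bool:
    """True if v2 dominates (descends from) v1."""
    return all(v1[node] <= v2.get(node, 0) for node in v1)

def relation(v1: dict, v2: dict) -> str:
    if descends(v2, v1):
        return "v1 is an ancestor of v2"  # syntactic reconciliation applies
    if descends(v1, v2):
        return "v2 is an ancestor of v1"
    return "parallel branches"           # semantic reconciliation needed

r1 = relation({"Sx": 1}, {"Sx": 2})                    # causally ordered
r2 = relation({"Sx": 2, "Sy": 1}, {"Sx": 2, "Sz": 1})  # concurrent writes
```

The second comparison is exactly the cart conflict above: neither [Sx,2][Sy,1] nor [Sx,2][Sz,1] dominates, so the client application must merge them.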
30. DynamoDB Architecture – get() and put()
• Two strategies a client can use to select a node
  – A generic load balancer that selects a node based on load
  – A partition-aware client library that routes requests to the right node
• The node handling a read or write operation is called the "coordinator"
• The coordinator should be among the top N nodes in the preference list
• For reads and writes, Dynamo uses a consistency protocol similar to a quorum system
• The consistency protocol has 3 variables
  – N – number of replicas of the data to be read or written
  – W – number of nodes that must participate in a successful write operation
  – R – number of machines contacted in a read operation
• put()
  – Upon receiving a put() request, the coordinator generates a vector clock and writes the data locally
  – The coordinator sends the new version (along with the vector clock) to the top N nodes from the preference list
  – If at least W−1 nodes respond, the write is successful
• get()
  – The coordinator requests all existing versions of the data from the top N nodes from the preference list
  – Waits for at least R nodes to respond
  – In case of multiple versions, the coordinator sends all causally unrelated versions to the client
  – The client reconciles divergent versions and supersedes the current version with the reconciled version
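The quorum idea can be made concrete with a toy simulation (our own sketch, not the real protocol): choosing R + W > N guarantees that every read quorum overlaps every write quorum on at least one node, so a read always sees the latest successful write.

```python
# Toy R/W quorum simulation. Replica values and versions are in-memory;
# node counts are illustrative.
N, W, R = 3, 2, 2
assert R + W > N  # overlap condition: read and write quorums intersect

replicas = [{"value": None, "version": 0} for _ in range(N)]

def put(value, version, acked_nodes):
    """Write lands only on `acked_nodes`; succeeds iff at least W acked."""
    for i in acked_nodes:
        replicas[i] = {"value": value, "version": version}
    return len(acked_nodes) >= W

def get(contacted_nodes):
    """Contact R replicas and return the highest version seen."""
    replies = [replicas[i] for i in contacted_nodes[:R]]
    return max(replies, key=lambda r: r["version"])

ok = put("cart-v1", 1, acked_nodes=[0, 1])  # node 2 missed the write
latest = get(contacted_nodes=[1, 2])        # quorum still includes node 1
```

Even though node 2 holds a stale copy, the read quorum {1, 2} overlaps the write quorum {0, 1} at node 1, so the highest-version reply is the fresh one.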
32. Conditional Writes
• Multiple clients can access the same item and try to update the same attributes
• Idempotent operation – running the same request multiple times has no effect after the first
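A conditional write is essentially compare-and-set: the update applies only if the attribute still holds the value the client last saw. A minimal in-memory sketch (the real service expresses the same idea with an expected-value condition on the write request):

```python
# Compare-and-set sketch of a conditional write (illustrative; the item
# shape and attribute names are made up).
def conditional_update(item: dict, attr: str, expected, new):
    """Apply the update only if item[attr] still equals `expected`."""
    if item.get(attr) != expected:
        return False  # condition failed: another client updated first
    item[attr] = new
    return True

row = {"id": "item-1", "price": 10}
first = conditional_update(row, "price", expected=10, new=12)   # wins
second = conditional_update(row, "price", expected=10, new=15)  # stale view, rejected
```

The second client, working from the stale value 10, fails its condition instead of silently overwriting the first client's update; retrying the same successful request is also a no-op on the condition, which is what makes the operation idempotent.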
33. Strongly consistent vs Eventually Consistent
• Multiple copies of the same item are kept to ensure durability
• It takes time for an update to propagate to all copies
• Eventually consistent
  – Immediately after a write, a read may not return the latest value
• Strongly consistent
  – Requires additional read capacity units
  – Returns the most up-to-date version of the item
• An eventually consistent read consumes half the read capacity units of a strongly consistent read
34. Provisioned throughput
• Configuring read and write capacity
• Read capacity
  – 1 read unit = 4KB (item size is rounded up to a multiple of 4KB)
  – 1 read capacity unit = 1 read unit with strong consistency, or 2 read units with eventual consistency
  – 1 read capacity unit = 4KB with strong consistency, or 8KB with eventual consistency
• Write capacity
  – 1 write unit = 1KB (item size is rounded up to a multiple of 1KB)
  – 1 write capacity unit = 1 write unit
  – 1 write capacity unit = 1KB
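The capacity arithmetic above can be written down directly. A sketch of the rules as stated (rounding an item up to the 4KB read / 1KB write boundary, and halving the cost for eventually consistent reads):

```python
# Capacity-unit calculator following the rules on this slide (sketch).
import math

def read_capacity(item_bytes: int, strongly_consistent: bool = True) -> float:
    units = math.ceil(item_bytes / 4096)  # round up to a 4KB multiple
    return units if strongly_consistent else units / 2

def write_capacity(item_bytes: int) -> int:
    return math.ceil(item_bytes / 1024)   # round up to a 1KB multiple

rc_strong = read_capacity(400)                              # 400 B -> 1 unit
rc_eventual = read_capacity(400, strongly_consistent=False) # half of that
wc = write_capacity(1500)                                   # 1.5 KB -> 2 units
```

Note the rounding-up is where throughput is "wasted": a 400-byte item costs the same strongly consistent read unit as a full 4KB item.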
36. Operations vs Provisioned throughput
• GetItem
  – Description: reads only 1 item
  – Throughput calculation: item size rounded up to a multiple of 4KB
  – Example: item size = 400 bytes → read capacity = 1 (eventually consistent)
• BatchGetItem
  – Description: reads multiple items (max 1MB or 100 items)
  – Throughput calculation: every item rounded up to a multiple of 4KB
  – Example: item size = 400 bytes, 100 items → read capacity = 50 (eventually consistent)
• Query
  – Description: reads multiple items (max 1MB or 100 items)
  – Throughput calculation: total output rounded up to a multiple of 4KB
  – Example: item size = 400 bytes, 100 items → read capacity = 10 (eventually consistent)
• Scan
  – Description: reads the entire table and then applies the filter
  – Throughput calculation: size of table rounded up to a multiple of 4KB
  – Example: item size = 400 bytes, 10000 records in the table → read capacity = 500 (eventually consistent)
37. Query vs Scan
• Query
  – Search based on primary key; examines only matching data
  – Maximum 1MB result
  – Query results are sorted by range key
  – Query results can optionally be strongly consistent
  – Queries can be done based on a secondary index
  – Requires less provisioned throughput
  – Query performance depends on the amount of data retrieved
• Scan
  – Examines every item and then applies the filter
  – Maximum 1MB result
  – Scan results are always eventually consistent
  – Requires high provisioned throughput
  – Scan performance depends on table size
  – Secondary indexes have no impact on scan performance
  – A parallel scan can be performed using TotalSegments
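The parallel-scan idea can be simulated in memory: the table is split into TotalSegments disjoint segments and each worker scans one Segment independently, applying the filter afterwards (a sketch of the mechanism; the partitioning-by-modulo and the sample data are our own, not how the service actually splits key ranges).

```python
# In-memory simulation of a parallel scan with Segment / TotalSegments.
def scan_segment(items, segment, total_segments, filter_fn):
    """Scan one disjoint segment of the table, then apply the filter."""
    own = [it for i, it in enumerate(items) if i % total_segments == segment]
    return [it for it in own if filter_fn(it)]

table = [{"id": i, "size": i * 10} for i in range(10)]

results = []
for seg in range(4):  # 4 workers, one segment each
    results += scan_segment(table, seg, 4, lambda it: it["size"] >= 50)

matched = sorted(it["id"] for it in results)
```

Because the segments are disjoint and cover the whole table, the union of the workers' results equals a single full scan, just produced concurrently.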
38. DynamoDB Parameters
• LastEvaluatedKey
  – Key of the last returned item when an operation exceeds 1MB
  – If LastEvaluatedKey is null, all data has been transferred
• ExclusiveStartKey
  – Starting point of the next request when an operation exceeds 1MB
• Count
  – If count = true in the request, DynamoDB returns only the item count (like count(*)) from the table
• ScannedCount
  – Total number of items scanned in a scan operation before applying any filter
• Limit
  – Limits the number of items in the result
• Segment
  – The segment to be scanned by a worker in a parallel scan
• TotalSegments
  – The degree of parallelism set in a parallel scan
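The LastEvaluatedKey / ExclusiveStartKey pair above defines a simple pagination protocol: each response carries LastEvaluatedKey while more data remains, and the client feeds it back as ExclusiveStartKey until it comes back as null. An in-memory sketch of that loop (page size and data are invented; a real response pages at the 1MB limit):

```python
# Simulated paged query: feed LastEvaluatedKey back as ExclusiveStartKey
# until it is None.
def query_page(items, exclusive_start_key=None, limit=3):
    start = 0 if exclusive_start_key is None else exclusive_start_key + 1
    page = items[start:start + limit]
    more = start + limit < len(items)
    return {"Items": page,
            "LastEvaluatedKey": start + limit - 1 if more else None}

items = list(range(8))
collected, key = [], None
while True:
    resp = query_page(items, exclusive_start_key=key)
    collected += resp["Items"]
    key = resp["LastEvaluatedKey"]
    if key is None:  # all data transferred
        break
```

The loop issues three requests here (pages of 3, 3, and 2 items) and terminates exactly when LastEvaluatedKey is absent.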
39. Secondary Index
• For efficiently accessing data using keys other than the primary key
• Can only be created on scalar data types
• Can include projections from the table into the index
• Primary key attributes are always part of a secondary index
• DynamoDB copies items from the table to the index
• 2 types of secondary index
  – Local secondary index
  – Global secondary index
• Accessing an index is similar to accessing a table
• Indexes are maintained automatically
• Maximum 5 local and 5 global secondary indexes per table
40. Local vs Global Secondary Index
• Local Secondary Index
  – Hash + Range key
  – Has the same hash key as the primary key; the range key differs
  – For each hash key, the total size of all indexed items must be <= 10GB
  – Can choose between eventual consistency and strong consistency
  – Consumes read capacity units from the table
  – Writes to the table consume write capacity from the table to update the index
  – Can request other table attributes that are not part of the local index
• Global Secondary Index
  – Hash or Hash + Range key
  – Hash key can differ from the hash key of the primary key
  – No size restrictions
  – Supports only eventual consistency
  – Has its own provisioned throughput and does not use the table's throughput
  – Can only request attributes that are part of the global index
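The contrast above shows up directly in the shape of a CreateTable request. A sketch of such a request as a plain dict (table and index names are made up for the example): the GSI block carries its own ProvisionedThroughput and its own hash key, while the LSI reuses the table's hash key and shares the table's throughput.

```python
# Sketch of a CreateTable request with one LSI and one GSI
# (names like "GameScores" are hypothetical).
request = {
    "TableName": "GameScores",
    "AttributeDefinitions": [
        {"AttributeName": "UserId", "AttributeType": "S"},
        {"AttributeName": "GameTitle", "AttributeType": "S"},
        {"AttributeName": "TopScore", "AttributeType": "N"},
    ],
    "KeySchema": [  # Hash + Range primary key
        {"AttributeName": "UserId", "KeyType": "HASH"},
        {"AttributeName": "GameTitle", "KeyType": "RANGE"},
    ],
    "LocalSecondaryIndexes": [{   # same hash key, different range key
        "IndexName": "ScoreIndex",
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},
            {"AttributeName": "TopScore", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }],
    "GlobalSecondaryIndexes": [{  # different hash key, own throughput
        "IndexName": "GameTitleIndex",
        "KeySchema": [{"AttributeName": "GameTitle", "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "ALL"},
        "ProvisionedThroughput": {"ReadCapacityUnits": 5,
                                  "WriteCapacityUnits": 5},
    }],
    "ProvisionedThroughput": {"ReadCapacityUnits": 10,
                              "WriteCapacityUnits": 10},
}
```

The asymmetry in the request mirrors the bullets: only the GSI entry has a ProvisionedThroughput of its own.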
41. Item Collections
• Items with the same hash key
• Includes table items and all local secondary index items
• Max size allowed for any item collection is 10GB (applies to tables with local secondary indexes)
• If the size is exceeded, DynamoDB cannot add any more data to the item collection
• Table and index data for the same item collection are stored in the same partition
• View item collection sizes using the ReturnItemCollectionMetrics parameter
42. Limitations of DynamoDB
• 64KB limit on item size (row size)
• 1MB limit on fetching data
• Pay more if you want strongly consistent data
• Sizes are rounded to multiples of 4KB (provisioned throughput wastage)
• Cannot join tables
• Indexes must be created at table creation time
• No triggers or server-side scripts
• Limited comparison capability (no not_null, contains etc.)
43. Table Operations – Console
• https://console.aws.amazon.com/dynamodb
57. boto – AWS Python Interface
Install git:
  wget http://kernel.org/pub/software/scm/git/git-1.8.5.tar.bz2
  tar xvfj git-1.8.5.tar.bz2
  cd git-1.8.5/
  ./configure
  make
  make install
  whereis git    # verify: git: /usr/local/bin/git
Install boto (needs Python 2.7 and above):
  git clone git://github.com/boto/boto.git
  cd boto
  python setup.py install