Amazon DynamoDB
- Advait Deo
Agenda
• Why noSQL ?
• noSQL World
• ACID vs CAP
• DynamoDB – What is it ?
• DynamoDB Architecture
• Conditional Writes
• Strongly consistent vs Eventually consistent
• Provisioned throughput
• Query vs Scan
• DynamoDB parameters
• Secondary Index
• Item Collections
• Limitations
• AWS CLI
• Table operations
Why noSQL ?
• Data Velocity – huge data getting generated in less time
• Data Variety – structured, semi-structured and unstructured
• Data Volume – terabytes and petabytes of data
• Dynamic Schema – flexible data model
• Auto-sharding – horizontal scalability
• Continuous Availability
• Integrated Caching
• Replication – automatic replication to support high availability
• Dynamic provisioned throughput
Why noSQL ? – Scalability advantage
noSQL world
• Key-value Stores
– Simplest noSQL database; every item is a set of key-value pairs
– Examples: DynamoDB (Amazon), Amazon SimpleDB (Amazon), Voldemort (LinkedIn), MemcacheDB (Zynga), Azure (Microsoft), Redis (VMware)
• Column Family Stores
– Optimized for queries over large datasets; contains key-value pairs where a key is mapped to a set of columns
– In analogy with an RDBMS, a column family is a “table”
– Examples: HBase (Powerset), Cassandra (Facebook), BigTable (Google)
• Document Databases
– Pair each key with a complex data structure called a “document”; documents can contain many different key-value pairs
– Examples: CouchDB, MongoDB (10gen)
• Graph Databases
– Store information about networks, e.g. social connections
– Examples: InfiniteGraph, Neo4J (Neo Technology), InfoGrid, FlockDB (Twitter)
ACID vs CAP
ACID properties relate to relational databases
• Atomicity
– Requires each transaction to be all or nothing

• Consistency
– Brings databases from one valid state to another

• Isolation
– Takes care of concurrent execution and avoids overwriting

• Durability
– Committed transactions will remain so in the event of power loss, crashes, etc.
ACID vs CAP
The CAP theorem relates to noSQL databases
(pick any two)
• Consistency
– All nodes see the same data at the same time

• Availability
– A guarantee that every request receives a response about whether it
was successful or failed

• Partition tolerance
– The system continues to operate despite arbitrary message loss or failure of part of the system
ACID vs CAP
Amazon DynamoDB – What is it ?
• Fully managed noSQL database service on AWS
• Data model in the form of tables
• Data stored in the form of items (name – value attributes)
• Automatic scaling
– Provisioned throughput
– Storage scaling
– Distributed architecture
• Easy administration
• Monitoring of tables using CloudWatch
• Integration with EMR (Elastic MapReduce)
– Analyze data and store in S3
Amazon DynamoDB – What is it ?
[Diagram: a Table contains Items (64KB max per item); each Item is a set of key=value Attributes]
• Primary key (mandatory for every table)
– Hash or Hash + Range
• Data model in the form of tables
• Data stored in the form of items (name – value attributes)
• Secondary Indexes for improved performance
– Local secondary index
– Global secondary index
• Scalar data type (number, string etc) or multi-valued data type (sets)
DynamoDB Architecture
• True distributed architecture
• Data is spread across hundreds of servers called storage nodes
• Hundreds of servers form a cluster in the form of a “ring”
• Client applications can connect using one of two approaches
– Routing using a load balancer
– Client library that reflects Dynamo’s partitioning scheme and can determine the storage host to connect to
• Advantage of the load balancer – no need for Dynamo-specific code in the client application
• Advantage of the client library – saves one network hop to the load balancer
• Synchronous replication is not achievable for the high availability and scalability requirements at Amazon
• DynamoDB is designed to be an “always writable” storage solution
• Allows multiple versions of data on multiple storage nodes
• Conflict resolution happens during reads and NOT during writes
– Syntactic conflict resolution
– Semantic conflict resolution
DynamoDB Architecture - Partitioning
• Data is partitioned over multiple hosts called storage nodes (ring)
• Uses consistent hashing to dynamically partition data across storage hosts
• Two problems associated with consistent hashing
– Hashing of storage hosts can cause imbalance of data and load
– Consistent hashing treats every storage host as having the same capacity
[Diagram: initial situation – ring positions 1-4 with hosts A [3,4], B [1], C [2]]
DynamoDB Architecture - Partitioning
[Diagram: situation after C left and D joined – A [4], B [1], D [2,3]]
DynamoDB Architecture - Partitioning
Solution – virtual hosts mapped to physical hosts (tokens)
• Number of virtual hosts for a physical host depends on the capacity of the physical host
• Virtual nodes are function-mapped to physical nodes (a small sketch of the scheme follows below)
[Diagram: ring of virtual hosts mapped onto physical hosts A, B, C, D]
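The virtual-node idea can be illustrated with a minimal Python sketch. This is not DynamoDB's internal code; the class name, the MD5-based ring positions and the per-host weights are assumptions made only for the example.

import bisect
import hashlib

def ring_position(key):
    # Map any string to a point on the ring (0 .. 2^128 - 1).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Consistent hashing with virtual nodes (tokens) per physical host."""
    def __init__(self):
        self._ring = []                      # sorted list of (token, physical_host)

    def add_host(self, host, weight):
        # A higher-capacity host gets more virtual nodes, hence more of the ring.
        for i in range(weight):
            token = ring_position("%s#vnode%d" % (host, i))
            bisect.insort(self._ring, (token, host))

    def get_host(self, key):
        # Walk clockwise from the key's position to the next token on the ring.
        idx = bisect.bisect_right(self._ring, (ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing()
ring.add_host("A", weight=8)                 # bigger host, more tokens
ring.add_host("B", weight=4)
ring.add_host("D", weight=4)
print(ring.get_host("customer#1234"))

Because tokens are spread pseudo-randomly and weighted by capacity, this addresses both problems listed above: data/load imbalance and the equal-capacity assumption.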
DynamoDB Architecture – Membership Change
• Gossip protocol used for node membership
• Every second, a storage node randomly contacts a peer to bilaterally reconcile persisted membership history
• Doing so, membership changes are spread and an eventually consistent membership view is formed
• When a new member joins, adjacent nodes adjust their object and replica ownership
• When a member leaves, adjacent nodes adjust their object and replica ownership and distribute the keys owned by the leaving member
[Diagram: ring of nodes A–H, replication factor = 3]
DynamoDB Architecture – Replication
• Each data item is replicated N times – the replication factor
• The storage node in charge of storing key K replicates it to its N-1 successors clockwise
• Every node has a “preference list” which maps which key belongs where, based on consistent hashing
• The preference list skips virtual nodes and contains only distinct physical nodes (see the sketch below)
[Diagram: hash value of key K falls on the ring; its preference list is A, B, D – distinct physical hosts only]
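Continuing the illustrative ConsistentHashRing sketch from the partitioning section (again an assumption, not DynamoDB's implementation), a preference list can be built by walking the ring clockwise and keeping only the first N distinct physical hosts:

def preference_list(ring, key, n=3):
    # Start at the key's position and walk clockwise over the virtual nodes.
    idx = bisect.bisect_right(ring._ring, (ring_position(key), ""))
    hosts = []
    for step in range(len(ring._ring)):
        host = ring._ring[(idx + step) % len(ring._ring)][1]
        if host not in hosts:                # skip virtual nodes of hosts already chosen
            hosts.append(host)
        if len(hosts) == n:
            break
    return hosts

print(preference_list(ring, "customer#1234", n=3))   # e.g. ['A', 'B', 'D']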
DynamoDB Architecture – Data Versioning
• Eventual consistency – data propagates asynchronously
• It is possible to have multiple versions on different storage nodes
• If the latest data is not available on a storage node, the client will update an old version of the data
• Conflict resolution is done using a “vector clock”
• A vector clock is metadata added to the data when using get() or put()
• A vector clock is effectively a list of (node, counter) pairs
• One vector clock is associated with every version of every object
• One can determine whether two versions have a causal ordering or are parallel branches by examining their vector clocks
• Clients specify vector clock information while reading or writing data in the form of a “context”
• Dynamo tries to resolve conflicts among multiple versions using syntactic reconciliation
• If syntactic reconciliation does not work, the client has to resolve the conflict using semantic reconciliation
• The application should be designed to acknowledge multiple versions and should have logic to reconcile them
DynamoDB Architecture – Data Versioning
Shopping-cart example – writes by one client are coordinated by storage nodes Sx, Sy and Sz over time:
• Add 2 items – coordinated by Sx, version [Sx,1]
• Add 1 item – coordinated by Sx, version [Sx,2]
– Syntactic reconciliation, handled by DynamoDB, since [Sx,2] > [Sx,1]
• Add / delete 1 item – coordinated by Sy, version [Sx,2][Sy,1]
• Add 3 items on a parallel branch – coordinated by Sz, version [Sx,2][Sz,1]
– Within a branch, syntactic reconciliation is again handled by DynamoDB (e.g. [Sx,1][Sz,1] > [Sx,1])
• The versions [Sx,2][Sy,1] and [Sx,2][Sz,1] are causally unrelated parallel branches
– Semantic reconciliation, handled by the client application: is [Sx,2][Sy,1] > [Sx,2][Sz,1] ???
• Final cart – 5 items, reconciled version [Sx,3][Sy,1][Sz,1]

If the counters on the first object’s vector clock are less than or equal to all of the nodes in the second vector clock, then the first is an ancestor of the second and can be forgotten
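That ancestor rule fits in a few lines of Python. A minimal sketch, with vector clocks represented as plain dicts mapping node name to counter (the representation is an assumption for illustration):

def descends(a, b):
    # a descends from b if every counter in b is <= the matching counter in a.
    return all(a.get(node, 0) >= counter for node, counter in b.items())

def reconcile_kind(a, b):
    if descends(a, b):
        return "a supersedes b - syntactic reconciliation"
    if descends(b, a):
        return "b supersedes a - syntactic reconciliation"
    return "parallel branches - semantic reconciliation needed"

print(reconcile_kind({"Sx": 2}, {"Sx": 1}))                    # a supersedes b
print(reconcile_kind({"Sx": 2, "Sy": 1}, {"Sx": 2, "Sz": 1}))  # parallel branches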
DynamoDB Architecture – get() and put()
• Two strategies a client can use to select a node
– Generic load balancer that selects a node based on load
– Partition-aware client library which routes the request to the right node
• The node handling a read or write operation is called the “coordinator”
• The coordinator should be among the first of the top N nodes in the preference list
• For reads and writes, Dynamo uses a consistency protocol similar to quorum systems (a small counting sketch follows this list)
• The consistency protocol has 3 variables
– N – number of replicas of the data to be read or written
– W – number of nodes that must participate in a successful write operation
– R – number of machines contacted in a read operation
• put()
– Upon receiving a put() request, the coordinator generates the vector clock and writes the data locally
– The coordinator sends the new version (along with the vector clock) to the top N nodes from the preference list
– If at least W-1 nodes respond, the write is successful
• get()
– The coordinator requests all existing versions of the data from the top N nodes of the preference list
– Waits for at least R nodes to respond
– In case of multiple versions, the coordinator sends all causally unrelated versions to the client
– The client reconciles divergent versions and supersedes the current version with the reconciled version
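The W-1 and R thresholds amount to simple counting. The helper below only illustrates that counting; the N=3, W=2, R=2 values are a typical Dynamo-style choice and are not taken from the deck.

def write_successful(acks_from_replicas, w):
    # The coordinator's local write plus W-1 replica acks satisfy the write quorum.
    return 1 + acks_from_replicas >= w

def read_complete(replies_from_replicas, r):
    # The coordinator returns versions once at least R replicas have replied.
    return replies_from_replicas >= r

N, W, R = 3, 2, 2                # with R + W > N, read and write quorums overlap
print(write_successful(1, W))    # True
print(read_complete(1, R))       # False - still waiting for a second reply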
Conditional Writes
• Multiple clients can access the same item and try to update the same attributes
• Idempotent operation – running the same request multiple times has no effect
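A conditional write can be expressed with the current AWS CLI; the flag below postdates this deck, and the table and attribute names simply reuse the emp table from the CLI slides, so treat it as a sketch rather than the deck's own example.

aws dynamodb put-item \
    --table-name emp \
    --item '{"emp_id": {"N": "101"}, "dept_id": {"N": "7"}, "emp_name": {"S": "Bob"}}' \
    --condition-expression "attribute_not_exists(emp_id)"

The put succeeds only when no item with that key already exists, so two clients racing on the same key cannot silently overwrite each other, and retrying the same request has no further effect.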
Strongly consistent vs Eventually Consistent
• Multiple copies of the same item are kept to ensure durability
• It takes time for an update to propagate to all copies
• Eventually consistent
– After a write happens, an immediate read may not return the latest value
• Strongly consistent
– Requires additional read capacity units
– Gets the most up-to-date version of the item value
• An eventually consistent read consumes half the read capacity units of a strongly consistent read
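Read consistency is chosen per request. A small CLI sketch, assuming emp_id is the hash key of the emp table used elsewhere in the deck:

# Eventually consistent read (default)
aws dynamodb get-item --table-name emp --key '{"emp_id": {"N": "101"}}'

# Strongly consistent read - consumes twice the read capacity
aws dynamodb get-item --table-name emp --key '{"emp_id": {"N": "101"}}' --consistent-read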
Provisioned throughput
• Configuring read and write capacity
• Read capacity
– 1 read unit = 4KB (item will be rounded off to multiple of 4KB)
– 1 read capacity = 1 read unit for strong consistency or 2 read unit for
eventual consistency
– 1 read capacity = 4KB for strong consistency or 8KB for eventual
consistency

• Write capacity
– 1 write unit = 1KB (item will be rounded off to multiple of 1KB)
– 1 write capacity = 1 write unit
– 1 write capacity = 1KB
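The rounding rules above can be checked with a short worked calculation. The helper below is my own sketch (not an AWS API); it reproduces the provisioning tables on the next slide.

import math

def read_capacity(item_size_bytes, reads_per_second, strongly_consistent=True):
    # Each read unit covers 4 KB; the item size is rounded up to a multiple of 4 KB.
    units_per_read = math.ceil(item_size_bytes / 4096.0)
    if not strongly_consistent:
        units_per_read /= 2.0                # eventually consistent reads cost half
    return int(math.ceil(units_per_read * reads_per_second))

def write_capacity(item_size_bytes, writes_per_second):
    # Each write unit covers 1 KB; the item size is rounded up to a multiple of 1 KB.
    return int(math.ceil(item_size_bytes / 1024.0)) * writes_per_second

print(read_capacity(8 * 1024, 50, strongly_consistent=True))    # 100
print(read_capacity(8 * 1024, 50, strongly_consistent=False))   # 50
print(read_capacity(4 * 1024, 50, strongly_consistent=False))   # 25
print(write_capacity(2 * 1024, 50))                             # 100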
Provisioned throughput
Read Capacity provisioning

Expected Item Size | Consistency            | Desired Reads Per Second | Provisioned Throughput Required
4 KB               | Strongly consistent    | 50                       | 50
8 KB               | Strongly consistent    | 50                       | 100
4 KB               | Eventually consistent  | 50                       | 25
8 KB               | Eventually consistent  | 50                       | 50

Write Capacity provisioning

Expected Item Size | Desired Writes Per Second | Provisioned Throughput Required
1 KB               | 50                        | 50
2 KB               | 50                        | 100
Operations vs Provisioned throughput

Operation     | Description                                    | Provisioning throughput                        | Calculation
GetItem       | Reads only 1 item                              | Item rounded off to a multiple of 4KB          | Item size = 400 bytes; Read capacity = 1 (eventually consistent)
BatchGetItem  | Reads multiple items (max 1MB or 100 items)    | Every item rounded off to a multiple of 4KB    | Item size = 400 bytes; Number of items = 100; Read capacity = 50 (eventually consistent)
Query         | Reads multiple items (max 1MB or 100 items)    | Total output rounded off to a multiple of 4KB  | Item size = 400 bytes; Number of items = 100; Read capacity = 10 (eventually consistent)
Scan          | Reads the entire table, then applies the filter | Size of table rounded off to a multiple of 4KB | Item size = 400 bytes; Number of records in table = 10000; Read capacity = 500 (eventually consistent)
Query vs Scan
• Query
– Search based on primary key; examines only matching data
– Maximum 1MB result
– Query result sorted by range key
– Query result can be opted to give strong consistency
– Query can be done based on a secondary index
– Requires less provisioned throughput
– Query performance depends on the amount of data retrieved
• Scan
– Examines every item and then applies the filter
– Maximum 1MB result
– Scan result is always eventually consistent
– Requires high provisioned throughput
– Scan performance depends on table size
– Secondary indexes do not have any impact on scan performance
– Parallel scan can be performed using TotalSegments (see the CLI sketch below)
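A CLI sketch contrasting the two, assuming emp_id is the hash key of the emp table and dept_id is a plain attribute (the expression-style flags postdate this deck):

# Query: touches only the items matching the key condition
aws dynamodb query \
    --table-name emp \
    --key-condition-expression "emp_id = :id" \
    --expression-attribute-values '{":id": {"N": "101"}}'

# Scan: reads the whole table, filter applied afterwards; worker 0 of a 4-way parallel scan
aws dynamodb scan \
    --table-name emp \
    --filter-expression "dept_id = :d" \
    --expression-attribute-values '{":d": {"N": "7"}}' \
    --segment 0 --total-segments 4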
DynamoDB Parameters
• LastEvaluatedKey
– Key of the last returned item when an operation exceeds 1 MB (see the pagination sketch below)
– If LastEvaluatedKey is null then all data has been transferred
• ExclusiveStartKey
– Starting point of the next query request when an operation exceeds 1MB
• Count
– If count = true in the request, DynamoDB returns only count(*) from the table
• ScannedCount
– Total number of items scanned in a scan operation before applying any filter
• Limit
– Limits the number of rows in the result
• Segment
– Segment to be scanned by a worker in a parallel scan
• TotalSegments
– Degree of parallelism set in a parallel scan
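LastEvaluatedKey and ExclusiveStartKey pair up into a pagination loop. A minimal sketch using boto3 (the newer Python SDK, not the boto shown later in the deck):

import boto3

client = boto3.client("dynamodb", region_name="us-east-1")

items, start_key = [], None
while True:
    kwargs = {"TableName": "emp", "Limit": 100}
    if start_key:
        kwargs["ExclusiveStartKey"] = start_key    # resume where the previous page stopped
    page = client.scan(**kwargs)
    items.extend(page["Items"])
    start_key = page.get("LastEvaluatedKey")       # absent/null means all data transferred
    if not start_key:
        break
print(len(items))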
Secondary Index
• For efficiently accessing data using keys other than primary
key
• Can only be created on scalar data type
• Can include projections from table into index
• Primary key attributes always part of secondary index
• DynamoDB copies items from table to index
• 2 types of secondary index
– Local secondary index
– Global secondary index

• Access to index similar as access to table
• Indexes are maintained automatically
• Maximum 5 local and 5 global secondary indexes for a table
Local vs Global Secondary Index
• Local Secondary Index
– Hash + Range key
– Has the same hash key as the primary key; the range key differs
– For each hash key, the total size of all indexed items <= 10GB
– We can choose between eventual consistency or strong consistency
– Consumes read capacity units from the table
– A write to the table consumes write capacity from the table to update the index
– Can request other table attributes which are not part of the local index
• Global Secondary Index (see the create-table sketch below)
– Hash or Hash + Range
– Hash key different from the hash key of the primary key
– No size restrictions
– Supports only eventual consistency
– Has its own provisioned throughput and doesn’t use the table’s throughput
– Can only request attributes which are part of the global index
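A CLI sketch of creating a table with a global secondary index; the table name and index shape are illustrative, loosely mirroring the dept_id_global_idx that appears on the emp table later in the deck:

aws dynamodb create-table --table-name=emp2 \
    --attribute-definitions AttributeName=emp_id,AttributeType=N AttributeName=dept_id,AttributeType=N \
    --key-schema AttributeName=emp_id,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
    --global-secondary-indexes '[
      {
        "IndexName": "dept_id_global_idx",
        "KeySchema": [ { "AttributeName": "dept_id", "KeyType": "HASH" } ],
        "Projection": { "ProjectionType": "KEYS_ONLY" },
        "ProvisionedThroughput": { "ReadCapacityUnits": 5, "WriteCapacityUnits": 5 }
      }
    ]'

Note that the global secondary index carries its own ProvisionedThroughput, matching the bullet above about not using the table's throughput.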
Item Collections
• Items with same hash key
• Includes table items and all local secondary indexes
• Max size allowed for any item collection is 10GB (applicable to tables with secondary indexes)
• If the size exceeds this, DynamoDB won’t be able to add any more data to the item collection
• Table and index data for same item is stored in same partition
• View item sizes using ReturnItemCollectionMetrics parameter
Limitations of DynamoDB
• 64KB limit on item size (row size)
• 1 MB limit on fetching data
• Pay more if you want strongly consistent data
• Size is a multiple of 4KB (provisioned throughput wastage)
• Cannot join tables
• Indexes should be created during table creation only
• No triggers or server-side scripts
• Limited comparison capability (no not_null, contains, etc.)
Table Operations – Console
• https://console.aws.amazon.com/dynamodb - fc-dba-rds@amzon.com / <password>

If you already have tables
Table Operations - CLI
Install AWS CLI using pip:
pip install awscli

** Needs python 2.7
Command structure - aws <command> <sub-command> [options and parameters]
<command>:
autoscaling, cloudformation, cloudwatch, configure, directconnect, dynamodb, ec2, elasticache,
elasticbeanstalk, elastictranscoder, elb, help, iam, importexport, opsworks, rds, redshift,
route53, s3, s3api, ses, sns, sqs, storagegateway

<sub-command>:
batch-get-item, batch-write-item, create-table, delete-item, delete-table, describe-table,
get-item, help, list-tables, put-item, query, scan, update-item, update-table
Table Operations - CLI
mydesktop$ aws configure
AWS Access Key ID [None]: <access key>
AWS Secret Access Key [None]: <secret key>
Default region name [None]: us-east-1
Default output format [None]: table
Output formats – [ JSON, table, text ]
mydesktop$ aws dynamodb list-tables
--------------
| ListTables |
+------------+
||TableNames||
|+----------+|
||  emp     ||
|+----------+|
mydesktop$ aws dynamodb list-tables --output=text
emp
mydesktop$
mydesktop$ aws dynamodb create-table help
create-table
--attribute-definitions <value>
--table-name <value>
--key-schema <value>
[--local-secondary-indexes <value>]
--provisioned-throughput <value>
Table Operations - CLI
mydesktop$ aws dynamodb describe-table --table-name=emp
-----------------------------------------------------------------------------------
|                                 DescribeTable                                    |
+---------------------------------------------------------------------------------+
||                                     Table                                      ||
|+-------------------+------------+------------+-----------------+---------------+|
||  CreationDateTime |  ItemCount |  TableName | TableSizeBytes  |  TableStatus  ||
|+-------------------+------------+------------+-----------------+---------------+|
||  1391691188.38    |  0         |  emp       |  0              |  ACTIVE       ||
|+-------------------+------------+------------+-----------------+---------------+|
|||                            AttributeDefinitions                             |||
||+-------------------------------------+---------------------------------------+||
|||            AttributeName            |             AttributeType             |||
||+-------------------------------------+---------------------------------------+||
|||  dept_id                            |  N                                    |||
|||  emp_id                             |  N                                    |||
|||  emp_name                           |  S                                    |||
|||  location                           |  S                                    |||
||+-------------------------------------+---------------------------------------+||
|||                           GlobalSecondaryIndexes                            |||
||+------------------------+--------------------+----------------+--------------+||
|||        IndexName       |   IndexSizeBytes   |   IndexStatus  |   ItemCount  |||
||+------------------------+--------------------+----------------+--------------+||
|||  dept_id_global_idx    |  0                 |  ACTIVE        |  0           |||
||+------------------------+--------------------+----------------+--------------+||
Table Operations - CLI
mydesktop$ aws dynamodb create-table --table-name=test --attribute-definitions \
    AttributeName=col1,AttributeType=N AttributeName=col2,AttributeType=S \
    --key-schema AttributeName=col1,KeyType=HASH AttributeName=col2,KeyType=RANGE \
    --provisioned-throughput=ReadCapacityUnits=1,WriteCapacityUnits=1
-----------------------------------------------------------------------------------
|                                  CreateTable                                     |
+---------------------------------------------------------------------------------+
||                               TableDescription                                ||
|+-------------------+------------+------------+-----------------+---------------+|
||  CreationDateTime |  ItemCount |  TableName | TableSizeBytes  |  TableStatus  ||
|+-------------------+------------+------------+-----------------+---------------+|
||  1391697357.73    |  0         |  test      |  0              |  CREATING     ||
|+-------------------+------------+------------+-----------------+---------------+|
|||                            AttributeDefinitions                             |||
||+-------------------------------------+---------------------------------------+||
|||            AttributeName            |             AttributeType             |||
||+-------------------------------------+---------------------------------------+||
|||  col1                               |  N                                    |||
|||  col2                               |  S                                    |||
||+-------------------------------------+---------------------------------------+||
|||                                  KeySchema                                  |||
||+----------------------------------------------+------------------------------+||
|||                 AttributeName                |            KeyType           |||
||+----------------------------------------------+------------------------------+||
|||  col1                                        |  HASH                        |||
|||  col2                                        |  RANGE                       |||
||+----------------------------------------------+------------------------------+||
Table Operations - CLI
JSON Format:

attribute-definitions:
[
  {
    "AttributeName": "col1",
    "AttributeType": "N"
  },
  {
    "AttributeName": "col2",
    "AttributeType": "S"
  }
]

key-schema:
[
  {
    "AttributeName": "col1",
    "KeyType": "HASH"
  },
  {
    "AttributeName": "col2",
    "KeyType": "RANGE"
  }
]

provisioned-throughput:
{
  "ReadCapacityUnits": 5,
  "WriteCapacityUnits": 5
}
Table Operations - CLI
aws dynamodb create-table --table-name=test1 --attribute-definitions '[
{
"AttributeName": "col1",
"AttributeType": "N"
},
{
"AttributeName": "col2",
"AttributeType": "S"
}
]' --key-schema '[
{
"AttributeName": "col1",
"KeyType": "HASH"
},
{
"AttributeName": "col2",
"KeyType": "RANGE"
}
]' --provisioned-throughput '{
"ReadCapacityUnits": 5,
"WriteCapacityUnits": 5
}'
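As a follow-on sketch (not part of the original deck), inserting and reading an item from the test1 table created above; col3 is just an illustrative non-key attribute:

aws dynamodb put-item --table-name=test1 --item '{
    "col1": {"N": "1"},
    "col2": {"S": "hello"},
    "col3": {"S": "extra attribute"}
}'

aws dynamodb get-item --table-name=test1 --key '{
    "col1": {"N": "1"},
    "col2": {"S": "hello"}
}'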
boto – AWS Python Interface
Install git:
wget http://kernel.org/pub/software/scm/git/git-1.8.5.tar.bz2
tar xvfj git-1.8.5.tar.bz2
cd git-1.8.5/
./configure
make
make install
advaitd-2.desktop$ whereis git
git: /usr/local/bin/git

Install boto:
git clone git://github.com/boto/boto.git
cd boto
python setup.py install
*** Needs python 2.7 and above
boto – AWS Python Interface
>>> import boto.dynamodb
>>> conn = boto.dynamodb.connect_to_region(
... 'us-east-1',
... aws_access_key_id='<access key>',
... aws_secret_access_key='<secret key>')
>>> conn
<boto.dynamodb.layer2.Layer2 object at 0x21ade90>
>>> conn.list_tables()
['emp', 'test', 'test1']
Create table:

>>> message_table_schema = conn.create_schema(
... hash_key_name='forum_name',
... hash_key_proto_value=str,
... range_key_name='subject',
... range_key_proto_value=str
... )
>>> table = conn.create_table(
... name='messages',
... schema=message_table_schema,
... read_units=10,
... write_units=10
... )
>>> print table
Table(messages)
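A short follow-on sketch for putting and getting an item with the same boto Layer2 table object; the key and attribute values are illustrative, and the full API is covered by the boto tutorial listed in the references:

>>> item = table.new_item(
...     hash_key='dynamodb forum',
...     range_key='Getting started',
...     attrs={'Body': 'First post', 'PostedBy': 'advait'}
... )
>>> item.put()
>>> stored = table.get_item(hash_key='dynamodb forum', range_key='Getting started')
>>> stored['Body']
'First post'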
References
• https://docs.aws.amazon.com/amazondynamodb/
• http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf
• http://home.aubg.bg/students/ENL100/Cloud%20Computing/Research%20Paper/nosqldbs.pdf
• http://docs.aws.amazon.com/cli/latest/reference/dynamodb/index.html - CLI reference
• http://aws.amazon.com/sdkforpython/ - SDK for Python
• http://docs.pythonboto.org/en/latest/dynamodb_tut.html - boto tutorial and command reference
• http://en.wikipedia.org/wiki/Consistent_hashing