Yan Cui
@theburningmonk
Server-side Developer @
iwi by numbers
• 400k+ DAU
• ~100m requests/day
• 25k+ concurrent users
• 1500+ requests/s
• 7000+ cache ops/s
• 100+ commodity servers (EC2 small instance)
• 75ms average latency
Sign Posts
• Why NOSQL?

• Types of NOSQL DBs

• NOSQL In Practice

• Q&A
A look at the…

CURRENT TRENDS
Digital Universe
[Chart: size of the digital universe by year, 2006 to 2011 (y-axis 0 to 2000), rising from 161 ExaBytes to 1.8 ZettaBytes!!]
Big Data

“…data sets whose size is beyond the
 ability of commonly used software tools
 to capture, manage and process within a
 tolerable elapsed time…”
Big Data
Unit         Symbol   Bytes
Kilobyte     KB       1024
Megabyte     MB       1048576
Gigabyte     GB       1073741824
Terabyte     TB       1099511627776
            PAIN-O-Meter
Petabyte     PB       1125899906842624
Exabyte      EB       1152921504606846976
Zettabyte    ZB       1180591620717411303424
Yottabyte    YB       1208925819614629174706176
Vertical Scaling
Server                                Spec                                    Cost
PowerEdge T110 II (basic)             8 GB, 3.1 GHz Quad 4T                   $1,350
PowerEdge T110 II (basic)             32 GB, 3.4 GHz Quad 8T                  $12,103
PowerEdge C2100                       192 GB, 2 x 3 GHz                       $19,960
IBM System x3850 X5                   2048 GB, 8 x 2.4 GHz                    $646,605
Blue Gene/P                           14 teraflops, 4096 CPUs                 $1,300,000
K Computer (fastest super computer)   10 petaflops, 705,024 cores, 1,377 TB   $10,000,000 annual operating cost
Horizontal Scaling
• Incremental scaling

• Cost grows incrementally

• Easy to scale down

• Linear gains
Hardware Vendor
Here’s an alternative…

INTRODUCING NOSQL
NOSQL is …
• No SQL

• Not Only SQL

• A movement away from relational model

• Consists of 4 main types of DBs
NOSQL is …
• Hard

• A new dimension of trade-offs

• CAP theorem
CAP Theorem
• Consistency (C): All clients have the same view of data
• Availability (A): Each client can always read and write data
• Partition Tolerance (P): System works despite network partitions
NOSQL DBs are …
• Specialized for particular use cases

• Non-relational

• Semi-structured

• Horizontally scalable (usually)
Motivations
• Horizontal Scalability

• Low Latency

• Cost

• Minimize Downtime
Motivations
Use the right tool for the right job!
RDBMS
• CAN scale horizontally (via sharding)
• Manual client side hashing
• Cross-server queries are difficult
• Loses ACIDity
• Schema update = PAIN
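The "manual client side hashing" point can be sketched roughly as follows. This is only an illustration: the shard names and the choice of MD5 with modulo placement are assumptions, not something the slides specify.

```python
import hashlib

# Hypothetical shard names; a real setup would hold connection details here.
SHARDS = ["db-0", "db-1", "db-2", "db-3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the key.

    hashlib is used instead of the built-in hash() so that every client
    process maps the same key to the same shard.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def shards_for_query(keys):
    """A query over many keys may need to visit several servers."""
    return sorted({shard_for(k) for k in keys})


home_shard = shard_for("user:42")
touched = shards_for_query(["user:1", "user:2", "user:3", "user:4"])
```

With modulo placement, adding a server changes `len(shards)` and remaps most keys, which is part of why this approach is labour-intensive to operate.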
TYPES OF NOSQL DBS
Types Of NOSQL DBs
• Key-Value Store

• Document Store

• Column Database

• Graph Database
Key-Value Store
 “key”           “value”

           101110100110101001100
           110100100100010101011
morpheus   101010101010110000101
           000110011111010110000
           101000111110001100000
Key-Value Store
• It’s a Hash
• Basic get/put/delete ops
• Crazy fast!
• Easy to scale horizontally
• Membase, Redis, ORACLE…
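As a rough picture of "it's a hash" with get/put/delete, here is a toy in-memory store. The class and method names are made up for illustration and do not correspond to any of the products listed.

```python
class KeyValueStore:
    """Toy key-value store: opaque values addressed only by their key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        """Remove the key and report whether it existed."""
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
# The store never looks inside the value; it is just bytes.
store.put("morpheus", b"\x5d\x35\x4c")
blob = store.get("morpheus")
```

Because every operation touches exactly one key, keys can be spread across servers without any operation needing to span them.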
Document Store
 “key”            “document”

           {
               name : “Morpheus”,
morpheus       rank : “Captain”,
               occupation: “Total badass”
           }
Document Store
• Document = self-contained piece of data

• Semi-structured data

• Querying

• MongoDB, RavenDB…
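To illustrate what "querying" adds over a plain key-value store, the sketch below keeps documents as dictionaries and filters on their fields. The `find` helper and the "neo" key are hypothetical and are not the query API of MongoDB or RavenDB.

```python
documents = {
    "morpheus": {"name": "Morpheus", "rank": "Captain", "occupation": "Total badass"},
    # Hypothetical second document with a different set of fields.
    "neo": {"name": "Thomas Anderson", "age": 29},
}


def find(docs, **criteria):
    """Return the keys of documents whose fields match every criterion."""
    return sorted(
        key
        for key, doc in docs.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    )


captains = find(documents, rank="Captain")
```

The two documents share no schema beyond "name", which is what semi-structured means here: the store can still look inside each document, but no field is mandatory.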
Column Database
Name             Last Name  Age  Rank     Occupation    Version  Language
Thomas Anderson             29
Morpheus                         Captain  Total badass
Cypher           Reagan
Agent Smith                                             1.0b     C++
The Architect
Column Database
• Data stored by column

• Semi-structured data

• Cassandra, HBase, …
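One simplified way to picture "data stored by column" is a mapping from column name to the cells that actually exist, so sparse rows cost nothing for their missing fields. This is a toy model with hypothetical names, not how Cassandra or HBase lay out data on disk.

```python
from collections import defaultdict


class ColumnStore:
    """Toy column-oriented table: each column keeps only the cells that exist."""

    def __init__(self):
        # column name -> {row key -> value}
        self._columns = defaultdict(dict)

    def put(self, row_key, column, value):
        self._columns[column][row_key] = value

    def column(self, name):
        """Read a whole column without touching any other column."""
        return dict(self._columns.get(name, {}))

    def row(self, row_key):
        """Reassemble a row from whichever columns hold a cell for it."""
        return {
            name: cells[row_key]
            for name, cells in self._columns.items()
            if row_key in cells
        }


table = ColumnStore()
table.put("Thomas Anderson", "Age", 29)
table.put("Morpheus", "Rank", "Captain")
table.put("Morpheus", "Occupation", "Total badass")
table.put("Agent Smith", "Version", "1.0b")
table.put("Agent Smith", "Language", "C++")
```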
Graph Database
[Diagram: a property graph of numbered nodes joined by labelled edges]
Nodes:
  1  name = “Thomas Anderson”, age = 29
  2  name = “Trinity”
  3  name = “Cypher”, last name = “Reagan”
  5  name = “Agent Smith”, version = 1.0b, language = C++
  7  name = “Morpheus”, rank = “Captain”, occupation = “Total badass”
  9  name = “The Architect”
Edge labels and properties shown: age = 3 days; disclosure = public; disclosure = secret, age = 6 months; CODED_BY
Graph Database
• Nodes, properties, edges

• Based on graph theory

• Node adjacency instead of indices

• Neo4j, VertexDB, …
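"Node adjacency instead of indices" can be sketched with plain dictionaries: each node keeps a list of its outgoing edges, so a traversal follows references rather than looking anything up in an index. The node numbers and properties follow the diagram; the edge endpoints and the KNOWS label are assumptions, since the slide text preserves only the edge properties and CODED_BY.

```python
from collections import defaultdict

nodes = {
    1: {"name": "Thomas Anderson", "age": 29},
    2: {"name": "Trinity"},
    3: {"name": "Cypher", "last name": "Reagan"},
    5: {"name": "Agent Smith", "version": "1.0b", "language": "C++"},
    9: {"name": "The Architect"},
}

# node id -> list of (edge label, target node id, edge properties)
adjacency = defaultdict(list)


def add_edge(src, label, dst, **props):
    adjacency[src].append((label, dst, props))


# Assumed endpoints and KNOWS label; see the note above.
add_edge(1, "KNOWS", 2, age="3 days")
add_edge(3, "KNOWS", 5, disclosure="secret", age="6 months")
add_edge(5, "CODED_BY", 9)


def neighbours(node_id, label=None):
    """Names of nodes reachable in one hop, optionally filtered by edge label."""
    return [
        nodes[dst]["name"]
        for edge_label, dst, _props in adjacency.get(node_id, [])
        if label is None or edge_label == label
    ]


coder = neighbours(5, "CODED_BY")
```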
Real-world use cases for NoSQL DBs...

NOSQL IN PRACTICE
Redis
• Remote dictionary server

• Key-Value store

• In-memory, persistent

• Data structures
Redis
Data structures: Lists, Sets, Sorted Sets, Hashes
Redis
Redis in Practice #1

COUNTERS
Counters



• Potentially massive numbers of ops

• Valuable data, but not mission critical
Counters
• Lots of row contention in SQL

• Requires lots of transactions
Counters
• Redis has atomic incr/decr
  INCR          Increments value by 1
  INCRBY        Increments value by given amount
  DECR          Decrements value by 1
  DECRBY        Decrements value by given amount
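To show why an atomic increment matters, here is a small in-memory stand-in with the same four operations. It is not a Redis client: the class is hypothetical, and a lock plays the role of the server executing each command indivisibly.

```python
import threading


class CounterStore:
    """In-memory stand-in for a store with atomic counters (not a Redis client)."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()  # makes each read-modify-write indivisible

    def incrby(self, key, amount):
        # A missing key counts as 0 and the new value is returned,
        # mirroring how INCRBY behaves.
        with self._lock:
            self._values[key] = self._values.get(key, 0) + amount
            return self._values[key]

    def incr(self, key):
        return self.incrby(key, 1)

    def decrby(self, key, amount):
        return self.incrby(key, -amount)

    def decr(self, key):
        return self.incrby(key, -1)


def hammer(store, key, workers, hits_per_worker):
    """Increment one counter from several threads and return the final value."""

    def work():
        for _ in range(hits_per_worker):
            store.incr(key)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return store.incrby(key, 0)


counters = CounterStore()
total = hammer(counters, "page:home:views", workers=8, hits_per_worker=1000)
```

No update is lost even though eight writers hit the same key, and no caller had to open a transaction to get that guarantee.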
Counters

           Image by Mike Rohde
Redis in Practice #2

RANDOM ITEMS
Random Items
• Give user a random article
• SQL implementation
  – select count(*) from TABLE
  – var n = random.Next(0, (count - 1))
  – select * from TABLE where primary_key = n
  – inefficient, complex
Random Items
• Redis has built-in randomize operation
  SRANDMEMBER   Gets a random member from a set
Random Items
• About sets:
 – 0 to N unique elements

 – Unordered

 – Atomic add
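The set-based approach can be modelled with a plain Python set. The helper names and article ids are hypothetical, and this mimics the behaviour of SADD and SRANDMEMBER rather than calling Redis.

```python
import random

articles = set()


def add_article(article_id):
    """Like SADD: returns how many members were newly added (0 if already present)."""
    before = len(articles)
    articles.add(article_id)
    return len(articles) - before


def random_article(rng=random):
    """Like SRANDMEMBER: pick a member without removing it; None if the set is empty."""
    if not articles:
        return None
    return rng.choice(sorted(articles))


for article_id in ("article:1", "article:2", "article:3"):
    add_article(article_id)

pick = random_article()
```

There is no count query and no assumption that ids are contiguous; the set itself is the pool to draw from.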
Random Items

          Image by Mike Rohde
Redis in Practice #3

PRESENCE
Presence
• Who’s online?

• Needs to be scalable

• Pseudo-real time
Presence
• Each user ‘checks-in’ once every 3 mins
  Time       Check-ins
  00:22am    A, B
  00:23am    C, D
  00:24am    E
  00:25am    A
  00:26am    ?

  A, C, D & E are online at 00:26am
Presence
• Redis natively supports set operations
  SADD           Add item(s) to a set
  SREM           Remove item(s) from a set
  SINTER         Intersect multiple sets
  SUNION         Union multiple sets
  SRANDMEMBER    Gets a random member from a set
  ...            ...
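The check-in timeline can be worked through with ordinary Python sets. This sketch assumes one set per minute and treats a user as online if they checked in during the current minute or the three before it; the key names are hypothetical and the union stands in for SUNION.

```python
# One set of user ids per minute, as in the timeline on the slide.
checkins = {
    "00:22": {"A", "B"},
    "00:23": {"C", "D"},
    "00:24": {"E"},
    "00:25": {"A"},
    "00:26": set(),  # assumption: nobody has checked in yet during this minute
}


def online(minute_keys):
    """Union the per-minute sets, which is the job SUNION does in Redis."""
    result = set()
    for key in minute_keys:
        result |= checkins.get(key, set())
    return result


# Users check in every 3 minutes, so look back over 00:23 to 00:26.
online_now = online(["00:23", "00:24", "00:25", "00:26"])
```

B last checked in at 00:22, which falls outside the window, so B drops off; A is still online thanks to the 00:25 check-in.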
Presence

           Image by Mike Rohde
Redis in Practice #4

LEADERBOARDS
Leaderboards
• Gamification

• Users ranked by some score
Leaderboards
• About sorted sets:
 – Similar to a set

 – Every member is associated with a score

 – Elements are taken in order
Leaderboards
• Redis has ‘Sorted Sets’
  ZADD        Add/update item(s) to a sorted set
  ZRANK       Get item’s rank in a sorted set (low -> high)
  ZREVRANK    Get item’s rank in a sorted set (high -> low)
  ZRANGE      Get range of items, by rank (low -> high)
  ZREVRANGE   Get range of items, by rank (high -> low)
  ...         ...
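A leaderboard on top of a sorted set can be modelled as below. The class, its method names, and the scores are hypothetical; it mirrors the 0-based ranks and inclusive ranges of the Redis commands but supports only non-negative indices.

```python
class Leaderboard:
    """Toy sorted set: each member has a score and ranks are derived from it."""

    def __init__(self):
        self._scores = {}

    def add(self, member, score):
        """Like ZADD: insert the member or update its score."""
        self._scores[member] = score

    def _high_to_low(self):
        # Ties are broken by member name so the order is deterministic.
        return sorted(
            self._scores, key=lambda m: (self._scores[m], m), reverse=True
        )

    def rank(self, member):
        """Like ZREVRANK: 0-based position from the top, None if absent."""
        if member not in self._scores:
            return None
        return self._high_to_low().index(member)

    def top(self, start, stop):
        """Like ZREVRANGE: members from start to stop, both inclusive."""
        return self._high_to_low()[start : stop + 1]


board = Leaderboard()
board.add("trinity", 300)
board.add("neo", 250)
board.add("morpheus", 400)

podium = board.top(0, 1)
neo_rank = board.rank("neo")
```

This toy version re-sorts on every read; a real sorted set keeps its members ordered as they are written, so rank and range reads do not pay that cost.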
Leaderboards

         Image by Mike Rohde
Redis in Practice #5

QUEUES
Queues
• Redis has push/pop support for lists
  LPOP      Remove and get the 1st item in a list
  LPUSH     Prepend item(s) to a list
  RPOP      Remove and get the last item in a list
  RPUSH     Append item(s) to a list

• Allows you to use list as queue/stack
Queues
• Redis supports ‘blocking’ pop
  BLPOP     Remove and get the 1st item in a list, or
            block until one is available
  BRPOP     Remove and get the last item in a list, or
            block until one is available


• Message queues without polling!
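The push/pop pairings and the blocking pop can be mimicked with the Python standard library. This is an analogy, not Redis code: `deque` stands in for a list, and `queue.Queue.get()` stands in for BLPOP/BRPOP by blocking until an item arrives.

```python
import queue
import threading
from collections import deque

jobs = deque()
jobs.appendleft("job-1")  # like LPUSH
jobs.appendleft("job-2")  # like LPUSH

oldest = jobs.pop()       # like RPOP: push left + pop right gives a queue (FIFO)
newest = jobs.popleft()   # like LPOP: push left + pop left gives a stack (LIFO)


def consume(inbox, results, count):
    """Wait for `count` messages; get() blocks instead of polling."""
    for _ in range(count):
        results.append(inbox.get())


inbox = queue.Queue()
received = []
worker = threading.Thread(target=consume, args=(inbox, received, 2))
worker.start()            # the worker is now parked, waiting for work

inbox.put("hello")
inbox.put("world")
worker.join(timeout=5)
```

The consumer never spins in a loop checking for work; it sleeps inside `get()` until a producer pushes something.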
Queues

         Image by Mike Rohde
Redis
• Supports data structures

• No built-in clustering

• Master-slave replication

• Redis Cluster is on the way...
Before we go...

SUMMARIES
Considerations
• In memory?

• Disk-backed persistence?

• Managed? Database As A Service?

• Cluster support?
SQL or NoSQL?
• Wrong question
• What’s your problem?
 – Transactions
 – Amount of data
 – Data structure
http://blog.nahurst.com/visual-guide-to-nosql-systems
DynamoDB
• Fully managed
• Provisioned throughput
• Predictable cost & performance
• SSD-backed
• Auto-replicated
Google BigQuery
• Game changer for Analytics industry
• Analyze billions of rows in seconds
• SQL-like query syntax
• Prediction API
• NOT a database system
Scalability
• Success can come unexpectedly and quickly

• Not just about the DB
Thank You!
@theburningmonk

Introduction to NoSQL

  • 3. iwi by numbers • 400k+ DAU • ~100m requests/day • 25k+ concurrent users • 1500+ requests/s • 7000+ cache ops/s • 100+ commodity servers (EC2 small instance) • 75ms average latency
  • 4. Sign Posts • Why NOSQL? • Types of NOSQL DBs • NOSQL In Practice • Q&A
  • 5. A look at the… CURRENT TRENDS
  • 6. Digital Universe (chart, 2006 to 2011): 161 ExaBytes in 2006, rising to 1.8 ZettaBytes!! in 2011
  • 7. Big Data “…data sets whose size is beyond the ability of commonly used software tools to capture, manage and process within a tolerable elapsed time…”
  • 8. Big Data (Unit, Symbol, Bytes; PAIN-O-Meter alongside) • Kilobyte, KB, 1024 • Megabyte, MB, 1048576 • Gigabyte, GB, 1073741824 • Terabyte, TB, 1099511627776 • Petabyte, PB, 1125899906842624 • Exabyte, EB, 1152921504606846976 • Zettabyte, ZB, 1180591620717411303424 • Yottabyte, YB, 1208925819614629174706176
  • 10. Vertical Scaling (Server: Cost) • PowerEdge T110 II (basic), 8 GB, 3.1 GHz Quad 4T: $1,350 • PowerEdge T110 II (basic), 32 GB, 3.4 GHz Quad 8T: $12,103 • PowerEdge C2100, 192 GB, 2 x 3 GHz: $19,960 • IBM System x3850 X5, 2048 GB, 8 x 2.4 GHz: $646,605 • Blue Gene/P, 14 teraflops, 4096 CPUs: $1,300,000 • K Computer (fastest super computer), 10 petaflops, 705,024 cores, 1,377 TB: $10,000,000 annual operating cost
  • 11. Horizontal Scaling • Incremental scaling • Cost grows incrementally • Easy to scale down • Linear gains
  • 16. NOSQL is … • No SQL • Not Only SQL • A movement away from relational model • Consists of 4 main types of DBs
  • 17. NOSQL is … • Hard • A new dimension of trade-offs • CAP theorem
  • 18. CAP Theorem • Availability (A): Each client can always read and write data • Consistency (C): All clients have the same view of data • Partition Tolerant (P): System works despite network partitions
  • 19. NOSQL DBs are … • Specialized for particular use cases • Non-relational • Semi-structured • Horizontally scalable (usually)
  • 20. Motivations • Horizontal Scalability • Low Latency • Cost • Minimize Downtime
  • 21. Motivations Use the right tool for the right job!
  • 22. RDBMS • CAN scale horizontally (via sharding) • Manual client side hashing • Cross-server queries are difficult • Loses ACIDity • Schema update = PAIN
  • 24. Types Of NOSQL DBs • Key-Value Store • Document Store • Column Database • Graph Database
  • 25. Key-Value Store “key”: morpheus → “value”: 101110100110101001100 110100100100010101011 101010101010110000101 000110011111010110000 101000111110001100000
  • 26. Key-Value Store • It’s a Hash • Basic get/put/delete ops • Crazy fast! • Easy to scale horizontally • Membase, Redis, ORACLE…
  • 27. Document Store “key”: morpheus → “document”: { name : “Morpheus”, rank : “Captain”, occupation: “Total badass” }
  • 28. Document Store • Document = self-contained piece of data • Semi-structured data • Querying • MongoDB, RavenDB…
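  • 29. Column Database (columns: Name, Last Name, Age, Rank, Occupation, Version, Language; most cells empty) • Thomas Anderson: Age 29 • Morpheus: Rank Captain, Occupation Total badass • Cypher: Last Name Reagan • Agent Smith: Version 1.0b, Language C++ • The Architect: no other columns set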
  • 30. Column Database • Data stored by column • Semi-structured data • Cassandra, HBase, …
  • 31. Graph Database (diagram) • Nodes (ids shown: 7, 3, 1, 9, 2, 5): { name = “Morpheus”, rank = “Captain”, occupation = “Total badass” }; { name = “Thomas Anderson”, age = 29 }; { name = “Cypher”, last name = “Reagan” }; { name = “The Architect” }; { name = “Trinity” }; { name = “Agent Smith”, version = 1.0b, language = C++ } • Edge labels and properties shown: CODED_BY; disclosure = public; disclosure = secret; age = 3 days; age = 6 months
  • 32. Graph Database • Nodes, properties, edges • Based on graph theory • Node adjacency instead of indices • Neo4j, VertexDB, …
  • 33. Real-world use cases for NoSQL DBs... NOSQL IN PRACTICE
  • 34. Redis • Remote dictionary server • Key-Value store • In-memory, persistent • Data structures
  • 35. Redis Sorted Sets Lists Sets Hashes
  • 36. Redis
  • 37. Redis in Practice #1 COUNTERS
  • 38. Counters • Potentially massive numbers of ops • Valuable data, but not mission critical
  • 39. Counters • Lots of row contention in SQL • Requires lots of transactions
  • 40. Counters • Redis has atomic incr/decr • INCR: Increments value by 1 • INCRBY: Increments value by given amount • DECR: Decrements value by 1 • DECRBY: Decrements value by given amount
  • 41. Counters Image by Mike Rohde
  • 42. Redis in Practice #2 RANDOM ITEMS
  • 43. Random Items • Give user a random article • SQL implementation – select count(*) from TABLE – var n = random.Next(0, (count – 1)) – select * from TABLE where primary_key = n – inefficient, complex
  • 44. Random Items • Redis has built-in randomize operation SRANDMEMBER Gets a random member from a set
  • 45. Random Items • About sets: – 0 to N unique elements – Unordered – Atomic add
  • 46. Random Items Image by Mike Rohde
  • 47. Redis in Practice #3 PRESENCE
  • 48. Presence • Who’s online? • Needs to be scalable • Pseudo-real time
  • 49. Presence • Each user ‘checks-in’ once every 3 mins • Timeline from 00:22am to 00:26am showing check-ins by users A to E • A, C, D & E are online at 00:26am
  • 50. Presence • Redis natively supports set operations • SADD: Add item(s) to a set • SREM: Remove item(s) from a set • SINTER: Intersect multiple sets • SUNION: Union multiple sets • SRANDMEMBER: Gets a random member from a set • ...
  • 51. Presence Image by Mike Rohde
  • 52. Redis in Practice #4 LEADERBOARDS
  • 54. Leaderboards • About sorted sets: – Similar to a set – Every member is associated with a score – Elements are taken in order
  • 55. Leaderboards • Redis has ‘Sorted Sets’ • ZADD: Add/update item(s) to a sorted set • ZRANK: Get item’s rank in a sorted set (low -> high) • ZREVRANK: Get item’s rank in a sorted set (high -> low) • ZRANGE: Get range of items, by rank (low -> high) • ZREVRANGE: Get range of items, by rank (high -> low) • ...
  • 56. Leaderboards Image by Mike Rohde
  • 57. Redis in Practice #5 QUEUES
  • 58. Queues • Redis has push/pop support for lists • LPOP: Remove and get the 1st item in a list • LPUSH: Prepend item(s) to a list • RPOP: Remove and get the last item in a list • RPUSH: Append item(s) to a list • Allows you to use list as queue/stack
  • 59. Queues • Redis supports ‘blocking’ pop • BLPOP: Remove and get the 1st item in a list, or block until one is available • BRPOP: Remove and get the last item in a list, or block until one is available • Message queues without polling!
  • 60. Queues Image by Mike Rohde
  • 61. Redis • Supports data structures • No built-in clustering • Master-slave replication • Redis Cluster is on the way...
  • 63. Considerations • In memory? • Disk-backed persistence? • Managed? Database As A Service? • Cluster support?
  • 64. SQL or NoSQL? • Wrong question • What’s your problem? – Transactions – Amount of data – Data structure
  • 66. Dynamo DB • Fully managed • Provisioned throughput • Predictable cost & performance • SSD-backed • Auto-replicated
  • 67. Google BigQuery • Game changer for Analytics industry • Analyze billions of rows in seconds • SQL-like query syntax • Prediction API • NOT a database system
  • 68. Scalability • Success can come unexpectedly and quickly • Not just about the DB
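
The presence pattern from slides 48 to 50 can be sketched in a few lines. This is a minimal illustration only: plain Python sets stand in for Redis sets, no Redis client is used, and the `PresenceTracker` class, the one-set-per-minute layout, and the check-in times below are all assumptions rather than anything stated in the deck. The idea is that a check-in adds the user to the set for the current minute (as `SADD` would), and "who's online" is the union of the sets inside the 3-minute check-in window (as `SUNION` would).

```python
from collections import defaultdict

# From slide 49: each user checks in once every 3 minutes.
CHECK_IN_INTERVAL_MINUTES = 3


class PresenceTracker:
    """In-memory stand-in for one Redis set per minute (hypothetical layout)."""

    def __init__(self):
        self._buckets = defaultdict(set)  # minute -> set of user ids

    def check_in(self, user, minute):
        # Analogous to SADD on the set for this minute.
        self._buckets[minute].add(user)

    def online_at(self, minute):
        # Analogous to SUNION over the sets in the check-in window.
        first = minute - CHECK_IN_INTERVAL_MINUTES + 1
        online = set()
        for m in range(first, minute + 1):
            online |= self._buckets.get(m, set())
        return online


# Hypothetical check-in times (minutes past midnight), chosen for illustration.
tracker = PresenceTracker()
for user, minute in [("A", 22), ("B", 23), ("C", 24), ("E", 24), ("A", 25), ("D", 25)]:
    tracker.check_in(user, minute)

print(sorted(tracker.online_at(26)))  # → ['A', 'C', 'D', 'E']
```

With these assumed check-ins, B last checked in at 00:23am, which falls outside the window ending at 00:26am, so B is no longer counted as online. In a real deployment the per-minute sets would also need to expire, which this sketch leaves out.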

Editor's notes

  1. 5 exabytes of data from the dawn of civilization to 2003. Now we generate that much data every 2 days.
  2. The challenge facing many developers operating within the web/social space is how to cope with ever increasing volumes of data, and that challenge is commonly referred to as ‘Big Data’. Given that the size of the digital universe is predicted to continue to grow exponentially for the foreseeable future, life is not gonna get easier for us developers anytime soon!
  3. Just how big does your data have to be for it to be considered ‘Big Data’? Understandably, it is a moving target, but generally speaking, when you cross over the terabyte threshold you’re starting to step into the ‘Big Data’ zone of pain.
  4. So how exactly do we tame the beast that is ‘Big Data’?
  5. The traditional wisdom says that we should get bigger servers! And sure, it’ll work, to some extent, but it’ll cost you! In fact, the further up the food chain you go, the less value you get for your money as the cost of the hardware goes up exponentially.
  6. If you consider scaling purely as a function of cost, then if you can keep your cost under control and make sure that it increases proportionally to the increases in scale then it’s happy days all around! You’re happy, your boss is happy, marketing’s happy, and the shareholders are happy. On the other hand, if you choose to fight big data with big hardware, then your cost-to-scale ratio is likely to climb significantly, leaving you out of pocket. And when everyone decides to play that game, it’ll undoubtedly make some people very happy...
  7. ...but unless you’re in the business of selling expensive hardware to developers you’re probably not the one laughing... And since most of that hardware investment is made up-front, as a company, possibly a start-up, you’ll be taking on a significant risk, and god forbid if things don’t pan out for you...
  8. In 2000, Eric Brewer gave a keynote speech at the ACM Symposium on the Principles of Distributed Computing, in which he said that as applications become more web-based we should stop worrying about data consistency, because if we want high availability in these new distributed applications, then guaranteed consistency of data is something we cannot have. There are three core systemic requirements that exist in a special relationship when it comes to designing and deploying applications in a distributed environment: Consistency, Availability and Partition Tolerance.
  9. A service that is Consistent operates fully or not at all. (Consistent here differs from the C in ACID, which describes a property of database transactions that ensures data that breaks certain pre-set constraints will never be persisted.) This usually translates to the idea that multiple values for the same piece of data are not allowed. Availability means just that: a service is available. Funny thing about availability is that it most often deserts you when you need it the most, during busy periods. A service that’s available but not accessible is no benefit to anyone. A service that is Partition Tolerant can survive network partitions. The CAP theorem says that you can only have two of the three.
  10. Before we move on to NoSQL databases, I just want to make it clear that IT IS POSSIBLE to scale horizontally with a traditional RDBMS. However, there are a number of drawbacks. You have to implement client-side hashing yourself; this is not that hard, and even some of the NoSQL DBs don’t provide clustering out of the box and require a manual implementation of client-side hashing. Once you’ve sharded your DB, queries against a particular table need to be made across all the sharded nodes, making the orchestration and collection of results more complex. Cross-node transactions are almost a no-go, and it’s difficult to enforce consistency and isolation in a distributed environment; some specialized NoSQL DBs are designed to solve that problem, but forcing a similar solution onto a general-purpose RDBMS is a recipe for disaster. Schema updates on a large DB are painful; schema updates on a massive multi-node DB cluster are a pain worse than death...
  11. Redis is very good at quirky stuff you’d never thought of using a database for before!
  12. Atomicity: a transaction is all or nothing. Consistency: only valid data is written to the database. Isolation: pretend all transactions are happening serially and the data is correct. Durability: what you write is what you get. The problem with ACID is that trying to guarantee atomic transactions across multiple nodes and making sure that all data is consistent and up to date is HARD. To guarantee ACID under load is downright impossible, which was the premise of Eric Brewer’s CAP theorem as we saw earlier. However, to minimise downtime, we need multiple nodes to handle node failures, and to make a scalable system we also need many nodes to handle lots and lots of reads and writes.
  13. If you can’t have all of the ACID guarantees you can still have two of CAP, which again, stands for: Consistency: data is correct all the time. Availability: you can read and write your data all the time. Partition Tolerance: if one or more nodes fail the system still works and becomes consistent when the system comes online. If you drop the consistency guarantee and accept that things will become ‘eventually consistent’ then you can start building highly scalable systems using an architectural approach known as BASE: Basically Available: system seems to work all the time. Soft State: the state doesn’t have to be consistent all the time. Eventually Consistent: becomes consistent at some later time.
  14. And lastly, I’d like to make an honorary mention of a new product from Google that’s likely going to be a complete and utter game changer for the analytics industry. With BigQuery, you can easily load billions of rows of data from Google Cloud Storage in CSV format and start running ad-hoc analysis over them in seconds. To make queries against a data table in BigQuery, you can use a SQL-like syntax and output the summary data to a Google spreadsheet directly. In fact, you can write your queries in ‘app script’ and trigger them directly from the Google spreadsheet as you would a macro in Excel! There is also a Prediction API which makes analysing your data to give predictions a snip! However, it’s still early days and there are a lot of limitations on table joins. And you need to remember that BigQuery is NOT a database system; it doesn’t support table indexes or other database management features. But it’s a great tool for running analysis on vast amounts of data at great speed.
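
Note 10 mentions client-side hashing and the cost of cross-shard queries, and a small sketch may make both concrete. Everything here is an assumption for illustration: the shard names, the `shard_for`, `put`, `get` and `count_where` helpers, and the in-memory dictionaries standing in for separate database servers. It shows a simple hash-modulo scheme, not any particular product's sharding logic.

```python
import hashlib
from collections import defaultdict

SHARDS = ["db-0", "db-1", "db-2"]  # hypothetical shard names


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the key with a hash that is stable across processes."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# In-memory stand-ins for the shard databases: shard name -> {key: row}.
stores = defaultdict(dict)


def put(key, row):
    stores[shard_for(key)][key] = row


def get(key):
    # A single-key lookup touches exactly one shard.
    return stores[shard_for(key)].get(key)


def count_where(predicate):
    # A query over the whole table has to visit every shard and merge the results.
    return sum(1 for name in SHARDS for row in stores[name].values() if predicate(row))


for i in range(100):
    put(f"user:{i}", {"id": i, "active": i % 2 == 0})

print(get("user:7"))                       # → {'id': 7, 'active': False}
print(count_where(lambda r: r["active"]))  # → 50
```

Single-key reads and writes stay cheap because the client can compute the owning shard itself, while anything table-wide fans out to every shard. A plain modulo scheme like this one also remaps most keys when the number of shards changes, which is one reason resharding is painful.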