SlideShare ist ein Scribd-Unternehmen logo
1 von 51
NOSQL, COUCHDB
 AND THE CLOUD
    Brad Anderson
       Cloudant




          1
BRAD ANDERSON

• BS   Hotel Management

• Restaurant   Chain Data - econometric modeling, BI/DW

• Open   Source - trac, dsource.org, couchdb

• NOSQLEast     2009

• Cloudant

• http://twitter.com/boorad


                                2
AGENDA

• NOSQL

• COUCHDB

 •   Erlang

• Cloud

 •   Dynamo

 •   MapReduce


                   3
YOU’RE SCREWED




http://www.bigfatmoneybags.com/blog/wp-content/uploads/2009/12/screwed.jpg

                                                                             4
RELATIONAL DATABASES


                                                           RDBMS
• Rigid         Schema / ORM fun
                                                           1970-2009
• Scalability

• Everything                 is a Nail



http://www.flickr.com/photos/36041246@N00/3419197777/

                                                       5
MASTER-SLAVE
•   Master-Slave Replication
    •   One (and only one) master
    •   One or more slaves
    •   All writes go to the master, replicated to slaves
    •   Reads balanced among master and slaves
•   Issues
    •   single point of failure
    •   single point of bottleneck
    •   static topology
                                       6
MASTER-MASTER

•   Master-Master Replication
    •   One or more masters
    •   Writes and reads can go to any master
    •   Writes are replicated among masters
•   Issues
    •   limited performance and scalability (typically due to 2PC)
    •   complexity
    •   static topology

                                      7
VERTICAL PARTITION

•   Vertical Partitioning
    •   Put tables belonging to different functional areas on different
        database nodes
        •   Scale data & load by function
        •   Move joins to the application level
•   Issues
    •   no longer truly relational
    •   a functional area grows too much
                                        8
HORIZONTAL PARTITION

•   Horizontal Partitioning
    •   Split tables by key and put partitions (shards) on different nodes
        •   Scale data & load by key
        •   Move joins to the application level
•   Issues
    •   no longer truly relational
    •   a partition grows too much

                                        9
CACHING

•   Put a cache in front of your database
    •   Distribute
    •   Write-through for scaling reads
    •   Write-behind for scaling reads and writes
•   Issues
    •   “only” scales read/write load
    •   invalidation

                                        10
OKAY, NOT SCREWED




http://www.bigfatmoneybags.com/blog/wp-content/uploads/2009/12/screwed.jpg

                                                                             11
NOSQL

        NOT ONLY SQL
   A moniker for different data storage systems
         solving very different problems,
all where a relational database is not the right fit.

                         12
RIGHT FIT


• Google   indexes 400 Pb / day (2007)

• CERN, LHC       generates 100 Pb / sec

• Unique   data created each year (IDC, 2007)
 •   2007 40 Eb

 •   2010 988 Eb (exponential growth)



                                    13
FOUR CATEGORIES
• Key/Value     Stores
   •   Dynomite, Voldemort, Tokyo

• Document       Stores
   •   CouchDB, MongoDB

• Column     Stores / BigTable
   •   HBase, Hypertable

• Graph    Databases
   •   Neo4j, AllegroGraph, VertexDB


                                       14
BIG TAKEAWAY

                                                           function

       function                                             data
                                                function              function

                                                 data                  data



                                     function                                    function
data              data
                                      data                                        data
data              data

data              data
                                     function                                    function
data              data
                                      data                                        data
data              data

                                                function              function

                                                 data                  data
                                                           function

                                                            data




                  Bring the function to the data
                                15
16
HUH? ERLANG?


• Programming    Language created at Ericsson (20 yrs
 old now)

• Designed   for scalable, long-lived systems

• Compiled, Functional, Dynamically Typed, Open
 Source




                                  17
3 BIGGIES
• Massively      Concurrent

    •   green threads, very lightweight != os threads


• Seamlessly       Distributed

    •   node = os thread = VM, processes can live anywhere


• Fault Tolerant

    •   99.9999999 = 32ms downtime per year - AXD301



                                                 18
Of
                  fi   cia
                         lB
                              et
                                a!




CouchDB
 Apache


          19
COUCHDB
• Schema-free      document database server

• Robust, highly   concurrent, fault-tolerant

• RESTful   JSON API

• Futon   web admin console

• MapReduce     system for generating custom views

• Bi-directional   incremental replication

• couchapp: lightweight
                    HTML+JavaScript apps served directly
 from CouchDB using views to transform JSON
                                   20
FROM INTEREST TO ADOPTION




• 100+   production users          • Active
                                          commercial
•3
                                    development
     books being written
                                   • Rapidly   maturing
• Vibrant, open   community   21
OF THE WEB

   Django may be built for the Web, but
  CouchDB is built of the Web. I've never
seen software that so completely embraces
  the philosophies behind HTTP ... this is
 what the software of the future looks like.


                Jacob Kaplan-Moss
                 October 17 2007

   http://jacobian.org/writing/of-the-web/
                       22
DOCUMENTS




• Documents   are JSON Objects

• Underscore-prefixed   fields are reserved

• Documents   can have binary attachments

• MVCC   _rev deterministically generated from doc content
                               23
ROBUST

• Never   overwrite previously committed data

• In
   the event of a server crash or power failure, just restart
 CouchDB -- there is no “repair”

• Take   snapshots with “cp”

• Configurable levels of durability: can choose to fsync after
 every update, or less often to gain better throughput



                                24
CONCURRENT

• Erlang
       approach: lightweight processes to model the natural
 concurrency in a problem

• For   CouchDB that means one process per TCP connection

• Lock-free
          architecture; each process works with an MVCC
 snapshot of a DB.

• Performance   degrades gracefully under heavy concurrent load


                               25
REST API
• Create
 PUT /mydb/mydocid

• Retrieve
 GET /mydb/mydocid

• Update
 PUT /mydb/mydocid

• Delete
 DELETE /mydb/mydocid

                        26
27
VIEWS
• Custom, persistent   representations of document data

• “Closeto the metal” -- no dynamic queries in production, so
 you know exactly what you’re getting

• Generated using MapReduce functions written in JavaScript
 (and other languages)

     view must have a map function and may also have a
• Each
 reduce function

• Leverages   view collation, rich view query API
                                 28
DOCUMENTS BY AUTHOR




         29
WORD COUNT




    30
INCREMENTAL
• Computing   a view can be expensive, so CouchDB saves the
 result in a B-tree and keeps it up-to-date

• Leafnodes store map results, inner nodes store reductions of
 children




 http://horicky.blogspot.com/2008/10/couchdb-implementation.html
                               31
REPLICATION
• Peer-based, bi-directional   replication using normal HTTP calls

• Mediated  by a replicator process which can live on the
 source, target, or somewhere else entirely

• Replicate
          a subset of documents in a DB meeting criteria
 defined in a custom filter function (coming soon)

• Applications   (_design documents) replicate along with the
 data

• Ideal   for offline applications -- “ground computing”
                                   32
CLOUD




  33
SHOWROOM
 A cluster of couches




          34
ARCHITECTURE

• Each   cluster is a ring of nodes (Dynamo, Dynomite)

• Any    node can handle request (consistent hashing)

  • O(1), with   a hop

• nodes    own partitions (ring is divided)

• data   are distributed evenly across partitions and replicas

• mapreduce     functions are passed to nodes for execution
RESEARCH


• Google’s   MapReduce, http://bit.ly/bJbyq5

• Amazon’s   Dynamo, http://bit.ly/b7FlsN

• CAP   theorem, http://bit.ly/bERr2H
CLUSTER CONTROLS

•N   - Replication
                     Q
•Q   - Partitions = 2

•R   - Read Quorum

•W   - Write Quorum



• These   constants define the cluster
N


                                              Consistency
Throughput
                                               Durability




    N = Number of replicas per item stored in cluster
Q


Throughput                                    Scalability




     2^Q = Number of partitions (shards) in cluster
           T = Number of nodes in cluster
       2^Q / T = Number of partitions per node
R


Latency                                      Consistency




          R = Number of successful reads before
               returning value(s) to client
W


Latency                                   Durability




      W = Number of successful writes before
           returning ‘success’ to client
Load Balancer




                                Node 1

            24                                           No
       de                A     B     C       D              de
    No                                           B
                                                                 2
                     A
                 Z                                   C
     Y                                                      D
X                                                                    E


                                                                         C       N
                                                                                  od
                                                                                     e
                                                                             D           3

                                                                                 E

                                                                                             F




                                                                                                 D



                                                                                                             No
                                                                                                             de
                                                                                                     E



                                                                                                              4
                                                                                                         F
                                                                                                             G
request

    PUT http://boorad.cloudant.com/dbname/blah?w=2




                             Load Balancer




                                Node 1

            24                                           No
       de                A     B     C       D              de
    No                                           B
                                                                 2
                     A
                 Z                                   C
      Y                                                     D
X                                                                    E


                                                                         C       N
                                                                                  od
                                                                                     e
                                                                             D           3

                                                                                 E

                                                                                             F




                                                                                                 D



                                                                                                             No
                                                                                                             de
                                                                                                     E



                                                                                                              4
                                                                                                         F
                                                                                                             G
request

    PUT http://boorad.cloudant.com/dbname/blah?w=2




                             Load Balancer




                                Node 1

            24                                           No
       de                A     B     C       D              de
    No                                           B
                                                                 2
                     A
                 Z                                   C
      Y                                                     D
X                                                                    E


                                                                         C       N
                                                                                  od
                                                                                     e
                                                                             D           3

                                                                                 E

                                                                                             F




                                                                                                 D



                                                                                                             No
                                                                                                             de
                                                                                                     E



                                                                                                              4
                                                                                                         F
                                                                                                             G
request

    PUT http://boorad.cloudant.com/dbname/blah?w=2




                             Load Balancer




                                Node 1

            24                                           No
       de                A     B     C       D              de
    No                                           B
                                                                 2
                     A
                 Z                                   C
      Y                                                     D
X                        hash(blah) = E                              E


                                                                         C       N
                                                                                  od
                                                                                     e
                                                                             D           3

                                                                                 E

                                                                                             F




                                                                                                 D



                                                                                                             No
                                                                                                             de
                                                                                                     E



                                                                                                              4
                                                                                                         F
                                                                                                             G
request

    PUT http://boorad.cloudant.com/dbname/blah?w=2

                                                                                                         N=3
                                                                                                         W=2
                             Load Balancer
                                                                                                         R=2

                                Node 1

            24                                           No
       de                A     B     C       D              de
    No                                           B
                                                                 2
                     A
                 Z                                   C
      Y                                                     D
X                        hash(blah) = E                              E


                                                                         C       N
                                                                                  od
                                                                                     e
                                                                             D           3

                                                                                 E

                                                                                             F




                                                                                                 D



                                                                                                             No
                                                                                                             de
                                                                                                     E



                                                                                                              4
                                                                                                         F
                                                                                                             G
request

    PUT http://boorad.cloudant.com/dbname/blah?w=2

                                                                                                         N=3
                                                                                                         W=2
                             Load Balancer
                                                                                                         R=2

                                Node 1

            24                                           No
       de                A     B     C       D              de
    No                                           B
                                                                 2
                     A
                 Z                                   C
      Y                                                     D
X                        hash(blah) = E                              E


                                                                         C       N
                                                                                  od
                                                                                     e
                                                                             D           3

                                                                                 E

                                                                                             F




                                                                                                 D



                                                                                                             No
                                                                                                             de
                                                                                                     E



                                                                                                              4
                                                                                                         F
                                                                                                             G
request

    PUT http://boorad.cloudant.com/dbname/blah?w=2

                                                                                                         N=3
                                                                                                         W=2
                             Load Balancer
                                                                                                         R=2


            24
                                Node 1

                                                         No
                                                                                                 node down
       de                A     B     C       D              de
    No                                           B
                                                                 2
                     A
                 Z                                   C
      Y                                                     D
X                        hash(blah) = E                              E


                                                                         C       N
                                                                                  od
                                                                                     e
                                                                             D           3

                                                                                 E

                                                                                             F




                                                                                                 D



                                                                                                             No
                                                                                                             de
                                                                                                     E



                                                                                                              4
                                                                                                         F
                                                                                                             G
RESULT

• For   standalone or cluster
  •   one REST API

  •   one URL

• For   cluster
  •   redundant data

  •   distributed queries

  •   scale out

                                43
QUESTIONS?
CREDITS



• Emil    Eifrem, http://bit.ly/5D40WQ

• Sergio    Bossa, http://bit.ly/c9UoRZ

• Cliff   Moon, http://bit.ly/bX887c




                                   45

Weitere ähnliche Inhalte

Was ist angesagt?

Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Chris Richardson
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)baggioss
 
Storage infrastructure using HBase behind LINE messages
Storage infrastructure using HBase behind LINE messagesStorage infrastructure using HBase behind LINE messages
Storage infrastructure using HBase behind LINE messagesLINE Corporation (Tech Unit)
 
Developing polyglot persistence applications #javaone 2012
Developing polyglot persistence applications  #javaone 2012Developing polyglot persistence applications  #javaone 2012
Developing polyglot persistence applications #javaone 2012Chris Richardson
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBayMongoDB
 
Petabyte scale on commodity infrastructure
Petabyte scale on commodity infrastructurePetabyte scale on commodity infrastructure
Petabyte scale on commodity infrastructureelliando dias
 
HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitterctrezzo
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Cloudera, Inc.
 
Learning To Relax
Learning To RelaxLearning To Relax
Learning To RelaxCloudant
 
Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionBenoit Perroud
 
What's behind facebook
What's behind facebookWhat's behind facebook
What's behind facebookAjen 陳
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...MongoDB
 
Yahoo Pipes Middleware In The Cloud
Yahoo Pipes Middleware In The CloudYahoo Pipes Middleware In The Cloud
Yahoo Pipes Middleware In The CloudConSanFrancisco123
 
MongoDB Case Study at NoSQL Now 2012
MongoDB Case Study at NoSQL Now 2012MongoDB Case Study at NoSQL Now 2012
MongoDB Case Study at NoSQL Now 2012Sean Laurent
 
Big data for the rest of us
Big data for the rest of usBig data for the rest of us
Big data for the rest of usSteven Francia
 
MariaDB - a MySQL Replacement #SELF2014
MariaDB - a MySQL Replacement #SELF2014MariaDB - a MySQL Replacement #SELF2014
MariaDB - a MySQL Replacement #SELF2014Colin Charles
 
Your backend architecture is what matters slideshare
Your backend architecture is what matters slideshareYour backend architecture is what matters slideshare
Your backend architecture is what matters slideshareColin Charles
 

Was ist angesagt? (20)

Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)
 
Storage infrastructure using HBase behind LINE messages
Storage infrastructure using HBase behind LINE messagesStorage infrastructure using HBase behind LINE messages
Storage infrastructure using HBase behind LINE messages
 
Developing polyglot persistence applications #javaone 2012
Developing polyglot persistence applications  #javaone 2012Developing polyglot persistence applications  #javaone 2012
Developing polyglot persistence applications #javaone 2012
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBay
 
Petabyte scale on commodity infrastructure
Petabyte scale on commodity infrastructurePetabyte scale on commodity infrastructure
Petabyte scale on commodity infrastructure
 
HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitter
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
 
Learning To Relax
Learning To RelaxLearning To Relax
Learning To Relax
 
Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment Evolution
 
What's behind facebook
What's behind facebookWhat's behind facebook
What's behind facebook
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
 
Yahoo Pipes Middleware In The Cloud
Yahoo Pipes Middleware In The CloudYahoo Pipes Middleware In The Cloud
Yahoo Pipes Middleware In The Cloud
 
VF NZ
VF NZVF NZ
VF NZ
 
MongoDB Case Study at NoSQL Now 2012
MongoDB Case Study at NoSQL Now 2012MongoDB Case Study at NoSQL Now 2012
MongoDB Case Study at NoSQL Now 2012
 
Big data for the rest of us
Big data for the rest of usBig data for the rest of us
Big data for the rest of us
 
Why MariaDB?
Why MariaDB?Why MariaDB?
Why MariaDB?
 
Qcon
QconQcon
Qcon
 
MariaDB - a MySQL Replacement #SELF2014
MariaDB - a MySQL Replacement #SELF2014MariaDB - a MySQL Replacement #SELF2014
MariaDB - a MySQL Replacement #SELF2014
 
Your backend architecture is what matters slideshare
Your backend architecture is what matters slideshareYour backend architecture is what matters slideshare
Your backend architecture is what matters slideshare
 

Ähnlich wie NOSQL, CouchDB, and the Cloud

An Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBAn Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBWilliam LaForest
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
Running MongoDB in the Cloud
Running MongoDB in the CloudRunning MongoDB in the Cloud
Running MongoDB in the CloudTony Tam
 
Intro to NoSQL and MongoDB
Intro to NoSQL and MongoDBIntro to NoSQL and MongoDB
Intro to NoSQL and MongoDBDATAVERSITY
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreducehansen3032
 
NoSQL in the context of Social Web
NoSQL in the context of Social WebNoSQL in the context of Social Web
NoSQL in the context of Social WebBogdan Gaza
 
NoSQL on the move
NoSQL on the moveNoSQL on the move
NoSQL on the moveCodemotion
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introductionScott Miao
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...Qian Lin
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DBHeriyadi Janwar
 
Discover MongoDB - Israel
Discover MongoDB - IsraelDiscover MongoDB - Israel
Discover MongoDB - IsraelMichael Fiedler
 
Lviv EDGE 2 - NoSQL
Lviv EDGE 2 - NoSQLLviv EDGE 2 - NoSQL
Lviv EDGE 2 - NoSQLzenyk
 

Ähnlich wie NOSQL, CouchDB, and the Cloud (20)

An Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBAn Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDB
 
Drill njhug -19 feb2013
Drill njhug -19 feb2013Drill njhug -19 feb2013
Drill njhug -19 feb2013
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
Running MongoDB in the Cloud
Running MongoDB in the CloudRunning MongoDB in the Cloud
Running MongoDB in the Cloud
 
Hadoop DB
Hadoop DBHadoop DB
Hadoop DB
 
Intro to NoSQL and MongoDB
Intro to NoSQL and MongoDBIntro to NoSQL and MongoDB
Intro to NoSQL and MongoDB
 
MongoDB
MongoDBMongoDB
MongoDB
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreduce
 
NoSQL in the context of Social Web
NoSQL in the context of Social WebNoSQL in the context of Social Web
NoSQL in the context of Social Web
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
NoSQL on the move
NoSQL on the moveNoSQL on the move
NoSQL on the move
 
Drop acid
Drop acidDrop acid
Drop acid
 
Anti-social Databases
Anti-social DatabasesAnti-social Databases
Anti-social Databases
 
Wmware NoSQL
Wmware NoSQLWmware NoSQL
Wmware NoSQL
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introduction
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
Discover MongoDB - Israel
Discover MongoDB - IsraelDiscover MongoDB - Israel
Discover MongoDB - Israel
 
Lviv EDGE 2 - NoSQL
Lviv EDGE 2 - NoSQLLviv EDGE 2 - NoSQL
Lviv EDGE 2 - NoSQL
 
A peek into the future
A peek into the futureA peek into the future
A peek into the future
 

Mehr von boorad

Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solrboorad
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013boorad
 
Hadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkHadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkboorad
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Stormboorad
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Casesboorad
 
PhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond BatchPhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond Batchboorad
 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batchboorad
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Stormboorad
 
Large Scale Data Analysis Tools
Large Scale Data Analysis ToolsLarge Scale Data Analysis Tools
Large Scale Data Analysis Toolsboorad
 
DevNexus 2011
DevNexus 2011DevNexus 2011
DevNexus 2011boorad
 
Why Erlang? - Bar Camp Atlanta 2008
Why Erlang?  - Bar Camp Atlanta 2008Why Erlang?  - Bar Camp Atlanta 2008
Why Erlang? - Bar Camp Atlanta 2008boorad
 

Mehr von boorad (11)

Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solr
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
 
Hadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkHadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talk
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Storm
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
PhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond BatchPhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond Batch
 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batch
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Storm
 
Large Scale Data Analysis Tools
Large Scale Data Analysis ToolsLarge Scale Data Analysis Tools
Large Scale Data Analysis Tools
 
DevNexus 2011
DevNexus 2011DevNexus 2011
DevNexus 2011
 
Why Erlang? - Bar Camp Atlanta 2008
Why Erlang?  - Bar Camp Atlanta 2008Why Erlang?  - Bar Camp Atlanta 2008
Why Erlang? - Bar Camp Atlanta 2008
 

Kürzlich hochgeladen

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 

Kürzlich hochgeladen (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

NOSQL, CouchDB, and the Cloud

  • 1. NOSQL, COUCHDB AND THE CLOUD Brad Anderson Cloudant 1
  • 2. BRAD ANDERSON • BS Hotel Management • Restaurant Chain Data - econometric modeling, BI/DW • Open Source - trac, dsource.org, couchdb • NOSQLEast 2009 • Cloudant • http://twitter.com/boorad 2
  • 3. AGENDA • NOSQL • COUCHDB • Erlang • Cloud • Dynamo • MapReduce 3
  • 5. RELATIONAL DATABASES RDBMS • Rigid Schema / ORM fun 1970-2009 • Scalability • Everything is a Nail http://www.flickr.com/photos/36041246@N00/3419197777/ 5
  • 6. MASTER-SLAVE • Master-Slave Replication • One (and only one) master • One or more slaves • All writes go to the master, replicated to slaves • Reads balanced among master and slaves • Issues • single point of failure • single point of bottleneck • static topology 6
  • 7. MASTER-MASTER • Master-Master Replication • One or more masters • Writes and reads can go to any master • Writes are replicated among masters • Issues • limited performance and scalability (typically due to 2PC) • complexity • static topology 7
  • 8. VERTICAL PARTITION • Vertical Partitioning • Put tables belonging to different functional areas on different database nodes • Scale data & load by function • Move joins to the application level • Issues • no longer truly relational • a functional area grows too much 8
  • 9. HORIZONTAL PARTITION • Horizontal Partitioning • Split tables by key and put partitions (shards) on different nodes • Scale data & load by key • Move joins to the application level • Issues • no longer truly relational • a partition grows too much 9
  • 10. CACHING • Put a cache in front of your database • Distribute • Write-through for scaling reads • Write-behind for scaling reads and writes • Issues • “only” scales read/write load • invalidation 10
  • 12. NOSQL NOT ONLY SQL A moniker for different data storage systems solving very different problems, all where a relational database is not the right fit. 12
  • 13. RIGHT FIT • Google indexes 400 Pb / day (2007) • CERN, LHC generates 100 Pb / sec • Unique data created each year (IDC, 2007) • 2007 40 Eb • 2010 988 Eb (exponential growth) 13
  • 14. FOUR CATEGORIES • Key/Value Stores • Dynomite, Voldemort, Tokyo • Document Stores • CouchDB, MongoDB • Column Stores / BigTable • HBase, Hypertable • Graph Databases • Neo4j, AllegroGraph, VertexDB 14
  • 15. BIG TAKEAWAY function function data function function data data function function data data data data data data data data function function data data data data data data function function data data function data Bring the function to the data 15
  • 16. 16
  • 17. HUH? ERLANG? • Programming Language created at Ericsson (20 yrs old now) • Designed for scalable, long-lived systems • Compiled, Functional, Dynamically Typed, Open Source 17
  • 18. 3 BIGGIES • Massively Concurrent • green threads, very lightweight != os threads • Seamlessly Distributed • node = os thread = VM, processes can live anywhere • Fault Tolerant • 99.9999999 = 32ms downtime per year - AXD301 18
  • 19. Of fi cia lB et a! CouchDB Apache 19
  • 20. COUCHDB • Schema-free document database server • Robust, highly concurrent, fault-tolerant • RESTful JSON API • Futon web admin console • MapReduce system for generating custom views • Bi-directional incremental replication • couchapp: lightweight HTML+JavaScript apps served directly from CouchDB using views to transform JSON 20
  • 21. FROM INTEREST TO ADOPTION • 100+ production users • Active commercial •3 development books being written • Rapidly maturing • Vibrant, open community 21
  • 22. OF THE WEB Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP ... this is what the software of the future looks like. Jacob Kaplan-Moss October 17 2007 http://jacobian.org/writing/of-the-web/ 22
  • 23. DOCUMENTS • Documents are JSON Objects • Underscore-prefixed fields are reserved • Documents can have binary attachments • MVCC _rev deterministically generated from doc content 23
  • 24. ROBUST • Never overwrite previously committed data • In the event of a server crash or power failure, just restart CouchDB -- there is no “repair” • Take snapshots with “cp” • Configurable levels of durability: can choose to fsync after every update, or less often to gain better throughput 24
  • 25. CONCURRENT • Erlang approach: lightweight processes to model the natural concurrency in a problem • For CouchDB that means one process per TCP connection • Lock-free architecture; each process works with an MVCC snapshot of a DB. • Performance degrades gracefully under heavy concurrent load 25
  • 26. REST API • Create PUT /mydb/mydocid • Retrieve GET /mydb/mydocid • Update PUT /mydb/mydocid • Delete DELETE /mydb/mydocid 26
  • 27. 27
  • 28. VIEWS • Custom, persistent representations of document data • “Closeto the metal” -- no dynamic queries in production, so you know exactly what you’re getting • Generated using MapReduce functions written in JavaScript (and other languages) view must have a map function and may also have a • Each reduce function • Leverages view collation, rich view query API 28
  • 31. INCREMENTAL • Computing a view can be expensive, so CouchDB saves the result in a B-tree and keeps it up-to-date • Leafnodes store map results, inner nodes store reductions of children http://horicky.blogspot.com/2008/10/couchdb-implementation.html 31
  • 32. REPLICATION • Peer-based, bi-directional replication using normal HTTP calls • Mediated by a replicator process which can live on the source, target, or somewhere else entirely • Replicate a subset of documents in a DB meeting criteria defined in a custom filter function (coming soon) • Applications (_design documents) replicate along with the data • Ideal for offline applications -- “ground computing” 32
  • 34. SHOWROOM A cluster of couches 34
  • 35. ARCHITECTURE • Each cluster is a ring of nodes (Dynamo, Dynomite) • Any node can handle request (consistent hashing) • O(1), with a hop • nodes own partitions (ring is divided) • data are distributed evenly across partitions and replicas • mapreduce functions are passed to nodes for execution
  • 36. RESEARCH • Google’s MapReduce, http://bit.ly/bJbyq5 • Amazon’s Dynamo, http://bit.ly/b7FlsN • CAP theorem, http://bit.ly/bERr2H
  • 37. CLUSTER CONTROLS •N - Replication Q •Q - Partitions = 2 •R - Read Quorum •W - Write Quorum • These constants define the cluster
  • 38. N Consistency Throughput Durability N = Number of replicas per item stored in cluster
  • 39. Q Throughput Scalability 2^Q = Number of partitions (shards) in cluster T = Number of nodes in cluster 2^Q / T = Number of partitions per node
  • 40. R Latency Consistency R = Number of successful reads before returning value(s) to client
  • 41. W Latency Durability W = Number of successful writes before returning ‘success’ to client
  • 42. Load Balancer Node 1 24 No de A B C D de No B 2 A Z C Y D X E C N od e D 3 E F D No de E 4 F G
  • 43. request PUT http://boorad.cloudant.com/dbname/blah?w=2 Load Balancer Node 1 24 No de A B C D de No B 2 A Z C Y D X E C N od e D 3 E F D No de E 4 F G
  • 44. request PUT http://boorad.cloudant.com/dbname/blah?w=2 Load Balancer Node 1 24 No de A B C D de No B 2 A Z C Y D X E C N od e D 3 E F D No de E 4 F G
  • 45. request PUT http://boorad.cloudant.com/dbname/blah?w=2 Load Balancer Node 1 24 No de A B C D de No B 2 A Z C Y D X hash(blah) = E E C N od e D 3 E F D No de E 4 F G
  • 46. request PUT http://boorad.cloudant.com/dbname/blah?w=2 N=3 W=2 Load Balancer R=2 Node 1 24 No de A B C D de No B 2 A Z C Y D X hash(blah) = E E C N od e D 3 E F D No de E 4 F G
  • 47. request PUT http://boorad.cloudant.com/dbname/blah?w=2 N=3 W=2 Load Balancer R=2 Node 1 24 No de A B C D de No B 2 A Z C Y D X hash(blah) = E E C N od e D 3 E F D No de E 4 F G
  • 48. request PUT http://boorad.cloudant.com/dbname/blah?w=2 N=3 W=2 Load Balancer R=2 24 Node 1 No node down de A B C D de No B 2 A Z C Y D X hash(blah) = E E C N od e D 3 E F D No de E 4 F G
  • 49. RESULT • For standalone or cluster • one REST API • one URL • For cluster • redundant data • distributed queries • scale out 43
  • 51. CREDITS • Emil Eifrem, http://bit.ly/5D40WQ • Sergio Bossa, http://bit.ly/c9UoRZ • Cliff Moon, http://bit.ly/bX887c 45

Hinweis der Redaktion

  1. 20 yrs old, open source since mid-90’s, iirc. like a mobile telephone grid compiled (but to bytecode for a VM) open source
  2. Why Erlang? Here are my three big ticket items - massively concurrent - seamlessly distributed into multi-machine clusters - extremely fault tolerant Great for my projects - data storage & retrieval - scalable web apps Maybe not so hot for computationally intensive projects - unless they lend themselves to parallelism