Open source, high performance database




Introduction to NoSQL and MongoDB
Will LaForest
Senior Director of 10gen Federal
will@10gen.com
@WLaForest




[Figure: timeline, 1965 to 2010, with events in approximate order]
• IBM's IMS
• Codd publishes relational model paper in 1970
• SQL invented
• Oracle founded
• PCs gain traction
• Client-server
• WWW born
• Dynamic Web content
• 3-tier architecture
• Web applications
• Brewer's CAP born
• SOA
• BigTable
• Cloud computing
• 10gen founded
• NoSQL movement
• MongoDB released
[Figure: a relation (table), with an attribute (column) and a tuple (row) labeled]
• Data stored in an RDBMS is very compact (disk was more expensive)
• SQL and the RDBMS made queries flexible while keeping schemas rigid
• Rigid schemas help optimize joins and storage
• Massive ecosystem of tools, libraries, and integrations
• Have been around for 40 years!
• Gartner uses the 3Vs to define Big Data
• Volume
   – Big/difficult/extreme volume is relative
• Variety
   – Changing or evolving data
   – Uncontrolled formats
   – Does not easily adhere to a single schema
   – Unknown at design time
• Velocity
   – High or volatile inbound data
   – High query and read operations
   – Low latency
VOLUME & NEW ARCHITECTURES
• Systems scaling horizontally, not vertically
• Commodity servers
• Cloud computing

DATA VARIETY & VOLATILITY
• Extremely difficult to find a single fixed schema
• Don't know the data schema a priori

TRANSACTIONAL MODEL
• N x inserts or updates
• Distributed transactions
• Non-relational databases have been around for a long time (MUMPS?)
• Modern NoSQL theory and offerings started in the early 2000s
• Modern usage of the term was introduced in 2009
• NoSQL = Not Only SQL
• A collection of very different products
• Alternatives to relational databases when they are a bad fit
• Motives
   – Horizontal scalability (commodity servers/cloud computing)
   – Flexibility
• Value (data) mapped to a key (think primary key)
• Some values are typed, some are just BLOBs
• Redis, MemcacheDB, Voldemort

   Key         Value
   "Will"      "@WLaForest"
   "Chris"     12
   "Robert"    BLOB
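The key-to-value mapping above can be sketched in a few lines of plain JavaScript. This is only an in-memory illustration: the `KeyValueStore` class and its method names are hypothetical and do not correspond to the API of Redis, MemcacheDB, or Voldemort.

```javascript
// Minimal in-memory key-value store sketch (hypothetical class, not a product API).
class KeyValueStore {
  constructor() {
    this.data = new Map(); // key -> value, values may be of any type
  }
  put(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    // Return null for a missing key rather than undefined.
    return this.data.has(key) ? this.data.get(key) : null;
  }
  remove(key) {
    return this.data.delete(key);
  }
}

const store = new KeyValueStore();
store.put("Will", "@WLaForest");                             // typed value: string
store.put("Chris", 12);                                      // typed value: number
store.put("Robert", new Uint8Array([0xde, 0xad, 0xbe, 0xef])); // opaque BLOB

console.log(store.get("Will"), store.get("Chris"), store.get("Robert").length);
```

The store can only look values up by key; anything richer (queries on the value, secondary indexes) is what the other NoSQL families add.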
•   Data stored on disk in a column-oriented fashion
•   Predominantly hash-based indexing
•   Data partitioned by range or consistent hashing
•   Google BigTable, HBase, Cassandra, Accumulo
• Key-Value Stores
   – Value (data) mapped to a key (think primary key)
• BigTable Descendants
   – Look like a distributed multi-dimensional map
   – Data stored in a column-oriented fashion
   – Predominantly hash-based indexing
• Document-Oriented Stores
   – Data stored as either JSON or XML documents
• What do they all have in common?
• Hadoop is not a database
• Map reduce on HDFS or another data source
• When you want to use it
  – Can't use an index
  – Distributing custom algorithms
  – ETL
• Great for grinding through data
• Many NoSQL offerings have native map reduce functionality
• 2007
  – Eliot Horowitz & Dwight Merriman tired of reinventing the
    wheel
  – 10gen founded
  – MongoDB Development begins
• 2009
  – Initial release of MongoDB
• 73M+ in funding
  – Funded by Sequoia, NEA, Union Square Ventures, Flybridge
    Capital


                                                                14
• Pre-production Subscriptions for MongoDB
   – Fixed cost = $30k
   – 6 month term, unlimited servers
• MongoDB Subscriptions
   – 32 month term, $4k per server
   – 24x7 support for development and production, 1 hour response time
   – Onboarding call and quarterly review
• Consulting
   – Cost is $300/hr
• Training
   – MongoDB Developer - $1500/student, 2 day course
   – MongoDB Administrator - $1500/student, 2 day course
   – MongoDB Essential - $2250/student, 3 day course
• MongoDB Monitoring Service
                                                                         15
• 4 servers with 2 processors each
• Total processors equals 8
• Oracle Enterprise Edition
   – $47,500 per processor plus maintenance
   – 8 x 47,500 = $380,000 + $76,000
   – Total Cost = $456,000
• MongoDB Subscription
   –   $4,000 per server (no processor count)
   –   No license fee or maintenance charge
   –   4 x 4,000 = $16,000
   –   Total Cost = $16,000
• MongoDB is $440,000 less expensive
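The arithmetic behind this comparison can be checked with a short script. All prices come from the slide; the 20% maintenance rate is an assumption inferred from $76,000 being one fifth of $380,000, and the variable names are illustrative.

```javascript
// Cost comparison from the slide, recomputed step by step.
const servers = 4;
const processorsPerServer = 2;
const oracleLicensePerProcessor = 47500;   // from the slide
const assumedMaintenancePercent = 20;      // assumption: 76,000 / 380,000
const mongoSubscriptionPerServer = 4000;   // from the slide

const processors = servers * processorsPerServer;                 // 8
const oracleLicense = processors * oracleLicensePerProcessor;     // 380,000
const oracleMaintenance = oracleLicense * assumedMaintenancePercent / 100; // 76,000
const oracleTotal = oracleLicense + oracleMaintenance;            // 456,000
const mongoTotal = servers * mongoSubscriptionPerServer;          // 16,000
const savings = oracleTotal - mongoTotal;                         // 440,000

console.log({ oracleTotal, mongoTotal, savings });
```

Note that the Oracle figure scales with processor count while the subscription figure scales with server count, so the gap widens as processors per server increase.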
                                                16
• #2 on Indeed's Fastest Growing Jobs
• Jaspersoft BigData Index: "Demand for MongoDB, the document-oriented NoSQL database, saw the biggest spike with over 200% growth in 2011."
• 451 Group: "MongoDB increasing its dominance"
• Google Searches
• Scale horizontally over commodity hardware
• RDBMSs are great, so keep what works
   – Rich data models
   – Ad hoc queries
   – Fully featured indexes
• What doesn't distribute well?
   – Long-running multi-row transactions
   – Joins
   – Both are artifacts of the relational data model
• Do not homogenize programming interfaces
• Local storage is a first-class citizen for DB storage
•   Data stored as documents (JSON)
•   Schema free
•   CRUD operations – (Create Read Update Delete)
•   Atomic document operations
•   Ad hoc Queries like SQL
    –   Equality
    –   Regular expression searches
    –   Ranges
    –   Geospatial
•   Secondary indexes
•   Sharding (sometimes called partitioning) for scalability
•   Replication – HA and read scalability
                                                               21
•   MongoDB does not need any predefined data schema.
•   Every document could have different data!

    {name: "will", eyes: "blue", birthplace: "NY",
     aliases: ["bill", "la ciacco"], gender: "???", boss: "ben"}

    {name: "jeff", eyes: "blue", height: 72, boss: "ben"}

    {name: "brendan", aliases: ["el diablo"]}

    {name: "ben", hat: "yes"}

    {name: "matt", pizza: "DiGiorno", height: 72, boss: "555.555.1212"}
[Figure: Post, Author, and Comment stored as separate records. Seek = 5+ ms; Read = really really fast]

[Figure: a single Post document containing the Author and five Comment entries]
RDBMS      MongoDB
Database   Database
Table      Collection
Row        Document
• We are running instances on the order of:
  -   100B objects
  -   50TB storage
  -   50K qps per server
  -   ~1200 servers




                                              26
• SSL
   – between client and server
   – Intra-cluster communication
• Authorization at the database level
   – Read Only/Read+Write/Administrator
• Security Roadmap (tentative)
   –   Pluggable authentication 2.4
   –   Auditing 2.4
   –   Cell level security 2.6
   –   Common Criteria certification

                                          27
var p = { author: "roger",
          date: new Date(),
          text: "Spirited Away",
          tags: ["Tezuka", "Manga"] };

> db.posts.save(p)
> db.posts.find()

 { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
   author : "roger",
   date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
   text : "Spirited Away",
   tags : [ "Tezuka", "Manga" ] }

Notes:
 - _id must be unique, but can be any value you'd like
Create an index on any field in a document

 // 1 means ascending, -1 means descending

 > db.posts.ensureIndex({author: 1})

 > db.posts.find({author: 'roger'})

 { _id    : ObjectId("4c4ba5c0672c685e5e8aabf3"),
   author : "roger",
   ... }
• Conditional Operators
  – $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
  – $lt, $lte, $gt, $gte

  // find posts with any tags
  > db.posts.find( {tags: {$exists: true }} )

  // find posts matching a regular expression
  > db.posts.find( {author: /^rog*/i } )

  // count posts by author
  > db.posts.find( {author: 'roger'} ).count()
• $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit

> comment = { author: "Fred",
              date: new Date(),
              text: "Best Movie Ever" }

> db.posts.update( { _id: "..." },
                   { $push: { comments: comment } } );
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "roger",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "Spirited Away",
  tags : [ "Tezuka", "Manga" ],
  comments : [
   {
       author : "Fred",
       date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
       text : "Best Movie Ever"
   }
  ]}
                                                           34
// Index nested documents
> db.posts.ensureIndex( { "comments.author": 1 } )
> db.posts.find( { "comments.author": "Fred" } )

// Index on tags
> db.posts.ensureIndex( { tags: 1 } )
> db.posts.find( { tags: "Manga" } )

// geospatial index
> db.posts.ensureIndex( { "author.location": "2d" } )
> db.posts.find( { "author.location": { $near: [22, 42] } } )
•   Native Map/Reduce in JS in MongoDB
    –   Distributes across the cluster with good data locality
•   New aggregation framework
    –   Declarative (no JS required)
    –   Pipeline approach (like Unix ps -ax | tee processes.txt | more)
•   Hadoop
    –   Intersect the indexing of MongoDB with the brute-force parallelization of Hadoop
    –   Hadoop MongoDB connector
Pipeline operators: $project, $match, $limit, $skip, $unwind, $group, $sort

Sample document:

{
 title : "this is my title" ,
 author : "bob" ,
 posted : new Date () ,
 pageViews : 5 ,
 tags : [ "fun" , "good" , "fun" ] ,
 comments : [
      { author : "joe" , text : "this is cool" } ,
      { author : "sam" , text : "this is bad" }
 ],
 other : { foo : 5 }
}

Aggregation:

db.article.aggregate(
    { $project : {
        author : 1,
        tags : 1,
    }},
    { $unwind : "$tags" },
    { $group : {
        _id : "$tags",
        authors : { $addToSet : "$author" }
    }}
);
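To make the three stages concrete, here is a plain-JavaScript walk-through of what $project, $unwind, and $group with $addToSet do to the sample document. It is a sketch of the data flow only: it does not call MongoDB, the variable names are illustrative, and a real server does not promise any particular order for the output groups.

```javascript
// Input collection: the sample document (fields not used by the pipeline omitted).
const articles = [
  { title: "this is my title", author: "bob", pageViews: 5,
    tags: ["fun", "good", "fun"] },
];

// $project: keep only the author and tags fields.
const projected = articles.map(({ author, tags }) => ({ author, tags }));

// $unwind: emit one document per element of the tags array.
const unwound = projected.flatMap(doc =>
  doc.tags.map(tag => ({ author: doc.author, tags: tag })));

// $group with $addToSet: collect the distinct authors seen for each tag.
const groups = new Map();
for (const doc of unwound) {
  if (!groups.has(doc.tags)) groups.set(doc.tags, new Set());
  groups.get(doc.tags).add(doc.author);
}
const result = [...groups].map(([tag, authors]) =>
  ({ _id: tag, authors: [...authors] }));

console.log(JSON.stringify(result));
```

Because "fun" appears twice in the tags array, $unwind produces three documents, but $addToSet keeps "bob" only once per tag.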
[Figure: map/reduce data flow. Input data -> MAP -> Intermediate data -> REDUCE -> Output data, with map and reduce tasks numbered 1 to 3]
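The flow from input data through MAP, intermediate data, and REDUCE to output data can be sketched in a single process. The `posts` array and the function names below are hypothetical, and this runs sequentially, whereas a real map/reduce engine spreads the map and reduce tasks across machines.

```javascript
// Hypothetical input data: posts with tags.
const posts = [
  { author: "roger", tags: ["Tezuka", "Manga"] },
  { author: "fred", tags: ["Manga"] },
];

// MAP: emit (key, value) pairs for one input document.
function mapPost(post) {
  return post.tags.map(tag => [tag, 1]);
}

// REDUCE: combine all values emitted for one key.
function sumValues(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

function mapReduce(inputs, mapFn, reduceFn) {
  // Intermediate data: key -> list of emitted values.
  const intermediate = new Map();
  for (const input of inputs) {
    for (const [key, value] of mapFn(input)) {
      if (!intermediate.has(key)) intermediate.set(key, []);
      intermediate.get(key).push(value);
    }
  }
  // Output data: one reduced value per key.
  const output = {};
  for (const [key, values] of intermediate) {
    output[key] = reduceFn(key, values);
  }
  return output;
}

const tagCounts = mapReduce(posts, mapPost, sumValues);
console.log(tagCounts);
```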
[Figure: replica set. The Driver writes to the Primary and reads from the Primary and two Secondaries; the Secondaries are updated by asynchronous replication.]

[Figure: the Primary drops out; the Driver continues to read from the two Secondaries.]

[Figure: automatic leader election. One Secondary becomes the new Primary and accepts writes and reads.]

[Figure: the former Primary returns as a Secondary.]
Additional Details


[Figure: sharded cluster with Clients, three MongoS routers, and three Config servers]

Key Range    Replica set members
0..30        Primary, Secondary, Secondary
31..60       Primary, Secondary, Secondary
61..90       Primary, Secondary, Secondary
91..100      Primary, Secondary, Secondary
•   Sharding Details
•   Replica Set Details
•   Consistency Details
•   Common Deployment Scenarios
•   Citations




                                  49
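[Figure: the Driver, with Write and Read arrows, in front of a replica set of one Primary and two Secondaries]

[Figure: the same replica set, with the Driver's Write and Read arrows drawn differently]

[Figure: 1. the Driver writes to the Primary, which replicates to a Secondary; 2. the Driver reads from the Secondaries]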
• Fire and forget
• Wait for error
• Wait for fsync
• Wait for journal sync
• Wait for replication




                          54
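[Figures: what the Driver sends and what the Primary does for each option]
• Fire and forget: write; Primary applies in memory
• Wait for error: write, then getLastError; Primary applies in memory
• Wait for journal sync: write, then getLastError with j:true; Primary applies in memory and writes to journal
• Wait for fsync: write, then getLastError with fsync:true; Primary applies in memory and fsyncs
• Wait for replication: write, then getLastError with w:2; Primary applies in memory and replicates to a Secondary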
Value          Meaning
<n:integer>    Replicate to N members of the replica set
"majority"     Replicate to a majority of replica set members
<m:modeName>   Use custom error mode name
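The integer and "majority" rows of this table reduce to a simple counting rule, sketched below. The function `isSatisfied` and its parameters are hypothetical names for illustration, the count is assumed to include the primary itself, and custom error modes are left out.

```javascript
// Hypothetical helper: has a write been applied by enough members?
//   w           - a positive integer or the string "majority"
//   ackCount    - members that have applied the write (primary included)
//   memberCount - total members in the replica set
function isSatisfied(w, ackCount, memberCount) {
  if (w === "majority") {
    // A majority is strictly more than half of the members.
    return ackCount >= Math.floor(memberCount / 2) + 1;
  }
  if (Number.isInteger(w) && w > 0) {
    return ackCount >= w;
  }
  throw new Error("unsupported w value in this sketch: " + w);
}

console.log(isSatisfied(2, 1, 3));          // only the primary has the write
console.log(isSatisfied(2, 2, 3));          // primary plus one secondary
console.log(isSatisfied("majority", 2, 3)); // 2 of 3 members
```

Waiting for a higher count trades write latency for a stronger guarantee that the write survives a failover.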
> db.runCommand( { shardcollection: "test.users",
                   key: { email: 1 } } )

      { name: "Jared", email: "jsr@10gen.com" }
      { name: "Scott", email: "scott@10gen.com" }
      { name: "Dan", email: "dan@10gen.com" }
[Figure: one chunk covering the entire shard key range, -∞ to +∞]

[Figure: dan@10gen.com, jsr@10gen.com, and scott@10gen.com placed along the range]

[Figure: Split! The range is divided into two chunks]

[Figure: Split! The range is divided again]
-∞                adam@10gen.com    1
adam@10gen.com    jared@10gen.com   1
jared@10gen.com   scott@10gen.com   1
scott@10gen.com   +∞                1


• Stored in the config servers
• Cached in mongos
• Used to route requests and keep cluster balanced
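Routing a request with this table amounts to finding the chunk whose range contains the shard key value. The sketch below uses the four ranges from the slide, with `null` standing in for -∞ and +∞ and with the lower bound inclusive and the upper bound exclusive. The shard names are hypothetical, since the slide does not say which shard owns which chunk, and `findChunk` is an illustrative helper rather than mongos code.

```javascript
// Chunk table: [min, max) ranges over the shard key; null means unbounded.
const chunks = [
  { min: null,              max: "adam@10gen.com",  shard: "shardA" },
  { min: "adam@10gen.com",  max: "jared@10gen.com", shard: "shardB" },
  { min: "jared@10gen.com", max: "scott@10gen.com", shard: "shardC" },
  { min: "scott@10gen.com", max: null,              shard: "shardD" },
];

// Return the chunk whose range contains the given shard key value.
function findChunk(key) {
  return chunks.find(c =>
    (c.min === null || key >= c.min) && (c.max === null || key < c.max));
}

console.log(findChunk("dan@10gen.com").shard);
console.log(findChunk("jsr@10gen.com").shard);
console.log(findChunk("scott@10gen.com").shard);
```

A linear scan is enough for four chunks; with many chunks a binary search over the sorted lower bounds would be the natural replacement.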


                                                     69
[Figure: mongos, the balancer, and three config servers above four shards. Chunks 1-12 are on Shard 1, 13-24 on Shard 2, 25-36 on Shard 3, and 37-48 on Shard 4.]

[Figure: Imbalance. Shard 1 holds chunks 1-12; Shard 2 holds 21-24, Shard 3 holds 33-36, and Shard 4 holds 45-48.]

[Figure: Move chunk 1 to Shard 2.]

[Figure: Chunks 1, 2, and 3 have migrated. Shard 1 holds 4-12; Shard 2 holds 1 and 21-24; Shard 3 holds 2 and 33-36; Shard 4 holds 3 and 45-48.]
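The migration sequence in these figures can be reproduced with a simple greedy loop: move one chunk at a time from the fullest shard to the emptiest until the gap is small enough. This is a hypothetical sketch, not MongoDB's actual balancer; the `balance` function, the `threshold` of 4, and the `maxMoves` cap are all assumptions chosen so that the result matches the slide.

```javascript
// Hypothetical greedy balancer: repeatedly move one chunk from the shard with
// the most chunks to the shard with the fewest, stopping when the difference
// is at most `threshold` or after `maxMoves` migrations.
function balance(shards, threshold, maxMoves) {
  const moves = [];
  while (moves.length < maxMoves) {
    const names = Object.keys(shards);
    let most = names[0];
    let least = names[0];
    for (const n of names) {
      if (shards[n].length > shards[most].length) most = n;
      if (shards[n].length < shards[least].length) least = n;
    }
    if (shards[most].length - shards[least].length <= threshold) break;
    const chunk = shards[most].shift(); // take the first chunk in the list
    shards[least].push(chunk);
    moves.push({ chunk, from: most, to: least });
  }
  return moves;
}

// Starting state from the "Imbalance" figure.
const shards = {
  shard1: Array.from({ length: 12 }, (_, i) => i + 1), // chunks 1..12
  shard2: [21, 22, 23, 24],
  shard3: [33, 34, 35, 36],
  shard4: [45, 46, 47, 48],
};

const moves = balance(shards, 4, 10);
console.log(moves);
```

With these inputs the loop moves chunk 1 to shard2, chunk 2 to shard3, and chunk 3 to shard4, then stops with 9, 5, 5, and 5 chunks per shard.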
[Figures: a client query passing through mongos to Shard 1, Shard 2, and Shard 3, in three variants]

Query routed to a single shard:
1. Query arrives at mongos
2. mongos routes query to a single shard
3. Shard returns results of query
4. Results returned to client

Query broadcast to all shards:
1. Query arrives at mongos
2. mongos broadcasts query to all shards
3. Each shard returns results for query
4. Results combined and returned to client

Sorted query broadcast to all shards:
1. Query arrives at mongos
2. mongos broadcasts query to all shards
3. Each shard locally sorts results
4. Results returned to mongos
5. mongos merge sorts individual results
6. Combined sorted result returned to client
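The local sort on each shard followed by the merge sort in mongos can be sketched as follows. The per-shard data and the `mergeSorted` function are hypothetical; the point is only that merging lists which are already sorted needs one pass and no full re-sort.

```javascript
// Each shard sorts its own results locally (hypothetical per-shard values).
const shardResults = [
  ["NY", "CA"],
  ["TX", "AZ"],
  ["FL"],
].map(results => [...results].sort());

// mongos merges the already-sorted lists into one sorted list by repeatedly
// taking the smallest head element among the lists.
function mergeSorted(lists) {
  const positions = lists.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (positions[i] >= lists[i].length) continue; // list i is exhausted
      if (best === -1 || lists[i][positions[i]] < lists[best][positions[best]]) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(lists[best][positions[best]]);
    positions[best] += 1;
  }
}

const combined = mergeSorted(shardResults);
console.log(combined);
```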
Operation   Routing              Example
Inserts     Requires shard key   db.users.insert({name: "Jared", email: "jsr@10gen.com"})
Removes     Routed               db.users.remove({email: "jsr@10gen.com"})
            Scattered            db.users.remove({name: "Jared"})
Updates     Routed               db.users.update({email: "jsr@10gen.com"}, {$set: {state: "CA"}})
            Scattered            db.users.update({state: "FZ"}, {$set: {state: "CA"}})
Query                     Routing                  Example
By shard key              Routed                   db.users.find({email: "jsr@10gen.com"})
Sorted by shard key       Routed in order          db.users.find().sort({email: -1})
Find by non shard key     Scatter Gather           db.users.find({state: "CA"})
Sorted by non shard key   Distributed merge sort   db.users.find().sort({state: 1})
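The four query cases can be expressed as a small decision function. This is a simplified, hypothetical classifier that only looks at whether the shard key appears in the filter or the sort; it is not how mongos is implemented, and it ignores cases outside the four listed.

```javascript
// Hypothetical classifier for the four cases.
//   shardKey - name of the shard key field, e.g. "email"
//   filter   - query document, e.g. { email: "jsr@10gen.com" }
//   sort     - sort document or null, e.g. { state: 1 }
function classify(shardKey, filter, sort) {
  if (shardKey in filter) return "Routed";
  if (sort && shardKey in sort) return "Routed in order";
  if (sort && Object.keys(sort).length > 0) return "Distributed merge sort";
  return "Scatter Gather";
}

console.log(classify("email", { email: "jsr@10gen.com" }, null));
console.log(classify("email", {}, { email: -1 }));
console.log(classify("email", { state: "CA" }, null));
console.log(classify("email", {}, { state: 1 }));
```

The practical takeaway is that including the shard key in a query keeps the work on as few shards as possible.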
[Figure: single data center with a Primary and two Secondaries]

[Figure: single data center with a Primary, a Secondary, and a hidden Secondary (hidden=true) used for backups]

[Figure: Active Data Center with a Primary (priority = 1) and a Secondary (priority = 1); Standby Data Center with a Secondary]

[Figure: West Coast DC with a Secondary, Central DC with the Primary (priority = 1), East Coast DC with a Secondary]
86
•   History of Database Management (http://bit.ly/w3r0dv)
•   EMC IDC Study (http://bit.ly/y1mJgJ)
•   Gartner & Big Data (http://bit.ly/xvRP3a)
•   SQL (http://en.wikipedia.org/wiki/SQL)
•   Database Management Systems (http://en.wikipedia.org/wiki/Dbms)
• Dynamo: Amazon’s Highly Available Key-value Store
    (http://bit.ly/A8F8oy)
• CAP Theorem (http://bit.ly/zvA6O6)
• NoSQL Google File System and BigTable
    (http://oreil.ly/wOXliP)
• NoSQL Movement whitepaper (http://bit.ly/A8RBuJ)
• Sample ERD diagram (http://bit.ly/xV30v)

                                                            87
• It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
   – Consistency: all nodes see the same data at the same time.
   – Availability: a guarantee that every request receives a response about whether it succeeded or failed.
   – Partition tolerance: no set of failures less than total network failure is allowed to cause the system to respond incorrectly.

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainIntroduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & Bahrain
Neo4j
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
jexp
 
Common MongoDB Use Cases
Common MongoDB Use CasesCommon MongoDB Use Cases
Common MongoDB Use Cases
DATAVERSITY
 

Was ist angesagt? (20)

A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
 
An introduction to MongoDB
An introduction to MongoDBAn introduction to MongoDB
An introduction to MongoDB
 
Introduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainIntroduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & Bahrain
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Common MongoDB Use Cases
Common MongoDB Use CasesCommon MongoDB Use Cases
Common MongoDB Use Cases
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
The Basics of MongoDB
The Basics of MongoDBThe Basics of MongoDB
The Basics of MongoDB
 
Time Series Data with InfluxDB
Time Series Data with InfluxDBTime Series Data with InfluxDB
Time Series Data with InfluxDB
 
An Enterprise Architect's View of MongoDB
An Enterprise Architect's View of MongoDBAn Enterprise Architect's View of MongoDB
An Enterprise Architect's View of MongoDB
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Data Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph DatabasesData Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph Databases
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 

An Introduction to Big Data, NoSQL and MongoDB

  • 1. Open source, high performance database Introduction to NoSQL and MongoDB Will LaForest Senior Director of 10gen Federal will@10gen.com @WLaForest 1
  • 2. Timeline, 1965–2010 (figure): IBM’s IMS; Codd publishes relational model paper in 1970; SQL invented; Oracle founded; PCs gain traction; client/server; WWW born; 3-tier architecture; dynamic web content; web applications; Brewer’s CAP born; SOA; BigTable; cloud computing; 10gen founded; NoSQL movement; “… released” (label lost)
  • 3. Attribute Tuple Relation 3
  • 4. 4
  • 5. • Data stored in an RDBMS is very compact (disk was more expensive) • SQL and RDBMS made queries flexible with rigid schemas • Rigid schemas help optimize joins and storage • Massive ecosystem of tools, libraries and integrations • Been around 40 years!
  • 6. • Gartner uses the 3Vs to define Big Data • Volume - Big/difficult/extreme volume is relative • Variety – Changing or evolving data – Uncontrolled formats – Does not easily adhere to a single schema – Unknown at design time • Velocity – High or volatile inbound data – High query and read operations – Low latency
  • 7. VOLUME & NEW ARCHITECTURES • Systems scaling horizontally, not vertically • Commodity servers • Cloud Computing DATA VARIETY & VOLATILITY • Extremely difficult to find a single fixed schema • Don’t know data schema a priori TRANSACTIONAL MODEL • N x Inserts or updates • Distributed transactions
  • 8. • Non-relational databases have been around for a long time (MUMPS?) • Modern NoSQL theory and offerings started in the early 2000s • Modern usage of the term introduced in 2009 • NoSQL = Not Only SQL • A collection of very different products • Alternatives to relational databases when they are a bad fit • Motives – Horizontally scalable (commodity server/cloud computing) – Flexibility
  • 9. • Value (data) mapped to a key (think primary key) • Some values are typed, some are just BLOBs • Redis, MemcacheDB, Voldemort • Example: “Will” → “@WLaForest”, “Chris” → 12, “Robert” → BLOB
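The key-value model on this slide can be sketched with a plain in-memory map. This is only an illustration of the access pattern, not how Redis, MemcacheDB or Voldemort are implemented; the `get` helper and the byte array standing in for a BLOB are assumptions made for the example.

```javascript
// Minimal in-memory key-value store: each key maps to exactly one value.
const store = new Map();

store.set("Will", "@WLaForest");              // string value
store.set("Chris", 12);                       // numeric value
store.set("Robert", new Uint8Array([1, 2]));  // opaque bytes, standing in for a BLOB

// Lookups go through the key only; the store cannot query inside a value.
function get(key) {
  return store.has(key) ? store.get(key) : null;
}

console.log(get("Will"));   // "@WLaForest"
console.log(get("Chris"));  // 12
console.log(get("Nobody")); // null
```

The trade-off is visible in `get`: access by key is direct, but finding every entry whose value equals 12 would require scanning the whole store.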
  • 10. Data stored on disk in a column oriented fashion • Predominantly hash based indexing • Data partitioned by range or consistent hashing • Google BigTable, HBase, Cassandra, Accumulo 10
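Consistent hashing, one of the two partitioning schemes named on this slide, can be sketched as follows. The hash function (FNV-1a), the node names and the number of virtual nodes are all assumptions chosen for the example; real systems differ in each of these details.

```javascript
// FNV-1a 32-bit hash; any well-distributed hash would do here.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Build a ring: each node is placed at several hashed positions (virtual nodes).
function buildRing(nodes, vnodes = 16) {
  const ring = [];
  for (const node of nodes) {
    for (let v = 0; v < vnodes; v++) {
      ring.push({ point: fnv1a(`${node}#${v}`), node });
    }
  }
  return ring.sort((a, b) => a.point - b.point);
}

// A key belongs to the first ring position at or after its hash, wrapping around.
function lookup(ring, key) {
  const h = fnv1a(key);
  const entry = ring.find((e) => e.point >= h);
  return (entry || ring[0]).node;
}

const ring3 = buildRing(["node-a", "node-b", "node-c"]);
const ring4 = buildRing(["node-a", "node-b", "node-c", "node-d"]);
const keys = Array.from({ length: 1000 }, (_, i) => `row-${i}`);
const moved = keys.filter((k) => lookup(ring3, k) !== lookup(ring4, k));
console.log(`${moved.length} of ${keys.length} keys moved after adding node-d`);
```

The point of the scheme is that adding a node only relocates keys onto the new node; keys never shuffle between the existing nodes.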
  • 11. • Key-Value Stores – Value (data) mapped to a key (think primary key) • Big Table Descendants – Looks like a distributed multi-dimension map – Data stored in a column oriented fashion – Predominantly hash based indexing • Document oriented stores – Data stored as either JSON or XML documents • What do they all have in common?
  • 12. • Not a database • Map reduce on HDFS or other data source • When you want to use it – Can’t use an index – Distributing custom algorithms – ETL • Great for grinding through data • Many NoSQL offerings have native map reduce functionality
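As a rough, single-process sketch of the map reduce pattern mentioned above, the word count below separates the map, shuffle and reduce steps. It runs in one process and distributes nothing; the function names are invented for the example and do not correspond to any Hadoop API.

```javascript
// Map phase: turn each input record into zero or more [key, value] pairs.
function mapPhase(records, mapFn) {
  return records.flatMap(mapFn);
}

// Shuffle: group intermediate values by key.
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce phase: collapse each key's values into one result.
function reducePhase(groups, reduceFn) {
  const out = {};
  for (const [key, values] of groups) out[key] = reduceFn(key, values);
  return out;
}

const lines = ["big data", "big table", "data store"];
const counts = reducePhase(
  shuffle(mapPhase(lines, (line) => line.split(" ").map((w) => [w, 1]))),
  (_key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(counts); // { big: 2, data: 2, table: 1, store: 1 }
```

Because the map and reduce functions only see one record or one key at a time, a real framework can run them on many machines in parallel, which is why the pattern suits work that cannot use an index.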
  • 13. 13
  • 14. • 2007 – Eliot Horowitz & Dwight Merriman tired of reinventing the wheel – 10gen founded – MongoDB Development begins • 2009 – Initial release of MongoDB • 73M+ in funding – Funded by Sequoia, NEA, Union Square Ventures, Flybridge Capital 14
  • 15. • Pre-production Subscriptions for MongoDB – Fixed cost = $30k – 6 month term, unlimited servers • MongoDB Subscriptions – 32 month term, $4k per server – 24x7 support for development and production, 1 hour response time – Onboarding call and quarterly review • Consulting – Cost is $300/hr • Training – MongoDB Developer - $1500/student, 2 day course – MongoDB Administrator - $1500/student, 2 day course – MongoDB Essential - $2250/student, 3 day course • MongoDB Monitoring Service 15
  • 16. • 4 servers with 2 processors each • Total processors equals 8 • Oracle Enterprise Edition – $47,500 per processor plus maintenance – 8 x 47,500 = $380,000 + $76,000 – Total Cost = $456,000 • MongoDB Subscription – $4,000 per server (no processor count) – No license fee or maintenance charge – 4 x 4,000 = $16,000 – Total Cost = $16,000 • MongoDB is $440,000 less expensive 16
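The arithmetic on this slide can be checked with a few lines. All figures are taken from the slide as given; the variable names are made up for the example, and the maintenance amount is used as stated rather than derived.

```javascript
const servers = 4;
const processorsPerServer = 2;
const processors = servers * processorsPerServer;       // 8

const oracleLicense = processors * 47500;               // 380000
const oracleMaintenance = 76000;                        // figure taken from the slide
const oracleTotal = oracleLicense + oracleMaintenance;  // 456000

const mongoTotal = servers * 4000;                      // 16000, priced per server

console.log(oracleTotal - mongoTotal);                  // 440000
```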
  • 17. #2 on Indeed’s Fastest Growing Jobs Jaspersoft BigData Index Demand for MongoDB, the document-oriented NoSQL database, saw the biggest spike with over 200% growth in 2011. 451 Group Google Searches “MongoDB increasing its dominance” 17
  • 18. #2 ON INDEED’S FASTEST GROWING JOBS 18
  • 19. “MongoDB INCREASING ITS DOMINANCE” 19
  • 20. • Scale horizontally over commodity hardware • RDBMSs are great, so keep what works – Rich data models – Ad hoc queries – Fully featured indexes • What doesn’t distribute well? – Long running multi-row transactions – Joins – Both artifacts of the relational data model • Do not homogenize programming interfaces • Local storage first class citizen for DB storage
  • 21. Data stored as documents (JSON) • Schema free • CRUD operations – (Create Read Update Delete) • Atomic document operations • Ad hoc Queries like SQL – Equality – Regular expression searches – Ranges – Geospatial • Secondary indexes • Sharding (sometimes called partitioning) for scalability • Replication – HA and read scalability 21
  • 22. MongoDB does not need any defined data schema. • Every document could have different data!
        {name: "will", eyes: "blue", birthplace: "NY", aliases: ["bill", "la ciacco"], gender: "???", boss: "ben"}
        {name: "jeff", eyes: "blue", height: 72, boss: "ben"}
        {name: "brendan", aliases: ["el diablo"]}
        {name: "matt", pizza: "DiGiorno", height: 72, boss: "555.555.1212"}
        {name: "ben", hat: "yes"}
  • 23. Seek = 5+ ms Read = really really fast Post Comment Author 23
  • 24. Post Author Comment Comment Comment Comment Comment 24
  • 25. RDBMS MongoDB Database Database Table Collection Row Document 25
  • 26. • We are running instances on the order of: - 100B objects - 50TB storage - 50K qps per server - ~1200 servers 26
  • 27. • SSL – between client and server – Intra-cluster communication • Authorization at the database level – Read Only/Read+Write/Administrator • Security Roadmap (tentative) – Pluggable authentication 2.4 – Auditing 2.4 – Cell level security 2.6 – Common Criteria certification 27
  • 28. 28
  • 29. var p = { author: "roger", date: new Date(), text: "Spirited Away", tags: ["Tezuka", "Manga"] }
        > db.posts.save(p)
  • 30. >db.posts.find() { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "Spirited Away", tags : [ "Tezuka", "Manga" ] } Notes: - _id is unique, but can be anything you’d like 30
  • 31. Create index on any Field in Document // 1 means ascending, -1 means descending >db.posts.ensureIndex({author: 1}) >db.posts.find({author: 'roger'}) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger", ... } 31
  • 32. • Conditional Operators – $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type – $lt, $lte, $gt, $gte
        // find posts with any tags
        > db.posts.find( { tags: { $exists: true } } )
        // find posts matching a regular expression
        > db.posts.find( { author: /^rog*/i } )
        // count posts by author
        > db.posts.find( { author: 'roger' } ).count()
  • 33. • $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
        > comment = { author: "Fred", date: new Date(), text: "Best Movie Ever" }
        > db.posts.update( { _id: "..." }, { $push: { comments: comment } } );
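To make the semantics of a few of these conditional operators concrete, here is a small in-memory matcher. It is a simplified sketch, not MongoDB's implementation: it handles only top-level scalar fields, supports only the operators listed in the table, and the `votes` field is a hypothetical addition to the sample posts.

```javascript
// Illustrative matcher for a few operators; not MongoDB's implementation.
const operators = {
  $exists: (value, flag) => (value !== undefined) === flag,
  $ne: (value, arg) => value !== arg,
  $in: (value, list) => list.includes(value),
  $nin: (value, list) => !list.includes(value),
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $size: (value, n) => Array.isArray(value) && value.length === n,
};

function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond instanceof RegExp) return typeof value === "string" && cond.test(value);
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      // Operator document such as { $gt: 5 }: every listed operator must hold.
      return Object.entries(cond).every(([op, arg]) => operators[op](value, arg));
    }
    return value === cond; // plain equality
  });
}

const posts = [
  { author: "roger", tags: ["Tezuka", "Manga"], votes: 7 },
  { author: "fred", votes: 2 },
];

console.log(posts.filter((p) => matches(p, { tags: { $exists: true } })).length); // 1
console.log(posts.filter((p) => matches(p, { author: /^rog/i })).length);         // 1
console.log(posts.filter((p) => matches(p, { votes: { $gte: 2 } })).length);      // 2
```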
  • 34. { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "Spirited Away", tags : [ "Tezuka", "Manga" ], comments : [ { author : "Fred", date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)", text : "Best Movie Ever" } ]} 34
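The effect of `$push` shown in this document can be imitated in memory. The sketch below covers `$set`, `$unset`, `$inc` and `$push` on top-level fields only and returns a copy rather than modifying stored data, so it is an illustration of the operators' meaning under those assumptions, not a model of how the server applies them. The `views` counter is hypothetical.

```javascript
// Illustrative in-memory versions of $set, $unset, $inc and $push.
function applyUpdate(doc, update) {
  const result = { ...doc }; // shallow copy so the input is left untouched
  for (const [field, value] of Object.entries(update.$set || {})) result[field] = value;
  for (const field of Object.keys(update.$unset || {})) delete result[field];
  for (const [field, by] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + by;
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    result[field] = [...(result[field] || []), value]; // creates the array if missing
  }
  return result;
}

const post = { author: "roger", text: "Spirited Away", tags: ["Tezuka", "Manga"] };
const comment = { author: "Fred", text: "Best Movie Ever" };
const updated = applyUpdate(post, { $push: { comments: comment }, $inc: { views: 1 } });

console.log(updated.comments.length); // 1
console.log(updated.views);           // 1
console.log("comments" in post);      // false
```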
  • 35. // Index nested documents
        > db.posts.ensureIndex( { "comments.author": 1 } )
        > db.posts.find( { "comments.author": "Fred" } )
        // Index on tags
        > db.posts.ensureIndex( { tags: 1 } )
        > db.posts.find( { tags: "Manga" } )
        // geospatial index
        > db.posts.ensureIndex( { "author.location": "2d" } )
        > db.posts.find( { "author.location": { $near: [22, 42] } } )
  • 36. 36
  • 37. Native Map/Reduce in JS in MongoDB – Distributes across the cluster with good data locality • New aggregation framework – Declarative (no JS required) – Pipeline approach (like Unix ps -ax | tee processes.txt | more) • Hadoop – Intersect the indexing of MongoDB with the brute force parallelization of Hadoop – Hadoop MongoDB connector
  • 38. 38
  • 39. Pipeline operators: $project $match $limit $skip $unwind $group $sort
        Sample document:
        {
          title: "this is my title",
          author: "bob",
          posted: new Date(),
          pageViews: 5,
          tags: ["fun", "good", "fun"],
          comments: [
            { author: "joe", text: "this is cool" },
            { author: "sam", text: "this is bad" }
          ],
          other: { foo: 5 }
        }
        Pipeline:
        db.article.aggregate(
          { $project: { author: 1, tags: 1 } },
          { $unwind: "$tags" },
          { $group: { _id: "$tags", authors: { $addToSet: "$author" } } }
        );
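To see what the three pipeline stages on this slide do to the sample document, the sketch below re-creates them over a plain array. The helper names `project`, `unwind` and `groupAddToSet` are invented for the example and only mimic the stages' behaviour for this simple case; they are not the aggregation framework's API.

```javascript
const articles = [
  { title: "this is my title", author: "bob", pageViews: 5, tags: ["fun", "good", "fun"] },
];

// $project: keep only the listed fields.
const project = (docs, fields) =>
  docs.map((d) => Object.fromEntries(fields.filter((f) => f in d).map((f) => [f, d[f]])));

// $unwind: emit one document per element of the array field.
const unwind = (docs, field) =>
  docs.flatMap((d) => d[field].map((item) => ({ ...d, [field]: item })));

// $group with $addToSet: collect distinct values of valueField per keyField.
function groupAddToSet(docs, keyField, valueField) {
  const groups = new Map();
  for (const d of docs) {
    if (!groups.has(d[keyField])) groups.set(d[keyField], new Set());
    groups.get(d[keyField]).add(d[valueField]);
  }
  return [...groups].map(([key, set]) => ({ _id: key, authors: [...set] }));
}

const projected = project(articles, ["author", "tags"]);
const unwound = unwind(projected, "tags");
const result = groupAddToSet(unwound, "tags", "author");
console.log(JSON.stringify(result));
// [{"_id":"fun","authors":["bob"]},{"_id":"good","authors":["bob"]}]
```

Note that the duplicate "fun" tag yields two unwound documents, but `$addToSet` semantics keep "bob" only once in that group.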
  • 40. 40
  • 41. Input data MAP Intermediate data REDUCE Output data 1 1 2 2 3 3 41
  • 42. Input data MAP Intermediate data REDUCE Output data 1 1 2 2 3 3 42
  • 43. 43
  • 44. Write Primary Read Asynchronous Secondary Replication Read Driver Secondary Read 44
  • 45. Primary Secondary Read Driver Secondary Read 45
  • 46. Primary Write Primary Automatic Read Leader Election Driver Secondary Read 46
  • 47. Secondary Read Write Read Primary Driver Secondary Read 47
  • 48. Additional Details Clients MongoS MongoS MongoS Config Config Key Range Key Range Key Range Key Range 0..30 31..60 61..90 91.. 100 Config Primary Primary Primary Primary Secondary Secondary Secondary Secondary Secondary Secondary Secondary Secondary 48
  • 49. Sharding Details • Replica Set Details • Consistency Details • Common Deployment Scenarios • Citations 49
  • 50. 50
  • 51. Write Primary Secondary Read Secondary Driver Read 51
  • 52. Write Primary Driver Read Secondary Secondary 52
  • 53. 1. Write Primary Secondary 1. Replicate 2. Read Secondary Driver 2. Read 53
  • 54. • Fire and forget • Wait for error • Wait for fsync • Wait for journal sync • Wait for replication 54
  • 55. Driver Primary write apply in memory 55
  • 56. Driver Primary write getLastError apply in memory 56
  • 57. Driver Primary write getLastError apply in memory j:true Write to journal 57
  • 58. Driver Primary write getLastError apply in memory fsync:true fsync 58
  • 59. Driver Primary Secondary write getLastError apply in memory w:2 replicate 59
  • 60. Value – Meaning • <n:integer> – Replicate to N members of replica set • “majority” – Replicate to a majority of replica set members • <m:modeName> – Use custom error mode name
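The integer and “majority” values in this table can be read as a required number of acknowledgements. The sketch below assumes every member counts equally and leaves out custom error modes; both function names are made up for the example.

```javascript
// How many acknowledgements a write needs for a given `w` value.
function requiredAcks(w, memberCount) {
  if (w === "majority") return Math.floor(memberCount / 2) + 1;
  if (Number.isInteger(w) && w >= 1) return w;
  throw new Error(`unsupported w value in this sketch: ${w}`);
}

// A write is acknowledged once enough members (primary included) have applied it.
function isAcknowledged(w, memberCount, membersWithWrite) {
  return membersWithWrite >= requiredAcks(w, memberCount);
}

console.log(requiredAcks("majority", 3)); // 2
console.log(requiredAcks("majority", 5)); // 3
console.log(isAcknowledged(2, 3, 1));     // false: only the primary has it
console.log(isAcknowledged(2, 3, 2));     // true
```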
  • 61. 61
  • 62. > db.runCommand( { shardcollection: “test.users”, key: { email: 1 }} ) { name: “Jared”, email: “jsr@10gen.com”, } { name: “Scott”, email: “scott@10gen.com”, } { name: “Dan”, email: “dan@10gen.com”, } 62
  • 63. -∞ +∞ 63
  • 64. -∞ +∞ dan@10gen.com scott@10gen.com jsr@10gen.com 64
  • 65. Split! -∞ +∞ dan@10gen.com scott@10gen.com jsr@10gen.com 65
  • 66. This is a Split! This is a chunk chunk -∞ +∞ dan@10gen.com scott@10gen.com jsr@10gen.com 66
  • 67. -∞ +∞ dan@10gen.com scott@10gen.com jsr@10gen.com 67
  • 68. Split! -∞ +∞ dan@10gen.com scott@10gen.com jsr@10gen.com 68
  • 69. Chunk table: -∞ .. adam@10gen.com → 1; adam@10gen.com .. jared@10gen.com → 1; jared@10gen.com .. scott@10gen.com → 1; scott@10gen.com .. +∞ → 1 • Stored in the config servers • Cached in mongos • Used to route requests and keep cluster balanced
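Routing by chunk range amounts to a range lookup on the shard key. The sketch below assumes half-open ranges and uses `null` for the open ends; the shard names are hypothetical and spread across four shards purely so the lookups give distinguishable answers.

```javascript
// Chunk table as a router might cache it: [min, max) ranges over the shard key.
// null stands for -∞ (as min) or +∞ (as max). Shard names are hypothetical.
const chunks = [
  { min: null, max: "adam@10gen.com", shard: "shard1" },
  { min: "adam@10gen.com", max: "jared@10gen.com", shard: "shard2" },
  { min: "jared@10gen.com", max: "scott@10gen.com", shard: "shard3" },
  { min: "scott@10gen.com", max: null, shard: "shard4" },
];

// Find the single chunk whose range contains the key.
function findChunk(key) {
  return chunks.find(
    (c) => (c.min === null || key >= c.min) && (c.max === null || key < c.max)
  );
}

console.log(findChunk("dan@10gen.com").shard); // "shard2"
console.log(findChunk("jsr@10gen.com").shard); // "shard3"
console.log(findChunk("zed@10gen.com").shard); // "shard4"
```

Because the ranges cover the whole key space without overlap, every key resolves to exactly one chunk, which is what lets a router send a shard-key query to a single shard.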
  • 70. mongos config balancer config Chunks! config 1 2 3 4 13 14 15 16 25 26 27 28 37 38 39 40 5 6 7 8 17 18 19 20 29 30 31 32 41 42 43 44 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4 70
  • 71. mongos config balancer config Imbalance config 1 2 3 4 5 6 7 8 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4 71
  • 72. mongos config balancer config Move chunk 1 to config Shard 2 1 2 3 4 5 6 7 8 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4 72
  • 73. mongos config balancer config config 1 2 3 4 5 6 7 8 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4 73
  • 74. mongos config balancer config Chunks 1,2, and 3 have migrated config 4 5 6 7 8 1 2 3 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4 74
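The balancing idea in these slides can be sketched as a loop that moves one chunk at a time from the fullest shard to the emptiest. This is a deliberately naive policy written for illustration; the threshold, the choice of which chunk to move and the target shard are assumptions and do not reproduce the real balancer (the slide, for instance, shows chunks 1, 2 and 3 all going to Shard 2).

```javascript
// Move one chunk at a time from the fullest shard to the emptiest one
// until the difference in chunk counts is at most `threshold`.
function balance(shards, threshold = 1) {
  const moves = [];
  for (;;) {
    const names = Object.keys(shards);
    const fullest = names.reduce((a, b) => (shards[b].length > shards[a].length ? b : a));
    const emptiest = names.reduce((a, b) => (shards[b].length < shards[a].length ? b : a));
    if (shards[fullest].length - shards[emptiest].length <= threshold) return moves;
    const chunk = shards[fullest].shift();
    shards[emptiest].push(chunk);
    moves.push({ chunk, from: fullest, to: emptiest });
  }
}

// Starting layout from the "Imbalance" slide: Shard 1 holds 12 chunks, the others 4 each.
const shards = {
  shard1: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
  shard2: [21, 22, 23, 24],
  shard3: [33, 34, 35, 36],
  shard4: [45, 46, 47, 48],
};
const moves = balance(shards);
console.log(moves.map((m) => `${m.chunk}->${m.to}`).join(", "));
console.log(Object.values(shards).map((s) => s.length)); // [ 6, 6, 6, 6 ]
```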
  • 75. 75
  • 76. 1. Query arrives at mongos 2. mongos routes query to a single shard 3. Shard returns results of query 4. Results returned to client (diagram: mongos, Shard 1, Shard 2, Shard 3)
  • 77. 1. Query arrives at mongos 2. mongos broadcasts query to all shards 3. Each shard returns results for query 4. Results combined and returned to client (diagram: mongos, Shard 1, Shard 2, Shard 3)
  • 78. 1. Query arrives at mongos 2. mongos broadcasts query to all shards 3. Each shard locally sorts results 4. Results returned to mongos 5. mongos merge sorts individual results 6. Combined sorted result returned to client (diagram: mongos, Shard 1, Shard 2, Shard 3)
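The merge step of the distributed sort (step 5 above) is an ordinary k-way merge of lists that are already sorted. The sketch below assumes each shard has returned its results sorted by `state`; the sample documents and the `mergeSorted` helper are invented for the example.

```javascript
// Merge already-sorted per-shard result lists into one sorted list.
function mergeSorted(lists, compare) {
  const positions = lists.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (positions[i] >= lists[i].length) continue; // this shard is exhausted
      if (best === -1 || compare(lists[i][positions[i]], lists[best][positions[best]]) < 0) {
        best = i;
      }
    }
    if (best === -1) return merged;
    merged.push(lists[best][positions[best]++]);
  }
}

// Each shard has sorted its own matches by `state` (step 3 on the slide).
const shardResults = [
  [{ name: "Ann", state: "AK" }, { name: "Jared", state: "CA" }],
  [{ name: "Dan", state: "AZ" }, { name: "Scott", state: "NY" }],
  [{ name: "Eve", state: "CO" }],
];
const byState = (a, b) => (a.state < b.state ? -1 : a.state > b.state ? 1 : 0);
const sorted = mergeSorted(shardResults, byState);
console.log(sorted.map((d) => d.state).join(" ")); // AK AZ CA CO NY
```

The router never re-sorts the full result set; it only compares the current head of each shard's list, so the cost stays proportional to the number of results times the number of shards.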
  • 79. • Inserts – Requires shard key: db.users.insert({ name: "Jared", email: "jsr@10gen.com" })
        • Removes – Routed: db.users.remove({ email: "jsr@10gen.com" }) – Scattered: db.users.remove({ name: "Jared" })
        • Updates – Routed: db.users.update({ email: "jsr@10gen.com" }, { $set: { state: "CA" } }) – Scattered: db.users.update({ state: "FZ" }, { $set: { state: "CA" } })
  • 80. • By shard key – Routed: db.users.find({ email: "jsr@10gen.com" })
        • Sorted by shard key – Routed in order: db.users.find().sort({ email: -1 })
        • Find by non shard key – Scatter gather: db.users.find({ state: "CA" })
        • Sorted by non shard key – Distributed merge sort: db.users.find().sort({ state: 1 })
  • 81. 81
  • 82. Data Center Primary Secondary Secondary 82
  • 83. Data Center Primary Secondary Secondary hidden=true backups 83
  • 84. Active Data Center Standby Data Center Primary Secondary Secondary priority = 1 priority = 1 84
  • 85. West Coast DC Central DC East Coast DC Secondary Primary Secondary priority = 1 85
  • 86. 86
  • 87. • History of Database Management (http://bit.ly/w3r0dv) • EMC IDC Study (http://bit.ly/y1mJgJ) • Gartner & Big Data (http://bit.ly/xvRP3a) • SQL (http://en.wikipedia.org/wiki/SQL) • Database Management Systems (http://en.wikipedia.org/wiki/Dbms) • Dynamo: Amazon’s Highly Available Key-value Store (http://bit.ly/A8F8oy) • CAP Theorem (http://bit.ly/zvA6O6) • NoSQL Google File System and BigTable (http://oreil.ly/wOXliP) • NoSQL Movement whitepaper (http://bit.ly/A8RBuJ) • Sample ERD diagram (http://bit.ly/xV30v)
  • 88. 88
  • 89. • Impossible for a distributed computer system to simultaneously provide all three of the following guarantees – Consistency - All nodes see the same data at the same time. – Availability - A guarantee that every request receives a response about whether it was successful or failed. – Partition tolerance - No set of failures less than total network failure is allowed to cause the system to respond incorrectly 89

Editor's notes

  1. Database requirements are changing because of i) volume, ii) type of data, iii) agile development, iv) new architectures, v) new apps
  2. Detailed explanation of sharding in optional slides