SlideShare ist ein Scribd-Unternehmen logo
1 von 25
NoSQL Databases


                    J. Pokorný
                    KSI MFF UK
                    pokorny@ksi.mff.cuni.cz



DATAKON 15.-18.9.2011                         1
Introduction
   NoSQL databases in context of cloud computing
   Relaxing ACID properties
   NoSQL databases
        Categories of NoSQL databases
        Typical NoSQL API
        Representatives of NoSQL databases
   Conclusions




DATAKON 2011                  J. Pokorný            2
Cloud computing, cloud databases
   Cloud computing
       data intensive applications on hundreds of thousands of
        commodity servers and storage devices
       basic features:
              elasticity,
              fault-tolerance
              automatic provisioning
   Cloud databases: traditional scaling up (adding new
    expensive big servers) is not possible
     requires higher level of skills

     is not reliable in some cases

   Architectural principle: scaling out (or horizontal
    scaling) based on data partitioning, i.e. dividing the
    database across many (inexpensive) machines

DATAKON 2011                       J. Pokorný                     3
Cloud computing, cloud databases

   Technique: data sharding, i.e. horizontal
    partitioning of data (e.g. hash or range partitioning)
   Consequences:
       manage parallel access in the application
       scales well for both reads and writes
       not transparent, application needs to be partition-aware




DATAKON 2011                    J. Pokorný                         4
Relaxing ACID properties
   Cloud computing: ACID is hard to achieve, moreover, it is
    not always required, e.g. for blogs, status updates, product
    listings, etc.
   Availability
        Traditionally, thought of as the server/process available
         99.999 % of time
        For a large-scale node system, there is a high probability that
         a node is either down or that there is a network partitioning

    Partition tolerance
        ensures that write and read operations are redirected to
         available replicas when segments of the network become
         disconnected
DATAKON 2011                      J. Pokorný                               5
Eventual Consistency
   Eventual Consistency
       When no updates occur for a long period of time, eventually
        all updates will propagate through the system and all the
        nodes will be consistent
       For a given accepted update and a given node, eventually
        either the update reaches the node or the node is removed
        from service
   BASE (Basically Available, Soft state, Eventual
    consistency) properties, as opposed to ACID
              Soft state: copies of a data item may be inconsistent
              Eventually Consistent – copies becomes consistent at some
               later time if there are no more updates to that data item
              Basically Available – possibilities of faults but not a fault of
               the whole system
DATAKON 2011                           J. Pokorný                                 6
CAP Theorem
   Suppose three properties of a system
       Consistency (all copies have same value)
       Availability (system can run even if parts have failed)
       Partitions (network can break into two or more parts, each with
        active systems that can not influence other parts)
   Brewer’s CAP “Theorem”: for any system sharing data it
    is impossible to guarantee simultaneously all of these
    three properties
   Very large systems will partition at some point
       it is necessary to decide between C and A
       traditional DBMS prefer C over A and P
       most Web applications choose A (except in specific applications
        such as order processing)

DATAKON 2011                       J. Pokorný                             7
CAP Theorem
   Drop A or C of ACID
       relaxing C makes replication easy, facilitates fault
        tolerance,
       relaxing A reduces (or eliminates) need for distributed
        concurrency control.




DATAKON 2011                   J. Pokorný                         8
NoSQL databases
   The name stands for Not Only SQL
   Common features:
       non-relational
       usually do not require a fixed table schema
       horizontal scalable
       mostly open source
   More characteristics
       relax one or more of the ACID properties (see CAP theorem)
       replication support
       easy API (if SQL, then only its very restricted variant)
   Do not fully support relational features
       no join operations (except within partitions),
       no referential integrity constraints across partitions.


DATAKON 2011                         J. Pokorný                      9
Categories of NoSQL databases
     key-value stores
     column NoSQL databases
     document-based
     XML databases (myXMLDB, Tamino,
      Sedna)
     graph database (neo4j, InfoGrid)




DATAKON 2011          J. Pokorný         10
Categories of NoSQL databases
     key-value stores
     column NoSQL databases
     document-based
     XML databases (myXMLDB, Tamino, Sedna)
     graph database (neo4j, InfoGrid)




DATAKON 2011         J. Pokorný           11
Key-Value Data Stores

   Example: SimpleDB
       Based on Amazon’s Single Storage Service (S3)
       items (represent objects) having one or more
        pairs (name, value), where name denotes an
        attribute.
       An attribute can have multiple values.
       items are combined into domains.




DATAKON 2011               J. Pokorný                   12
Column-oriented*
   store data in column order
   allow key-value pairs to be stored (and retrieved on
    key) in a massively parallel system
       data model: families of attributes defined in a schema, new
        attributes can be added
       storing principle: big hashed distributed tables
       properties: partitioning (horizontally and/or vertically), high
        availability etc. completely transparent to application



* Better: extendible records

DATAKON 2011                      J. Pokorný                          13
Column-oriented                                                column family


                  “Contents:”         “anchor:cnnsi.com”        “anchor:my.look.ca”



                       <html>             t3
“mff.ksi.www”
                      <html>
                     <html>          t5
                                               “MFF”       t9         “MFF.cz”        t8
                    <html>      t6




    Example: BigTable
         indexed by row key, column key and timestamp. i.e. (row:
          string , column: string , time: int64 )  String.
         rows are ordered in lexicographic order by row key.
         row range for a table is dynamically partitioned, each row
          range is called a tablet.
         columns: syntax is family:qualifier

 DATAKON 2011                         J. Pokorný                                      14
A table representation of a row in
  BigTable
 Row key         Time stamp   Column name               Column family Grandchildren

http://ksi....         t1        "Jack"            "Claire" 7
                       t2        "Jack"            "Claire" 7   "Barbara" 6


                       t3        "Jack"            "Claire" 7   "Barbara" 6   "Magda" 3




  DATAKON 2011                        J. Pokorný                                      15
Column-oriented

   Example: Cassandra
       keyspace: Usually the name of the application;
        e.g., 'Twitter', 'Wordpress‘.
       column family: structure containing an unlimited
        number of rows
       column: a tuple with name, value and time stamp
       key: name of record
       super column: contains more columns



DATAKON 2011                J. Pokorný                     16
Document-based
   based on JSON format: a data model which
    supports lists, maps, dates, Boolean with nesting
   Really: indexed semistructured documents
   Example: Mongo
       {Name:"Jaroslav",
        Address:"Malostranske nám. 25, 118 00 Praha 1“
        Grandchildren: [Claire: "7", Barbara: "6", "Magda: "3", "Kirsten:
           "1", "Otis: "3", Richard: "1"]
        }




DATAKON 2011                        J. Pokorný                              17
Typical NoSQL API
   Basic API access:
       get(key) -- Extract the value given a key
       put(key, value) -- Create or update the value
        given its key
       delete(key) -- Remove the key and its
        associated value
       execute(key, operation, parameters) --
        Invoke an operation to the value (given its
        key) which is a special data structure (e.g.
        List, Set, Map .... etc).
DATAKON 2011               J. Pokorný                   18
Representatives of NoSQL databases
key-valued
  Name          Producer              Data model                         Querying


SimpleDB       Amazon       set of couples (key, {attribute}), restricted SQL; select, delete,
                            where attribute is a couple        GetAttributes, and
                            (name, value)                      PutAttributes operations

Redis          Salvatore    set of couples (key, value),       primitive operations for each
               Sanfilippo   where value is simple typed        value type
                            value, list, ordered (according to
                            ranking) or unordered set, hash
                            value
Dynamo         Amazon       like SimpleDB                      simple get operation and put
                                                               in a context
Voldemort      LinkeId      like SimpleDB                      similar to Dynamo


DATAKON 2011                                J. Pokorný                                     19
Representatives of NoSQL databases
 column-oriented
  Name          Producer             Data model                      Querying


BigTable        Google     set of couples (key, {value})   selection (by combination of
                                                           row, column, and time stamp
                                                           ranges)
HBase           Apache     groups of columns (a BigTable   JRUBY IRB-based shell
                           clone)                          (similar to SQL)
Hypertable      Hypertable like BigTable                   HQL (Hypertext Query
                                                           Language)
CASSANDRA Apache           columns, groups of columns      simple selections on key,
          (originally      corresponding to a key          range queries, column or
          Facebook)        (supercolumns)                  columns ranges
PNUTS     Yahoo            (hashed or ordered) tables,     selection and projection from a
                           typed arrays, flexible schema   single table (retrieve an
                                                           arbitrary single record by
                                                           primary key, range queries,
                                                           complex predicates, ordering,
                                                           top-k)

 DATAKON 2011                             J. Pokorný                                  20
Representatives of NoSQL databases
document-based
    Name       Producer          Data model                     Querying


MongoDB        10gen        object-structured           manipulations with objects in
                            documents stored in         collections (find object or
                            collections;                objects via simple selections
                            each object has a primary   and logical expressions,
                            key called ObjectId         delete, update,)

Couchbase      Couchbase1   document as a list of       by key and key range, views
                            named (structured) items    via Javascript and
                            (JSON document)             MapReduce



after merging Membase and CouchOne
1




DATAKON 2011                          J. Pokorný                                    21
DATAKON 2011   J. Pokorný   22
Conclusions (1)
   NoSQL database cover only a part of data-intensive
    cloud applications (mainly Web applications).
   Problems with cloud computing:
       SaaS applications require enterprise-level functionality,
        including ACID transactions, security, and other features
        associated with commercial RDBMS technology, i.e.
        NoSQL should not be the only option in the cloud.
       Hybrid solutions:
              Voldemort with MySQL as one of storage backend
              deal with NoSQL data as semistructured data
               ⇒ integrating RDBMS and NoSQL via SQL/XML

DATAKON 2011                       J. Pokorný                       23
Conclusions (2)
   next generation of highly scalable and elastic
    RDBMS: NewSQL databases (from April 2011)
       they are designed to scale out horizontally on shared
        nothing machines,
       still provide ACID guarantees,
       applications interact with the database primarily using SQL,
       the system employs a lock-free concurrency control
        scheme to avoid user shut down,
       the system provides higher performance than available
        from the traditional systems.
   Examples: MySQL Cluster (most mature solution),
    VoltDB, Clustrix, ScalArc, …

DATAKON 2011                    J. Pokorný                         24
Conclusions (3)
   New buzzword: SPRAIN – 6 key factors for
    alternative data management:
       Scalability
       Performance
       relaxed consistency
       Agility
       Intracacy
       Necessity
   Conclusion in conclusions: These are exciting times
    for Data Management research and development

DATAKON 2011                  J. Pokorný              25

Weitere ähnliche Inhalte

Was ist angesagt?

BGOUG 2012 - Design concepts for xml applications that will perform
BGOUG 2012 - Design concepts for xml applications that will performBGOUG 2012 - Design concepts for xml applications that will perform
BGOUG 2012 - Design concepts for xml applications that will performMarco Gralike
 
Got database access? Own the network!
Got database access? Own the network!Got database access? Own the network!
Got database access? Own the network!Bernardo Damele A. G.
 
Sql server difference faqs- 6
Sql server difference faqs- 6Sql server difference faqs- 6
Sql server difference faqs- 6Umar Ali
 
BGOUG 2012 - XML Index Strategies
BGOUG 2012 - XML Index StrategiesBGOUG 2012 - XML Index Strategies
BGOUG 2012 - XML Index StrategiesMarco Gralike
 
OpenDremel's Metaxa Architecture
OpenDremel's Metaxa ArchitectureOpenDremel's Metaxa Architecture
OpenDremel's Metaxa ArchitectureCamuel Gilyadov
 
Oracle Database 11g Release 2 - XMLDB New Features
Oracle Database 11g Release 2 - XMLDB New FeaturesOracle Database 11g Release 2 - XMLDB New Features
Oracle Database 11g Release 2 - XMLDB New FeaturesMarco Gralike
 
9780538745840 ppt ch07
9780538745840 ppt ch079780538745840 ppt ch07
9780538745840 ppt ch07Terry Yoast
 
Jdbc 4.0 New Features And Enhancements
Jdbc 4.0 New Features And EnhancementsJdbc 4.0 New Features And Enhancements
Jdbc 4.0 New Features And Enhancementsscacharya
 
Oracle Database 12c - New Features for Developers and DBAs
Oracle Database 12c  - New Features for Developers and DBAsOracle Database 12c  - New Features for Developers and DBAs
Oracle Database 12c - New Features for Developers and DBAsAlex Zaballa
 
SQL injection: Not Only AND 1=1 (updated)
SQL injection: Not Only AND 1=1 (updated)SQL injection: Not Only AND 1=1 (updated)
SQL injection: Not Only AND 1=1 (updated)Bernardo Damele A. G.
 
Design Concepts For Xml Applications That Will Perform
Design Concepts For Xml Applications That Will PerformDesign Concepts For Xml Applications That Will Perform
Design Concepts For Xml Applications That Will PerformMarco Gralike
 
Advanced SQL injection to operating system full control (short version)
Advanced SQL injection to operating system full control (short version)Advanced SQL injection to operating system full control (short version)
Advanced SQL injection to operating system full control (short version)Bernardo Damele A. G.
 
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)Dave Stokes
 
Hotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured DataHotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured DataMarco Gralike
 
Db2 Important questions to read
Db2 Important questions to readDb2 Important questions to read
Db2 Important questions to readPrasanth Dusi
 

Was ist angesagt? (19)

BGOUG 2012 - Design concepts for xml applications that will perform
BGOUG 2012 - Design concepts for xml applications that will performBGOUG 2012 - Design concepts for xml applications that will perform
BGOUG 2012 - Design concepts for xml applications that will perform
 
Got database access? Own the network!
Got database access? Own the network!Got database access? Own the network!
Got database access? Own the network!
 
Sql server difference faqs- 6
Sql server difference faqs- 6Sql server difference faqs- 6
Sql server difference faqs- 6
 
DataBase Management System Lab File
DataBase Management System Lab FileDataBase Management System Lab File
DataBase Management System Lab File
 
BGOUG 2012 - XML Index Strategies
BGOUG 2012 - XML Index StrategiesBGOUG 2012 - XML Index Strategies
BGOUG 2012 - XML Index Strategies
 
OpenDremel's Metaxa Architecture
OpenDremel's Metaxa ArchitectureOpenDremel's Metaxa Architecture
OpenDremel's Metaxa Architecture
 
Oracle Database 11g Release 2 - XMLDB New Features
Oracle Database 11g Release 2 - XMLDB New FeaturesOracle Database 11g Release 2 - XMLDB New Features
Oracle Database 11g Release 2 - XMLDB New Features
 
9780538745840 ppt ch07
9780538745840 ppt ch079780538745840 ppt ch07
9780538745840 ppt ch07
 
Sqlmap
SqlmapSqlmap
Sqlmap
 
Postgresql
PostgresqlPostgresql
Postgresql
 
Jdbc 4.0 New Features And Enhancements
Jdbc 4.0 New Features And EnhancementsJdbc 4.0 New Features And Enhancements
Jdbc 4.0 New Features And Enhancements
 
Oracle Database 12c - New Features for Developers and DBAs
Oracle Database 12c  - New Features for Developers and DBAsOracle Database 12c  - New Features for Developers and DBAs
Oracle Database 12c - New Features for Developers and DBAs
 
SQL injection: Not Only AND 1=1 (updated)
SQL injection: Not Only AND 1=1 (updated)SQL injection: Not Only AND 1=1 (updated)
SQL injection: Not Only AND 1=1 (updated)
 
Design Concepts For Xml Applications That Will Perform
Design Concepts For Xml Applications That Will PerformDesign Concepts For Xml Applications That Will Perform
Design Concepts For Xml Applications That Will Perform
 
Advanced SQL injection to operating system full control (short version)
Advanced SQL injection to operating system full control (short version)Advanced SQL injection to operating system full control (short version)
Advanced SQL injection to operating system full control (short version)
 
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)
 
Hotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured DataHotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured Data
 
Db2 Important questions to read
Db2 Important questions to readDb2 Important questions to read
Db2 Important questions to read
 
Sqlmap
SqlmapSqlmap
Sqlmap
 

Ähnlich wie NoSql databases

Cassandra - An Introduction
Cassandra - An IntroductionCassandra - An Introduction
Cassandra - An IntroductionMikio L. Braun
 
OrientDB introduction - NoSQL
OrientDB introduction - NoSQLOrientDB introduction - NoSQL
OrientDB introduction - NoSQLLuca Garulli
 
Object Relational Mapping with LINQ To SQL
Object Relational Mapping with LINQ To SQLObject Relational Mapping with LINQ To SQL
Object Relational Mapping with LINQ To SQLShahriar Hyder
 
Oracle Database 12c "New features"
Oracle Database 12c "New features" Oracle Database 12c "New features"
Oracle Database 12c "New features" Anar Godjaev
 
Linq to xml
Linq to xmlLinq to xml
Linq to xmlMickey
 
Introduction to NoSQL and Couchbase
Introduction to NoSQL and CouchbaseIntroduction to NoSQL and Couchbase
Introduction to NoSQL and CouchbaseDipti Borkar
 
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...DataStax
 
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012Dipti Borkar
 
Transition from relational to NoSQL Philly DAMA Day
Transition from relational to NoSQL Philly DAMA DayTransition from relational to NoSQL Philly DAMA Day
Transition from relational to NoSQL Philly DAMA DayDipti Borkar
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Helena Edelson
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkDatabricks
 
SQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerSQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerMichael Rys
 
Building and deploying large scale real time news system with my sql and dist...
Building and deploying large scale real time news system with my sql and dist...Building and deploying large scale real time news system with my sql and dist...
Building and deploying large scale real time news system with my sql and dist...Tao Cheng
 

Ähnlich wie NoSql databases (20)

Cassandra - An Introduction
Cassandra - An IntroductionCassandra - An Introduction
Cassandra - An Introduction
 
OrientDB introduction - NoSQL
OrientDB introduction - NoSQLOrientDB introduction - NoSQL
OrientDB introduction - NoSQL
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
Oracle's history
Oracle's historyOracle's history
Oracle's history
 
Object Relational Mapping with LINQ To SQL
Object Relational Mapping with LINQ To SQLObject Relational Mapping with LINQ To SQL
Object Relational Mapping with LINQ To SQL
 
Oracle Database 12c "New features"
Oracle Database 12c "New features" Oracle Database 12c "New features"
Oracle Database 12c "New features"
 
Interoperability
InteroperabilityInteroperability
Interoperability
 
No sql databases
No sql databasesNo sql databases
No sql databases
 
Linq to xml
Linq to xmlLinq to xml
Linq to xml
 
Introduction to NoSQL and Couchbase
Introduction to NoSQL and CouchbaseIntroduction to NoSQL and Couchbase
Introduction to NoSQL and Couchbase
 
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
 
Java se7 features
Java se7 featuresJava se7 features
Java se7 features
 
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
 
Transition from relational to NoSQL Philly DAMA Day
Transition from relational to NoSQL Philly DAMA DayTransition from relational to NoSQL Philly DAMA Day
Transition from relational to NoSQL Philly DAMA Day
 
MongoDB
MongoDBMongoDB
MongoDB
 
Big data
Big dataBig data
Big data
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
SQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerSQL and NoSQL in SQL Server
SQL and NoSQL in SQL Server
 
Building and deploying large scale real time news system with my sql and dist...
Building and deploying large scale real time news system with my sql and dist...Building and deploying large scale real time news system with my sql and dist...
Building and deploying large scale real time news system with my sql and dist...
 

Mehr von Murat Çakal

Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
Mongodb open source_high_performance_database
Mongodb open source_high_performance_databaseMongodb open source_high_performance_database
Mongodb open source_high_performance_databaseMurat Çakal
 
Building web applications with mongo db presentation
Building web applications with mongo db presentationBuilding web applications with mongo db presentation
Building web applications with mongo db presentationMurat Çakal
 
Trouble with nosql_dbs
Trouble with nosql_dbsTrouble with nosql_dbs
Trouble with nosql_dbsMurat Çakal
 

Mehr von Murat Çakal (9)

REST vs. SOAP
REST vs. SOAPREST vs. SOAP
REST vs. SOAP
 
Cassandra NoSQL
Cassandra NoSQLCassandra NoSQL
Cassandra NoSQL
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Mongodb open source_high_performance_database
Mongodb open source_high_performance_databaseMongodb open source_high_performance_database
Mongodb open source_high_performance_database
 
Building web applications with mongo db presentation
Building web applications with mongo db presentationBuilding web applications with mongo db presentation
Building web applications with mongo db presentation
 
Wmware NoSQL
Wmware NoSQLWmware NoSQL
Wmware NoSQL
 
Trouble with nosql_dbs
Trouble with nosql_dbsTrouble with nosql_dbs
Trouble with nosql_dbs
 
RDBMS vs NoSQL
RDBMS vs NoSQLRDBMS vs NoSQL
RDBMS vs NoSQL
 
No sql
No sqlNo sql
No sql
 

NoSql databases

  • 1. NoSQL Databases J. Pokorný KSI MFF UK pokorny@ksi.mff.cuni.cz DATAKON 15.-18.9.2011 1
  • 2. Introduction  NoSQL databases in context of cloud computing  Relaxing ACID properties  NoSQL databases  Categories of NoSQL databases  Typical NoSQL API  Representatives of NoSQL databases  Conclusions DATAKON 2011 J. Pokorný 2
  • 3. Cloud computing, cloud databases  Cloud computing  data intensive applications on hundreds of thousands of commodity servers and storage devices  basic features:  elasticity,  fault-tolerance  automatic provisioning  Cloud databases: traditional scaling up (adding new expensive big servers) is not possible  requires higher level of skills  is not reliable in some cases  Architectural principle: scaling out (or horizontal scaling) based on data partitioning, i.e. dividing the database across many (inexpensive) machines DATAKON 2011 J. Pokorný 3
  • 4. Cloud computing, cloud databases  Technique: data sharding, i.e. horizontal partitioning of data (e.g. hash or range partitioning)  Consequences:  manage parallel access in the application  scales well for both reads and writes  not transparent, application needs to be partition-aware DATAKON 2011 J. Pokorný 4
  • 5. Relaxing ACID properties  Cloud computing: ACID is hard to achieve, moreover, it is not always required, e.g. for blogs, status updates, product listings, etc.  Availability  Traditionally, thought of as the server/process available 99.999 % of time  For a large-scale node system, there is a high probability that a node is either down or that there is a network partitioning  Partition tolerance  ensures that write and read operations are redirected to available replicas when segments of the network become disconnected DATAKON 2011 J. Pokorný 5
  • 6. Eventual Consistency  Eventual Consistency  When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent  For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service  BASE (Basically Available, Soft state, Eventual consistency) properties, as opposed to ACID  Soft state: copies of a data item may be inconsistent  Eventually Consistent – copies becomes consistent at some later time if there are no more updates to that data item  Basically Available – possibilities of faults but not a fault of the whole system DATAKON 2011 J. Pokorný 6
  • 7. CAP Theorem  Suppose three properties of a system  Consistency (all copies have same value)  Availability (system can run even if parts have failed)  Partitions (network can break into two or more parts, each with active systems that can not influence other parts)  Brewer’s CAP “Theorem”: for any system sharing data it is impossible to guarantee simultaneously all of these three properties  Very large systems will partition at some point  it is necessary to decide between C and A  traditional DBMS prefer C over A and P  most Web applications choose A (except in specific applications such as order processing) DATAKON 2011 J. Pokorný 7
  • 8. CAP Theorem  Drop A or C of ACID  relaxing C makes replication easy, facilitates fault tolerance,  relaxing A reduces (or eliminates) need for distributed concurrency control. DATAKON 2011 J. Pokorný 8
  • 9. NoSQL databases  The name stands for Not Only SQL  Common features:  non-relational  usually do not require a fixed table schema  horizontal scalable  mostly open source  More characteristics  relax one or more of the ACID properties (see CAP theorem)  replication support  easy API (if SQL, then only its very restricted variant)  Do not fully support relational features  no join operations (except within partitions),  no referential integrity constraints across partitions. DATAKON 2011 J. Pokorný 9
  • 10. Categories of NoSQL databases  key-value stores  column NoSQL databases  document-based  XML databases (myXMLDB, Tamino, Sedna)  graph database (neo4j, InfoGrid) DATAKON 2011 J. Pokorný 10
  • 11. Categories of NoSQL databases  key-value stores  column NoSQL databases  document-based  XML databases (myXMLDB, Tamino, Sedna)  graph database (neo4j, InfoGrid) DATAKON 2011 J. Pokorný 11
  • 12. Key-Value Data Stores  Example: SimpleDB  Based on Amazon’s Single Storage Service (S3)  items (represent objects) having one or more pairs (name, value), where name denotes an attribute.  An attribute can have multiple values.  items are combined into domains. DATAKON 2011 J. Pokorný 12
  • 13. Column-oriented*  store data in column order  allow key-value pairs to be stored (and retrieved on key) in a massively parallel system  data model: families of attributes defined in a schema, new attributes can be added  storing principle: big hashed distributed tables  properties: partitioning (horizontally and/or vertically), high availability etc. completely transparent to application * Better: extendible records DATAKON 2011 J. Pokorný 13
  • 14. Column-oriented column family “Contents:” “anchor:cnnsi.com” “anchor:my.look.ca” <html> t3 “mff.ksi.www” <html> <html> t5 “MFF” t9 “MFF.cz” t8 <html> t6  Example: BigTable  indexed by row key, column key and timestamp. i.e. (row: string , column: string , time: int64 )  String.  rows are ordered in lexicographic order by row key.  row range for a table is dynamically partitioned, each row range is called a tablet.  columns: syntax is family:qualifier DATAKON 2011 J. Pokorný 14
  • 15. A table representation of a row in BigTable Row key Time stamp Column name Column family Grandchildren http://ksi.... t1 "Jack" "Claire" 7 t2 "Jack" "Claire" 7 "Barbara" 6 t3 "Jack" "Claire" 7 "Barbara" 6 "Magda" 3 DATAKON 2011 J. Pokorný 15
  • 16. Column-oriented  Example: Cassandra  keyspace: Usually the name of the application; e.g., 'Twitter', 'Wordpress‘.  column family: structure containing an unlimited number of rows  column: a tuple with name, value and time stamp  key: name of record  super column: contains more columns DATAKON 2011 J. Pokorný 16
  • 17. Document-based  based on JSON format: a data model which supports lists, maps, dates, Boolean with nesting  Really: indexed semistructured documents  Example: Mongo  {Name:"Jaroslav", Address:"Malostranske nám. 25, 118 00 Praha 1“ Grandchildren: [Claire: "7", Barbara: "6", "Magda: "3", "Kirsten: "1", "Otis: "3", Richard: "1"] } DATAKON 2011 J. Pokorný 17
  • 18. Typical NoSQL API  Basic API access:  get(key) -- Extract the value given a key  put(key, value) -- Create or update the value given its key  delete(key) -- Remove the key and its associated value  execute(key, operation, parameters) -- Invoke an operation to the value (given its key) which is a special data structure (e.g. List, Set, Map .... etc). DATAKON 2011 J. Pokorný 18
  • 19. Representatives of NoSQL databases key-valued Name Producer Data model Querying SimpleDB Amazon set of couples (key, {attribute}), restricted SQL; select, delete, where attribute is a couple GetAttributes, and (name, value) PutAttributes operations Redis Salvatore set of couples (key, value), primitive operations for each Sanfilippo where value is simple typed value type value, list, ordered (according to ranking) or unordered set, hash value Dynamo Amazon like SimpleDB simple get operation and put in a context Voldemort LinkeId like SimpleDB similar to Dynamo DATAKON 2011 J. Pokorný 19
  • 20. Representatives of NoSQL databases column-oriented Name Producer Data model Querying BigTable Google set of couples (key, {value}) selection (by combination of row, column, and time stamp ranges) HBase Apache groups of columns (a BigTable JRUBY IRB-based shell clone) (similar to SQL) Hypertable Hypertable like BigTable HQL (Hypertext Query Language) CASSANDRA Apache columns, groups of columns simple selections on key, (originally corresponding to a key range queries, column or Facebook) (supercolumns) columns ranges PNUTS Yahoo (hashed or ordered) tables, selection and projection from a typed arrays, flexible schema single table (retrieve an arbitrary single record by primary key, range queries, complex predicates, ordering, top-k) DATAKON 2011 J. Pokorný 20
  • 21. Representatives of NoSQL databases document-based Name Producer Data model Querying MongoDB 10gen object-structured manipulations with objects in documents stored in collections (find object or collections; objects via simple selections each object has a primary and logical expressions, key called ObjectId delete, update,) Couchbase Couchbase1 document as a list of by key and key range, views named (structured) items via Javascript and (JSON document) MapReduce after merging Membase and CouchOne 1 DATAKON 2011 J. Pokorný 21
  • 22. DATAKON 2011 J. Pokorný 22
  • 23. Conclusions (1)  NoSQL database cover only a part of data-intensive cloud applications (mainly Web applications).  Problems with cloud computing:  SaaS applications require enterprise-level functionality, including ACID transactions, security, and other features associated with commercial RDBMS technology, i.e. NoSQL should not be the only option in the cloud.  Hybrid solutions:  Voldemort with MySQL as one of storage backend  deal with NoSQL data as semistructured data ⇒ integrating RDBMS and NoSQL via SQL/XML DATAKON 2011 J. Pokorný 23
  • 24. Conclusions (2)  next generation of highly scalable and elastic RDBMS: NewSQL databases (from April 2011)  they are designed to scale out horizontally on shared nothing machines,  still provide ACID guarantees,  applications interact with the database primarily using SQL,  the system employs a lock-free concurrency control scheme to avoid user shut down,  the system provides higher performance than available from the traditional systems.  Examples: MySQL Cluster (most mature solution), VoltDB, Clustrix, ScalArc, … DATAKON 2011 J. Pokorný 24
  • 25. Conclusions (3)  New buzzword: SPRAIN – 6 key factors for alternative data management:  Scalability  Performance  relaxed consistency  Agility  Intracacy  Necessity  Conclusion in conclusions: These are exciting times for Data Management research and development DATAKON 2011 J. Pokorný 25

Hinweis der Redaktion

  1. -&gt; http://horicky.blogspot.com/2009/11/nosql-patterns.html