Charles Nurse gives an introduction to NoSQL databases. He discusses the CAP theorem and how it relates to consistency, availability, and partition tolerance. He also explains MapReduce and how it is used to aggregate large amounts of distributed data. Finally, he summarizes different types of NoSQL databases, such as key-value stores, column-oriented databases, and document databases, and provides an example with RavenDB, a .NET-based document database.
4. Look Mom - NoSQL
Intro to NoSQL Databases
• Driven by the demands of “Big Data”
Google
Facebook
Amazon
• Huge amounts of data
Distributed Environment
Availability
• CAP Theorem
CAP Theorem
• The CAP Theorem states:
“It is impossible for a distributed computer system to simultaneously provide
all three of the following guarantees”
• Consistency
• Availability
• Partition Tolerance
CAP Theorem
• Consistency
All nodes in a distributed system see the same data at the same time
eCommerce
Weapons Systems
• Availability
Every request receives a response indicating whether it succeeded or failed
• Partition Tolerance
The system continues to operate despite arbitrary message loss or failures
of part of the system
CAP Theorem
• Relational Databases emphasise Consistency, so either
Availability or Partition Tolerance will suffer
• NoSQL Databases emphasise Availability and Partition
Tolerance
Eventual Consistency
Google searches do not need to show documents created in the last
few seconds
A Facebook News Feed does not need to show updates from the last few
seconds
Map Reduce
• NoSQL databases support distributed systems
• Map Reduce helps aggregate data using a pair of functions
Map function
Maps input data into its final form
Can be executed in parallel on each system
Reduce function
Operates on the results of the Map functions
Executed repeatedly until the final result is obtained
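The pair of functions described above can be sketched in a few lines. This is a minimal, hypothetical word-count example in Python (chosen for brevity; the deck's own examples use C#/LINQ): the map step could run independently on each node, and the reduce step produces output in the same shape as its input so it can be re-applied to its own results.

```python
# Hypothetical example: each "node" holds one chunk of text.
chunks = ["nosql big data", "big data stores", "nosql stores"]

def map_chunk(chunk):
    # Map: transform raw input into (key, value) pairs.
    return [(word, 1) for word in chunk.split()]

def reduce_pairs(pairs):
    # Reduce: aggregate values by key; the output has the same shape
    # as the input, so reduce can be applied repeatedly.
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return list(totals.items())

# Each map runs independently (sequentially here for simplicity).
mapped = [map_chunk(c) for c in chunks]
# Reduce each node's output, then reduce the combined partials once more.
partials = [reduce_pairs(m) for m in mapped]
final = reduce_pairs([pair for p in partials for pair in p])
print(dict(final))  # {'nosql': 2, 'big': 2, 'data': 2, 'stores': 2}
```

Note that `reduce_pairs` is applied both per node and to the merged partial results, which is exactly why its input and output formats must match.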
Map Reduce (Example)
• Blog Documents
{
  "type": "post",
  "name": "Raven's Map/Reduce functionality",
  "blog_id": 1342,
  "post_id": 29293921,
  "tags": ["raven", "nosql"],
  "post_content": "<p>...</p>",
  "comments": [{
    "source_ip": "124.2.21.2",
    "author": "martin",
    "text": "..."
  }]
}
• Map
from post in docs.posts
select new {
  post.blog_id,
  comments_length = comments.length
};
• Reduce
from agg in results
group agg by agg.key into g
select new {
  agg.blog_id,
  comments_length = g.Sum(x=>x.comments_length)
};
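The same comments-per-blog aggregation can be sketched outside of LINQ. Below is a hedged Python equivalent with hypothetical sample documents (the field names mirror the slide's JSON; `map_posts` and `reduce_results` are illustrative names, not RavenDB API):

```python
# Hypothetical blog documents, mirroring the slide's JSON shape.
docs = [
    {"type": "post", "blog_id": 1342, "post_id": 1,
     "comments": [{"author": "martin"}, {"author": "ayende"}]},
    {"type": "post", "blog_id": 1342, "post_id": 2,
     "comments": [{"author": "martin"}]},
    {"type": "post", "blog_id": 7, "post_id": 3, "comments": []},
]

def map_posts(posts):
    # Map: emit {blog_id, comments_length} for each post.
    return [{"blog_id": p["blog_id"], "comments_length": len(p["comments"])}
            for p in posts]

def reduce_results(results):
    # Reduce: group by blog_id and sum; output format matches input format.
    totals = {}
    for r in results:
        totals[r["blog_id"]] = totals.get(r["blog_id"], 0) + r["comments_length"]
    return [{"blog_id": b, "comments_length": n} for b, n in totals.items()]

print(reduce_results(map_posts(docs)))
```

As in the LINQ version, the reduce output has the same keys as its input, so it can be fed back into another reduce pass.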
RavenDB
• Document Database
Using JSON
• Built in .NET
• LINQ Support
• Full-text Search
Built on Lucene
• Two versions
Server
Embedded
Consistency

A service that is consistent operates fully or not at all. Gilbert and Lynch use the word "atomic" instead of consistent in their proof, which makes more sense technically because, strictly speaking, consistent is the C in ACID as applied to the ideal properties of database transactions and means that data will never be persisted that breaks certain pre-set constraints. But if you consider it a pre-set constraint of distributed systems that multiple values for the same piece of data are not allowed, then I think the leak in the abstraction is plugged (plus, if Brewer had used the word atomic, it would be called the AAP theorem and we'd all be in hospital every time we tried to pronounce it).

In the book-buying example you can add the book to your basket, or fail. Purchase it, or not. You can't half-add or half-purchase a book. There's one copy in stock and only one person will get it the next day. If both customers can continue through the order process to the end (i.e. make payment), the lack of consistency between what's in stock and what's in the system will cause an issue. Maybe not a huge issue in this case - someone's either going to be bored on vacation or spilling soup - but scale this up to thousands of inconsistencies and give them a monetary value (e.g. trades on a financial exchange where there's an inconsistency between what you think you've bought or sold and what the exchange record states) and it's a huge issue.

We might solve consistency by utilising a database. At the correct moment in the book order process the number of War and Peace books-in-stock is decremented by one. When the other customer reaches this point, the cupboard is bare and the order process will alert them to this without continuing to payment. The first operates fully, the second not at all.

Databases are great at this because they focus on ACID properties and give us Consistency by also giving us Isolation, so that when Customer One is reducing books-in-stock by one, and simultaneously increasing books-in-basket by one, any intermediate states are isolated from Customer Two, who has to wait a few milliseconds while the data store is made consistent.

Availability

Availability means just that - the service is available (to operate fully or not, as above). When you buy the book you want to get a response, not some browser message about the web site being uncommunicative. Gilbert & Lynch, in their proof of the CAP Theorem, make the good point that availability most often deserts you when you need it most - sites tend to go down at busy periods precisely because they are busy. A service that's available but not being accessed is of no benefit to anyone.

Partition Tolerance

If your application and database run on one box then (ignoring scale issues and assuming all your code is perfect) your server acts as a kind of atomic processor in that it either works or doesn't (i.e. if it has crashed it's not available, but it won't cause data inconsistency either).

Once you start to spread data and logic around different nodes there's a risk of partitions forming. A partition happens when, say, a network cable gets chopped and Node A can no longer communicate with Node B. With the kind of distribution capabilities the web provides, temporary partitions are a relatively common occurrence and, as I said earlier, they're also not that rare inside global corporations with multiple data centres.

Gilbert & Lynch defined partition tolerance as:

"No set of failures less than total network failure is allowed to cause the system to respond incorrectly"

and noted Brewer's comment that a one-node partition is equivalent to a server crash, because if nothing can connect to it, it may as well not be there.
Map/Reduce is just a pair of functions operating over a list of data. In C#, LINQ gives us a great way to express this that makes it very easy to understand and work with. Let us say that we want to be able to get a count of comments per blog. We can do that using the Map/Reduce queries shown on the slide.

There are a couple of things to note here:
- The first query is the map query: it maps the input document into the final format.
- The second query is the reduce query: it operates over a set of results and produces an answer.
- The reduce query must return its result in the same format in which it received it; why will be explained shortly.
- The first value in the result is the key, which is what we are aggregating on (think of the group by clause in SQL).
We have some blog posts in a multi-blogger environment (WordPress, Blogger, etc.), distributed over 4 systems. For simplicity we have distributed them equally. We want to find the total number of comments each blog has.
The next step is to start reducing the results. In real Map/Reduce algorithms, we partition the original input and work toward the final result. In this case, imagine that the output of the first step was divided into groups of 3 (so 4 groups overall), and then the reduce query was applied to each, giving us:
You can see why it is called reduce: for every batch, we apply a sum by blog_id to get a new Total Comments value. We started with 11 rows, and we ended up with just 10. That is where it gets interesting, because we are still not done; we can still reduce the data further.

This is what we do in the third step, reducing the data further still. That is why the input and output formats of the reduce query must match: we will feed the output of several reduce queries in as the input of a new one. You can also see that we have now moved from having 10 rows to having just 8.
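The batched reduction described above can be demonstrated concretely. This is a small Python sketch with made-up (blog_id, comments_length) rows: the map output is reduced in batches, the partial results are reduced again, and the answer matches a single-pass reduce.

```python
# Simulated map output: (blog_id, comments_length) rows from many posts.
rows = [(1, 3), (2, 1), (1, 2), (3, 4), (2, 2), (1, 1), (3, 1), (2, 5)]

def reduce_batch(batch):
    # Sum comments_length per blog_id; same (key, value) shape in and out,
    # which is what lets us feed reduce output back into reduce.
    totals = {}
    for blog_id, n in batch:
        totals[blog_id] = totals.get(blog_id, 0) + n
    return sorted(totals.items())

# Reduce in batches of 3, then reduce the partial results once more.
batches = [rows[i:i + 3] for i in range(0, len(rows), 3)]
partials = [reduce_batch(b) for b in batches]
rereduced = reduce_batch([row for p in partials for row in p])

# Reducing everything in one pass gives the identical answer.
assert rereduced == reduce_batch(rows)
print(rereduced)  # [(1, 6), (2, 8), (3, 5)]
```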
And now we are done; we can't reduce the data any further because all the keys are unique.

There is another interesting property of Map/Reduce. Let us say that I just added a comment to a post; that would obviously invalidate the results of the query, right? Well, yes, but not all of them. Assuming that I added a comment to the post whose id is 10, what would I need to do to recalculate the right result?

- Map Doc #10 again
- Reduce Step 2, Batch #3 again
- Reduce Step 3, Batch #1 again
- Reduce Step 4
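The incremental recomputation idea can be sketched as follows. This hypothetical Python example (document ids and counts are invented) re-maps only the changed document and then re-runs the reduce over the cached map outputs, rather than re-mapping everything:

```python
# Hypothetical documents: doc_id -> {blog_id, comment count}.
docs = {10: {"blog_id": 3, "comments": 2},
        11: {"blog_id": 3, "comments": 1},
        12: {"blog_id": 5, "comments": 4}}

def map_doc(doc):
    # Map one document to a (blog_id, comments_length) row.
    return (doc["blog_id"], doc["comments"])

def reduce_rows(rows):
    # Sum comments per blog_id.
    totals = {}
    for blog_id, n in rows:
        totals[blog_id] = totals.get(blog_id, 0) + n
    return sorted(totals.items())

mapped = {doc_id: map_doc(d) for doc_id, d in docs.items()}  # cached map output
before = reduce_rows(mapped.values())

# A comment is added to doc #10: re-map only that document...
docs[10]["comments"] += 1
mapped[10] = map_doc(docs[10])
# ...then re-run the affected reduce steps over the cached outputs.
after = reduce_rows(mapped.values())

print(before, after)  # [(3, 3), (5, 4)] [(3, 4), (5, 4)]
```

In a real system only the reduce batches that contain doc #10's output would be recomputed; the sketch collapses those batches into a single reduce for brevity.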
Google, one of the main pioneers of the trend to NoSQL databases due to the sheer volume of data they store, uses a system (Bigtable) that stores data in a column-oriented way. This compares with typical relational systems, which are row-oriented.

Each unit of data can be thought of as a set of key/value pairs, with the unit being identified by a "primary key". Bigtable calls this the "row key". The units of data are sorted and ordered on the basis of this row key. So far this isn't really much different from a table in a relational model that has a primary key field and a clustered index.

What makes the store "column-oriented" is that the various pieces of information that define the "record" of data can be divided into groups of columns, or column families. For example, if we are saving information about a person, we may define first_name and last_name fields, which can be grouped in a name column family. Likewise we could define street_address, city and zip_code, which can be grouped in an address column family, and sex and age, which can be grouped in a profile column family.

We now have 3 column families, or buckets of information. In a column-oriented store, column families are typically defined at configuration or startup, but the individual columns need not be pre-defined. Within each bucket, only key/value pairs are stored: the row key identifies the unit of data, while the column key identifies the column family (bucket) and the individual column within it.

Like many NoSQL databases, there is not really a concept of NULL data. New columns can be added at any time, as each is just another key/value pair in the bucket. While data that relates to the same row key will often be stored in a contiguous fashion, this setup allows data to be partitioned across multiple computer nodes.
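The column-family layout described above can be modelled with nested dictionaries. This is a hedged Python sketch, not Bigtable's actual API; the row key, family names, and `put` helper are all hypothetical, using the person example from the text:

```python
# Hypothetical Bigtable-style store: row_key -> {family: {column: value}}.
store = {}

def put(row_key, family, column, value):
    # Only columns that actually have values are stored (no NULL markers).
    store.setdefault(row_key, {}).setdefault(family, {})[column] = value

put("person:42", "name", "first_name", "Ada")
put("person:42", "name", "last_name", "Lovelace")
put("person:42", "address", "city", "London")  # street_address, zip_code simply absent
put("person:42", "profile", "age", 36)

# New columns can be added at any time -- just another key/value pair
# in the bucket; no schema change is needed.
put("person:42", "profile", "nickname", "countess")

print(store["person:42"]["name"])  # {'first_name': 'Ada', 'last_name': 'Lovelace'}
```

The sparse-data property falls out naturally: a missing column is simply a key that was never written, not a stored NULL.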
A relational database assumes that each column defined in the table schema will have a value for each row that is present in the table. NULL values are usually represented with a special marker (e.g. \N). The primary key and column identifier are implicitly associated with each cell based on its physical position within the layout. The following diagram illustrates how a relational database table might be laid out on disk.
Hypertable (and Bigtable) takes its design from the Log-Structured Merge Tree. It flattens the table structure into a sorted list of key/value pairs, each one representing a cell in the table. The key includes the full row and column identifier, which means each cell carries complete addressing information. Cells that are NULL are simply not included in the list, which makes this design particularly well suited for sparse data. The following diagram illustrates how Hypertable stores table data on disk. Though there can be a fair amount of redundancy in the row keys and column identifiers, Hypertable employs key-prefix and block data compression, which considerably mitigates this problem.
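The flattening described above can be illustrated in a few lines. This hypothetical Python sketch turns a small table (with NULLs represented as `None`) into the sorted list of fully addressed cells, dropping the NULLs entirely:

```python
# Hypothetical table: row_key -> {column: value}, with None standing in for NULL.
table = {
    "row1": {"a": "x", "b": None, "c": "y"},
    "row2": {"a": None, "b": "z", "c": None},
}

# Flatten into a sorted list of (row_key, column, value) cells.
# Each cell carries its full addressing information in the key,
# and NULL cells are simply not stored -- good for sparse data.
cells = sorted(
    (row_key, column, value)
    for row_key, columns in table.items()
    for column, value in columns.items()
    if value is not None
)
print(cells)  # [('row1', 'a', 'x'), ('row1', 'c', 'y'), ('row2', 'b', 'z')]
```

The redundancy in the repeated row keys is visible here too, which is the cost that key-prefix and block compression are meant to offset.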
Document databases are not document management systems. A document in this case is a loosely structured set of key/value pairs, typically represented using JSON (JavaScript Object Notation), not a word-processing document or spreadsheet.

One core benefit for object-oriented developers is that we can think of a document as mapping to an object, including any contained collections/objects, although in reality what we mean here are objects that are considered to be "aggregate roots". Document databases treat a document as a whole, rather than splitting it into its constituent key/value pairs.
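The "document as a whole" idea can be shown with a tiny sketch. This hypothetical Python example stores an aggregate (a post with its nested comments) as one JSON unit rather than splitting it across tables; the document shape borrows from the earlier blog example:

```python
import json

# Hypothetical aggregate root: a post together with its nested comments.
post = {
    "type": "post",
    "blog_id": 1342,
    "comments": [{"author": "martin", "text": "..."}],
}

# A document database persists and retrieves the aggregate as a single unit.
stored = json.dumps(post)      # the whole document, nested comments included
loaded = json.loads(stored)

assert loaded == post  # round-trips as one document, no joins needed
```

Contrast this with a relational design, where the same data would typically be normalised into separate posts and comments tables joined by a foreign key.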