A Walk down NOSQL Lane in the Cloud

    New York City Cloud Computing Group
                          February 2011
                       Alexander Sicular
                               @siculars
Who is this blowhard?
Columbia University pays my mortgage

For the better part of a decade in Medical Informatics

Am not shilling for any of these companies

Am not a computer scientist

Am a computer science enthusiast, particularly in the area of Informatics
When I put my data in the “cloud”, to me it just means that it’s virtualized in someone else’s server room
...the Silver Lining
Many, many providers and only growing

  Amazon, Rackspace, Joyent, CouchOne,
  Cloudant, Azure, GAE, Heroku, no.de

Outsourced management

Zero capex

Controlled costs
...With a Chance of Rain?
Vendor lock in

Unreliable performance

  i/o

  cpu, memory

Bare metal > software virtualization
NoSQL or NOSQL?
Not Only SQL

Non/post relational

Big tent policy

Umbrella term

Fragmented



                      http://www.flickr.com/photos/morgennebel/2933723145/
Your Usage Patterns
Read vs. Write

Mutable vs. Immutable

Product Considerations:

  In place updates

  Write Only Logs
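
The contrast between in-place updates and write-only (append-only) logs can be sketched roughly as follows. This is a toy illustration, not how any particular product lays out its files; the class names and the JSON-lines record format are assumptions made for the example.

```python
import json


class InPlaceStore:
    """Mutable store: each write overwrites the previous value for a key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value  # the old value is gone after this

    def get(self, key):
        return self._data.get(key)


class AppendOnlyLog:
    """Append-only store: writes only add records; reads replay the log."""

    def __init__(self, path):
        self._path = path
        open(self._path, "a", encoding="utf-8").close()  # create if missing

    def put(self, key, value):
        # Never rewrite earlier bytes: every write is one new line at the end.
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"k": key, "v": value}) + "\n")

    def history(self, key):
        # All values ever written for the key, oldest first.
        with open(self._path, encoding="utf-8") as f:
            return [r["v"] for r in map(json.loads, f) if r["k"] == key]

    def get(self, key):
        # The last record for a key wins; older records remain as history.
        values = self.history(key)
        return values[-1] if values else None
```

The append-only variant trades read cost (and eventually a compaction step, not shown) for sequential writes and a retained history, which is why the read/write and mutable/immutable mix of a workload matters when choosing a product.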
This vs. That
Riak wiki comparisons page
http://wiki.basho.com/Riak-Comparisons.html


Popular one-page comparison of a number of NOSQL players by Kristof Kovacs:
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
NOSQL concepts are Not Brand New
Memcached since 2003 http://memcached.org

Google papers 2004-2006

Amazon Dynamo 2007

Consistent Hashing 2007 http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients

Using relational systems as a key-value blob store

  2009 FriendFeed (not the first) http://bret.appspot.com/entry/how-friendfeed-uses-mysql
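
The idea of using a relational system as a key-value blob store might look something like the sketch below. It uses SQLite and JSON purely as stand-ins so the example runs anywhere; the table layout and the function names (`open_store`, `kv_put`, `kv_get`) are hypothetical and are not taken from the FriendFeed write-up.

```python
import json
import sqlite3
import uuid


def open_store(path=":memory:"):
    """Open a database holding one opaque blob per entity id."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entities ("
        " id TEXT PRIMARY KEY,"
        " body BLOB NOT NULL)"
    )
    return conn


def kv_put(conn, entity, entity_id=None):
    """Serialize the whole entity into the body column; return its id."""
    entity_id = entity_id or uuid.uuid4().hex
    body = json.dumps(entity).encode("utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
        (entity_id, body),
    )
    conn.commit()
    return entity_id


def kv_get(conn, entity_id):
    """Fetch and deserialize an entity, or None if the id is unknown."""
    row = conn.execute(
        "SELECT body FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    return json.loads(row[0].decode("utf-8")) if row else None
```

The database never sees the fields inside `body`, so adding or dropping a field needs no schema change; the cost is that any lookup by something other than `id` needs a separately maintained index.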
Why NOSQL
Support for “Very Large” data sets

Schemaless

Denormalized

Green field

New applications



                      http://www.flickr.com/photos/gailtang/1243984297/
Academia
Google:

  Bigtable        http://labs.google.com/papers/bigtable.html



  GFS     http://labs.google.com/papers/gfs.html



  M/R     http://labs.google.com/papers/mapreduce.html



Amazon:

  Dynamo         http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf




NOSQL Summer                   http://nosqlsummer.org/papers
Under the Hood
      Terminology
Write Only Log           http://en.wikipedia.org/wiki/Log-structured_file_system



Merkle Trees        http://en.wikipedia.org/wiki/Hash_tree



B-trees   http://en.wikipedia.org/wiki/B-tree



Vector clock       http://en.wikipedia.org/wiki/Vector_clock



Bloom filters       http://en.wikipedia.org/wiki/Bloom_filters



Big O Notation         http://en.wikipedia.org/wiki/Big_o_notation



Consistent Hashing              http://en.wikipedia.org/wiki/Consistent_hashing
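
Of the terms above, consistent hashing is perhaps the easiest to show in a few lines. The following is a minimal sketch of a hash ring with virtual nodes; the class name, the `replicas` default, and the choice of SHA-256 are assumptions for illustration and do not mirror any specific client library.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: keys map to the next node position clockwise."""

    def __init__(self, nodes=(), replicas=100):
        self._replicas = replicas  # virtual nodes per physical node
        self._positions = []       # sorted ring positions
        self._owners = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self._replicas):
            pos = self._hash(f"{node}#{i}")
            self._owners[pos] = node
            bisect.insort(self._positions, pos)

    def remove(self, node):
        for i in range(self._replicas):
            pos = self._hash(f"{node}#{i}")
            del self._owners[pos]
            self._positions.remove(pos)

    def lookup(self, key):
        if not self._positions:
            raise LookupError("ring is empty")
        # First position past the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._positions, self._hash(key)) % len(self._positions)
        return self._owners[self._positions[idx]]
```

The property that matters is that adding a node only moves keys onto the new node; every other key keeps its previous owner, unlike `hash(key) % number_of_nodes`, where most keys move.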
CAP Theorem
           http://en.wikipedia.org/wiki/CAP_theorem




Consistency

Availability

Partition Tolerance

   Pick two?

                                             http://guide.couchdb.org/draft/consistency.html
CouchDB
CouchOne, Cloudant

Erlang

Extreme replication scenarios

Works on phones

Updated indexing (b-tree)

HTTP interface

Offline usage

Sharded scaling
CouchDB Internal
  Architecture




  http://nosqlpedia.com/wiki/File:CouchDB-Arch.JPG
MongoDB
10Gen, MongoHQ, MongoLab

C++

huMONGOus

Sharded scaling, replicated master/slave

Located in NYC (go visit them)

Soft landing for those coming from mysql (relational databases)

Native javascript

Secondary indexes
MongoDB Sharding
     Diagram




http://www.snailinaturtleneck.com/blog/2010/03/30/sharding-with-the-fishes/
MySQL to Mongo Query similarity




       http://nosqlpedia.com/wiki/File:MongoDB.JPG
Riak
Basho, Joyent

Erlang

Distributed

HTTP, protobuf

Native javascript, erlang

Multiple backends

Homogeneous

CAP tunable
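
"CAP tunable" is usually understood, in Dynamo-style stores, as choosing how many of the N replicas must acknowledge a read (R) and a write (W). The sketch below checks the standard overlap rule, R + W > N, against a brute-force enumeration; the function names are hypothetical and this is not Riak's API.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Rule of thumb: a read of r replicas must see a write to w replicas."""
    return r + w > n


def brute_force_overlap(n, r, w):
    """Check every possible read set against every possible write set."""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for write in combinations(replicas, w)
        for read in combinations(replicas, r)
    )
```

With N = 3, choosing R = 2 and W = 2 gives overlapping quorums, while R = 1 and W = 1 favors latency and availability at the risk of stale reads.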
Hadoop
Cloudera, Apache Foundation

Java

High latency

Batch oriented

HDFS is GFS based

Open source Google stack via the Google papers

Huge ecosystem

Yahoo, FB, Twitter, Fortune 500

Pig, Hive, Flume
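
The batch-oriented map/reduce model that Hadoop implements can be sketched in-process as three phases. This is a single-machine illustration of the programming model only; it does not use Hadoop's APIs, and all function names are made up for the example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into one result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    return sum(counts)


def run_word_count(lines):
    pairs = map_phase(lines, word_count_mapper)
    return reduce_phase(shuffle(pairs), word_count_reducer)
```

In a real cluster the map and reduce calls run on many machines and the shuffle moves data over the network, which is where the high latency noted above comes from.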
HBase
Java

Low latency store

sits on top of Hadoop

Modeled after Google Bigtable

Column oriented

Thrift, protobuf

Backend for new Facebook Messaging service
Cassandra
Apache

Java

Column oriented

Like Bigtable and Dynamo

Originated at Facebook

At Twitter, Distributed counting
http://www.infoq.com/presentations/NoSQL-at-Twitter-by-Ryan-King
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
Redis
OpenRedis

C

REmote DIctionary Server

Specific data structures

incredibly fast

memcached on steroids

replicated master/slave
Commonalities
Open Source

Adherence to common or standard:

  data formats

    json, bson, utf8, binary

  data transport mechanisms

    http, thrift, protobuf, simple wire protocols
Ok. So Now What?
Analyze your requirements

Mailing lists

IRC, twitter

Project pages, wiki

Github/Google Code/Bitbucket:

  project page

  specific language clients
Variety Pack
Hybrid architectures will become the norm

Twitter - mysql, cassandra, hadoop

Google - mysql, GAE (BT)

Facebook - mysql, cassandra, hbase, memcached

Yahoo - mysql, hadoop

LinkedIn - voldemort

                      http://www.flickr.com/photos/uncleweed/82245324/
Questions?




New York City Cloud Computing Group
                      February 2011
                   Alexander Sicular
                           @siculars

A Walk Down NOSQL Lane in the Cloud

  • 1. A Walk down NOSQL Lane in the Cloud New York City Cloud Computing Group February 2011 Alexander Sicular @siculars
  • 2. Who is this blowhard? Columbia University pays my mortgage For the better part of a decade in Medical Informatics Am not shilling for any of these companies Am not a computer scientist Am a computer science enthusiast particularly in the area of Informatics
  • 3. When I put my data in the “cloud”, to me it just means that it’s virtualized in someone else’s server room
  • 4. ...the Silver Lining Many, many providers and only growing Amazon, Rackspace, Joyent, CouchOne, Cloudant, Azure, GAE, Heroku, no.de Outsourced management Zero capex Controlled costs
  • 5. ...With a Chance of Rain? Vendor lock in Unreliable performance i/o cpu, memory Bare metal > software virtualization
  • 6. NoSQL or NOSQL? Not Only SQL Non/post relational Big tent policy Umbrella term Fragmented http://www.flickr.com/photos/morgennebel/2933723145/
  • 7. Your Usage Patterns Read vs. Write Mutable vs. Immutable Product Considerations: In place updates Write Only Logs
  • 8. This vs. That Riak wiki comparisons page http://wiki.basho.com/Riak-Comparisons.html Popular one page comparison of a number of NOSQL players by Kristof Kovacs: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
  • 9. NOSQL concepts are Not Brand New Memcached since 2003 http://memcached.org Google papers 2004-2006 Amazon Dynamo 2007 Consistent Hashing 2007 http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients Using relational systems as a key-value blob store 2009 FriendFeed (not the first) http://bret.appspot.com/entry/how-friendfeed-uses-mysql
  • 10. Why NOSQL Support for “Very Large” data sets Schemaless Denormalized Green field New applications http://www.flickr.com/photos/gailtang/1243984297/
  • 11. Academia Google: Bigtable http://labs.google.com/papers/bigtable.html GFS http://labs.google.com/papers/gfs.html M/R http://labs.google.com/papers/mapreduce.html Amazon: Dynamo http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf NOSQL Summer http://nosqlsummer.org/papers
  • 12. Under the Hood Terminology Write Only Log http://en.wikipedia.org/wiki/Log-structured_file_system Merkle Trees http://en.wikipedia.org/wiki/Hash_tree B-trees http://en.wikipedia.org/wiki/B-tree Vector clock http://en.wikipedia.org/wiki/Vector_clock Bloom filters http://en.wikipedia.org/wiki/Bloom_filters Big O Notation http://en.wikipedia.org/wiki/Big_o_notation Consistent Hashing http://en.wikipedia.org/wiki/Consistent_hashing
  • 13. CAP Theorem http://en.wikipedia.org/wiki/CAP_theorem Consistency Availability Partition Tolerance Pick two? http://guide.couchdb.org/draft/consistency.html
  • 14. CouchDB CouchOne, Cloudant HTTP interface Erlang Offline usage Extreme replication Sharded scaling scenarios Works on phones Updated indexing (b-tree)
  • 15. CouchDB Internal Architecture http://nosqlpedia.com/wiki/File:CouchDB-Arch.JPG
  • 16. MongoDB 10Gen, MongoHQ, MongoLab C++ Native javascript Secondary indexes Soft landing for those coming from mysql (relational databases) huMONGOus Sharded scaling, replicated master/slave Located in NYC (go visit them)
  • 17. MongoDB Sharding Diagram http://www.snailinaturtleneck.com/blog/2010/03/30/sharding-with-the-fishes/
  • 18. MySQL to Mongo Query similarity http://nosqlpedia.com/wiki/File:MongoDB.JPG
  • 19. Riak Basho, Joyent Multiple backends Erlang Homogeneous Distributed CAP tunable HTTP, protobuf Native javascript, erlang
  • 20. Hadoop Cloudera, Apache Foundation Java High latency Batch oriented Open source Google stack via the Google papers Huge ecosystem Yahoo, FB, Twitter, Fortune 500 Pig, Hive, Flume HDFS is GFS based
  • 21. HBase Java Low latency store sits on top of Hadoop Modeled after Google Bigtable Column oriented Thrift, protobuf Backend for new Facebook Messaging service
  • 22. Cassandra Apache Java Column oriented Like Bigtable and Dynamo Originated at Facebook At Twitter, Distributed counting http://www.infoq.com/presentations/NoSQL-at-Twitter-by-Ryan-King http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
  • 23. Redis OpenRedis C REmote DIctionary Server Specific data structures incredibly fast memcached on steroids replicated master/slave
  • 24. Commonalities Open Source Adherence to common or standard: data formats json, bson, utf8, binary data transport mechanisms http, thrift, protobuf, simple wire protocols
  • 25. Ok. So Now What? Analyze your requirements Mailing lists IRC, twitter Project pages, wiki Github/Google Code/Bitbucket: project page specific language clients
  • 26. Variety Pack Hybrid architectures will become the norm Twitter - mysql, cassandra, hadoop Google - mysql, GAE (BT) Facebook - mysql, cassandra, hbase, memcached Yahoo - mysql, hadoop LinkedIn - voldemort http://www.flickr.com/photos/uncleweed/82245324/
  • 27. Questions? New York City Cloud Computing Group February 2011 Alexander Sicular @siculars
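
Slides 9 and 12 name consistent hashing without showing it. What follows is a minimal sketch in Python, not taken from the talk and not the implementation used by libketama, Riak, Cassandra, or any other product mentioned above. The class name `HashRing`, the `replicas` parameter, and the node names are hypothetical, and MD5 is assumed only as a convenient way to spread points around the ring. The idea: every node is hashed to many points on a ring, and a key belongs to the first node point found clockwise from the key's own hash.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 for spread, not security)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place `replicas` points for this node on the ring."""
        for i in range(self.replicas):
            point = ring_hash(f"{node}#{i}")
            if point in self._owner:
                continue  # position already taken; skip the rare collision
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        """Take this node's points off the ring."""
        for i in range(self.replicas):
            point = ring_hash(f"{node}#{i}")
            if self._owner.get(point) == node:
                del self._owner[point]
                self._points.remove(point)

    def lookup(self, key):
        """Return the node owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, ring_hash(key))
        return self._owner[self._points[idx % len(self._points)]]
```

With a ring built as `HashRing(["node-a", "node-b", "node-c"])`, calling `add("node-d")` changes the owner only for keys that now fall to `node-d`; every other key stays where it was. That is the property that makes this scheme attractive for memcached clients and Dynamo-style stores, where a plain `hash(key) % n` would reshuffle most keys whenever `n` changes.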