NoSQL
Not Only SQL

Devoxx 2009
How did we get here?
Daisy
(this is what we do)
open source CMS (Java)
  MySQL: system data and metadata
  Lucene: full-text indexes
  filesystem: actual content
rather nice query language
rather nice publishing features


wanted to move away from
WCMS
fast access control and facet
browser requires the active
document set to reside in
memory cache
                                  SCALE?
Findings
 we learned a lot (about content management) over the past 6
 years (like: versioning, staging, multilinguality, searching, access
 control, publishing)
 people don’t like Cocoon / XSLT, prefer templating instead
 a lot of our project specifics were about finding the correct
 storage model for specific data structures (not all data was fit for
 our CMS)
 customers with growing ambitions → Daisy has to follow !
Overhaul
get rid of Cocoon → Kauri
quest for differentiation
    “the internet barrier”
    combine massive storage
    with useful query-ability



                                REST, Maven, Spring,
                                webapps
                                www.kauriproject.org
The Internet Barrier
fundamental technology gap between SQL and NoSQL
buy/use vs build
focus on architecture and infrastructure
rise of semi- or free-structured information
layered approach
http://github.com/blog/530-how-we-made-github-fast
Conclusions

let’s start from scratch (ouch)
a different architecture / foundations
  scale big and be available
  modularity (pluggability)
  we’re not a banking application, so consistency might be less
  important
Storage challenges
sparse data structures
flexible, evolving data structures
lack of good fault-tolerance setups
cope with scale
CAP vs BASE
(Google) BigTable and (Amazon) Dynamo
ACID (atomicity, consistency, isolation, durability)
    Strong Consistency
    Isolation
    Focus on “commit”
    Nested Transactions
    Availability?
    Conservative (pessimistic)
    Difficult Evolution (schema)

BASE (basically available, soft state, eventually consistent)
    Weak Consistency
    Availability First
    Best Effort
    Approximate Answers OK
    Aggressive (optimistic)
    Simpler!
    Faster
    Easier Evolution

ACID and BASE are the two ends of a spectrum.
slide: Eric Brewer
Our CAP multilemma
scale
consistency of data
availability: ping means results
partition tolerance: ‘cluster splits’ should not block
fault tolerance
C?A?P?

Initial gut feeling: cAP
  A was a given
  C would be a function of our datastore choice
  however the P seemed like a nice-to-have (aka over-ambitious
  use-case)
CAPondering

  HBase vs Cassandra
     consistency vs SPOF ?
     possible higher latency vs possibly frailer community ?
  Cocoon trauma


http://www.cs.cornell.edu/projects/ladis2009/talks/ramakrishnan-keynote-ladis2009.pdf
Comparison Matrix
Column groups: Partitioning (hash/sort, dynamic, routing); Availability (failures handled, during failure); Replication (sync/async, local/geo); Consistency; Durability; Storage (reads/writes)

System     Hash/sort  Dynamic  Routing  Failures handled  During failure  Sync/async  Local/geo     Consistency          Durability          Reads/writes
PNUTS      H+S        Y        Rtr      Colo+server       Read+write      Async       Local+geo     Timeline + Eventual  Double WAL          Buffer pages
MySQL      H+S        N        Cli      Colo+server       Read            Async       Local+nearby  ACID                 WAL                 Buffer pages
HDFS       Other      Y        Rtr      Colo+server       Read+write      Sync        Local+nearby  N/A (no updates)     Triple replication  Files
BigTable   S          Y        Rtr      Colo+server       Read+write      Sync        Local+nearby  Multi-version        Triple replication  LSM/SSTable
Dynamo     H          Y        P2P      Colo+server       Read+write      Async       Local+nearby  Eventual             WAL                 Buffer pages
Cassandra  H+S        Y        P2P      Colo+server       Read+write      Sync+Async  Local+nearby  Eventual             Triple WAL          LSM/SSTable
Megastore  S          Y        Rtr      Colo+server       Read+write      Sync        Local+nearby  ACID/other           Triple replication  LSM/SSTable
Azure      S          N        Cli      Server            Read+write      Sync        Local         ACID                 WAL                 Buffer pages
HBase
sorted, distributed, column-oriented, multi-dimensional, highly-available, high-performance, persisted storage system
adds random access reads and writes atop HDFS
HBase
Apache Hadoop sub-project
hadoop.apache.org/hbase
0.20.1 (12/Oct 2009)
People
Inventors
  Google BigTable ☺
  Jim Kellerman (Powerset/Microsoft)
  Mike Cafarella (UMich)
Project leads
  Michael Stack (Powerset/Microsoft)
  Jonathan Gray (Streamy.com)
  Ryan Rawson (StumbleUpon)
  Jean-Daniel Cryans (SU)
  Bryan Duxbury (Rapleaf)
© lars george
HBase data model
Distributed multi-dimensional sparse map
Multi-dimensional keys: (table, row, family:column, timestamp) → value
Keys are arbitrary strings
Access to row data is atomic
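
The data model on this slide can be pictured as a sorted, sparse, nested map. What follows is a minimal sketch in plain Python, not the HBase client API: the class `SparseTable` and its methods `put`, `get` and `scan` are hypothetical names chosen for illustration, and one instance stands in for one table. It only shows how (row, family:column, timestamp) maps to a value, how several timestamped versions coexist, and why sorted row keys make range scans natural.

```python
from collections import defaultdict


class SparseTable:
    """Hypothetical toy model of one table, NOT the HBase API.

    Maps (row, "family:qualifier", timestamp) -> value.
    """

    def __init__(self, families):
        # Column families are fixed per table; qualifiers are free-form.
        self.families = set(families)
        # row -> "family:qualifier" -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp`.

        With timestamp=None the newest version overall is returned;
        a missing cell yields None (the map is sparse).
        """
        versions = self.rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, stop_row):
        """Yield (row, columns) in sorted key order, start_row <= row < stop_row."""
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                yield row, dict(self.rows[row])


table = SparseTable(families=["meta", "content"])
table.put("doc-001", "meta:title", "Draft", timestamp=1)
table.put("doc-001", "meta:title", "Final", timestamp=2)
table.put("doc-002", "content:body", "hello", timestamp=1)

latest = table.get("doc-001", "meta:title")               # newest version
older = table.get("doc-001", "meta:title", timestamp=1)   # version as of t=1
scanned = [row for row, _ in table.scan("doc-001", "doc-002")]
```

Rows only hold the columns that were actually written, which is what "sparse" means here; a real store would also keep the rows physically sorted instead of sorting on every scan.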
Date: Thu, 12 Nov 2009 18:19:50 -0800
Message-ID: <78568af10911121819x292527b2t7f8b7d857c3650b2@mail.gmail.com>
Subject: Re: newbie: need help on understanding HBase
From: Ryan Rawson <ryanobjc@gmail.com>
To: hbase-user@hadoop.apache.org

HBase is semi-column oriented. Column families is the storage model -
everything in a column family is stored in a file linearly in HDFS.
That means accessing data from a column family is really cheap and
easy. Adding more column families adds more files - it has the
performance profile of adding new tables, except you dont actually
have additional tables, so the conceptual complexity stays low.

Data is stored at the "intersection" of the rowid, and the column
family + qualifier. This is sometimes called a "Cell" - contains a
timestamp as well. You can have multiple versions all timestamped.
The timestamp is by default the int64/java system milli time. I have
to recommend against setting the timestamp explicitly if you can avoid
it. So when you retrieve a row, you can get everything, a list of
column qualifiers or a list of families or any combo. (eg: list of
these qualifiers out of family A and everything from family B)

[...]

The terms to use are:
- Column family (or just family): the unit of locality in hbase.
Everything in a family is stored in 1 (or a set) of files. A table is
a name and a list of families with attributes for those families (eg:
compression). A family is a string.
- Column qualifier (or just qualifier): allows you to store multiple
values for the same row in 1 family. This value is a byte array and
can be anything. The API converts null => new byte[0]{}. This is the
tricky bit, since most people dont think of "column names" as being
dynamic.
- Cell - the old name for a value + timestamp. The new API (see:
class Result) doesn't use this term, instead provides a different path
to read data.

You can use HBase as a normal datastore and use static names for the
qualifiers, and that is just fine. But if you need something special
to get past the lack of relations, you can start to do fun things with
the qualifier as data. Building a secondary index for example. The
row key would be the secondary value (eg: city) and the qualifier
would be the primary key (eg: userid) and the value would be a
placeholder to indicate the value exists.
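
The secondary-index idea at the end of the mail can be sketched as follows. This is an illustrative in-memory model in plain Python, not HBase code: the tables `users` and `city_index`, the family names `info` and `idx`, and the helpers `put_user` and `users_in` are all assumptions made for the example. The index row key is the secondary value (city), the qualifier carries the primary key (userid), and the stored value is an empty placeholder.

```python
from collections import defaultdict

# Hypothetical stand-ins for two tables (not the HBase API).
users = {}                      # primary table: userid -> {"info:city": city}
city_index = defaultdict(dict)  # index table: city -> {"idx:<userid>": b""}


def put_user(userid, city):
    """Write the user row and keep the city index in step with it."""
    old = users.get(userid)
    if old is not None:
        # Remove the stale index entry when a user moves to another city.
        city_index[old["info:city"]].pop(f"idx:{userid}", None)
    users[userid] = {"info:city": city}
    # The qualifier is the data; the value is only a placeholder.
    city_index[city][f"idx:{userid}"] = b""


def users_in(city):
    """Look up user ids by reading the qualifiers of a single index row."""
    return sorted(q.split(":", 1)[1] for q in city_index.get(city, {}))


put_user("u1", "Ghent")
put_user("u2", "Ghent")
put_user("u3", "Antwerp")
put_user("u2", "Antwerp")  # u2 moves; the Ghent index entry is dropped

ghent = users_in("Ghent")
antwerp = users_in("Antwerp")
```

Since the data model slide only promises atomic access to a single row, the two writes in `put_user` would not be atomic together in a real store, so an application using this pattern has to tolerate or repair a briefly stale index.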
E-R example: Blog (diagram labels: partitioning, sequential scans)
Getting data in and out
 Java API
 Thrift multi-language API
 Stargate REST connector
 HBase shell (JRuby IRB-based)
 Processing: MapReduce and Cascading
Questions?
stevenn@outerthought.org
twitter: @stevenn / @outerthought

Weitere ähnliche Inhalte

Was ist angesagt?

Obtenga más de Microsoft SQL Server 2012 en el entorno de nube privada
Obtenga más de Microsoft SQL Server  2012 en el entorno de nube privadaObtenga más de Microsoft SQL Server  2012 en el entorno de nube privada
Obtenga más de Microsoft SQL Server 2012 en el entorno de nube privadaEduardo Castro
 
Windows Sql Azure Cloud Computing Platform
Windows Sql Azure Cloud Computing PlatformWindows Sql Azure Cloud Computing Platform
Windows Sql Azure Cloud Computing PlatformEduardo Castro
 
Converged infrastructure ucc
Converged infrastructure  uccConverged infrastructure  ucc
Converged infrastructure ucctamar1981
 
MyCassandra (Full English Version)
MyCassandra (Full English Version)MyCassandra (Full English Version)
MyCassandra (Full English Version)Shun Nakamura
 
Nic teaming and converged fabric
Nic teaming and converged fabricNic teaming and converged fabric
Nic teaming and converged fabrichypervnu
 
MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...
MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...
MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...Shun Nakamura
 
Ramakrishnan Keynote Ladis2009
Ramakrishnan Keynote Ladis2009Ramakrishnan Keynote Ladis2009
Ramakrishnan Keynote Ladis2009yarapavan
 
NYC Meetup November 15, 2012
NYC Meetup November 15, 2012NYC Meetup November 15, 2012
NYC Meetup November 15, 2012NuoDB
 
PhillyDB Hbase and MapR M7 - March 2013
PhillyDB Hbase and MapR M7 - March 2013PhillyDB Hbase and MapR M7 - March 2013
PhillyDB Hbase and MapR M7 - March 2013PhillyDB
 
Sap On Esx Backup Methodology
Sap On Esx   Backup MethodologySap On Esx   Backup Methodology
Sap On Esx Backup MethodologyMaarten Daniels
 
Charlie Talk - Everything At The Click Of A Button
Charlie Talk - Everything At The Click Of A ButtonCharlie Talk - Everything At The Click Of A Button
Charlie Talk - Everything At The Click Of A ButtonAtlassian
 
Lego Cloud SAP Virtualization Week 2012
Lego Cloud SAP Virtualization Week 2012Lego Cloud SAP Virtualization Week 2012
Lego Cloud SAP Virtualization Week 2012Benoit Hudzia
 
Cloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 TutorialCloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 TutorialStuart Charlton
 
CA Nimsoft xen desktop monitoring
CA Nimsoft xen desktop monitoring CA Nimsoft xen desktop monitoring
CA Nimsoft xen desktop monitoring CA Nimsoft
 
chiba-research 2010-01-22 at rakuten meeting
chiba-research 2010-01-22 at rakuten meetingchiba-research 2010-01-22 at rakuten meeting
chiba-research 2010-01-22 at rakuten meetingTatsuhiro Chiba
 
Présentation Archive eXchange Format (AXF) par Front porch Digital - ficam ju...
Présentation Archive eXchange Format (AXF) par Front porch Digital - ficam ju...Présentation Archive eXchange Format (AXF) par Front porch Digital - ficam ju...
Présentation Archive eXchange Format (AXF) par Front porch Digital - ficam ju...Marc Bourhis
 
Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)Panasas
 

Was ist angesagt? (17)

Obtenga más de Microsoft SQL Server 2012 en el entorno de nube privada
Obtenga más de Microsoft SQL Server  2012 en el entorno de nube privadaObtenga más de Microsoft SQL Server  2012 en el entorno de nube privada
Obtenga más de Microsoft SQL Server 2012 en el entorno de nube privada
 
Windows Sql Azure Cloud Computing Platform
Windows Sql Azure Cloud Computing PlatformWindows Sql Azure Cloud Computing Platform
Windows Sql Azure Cloud Computing Platform
 
Converged infrastructure ucc
Converged infrastructure  uccConverged infrastructure  ucc
Converged infrastructure ucc
 
MyCassandra (Full English Version)
MyCassandra (Full English Version)MyCassandra (Full English Version)
MyCassandra (Full English Version)
 
Nic teaming and converged fabric
Nic teaming and converged fabricNic teaming and converged fabric
Nic teaming and converged fabric
 
MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...
MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...
MyCassandra: A Cloud Storage Supporting both Read Heavy and Write Heavy Workl...
 
Ramakrishnan Keynote Ladis2009
Ramakrishnan Keynote Ladis2009Ramakrishnan Keynote Ladis2009
Ramakrishnan Keynote Ladis2009
 
NYC Meetup November 15, 2012
NYC Meetup November 15, 2012NYC Meetup November 15, 2012
NYC Meetup November 15, 2012
 
PhillyDB Hbase and MapR M7 - March 2013
PhillyDB Hbase and MapR M7 - March 2013PhillyDB Hbase and MapR M7 - March 2013
PhillyDB Hbase and MapR M7 - March 2013
 
Sap On Esx Backup Methodology
Sap On Esx   Backup MethodologySap On Esx   Backup Methodology
Sap On Esx Backup Methodology
 
Charlie Talk - Everything At The Click Of A Button
Charlie Talk - Everything At The Click Of A ButtonCharlie Talk - Everything At The Click Of A Button
Charlie Talk - Everything At The Click Of A Button
 
Lego Cloud SAP Virtualization Week 2012
Lego Cloud SAP Virtualization Week 2012Lego Cloud SAP Virtualization Week 2012
Lego Cloud SAP Virtualization Week 2012
 
Cloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 TutorialCloud Computing for Developers and Architects - QCon 2008 Tutorial
Cloud Computing for Developers and Architects - QCon 2008 Tutorial
 
CA Nimsoft xen desktop monitoring
CA Nimsoft xen desktop monitoring CA Nimsoft xen desktop monitoring
CA Nimsoft xen desktop monitoring
 
chiba-research 2010-01-22 at rakuten meeting
chiba-research 2010-01-22 at rakuten meetingchiba-research 2010-01-22 at rakuten meeting
chiba-research 2010-01-22 at rakuten meeting
 
Présentation Archive eXchange Format (AXF) par Front porch Digital - ficam ju...
Présentation Archive eXchange Format (AXF) par Front porch Digital - ficam ju...Présentation Archive eXchange Format (AXF) par Front porch Digital - ficam ju...
Présentation Archive eXchange Format (AXF) par Front porch Digital - ficam ju...
 
Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)Panasas pNFS Status (September 2010)
Panasas pNFS Status (September 2010)
 

Ähnlich wie NoSQL "Tools in Action" talk at Devoxx

Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on DemandApachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on DemandRichard McDougall
 
Intuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin StopfordIntuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin StopfordJAXLondon_Conference
 
Cloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web AppsCloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web AppsMark Slingsby
 
Deliver Big Data, Database and AI/ML as-a-Service anywhere
Deliver Big Data, Database and AI/ML as-a-Service anywhereDeliver Big Data, Database and AI/ML as-a-Service anywhere
Deliver Big Data, Database and AI/ML as-a-Service anywhereRavikumar Alluboyina
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsRichard McDougall
 
AWS Summit 2011: Data Storage Solutions in the AWS Cloud
AWS Summit 2011: Data Storage Solutions in the AWS CloudAWS Summit 2011: Data Storage Solutions in the AWS Cloud
AWS Summit 2011: Data Storage Solutions in the AWS CloudAmazon Web Services
 
Scalability
ScalabilityScalability
Scalabilityfelho
 
AWS Partner Presentation – Panzura – AWS Cloud Storage for the Enterprise 2012
AWS Partner Presentation – Panzura – AWS Cloud Storage for the Enterprise 2012AWS Partner Presentation – Panzura – AWS Cloud Storage for the Enterprise 2012
AWS Partner Presentation – Panzura – AWS Cloud Storage for the Enterprise 2012Amazon Web Services
 
AWS Summit 2011: High Availability Database Architectures in AWS Cloud
AWS Summit 2011: High Availability Database Architectures in AWS CloudAWS Summit 2011: High Availability Database Architectures in AWS Cloud
AWS Summit 2011: High Availability Database Architectures in AWS CloudAmazon Web Services
 
Sdc2010 scality cloud storage vs object storage for distribution
Sdc2010 scality cloud storage vs object storage for distributionSdc2010 scality cloud storage vs object storage for distribution
Sdc2010 scality cloud storage vs object storage for distributionJerome Lecat
 
There is NO CLOUD: Geeky Version
There is NO CLOUD: Geeky VersionThere is NO CLOUD: Geeky Version
There is NO CLOUD: Geeky VersionOpen Spectrum Inc
 
Cloumon enterprise
Cloumon enterpriseCloumon enterprise
Cloumon enterpriseGruter
 
Open stack in sina
Open stack in sinaOpen stack in sina
Open stack in sinaHui Cheng
 
Openstack in action2 Rackspace- state of the openstack union 31-05-12
Openstack in action2   Rackspace- state of the openstack union 31-05-12Openstack in action2   Rackspace- state of the openstack union 31-05-12
Openstack in action2 Rackspace- state of the openstack union 31-05-12eNovance
 
Cloud computing with AWS
Cloud computing with AWS Cloud computing with AWS
Cloud computing with AWS ikanow
 
Windows Server 2012 Active Directory Domain and Trust (Forest Trust)
Windows Server 2012 Active Directory Domain and Trust (Forest Trust)Windows Server 2012 Active Directory Domain and Trust (Forest Trust)
Windows Server 2012 Active Directory Domain and Trust (Forest Trust)Serhad MAKBULOĞLU, MBA
 

Ähnlich wie NoSQL "Tools in Action" talk at Devoxx (20)

Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on DemandApachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
 
Intuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin StopfordIntuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin Stopford
 
Cloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web AppsCloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web Apps
 
Deliver Big Data, Database and AI/ML as-a-Service anywhere
Deliver Big Data, Database and AI/ML as-a-Service anywhereDeliver Big Data, Database and AI/ML as-a-Service anywhere
Deliver Big Data, Database and AI/ML as-a-Service anywhere
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure Considerations
 
NoSQL
NoSQLNoSQL
NoSQL
 
Hbase jdd
Hbase jddHbase jdd
Hbase jdd
 
AWS Summit 2011: Data Storage Solutions in the AWS Cloud
AWS Summit 2011: Data Storage Solutions in the AWS CloudAWS Summit 2011: Data Storage Solutions in the AWS Cloud
AWS Summit 2011: Data Storage Solutions in the AWS Cloud
 
Scalability
ScalabilityScalability
Scalability
 
AWS Partner Presentation – Panzura – AWS Cloud Storage for the Enterprise 2012
AWS Partner Presentation – Panzura – AWS Cloud Storage for the Enterprise 2012AWS Partner Presentation – Panzura – AWS Cloud Storage for the Enterprise 2012
AWS Partner Presentation – Panzura – AWS Cloud Storage for the Enterprise 2012
 
AWS Summit 2011: High Availability Database Architectures in AWS Cloud
AWS Summit 2011: High Availability Database Architectures in AWS CloudAWS Summit 2011: High Availability Database Architectures in AWS Cloud
AWS Summit 2011: High Availability Database Architectures in AWS Cloud
 
Sdc2010 scality cloud storage vs object storage for distribution
Sdc2010 scality cloud storage vs object storage for distributionSdc2010 scality cloud storage vs object storage for distribution
Sdc2010 scality cloud storage vs object storage for distribution
 
There is NO CLOUD: Geeky Version
There is NO CLOUD: Geeky VersionThere is NO CLOUD: Geeky Version
There is NO CLOUD: Geeky Version
 
Cloumon enterprise
Cloumon enterpriseCloumon enterprise
Cloumon enterprise
 
Open stack in sina
Open stack in sinaOpen stack in sina
Open stack in sina
 
Openstack in action2 Rackspace- state of the openstack union 31-05-12
Openstack in action2   Rackspace- state of the openstack union 31-05-12Openstack in action2   Rackspace- state of the openstack union 31-05-12
Openstack in action2 Rackspace- state of the openstack union 31-05-12
 
Cloud computing with AWS
Cloud computing with AWS Cloud computing with AWS
Cloud computing with AWS
 
WebWorkersCamp 2010
WebWorkersCamp 2010WebWorkersCamp 2010
WebWorkersCamp 2010
 
SQL Azure in deep
SQL Azure in deepSQL Azure in deep
SQL Azure in deep
 
Windows Server 2012 Active Directory Domain and Trust (Forest Trust)
Windows Server 2012 Active Directory Domain and Trust (Forest Trust)Windows Server 2012 Active Directory Domain and Trust (Forest Trust)
Windows Server 2012 Active Directory Domain and Trust (Forest Trust)
 

Mehr von NGDATA

NGDATA Corporate Presentation
NGDATA Corporate PresentationNGDATA Corporate Presentation
NGDATA Corporate PresentationNGDATA
 
Welcome to the Age of Data
Welcome to the Age of DataWelcome to the Age of Data
Welcome to the Age of DataNGDATA
 
The Lily RowLog library
The Lily RowLog libraryThe Lily RowLog library
The Lily RowLog libraryNGDATA
 
Lily @ Work Webinar
Lily @ Work WebinarLily @ Work Webinar
Lily @ Work WebinarNGDATA
 
From Content Storage to Scaling Smart Data
From Content Storage to Scaling Smart DataFrom Content Storage to Scaling Smart Data
From Content Storage to Scaling Smart DataNGDATA
 
20110514 appsforghent
20110514 appsforghent20110514 appsforghent
20110514 appsforghentNGDATA
 
Big Data
Big DataBig Data
Big DataNGDATA
 
Lily at HUG UK
Lily at HUG UKLily at HUG UK
Lily at HUG UKNGDATA
 
NoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG LuxembourgNoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG LuxembourgNGDATA
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyNGDATA
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyNGDATA
 
Devoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in JavaDevoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in JavaNGDATA
 
Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionNGDATA
 
Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)NGDATA
 
Outerthought / Lily Partnerships
Outerthought / Lily PartnershipsOuterthought / Lily Partnerships
Outerthought / Lily PartnershipsNGDATA
 
NoSQL with Hadoop and HBase
NoSQL with Hadoop and HBaseNoSQL with Hadoop and HBase
NoSQL with Hadoop and HBaseNGDATA
 
Learning Lessons: Building a CMS on top of NoSQL technologies
Learning Lessons: Building a CMS on top of NoSQL technologiesLearning Lessons: Building a CMS on top of NoSQL technologies
Learning Lessons: Building a CMS on top of NoSQL technologiesNGDATA
 
KVIV / NoSQL : the new generation of database servers
KVIV / NoSQL : the new generation of database serversKVIV / NoSQL : the new generation of database servers
KVIV / NoSQL : the new generation of database serversNGDATA
 
N-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the riseN-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the riseNGDATA
 
NoSQL BOF at Devoxx
NoSQL BOF at DevoxxNoSQL BOF at Devoxx
NoSQL BOF at DevoxxNGDATA
 

Mehr von NGDATA (20)

NGDATA Corporate Presentation
NGDATA Corporate PresentationNGDATA Corporate Presentation
NGDATA Corporate Presentation
 
Welcome to the Age of Data
Welcome to the Age of DataWelcome to the Age of Data
Welcome to the Age of Data
 
The Lily RowLog library
The Lily RowLog libraryThe Lily RowLog library
The Lily RowLog library
 
Lily @ Work Webinar
Lily @ Work WebinarLily @ Work Webinar
Lily @ Work Webinar
 
From Content Storage to Scaling Smart Data
From Content Storage to Scaling Smart DataFrom Content Storage to Scaling Smart Data
From Content Storage to Scaling Smart Data
 
20110514 appsforghent
20110514 appsforghent20110514 appsforghent
20110514 appsforghent
 
Big Data
Big DataBig Data
Big Data
 
Lily at HUG UK
Lily at HUG UKLily at HUG UK
Lily at HUG UK
 


NoSQL "Tools in Action" talk at Devoxx

  • 1. NoSQL: Not Only SQL (mirrored on the slide as "LQS ylnO toN"). Devoxx 2009
  • 2. How did we get here?
  • 3. Daisy (this is what we do): a Java open source CMS. MySQL holds system data and metadata; Lucene holds the full-text indexes; the filesystem holds the actual content.
  • 4. rather nice query language; rather nice publishing features; wanted to move away from WCMS; fast access control and facet browser require the active document set to reside in a memory cache. SCALE?
  • 5. Findings: we learned a lot about content management over the past 6 years (versioning, staging, multilinguality, searching, access control, publishing); people don’t like Cocoon / XSLT and prefer templating instead; many of our project specifics were about finding the correct storage model for specific data structures (not all data was fit for our CMS); customers with growing ambitions → Daisy has to follow!
  • 6. Overhaul: get rid of Cocoon → Kauri; quest for differentiation: “the internet barrier”, combine massive storage with useful query-ability. Kauri: REST, Maven, Spring, webapps (www.kauriproject.org)
  • 7. The Internet Barrier: fundamental technology gap (NoSQL vs SQL); buy/use vs build; focus on architecture and infrastructure; rise of semi- or free-structured information; layered approach. http://github.com/blog/530-how-we-made-github-fast
  • 8. Conclusions: let’s start from scratch (ouch); a different architecture / foundations; scale big and be available; modularity (pluggability); we’re not a banking application, so consistency might be less important
  • 9. Storage challenges: sparse data structures; flexible, evolving data structures; lack of good fault-tolerance setups; cope with scale; CAP vs BASE; (Google) BigTable and (Amazon) Dynamo
  • 10. ACID vs BASE
    ACID (atomicity, consistency, isolation, durability): strong consistency; isolation; focus on “commit”; nested transactions; availability?; conservative (pessimistic); difficult evolution (schema)
    BASE (basically available, soft state, eventually consistent): weak consistency; availability first; best effort; approximate answers OK; aggressive (optimistic); simpler!; faster; easier evolution
    The two form a spectrum. Slide: Eric Brewer
  • 11. Our CAP multilemma: consistency (of data); availability (ping means results); partition tolerance (‘cluster splits’ should not block); plus scale and fault tolerance
  • 12. C?A?P? Initial gut feeling: cAP. A was a given; C would be a function of our datastore choice; however, the P seemed like a nice-to-have (aka over-ambitious use-case)
  • 13. CAPondering: HBase vs Cassandra. Consistency vs SPOF? Possible higher latency vs possibly frailer community? Cocoon trauma. http://www.cs.cornell.edu/projects/ladis2009/talks/ramakrishnan-keynote-ladis2009.pdf
  • 14. Comparison Matrix
    Column groups: Partitioning (hash/sort, dynamic, routing); Availability (failures handled, reads/writes during failure); Replication (sync/async, local/geo); Consistency; Durability; Storage.

    System    | Hash/sort | Dynamic | Routing | Failures handled | During failure | Sync/async | Local/geo    | Consistency         | Durability         | Storage
    PNUTS     | H+S       | Y       | Rtr     | Colo+server      | Read+write     | Async      | Local+geo    | Timeline + Eventual | Double WAL         | Buffer pages
    MySQL     | H+S       | N       | Cli     | Colo+server      | Read           | Async      | Local+nearby | ACID                | WAL                | Buffer pages
    HDFS      | Other     | Y       | Rtr     | Colo+server      | Read+write     | Sync       | Local+nearby | N/A (no updates)    | Triple replication | Files
    BigTable  | S         | Y       | Rtr     | Colo+server      | Read+write     | Sync       | Local+nearby | Multi-version       | Triple replication | LSM/SSTable
    Dynamo    | H         | Y       | P2P     | Colo+server      | Read+write     | Async      | Local+nearby | Eventual            | WAL                | Buffer pages
    Cassandra | H+S       | Y       | P2P     | Colo+server      | Read+write     | Sync+Async | Local+nearby | Eventual            | Triple WAL         | LSM/SSTable
    Megastore | S         | Y       | Rtr     | Colo+server      | Read+write     | Sync       | Local+nearby | ACID/other          | Triple replication | LSM/SSTable
    Azure     | S         | N       | Cli     | Server           | Read+write     | Sync       | Local        | ACID                | WAL                | Buffer pages
  • 15. HBase: sorted, distributed, persisted, column-oriented, multi-dimensional, highly-available, high-performance storage system; adds random access reads and writes atop HDFS
  • 17. People (inventors and project leads): Google BigTable ☺; Michael Stack (Powerset/Microsoft); Jim Kellerman (Powerset/Microsoft); Jonathan Gray (Streamy.com); Mike Cafarella (UMich); Ryan Rawson (StumbleUpon); Jean-Daniel Cryans (SU); Bryan Duxbury (Rapleaf)
  • 19. HBase data model: distributed multi-dimensional sparse map; keys are arbitrary strings; access to row data is atomic; multi-dimensional keys: (table, row, family:column, timestamp) → value
  • 20. Date: Thu, 12 Nov 2009 18:19:50 -0800
    Message-ID: <78568af10911121819x292527b2t7f8b7d857c3650b2@mail.gmail.com>
    Subject: Re: newbie: need help on understanding HBase
    From: Ryan Rawson <ryanobjc@gmail.com>
    To: hbase-user@hadoop.apache.org

    HBase is semi-column oriented. Column families is the storage model - everything in a column family is stored in a file linearly in HDFS. That means accessing data from a column family is really cheap and easy. Adding more column families adds more files - it has the performance profile of adding new tables, except you dont actually have additional tables, so the conceptual complexity stays low.

    Data is stored at the "intersection" of the rowid, and the column family + qualifier. This is sometimes called a "Cell" - contains a timestamp as well. You can have multiple versions all timestamped. The timestamp is by default the int64/java system milli time. I have to recommend against setting the timestamp explicitly if you can avoid it.

    So when you retrieve a row, you can get everything, a list of column qualifiers or a list of families or any combo. (eg: list of these qualifiers out of family A and everything from family B)

    [...]

    The terms to use are:
    - Column family (or just family): the unit of locality in hbase. Everything in a family is stored in 1 (or a set) of files. A table is a name and a list of families with attributes for those families (eg: compression). A family is a string.
    - Column qualifier (or just qualifier): allows you to store multiple values for the same row in 1 family. This value is a byte array and can be anything. The API converts null => new byte[0]{}. This is the tricky bit, since most people dont think of "column names" as being dynamic.
    - Cell - the old name for a value + timestamp. The new API (see: class Result) doesn't use this term, instead provides a different path to read data.

    You can use HBase as a normal datastore and use static names for the qualifiers, and that is just fine. But if you need something special to get past the lack of relations, you can start to do fun things with the qualifier as data. Building a secondary index for example. The row key would be the secondary value (eg: city) and the qualifier would be the primary key (eg: userid) and the value would be a placeholder to indicate the value exists.
  • 21. E-R Blog partitioning sequential scans
  • 22. Getting data in and out: Java API; Thrift multi-language API; Stargate REST connector; HBase shell (JRuby IRB-based). Processing: MapReduce and Cascading
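
The data model from slides 19 and 20 can be sketched in a few lines of plain Python. This is only a toy in-memory model, not the HBase API: the `SparseTable` class, its method names, the logical clock and the `info` / `idx` family names are all assumptions made for illustration. It shows the (row, family:qualifier, timestamp) → value mapping with versioned cells, and the secondary-index trick from the email where the qualifier itself carries data.

```python
import itertools
from collections import defaultdict


class SparseTable:
    """Toy in-memory stand-in for one HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self):
        # row key -> "family:qualifier" -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))
        # Assumption: a simple logical clock replaces the system millisecond time.
        self._clock = itertools.count(1)

    def put(self, row, family, qualifier, value, timestamp=None):
        """Store one versioned cell and return the timestamp that was used."""
        ts = next(self._clock) if timestamp is None else timestamp
        self._rows[row][f"{family}:{qualifier}"][ts] = value
        return ts

    def get(self, row, family, qualifier, timestamp=None):
        """Return the newest value at or before `timestamp` (the latest if None)."""
        versions = self._rows.get(row, {}).get(f"{family}:{qualifier}", {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None

    def qualifiers(self, row, family):
        """List the qualifiers present in one family of one row, sorted."""
        prefix = family + ":"
        columns = self._rows.get(row, {})
        return sorted(c[len(prefix):] for c in columns if c.startswith(prefix))


# Primary table: row key = userid, static qualifier names.
users = SparseTable()
first_ts = users.put("u1", "info", "city", "Ghent")
users.put("u1", "info", "city", "Brussels")  # newer version of the same cell
users.put("u2", "info", "city", "Ghent")
users.put("u3", "info", "city", "Ghent")

# Secondary index: row key = city, qualifier = userid, value = placeholder.
by_city = SparseTable()
for userid in ("u1", "u2", "u3"):
    by_city.put(users.get(userid, "info", "city"), "idx", userid, b"")

print(users.get("u1", "info", "city"))                      # latest version
print(users.get("u1", "info", "city", timestamp=first_ts))  # older version
print(by_city.qualifiers("Ghent", "idx"))                   # users indexed under Ghent
```

Because the index is a second table keyed by city, finding all users in a city is a single row lookup; the price is that the application has to keep both tables in sync itself.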

Editor's notes

  1. Atomicity. All of the operations in the transaction will complete, or none will. Consistency. The database will be in a consistent state when the transaction begins and ends. Isolation. The transaction will behave as if it is the only operation being performed upon the database. Durability. Upon completion of the transaction, the operation will not be reversed.
  2. consistency of data: think serializability; availability: pinging a live node should produce results; partition tolerance: live nodes should not be blocked by partitions
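
As a small illustration of the atomicity property described in note 1, the sketch below uses Python's built-in `sqlite3` module. SQLite is not part of the talk; it is chosen here only because it runs without setup, and the `accounts` table and `transfer` helper are hypothetical. A transfer that would violate a constraint halfway through is rolled back as a whole, so the partial credit never becomes visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            # This update fails the CHECK constraint when src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # credit to bob is rolled back
final = balances(conn)
print(ok, failed, final)
```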