Data Storage for Extreme Use Cases: The Lay of the Land and a Peek at ODC
Ben Stopford : RBS
How fast is a HashMap lookup?
That's how long it takes light to cross a room
How fast is a database lookup?
That's how long it takes light to go to Australia and back
Computers really are very fast!
The problem is we're quite good at writing software that slows them down
Question:

Is it fair to compare the
performance of a Database with
a HashMap?
Of course not…
Mechanical Sympathy

[Diagram: a latency scale running ms → μs → ns → ps]
- ms: cross-continental round trip
- μs: 1MB over disk/Ethernet; Ethernet ping
- ns: main memory ref; 1MB from main memory; L2 cache ref; RDMA over Infiniband
- sub-ns: L1 cache ref
* An L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm.
Key Point #1




Simple computer programs, operating in a single address space, are extremely fast.
Why are there so many
types of database
these days?
…because we need
different architectures
for different jobs
Times are changing
Traditional Database
Architecture is Aging
The Traditional Architecture
[Diagram: Shared Disk, In Memory and Shared Nothing architectures arranged on a spectrum from Traditional to a Simpler Contract, with Distributed In Memory combining distribution and memory]
Key Point #2


Different architectural decisions about how we store and access data are needed in different environments.
Our ‘Context’ has changed
Simplifying the
   Contract
How big is the internet?



   5 exabytes
(which is 5,000 petabytes
 or 5,000,000 terabytes)
How big is an average enterprise database?


80% < 1TB
      (in 2009)
The context of
our problem has
    changed
Simplifying the Contract
Databases have huge
operational overheads




Taken from "OLTP Through the Looking Glass, and What We Found There", Harizopoulos et al.
Avoid that overhead by simplifying the contract and avoiding IO
Key Point #3



For the very top-end data volumes a simpler contract is mandatory. ACID is simply not possible.
Key Point #3 (addendum)




 But we should always
retain ACID properties if
 our use case allows it.
Options for
scaling-out the
  traditional
 architecture
#1: The Shared Disk
    Architecture




[Diagram: multiple nodes sharing one disk]
#2: The Shared Nothing
      Architecture
Each machine is responsible for a subset
of the records. Each record exists on only
               one machine.


[Diagram: a client routing to machines that each own a disjoint key range (1, 2, 3…; 97, 98, 99…; 765, 769…; 169, 170…; 333, 334…; 244, 245…)]
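The routing in the diagram above can be sketched as simple hash-based partitioning: each record is owned by exactly one node, chosen deterministically from its key. This is an illustrative sketch, not ODC's routing code; the node count and string keys are assumptions.

```java
import java.util.List;

// Minimal sketch of shared-nothing routing: each record lives on
// exactly one node, chosen deterministically from its key.
public class Partitioner {
    private final int nodeCount;

    public Partitioner(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    // Map a key to the single node that owns it.
    public int nodeFor(Object key) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        Partitioner p = new Partitioner(4);
        for (String key : List.of("trade-1", "trade-2", "trade-97")) {
            System.out.println(key + " -> node " + p.nodeFor(key));
        }
    }
}
```

Because the mapping is a pure function of the key, any client can route a request directly to the owning node without consulting a directory.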
#3: The In Memory Database
   (single address-space)
Databases must cache subsets
    of the data in memory




[Diagram: a cache sitting in front of the database]
Not knowing what you don't know
[Diagram: 90% of the data in cache, the rest on disk]
If you can fit it ALL in memory
you know everything!!
The architecture of an in
   memory database
Memory is at least 100x faster than
               disk
[Diagram: the latency scale again, ms → μs → ns → ps]
- ms: cross-continental round trip
- μs: 1MB over disk/network; cross-network round trip
- ns: main memory ref; 1MB from main memory; L2 cache ref
- sub-ns: L1 cache ref
* An L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm.
Random vs. Sequential Access
This makes them very fast!!
The proof is in the stats. TPC-H
Benchmarks on a 1TB data set
So why haven't in-memory databases taken off?
Address-Spaces are relatively
small and of a finite, fixed size
Durability
One solution is
 distribution
Distributed In Memory (Shared
           Nothing)
Again we spread our data but this time
          only using RAM.




[Diagram: a client routing to machines that each hold a disjoint key range, this time in RAM]
Distribution solves our two
         problems
We get massive amounts of
   parallel processing
But at the cost of losing the single address space
[Diagram: the architecture spectrum again: Shared Disk → In Memory → Shared Nothing, running from Traditional towards a Simpler Contract, with Distributed In Memory combining both]
Key Point #4
There are three key forces:
- Distribution: gain scalability through a distributed architecture.
- Simplify the contract: improve scalability by picking appropriate ACID properties.
- No disk: all data is held in RAM.
These three non-functional themes lie behind the design of ODC, RBS's in-memory data warehouse
ODC
ODC represents
   a balance
    between
throughput and
    latency
What is Latency?
What is Throughput?
Which is best for latency?


[Diagram: a spectrum from Traditional Database, through Shared Nothing (Distributed), to In-Memory Database, asking which is best for latency]
Which is best for throughput?


[Diagram: the same spectrum, asking which is best for throughput]
So why do we use distributed
        in-memory?
[Diagram: In Memory buys latency; plentiful hardware buys throughput]
ODC – Distributed, Shared Nothing, In-Memory, Semi-Normalised, Realtime Graph DB
450 processes, 2TB of RAM




  Messaging (Topic Based) as a system of record
                    (persistence)
The Layers
[Diagram: four layers: an Access Layer of Java client APIs, a Query Layer, a Data Layer holding Transactions, MTMs and Cashflows, and a Persistence Layer]
Three Tools of Distributed Data
         Architecture
[Diagram: Indexing, Partitioning and Replication]
How should we use these tools?
Replication puts data
     everywhere




      But your storage is limited by
      the memory on a node
Partitioning scales
     Associating data in
     different partitions implies
     moving it.




     Scalable storage, bandwidth
     and processing
So we have some data.
Our data is bound together in a
             model
[Diagram: an object model: a Trade links to a Trader and a Party, which link on to Desk, Sub-Party and Name]
Which we save..


[Diagram: the Trader, Party and Trade objects saved across different machines]
Binding them back together involves a
 “distributed join” => Lots of network
                  hops
[Diagram: rebinding the Trader, Party and Trade objects held on different machines requires network hops]
The hops have to be spread
        over time




[Diagram: network hops spread along a time axis]
Lots of network hops makes it
            slow
OK – what if we held it
all together??
“Denormalised”
Hence denormalisation is FAST!
           (for reads)
Denormalisation implies the
duplication of some sub-entities
…and that means managing
consistency over lots of copies
…and all the duplication means
 you run out of space really
            quickly
Space issues are exacerbated further when data is versioned
[Diagram: versions 1–4 of the denormalised Trader/Party/Trade graph, each version duplicating its sub-entities]
…and you need versioning to do MVCC
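The versioning this slide calls for can be sketched as a minimal multi-version map: writes append a new version rather than overwriting, and readers ask for the value as of a given version. The class and method names are illustrative, not ODC's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Minimal MVCC sketch: every write creates a new version; readers can
// ask for the latest value at or before any historic version.
public class VersionedStore<K, V> {
    private final Map<K, TreeMap<Long, V>> versions = new HashMap<>();
    private long currentVersion = 0;

    // Write a new version of the value and return its version number.
    public synchronized long put(K key, V value) {
        long v = ++currentVersion;
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(v, value);
        return v;
    }

    // Read the value as it stood at the given version (a "time slice").
    public synchronized V getAsOf(K key, long version) {
        TreeMap<Long, V> history = versions.get(key);
        if (history == null) return null;
        Map.Entry<Long, V> entry = history.floorEntry(version);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        VersionedStore<String, String> store = new VersionedStore<>();
        long v1 = store.put("trade-1", "state A");
        store.put("trade-1", "state B");
        System.out.println(store.getAsOf("trade-1", v1)); // the old time slice
    }
}
```

Readers at an old version never see newer writes, which is the property MVCC needs; the cost is exactly the space growth the slides describe.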
And reconstituting a previous
  time slice becomes very
           difficult.
[Diagram: reassembling a consistent time slice from many versioned Trade/Party fragments]
So we want to hold
   entities separately
(normalised) to alleviate
    concerns around
 consistency and space
          usage
Remember this means the object graph will be split across multiple machines. Data is independently versioned.
[Diagram: the normalised Trader/Party/Trade graph spread across nodes; each entity is a versioned singleton]
Binding them back together involves a
 “distributed join” => Lots of network
                  hops
[Diagram: the distributed join across Trader, Party and Trade again]
Whereas in the denormalised model the join is already done
So what we want is the advantages
of a normalised store at the speed
of a denormalised one!


This is what using Snowflake Schemas and
  the Connected Replication pattern is all
                   about!
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys
We can collocate data with common keys but if
 they crosscut the only way to collocate is to
                   replicate
[Diagram: data sharing common keys can be collocated; data with crosscutting keys cannot]
We tackle this problem with a
       hybrid model:

[Diagram: Trader and Party are replicated; Trade is partitioned]
We adapt the concept of a
  Snowflake Schema.
Taking the concept of Facts and
          Dimensions
Everything starts from a Core
    Fact (Trades for us)
Facts are big, dimensions are small
Facts have one key that relates
  them all (used to partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data:


[Chart: Facts are big, with common keys; Dimensions are small, with crosscutting keys]
We remember we are a grid. We
 should avoid the distributed
            join.
… so we only want to 'join' data that is in the same process
[Diagram: Trades and MTMs share a common key; a key assignment policy (e.g. KeyAssociation in Coherence) collocates them]
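The collocation idea can be sketched as follows: a fact's partition is derived not from its own key but from an association key shared with its parent, so related facts always land on the same node. Coherence's KeyAssociation interface provides this for real; the classes below are a self-contained sketch under assumed names, not the Coherence API.

```java
// Sketch of key association: an MTM's partition is derived from its
// parent Trade's key, so the two always land on the same node and a
// Trade/MTM join never leaves the process.
public class KeyAssociationSketch {
    static final int PARTITIONS = 16;

    static int partitionFor(Object associatedKey) {
        return Math.floorMod(associatedKey.hashCode(), PARTITIONS);
    }

    record TradeKey(String id) { }

    // The MTM has its own id but is associated with its trade's key.
    record MtmKey(String id, TradeKey tradeKey) {
        Object associatedKey() { return tradeKey; } // collocate with the trade
    }

    public static void main(String[] args) {
        TradeKey trade = new TradeKey("trade-42");
        MtmKey mtm = new MtmKey("mtm-7", trade);
        // Both keys resolve to the same partition, so the join is local.
        System.out.println(partitionFor(trade) == partitionFor(mtm.associatedKey()));
    }
}
```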
So we prescribe different
physical storage for Facts and
         Dimensions
[Diagram: Trader and Party replicated; Trade partitioned]
Facts are
partitioned, dimensions are
replicated




[Diagram: Trader and Party held replicated in the Query Layer; Transactions, MTMs and Cashflows held in partitioned Fact Storage in the Data Layer]
Facts are
partitioned, dimensions are
replicated
[Diagram: Dimensions (replicated); Facts: Transactions, MTMs and Cashflows (distributed/partitioned) in Fact Storage]
The data volumes back this up
  as a sensible hypothesis

[Chart: Facts are big, so distribute; Dimensions are small, so replicate]
Key Point


We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small entities whose keys can't map to our partitioning key.
[Diagram: Replicate (the small stuff) / Distribute (the big stuff)]
So how does this help us to run queries without distributed joins?
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like
          without this pattern?
[Diagram: sequential network hops spread over time: Get Cost Centers → Get Ledger Books → Get Source Books → Get Transactions → Get MTMs → Get Legs → Get Cost Centers]
But by balancing Replication and Partitioning we don't need all those hops
[Diagram: the same steps, now without the network hops between them]
Stage 1: Focus on the where
          clause:
Where Cost Centre = 'CC1'
Stage 1: Get the right keys to
      query the Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
[Diagram: dimensions joined in the Query Layer to resolve keys; Transactions, MTMs and Cashflows remain partitioned]
Stage 2: Cluster Join to get
           Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
[Diagram: facts joined across the cluster; Transactions, MTMs and Cashflows are partitioned; dimensions are joined in the Query Layer]
Stage 2: Join the facts together
efficiently as we know they are
            collocated
Stage 3: Augment raw Facts
      with relevant Dimensions
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
[Diagram: dimensions joined onto the fact results in the Query Layer; the facts joined across the cluster remain partitioned]
Stage 3: Bind relevant
dimensions to the result
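The three stages can be sketched end to end with plain maps standing in for the replicated dimension caches and the partitioned fact store. Everything here (the table names, key shapes and sample rows) is illustrative; the point is that only Stage 2 touches partitioned data, and collocation keeps that join local to each node.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the three-stage query: resolve the where-clause against
// replicated dimensions, join collocated facts, then bind dimensions.
public class ThreeStageQuery {
    // Replicated dimension: cost centre -> transaction keys (illustrative)
    static Map<String, List<String>> costCentreToTxn =
        Map.of("CC1", List.of("txn-1", "txn-2"));

    // Partitioned facts, collocated by transaction key
    static Map<String, String> transactions =
        Map.of("txn-1", "buy 100", "txn-2", "sell 50");
    static Map<String, String> mtms =
        Map.of("txn-1", "mtm=1.2", "txn-2", "mtm=0.8");

    static List<String> query(String costCentre) {
        // Stage 1: resolve the where-clause against replicated dimensions
        List<String> keys = costCentreToTxn.getOrDefault(costCentre, List.of());
        // Stage 2: join facts; collocation makes this a local operation
        // Stage 3: bind dimension data onto each result row
        return keys.stream()
            .map(k -> transactions.get(k) + ", " + mtms.get(k) + ", " + costCentre)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(query("CC1"));
    }
}
```

No stage ships keys or intermediate results between nodes, which is what removes the distributed join.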
Bringing it together:
[Diagram: a Java client API spanning replicated Dimensions and partitioned Facts]
We never have to do a distributed join!
So all the big stuff is
  held partitioned



 And we can join
 without shipping
 keys around and
having intermediate
      results
We get to do this…


[Diagram: the normalised Trader/Party/Trade graph spread across machines]
…and this…


[Diagram: versions 1–4 of the Trader/Party/Trade graph]
…and this…
[Diagram: reconstituting a time slice]
…without the problems of this…
…or this…
…all at the speed of this… well, almost!
But there is a fly in the
      ointment…
I lied earlier. These aren't all Facts.
[Diagram: one of the "Facts" is really a dimension: it has a different key to the Facts, and it's BIG]
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.
Fortunately there is a simple
solution!
Whilst there are lots of these
big dimensions, a large majority
are never used. They are not all
“connected”.
If there are no Trades for Goldmans
in the data store then a Trade Query
will never need the Goldmans
Counterparty
Looking at the Dimension data
    some are quite large
But Connected Dimension Data
    is tiny by comparison
One recent independent study
from the database community
showed that 80% of data
remains unused
So we only replicate
‘Connected’ or ‘Used’
     dimensions
As data is written to the data store we keep our 'Connected Caches' up to date
[Diagram: replicated Dimension Caches in the Processing Layer; Transactions, MTMs and Cashflows in partitioned Fact Storage in the Data Layer. As new Facts are added, the relevant Dimensions they reference are moved to the processing-layer caches]
The Replicated Layer is updated
by recursing through the arcs
on the domain model when facts
change
Saving a trade causes all its 1st-level references to be triggered
[Diagram: a Save Trade call hits the Cache Store in the Data Layer (all normalised, partitioned); triggers fire for the Trade's Party Alias, Source Book and Ccy, updating the connected dimension caches in the Query Layer]
This updates the connected caches


[Diagram: Party Alias, Source Book and Ccy copied up into the connected dimension caches in the Query Layer]
The process recurses through the
          object graph

[Diagram: the recursion continues from Source Book down to Party and Ledger Book]
‘Connected Replication’
   A simple pattern which
recurses through the foreign
     keys in the domain
    model, ensuring only
‘Connected’ dimensions are
         replicated
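The pattern can be sketched as a recursion over foreign-key references: when a fact is saved, walk its references transitively and add each reached dimension to the replicated set. The Entity type and the field names here are hypothetical, not ODC's model.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of Connected Replication: when a fact is saved, recurse
// through its foreign-key references and mark every reached dimension
// for replication. Dimensions no fact references are never replicated.
public class ConnectedReplication {
    record Entity(String key, List<Entity> references) { }

    private final Set<String> replicated = new HashSet<>();

    public void onFactSaved(Entity fact) {
        for (Entity dim : fact.references()) {
            replicate(dim);
        }
    }

    private void replicate(Entity dim) {
        if (!replicated.add(dim.key())) return; // already connected: stop
        for (Entity next : dim.references()) {
            replicate(next); // recurse through the object graph
        }
    }

    public Set<String> replicatedKeys() { return replicated; }
}
```

Because the walk only starts from saved facts, an unreferenced dimension (the Goldmans counterparty with no trades, in the earlier example) is never reached and never takes up replicated space.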
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
Limitations of this approach
Conclusion
[Diagram: partitioned storage]
The End

Weitere ähnliche Inhalte

Was ist angesagt?

Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Kafka Basic For Beginners
Kafka Basic For BeginnersKafka Basic For Beginners
Kafka Basic For BeginnersRiby Varghese
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewDmitry Tolpeko
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introductionSyed Hadoop
 
Kafka meetup - kafka connect
Kafka meetup -  kafka connectKafka meetup -  kafka connect
Kafka meetup - kafka connectYi Zhang
 
Streaming Data with Apache Kafka
Streaming Data with Apache KafkaStreaming Data with Apache Kafka
Streaming Data with Apache KafkaMarkus Günther
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsGuozhang Wang
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupSnehal Nagmote
 

Was ist angesagt? (20)

Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Kafka Basic For Beginners
Kafka Basic For BeginnersKafka Basic For Beginners
Kafka Basic For Beginners
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Kafka basics
Kafka basicsKafka basics
Kafka basics
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System Overview
 
Event Hub & Kafka
Event Hub & KafkaEvent Hub & Kafka
Event Hub & Kafka
 
Kafka tutorial
Kafka tutorialKafka tutorial
Kafka tutorial
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka
Apache Kafka Apache Kafka
Apache Kafka
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka meetup - kafka connect
Kafka meetup -  kafka connectKafka meetup -  kafka connect
Kafka meetup - kafka connect
 
Streaming Data with Apache Kafka
Streaming Data with Apache KafkaStreaming Data with Apache Kafka
Streaming Data with Apache Kafka
 
Apache Kafka Demo
Apache Kafka DemoApache Kafka Demo
Apache Kafka Demo
 
Apache kafka introduction
Apache kafka introductionApache kafka introduction
Apache kafka introduction
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka Streams
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
 
Apache Kafka Streams
Apache Kafka StreamsApache Kafka Streams
Apache Kafka Streams
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
 
Kafka
KafkaKafka
Kafka
 

Ähnlich wie Advanced databases ben stopford

A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...Ben Stopford
 
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear ScalabilityBeyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear ScalabilityBen Stopford
 
Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and...
Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and...Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and...
Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and...JAX London
 
Memory-Based Cloud Architectures
Memory-Based Cloud ArchitecturesMemory-Based Cloud Architectures
Memory-Based Cloud Architectures小新 制造
 
Basho and Riak at GOTO Stockholm: "Don't Use My Database."
Basho and Riak at GOTO Stockholm:  "Don't Use My Database."Basho and Riak at GOTO Stockholm:  "Don't Use My Database."
Basho and Riak at GOTO Stockholm: "Don't Use My Database."Basho Technologies
 
Panzura & Scality - Cloud Storage made seamless - Cloud Expo New York City 2012
Panzura & Scality - Cloud Storage made seamless - Cloud Expo New York City 2012Panzura & Scality - Cloud Storage made seamless - Cloud Expo New York City 2012
Panzura & Scality - Cloud Storage made seamless - Cloud Expo New York City 2012Marc Villemade
 
Re-inventing the Database: What to Keep and What to Throw Away
Re-inventing the Database: What to Keep and What to Throw AwayRe-inventing the Database: What to Keep and What to Throw Away
Re-inventing the Database: What to Keep and What to Throw AwayDATAVERSITY
 
Low level java programming
Low level java programmingLow level java programming
Low level java programmingPeter Lawrey
 
Top Technology Trends
Top Technology Trends Top Technology Trends
Top Technology Trends InnoTech
 
in-memory database system and low latency
in-memory database system and low latencyin-memory database system and low latency
in-memory database system and low latencyhyeongchae lee
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualizationSisimon Soman
 
Intuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin StopfordIntuitions for scaling data centric architectures - Benjamin Stopford
Intuitions for scaling data centric architectures - Benjamin StopfordJAXLondon_Conference
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementJ Singh
 
002-Storage Basics and Application Environments V1.0.pptx
002-Storage Basics and Application Environments V1.0.pptx002-Storage Basics and Application Environments V1.0.pptx
002-Storage Basics and Application Environments V1.0.pptxDrewMe1
 
In-Memory Computing: Myths and Facts
In-Memory Computing: Myths and FactsIn-Memory Computing: Myths and Facts
In-Memory Computing: Myths and FactsDATAVERSITY
 
CSC1100 - Chapter05 - Storage
CSC1100 - Chapter05 - StorageCSC1100 - Chapter05 - Storage
CSC1100 - Chapter05 - StorageYhal Htet Aung
 

Ähnlich wie Advanced databases ben stopford (20)

A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
 
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear ScalabilityBeyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
 



Advanced databases ben stopford

  • 1. Data Storage for Extreme Use Cases: The Lay of the Land and a Peek at ODC Ben Stopford : RBS
  • 2. How fast is a HashMap lookup?
  • 3. That's how long it takes light to travel a room
  • 4. How fast is a database lookup?
  • 5. That's how long it takes light to go to Australia and back
  • 6.
  • 7. Computers really are very fast!
  • 8. The problem is we're quite good at writing software that slows them down
  • 9. Question: Is it fair to compare the performance of a Database with a HashMap?
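The comparison is unfair because the two sides do very different amounts of work. A minimal Java sketch of what the HashMap side actually involves — the printed nanosecond figure is illustrative, not a benchmark:

```java
import java.util.HashMap;
import java.util.Map;

public class LookupDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("trade-42", "GBP 1,000,000");

        // A HashMap lookup is a few in-process memory reads:
        // hash the key, index the bucket array, compare keys.
        long start = System.nanoTime();
        String value = map.get("trade-42");
        long elapsed = System.nanoTime() - start;

        // A database lookup layers network round trips, query parsing,
        // locking and (possibly) disk I/O on top of the same basic idea.
        System.out.println(value + " found in ~" + elapsed + "ns");
        assert "GBP 1,000,000".equals(value);
    }
}
```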
  • 11. Mechanical Sympathy — the latency scale from ms down through μs and ns to ps: cross-continental round trip, 1MB over disk/Ethernet, Ethernet ping, main memory ref, 1MB from main memory, L2 cache ref, L1 cache ref, RDMA over Infiniband. * An L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm
  • 12. Key Point #1 Simple computer programs, operating in a single address space, are extremely fast.
  • 13.
  • 14. Why are there so many types of database these days? …because we need different architectures for different jobs
  • 17.
  • 19. Traditional → Shared Disk / In Memory / Shared Nothing → Distributed In Memory, with a Simpler Contract
  • 20. Key Point #2 Different architectural decisions about how we store and access data are needed in different environments. Our ‘Context’ has changed
  • 21. Simplifying the Contract
  • 22. How big is the internet? 5 exabytes (which is 5,000 petabytes or 5,000,000 terabytes)
  • 23. How big is an average enterprise database? 80% < 1TB (in 2009)
  • 24. The context of our problem has changed
  • 26. Databases have huge operational overheads Taken from “OLTP Through the Looking Glass, and What We Found There” Harizopoulos et al
  • 27. Avoid that overhead with a simpler contract and avoiding IO
  • 28. Key Point #3 For the very top end data volumes a simpler contract is mandatory. ACID is simply not possible.
  • 29. Key Point #3 (addendum) But we should always retain ACID properties if our use case allows it.
  • 30. Options for scaling-out the traditional architecture
  • 31. #1: The Shared Disk Architecture Shared Disk
  • 32. #2: The Shared Nothing Architecture
  • 33. Each machine is responsible for a subset of the records. Each record exists on only one machine. 1, 2, 3… 97, 98, 99… 765, 769… 169, 170… Client 333, 334… 244, 245…
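The record-to-machine assignment sketched above can be as simple as a hash of the key modulo the node count. A library-free Java sketch — the `ownerOf` helper and the six-node cluster are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionDemo {
    // Each record lives on exactly one node, chosen from its key.
    static int ownerOf(Object key, int nodeCount) {
        // Math.floorMod keeps the result non-negative for any hashCode.
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        int nodes = 6;
        Map<Integer, Integer> recordsPerNode = new HashMap<>();
        for (int tradeId = 1; tradeId <= 600; tradeId++) {
            recordsPerNode.merge(ownerOf(tradeId, nodes), 1, Integer::sum);
        }
        // Every key maps deterministically to exactly one of the six nodes.
        assert recordsPerNode.keySet().stream().allMatch(n -> n >= 0 && n < 6);
        System.out.println(recordsPerNode);
    }
}
```

Real grids use smarter schemes (e.g. consistent hashing) so that adding a node does not remap every key, but the principle — one owner per record, derived from the key — is the same.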
  • 34. #3: The In Memory Database (single address-space)
  • 35. Databases must cache subsets of the data in memory Cache
  • 36. Not knowing what you don't know: 90% in Cache, Data on Disk
  • 37. If you can fit it ALL in memory you know everything!!
  • 38. The architecture of an in memory database
  • 39. Memory is at least 100x faster than disk — on the same ms/μs/ns/ps scale: cross-continental round trip, 1MB over disk/network, cross-network round trip, main memory ref, 1MB from main memory, L2 cache ref, L1 cache ref. * An L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm
  • 41. This makes them very fast!!
  • 42. The proof is in the stats. TPC-H Benchmarks on a 1TB data set
  • 43. So why haven't in-memory databases taken off?
  • 44. Address-Spaces are relatively small and of a finite, fixed size
  • 46. One solution is distribution
  • 47. Distributed In Memory (Shared Nothing)
  • 48. Again we spread our data but this time only using RAM. 1, 2, 3… 97, 98, 99… 765, 769… 169, 170… Client 333, 334… 244, 245…
  • 49. Distribution solves our two problems
  • 50. We get massive amounts of parallel processing
  • 51. But at the cost of losing the single address space
  • 52. Traditional → Shared Disk / In Memory / Shared Nothing → Distributed In Memory, with a Simpler Contract
  • 53. Key Point #4 There are three key forces: No Disk — all data is held in RAM; Distribution — gain scalability through a distributed architecture; Simplify the contract — improve scalability by picking appropriate ACID properties.
  • 54. These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse
  • 55. ODC
  • 56. ODC represents a balance between throughput and latency
  • 59. Which is best for latency? Shared Nothing (Distributed) Traditional In-Memory Database Database Latency?
  • 60. Which is best for throughput? Shared Nothing (Distributed) Traditional In-Memory Database Database Throughput?
  • 61. So why do we use distributed in-memory? In Plentiful Memory hardware Latency Throughput
  • 62. ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised, Realtime Graph DB. 450 processes, 2TB of RAM. Messaging (topic-based) as a system of record (persistence)
  • 63. The Layers: Access Layer (Java client APIs), Query Layer, Data Layer (Transactions, Mtms, Cashflows), Persistence Layer
  • 64. Three Tools of Distributed Data Architecture Indexing Partitioning Replication
  • 65. How should we use these tools?
  • 66. Replication puts data everywhere But your storage is limited by the memory on a node
  • 67. Partitioning scales Associating data in different partitions implies moving it. Scalable storage, bandwidth and processing
  • 68. So we have some data. Our data is bound together in a model Desk Sub Name Trader Party Trade
  • 69. Which we save… (Trader, Party and Trade entities spread across the cluster)
  • 70. Binding them back together involves a "distributed join" => lots of network hops (the Trader, Party and Trade live on different nodes)
  • 71. The hops have to be spread over time Network Time
  • 72. Lots of network hops makes it slow
  • 73. OK – what if we held it all together?? “Denormalised”
  • 74. Hence denormalisation is FAST! (for reads)
  • 76. …and that means managing consistency over lots of copies
  • 77. …and all the duplication means you run out of space really quickly
  • 78. Space issues are exaggerated further when data is versioned: every version duplicates the whole Trader/Party/Trade graph (versions 1, 2, 3, 4…) — and you need versioning to do MVCC
  • 79. And reconstituting a previous time slice becomes very difficult (matching the right version of each Trader, Party and Trade)
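One way to see why the normalised alternative helps: if each entity keeps its own version history, an as-of read is just a floor lookup per entity, with no duplicated graphs. A minimal Java sketch — the integer versions and the `write`/`asOf` helpers are assumptions for illustration, not the ODC API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class VersionedStoreDemo {
    // Each entity keeps its own version history, keyed by version number.
    static final Map<String, NavigableMap<Integer, String>> store = new HashMap<>();

    static void write(String key, int version, String value) {
        store.computeIfAbsent(key, k -> new TreeMap<>()).put(version, value);
    }

    // Reconstructing a time slice = for each entity, the latest
    // version at or before the requested point.
    static String asOf(String key, int version) {
        NavigableMap<Integer, String> versions = store.get(key);
        Map.Entry<Integer, String> e = versions == null ? null : versions.floorEntry(version);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        write("trade:1", 1, "pending");
        write("trade:1", 3, "booked");
        write("party:GS", 2, "Goldman Sachs");

        assert "pending".equals(asOf("trade:1", 2)); // version 3 not yet visible
        assert "booked".equals(asOf("trade:1", 5));
        assert "Goldman Sachs".equals(asOf("party:GS", 2));
    }
}
```

Because each entity is a versioned singleton, a historical slice never requires copying the whole object graph — only a floor lookup per referenced entity.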
  • 80. So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage
  • 81. Remember this means the object graph will be split across multiple machines. Data is independently versioned: each Trader, Party and Trade is a singleton.
  • 82. Binding them back together involves a "distributed join" => lots of network hops (the Trader, Party and Trade live on different nodes)
  • 83. Whereas the denormalised model the join is already done
  • 84. So what we want is the advantages of a normalised store at the speed of a denormalised one! This is what using Snowflake Schemas and the Connected Replication pattern is all about!
  • 85. Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
  • 86. It's all about the keys
  • 87. We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate. Crosscutting Keys vs Common Keys.
  • 88. We tackle this problem with a hybrid model: Replicated Trader Party Trade Partitioned
  • 89. We adapt the concept of a Snowflake Schema.
  • 90. Taking the concept of Facts and Dimensions
  • 91. Everything starts from a Core Fact (Trades for us)
  • 92. Facts are Big, dimensions are small
  • 93. Facts have one key that relates them all (used to partition)
  • 94. Dimensions have many keys (which crosscut the partitioning key)
  • 95. Looking at the data: Facts => big, common keys. Dimensions => small, crosscutting keys.
  • 96. We remember we are a grid. We should avoid the distributed join.
  • 97. …so we only want to 'join' data that is in the same process. Use a Key Assignment Policy (e.g. KeyAssociations in Coherence): Trade and MTMs share a Common Key.
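The key-assignment idea can be sketched without any grid library: give each child key an association key, and route on that key alone. (Coherence expresses this with its KeyAssociation/KeyAssociator mechanism; the `MtmKey` class and `partitionOf` helper below are hypothetical stand-ins for the principle.)

```java
public class KeyAssociationDemo {
    // A child key carries the id of the trade it belongs to. Routing
    // hashes ONLY that association key, so a Trade and all of its MTMs
    // land in the same partition and can be joined in-process.
    static class MtmKey {
        final long tradeId;  // the association (partitioning) key
        final long mtmId;
        MtmKey(long tradeId, long mtmId) { this.tradeId = tradeId; this.mtmId = mtmId; }
        long associatedKey() { return tradeId; }
    }

    static int partitionOf(long associationKey, int partitions) {
        // floorMod keeps the partition index non-negative for any hash
        return Math.floorMod(Long.hashCode(associationKey), partitions);
    }

    public static void main(String[] args) {
        int partitions = 13;
        long tradeId = 42;
        MtmKey a = new MtmKey(tradeId, 1);
        MtmKey b = new MtmKey(tradeId, 2);

        int tradePartition = partitionOf(tradeId, partitions);
        // Both MTMs collocate with their trade — a local, in-process join is possible.
        assert partitionOf(a.associatedKey(), partitions) == tradePartition;
        assert partitionOf(b.associatedKey(), partitions) == tradePartition;
        System.out.println("trade " + tradeId + " and its MTMs share partition " + tradePartition);
    }
}
```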
  • 98. So we prescribe different physical storage for Facts and Dimensions Replicated Trader Party Trade Partitioned
  • 99. Facts are partitioned, dimensions are replicated Query Layer Trader Party Trade Transactions Data Layer Mtms Cashflows Fact Storage (Partitioned)
  • 100. Facts are partitioned, dimensions are replicated: Dimensions (replicated); Facts (distributed/partitioned): Transactions, Mtms, Cashflows — Fact Storage (Partitioned)
  • 101. The data volumes back this up as a sensible hypothesis. Facts: big => distribute. Dimensions: small => replicate.
  • 102. Key Point We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.
  • 104. So how do they help us to run queries without distributed joins? Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’
  • 105. What would this look like without this pattern? Get Cost Centers, Get Ledger Books, Get Source Books, Get Transactions, Get MTMs, Get Legs, Get Cost Centers — hop after hop, spread over network time
  • 106. But by balancing Replication and Partitioning we don't need all those hops (the Gets for Cost Centers, Ledger Books, Source Books, Transactions, MTMs and Legs collapse into local work)
  • 107. Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
  • 108. Stage 1: Get the right keys to query the Facts Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’ Join Dimensions in Query Layer Transactions Mtms Cashflows Partitioned
  • 109. Stage 2: Cluster Join to get Facts. Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’ — join Dimensions in the Query Layer, then join Facts (Transactions, Mtms, Cashflows) across the cluster (Partitioned)
  • 110. Stage 2: Join the facts together efficiently as we know they are collocated
  • 111. Stage 3: Augment raw Facts with relevant Dimensions. Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’ — join Dimensions in the Query Layer, join Facts (Transactions, Mtms, Cashflows) across the cluster (Partitioned)
  • 112. Stage 3: Bind relevant dimensions to the result
  • 113. Bringing it together: a Java client API over Replicated Dimensions and Partitioned Facts. We never have to do a distributed join!
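The three query stages can be sketched with plain maps standing in for the replicated and partitioned layers — all names, keys and data below are invented for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnowflakeQueryDemo {
    // Replicated dimension index: cost centre -> trade ids (tiny, on every node).
    static final Map<String, List<Long>> tradesByCostCentre = new HashMap<>();
    // Partitioned facts: trade id -> trade, trade id -> MTM (big, spread across nodes).
    static final Map<Long, String> trades = new HashMap<>();
    static final Map<Long, Double> mtms = new HashMap<>();

    public static void main(String[] args) {
        tradesByCostCentre.put("CC1", List.of(1L, 2L));
        trades.put(1L, "trade-1"); trades.put(2L, "trade-2"); trades.put(3L, "trade-3");
        mtms.put(1L, 10.0); mtms.put(2L, -4.5); mtms.put(3L, 99.0);

        // Stage 1: resolve the where-clause against replicated dimensions
        // (no network hop — the dimension data is on every node).
        List<Long> keys = tradesByCostCentre.get("CC1");

        List<String> result = new ArrayList<>();
        for (long id : keys) {
            // Stage 2: join the facts; trades and MTMs share the partitioning
            // key, so in the real grid this join runs locally in each partition.
            // Stage 3: bind the dimension data to the result.
            result.add(trades.get(id) + " @ " + mtms.get(id) + " [CC1]");
        }
        assert result.size() == 2;
        assert result.get(0).startsWith("trade-1");
        System.out.println(result);
    }
}
```

The point of the pattern is visible in the loop: at no stage does a key set or intermediate result have to be shipped between nodes.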
  • 114. So all the big stuff is held partitioned And we can join without shipping keys around and having intermediate results
  • 115. We get to do this… (hold Trader, Party and Trade normalised across the cluster)
  • 116. …and this… (version each Trader, Party and Trade independently: versions 1–4)
  • 117. ..and this.. (reconstitute previous time slices)
  • 118. …without the problems of this…
  • 120. ..all at the speed of this… well almost!
  • 121.
  • 122. But there is a fly in the ointment…
  • 123. I lied earlier. These aren't all Facts. This is a dimension: it has a different key to the Facts. And it's BIG.
  • 124. We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.
  • 125. Fortunately there is a simple solution!
  • 126. Whilst there are lots of these big dimensions, a large majority are never used. They are not all “connected”.
  • 127. If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
  • 128. Looking at the Dimension data some are quite large
  • 129. But Connected Dimension Data is tiny by comparison
  • 130. One recent independent study from the database community showed that 80% of data remains unused
  • 131. So we only replicate ‘Connected’ or ‘Used’ dimensions
  • 132. As data is written to the data store we keep our 'Connected Caches' up to date. Processing Layer: Dimension Caches (Replicated). Data Layer: Transactions, Mtms, Cashflows — Fact Storage (Partitioned). As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches.
  • 133. The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
  • 134. Saving a trade causes all its 1st-level references to be triggered: Save Trade → Cache Store → Trigger → Party Alias, Source Book, Ccy (Query Layer with connected dimension caches; Data Layer all normalised, partitioned)
  • 135. This updates the connected caches (Query Layer with connected dimension caches; Data Layer all normalised): Trade → Party Alias, Source Book, Ccy
  • 136. The process recurses through the object graph: Trade → Party Alias, Source Book, Ccy → Party, Ledger Book
  • 137. ‘Connected Replication’ A simple pattern which recurses through the foreign keys in the domain model, ensuring only ‘Connected’ dimensions are replicated
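The pattern can be sketched as a recursion over a foreign-key graph. All names here (`ConnectedReplicator`, `onFactWritten`, the key strings) are hypothetical; in the real system this logic runs inside the data store's write path.

```java
import java.util.*;

// Illustrative sketch of 'Connected Replication': when a fact is written,
// walk its foreign keys, mark each referenced dimension as replicated,
// then recurse into that dimension's own references.
class ConnectedReplicator {
    // Maps an entity key to the keys it references, e.g.
    // "Trade:1" -> ["Party:GS", "Ccy:USD"], "Party:GS" -> ["LedgerBook:7"].
    private final Map<String, List<String>> foreignKeys;
    private final Set<String> replicated = new HashSet<>();

    ConnectedReplicator(Map<String, List<String>> foreignKeys) {
        this.foreignKeys = foreignKeys;
    }

    // Called when a fact is saved; recurses through the object graph so
    // only dimensions actually reachable from a fact get replicated.
    void onFactWritten(String entityKey) {
        for (String dim : foreignKeys.getOrDefault(entityKey, List.of())) {
            if (replicated.add(dim)) {   // add() is false if already present,
                onFactWritten(dim);      // so each dimension is visited once
            }
        }
    }

    Set<String> connectedDimensions() {
        return Collections.unmodifiableSet(replicated);
    }
}
```

The visited-set check is what keeps the recursion cheap: each dimension is replicated at most once, however many facts reference it.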
  • 138. With ‘Connected Replication’ only 1/10th of the data needs to be replicated (on average).
  • 139. Limitations of this approach
  • 145. Conclusion Partitioned Storage

Editor's notes

1. I started a project back in 2004: a trading system at BarCap. When it came to persisting our data there were three choices: Oracle, Sybase or SQL Server. A lot has changed since then. Today we are far more likely to look at one of a variety of technologies to satisfy our need to store and re-retrieve data. So how many of you use a traditional database? What about a distributed database like Oracle RAC? NoSQL? Do you use it alongside a database, or stand-alone? What about an in-memory database, in production? Finally, what about distributed in-memory? This talk is about an in-memory database. It's not really a distributed cache, despite being implemented in Coherence, although you could call it one if you preferred. In truth it has a variety of elements that make it closer to what you might perceive to be a database. It is normalised: that is to say, it holds entities independently from one another and versions them as such. It has some basic guarantees of atomicity when writing certain groups of objects that are collocated. Most importantly, it is both fast and scalable regardless of the join criteria you impose on it, something fairly elusive in the world of distributed data storage. I have a few aims for today: I hope you will leave with a broader view of what stores are available to you and what is coming in the future. I hope you'll see the benefits that niche storage solutions can provide through simpler contracts between client and data store. And I'd like you to understand the benefits of memory over disk.
2. A better example is Amazon: partition by user so orders and basket are held together; products will be shared by multiple users.
3. Big data sets are held distributed and only joined on the grid to collocated objects. Small data sets are held in replicated caches so they can be joined in-process (only ‘active’ data is held).