One Size Doesn’t Fit All
                          The database revolution



                          April 25, 2012

                          Mark R. Madsen
                          http://ThirdNature.net

                          Robin Bloor
                          http://Bloorgroup.com




Wednesday, April 25, 12
Your Host




                             Eric.kavanagh@bloorgroup.com




Analyst Hosts


        Bloor                             Madsen




Introduction
                  Significant and revolutionary changes are taking place
                  in database technology

                  In order to investigate and analyze these changes and
                  where they may lead, The Bloor Group has teamed up
                  with Third Nature to launch an Open Research
                  project.

                  This is the final webinar in a series of webinars and
                  research activities that have comprised part of the
                  project

                  All published research will be made available through
                  our web site: Databaserevolution.com

Sponsors of This Research




General Webinar Structure

             Market Changes, Database Changes
             (Some Of The Findings)
             Let’s Talk About Performance
             How to Select A Database



Market Changes, Database Changes




Database Performance Bottlenecks
                  CPU saturation

                  Memory saturation

                  Disk I/O channel saturation

                  Locking

                  Network saturation

                  Parallelism – inefficient load balancing




Multiple Database Roles
                Transactional Systems                      BI and Analytics Systems

                [Diagram. Transactional systems: apps over files or DBMSs holding unstructured and structured data. BI and analytics systems: staging area, operational data store, data warehouse, data marts, personal data stores, OLAP cubes, and content DBMS, with BI apps on top.]

                                   Now there are more...
The Origin of Big Data


                                       Corporate
                                       Databases


                                   + Unstructured Data

                                    + Personal Data
                               + Supply Chain & Cust. Data
                                      + Web Data

                                  + Social Network Data

                               + Embedded Systems Data


Big Data = Scale Out
                     The query is decomposed into a sub-query for each node.

                     The columnar database scales up and out by adding more servers.

                     Data is compressed and partitioned on disk by column and by range.

                     [Diagram: a query against a database table is split into Sub Query 1 and Sub Query 2, each running on its own server. Every server has multiple CPUs, common memory, a cache, and its own data on disk.]
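The decomposition described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not any product's implementation: the two in-memory lists stand in for servers, and the names `PARTITIONS`, `sub_query`, and `average_salary` are hypothetical. Each sub-query returns a partial (sum, count), and a coordinator combines the partials into the final average.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical salary column, range-partitioned across two "servers".
PARTITIONS = [
    [150_000, 120_000],   # rows held by server 1
    [166_000, 36_000],    # rows held by server 2
]

def sub_query(rows):
    """Sub-query run on one node: return a partial (sum, count)."""
    return sum(rows), len(rows)

def average_salary(partitions):
    """Coordinator: fan the sub-queries out, then combine the partials."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(sub_query, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

result = average_salary(PARTITIONS)
print(result)  # 118000.0
```

Shipping (sum, count) pairs rather than raw rows is what keeps the combine step cheap: the coordinator sees one small tuple per node regardless of how many rows each node scanned.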




Let’s Stop Using the Term NoSQL
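        As the graph indicates, it’s just not helpful. In fact it’s downright confusing.

        [Chart: the vertical axis lists data structures (Single Table, Star Schema, Snow Flake, TNF Schema, OLAP, Nested Data, Graph Data, Complex Data); the horizontal axis is Data Volume; regions are labeled oldsql, newsql, and nosql.]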




NoSQL Directions
           Some NDBMS do not attempt to provide all ACID properties.
           (Atomicity, Consistency, Isolation, Durability)

           Some NDBMS deploy a distributed scale-out architecture with data
           redundancy.

           XML DBMS using XQuery are NDBMS.

           Some document stores are NDBMS (OrientDB, Terrastore, etc.)

           Object databases are NDBMS (Gemstone, Objectivity, ObjectStore, etc.)

           Key value stores = schema-less stores (Cassandra, MongoDB, Berkeley
           DB, etc.)

           Graph DBMS (DEX, OrientDB, etc.) are NDBMS

           Large data pools (BigTable, Hbase, Mnesia, etc.) are NDBMS


The Joys of SQL?
             SQL: very good for set manipulation.
             Works for OLTP and many query
             environments.
             Not good for nested data structures
             (documents, web pages, etc.)
             Not good for ordered data sets
             Not good for data graphs (networks of
             values)




The “Impedance Mismatch”
           The RDBMS stores data organized
           according to table structures

           The OO programmer manipulates data
           organized according to complex object
           structures, which may have specific
           methods associated with them.

           The data does not simply map to the
           structure it has within the database

           Consequently a mapping activity is
           necessary to get and put data

           Basically: hierarchies, types, result sets,
           crappy APIs, language bindings, tools
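
The mapping activity described above can be sketched as follows. This is an illustrative example only: the `Order` and `OrderLine` classes and the two-table layout are assumptions made for the sketch, not something taken from the slides. The object has a nested list and a method; the database side only has flat rows, so explicit "put" and "get" functions are needed to move between the two.

```python
from dataclasses import dataclass, field

@dataclass
class OrderLine:
    sku: str
    quantity: int

@dataclass
class Order:
    order_id: int
    customer: str
    lines: list = field(default_factory=list)

    def total_units(self):
        """Behaviour lives on the object; a table has no equivalent."""
        return sum(line.quantity for line in self.lines)

def to_rows(order):
    """'Put': flatten one object graph into rows for two tables."""
    order_row = (order.order_id, order.customer)
    line_rows = [(order.order_id, l.sku, l.quantity) for l in order.lines]
    return order_row, line_rows

def from_rows(order_row, line_rows):
    """'Get': rebuild the object graph from the flat rows."""
    order_id, customer = order_row
    lines = [OrderLine(sku, qty) for oid, sku, qty in line_rows if oid == order_id]
    return Order(order_id, customer, lines)

order = Order(1, "Acme", [OrderLine("A-100", 2), OrderLine("B-200", 5)])
order_row, line_rows = to_rows(order)
restored = from_rows(order_row, line_rows)
print(order_row, line_rows, restored.total_units())
```

Even in this tiny case the hierarchy has to be split across two row sets and re-joined on `order_id`, which is the work that object-relational mapping layers automate.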
The SQL Barrier
           SQL has:
               DDL (for data definition)
               DML (for Select, Project and Join)
               But it has no MML (Math) or TML
               (Time)

           Usually result sets are brought to
           the client for further analytical
           manipulation, but this creates
           problems

           Alternatively doing all analytical
           manipulation in the database
           creates problems

Hadoop/MapReduce
             Hadoop is a parallel processing environment

             Map/Reduce is a parallel processing framework

             Hbase turns Hadoop into a database of a kind

             Hive adds an SQL capability

             Pig adds analytics

             [Diagram: stages Map, Partition, Combine, Reduce. A scheduler drives mapping processes on Node 1 through Node i, each reading from HDFS, and reducing processes on Node i+1 through Node k. Every node has backup/recovery.]
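
The Map, Partition, Combine, and Reduce stages named in the diagram can be sketched as a single-process word count. This is a toy illustration of the programming model under stated assumptions, not Hadoop's API: the function names are hypothetical, each input string plays the role of one mapper's split, and `zlib.crc32` is used only as a stable hash for routing keys to reducers.

```python
import zlib
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def partition_phase(pairs, num_reducers):
    """Partition: route each key to one reducer using a stable hash."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[zlib.crc32(key.encode()) % num_reducers].append((key, value))
    return buckets

def combine_phase(pairs):
    """Combine: pre-aggregate on the mapping node to cut data shipped."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_phase(pairs):
    """Reduce: merge the combined pairs arriving from every mapper."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def word_count(documents, num_reducers=2):
    reducer_inputs = [[] for _ in range(num_reducers)]
    for document in documents:                       # one mapper per document
        buckets = partition_phase(map_phase(document), num_reducers)
        for r, bucket in enumerate(buckets):
            reducer_inputs[r].extend(combine_phase(bucket))
    result = {}
    for pairs in reducer_inputs:                     # one reducer per bucket
        result.update(reduce_phase(pairs))
    return result

counts = word_count(["big data big", "data scale out", "big scale"])
print(counts)
```

Because every key is routed to exactly one reducer, the reducers' outputs never overlap and can simply be merged.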




Market Forces
                  A new set of products appear

                  They include some fundamental innovations

                  A few are sufficiently popular to last

                  Fashion and marketing drive greater adoption

                  Product defects begin to be addressed

                  They eventually challenge the dominant products




Let’s Talk About Performance




Performance and Scalability
Scalability and performance are not the same thing
Performance measures
Throughput: the number of tasks completed in a given time period
A measure of how much work is or can be done by a system in a set amount of time, e.g. TPM or data loaded per hour.
It’s easy to increase throughput without improving response time.
Performance measures

Response time: the speed of a single task
Response time is usually the measure of an individual's experience using a system.
Response time = time interval / throughput
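
The difference between the two measures can be sketched with a toy model. The function `run_batch` and its parameters are hypothetical, and the model deliberately ignores queueing delay: identical tasks are spread evenly over identical workers. Adding workers raises throughput, but a single task takes just as long as before.

```python
def run_batch(num_tasks, service_time, workers):
    """Model identical tasks spread evenly over identical workers.

    Returns (throughput in tasks per second, response time of one task
    in seconds). Queueing delay is ignored in this sketch.
    """
    rounds = -(-num_tasks // workers)      # ceiling division
    elapsed = rounds * service_time        # wall-clock time for the batch
    throughput = num_tasks / elapsed
    response_time = service_time           # one task is no faster
    return throughput, response_time

one_worker = run_batch(num_tasks=8, service_time=2.0, workers=1)
four_workers = run_batch(num_tasks=8, service_time=2.0, workers=4)
print(one_worker)    # (0.5, 2.0)
print(four_workers)  # (2.0, 2.0)
```

In the single-worker case, dividing the elapsed interval (16 seconds) by the number of tasks completed (8) gives the same 2 seconds per task, which matches the formula on the slide. With four workers that shortcut no longer describes what an individual user experiences.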


Scalability vs throughput vs response time

Scalability = consistent performance for a task over an increase in a scale factor

Three possible scale factors

[Diagram: the three scale factors are computations, amount of data, and number of users.]
Scale: Data Volume
The different ways people count make establishing rules of thumb for sizing hard.
How do you measure it?
  ▪  Row counts
  ▪  Transaction counts
  ▪  Data size
  ▪  Raw data vs loaded data
  ▪  Schema objects

People still have trouble scaling for databases as large as a single PC hard drive.
Scale: Concurrency (active and passive)
Scalability relationships
As concurrency increases, response time (usually) degrades.
This can be addressed somewhat via workload management tools.
When a system hits a bottleneck, response time and throughput will often get worse, not just level off.
“Linear Scalability”
This is the part of the chart most vendors show.




If you’re lucky they leave the bottom axis on so you
know where their system flatlines.
Scale: Computational Complexity
A key point worth remembering:

Performance over size <> performance over complexity

Analytics performance is about the intersection of both.
Database performance for BI is mostly related to size and query complexity.
SOME TECHNOLOGY STUFF
Large Memories and Large Databases
Not as fast as you expect because of how databases were designed (optimized for small memories and disk access).
For example: sequential scans and cache serialization

[Diagram: a 512GB DB buffer cache and a 640GB table (1B rows, 100/block). During a scan, LRU overwrites older blocks while part of the table is still unread.]
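
The sequential-scan problem in the diagram can be reproduced with a small simulation. This is a sketch under simplifying assumptions: the `LRUCache` class is a hypothetical stand-in for a buffer cache with plain least-recently-used eviction, and the sizes are scaled so that one block represents 1GB. Real database buffer managers often use modified policies precisely to avoid this behaviour.

```python
from collections import OrderedDict

class LRUCache:
    """Toy buffer cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)     # mark as most recently used
        else:
            self.misses += 1
            self.blocks[block_id] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)   # evict least recently used

def scan_twice(table_blocks, cache_blocks):
    """Sequentially scan the whole table two times; return (hits, misses)."""
    cache = LRUCache(cache_blocks)
    for _ in range(2):
        for block_id in range(table_blocks):
            cache.read(block_id)
    return cache.hits, cache.misses

bigger_than_cache = scan_twice(table_blocks=640, cache_blocks=512)
fits_in_cache = scan_twice(table_blocks=512, cache_blocks=512)
print(bigger_than_cache)  # (0, 1280)
print(fits_in_cache)      # (512, 512)
```

When the table is only 25% larger than the cache, the second scan gets no cache hits at all: each block is evicted shortly before it is needed again, so the large memory buys nothing for this access pattern.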
In-Memory Databases Today
1.  Maybe not as fast as you think. Depends entirely on the database (e.g. VectorWise)
2.  Applied mainly to shared-everything systems
3.  Very large memories are more applicable to shared-nothing than shared-memory systems
4.  Still an expensive way to get performance

    Box-limited           Limited by node scaling
    e.g. 2 TB max         e.g. 16 nodes, 512GB per = 8TB
Hardware changes enable new software models
The extra CPU allows us to do things in software that we avoided in the past because of scarce resources.
Compression techniques and columnar database architectures that consumed too much CPU are now possible.
Improving Query Performance: Columnar Databases

ID  Name            Salary     Position
1   Marge Inovera   $150,000   Statistician
2   Anita Bath      $120,000   Sewer inspector
3   Ivan Awfulitch  $166,000   Dermatologist
4   Nadia Geddit    $36,000    DBA

In a row-store model these four rows would be stored in sequential order as shown here, packed into a block.

1   Marge Inovera   $150,000   Statistician
2   Anita Bath      $120,000   Sewer inspector
3   Ivan Awfulitch  $166,000   Dermatologist
4   Nadia Geddit    $36,000    DBA

In a column store they would be divided into columns and stored in different blocks.
Inserting data into a columnar database
Each column is stored in its own set of blocks, written to disk separately.
Extra work for writes over rowstore, update complexity, delete complexity.

1   Marge Inovera   $150,000   Statistician
2   Anita Bath      $120,000   Sewer inspector
3   Ivan Awfulitch  $166,000   Dermatologist
4   Nadia Geddit    $36,000    DBA

Reading from a columnar database
SELECT * FROM emp WHERE ID = 1
4 reads, extract & stitch

Column elimination and I/O
SELECT AVG(salary) FROM emp
1 read
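
The read counts on these slides can be checked with a small model of a column store. This is an illustrative sketch, assuming one block per column; the names `column_store`, `read_block`, `select_star`, and `avg_salary` are hypothetical, and a real column store would also use compression and indexes that are not modeled here.

```python
ROWS = [
    (1, "Marge Inovera", 150_000, "Statistician"),
    (2, "Anita Bath", 120_000, "Sewer inspector"),
    (3, "Ivan Awfulitch", 166_000, "Dermatologist"),
    (4, "Nadia Geddit", 36_000, "DBA"),
]
COLUMNS = ("id", "name", "salary", "position")

# Column store: each column lives in its own block.
column_store = {name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)}

reads = 0

def read_block(column):
    """Fetch one column block and count the read."""
    global reads
    reads += 1
    return column_store[column]

def select_star(emp_id):
    """SELECT * FROM emp WHERE ID = ?: read every column, then stitch."""
    position = read_block("id").index(emp_id)
    return (emp_id,) + tuple(read_block(c)[position] for c in COLUMNS[1:])

def avg_salary():
    """SELECT AVG(salary) FROM emp: only the salary block is needed."""
    salaries = read_block("salary")
    return sum(salaries) / len(salaries)

reads = 0
row = select_star(1)
reads_for_select_star = reads

reads = 0
average = avg_salary()
reads_for_avg = reads

print(row, reads_for_select_star)   # 4 reads
print(average, reads_for_avg)       # 1 read
```

The single-row lookup is the column store's worst case and the single-column aggregate is its best case, which is why the workload mix matters when choosing between row and column storage.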
How do we scale performance for queries?

Make CPU faster: faster CPUs mean quicker response time and increased throughput.
Add CPUs: more CPUs mean more throughput.
Parallelize query execution: parallel query execution improves response time, but it consumes more resources, reducing concurrency and possibly throughput.
Early query performance scaling: table partitioning

  Table partitioning distributes rows across table partitions by range, hash or round robin when you insert or load the data.

  [Diagram: a partitioning function (fn) routes rows to the Q1, Q2, Q3, and Q4 Sales Tables.]
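
The three partitioning schemes named above can each be written as a small routing function. This is a minimal sketch: the function names and the sample sales rows are hypothetical, and `zlib.crc32` is used only because it gives a hash that is stable across runs.

```python
import zlib
from datetime import date

def range_partition(sale_date):
    """Range: pick the quarter partition from the sale date (0 = Q1 ... 3 = Q4)."""
    return (sale_date.month - 1) // 3

def hash_partition(key, num_partitions):
    """Hash: spread rows by a stable hash of the partitioning key."""
    return zlib.crc32(str(key).encode()) % num_partitions

def round_robin_partition(row_number, num_partitions):
    """Round robin: deal rows out in arrival order."""
    return row_number % num_partitions

# Hypothetical (sale_id, sale_date) rows.
sales = [
    (101, date(2012, 1, 15)),
    (102, date(2012, 4, 25)),
    (103, date(2012, 8, 3)),
    (104, date(2012, 12, 31)),
]

by_range = [range_partition(d) for _, d in sales]
by_round_robin = [round_robin_partition(i, 4) for i in range(len(sales))]
by_hash = [hash_partition(sale_id, 4) for sale_id, _ in sales]
print(by_range, by_round_robin, by_hash)
```

Range partitioning lets a query for one quarter skip the other three partitions, while hash and round robin aim for an even spread and give no such pruning by date.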
Scale-up vs. Scale-out Parallelism
Uniprocessor environments required chip upgrades.
SMP servers can grow to a point, then it’s a forklift upgrade to a bigger box.
MPP servers grow by adding more nodes.

(a) Scaling up with a larger server    (b) Scaling out with many small servers

Copyright Third Nature, Inc.
Sharding, aka Partitioning at the Node Level
Sharding is basically horizontal partitioning applied across multiple database servers.
Each node holds a (hopefully) self-consistent portion of the database.
Good as long as queried data lives on a single node.

[Diagram: query redirect]

One large database = several smaller databases
Sharding, Databases and Queries
What happens when you need to scan a full table or join tables across nodes? Multiple queries and stitching at the application level.

Sharding works well for fixed access paths, uniform query plans, and data sets that can be isolated. Mainly this describes an OLTP-style workload.
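
The contrast between a redirected single-key query and an application-level stitch can be sketched as follows. Everything here is a simplified assumption: each Python dict stands in for one database server, and `shard_for`, `put`, `get`, and `total_balance` are hypothetical helpers, with modulo on the customer id as the sharding rule.

```python
NUM_SHARDS = 3
shards = [dict() for _ in range(NUM_SHARDS)]   # each dict stands in for one server

def shard_for(customer_id):
    """Query redirect: decide which server owns this key."""
    return customer_id % NUM_SHARDS

def put(customer_id, balance):
    shards[shard_for(customer_id)][customer_id] = balance

def get(customer_id):
    """Fixed access path: one query against one node."""
    nodes_touched = 1
    return shards[shard_for(customer_id)].get(customer_id), nodes_touched

def total_balance():
    """Full scan: one query per shard, stitched together in the application."""
    partials = [sum(shard.values()) for shard in shards]
    return sum(partials), len(shards)

for customer_id, balance in [(1, 100), (2, 250), (3, 75), (4, 300), (5, 20)]:
    put(customer_id, balance)

print(get(4))            # (300, 1)
print(total_balance())   # (745, 3)
```

The lookup cost stays constant as shards are added, while the scan touches every shard and the stitching logic lives in application code rather than in the database.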
Cloud Hardware Architecture
It’s a scale-out model. Uniform virtual node building blocks.
This is the future of software deployments, albeit with increasing node sizes, so paying attention to early adopters today will pay off.
This implies that an MPP database architecture will be needed for scale.
MPP Database Architecture

[Diagram: worker nodes joined by a high-speed interconnect. Leader node(s) are used by some; some use separate loader nodes.]

 Some databases are symmetric (all nodes are the same).
 Some allow mixed worker node sizes. Some are leaderless.
 Some problems with leaders, loaders, e.g. less automated
 management of the environment, treating bottlenecks
Key to MPP: data distribution
                                Single logical view of a table

                        Table data is evenly spread across all nodes.




 The good: scalability to petabyte range, much faster filtering and
 selection on scans.
 The bad: data skew (values, not rowcounts), aggregate function
 bottlenecks, concurrency challenges, complex multi-table joins
 with unlike distributions.


MPP challenges mostly hinge on data distribution
 Imagine fact & dim tables spread across all nodes. You need to get dim data to each node to join with fact rows stored there.
 Cross-node joins result in data shipping. This is where inter-node latency, data skew, node skew can bog down query performance.

 [Diagram: Node 1 and Node 2 each hold part of the fact table and part of the dim table.]

 The real test of an MPP database is not how fast it can scan data. That’s easy. Test joins in a PoC.
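
Why joins are the hard case can be shown by counting how many dimension rows have to move between nodes. This is a toy sketch with hypothetical tables and helper names (`distribute`, `rows_shipped_for_join`); distribution is a simple modulo on the chosen column, and the count is the number of dim keys each node needs but does not hold locally.

```python
def distribute(rows, key_index, num_nodes):
    """Place each row on a node by taking the distribution key modulo node count."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row[key_index] % num_nodes].append(row)
    return nodes

def rows_shipped_for_join(fact_nodes, dim_nodes, fact_key_index):
    """Count dim rows that must be shipped so every node can join locally."""
    shipped = 0
    for node_id, fact_rows in enumerate(fact_nodes):
        local_dim_keys = {dim_row[0] for dim_row in dim_nodes[node_id]}
        needed_keys = {fact_row[fact_key_index] for fact_row in fact_rows}
        shipped += len(needed_keys - local_dim_keys)
    return shipped

# Hypothetical tables: dim is (product_id, name), fact is (sale_id, product_id, amount).
dim = [(p, f"product-{p}") for p in range(3)]
fact = [(s, s % 3, 10 * s) for s in range(12)]

NUM_NODES = 2
dim_nodes = distribute(dim, 0, NUM_NODES)       # dim distributed by product_id
colocated = distribute(fact, 1, NUM_NODES)      # fact distributed by product_id too
mismatched = distribute(fact, 0, NUM_NODES)     # fact distributed by sale_id instead

print(rows_shipped_for_join(colocated, dim_nodes, 1))    # 0
print(rows_shipped_for_join(mismatched, dim_nodes, 1))   # 3
```

When both tables share a distribution key the join is local on every node. With unlike distributions, data has to cross the interconnect before the join can start, and that cost grows with table size and skew.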
MATCHING PROBLEMS TO TECHNOLOGIES
Solving the Problem Depends on the Diagnosis
Three General Workloads
Online Transaction Processing
  ▪  Read, write, update
  ▪  User concurrency is the common performance limiter
  ▪  Low data, compute complexity
Business Intelligence / Data warehousing
  ▪  Assumed to be read-only, but really read heavy, write heavy, usually separated in time
  ▪  Data size is the common performance limiter
  ▪  High data complexity, low compute complexity
Analytics
  ▪  Read, write
  ▪  Data size and complexity of algorithm are the limiters
  ▪  Moderate data, high compute complexity
Three General Workloads
But…
BI is not read only
OLTP is not write-only
Analytics is not purely computation
Types of workloads
Write-biased:                 Read-biased:
  ▪  OLTP                        ▪  Query
  ▪  OLTP, batch                 ▪  Query, simple retrieval
  ▪  OLTP, lite                  ▪  Query, complex
  ▪  Object persistence          ▪  Query-hierarchical / object / network
  ▪  Data ingest, batch          ▪  Analytic
  ▪  Data ingest, real-time

                         Mixed
      Inline analytic execution, operational BI
What you need depends on workload & need
Optimizing for:
  ▪  Response time?
  ▪  Throughput?
  ▪  both?
Concerned about rapid growth in data?
Unpredictable spikes in use?
Bulk loads or incremental inserts and/or updates?
Important workload parameters to know
•  Read-intensive vs. write-intensive
•  Mutable vs. immutable data
•  Immediate vs. eventual consistency
•  Short vs. long data latency
•  Predictable vs. unpredictable data access patterns
•  Simple vs. complex data types
You must understand your workload mix: throughput and response time requirements aren’t enough.
  ▪  100 simple queries accessing month-to-date data
  ▪  90 simple queries accessing month-to-date data and 10 complex queries using two years of history
  ▪  Hazard calculation for the entire customer master
  ▪  Performance problems are rarely due to a single factor.
Two useful concepts to characterize queries
Selectivity: the restrictiveness of a query when accessing data. A highly selective query filters out most rows. Low-selectivity queries read most of the rows.
     High                                  Low
SELECT SUM(salary)                    SELECT SUM(salary)
FROM emp WHERE ID = 1                 FROM emp
Two useful concepts to characterize queries
Retrieval: the restrictiveness of a query when returning data. High retrieval brings back most of the rows. Low retrieval brings back relatively few rows.
     High                                  Low
SELECT name, salary                   SELECT SUM(salary)
FROM emp                              FROM emp
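
Both concepts can be measured for the example queries with an in-memory SQLite table. This is a sketch under stated assumptions: the `emp` rows reuse the sample data from the columnar slides, and the helper `profile` is hypothetical. It reports the fraction of rows that pass the filter (low fraction = highly selective) and the number of rows the query returns (retrieval).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (1, "Marge Inovera", 150000),
        (2, "Anita Bath", 120000),
        (3, "Ivan Awfulitch", 166000),
        (4, "Nadia Geddit", 36000),
    ],
)
total_rows = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

def profile(select_list, where_clause=""):
    """Return (fraction of rows passing the filter, rows returned)."""
    where = f" WHERE {where_clause}" if where_clause else ""
    passing = conn.execute(f"SELECT COUNT(*) FROM emp{where}").fetchone()[0]
    returned = len(conn.execute(f"SELECT {select_list} FROM emp{where}").fetchall())
    return passing / total_rows, returned

high_selectivity = profile("SUM(salary)", "id = 1")   # reads 1 of 4 rows
low_selectivity = profile("SUM(salary)")              # reads every row
high_retrieval = profile("name, salary")              # returns every row

print(high_selectivity)  # (0.25, 1)
print(low_selectivity)   # (1.0, 1)
print(high_retrieval)    # (1.0, 4)
```

The two measures are independent: `SELECT SUM(salary) FROM emp` reads every row yet returns a single one, so it is low selectivity and low retrieval at the same time.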
Selectivity and number of columns queried
Row store or column store, indexed or not?




        Chart from “The Mimicking Octopus: Towards a one-size-fits-all Database Architecture”, Alekh Jindal
Characteristics of query workloads

Workload                    Selectivity   Retrieval         Repetition    Complexity
Reporting / BI              Moderate      Low               Moderate      Moderate
Dashboards / scorecards     Moderate      Low               High          Low
Ad-hoc query and analysis   Low to high   Moderate to low   Low           Low to moderate
Analytics (batch)           Low           High              Low to High   Low*
Analytics (inline)          High          Low               High          Low*
Operational / embedded BI   High          Low               High          Low
* Low for retrieving the data, high if doing analytics in SQL
Characteristics of read-write workloads

Workload             Selectivity       Retrieval          Repetition   Complexity
Online OLTP          High              Low                High         Low
Batch OLTP           Moderate to low   Moderate to high   High         Moderate to high
Object persistence   High              Low                High         Low
Bulk ingest          Low (write)       n/a                High         Low
Realtime ingest      High (write)      n/a                High         Low



With ingest workloads we’re dealing with write-only, so selectivity and
retrieval don’t apply in the same way, instead it’s write volume.
Workload parameters and DB types at data scale

Workload parameters (columns): Write-biased, Read-biased, Updateable data, Eventual consistency ok?, Unpredictable query path, Compute intensive
DB types (rows): Standard RDBMS, Parallel RDBMS, NoSQL (kv, dht, obj), Hadoop*, Streaming database
[Table: the cell ratings were shown graphically on the slide.]
    You see the problem: it’s an intersection of multiple parameters, and
    this chart only includes the first tier of parameters. Plus, workload
    factors can completely invert these general rules of thumb.
Workload parameters and DB types at data scale

Workload parameters (columns): Complex queries, Selective queries, Low latency queries, High concurrency, High ingest rate
DB types (rows): Standard RDBMS, Parallel RDBMS, NoSQL (kv, dht, obj), Hadoop, Streaming database
[Table: the cell ratings were shown graphically on the slide.]

   You have to look at the combination of workload factors: data scale,
   concurrency, latency & response time, then chart the parameters.
Problem: Architecture Can Define Options
A general rule for the read-write axes

As workloads increase in both intensity and complexity, we move into a realm of specialized databases adapted to specific workloads.

[Chart: the horizontal axis is write intensity and the vertical axis is read intensity; regions are labeled OldSQL, NewSQL, and NoSQL.]
In general…
Relational row store databases for conventionally tooled low to mid-scale OLTP
Relational databases for ACID requirements
Parallel databases (row or column) for unpredictable or variable query workloads
Specialized databases for complex data query workloads
NoSQL (KVS, DHT) for high scale OLTP
NoSQL (KVS, DHT) for low latency read-mostly data access
Parallel databases (row or column) for analytic workloads over tabular data
NoSQL / Hadoop for batch analytic workloads over large data volumes
How To Select A Database




How To Select A Database - (1)
      1. What are the data management requirements and policies (if any) in
         respect of:
            - Data security (including regulatory requirements)?
            - Data cleansing?
            - Data governance?
            - Deployment of solutions in the cloud?
            - If a deployment environment is mandated, what are its technical
              characteristics and limitations?

            Best of breed, no standards for anything, “polyglot persistence” =
            silos on steroids, data integration challenges, shifting data
            movement architectures
      2. What kind of data will be stored and used?
            - Is it structured or unstructured?
            - Is it likely to be one big table or many tables?



How To Select A Database - (2)
      3. What are the data volumes expected to be?
          - What is the expected daily ingest rate?
          - What will the data retention/archiving policy be?
          - How big do we expect the database to grow to? (estimate a range).
      4. What are the applications that will use the database?
          - Estimate by user numbers and transaction numbers
          - Roughly classify transactions as OLTP, short query, long query, long
            query with analytics.
          - What are the expectations in respect of growth of usage (per user) and
            growth of user population?
      5. What are the expected service levels?
          - Classify according to availability service levels
          - Classify according to response time service levels
          - Classify on throughput where appropriate
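
The data-volume questions in item 3 lend themselves to a back-of-the-envelope range estimate. The sketch below is only that: `estimate_size_gb` is a hypothetical helper, the ingest and retention figures are made-up inputs, and `overhead_factor` is an assumed multiplier for the gap between raw and loaded data (indexes, replication, or compression can push it above or below 1).

```python
def estimate_size_gb(daily_ingest_gb, retention_days, overhead_factor=1.0):
    """Steady-state size once the retention window is full."""
    return daily_ingest_gb * retention_days * overhead_factor

# Hypothetical low and high scenarios for one year of retention.
low = estimate_size_gb(daily_ingest_gb=5, retention_days=365)
high = estimate_size_gb(daily_ingest_gb=20, retention_days=365, overhead_factor=1.5)

print(f"Expected size: {low:,.0f} GB to {high:,.0f} GB")
```

Estimating a low and a high case rather than a single number makes it easier to see whether the candidate databases are being sized for the same order of magnitude.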

How To Select A Database - (3)
      6. What is the budget for this project and what does that cover?
      7. What is the outline project plan?
           - Timescales
           - Delivery of benefits
           - When are costs incurred?
      8. Who will make up the project team?
           - Internal staff
           - External consultants
           - Vendor consultants
      9. What is the policy in respect of external support, possibly including vendor
         consultancy for the early stages of the project?




How To Select A Database - (4)
      10. What are the business benefits?
          - Which ones can be quantified financially?
          - Which ones can only be guessed at (financially)?
          - Are there opportunity costs?




A random selection of databases
  Sybase IQ, ASE               EnterpriseDB     Algebraix
  Teradata, Aster Data         LucidDB          Intersystems Caché
  Oracle, RAC                  Vectorwise       Streambase
  Microsoft SQLServer, PDW     MonetDB          SQLStream
  IBM DB2s, Netezza            Exasol           Coral8
  Paraccel                     Illuminate       Ingres
  Kognitio                     Vertica          Postgres
  EMC/Greenplum                InfiniDB         Cassandra
  Oracle Exadata               1010 Data        CouchDB
  SAP HANA                     SAND             Mongo
  Infobright                   Endeca           Hbase
  MySQL                        Xtreme Data      Redis
  MarkLogic                    IMS              RainStor
  Tokyo Cabinet                Hive             Scalaris
                          And a few hundred more…
Product selection options

The Subtraction Model
 ▪  Start with a full set, remove what's bad, evaluate the
    remainder
 ▪  Conventional analyst model
 ▪  Works best with a stable market
The Addition Model
 ▪  Start with an empty set, add what's good, evaluate
    the results
 ▪  The designer model
 ▪  Works best in an emerging or changing market
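
The two models above can be sketched as two filters over the same market. This is a minimal illustration, not a method from the research: the product records, attribute names, and predicates are all hypothetical. What it shows is the difference in defaults. A product with unknown attributes survives subtraction but is never picked up by addition.

```python
from typing import Callable, Dict, List

Product = Dict[str, object]


def subtraction_model(market: List[Product],
                      is_bad: Callable[[Product], bool]) -> List[Product]:
    """Start with the full set, remove what's bad, keep the remainder."""
    return [p for p in market if not is_bad(p)]


def addition_model(market: List[Product],
                   is_good: Callable[[Product], bool]) -> List[Product]:
    """Start with an empty set and add only what's good."""
    shortlist: List[Product] = []
    for p in market:
        if is_good(p):
            shortlist.append(p)
    return shortlist


# Hypothetical market; None means "not yet known".
market = [
    {"name": "ProductA", "sql": True, "scale_out": True},
    {"name": "ProductB", "sql": False, "scale_out": True},
    {"name": "ProductC", "sql": None, "scale_out": None},
]

# Subtraction: eliminate only what is known to fail a requirement.
remainder = subtraction_model(market, lambda p: p["sql"] is False)

# Addition: include only what is known to meet the requirements.
picked = addition_model(
    market, lambda p: p["sql"] is True and p["scale_out"] is True
)

remainder_names = [p["name"] for p in remainder]  # ProductA, ProductC
picked_names = [p["name"] for p in picked]        # ProductA
```

The little-known ProductC stays on the subtraction list until someone rules it out, whereas the addition list contains only products that have been positively assessed.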
Product Selection
             Preliminary investigation

             Short-list (usually arrived at by elimination)

             Be sure to set the goals and control the process.

             Evaluation by technical analysis and modeling

             Evaluation by proof of concept.

             Do not be afraid to change your mind

             Negotiation



Conclusion
             Wherein all is revealed, or ignorance exposed




Thank You
                          For Your
                          Attention




Fit For Purpose: The New Database Revolution Findings Webcast