SlideShare ist ein Scribd-Unternehmen logo
1 von 61
Downloaden Sie, um offline zu lesen
Scaling with HiveDB
The Problem
• Hundreds of
  millions of
  products

• Millions of new
  products every
  week

• Accelerating
  growth
Scaling up is no
longer an option.
Requirements
➡   OLTP optimized
➡   Constant response time is more
    important than low-latency
➡   Expand capacity online
➡   Ability to move records
➡   No re-sharding
2 years ago
   Clustered      Non-relational   Unstructured



   Oracle RAC
 MS SQL Server                      Hadoop
 MySQL Cluster                      MogileFS
Teradata (OLAP)                       S3
Today
   Clustered      Non-relational   Unstructured



   Oracle RAC
 MS SQL Server     Hypertable
                                    Hadoop
 MySQL Cluster       HBase
                                    MogileFS
      DB2           CouchDB
                                      S3
Teradata (OLAP)    SimpleDB
    HiveDB
System      Storage   Interface   Partitioning    Expansion   Node Types     Maturity


                                                        No
 Oracle RAC     Shared      SQL       Transparent                 Identical      7 years
                                                     downtime

                Memory                               Requires
MySQL Cluster               SQL       Transparent                  Mixed         3 years
                / Local                               Restart

                                                                                    ?
                                                        No
 Hypertable      Local     HQL        Transparent                  Mixed        (released
                                                     downtime
                                                                                  2/08)
                                                     Degraded                    3 years
    DB2          Local      SQL       Fixed Hash     Performan    Identical     (25 years
                                                         ce                       total)
                                      Key-based    No                           18 months
   HiveDB        Local      SQL                                    Mixed
                                      Directory downtime                       (+13 years!)
Operating a
distributed system
is hard.
Non-functional
concerns quickly
dominate functional
concerns.
Replication
Fail over
Monitoring
Our approach:
minimize cleverness
Build atop solid,
easy to manage
technologies.
So we can
get back
to selling
t-shirts.
MySQL is fast.
MySQL is well
understood.
MySQL replication
works.
MySQL is free.
The same for JAVA
The simplest thing
➡   HiveDB is a JDBC Gatekeeper for
    applications.
➡   Lightest integration
➡   Smallest possible implementation
The eBay approach
➡   Pro: Internally consistent data
➡   Con: Re-sharding of data required
➡   Con: Re-sharding is a DBA
    operation
Partition by key
A simple bucketed partitioning scheme
expressed in code.
Partition by key
Directory
➡   No broadcasting
➡   No re-partitioning
➡   Easy to relocate records
➡   Easy to add capacity
Disadvantages
➡   Intelligence moved from the
    database to the application
    (Queries have to be planned and
    indexed)
➡   Can’t join across partitions
➡   NO OLAP! (We consider this a
    benefit.)
Hibernate Shards
Partitioned Hibernate from Google
Why did we build
this thing again?
Oh wait, you have
to tell it where
things are.
Benefits of Shards
➡   Unified data access layer
➡   Result set aggregation across
    partitions
➡   Everyone in the JAVA-world knows
    Hibernate.
Working on
simplifying it even
further.
But does it go?
Case Study:
CafePress
➡   Leader in User-generated
    Commerce
➡   More products than eBay
    (>250,000,000)
➡   24/7/365
Case Study:
Performance Requirements
➡   Thousands of queries/sec
➡   10 : 1 read/write ratio
➡   Geographically distributed
Case Study:
Test Environment
➡   Real schema
➡   Production-class hardware
➡   Partial data (~40M records)
CafePress HiveDB 2007
Performance Test Environment


                             JMeter (1 thread)                                       JMeter (no threads)
 command &



                                 client.jar                                                client.jar
   control




                              Measurement                                              Test Controller
                               Workstation                                              Workstation
                                                      100MBit switch
    load generators




                       JMeter (100s of threads)                                     JMeter (100s of threads)
                                 client.jar                                                 client.jar

                                                    48GB backplane non-
                        Dell 2950 / 2x2 Xeon        blocking gigabit switch          Dell 2950 / 2x2 Xeon
                        16GB, 6x72GB 15k                                             16GB, 6x72GB 15k
 web service




                                                         Hardware LB
  (hivedb)




                        Dell 1950 / 2x2 Xeon        Dell 1950 / 2x2 Xeon            Dell 1950 / 2x2 Xeon
                             Tomcat 5.5                  Tomcat 5.5                      Tomcat 5.5



                                   Directory              Partition 0              Partition 1
 databases
  (mysql)




                             Dell 2950 / 2x2 Xeon   Dell 2950 / 2x2 Xeon      Dell 2950 / 2x2 Xeon
                             16GB, 6x72GB 15k       16GB, 6x72GB 15k          16GB, 6x72GB 15k

   jmccarthy@cafepress.com                                                                               Modified on April 09 2007
Case Study:
Performance Goals
➡   Large object reads: 1500/s


➡   Large object writes: 200/s


➡   Response time: 100ms
Case Study:
Performance Results
➡   Large object reads: 1500/s
    Actual result: 2250/s
➡   Large object writes: 200/s
    Actual result: 300/s
➡   Response time: 100ms
    Actual result: 8ms
Case Study:
Performance Results
➡   Max read throughput
    Actual result: 4100/s
    (CPU limited in Java layer;
    MySQL <25% utilized)
Case Study:
Results
➡   Billions of queries served
➡   Highest DB uptime at CafePress
➡   Hundreds of millions of updates
    performed
Show don’t tell!
High Availability
➡   We don’t specify a fail over
    strategy.
➡   We don’t specify a backup or
    replication strategy.
Fail Over Options
➡   Load balancer
➡   MySQL Proxy
➡   Flipper / MMM
➡   HAProxy
Replication
➡   You should use MySQL replication.
➡   It really works.
➡   You can add in LVM snapshots for
    even more disastrous disaster
    recovery.
The Bottleneck
There is one problem with HiveDB. The
directory is a scaling bottleneck.
There’s no reason
it has to be a
database.
MemcacheD
➡   Hive directory implemented atop
    memcached
Roadmap
➡   MemcacheD directory
➡   Simplified Hibernate support
➡   Management web service
➡   Command-line tools
➡   Framework Adapters
Call for developers!
➡   Grails (GORM)
➡   ActiveRecord (JRuby on Rails)
➡   Ambition (JRuby or web service)
➡   iBatis
➡   JPA
➡   ... Postgres?
Contributing
➡   Post to the mailing list
    http://groups.google.com/group/hivedb-dev

➡   Comment on our site
    http://www.hivedb.org

➡   Submit a patch / pull request
    git clone git://github.com/britt/hivedb.git
Photo Credits
➡   http://www.flickr.com/photos/7362313@N07/1240245941/
    sizes/o

➡   http://www.flickr.com/photos/
    99287245@N00/2229322675/sizes/o

➡   http://www.flickr.com/photos/
    51035555243@N01/26006138/

➡   http://www.flickr.com/photos/vgm8383/2176897085/
    sizes/o/

➡   http://t-shirts.cafepress.com/item/money-organic-cotton-
    tee/73439294
Blobject
➡   Gets around the problem of ALTER
    statements. No data set of this size
    can be transformed synchronously.
➡   The hive can contain multiple
    versions of a serialized record.
➡   Compression
Hadoop!
➡   Query and transform blobbed data
    asynchronously.

Weitere ähnliche Inhalte

Was ist angesagt?

Non-Relational Postgres
Non-Relational PostgresNon-Relational Postgres
Non-Relational PostgresEDB
 
High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016Vlad Mihalcea
 
Parallel Query on Exadata
Parallel Query on ExadataParallel Query on Exadata
Parallel Query on ExadataEnkitec
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
 
Siebel Enterprise Data Quality for Siebel
Siebel Enterprise Data Quality for SiebelSiebel Enterprise Data Quality for Siebel
Siebel Enterprise Data Quality for SiebelJeroen Burgers
 
BUG - BEA Users\' Group, Jan16 2003
BUG - BEA Users\' Group, Jan16 2003BUG - BEA Users\' Group, Jan16 2003
BUG - BEA Users\' Group, Jan16 2003Sanjeev Kumar
 
J2EE Performance And Scalability Bp
J2EE Performance And Scalability BpJ2EE Performance And Scalability Bp
J2EE Performance And Scalability BpChris Adkin
 
Scaling Twitter 12758
Scaling Twitter 12758Scaling Twitter 12758
Scaling Twitter 12758davidblum
 
The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012Lucas Jellema
 
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John NaylorPGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John NaylorEqunix Business Solutions
 
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsReal-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsContinuent
 
Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.EDB
 
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...Continuent
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBMydbops
 
Evolutionary Database Design
Evolutionary Database DesignEvolutionary Database Design
Evolutionary Database DesignAndrei Solntsev
 
Tungsten University: Configure & Provision Tungsten Clusters
Tungsten University: Configure & Provision Tungsten ClustersTungsten University: Configure & Provision Tungsten Clusters
Tungsten University: Configure & Provision Tungsten ClustersContinuent
 
Dbvisit replicate: logical replication made easy
Dbvisit replicate: logical replication made easyDbvisit replicate: logical replication made easy
Dbvisit replicate: logical replication made easyFranck Pachot
 
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...Stephan H. Wissel
 

Was ist angesagt? (19)

Non-Relational Postgres
Non-Relational PostgresNon-Relational Postgres
Non-Relational Postgres
 
High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016
 
Parallel Query on Exadata
Parallel Query on ExadataParallel Query on Exadata
Parallel Query on Exadata
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
 
Siebel Enterprise Data Quality for Siebel
Siebel Enterprise Data Quality for SiebelSiebel Enterprise Data Quality for Siebel
Siebel Enterprise Data Quality for Siebel
 
Get to know PostgreSQL!
Get to know PostgreSQL!Get to know PostgreSQL!
Get to know PostgreSQL!
 
BUG - BEA Users\' Group, Jan16 2003
BUG - BEA Users\' Group, Jan16 2003BUG - BEA Users\' Group, Jan16 2003
BUG - BEA Users\' Group, Jan16 2003
 
J2EE Performance And Scalability Bp
J2EE Performance And Scalability BpJ2EE Performance And Scalability Bp
J2EE Performance And Scalability Bp
 
Scaling Twitter 12758
Scaling Twitter 12758Scaling Twitter 12758
Scaling Twitter 12758
 
The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012
 
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John NaylorPGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
 
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsReal-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
 
Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.
 
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDB
 
Evolutionary Database Design
Evolutionary Database DesignEvolutionary Database Design
Evolutionary Database Design
 
Tungsten University: Configure & Provision Tungsten Clusters
Tungsten University: Configure & Provision Tungsten ClustersTungsten University: Configure & Provision Tungsten Clusters
Tungsten University: Configure & Provision Tungsten Clusters
 
Dbvisit replicate: logical replication made easy
Dbvisit replicate: logical replication made easyDbvisit replicate: logical replication made easy
Dbvisit replicate: logical replication made easy
 
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
 

Andere mochten auch

Cloudcon East Presentation
Cloudcon East PresentationCloudcon East Presentation
Cloudcon East Presentationbr7tt
 
Presentation1class Project Scm
Presentation1class Project ScmPresentation1class Project Scm
Presentation1class Project ScmLenny Wijaya
 
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...Avishai Ziv
 
Slidesare Class Project Revisi
Slidesare Class Project RevisiSlidesare Class Project Revisi
Slidesare Class Project RevisiLenny Wijaya
 
Jawaban Cp Mba33 Dmaic
Jawaban Cp Mba33 DmaicJawaban Cp Mba33 Dmaic
Jawaban Cp Mba33 DmaicLenny Wijaya
 
Class Project MBA 26 Biodisel
Class Project MBA 26 BiodiselClass Project MBA 26 Biodisel
Class Project MBA 26 BiodiselLenny Wijaya
 
Class Project MBA 26 Palm Oil Industry
Class Project MBA 26 Palm Oil IndustryClass Project MBA 26 Palm Oil Industry
Class Project MBA 26 Palm Oil IndustryLenny Wijaya
 
Class Project 29: Me
Class Project 29: MeClass Project 29: Me
Class Project 29: MeLenny Wijaya
 

Andere mochten auch (9)

Cloudcon East Presentation
Cloudcon East PresentationCloudcon East Presentation
Cloudcon East Presentation
 
Presentation1class Project Scm
Presentation1class Project ScmPresentation1class Project Scm
Presentation1class Project Scm
 
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
 
Slidesare Class Project Revisi
Slidesare Class Project RevisiSlidesare Class Project Revisi
Slidesare Class Project Revisi
 
Tugas Pa 2 B
Tugas Pa 2 BTugas Pa 2 B
Tugas Pa 2 B
 
Jawaban Cp Mba33 Dmaic
Jawaban Cp Mba33 DmaicJawaban Cp Mba33 Dmaic
Jawaban Cp Mba33 Dmaic
 
Class Project MBA 26 Biodisel
Class Project MBA 26 BiodiselClass Project MBA 26 Biodisel
Class Project MBA 26 Biodisel
 
Class Project MBA 26 Palm Oil Industry
Class Project MBA 26 Palm Oil IndustryClass Project MBA 26 Palm Oil Industry
Class Project MBA 26 Palm Oil Industry
 
Class Project 29: Me
Class Project 29: MeClass Project 29: Me
Class Project 29: Me
 

Ähnlich wie Cloudcon East Presentation

[Roblek] Distributed computing in practice
[Roblek] Distributed computing in practice[Roblek] Distributed computing in practice
[Roblek] Distributed computing in practicejavablend
 
Clustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityClustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityConSanFrancisco123
 
Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29liufabin 66688
 
J2EE Batch Processing
J2EE Batch ProcessingJ2EE Batch Processing
J2EE Batch ProcessingChris Adkin
 
Squeak DBX
Squeak DBXSqueak DBX
Squeak DBXESUG
 
Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffyAnuradha
 
Sql 2012 always on
Sql 2012 always onSql 2012 always on
Sql 2012 always ondilip nayak
 
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod ColledgeDb As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledgesqlserver.co.il
 
OOW09 Ebs Tuning Final
OOW09 Ebs Tuning FinalOOW09 Ebs Tuning Final
OOW09 Ebs Tuning Finaljucaab
 
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?Miguel Araújo
 
2012 replication
2012 replication2012 replication
2012 replicationsqlhjalp
 
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)mezis
 
download it from here
download it from heredownload it from here
download it from herewebhostingguy
 
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...LarryZaman
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 

Ähnlich wie Cloudcon East Presentation (20)

[Roblek] Distributed computing in practice
[Roblek] Distributed computing in practice[Roblek] Distributed computing in practice
[Roblek] Distributed computing in practice
 
The RDBMS You Should Be Using
The RDBMS You Should Be UsingThe RDBMS You Should Be Using
The RDBMS You Should Be Using
 
Clustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityClustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And Availability
 
Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29
 
J2EE Batch Processing
J2EE Batch ProcessingJ2EE Batch Processing
J2EE Batch Processing
 
Clustering van IT-componenten
Clustering van IT-componentenClustering van IT-componenten
Clustering van IT-componenten
 
Squeak DBX
Squeak DBXSqueak DBX
Squeak DBX
 
Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffy
 
MySQL高可用
MySQL高可用MySQL高可用
MySQL高可用
 
Sql 2012 always on
Sql 2012 always onSql 2012 always on
Sql 2012 always on
 
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod ColledgeDb As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
 
OOW09 Ebs Tuning Final
OOW09 Ebs Tuning FinalOOW09 Ebs Tuning Final
OOW09 Ebs Tuning Final
 
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
 
cosbench-openstack.pdf
cosbench-openstack.pdfcosbench-openstack.pdf
cosbench-openstack.pdf
 
2012 replication
2012 replication2012 replication
2012 replication
 
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
 
11g R2
11g R211g R2
11g R2
 
download it from here
download it from heredownload it from here
download it from here
 
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 

Kürzlich hochgeladen

Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Kürzlich hochgeladen (20)

Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Cloudcon East Presentation

  • 2. The Problem • Hundreds of millions of products • Millions of new products every week • Accelerating growth
  • 3. Scaling up is no longer an option.
  • 4. Requirements ➡ OLTP optimized ➡ Constant response time is more important than low-latency ➡ Expand capacity online ➡ Ability to move records ➡ No re-sharding
  • 5. 2 years ago Clustered Non-relational Unstructured Oracle RAC MS SQL Server Hadoop MySQL Cluster MogileFS Teradata (OLAP) S3
  • 6. Today Clustered Non-relational Unstructured Oracle RAC MS SQL Server Hypertable Hadoop MySQL Cluster HBase MogileFS DB2 CouchDB S3 Teradata (OLAP) SimpleDB HiveDB
  • 7. System Storage Interface Partitioning Expansion Node Types Maturity No Oracle RAC Shared SQL Transparent Identical 7 years downtime Memory Requires MySQL Cluster SQL Transparent Mixed 3 years / Local Restart ? No Hypertable Local HQL Transparent Mixed (released downtime 2/08) Degraded 3 years DB2 Local SQL Fixed Hash Performan Identical (25 years ce total) Key-based No 18 months HiveDB Local SQL Mixed Directory downtime (+13 years!)
  • 12. Build atop solid, easy to manage technologies.
  • 13. So we can get back to selling t-shirts.
  • 18. The same for JAVA
  • 19.
  • 20. The simplest thing ➡ HiveDB is a JDBC Gatekeeper for applications. ➡ Lightest integration ➡ Smallest possible implementation
  • 21. The eBay approach ➡ Pro: Internally consistent data ➡ Con: Re-sharding of data required ➡ Con: Re-sharding is a DBA operation
  • 22. Partition by key A simple bucketed partitioning scheme expressed in code.
  • 24. Directory ➡ No broadcasting ➡ No re-partitioning ➡ Easy to relocate records ➡ Easy to add capacity
  • 25.
  • 26. Disadvantages ➡ Intelligence moved from the database to the application (Queries have to be planned and indexed) ➡ Can’t join across partitions ➡ NO OLAP! (We consider this a benefit.)
  • 28. Why did we build this thing again?
  • 29. Oh wait, you have to tell it where things are.
  • 30.
  • 31. Benefits of Shards ➡ Unified data access layer ➡ Result set aggregation across partitions ➡ Everyone in the JAVA-world knows Hibernate.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38. Working on simplifying it even further.
  • 39.
  • 40. But does it go?
  • 41. Case Study: CafePress ➡ Leader in User-generated Commerce ➡ More products than eBay (>250,000,000) ➡ 24/7/365
  • 42. Case Study: Performance Requirements ➡ Thousands of queries/sec ➡ 10 : 1 read/write ratio ➡ Geographically distributed
  • 43. Case Study: Test Environment ➡ Real schema ➡ Production-class hardware ➡ Partial data (~40M records)
  • 44. CafePress HiveDB 2007 Performance Test Environment JMeter (1 thread) JMeter (no threads) command & client.jar client.jar control Measurement Test Controller Workstation Workstation 100MBit switch load generators JMeter (100s of threads) JMeter (100s of threads) client.jar client.jar 48GB backplane non- Dell 2950 / 2x2 Xeon blocking gigabit switch Dell 2950 / 2x2 Xeon 16GB, 6x72GB 15k 16GB, 6x72GB 15k web service Hardware LB (hivedb) Dell 1950 / 2x2 Xeon Dell 1950 / 2x2 Xeon Dell 1950 / 2x2 Xeon Tomcat 5.5 Tomcat 5.5 Tomcat 5.5 Directory Partition 0 Partition 1 databases (mysql) Dell 2950 / 2x2 Xeon Dell 2950 / 2x2 Xeon Dell 2950 / 2x2 Xeon 16GB, 6x72GB 15k 16GB, 6x72GB 15k 16GB, 6x72GB 15k jmccarthy@cafepress.com Modified on April 09 2007
  • 45. Case Study: Performance Goals ➡ Large object reads: 1500/s ➡ Large object writes: 200/s ➡ Response time: 100ms
  • 46. Case Study: Performance Results ➡ Large object reads: 1500/s Actual result: 2250/s ➡ Large object writes: 200/s Actual result: 300/s ➡ Response time: 100ms Actual result: 8ms
  • 47. Case Study: Performance Results ➡ Max read throughput Actual result: 4100/s (CPU limited in Java layer; MySQL <25% utilized)
  • 48. Case Study: Results ➡ Billions of queries served ➡ Highest DB uptime at CafePress ➡ Hundreds of millions of updates performed
  • 50. High Availability ➡ We don’t specify a fail over strategy. ➡ We don’t specify a backup or replication strategy.
  • 51. Fail Over Options ➡ Load balancer ➡ MySQL Proxy ➡ Flipper / MMM ➡ HAProxy
  • 52. Replication ➡ You should use MySQL replication. ➡ It really works. ➡ You can add in LVM snapshots for even more disastrous disaster recovery.
  • 53. The Bottleneck There is one problem with HiveDB. The directory is a scaling bottleneck.
  • 54. There’s no reason it has to be a database.
  • 55. MemcacheD ➡ Hive directory implemented atop memcached
  • 56. Roadmap ➡ MemcacheD directory ➡ Simplified Hibernate support ➡ Management web service ➡ Command-line tools ➡ Framework Adapters
  • 57. Call for developers! ➡ Grails (GORM) ➡ ActiveRecord (JRuby on Rails) ➡ Ambition (JRuby or web service) ➡ iBatis ➡ JPA ➡ ... Postgres?
  • 58. Contributing ➡ Post to the mailing list http://groups.google.com/group/hivedb-dev ➡ Comment on our site http://www.hivedb.org ➡ Submit a patch / pull request git clone git://github.com/britt/hivedb.git
  • 59. Photo Credits ➡ http://www.flickr.com/photos/7362313@N07/1240245941/ sizes/o ➡ http://www.flickr.com/photos/ 99287245@N00/2229322675/sizes/o ➡ http://www.flickr.com/photos/ 51035555243@N01/26006138/ ➡ http://www.flickr.com/photos/vgm8383/2176897085/ sizes/o/ ➡ http://t-shirts.cafepress.com/item/money-organic-cotton- tee/73439294
  • 60. Blobject ➡ Gets around the problem of ALTER statements. No data set of this size can be transformed synchronously. ➡ The hive can contain multiple versions of a serialized record. ➡ Compression
  • 61. Hadoop! ➡ Query and transform blobbed data asynchronously.