SlideShare ist ein Scribd-Unternehmen logo
1 von 61
Downloaden Sie, um offline zu lesen
Scaling with HiveDB
The Problem
• Hundreds of
  millions of
  products

• Millions of new
  products every
  week

• Accelerating
  growth
Scaling up is no
longer an option.
Requirements
➡   OLTP optimized
➡   Constant response time is more
    important than low-latency
➡   Expand capacity online
➡   Ability to move records
➡   No re-sharding
2 years ago
   Clustered      Non-relational   Unstructured



   Oracle RAC
 MS SQL Server                      Hadoop
 MySQL Cluster                      MogileFS
Teradata (OLAP)                       S3
Today
   Clustered      Non-relational   Unstructured



   Oracle RAC
 MS SQL Server     Hypertable
                                    Hadoop
 MySQL Cluster       HBase
                                    MogileFS
      DB2           CouchDB
                                      S3
Teradata (OLAP)    SimpleDB
    HiveDB
System      Storage   Interface   Partitioning    Expansion   Node Types     Maturity


                                                        No
 Oracle RAC     Shared      SQL       Transparent                 Identical      7 years
                                                     downtime

                Memory                               Requires
MySQL Cluster               SQL       Transparent                  Mixed         3 years
                / Local                               Restart

                                                                                    ?
                                                        No
 Hypertable      Local     HQL        Transparent                  Mixed        (released
                                                     downtime
                                                                                  2/08)
                                                     Degraded                    3 years
    DB2          Local      SQL       Fixed Hash     Performan    Identical     (25 years
                                                         ce                       total)
                                      Key-based    No                           18 months
   HiveDB        Local      SQL                                    Mixed
                                      Directory downtime                       (+13 years!)
Operating a
distributed system
is hard.
Non-functional
concerns quickly
dominate functional
concerns.
Replication
Fail over
Monitoring
Our approach:
minimize cleverness
Build atop solid,
easy to manage
technologies.
So we can
get back
to selling
t-shirts.
MySQL is fast.
MySQL is well
understood.
MySQL replication
works.
MySQL is free.
The same for JAVA
The simplest thing
➡   HiveDB is a JDBC Gatekeeper for
    applications.
➡   Lightest integration
➡   Smallest possible implementation
The eBay approach
➡   Pro: Internally consistent data
➡   Con: Re-sharding of data required
➡   Con: Re-sharding is a DBA
    operation
Partition by key
A simple bucketed partitioning scheme
expressed in code.
Partition by key
Directory
➡   No broadcasting
➡   No re-partitioning
➡   Easy to relocate records
➡   Easy to add capacity
Disadvantages
➡   Intelligence moved from the
    database to the application
    (Queries have to be planned and
    indexed)
➡   Can’t join across partitions
➡   NO OLAP! (We consider this a
    benefit.)
Hibernate Shards
Partitioned Hibernate from Google
Why did we build
this thing again?
Oh wait, you have
to tell it where
things are.
Benefits of Shards
➡   Unified data access layer
➡   Result set aggregation across
    partitions
➡   Everyone in the JAVA-world knows
    Hibernate.
Working on
simplifying it even
further.
But does it go?
Case Study:
CafePress
➡   Leader in User-generated
    Commerce
➡   More products than eBay
    (>250,000,000)
➡   24/7/365
Case Study:
Performance Requirements
➡   Thousands of queries/sec
➡   10 : 1 read/write ratio
➡   Geographically distributed
Case Study:
Test Environment
➡   Real schema
➡   Production-class hardware
➡   Partial data (~40M records)
CafePress HiveDB 2007
Performance Test Environment


                             JMeter (1 thread)                                       JMeter (no threads)
 command &



                                 client.jar                                                client.jar
   control




                              Measurement                                              Test Controller
                               Workstation                                              Workstation
                                                      100MBit switch
    load generators




                       JMeter (100s of threads)                                     JMeter (100s of threads)
                                 client.jar                                                 client.jar

                                                    48GB backplane non-
                        Dell 2950 / 2x2 Xeon        blocking gigabit switch          Dell 2950 / 2x2 Xeon
                        16GB, 6x72GB 15k                                             16GB, 6x72GB 15k
 web service




                                                         Hardware LB
  (hivedb)




                        Dell 1950 / 2x2 Xeon        Dell 1950 / 2x2 Xeon            Dell 1950 / 2x2 Xeon
                             Tomcat 5.5                  Tomcat 5.5                      Tomcat 5.5



                                   Directory              Partition 0              Partition 1
 databases
  (mysql)




                             Dell 2950 / 2x2 Xeon   Dell 2950 / 2x2 Xeon      Dell 2950 / 2x2 Xeon
                             16GB, 6x72GB 15k       16GB, 6x72GB 15k          16GB, 6x72GB 15k

   jmccarthy@cafepress.com                                                                               Modified on April 09 2007
Case Study:
Performance Goals
➡   Large object reads: 1500/s


➡   Large object writes: 200/s


➡   Response time: 100ms
Case Study:
Performance Results
➡   Large object reads: 1500/s
    Actual result: 2250/s
➡   Large object writes: 200/s
    Actual result: 300/s
➡   Response time: 100ms
    Actual result: 8ms
Case Study:
Performance Results
➡   Max read throughput
    Actual result: 4100/s
    (CPU limited in Java layer;
    MySQL <25% utilized)
Case Study:
Results
➡   Billions of queries served
➡   Highest DB uptime at CafePress
➡   Hundreds of millions of updates
    performed
Show don’t tell!
High Availability
➡   We don’t specify a fail over
    strategy.
➡   We don’t specify a backup or
    replication strategy.
Fail Over Options
➡   Load balancer
➡   MySQL Proxy
➡   Flipper / MMM
➡   HAProxy
Replication
➡   You should use MySQL replication.
➡   It really works.
➡   You can add in LVM snapshots for
    even more disastrous disaster
    recovery.
The Bottleneck
There is one problem with HiveDB. The
directory is a scaling bottleneck.
There’s no reason
it has to be a
database.
MemcacheD
➡   Hive directory implemented atop
    memcached
Roadmap
➡   MemcacheD directory
➡   Simplified Hibernate support
➡   Management web service
➡   Command-line tools
➡   Framework Adapters
Call for developers!
➡   Grails (GORM)
➡   ActiveRecord (JRuby on Rails)
➡   Ambition (JRuby or web service)
➡   iBatis
➡   JPA
➡   ... Postgres?
Contributing
➡   Post to the mailing list
    http://groups.google.com/group/hivedb-dev

➡   Comment on our site
    http://www.hivedb.org

➡   Submit a patch / pull request
    git clone git://github.com/britt/hivedb.git
Photo Credits
➡   http://www.flickr.com/photos/7362313@N07/1240245941/
    sizes/o

➡   http://www.flickr.com/photos/
    99287245@N00/2229322675/sizes/o

➡   http://www.flickr.com/photos/
    51035555243@N01/26006138/

➡   http://www.flickr.com/photos/vgm8383/2176897085/
    sizes/o/

➡   http://t-shirts.cafepress.com/item/money-organic-cotton-
    tee/73439294
Blobject
➡   Gets around the problem of ALTER
    statements. No data set of this size
    can be transformed synchronously.
➡   The hive can contain multiple
    versions of a serialized record.
➡   Compression
Hadoop!
➡   Query and transform blobbed data
    asynchronously.

Weitere ähnliche Inhalte

Was ist angesagt?

Non-Relational Postgres
Non-Relational PostgresNon-Relational Postgres
Non-Relational PostgresEDB
 
High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016Vlad Mihalcea
 
Parallel Query on Exadata
Parallel Query on ExadataParallel Query on Exadata
Parallel Query on ExadataEnkitec
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
 
Siebel Enterprise Data Quality for Siebel
Siebel Enterprise Data Quality for SiebelSiebel Enterprise Data Quality for Siebel
Siebel Enterprise Data Quality for SiebelJeroen Burgers
 
BUG - BEA Users\' Group, Jan16 2003
BUG - BEA Users\' Group, Jan16 2003BUG - BEA Users\' Group, Jan16 2003
BUG - BEA Users\' Group, Jan16 2003Sanjeev Kumar
 
J2EE Performance And Scalability Bp
J2EE Performance And Scalability BpJ2EE Performance And Scalability Bp
J2EE Performance And Scalability BpChris Adkin
 
Scaling Twitter 12758
Scaling Twitter 12758Scaling Twitter 12758
Scaling Twitter 12758davidblum
 
The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012Lucas Jellema
 
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John NaylorPGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John NaylorEqunix Business Solutions
 
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsReal-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsContinuent
 
Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.EDB
 
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...Continuent
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBMydbops
 
Evolutionary Database Design
Evolutionary Database DesignEvolutionary Database Design
Evolutionary Database DesignAndrei Solntsev
 
Tungsten University: Configure & Provision Tungsten Clusters
Tungsten University: Configure & Provision Tungsten ClustersTungsten University: Configure & Provision Tungsten Clusters
Tungsten University: Configure & Provision Tungsten ClustersContinuent
 
Dbvisit replicate: logical replication made easy
Dbvisit replicate: logical replication made easyDbvisit replicate: logical replication made easy
Dbvisit replicate: logical replication made easyFranck Pachot
 
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...Stephan H. Wissel
 

Was ist angesagt? (19)

Non-Relational Postgres
Non-Relational PostgresNon-Relational Postgres
Non-Relational Postgres
 
High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016High-Performance JDBC Voxxed Bucharest 2016
High-Performance JDBC Voxxed Bucharest 2016
 
Parallel Query on Exadata
Parallel Query on ExadataParallel Query on Exadata
Parallel Query on Exadata
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
 
Siebel Enterprise Data Quality for Siebel
Siebel Enterprise Data Quality for SiebelSiebel Enterprise Data Quality for Siebel
Siebel Enterprise Data Quality for Siebel
 
Get to know PostgreSQL!
Get to know PostgreSQL!Get to know PostgreSQL!
Get to know PostgreSQL!
 
BUG - BEA Users\' Group, Jan16 2003
BUG - BEA Users\' Group, Jan16 2003BUG - BEA Users\' Group, Jan16 2003
BUG - BEA Users\' Group, Jan16 2003
 
J2EE Performance And Scalability Bp
J2EE Performance And Scalability BpJ2EE Performance And Scalability Bp
J2EE Performance And Scalability Bp
 
Scaling Twitter 12758
Scaling Twitter 12758Scaling Twitter 12758
Scaling Twitter 12758
 
The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012
 
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John NaylorPGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
PGConf.ASIA 2019 Bali - Upcoming Features in PostgreSQL 12 - John Naylor
 
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsReal-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
 
Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.Online Upgrade Using Logical Replication.
Online Upgrade Using Logical Replication.
 
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
New VMware Continuent 5.0 - A powerful and cost-efficient Oracle GoldenGate a...
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDB
 
Evolutionary Database Design
Evolutionary Database DesignEvolutionary Database Design
Evolutionary Database Design
 
Tungsten University: Configure & Provision Tungsten Clusters
Tungsten University: Configure & Provision Tungsten ClustersTungsten University: Configure & Provision Tungsten Clusters
Tungsten University: Configure & Provision Tungsten Clusters
 
Dbvisit replicate: logical replication made easy
Dbvisit replicate: logical replication made easyDbvisit replicate: logical replication made easy
Dbvisit replicate: logical replication made easy
 
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
 

Andere mochten auch

Cloudcon East Presentation
Cloudcon East PresentationCloudcon East Presentation
Cloudcon East Presentationbr7tt
 
Presentation1class Project Scm
Presentation1class Project ScmPresentation1class Project Scm
Presentation1class Project ScmLenny Wijaya
 
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...Avishai Ziv
 
Slidesare Class Project Revisi
Slidesare Class Project RevisiSlidesare Class Project Revisi
Slidesare Class Project RevisiLenny Wijaya
 
Jawaban Cp Mba33 Dmaic
Jawaban Cp Mba33 DmaicJawaban Cp Mba33 Dmaic
Jawaban Cp Mba33 DmaicLenny Wijaya
 
Class Project MBA 26 Biodisel
Class Project MBA 26 BiodiselClass Project MBA 26 Biodisel
Class Project MBA 26 BiodiselLenny Wijaya
 
Class Project MBA 26 Palm Oil Industry
Class Project MBA 26 Palm Oil IndustryClass Project MBA 26 Palm Oil Industry
Class Project MBA 26 Palm Oil IndustryLenny Wijaya
 
Class Project 29: Me
Class Project 29: MeClass Project 29: Me
Class Project 29: MeLenny Wijaya
 

Andere mochten auch (9)

Cloudcon East Presentation
Cloudcon East PresentationCloudcon East Presentation
Cloudcon East Presentation
 
Presentation1class Project Scm
Presentation1class Project ScmPresentation1class Project Scm
Presentation1class Project Scm
 
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
Whitepaper, lynx secure rootkit detection & protection by means of secure vir...
 
Slidesare Class Project Revisi
Slidesare Class Project RevisiSlidesare Class Project Revisi
Slidesare Class Project Revisi
 
Tugas Pa 2 B
Tugas Pa 2 BTugas Pa 2 B
Tugas Pa 2 B
 
Jawaban Cp Mba33 Dmaic
Jawaban Cp Mba33 DmaicJawaban Cp Mba33 Dmaic
Jawaban Cp Mba33 Dmaic
 
Class Project MBA 26 Biodisel
Class Project MBA 26 BiodiselClass Project MBA 26 Biodisel
Class Project MBA 26 Biodisel
 
Class Project MBA 26 Palm Oil Industry
Class Project MBA 26 Palm Oil IndustryClass Project MBA 26 Palm Oil Industry
Class Project MBA 26 Palm Oil Industry
 
Class Project 29: Me
Class Project 29: MeClass Project 29: Me
Class Project 29: Me
 

Ähnlich wie Cloudcon East Presentation

[Roblek] Distributed computing in practice
[Roblek] Distributed computing in practice[Roblek] Distributed computing in practice
[Roblek] Distributed computing in practicejavablend
 
Clustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityClustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityConSanFrancisco123
 
Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29liufabin 66688
 
J2EE Batch Processing
J2EE Batch ProcessingJ2EE Batch Processing
J2EE Batch ProcessingChris Adkin
 
Squeak DBX
Squeak DBXSqueak DBX
Squeak DBXESUG
 
Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffyAnuradha
 
Sql 2012 always on
Sql 2012 always onSql 2012 always on
Sql 2012 always ondilip nayak
 
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod ColledgeDb As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledgesqlserver.co.il
 
OOW09 Ebs Tuning Final
OOW09 Ebs Tuning FinalOOW09 Ebs Tuning Final
OOW09 Ebs Tuning Finaljucaab
 
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?Miguel Araújo
 
2012 replication
2012 replication2012 replication
2012 replicationsqlhjalp
 
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)mezis
 
download it from here
download it from heredownload it from here
download it from herewebhostingguy
 
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...LarryZaman
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 

Ähnlich wie Cloudcon East Presentation (20)

[Roblek] Distributed computing in practice
[Roblek] Distributed computing in practice[Roblek] Distributed computing in practice
[Roblek] Distributed computing in practice
 
The RDBMS You Should Be Using
The RDBMS You Should Be UsingThe RDBMS You Should Be Using
The RDBMS You Should Be Using
 
Clustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityClustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And Availability
 
Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29
 
J2EE Batch Processing
J2EE Batch ProcessingJ2EE Batch Processing
J2EE Batch Processing
 
Clustering van IT-componenten
Clustering van IT-componentenClustering van IT-componenten
Clustering van IT-componenten
 
Squeak DBX
Squeak DBXSqueak DBX
Squeak DBX
 
Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffy
 
MySQL高可用
MySQL高可用MySQL高可用
MySQL高可用
 
Sql 2012 always on
Sql 2012 always onSql 2012 always on
Sql 2012 always on
 
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod ColledgeDb As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
Db As Behaving Badly... Worst Practices For Database Administrators Rod Colledge
 
OOW09 Ebs Tuning Final
OOW09 Ebs Tuning FinalOOW09 Ebs Tuning Final
OOW09 Ebs Tuning Final
 
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
 
cosbench-openstack.pdf
cosbench-openstack.pdfcosbench-openstack.pdf
cosbench-openstack.pdf
 
2012 replication
2012 replication2012 replication
2012 replication
 
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)
 
11g R2
11g R211g R2
11g R2
 
download it from here
download it from heredownload it from here
download it from here
 
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 

Kürzlich hochgeladen

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Kürzlich hochgeladen (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Cloudcon East Presentation

  • 2. The Problem • Hundreds of millions of products • Millions of new products every week • Accelerating growth
  • 3. Scaling up is no longer an option.
  • 4. Requirements ➡ OLTP optimized ➡ Constant response time is more important than low-latency ➡ Expand capacity online ➡ Ability to move records ➡ No re-sharding
  • 5. 2 years ago Clustered Non-relational Unstructured Oracle RAC MS SQL Server Hadoop MySQL Cluster MogileFS Teradata (OLAP) S3
  • 6. Today Clustered Non-relational Unstructured Oracle RAC MS SQL Server Hypertable Hadoop MySQL Cluster HBase MogileFS DB2 CouchDB S3 Teradata (OLAP) SimpleDB HiveDB
  • 7. System Storage Interface Partitioning Expansion Node Types Maturity No Oracle RAC Shared SQL Transparent Identical 7 years downtime Memory Requires MySQL Cluster SQL Transparent Mixed 3 years / Local Restart ? No Hypertable Local HQL Transparent Mixed (released downtime 2/08) Degraded 3 years DB2 Local SQL Fixed Hash Performan Identical (25 years ce total) Key-based No 18 months HiveDB Local SQL Mixed Directory downtime (+13 years!)
  • 12. Build atop solid, easy to manage technologies.
  • 13. So we can get back to selling t-shirts.
  • 18. The same for JAVA
  • 19.
  • 20. The simplest thing ➡ HiveDB is a JDBC Gatekeeper for applications. ➡ Lightest integration ➡ Smallest possible implementation
  • 21. The eBay approach ➡ Pro: Internally consistent data ➡ Con: Re-sharding of data required ➡ Con: Re-sharding is a DBA operation
  • 22. Partition by key A simple bucketed partitioning scheme expressed in code.
  • 24. Directory ➡ No broadcasting ➡ No re-partitioning ➡ Easy to relocate records ➡ Easy to add capacity
  • 25.
  • 26. Disadvantages ➡ Intelligence moved from the database to the application (Queries have to be planned and indexed) ➡ Can’t join across partitions ➡ NO OLAP! (We consider this a benefit.)
  • 28. Why did we build this thing again?
  • 29. Oh wait, you have to tell it where things are.
  • 30.
  • 31. Benefits of Shards ➡ Unified data access layer ➡ Result set aggregation across partitions ➡ Everyone in the JAVA-world knows Hibernate.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38. Working on simplifying it even further.
  • 39.
  • 40. But does it go?
  • 41. Case Study: CafePress ➡ Leader in User-generated Commerce ➡ More products than eBay (>250,000,000) ➡ 24/7/365
  • 42. Case Study: Performance Requirements ➡ Thousands of queries/sec ➡ 10 : 1 read/write ratio ➡ Geographically distributed
  • 43. Case Study: Test Environment ➡ Real schema ➡ Production-class hardware ➡ Partial data (~40M records)
  • 44. CafePress HiveDB 2007 Performance Test Environment JMeter (1 thread) JMeter (no threads) command & client.jar client.jar control Measurement Test Controller Workstation Workstation 100MBit switch load generators JMeter (100s of threads) JMeter (100s of threads) client.jar client.jar 48GB backplane non- Dell 2950 / 2x2 Xeon blocking gigabit switch Dell 2950 / 2x2 Xeon 16GB, 6x72GB 15k 16GB, 6x72GB 15k web service Hardware LB (hivedb) Dell 1950 / 2x2 Xeon Dell 1950 / 2x2 Xeon Dell 1950 / 2x2 Xeon Tomcat 5.5 Tomcat 5.5 Tomcat 5.5 Directory Partition 0 Partition 1 databases (mysql) Dell 2950 / 2x2 Xeon Dell 2950 / 2x2 Xeon Dell 2950 / 2x2 Xeon 16GB, 6x72GB 15k 16GB, 6x72GB 15k 16GB, 6x72GB 15k jmccarthy@cafepress.com Modified on April 09 2007
  • 45. Case Study: Performance Goals ➡ Large object reads: 1500/s ➡ Large object writes: 200/s ➡ Response time: 100ms
  • 46. Case Study: Performance Results ➡ Large object reads: 1500/s Actual result: 2250/s ➡ Large object writes: 200/s Actual result: 300/s ➡ Response time: 100ms Actual result: 8ms
  • 47. Case Study: Performance Results ➡ Max read throughput Actual result: 4100/s (CPU limited in Java layer; MySQL <25% utilized)
  • 48. Case Study: Results ➡ Billions of queries served ➡ Highest DB uptime at CafePress ➡ Hundreds of millions of updates performed
  • 50. High Availability ➡ We don’t specify a fail over strategy. ➡ We don’t specify a backup or replication strategy.
  • 51. Fail Over Options ➡ Load balancer ➡ MySQL Proxy ➡ Flipper / MMM ➡ HAProxy
  • 52. Replication ➡ You should use MySQL replication. ➡ It really works. ➡ You can add in LVM snapshots for even more disastrous disaster recovery.
  • 53. The Bottleneck There is one problem with HiveDB. The directory is a scaling bottleneck.
  • 54. There’s no reason it has to be a database.
  • 55. MemcacheD ➡ Hive directory implemented atop memcached
  • 56. Roadmap ➡ MemcacheD directory ➡ Simplified Hibernate support ➡ Management web service ➡ Command-line tools ➡ Framework Adapters
  • 57. Call for developers! ➡ Grails (GORM) ➡ ActiveRecord (JRuby on Rails) ➡ Ambition (JRuby or web service) ➡ iBatis ➡ JPA ➡ ... Postgres?
  • 58. Contributing ➡ Post to the mailing list http://groups.google.com/group/hivedb-dev ➡ Comment on our site http://www.hivedb.org ➡ Submit a patch / pull request git clone git://github.com/britt/hivedb.git
  • 59. Photo Credits ➡ http://www.flickr.com/photos/7362313@N07/1240245941/ sizes/o ➡ http://www.flickr.com/photos/ 99287245@N00/2229322675/sizes/o ➡ http://www.flickr.com/photos/ 51035555243@N01/26006138/ ➡ http://www.flickr.com/photos/vgm8383/2176897085/ sizes/o/ ➡ http://t-shirts.cafepress.com/item/money-organic-cotton- tee/73439294
  • 60. Blobject ➡ Gets around the problem of ALTER statements. No data set of this size can be transformed synchronously. ➡ The hive can contain multiple versions of a serialized record. ➡ Compression
  • 61. Hadoop! ➡ Query and transform blobbed data asynchronously.