SlideShare ist ein Scribd-Unternehmen logo
1 von 49
Downloaden Sie, um offline zu lesen
Accelerating NoSQL
   Running Voldemort on HailDB


         Sunny Gleason
         March 11, 2011
whoami
• Sunny Gleason, human
• passion: distributed systems engineering
• previous...
   Ning : custom social networks
   Amazon.com : infra & web services
• now...
   building cloud infrastructure
whereami

• twitter : twitter.com/sunnygleason
• github : github.com/sunnygleason
• linkedin : linkedin.com/in/sunnygleason
what’s in this presentation?

  • NoSQL Roundup
  • Voldemort who?
  • HailDB wha?
  • Results & Next Steps
  • Special Bonus Material
NoSQL

• “Not Only” SQL
• What’s the point?
• Proponent: “reaching next level of scale”
• Cynic: “cloud is hype, ops nightmare”
what does it gain?

• Higher performance, scalability, availability
• More robust fault-tolerance
• Simplified systems design
• Easier operations
what does it lose?
• Reduced / simplified programming model
• No ad-hoc queries, no joins, no txns
• Not ACID: Atomicity / Consistency /
  Isolation / Durability
• Operations / management is still evolving
• Challenging to quantify health of system
• Fewer domain experts
NoSQL Map
                                KV Stores
                                (volatile)           Memcached,
                                                       Redis




                                KV Stores             Dynamo,
        Key-Value               (durable)            Voldemort,
          Store
                                                        Riak


                     Document
                       Store
NoSQL                                    CouchDB,
                                         MongoDB

                    Column
                     Store              Cassandra,
                                         BigTable,
                                          HBase

         Graph
                                             Neo4J
         Store
NoSQL Map
                                KV Stores
                                (volatile)




                                KV Stores      Dynamo,
        Key-Value               (durable)    Voldemort,
          Store
                                                 Riak


                     Document
                       Store
NoSQL

                    Column
                     Store



         Graph
         Store
motivation

• database on 1 box : ok
• database with master/slave replication : ok
• database on cluster : tricky
• database on SAN : time bomb
performance
                                      + Sharding
Complexity




                               +FusionIO

                      +SSD

             MySQL       Voldemort                 +Cluster


                               Memcached


                1K       10K           100K             1M

                     Aggregate Operations / Sec
dynamo case study
• Amazon : high read throughput, always-
  accessible writes
• Shopping cart application
• ‘Glitches’ ok, duplicate or missing item
• Data loss or unavailability is unacceptable
• Solution: K-V schema plus smart routing &
  data placement
key-value storage
• Essentially, a gigantic hash table
• Typically assign byte[] values to byte[] keys
• Plus versioning mixed in to handle failures
  and conflicts
• Yes, you *can* do range partitioning; in
  practice, avoid it because of hot spots
k-v: durable vs. volatile

• RAM is ridiculous speed (ns), not durable
• Disk is persistent and slow (3-7ms)
• RAID eases the pain a bit (4-8x throughput)
• SSD is providing good promise (100-300us)
• FusionIO is redefining the space (30-100us)
dynamo clones

• Voldemort : from LinkedIn, dynamo
  implementation in Java (default: BDB-JE)
• Riak : from Basho, dynamo implementation
  in Erlang (default: embedded InnoDB)
Voldemort

• Developed at LinkedIn
• Scalable Key-Value Storage
• Based on Amazon Dynamo model
• High Read Throughput
• Always Writable
Voldemort features

• Consistent Hashing
• Quorum settings : R, W, N
• Auto-sharding & rebalancing
• Pluggable storage engines
Consistent Hashing
         * Arrange keys around ring


           * Compute token in ring
              using hash function


       * Determine nodes responsible
           for token using live set
R/W/N
• N : maximum number of nodes to query
  for an operation
• R : read quorum
• W : write quorum
• Can adjust ‘quorum’ to balance throughput
  and fault-tolerance
setting up Voldemort 1
Step 1: Download the code
Download either a recent stable release or, for those who like to live more
dangerously, the up-to-the-minute build from the build server.

Step 2: Start single node cluster
	

   > bin/voldemort-server.sh config/single_node_cluster > /tmp/voldemort.log &

Step 3: Start commandline test client and do some operations
	

   > bin/voldemort-shell.sh test tcp://localhost:6666
	

   Established connection to test via tcp://localhost:6666
	

   > put "hello" "world"
	

   > get "hello"
	

   version(0:1): "world"
	

   > delete "hello"
	

   > get "hello"
	

   null
	

   > exit
	

   k k thx bye.
setting up Voldemort 2

• For a cluster, use cloud startup scripts
• Works with Amazon EC2
• See https://github.com/voldemort/
  voldemort/wiki/EC2-Testing-Infrastructure
Voldemort client libraries


• Java, Scala, Clojure
• Ruby
• Python
• C++
storage engines

• BDB-JE (Oracle Sleepycat, the original)
• Krati (LinkedIn, pretty new)
• HailDB (new!)
• MySQL (old / dated)
BDB-JE

• Log-Structured B-Tree
• Fast Storage When Mostly Cached
• Configured without fsync() by default -
  writes are batched and flushed periodically
Krati

• Fast Hash-Oriented Storage
• Uses memory-mapped files for speed
• Configured without fsync() by default -
  writes are batched and flushed periodically
HailDB
• Fork of MySQL InnoDB plugin
  (contributors : Oracle, Google, Facebook,
  Percona)
• Higher stability for large data sets
• Fast crash recovery
• External from Java heap (ease GC pain)
• apt-get install haildb (from launchpad PPA)
• Use “flush-once-per-second” mode
HailDB, Java & Voldemort
                         Voldemort Client




    Voldemort Node       Voldemort Node     Voldemort Node



     v-storage-inno

      g414-haildb

          JNA


         HailDB
    (log, buffer pool,
       tablespace)
HailDB & Java

• g414-haildb : where the magic happens
• uses JNA: Java Native Access
• dynamic binding to libhaildb shared library
• auto-generated from .h file (w/ JNAerator)
• Pointer classes & other shenanigans
HailDB schema

_key VARBINARY(200)
_version VARBINARY(200)
_value BLOB
PRIMARY KEY(_key, _version)
implementation gotchas

• InnoDB API-level usage is unclear
• Synchronization & locking is unclear
• Therefore... I learned to love reading C
• Error handling is *nasty*
• Installation a bit of a pain
experimental setup
• OS X: 8-Core Xeon, 32GB RAM, 200GB
  OWC SSD
• Faban Benchmark : PUT 64-byte key, 1024-
  byte value
• Scenarios:1, 2, 4, 8 threads
• 512M Java Heap
Perf: BDB Put 100%
Perf: BDB Put 20%/Get 80%
Perf: BDB Put 20% Detail
Perf: Krati Put 100%
Perf: Krati Put 20%/Get 80%
Perf: Krati Put 20% Detail
Perf: HailDB Put 100%
Perf: HailDB Put 20%/Get 80%
Perf: HailDB Put 20% Detail
future work

• Improve Packaging / Installation
• Schema refinements & perf enhancements
• Online backup/export with XtraBackup
• JNI Bindings
schema refinements
• Build upon Nokia work on fast k-v schema
• 8-byte ‘long’ key hash vs. full key bytes
• Smart use of secondary indexes
• Native representation of vector clocks
• Delayed / soft deletion
• Expect 40-50% performance boost
InnoDB tuning
• Skinny columns, skinny rows! (esp. Primary Key)
 • Varchar enum ‘bad’, int or smallint ‘good’
 • fixed-width rows allows in-place updates
• Use covering indexes strategically
• More data per page means faster index scans,
  more efficient buffer pool utilization
• You only get so many trx’s on given CPU/RAM
  configuration - benchmark this!
refined schema
_id BIGINT (auto increment)
_key_hash BIGINT
_key VARBINARY(200)
_version VARBINARY(200)
_value BLOB
PRIMARY KEY(_id)
KEY(_key_hash)
online backup

• hot backup of data to other machine /
  destination
• test Percona Xtrabackup with HailDB
• next step: backup/export to Hadoop/HDFS
  (similar to Cloudera Sqoop tool)
JNI bindings

• JNI can get 2-5x perf boost vs. JNA
• ... at the expense of nasty code
• Will go for schema optimizations and
  InnoDB tuning tips *first*
resources

• github.com/voldemort/voldemort
  freenode #voldemort
• github.com/sunnygleason/v-storage-haildb
  github.com/sunnygleason/v-storage-bench
  github.com/sunnygleason/g414-haildb
• jna.dev.java.net
more resources

• Amazon Dynamo
• Faban / XFaban
• HailDB
• Drizzle
• PBXT
Thank You!

Weitere ähnliche Inhalte

Was ist angesagt?

Turning object storage into vm storage
Turning object storage into vm storageTurning object storage into vm storage
Turning object storage into vm storagewim_provoost
 
Threads Needles Stacks Heaps - Java edition
Threads Needles Stacks Heaps - Java editionThreads Needles Stacks Heaps - Java edition
Threads Needles Stacks Heaps - Java editionOvidiu Dimulescu
 
Using and Benchmarking Galera in different architectures (PLUK 2012)
Using and Benchmarking Galera in different architectures (PLUK 2012)Using and Benchmarking Galera in different architectures (PLUK 2012)
Using and Benchmarking Galera in different architectures (PLUK 2012)Henrik Ingo
 
Highly Available MySQL/PHP Applications with mysqlnd
Highly Available MySQL/PHP Applications with mysqlndHighly Available MySQL/PHP Applications with mysqlnd
Highly Available MySQL/PHP Applications with mysqlndJervin Real
 
Hindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to CassandraHindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to CassandraMichael Kjellman
 
MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera ClusterAbdul Manaf
 
Databases in the hosted cloud
Databases in the hosted cloudDatabases in the hosted cloud
Databases in the hosted cloudColin Charles
 
NewSQL vs NoSQL for New OLTP
NewSQL vs NoSQL for New OLTPNewSQL vs NoSQL for New OLTP
NewSQL vs NoSQL for New OLTPDATAVERSITY
 
A beginners guide to MariaDB
A beginners guide to MariaDBA beginners guide to MariaDB
A beginners guide to MariaDBColin Charles
 
High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011Tim Y
 
Choosing a MySQL High Availability solution - Percona Live UK 2011
Choosing a MySQL High Availability solution - Percona Live UK 2011Choosing a MySQL High Availability solution - Percona Live UK 2011
Choosing a MySQL High Availability solution - Percona Live UK 2011Henrik Ingo
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]Malin Weiss
 
Redis memcached pdf
Redis memcached pdfRedis memcached pdf
Redis memcached pdfErin O'Neill
 
MariaDB 10 Tutorial - 13.11.11 - Percona Live London
MariaDB 10 Tutorial - 13.11.11 - Percona Live LondonMariaDB 10 Tutorial - 13.11.11 - Percona Live London
MariaDB 10 Tutorial - 13.11.11 - Percona Live LondonIvan Zoratti
 
Kafka Tutorial: Kafka Security
Kafka Tutorial: Kafka SecurityKafka Tutorial: Kafka Security
Kafka Tutorial: Kafka SecurityJean-Paul Azar
 
Gear6 Webinar - MySQL Scaling with Memcached
Gear6 Webinar - MySQL Scaling with MemcachedGear6 Webinar - MySQL Scaling with Memcached
Gear6 Webinar - MySQL Scaling with MemcachedGear6
 
第17回Cassandra勉強会: MyCassandra
第17回Cassandra勉強会: MyCassandra第17回Cassandra勉強会: MyCassandra
第17回Cassandra勉強会: MyCassandraShun Nakamura
 
DrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilityDrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilitycherryhillco
 
What's new in Jewel and Beyond
What's new in Jewel and BeyondWhat's new in Jewel and Beyond
What's new in Jewel and BeyondSage Weil
 

Was ist angesagt? (20)

Turning object storage into vm storage
Turning object storage into vm storageTurning object storage into vm storage
Turning object storage into vm storage
 
Threads Needles Stacks Heaps - Java edition
Threads Needles Stacks Heaps - Java editionThreads Needles Stacks Heaps - Java edition
Threads Needles Stacks Heaps - Java edition
 
Using and Benchmarking Galera in different architectures (PLUK 2012)
Using and Benchmarking Galera in different architectures (PLUK 2012)Using and Benchmarking Galera in different architectures (PLUK 2012)
Using and Benchmarking Galera in different architectures (PLUK 2012)
 
Highly Available MySQL/PHP Applications with mysqlnd
Highly Available MySQL/PHP Applications with mysqlndHighly Available MySQL/PHP Applications with mysqlnd
Highly Available MySQL/PHP Applications with mysqlnd
 
Hindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to CassandraHindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to Cassandra
 
MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera Cluster
 
Databases in the hosted cloud
Databases in the hosted cloudDatabases in the hosted cloud
Databases in the hosted cloud
 
NewSQL vs NoSQL for New OLTP
NewSQL vs NoSQL for New OLTPNewSQL vs NoSQL for New OLTP
NewSQL vs NoSQL for New OLTP
 
A beginners guide to MariaDB
A beginners guide to MariaDBA beginners guide to MariaDB
A beginners guide to MariaDB
 
High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011
 
Choosing a MySQL High Availability solution - Percona Live UK 2011
Choosing a MySQL High Availability solution - Percona Live UK 2011Choosing a MySQL High Availability solution - Percona Live UK 2011
Choosing a MySQL High Availability solution - Percona Live UK 2011
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
 
Redis memcached pdf
Redis memcached pdfRedis memcached pdf
Redis memcached pdf
 
MariaDB 10 Tutorial - 13.11.11 - Percona Live London
MariaDB 10 Tutorial - 13.11.11 - Percona Live LondonMariaDB 10 Tutorial - 13.11.11 - Percona Live London
MariaDB 10 Tutorial - 13.11.11 - Percona Live London
 
Kafka Tutorial: Kafka Security
Kafka Tutorial: Kafka SecurityKafka Tutorial: Kafka Security
Kafka Tutorial: Kafka Security
 
Gear6 Webinar - MySQL Scaling with Memcached
Gear6 Webinar - MySQL Scaling with MemcachedGear6 Webinar - MySQL Scaling with Memcached
Gear6 Webinar - MySQL Scaling with Memcached
 
Why MariaDB?
Why MariaDB?Why MariaDB?
Why MariaDB?
 
第17回Cassandra勉強会: MyCassandra
第17回Cassandra勉強会: MyCassandra第17回Cassandra勉強会: MyCassandra
第17回Cassandra勉強会: MyCassandra
 
DrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilityDrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalability
 
What's new in Jewel and Beyond
What's new in Jewel and BeyondWhat's new in Jewel and Beyond
What's new in Jewel and Beyond
 

Ähnlich wie Accelerating NoSQL Running Voldemort on HailDB

Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataRoger Xia
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Chris Richardson
 
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag JambhekarC* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag JambhekarDataStax Academy
 
Getting started with Riak in the Cloud
Getting started with Riak in the CloudGetting started with Riak in the Cloud
Getting started with Riak in the CloudInes Sombra
 
Yes sql08 inmemorydb
Yes sql08 inmemorydbYes sql08 inmemorydb
Yes sql08 inmemorydbDaniel Austin
 
PhegData X - High Performance EBS
PhegData X - High Performance EBSPhegData X - High Performance EBS
PhegData X - High Performance EBSHanson Dong
 
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Amazon Web Services
 
A Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLA Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLDaniel Austin
 
Clustrix Database Percona Ruby on Rails benchmark
Clustrix Database Percona Ruby on Rails benchmarkClustrix Database Percona Ruby on Rails benchmark
Clustrix Database Percona Ruby on Rails benchmarkClustrix
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataacelyc1112009
 
Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)Alexey Rybak
 
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyCassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyDataStax Academy
 
Adding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of TricksAdding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of Trickssiculars
 
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Introducing Venice - Strata NYC 2017
Introducing Venice - Strata NYC 2017Introducing Venice - Strata NYC 2017
Introducing Venice - Strata NYC 2017Felix GV
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesshnkr_rmchndrn
 

Ähnlich wie Accelerating NoSQL Running Voldemort on HailDB (20)

Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
 
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag JambhekarC* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
 
Mongodb lab
Mongodb labMongodb lab
Mongodb lab
 
Getting started with Riak in the Cloud
Getting started with Riak in the CloudGetting started with Riak in the Cloud
Getting started with Riak in the Cloud
 
KeyValue Stores
KeyValue StoresKeyValue Stores
KeyValue Stores
 
Yes sql08 inmemorydb
Yes sql08 inmemorydbYes sql08 inmemorydb
Yes sql08 inmemorydb
 
PhegData X - High Performance EBS
PhegData X - High Performance EBSPhegData X - High Performance EBS
PhegData X - High Performance EBS
 
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
 
A Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLA Global In-memory Data System for MySQL
A Global In-memory Data System for MySQL
 
Clustrix Database Percona Ruby on Rails benchmark
Clustrix Database Percona Ruby on Rails benchmarkClustrix Database Percona Ruby on Rails benchmark
Clustrix Database Percona Ruby on Rails benchmark
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
 
NoSQL
NoSQLNoSQL
NoSQL
 
Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)
 
Client storage
Client storageClient storage
Client storage
 
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyCassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
 
Adding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of TricksAdding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of Tricks
 
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introducing Venice - Strata NYC 2017
Introducing Venice - Strata NYC 2017Introducing Venice - Strata NYC 2017
Introducing Venice - Strata NYC 2017
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 

Kürzlich hochgeladen

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Kürzlich hochgeladen (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Accelerating NoSQL Running Voldemort on HailDB

  • 1. Accelerating NoSQL Running Voldemort on HailDB Sunny Gleason March 11, 2011
  • 2. whoami • Sunny Gleason, human • passion: distributed systems engineering • previous... Ning : custom social networks Amazon.com : infra & web services • now... building cloud infrastructure
  • 3. whereami • twitter : twitter.com/sunnygleason • github : github.com/sunnygleason • linkedin : linkedin.com/in/sunnygleason
  • 4. what’s in this presentation? • NoSQL Roundup • Voldemort who? • HailDB wha? • Results & Next Steps • Special Bonus Material
  • 5. NoSQL • “Not Only” SQL • What’s the point? • Proponent: “reaching next level of scale” • Cynic: “cloud is hype, ops nightmare”
  • 6. what does it gain? • Higher performance, scalability, availability • More robust fault-tolerance • Simplified systems design • Easier operations
  • 7. what does it lose? • Reduced / simplified programming model • No ad-hoc queries, no joins, no txns • Not ACID: Atomicity / Consistency / Isolation / Durability • Operations / management is still evolving • Challenging to quantify health of system • Fewer domain experts
  • 8. NoSQL Map KV Stores (volatile) Memcached, Redis KV Stores Dynamo, Key-Value (durable) Voldemort, Store Riak Document Store NoSQL CouchDB, MongoDB Column Store Cassandra, BigTable, HBase Graph Neo4J Store
  • 9. NoSQL Map KV Stores (volatile) KV Stores Dynamo, Key-Value (durable) Voldemort, Store Riak Document Store NoSQL Column Store Graph Store
  • 10. motivation • database on 1 box : ok • database with master/slave replication : ok • database on cluster : tricky • database on SAN : time bomb
  • 11. performance + Sharding Complexity +FusionIO +SSD MySQL Voldemort +Cluster Memcached 1K 10K 100K 1M Aggregate Operations / Sec
  • 12. dynamo case study • Amazon : high read throughput, always- accessible writes • Shopping cart application • ‘Glitches’ ok, duplicate or missing item • Data loss or unavailability is unacceptable • Solution: K-V schema plus smart routing & data placement
  • 13. key-value storage • Essentially, a gigantic hash table • Typically assign byte[] values to byte[] keys • Plus versioning mixed in to handle failures and conflicts • Yes, you *can* do range partitioning; in practice, avoid it because of hot spots
  • 14. k-v: durable vs. volatile • RAM is ridiculous speed (ns), not durable • Disk is persistent and slow (3-7ms) • RAID eases the pain a bit (4-8x throughput) • SSD is providing good promise (100-300us) • FusionIO is redefining the space (30-100us)
  • 15. dynamo clones • Voldemort : from LinkedIn, dynamo implementation in Java (default: BDB-JE) • Riak : from Basho, dynamo implementation in Erlang (default: embedded InnoDB)
  • 16. Voldemort • Developed at LinkedIn • Scalable Key-Value Storage • Based on Amazon Dynamo model • High Read Throughput • Always Writable
  • 17. Voldemort features • Consistent Hashing • Quorum settings : R, W, N • Auto-sharding & rebalancing • Pluggable storage engines
  • 18. Consistent Hashing * Arrange keys around ring * Compute token in ring using hash function * Determine nodes responsible for token using live set
  • 19. R/W/N • N : maximum number of nodes to query for an operation • R : read quorum • W : write quorum • Can adjust ‘quorum’ to balance throughput and fault-tolerance
  • 20. setting up Voldemort 1 Step 1: Download the code Download either a recent stable release or, for those who like to live more dangerously, the up-to-the-minute build from the build server. Step 2: Start single node cluster > bin/voldemort-server.sh config/single_node_cluster > /tmp/voldemort.log & Step 3: Start commandline test client and do some operations > bin/voldemort-shell.sh test tcp://localhost:6666 Established connection to test via tcp://localhost:6666 > put "hello" "world" > get "hello" version(0:1): "world" > delete "hello" > get "hello" null > exit k k thx bye.
  • 21. setting up Voldemort 2 • For a cluster, use cloud startup scripts • Works with Amazon EC2 • See https://github.com/voldemort/ voldemort/wiki/EC2-Testing-Infrastructure
  • 22. Voldemort client libraries • Java, Scala, Clojure • Ruby • Python • C++
  • 23. storage engines • BDB-JE (Oracle Sleepycat, the original) • Krati (LinkedIn, pretty new) • HailDB (new!) • MySQL (old / dated)
  • 24. BDB-JE • Log-Structured B-Tree • Fast Storage When Mostly Cached • Configured without fsync() by default - writes are batched and flushed periodically
  • 25. Krati • Fast Hash-Oriented Storage • Uses memory-mapped files for speed • Configured without fsync() by default - writes are batched and flushed periodically
  • 26. HailDB • Fork of MySQL InnoDB plugin (contributors : Oracle, Google, Facebook, Percona) • Higher stability for large data sets • Fast crash recovery • External from Java heap (ease GC pain) • apt-get install haildb (from launchpad PPA) • Use “flush-once-per-second” mode
  • 27. HailDB, Java & Voldemort Voldemort Client Voldemort Node Voldemort Node Voldemort Node v-storage-inno g414-haildb JNA HailDB (log, buffer pool, tablespace)
  • 28. HailDB & Java • g414-haildb : where the magic happens • uses JNA: Java Native Access • dynamic binding to libhaildb shared library • auto-generated from .h file (w/ JNAerator) • Pointer classes & other shenanigans
  • 29. HailDB schema _key VARBINARY(200) _version VARBINARY(200) _value BLOB PRIMARY KEY(_key, _version)
  • 30. implementation gotchas • InnoDB API-level usage is unclear • Synchronization & locking is unclear • Therefore... I learned to love reading C • Error handling is *nasty* • Installation a bit of a pain
  • 31. experimental setup • OS X: 8-Core Xeon, 32GB RAM, 200GB OWC SSD • Faban Benchmark : PUT 64-byte key, 1024- byte value • Scenarios:1, 2, 4, 8 threads • 512M Java Heap
  • 33. Perf: BDB Put 20%/Get 80%
  • 34. Perf: BDB Put 20% Detail
  • 36. Perf: Krati Put 20%/Get 80%
  • 37. Perf: Krati Put 20% Detail
  • 39. Perf: HailDB Put 20%/Get 80%
  • 40. Perf: HailDB Put 20% Detail
  • 41. future work • Improve Packaging / Installation • Schema refinements & perf enhancements • Online backup/export with XtraBackup • JNI Bindings
  • 42. schema refinements • Build upon Nokia work on fast k-v schema • 8-byte ‘long’ key hash vs. full key bytes • Smart use of secondary indexes • Native representation of vector clocks • Delayed / soft deletion • Expect 40-50% performance boost
  • 43. InnoDB tuning • Skinny columns, skinny rows! (esp. Primary Key) • Varchar enum ‘bad’, int or smallint ‘good’ • fixed-width rows allows in-place updates • Use covering indexes strategically • More data per page means faster index scans, more efficient buffer pool utilization • You only get so many trx’s on given CPU/RAM configuration - benchmark this!
  • 44. refined schema _id BIGINT (auto increment) _key_hash BIGINT _key VARBINARY(200) _version VARBINARY(200) _value BLOB PRIMARY KEY(_id) KEY(_key_hash)
  • 45. online backup • hot backup of data to other machine / destination • test Percona Xtrabackup with HailDB • next step: backup/export to Hadoop/HDFS (similar to Cloudera Sqoop tool)
  • 46. JNI bindings • JNI can get 2-5x perf boost vs. JNA • ... at the expense of nasty code • Will go for schema optimizations and InnoDB tuning tips *first*
  • 47. resources • github.com/voldemort/voldemort freenode #voldemort • github.com/sunnygleason/v-storage-haildb github.com/sunnygleason/v-storage-bench github.com/sunnygleason/g414-haildb • jna.dev.java.net
  • 48. more resources • Amazon Dynamo • Faban / XFaban • HailDB • Drizzle • PBXT