Riak @ Shareaholic


     Robby Grossman
  robby@shareaholic.com
       @freerobby
Agenda

Shareaholic: Product & Tech

Why Riak: The Search for a Big Data Store

Transitioning to Riak

Riak Use Cases

Deploying to EC2
What’s Shareaholic?
Browser Tools
Sharing Buttons
Recommendations
Social Analytics
Monthly @ Shareaholic

 Thousands of developers hitting API

 Hundreds of thousands of publishers

 Tens of millions of shares & clicks

 Hundreds of millions of pageviews & events
Tech @ Shareaholic

JRuby on Rails (via Torquebox)

MySQL (Master, Read Slave)

Elastic MapReduce (Amazon's hosted Hadoop)

Redis

Formerly Mongo, Now Riak
Why Not Mongo?


Working set needs to fit in memory

Global write lock blocks all queries
despite not having transactions/joins

Standbys not “hot”
Why Riak?
Next @ Shareaholic
Options:

  HBase

  Cassandra

  Riak

Goals:

  Linear scalability

  Full-text search

  Flexible indexing

  Easier DevOps
HBase
Pros:

  Battle tested

  High performance

Cons:

  Complex architecture

  SPOFs

  Requires Hive for indexing/querying

  Expensive to deploy at small scale
Cassandra
Pros:

  Native secondary indices

  Linear scalability

  Tunable CAP

Cons:

  Known users all domain experts

  Search requires Lucene

  Heavyweight MapReduce
Riak
Pros:

  Operationally simpler

  Linear scalability

  Integrated search

  Secondary indices

  Tunable CAP

  Vector clocks solve time-sync problems

Cons:

  Multi-data center replication requires Enterprise product

  leveldb puts high strain on CPU
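
To illustrate the vector clock point, here is a minimal sketch of how two versions of an object can be ordered without comparing wall-clock timestamps. The dict-based clock representation and the function names `descends` and `compare` are assumptions made for this example; they are not Riak's internal format or API.

```python
def descends(a, b):
    """True if clock `a` has seen every update that clock `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two vector clocks (dicts of node name -> update counter)."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    # Neither side has seen all of the other's updates: a true conflict
    # (siblings) that the application has to resolve.
    return "concurrent"


newer = compare({"node_x": 2, "node_y": 1}, {"node_x": 1, "node_y": 1})
conflict = compare({"node_x": 2}, {"node_y": 1})
```

Because the ordering depends only on per-node counters, clock drift between servers does not affect which version wins.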
From Mongo to Riak
Migration Goals



No time where database goes “offline”

Product parity throughout migration
Migration Process

1. App writes to Mongo and Riak

2. Verify data integrity

3. Import historical data

4. App reads from Riak

5. Decommission Mongo
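
The five steps above can be sketched as a thin dual-write wrapper. This is an illustration only: plain dicts stand in for the Mongo and Riak clients, and `DualWriteStore`, `find_mismatches`, and `backfill` are hypothetical names, not code from the Shareaholic application.

```python
class DualWriteStore:
    """Writes go to both stores; reads come from whichever is primary."""

    def __init__(self, old_store, new_store, read_from_new=False):
        self.old_store = old_store
        self.new_store = new_store
        self.read_from_new = read_from_new

    def put(self, key, value):
        # Step 1: every write lands in both databases.
        self.old_store[key] = value
        self.new_store[key] = value

    def get(self, key):
        # Step 4: flip read_from_new once the data has been verified.
        store = self.new_store if self.read_from_new else self.old_store
        return store[key]


def find_mismatches(old_store, new_store):
    """Step 2: keys whose value is missing or different in the new store."""
    return sorted(k for k, v in old_store.items() if new_store.get(k) != v)


def backfill(old_store, new_store):
    """Step 3: copy historical records without clobbering dual-written ones."""
    for key, value in old_store.items():
        new_store.setdefault(key, value)


mongo = {"link:1": {"url": "http://example.com/a"}}  # pre-existing data
riak = {}
store = DualWriteStore(mongo, riak)
store.put("link:2", {"url": "http://example.com/b"})

missing_before = find_mismatches(mongo, riak)  # historical data not yet copied
backfill(mongo, riak)
missing_after = find_mismatches(mongo, riak)

store.read_from_new = True  # reads now served by the new store
```

Keeping the old store as the read source until the mismatch list is empty is what lets the product stay at parity during the migration; step 5 then amounts to removing the old store from the wrapper.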
Use Cases
Share API


Save shared content

Uses MapReduce to
populate user dashboard
Recommendations



Sets of related pages

Generated on-demand
Publisher Analytics


Generated nightly via Hadoop

Typical stored “document” (JSON)

80 KB to 1 MB
Riak Successes
MapReduce

Handy for querying

Runs at “web page speed”.

Easy to re-reduce for complex queries

Easy to test via curl
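
One way to read the re-reduce point: a reduce function can safely run again over its own partial outputs when its output has the same shape as its input. The sketch below shows that property for a minimum reducer in plain Python. `reduce_min` is a stand-in written for this example, not Riak's built-in.

```python
def reduce_min(values):
    """Return a one-element list holding the minimum, or [] for no input."""
    return [min(values)] if values else []


user_ids = [13, 2, 9, 5]

# Reduce two partitions independently, as separate nodes might...
partial_a = reduce_min(user_ids[:2])
partial_b = reduce_min(user_ids[2:])

# ...then reduce the partial results again.
combined = reduce_min(partial_a + partial_b)
```

The combined result matches a single pass over all the values, which is the property that makes it safe to chain or repeat reduce phases.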
Tunable CAP @ Shareaholic


    Replication: primary/secondary authority

    Read failure tolerance: speed/consistency

    Write failure tolerance
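
The read and write tolerance trade-offs can be made concrete with standard quorum arithmetic. In this sketch, `n` is the number of replicas and `r` and `w` are the number of replicas that must answer a read or a write. The function name and return keys are made up for this example, and the arithmetic deliberately ignores details such as sloppy quorums.

```python
def quorum_properties(n, r, w):
    """Failure tolerance implied by a replica count and read/write quorums."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return {
        # Replicas that can be unavailable while requests still succeed.
        "read_failures_tolerated": n - r,
        "write_failures_tolerated": n - w,
        # When r + w > n, every read quorum shares at least one replica
        # with the most recent successful write quorum.
        "reads_overlap_writes": r + w > n,
    }


balanced = quorum_properties(n=3, r=2, w=2)
fast = quorum_properties(n=3, r=1, w=1)
```

Lower `r` and `w` values favor speed and availability, while higher values favor consistency.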
Full Text Search

Built on Lucene

Make user content searchable

Make arbitrary keys queryable

“Just turn it on”


Hiccup: corrupt merge indexes
Query Example
  Who’s our oldest user who’s shared something in the last minute?

# 60-second window: 1346350937 - 1346350877 = 60
# Riak.reduceMin keeps the smallest value, e.g. [[2],[5],[9],[13]] => [[2]]
curl -XPOST http://localhost:8098/mapred -H 'Content-Type: application/json' -d '{
   "inputs": {
      "bucket":"links",
      "query":"timestamp:[1346350877 TO 1346350937}"
   },
   "query":[
      {"map":{"language":"javascript","source":"function(riakObject) { return [[Riak.mapValuesJson(riakObject)[0].user_id]]; }"}},
      {"reduce":{"language":"javascript","name":"Riak.reduceMin"}}
   ]
}'


                                    [[2197]]
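
The same job can be assembled programmatically, which avoids hand-editing JSON inside a shell string. The sketch below only builds and serializes the request body and does not contact a server. The helper name `build_mapred_job` is an assumption for this example. The bucket, query syntax, and phase definitions are copied from the slide.

```python
import json

# Map phase source from the slide: emit each object's user_id.
MAP_SOURCE = (
    "function(riakObject) {"
    " return [[Riak.mapValuesJson(riakObject)[0].user_id]];"
    " }"
)


def build_mapred_job(bucket, start_ts, end_ts):
    """Build the MapReduce job body for a timestamp-range search input."""
    return {
        "inputs": {
            "bucket": bucket,
            "query": "timestamp:[%d TO %d}" % (start_ts, end_ts),
        },
        "query": [
            {"map": {"language": "javascript", "source": MAP_SOURCE}},
            {"reduce": {"language": "javascript", "name": "Riak.reduceMin"}},
        ],
    }


job = build_mapred_job("links", 1346350877, 1346350937)
body = json.dumps(job)  # string suitable for POSTing to /mapred
```

Serializing with `json.dumps` guarantees the body is valid JSON, which comments or raw line breaks inside a string value would not be.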
Riak on EC2
In a Nutshell

EC2 specs poorly proportioned for leveldb

Multiple AZs in one location works well

Scale vertically for better latency & consistency

Scale horizontally for more throughput/$
Benchmarks

Top Graph: c1.medium (1.7 GB RAM, 5 ECU)

Middle: m1.large (7.5 GB RAM, 4 ECU)

Bottom: cc1.4xlarge (23 GB RAM, 33.5 ECU)
Throughput
Latency (Typical)
Latency (Worst Case)
Calculations
c1.medium (1.7 GB RAM, 5 ECU)
1758 IOPS/$-hr
Worst 1% of queries: 300ms/800ms

m1.large (7.5 GB RAM, 4 ECU)
1167 IOPS/$-hr
Worst 1% of queries: 110ms/200ms

cc1.4xlarge (23 GB RAM, 33.5 ECU)
872 IOPS/$-hr
Worst 1% of queries: 47ms/139ms
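
As a worked comparison, the IOPS/$-hr figures above can be normalized against the largest instance to show how much more throughput per dollar the smaller instances deliver. The numbers are taken from the slide; the variable names are made up for this example.

```python
# Throughput per dollar-hour, as reported on the slide.
IOPS_PER_DOLLAR_HOUR = {
    "c1.medium": 1758,
    "m1.large": 1167,
    "cc1.4xlarge": 872,
}

baseline = IOPS_PER_DOLLAR_HOUR["cc1.4xlarge"]

# Throughput per dollar relative to the largest instance type.
relative = {
    name: round(iops / baseline, 2)
    for name, iops in IOPS_PER_DOLLAR_HOUR.items()
}
```

On these figures, c1.medium delivers roughly twice the throughput per dollar of cc1.4xlarge, while the latency figures favor the larger instance.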
Benchmark Takeaways


 You can’t go “by spec”

 IO is limiting factor

 RAM is never the limiting factor when only 1%
 of the keyspace needs to be in memory
Fin. Questions?
Thanks:

  Tom Santero

  Justin Sheehy

  Ryan Zezeski

  Reid Draper

  #freenode riak crew

We’re Hiring!

  Robby Grossman

  robby@shareaholic.com

  @freerobby
Fin.
