Riak Search
Performance Wins
 How we got > 100x improvement
            in query throughput


            Gary Flake, Founder
           gary@clipboard.com
Demo


       Introduction
Architecture
[Diagram] Hosts shown:

• web-01, web-02, web-03 (Node.js + Nginx)
• riak-01, riak-02, riak-03, riak-04, riak-05
• cache-01, cache-02, cache-03
• redis-01, redis-02
• thumb-01, thumb-02
• job-01, job-02
• admin-01
Riak

An awesome NoSQL data store:

• Super easy to scale up AND down
• Fault tolerant – no SPoF
• Flexible schema
• Full-text search out of the box
• Can be fixed and improved in Erlang (the
  Basho folks awesomely take our commits)
Riak – Basics

• Data in Riak is grouped into buckets
  (effectively namespaces)
• Basic operations are:
    •   Get, save, delete, search, map, reduce
• Eventual consistency managed through
  N, R, and W bucket parameters.
• Everything we put in Riak is JSON
• We talk to Riak through the excellent riak-js
  node library by Francisco Treacy
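
As a rough illustration of how N, R, and W interact, the sketch below checks whether read and write quorums are guaranteed to overlap. It relies on the standard quorum condition R + W > N; the function name and the sample values are illustrative assumptions, not Clipboard's actual bucket settings.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum shares a replica with every write quorum.

    n: number of replicas kept for each object
    r: replicas that must answer a read
    w: replicas that must acknowledge a write
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


print(quorums_overlap(3, 2, 2))  # True: a read always sees the latest acknowledged write
print(quorums_overlap(3, 1, 1))  # False: a read may land on a stale replica
```

Lower R and W trade consistency for latency, which is the knob the slide refers to.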
Data Model – Clips
[Figure: a clip annotated with its fields]

• title
• ctime
• domain
• author
• mentions
• annotation
• tags
Data Model - Clips
Clips are the gateway to all of our data

[Diagram] Objects linked by the key abc:

• Clip (key: abc)
• Blob (key: abc): <html> … </html>
• Comment Cache: comments on clip ‘abc’
  (“F1rst”, “Nice clip yo!”, “Saw this on Reddit…”)
Other Buckets

• Users
• Blobs
• Comments
• Templates
• Counts
• Search Caches
• Transactions
Riak Search

• Gets many things out of Riak by something
  other than the primary key.
• You specify a schema (the types for the
  fields within a JSON object).
• Works great but with one big gotcha:
  – Index uses term-based partitioning instead
    of document-based partitioning
  – Implication: joins + sort + pagination suck
  – We know how to work around this
Riak Search – Querying

• Query syntax based on Lucene
• Basic Query
   text:funny
• Compound Query
   login:greg OR (login:gary AND tags:riak)
• Range Query
   ctime:[98685879630026 TO 98686484430026]
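
The three query shapes above can be assembled with a few string helpers. This is a minimal sketch; the helper names (term, time_range, all_of) are hypothetical and are not part of riak-js or Riak Search.

```python
def term(field: str, value: str) -> str:
    """Basic query: field:value."""
    return f"{field}:{value}"


def time_range(field: str, start: int, end: int) -> str:
    """Range query: field:[start TO end]."""
    return f"{field}:[{start} TO {end}]"


def all_of(*clauses: str) -> str:
    """Compound query joining clauses with AND."""
    return " AND ".join(clauses)


print(all_of(term("login", "greg"), term("tags", "riak")))
print(time_range("ctime", 98685879630026, 98686484430026))
```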
Clipboard App Flow
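Participants: Client, node.js, Riak

1. Client → node.js: go to clipboard.com/home
2. node.js → Riak: search clips bucket, query = login:greg
3. Riak → node.js: top 20 results
4. node.js → Client: top 20 results; client starts rendering
5. For each clip:
    1. Client → node.js: API request for blob
    2. node.js → Riak: GET from blobs bucket
    3. node.js → Client: return blob to client; client renders blob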
Clipboard Queries


                 login:greg



               mentions:greg



  ctime:[98685879630026 TO 98686484430026]

                                             (Search)
Clipboard Queries cont.



            login:greg AND tags:riak




  login:greg AND text:node AND text:javascript


                                                 (Search)
Uh oh


               login:greg AND private:false
  login:greg matches only my clips; private:false matches 20% of all clips!




                login:greg AND text:iPhone



                                                              (Search)
Index Partitioning Schemes
Doc Partition Query Processing

1. x AND y (sort z, start = 990, count = 10)
2. On Each node:
    1. Perform x AND y
    2. Sort on z
    3. Slice [ 0 .. 1000 ]
    4. Send to aggregator
3. On aggregator
    1. Merge all results (N x 1000)
    2. Slice [ 990 .. 1000 ]
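
The steps above can be sketched as a small in-memory simulation. Each "node" is just a list of whole documents, and the names (doc_partition_query, the x/y/z dictionary keys) are illustrative assumptions rather than any real search engine's API.

```python
import heapq


def doc_partition_query(nodes, matches, sort_key, start, count):
    """Document-partitioned query: every node filters, sorts and slices locally."""
    limit = start + count
    partials = []
    for docs in nodes:
        local = [d for d in docs if matches(d)]  # perform x AND y on this node
        local.sort(key=sort_key)                 # sort on z
        partials.append(local[:limit])           # slice [0 .. start + count]
    # The aggregator merges at most N x (start + count) already-sorted results.
    merged = list(heapq.merge(*partials, key=sort_key))
    return merged[start:limit]


docs = [{"x": i % 2 == 0, "y": i % 3 == 0, "z": i} for i in range(60)]
nodes = [docs[i::4] for i in range(4)]  # spread the documents over 4 nodes

page = doc_partition_query(
    nodes, lambda d: d["x"] and d["y"], lambda d: d["z"], start=3, count=2
)
print([d["z"] for d in page])  # [18, 24]
```

The point of the scheme is that the aggregator never sees more than start + count results per node, no matter how many documents match.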
Term Partition Query Processing

1. x AND y (sort z, start = 990, count = 10)
2. On x node: search for x (and send all)
3. On y node: search for y (and send all)
4. On aggregator:
    1. Do x AND y
    2. Sort on z
    3. Slice to [ 990 .. 1000 ]
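
For contrast, here is a sketch of the term-partitioned flow under the same toy assumptions. The index is a dictionary from term to a set of document ids, standing in for one posting list per node; the function name and the shipped counter are hypothetical and exist only to show how much data reaches the aggregator.

```python
def term_partition_query(index, terms, sort_key, start, count):
    """Term-partitioned query: each term's node ships its entire posting list."""
    postings = [index[t] for t in terms]     # on x node and y node: send all
    shipped = sum(len(p) for p in postings)  # ids sent over the network
    hits = set.intersection(*postings)       # aggregator does x AND y
    ordered = sorted(hits, key=sort_key)     # aggregator sorts on z
    return ordered[start:start + count], shipped


index = {
    "x": set(range(0, 1000, 2)),  # 500 documents contain x
    "y": set(range(0, 1000, 3)),  # 334 documents contain y
}
page, shipped = term_partition_query(index, ["x", "y"], lambda doc_id: doc_id, 2, 3)
print(page, shipped)  # [12, 18, 24] 834
```

Three results are returned, yet 834 ids had to travel to the aggregator first.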
Riak Search Issues

1. For any singular term, all results must be
   sent back to aggregator.
2. Incorrectly performs sort and slice (does
   slice then sort)
3. ANDs take time O(MAX(|x|, |y|)) instead
   of O(MIN(|x|, |y|)).
4. All matches must be read to get sort field.
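
To make issue 3 concrete, the sketch below intersects two posting sets by probing the larger one with each member of the smaller one, so the work grows with MIN(|x|, |y|). It assumes both sets are already in one place with hash lookups available, which is an idealization and not how Riak Search stores postings.

```python
def intersect(a: set, b: set) -> set:
    """AND two posting sets with work proportional to the smaller set."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return {doc_id for doc_id in small if doc_id in large}


mine = {3, 10, 999_999}    # e.g. the few clips matching login:greg
common = set(range(100_000))  # e.g. a very common term
print(sorted(intersect(mine, common)))  # [3, 10]
```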
Riak Search Fixes

1. Inline fields for short and common
   attributes.
2. Dynamic fields for precomputed ANDs.
3. PRESORT option for sorting without
   document reads.
Inline Fields

Nifty feature added recently to Riak Search


Fields only used to prune result set can be
made inline for a big perf win


Normal query applied first – then results filtered
quickly with inline “filter” query


High storage cost – only viable for small fields!

                                               (Search)
Riak Search – Inline Fields cont.


             login:greg AND private:false

                       becomes
                   Query - login:greg
              Filter Query – private:false

 private:false is efficiently applied only to results of
 login:greg. Hooray!
                                                       (Search)
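
A toy model of the idea: each posting entry for login:greg carries a copy of the small inline field, so the filter runs over greg's results only and the huge private:false posting list is never touched. The data layout and the search function are assumptions made for illustration, not Riak Search internals.

```python
# Toy posting list for the term login:greg. Each entry holds the clip key
# plus a copy of the inline field values (here only "private").
POSTINGS = {
    "login:greg": [
        ("clip-1", {"private": "false"}),
        ("clip-2", {"private": "true"}),
        ("clip-3", {"private": "false"}),
    ],
}


def search(postings, query_term, filter_field=None, filter_value=None):
    """Run the normal query first, then apply the inline filter to its hits."""
    hits = postings.get(query_term, [])
    if filter_field is None:
        return [key for key, _ in hits]
    return [key for key, inline in hits if inline.get(filter_field) == filter_value]


print(search(POSTINGS, "login:greg", "private", "false"))  # ['clip-1', 'clip-3']
```

Copying the field into every posting entry is also why the storage cost is high and only small fields are viable.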
Fixing ANDs

But what about login:greg AND text:iPhone?



text field is too large to inline!



We had to get creative.


                                         (Search)
Dynamic Fields
Our Solution: Create a new field - text_u
   (u for user)


The field name has the user’s name appended in place of u


In greg’s clip
 text:iPhone → text_greg:iPhone
In bob’s clip
 text:iPhone → text_bob:iPhone

                                            (Search)
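
A minimal sketch of how such a per-user field could be produced at index time and used at query time. The clip layout (login and text keys), the whitespace tokenization, and both function names are assumptions for illustration only.

```python
def index_fields(clip: dict) -> dict:
    """Return the fields to index: the shared text field plus a per-user copy."""
    words = clip["text"].split()
    return {
        "text": words,                    # searchable across all users
        "text_" + clip["login"]: words,   # precomputed login AND text
    }


def user_text_query(login: str, word: str) -> str:
    """Rewrite login:<login> AND text:<word> into a single-term query."""
    return f"text_{login}:{word}"


print(index_fields({"login": "greg", "text": "iPhone review"}))
print(user_text_query("greg", "iPhone"))  # text_greg:iPhone
```

The AND is paid once at write time, so the query only touches the posting list for one user's clips.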
Presort on Keys

• Our addition to Riak code base.
• Does sort before slice
• If PRESORT=key, then never reads the docs
• Tremendous win (> 100x compared to M/R
  approaches)
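
The effect of sorting on keys can be sketched as follows: when the key itself encodes the sort order, the page is chosen from keys alone and only the documents on that page are read. The function name, the fetch callback, and the sample keys are hypothetical; this is not the patch that went into Riak.

```python
def page_by_key(matching_keys, start, count, fetch):
    """Sort on the keys, slice, and only then read the documents on the page."""
    page = sorted(matching_keys)[start:start + count]
    return [fetch(key) for key in page]


# Toy store whose keys begin with a zero-padded timestamp.
store = {f"{t:04d}-clip": {"ctime": t} for t in range(100)}
reads = []


def fetch(key):
    reads.append(key)  # record every document read
    return store[key]


result = page_by_key(store.keys(), start=90, count=10, fetch=fetch)
print(len(reads), result[0]["ctime"])  # 10 90
```

Out of 100 matches, only the 10 documents on the requested page are read.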
Clip Keys

<Time (ms)><User (guid)><SHA1 of Value>


• Base-64 encode each component
• Only use first 4 characters of user & content
• Only 16 bytes


Collisions? 1 in 17M if the same thing is
clipped at the same time.
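
One way such a key could be built is sketched below. Several details are assumptions the slide does not state: a 48-bit millisecond timestamp (8 characters, which makes the total 16), and a 64-character alphabet arranged in ASCII order so that comparing keys as strings matches comparing timestamps. The standard Base64 alphabet is not in ASCII order, so it would not sort this way.

```python
import hashlib
import uuid

# 64 URL-safe characters in ascending ASCII order (an assumption, see above).
ALPHABET = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"


def encode(data: bytes, length: int) -> str:
    """Encode the leading bits of data as `length` characters, 6 bits each."""
    value = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    chars = []
    for i in range(length):
        shift = total_bits - 6 * (i + 1)
        chars.append(ALPHABET[(value >> shift) & 0x3F])
    return "".join(chars)


def clip_key(time_ms: int, user: uuid.UUID, value: bytes) -> str:
    """<time><user><content hash>: 8 + 4 + 4 = 16 characters."""
    time_part = encode(time_ms.to_bytes(6, "big"), 8)        # full 48 bits
    user_part = encode(user.bytes, 4)                        # first 4 characters
    content_part = encode(hashlib.sha1(value).digest(), 4)   # first 4 characters
    return time_part + user_part + content_part


user = uuid.UUID("12345678-1234-5678-1234-567812345678")
older = clip_key(1_000, user, b"<html>...</html>")
newer = clip_key(2_000, user, b"<html>...</html>")
print(len(older), older < newer)  # 16 True
print(64 ** 4)                    # 16777216
```

Four 6-bit characters give 64^4 = 16,777,216 possible values, which is where the roughly 1 in 17M collision figure comes from.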
Our Query Processing

1. w AND (x AND y)
   (sort z, start = 990, count = 10)
2. On w_x node: search and send all w_x
3. On w_y node: search and send all w_y
4. On aggregator:
    1. Do w_x AND w_y
    2. Sort on z
    3. Slice to [ 990 .. 1000 ]
Summary

• Use inline fields for short and common bits
• Use dynamic fields for prebuilt ANDs
• Use keys that imply sort order
• Use same techniques for pagination


• Our approach yields search throughput
  that is 100x better than out of the box (and
  better as you scale outward).
Questions?
We’re hiring!


       www.clipboard.com/register
          Invitation Code: just4u


        www.clipboard.com/jobs
         Or talk to us right now!



                                    Thanks!

 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 

Riak perf wins

  • 1. Riak Search Performance Wins: how we got a > 100x improvement in query throughput. Gary Flake, Founder, gary@clipboard.com
  • 2. Demo / Introduction
  • 3. Architecture
      • Web: web-01, web-02, web-03 (each Node.js + Nginx)
      • Riak: riak-01, riak-02, riak-03, riak-04, riak-05
      • Cache: cache-01, cache-02, cache-03
      • Redis: redis-01, redis-02
      • Admin: admin-01
      • Thumb: thumb-01, thumb-02
      • Job: job-01, job-02
  • 4. Riak: an awesome NoSQL data store
      • Super easy to scale up AND down
      • Fault tolerant: no single point of failure (SPoF)
      • Flexible schema
      • Full-text search out of the box
      • Can be fixed and improved in Erlang (the Basho folks awesomely take our commits)
  • 5. Riak – Basics
      • Data in Riak is grouped into buckets (effectively namespaces)
      • Basic operations are get, save, delete, search, map, and reduce
      • Eventual consistency is managed through the N, R, and W bucket parameters
      • Everything we put in Riak is JSON
      • We talk to Riak through the excellent riak-js Node library by Francisco Treacy
  • 6. Data Model – Clips: title, ctime, domain, author, mentions, annotation, tags
  • 7. Data Model – Clips: clips are the gateway to all of our data
      • Clip (key: abc)
      • Blob (key: abc): <html> … </html>
      • Comment Cache: comments on clip ‘abc’ (“F1rst”, “Nice clip yo!”, “Saw this on Reddit…”)
  • 8. Other Buckets: Users, Blobs, Comments, Templates, Counts, Search Caches, Transactions
  • 9. Riak Search
      • Gets many things out of Riak by something other than the primary key.
      • You specify a schema (the types for the fields within a JSON object).
      • Works great, but with one big gotcha:
          – The index uses term-based partitioning instead of document-based partitioning
          – Implication: joins + sort + pagination suck
          – We know how to work around this
  • 10. Riak Search – Querying
      • Query syntax is based on Lucene
      • Basic query: text:funny
      • Compound query: login:greg OR (login:gary AND tags:riak)
      • Range query: ctime:[98685879630026 TO 98686484430026]
  • 11. Clipboard App Flow (Client → node.js → Riak)
      1. Client goes to clipboard.com/home
      2. node.js searches the clips bucket with query = login:greg
      3. Riak returns the top 20 results to node.js, which returns them to the client
      4. Client starts rendering
      5. For each clip: the client makes an API request for the blob, node.js does a GET from the blobs bucket and returns the blob to the client, and the client renders the blob
  • 12. Clipboard Queries
      • login:greg
      • mentions:greg
      • ctime:[98685879630026 TO 98686484430026]
  • 13. Clipboard Queries cont.
      • login:greg AND tags:riak
      • login:greg AND text:node AND text:javascript
  • 14. Uh oh
      • login:greg AND private:false (slide annotations: “Matches only my clips”, “Matches 20% of all clips!”)
      • login:greg AND text:iPhone
  • 16. Doc Partition Query Processing
      1. Query: x AND y (sort z, start = 990, count = 10)
      2. On each node:
          1. Perform x AND y
          2. Sort on z
          3. Slice [ 0 .. 1000 ]
          4. Send to aggregator
      3. On aggregator:
          1. Merge all results (N x 1000)
          2. Slice [ 990 .. 1000 ]
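The document-partitioned steps above can be sketched in a few lines of Python. This is a toy in-memory model, not Riak or Lucene code: the names `node_search` and `doc_partition_query`, and the document shape (a dict with a `terms` set plus a sort field), are assumptions made for illustration.

```python
import heapq


def node_search(docs, terms, sort_field, start, count):
    """Runs on each node: match locally, sort locally, keep the first start+count hits."""
    hits = [d for d in docs if all(t in d["terms"] for t in terms)]
    hits.sort(key=lambda d: d[sort_field])
    return hits[: start + count]


def doc_partition_query(nodes, terms, sort_field, start, count):
    """Runs on the aggregator: merge the pre-sorted partial lists, then slice the page."""
    partials = [node_search(docs, terms, sort_field, start, count) for docs in nodes]
    merged = heapq.merge(*partials, key=lambda d: d[sort_field])
    return list(merged)[start : start + count]
```

Each node ships at most start + count documents, so the aggregator handles at most N x (start + count) entries regardless of how many documents match overall.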
  • 17. Term Partition Query Processing
      1. Query: x AND y (sort z, start = 990, count = 10)
      2. On x node: search for x (and send all)
      3. On y node: search for y (and send all)
      4. On aggregator:
          1. Do x AND y
          2. Sort on z
          3. Slice to [ 990 .. 1000 ]
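For comparison, here is a minimal sketch of the term-partitioned flow, again as a toy in-memory model rather than Riak Search internals. The `index` (term to list of keys) and `docs` (key to document) structures and the function name are hypothetical; the `shipped` counter is only there to make the network cost visible.

```python
def term_partition_query(index, docs, terms, sort_field, start, count):
    """Aggregator-side view of a term-partitioned AND query (terms must be non-empty)."""
    # Each term lives on one node, which ships its entire posting list.
    postings = [set(index.get(t, ())) for t in terms]
    shipped = sum(len(p) for p in postings)
    # The AND happens only after everything has arrived at the aggregator.
    matches = set.intersection(*postings)
    # Sorting needs the sort field, so every matching document is read.
    ordered = sorted(matches, key=lambda k: docs[k][sort_field])
    return ordered[start : start + count], shipped
```

In this model the number of entries shipped grows with the size of the posting lists, not with the size of the requested page.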
  • 18. Riak Search Issues
      1. For any single term, all results must be sent back to the aggregator.
      2. Incorrectly performs sort and slice (the correct order is sort, then slice).
      3. ANDs take time O(MAX(|x|, |y|)) instead of O(MIN(|x|, |y|)).
      4. All matches must be read to get the sort field.
  • 19. Riak Search Fixes
      1. Inline fields for short and common attributes.
      2. Dynamic fields for precomputed ANDs.
      3. PRESORT option for sorting without document reads.
  • 20. Inline Fields
      • Nifty feature added recently to Riak Search
      • Fields used only to prune the result set can be made inline for a big perf win
      • The normal query is applied first; the results are then filtered quickly with the inline “filter” query
      • High storage cost: only viable for small fields!
  • 21. Riak Search – Inline Fields cont.
      • login:greg AND private:false becomes Query: login:greg plus Filter Query: private:false
      • private:false is efficiently applied only to the results of login:greg. Hooray!
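The query-plus-filter split can be illustrated with a small sketch. It assumes a toy index in which each posting carries a copy of the document's inline field values; the `search` function and the `(key, inline_fields)` posting shape are hypothetical and are not the Riak Search API.

```python
def search(index, query, filter_field=None, filter_value=None):
    """Look up one (field, term) query, then prune hits using inline values only."""
    postings = index.get(query, [])
    if filter_field is None:
        return [key for key, _ in postings]
    # The filter touches only the hits of the main query, never its own posting list.
    return [key for key, inline in postings if inline.get(filter_field) == filter_value]


index = {
    ("login", "greg"): [
        ("a", {"private": "false"}),
        ("b", {"private": "true"}),
        ("c", {"private": "false"}),
    ],
}
public_clips = search(index, ("login", "greg"), "private", "false")
```

The storage cost mentioned on the slide shows up here as the inline dict repeated in every posting, which is why only small fields are good candidates.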
  • 22. Fixing ANDs
      • But what about login:greg AND text:iPhone?
      • The text field is too large to inline!
      • We had to get creative.
  • 23. Dynamic Fields
      • Our solution: create a new field, text_u (u for user)
      • Values in text_u have the user’s name appended
      • In greg’s clip: text:iPhone → text_greg:iPhone
      • In bob’s clip: text:iPhone → text_bob:iPhone
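A minimal sketch of the dynamic-field idea, following the text_greg / text_bob examples on the slide. The helper names `index_fields` and `rewrite_query` and the clip shape are assumptions for illustration; the slides do not show how the fields were actually generated.

```python
def index_fields(clip):
    """Fields to index for one clip, including the per-user copy of the text field."""
    fields = {"login": clip["login"], "text": clip["text"]}
    # Precomputed AND: the user is folded into the field name at index time.
    fields["text_" + clip["login"]] = clip["text"]
    return fields


def rewrite_query(login, term):
    """Turn 'login:<user> AND text:<term>' into a single-term query."""
    return f"text_{login}:{term}"
```

With this rewrite, login:greg AND text:iPhone becomes the single-term query text_greg:iPhone, so only greg's matching postings need to be shipped.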
  • 24. Presort on Keys
      • Our addition to the Riak code base
      • Does sort before slice
      • If PRESORT=key, then it never reads the docs
      • Tremendous win (> 100x compared to M/R approaches)
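The effect of sorting on keys before slicing can be sketched as follows. This is a toy model of the idea, not the patch itself: `page_by_key` and the `fetch_doc` callback are hypothetical names, and the sketch assumes keys already sort in the desired result order.

```python
def page_by_key(matching_keys, fetch_doc, start, count):
    """Sort on keys alone, slice, and only then read the documents on the page."""
    page = sorted(matching_keys)[start : start + count]
    return [fetch_doc(key) for key in page]
```

Only `count` documents are read per page, whereas sorting on a stored field requires reading every match first.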
  • 25. Clip Keys: <Time (ms)><User (guid)><SHA1 of Value>
      • Base-64 encode each component
      • Only use the first 4 characters of user & content
      • Only 16 bytes
      • Collisions? 1 in 17M if the same thing is clipped at the same time.
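A sketch of how such a key could be built. Several details are assumptions not stated on the slide: the time component is taken to be 8 characters (which gives 8 + 4 + 4 = 16), and a base-64 alphabet listed in ASCII order is used so that string order matches time order (the standard Base64 alphabet does not have that property). The names `encode_sortable` and `clip_key` are hypothetical. Four base-64 characters carry 24 bits, so a truncated component has 64^4 = 16,777,216 possible values, which is consistent with the "1 in 17M" figure.

```python
import hashlib

# Assumption: 64 symbols in ascending ASCII order, so encoded strings sort like the numbers.
ALPHABET = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"


def encode_sortable(n, width):
    """Fixed-width, big-endian base-64 encoding of a non-negative integer."""
    if not 0 <= n < 64 ** width:
        raise ValueError("value does not fit in the requested width")
    chars = []
    for _ in range(width):
        n, r = divmod(n, 64)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def clip_key(time_ms, user_guid, value):
    """<time><user><content>: 8 + 4 + 4 characters (user_guid and value are bytes)."""
    t = encode_sortable(time_ms, 8)
    # First 4 base-64 characters correspond to the first 3 bytes (24 bits).
    u = encode_sortable(int.from_bytes(user_guid[:3], "big"), 4)
    c = encode_sortable(int.from_bytes(hashlib.sha1(value).digest()[:3], "big"), 4)
    return t + u + c
```

Because the time component comes first and is fixed width, sorting keys as plain strings orders clips by creation time.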
  • 26. Our Query Processing
      1. Query: w AND (x AND y) (sort z, start = 990, count = 10)
      2. On w_x node: search and send all w_x
      3. On w_y node: search and send all w_y
      4. On aggregator:
          1. Do w_x AND w_y
          2. Sort on z
          3. Slice to [ 990 .. 1000 ]
  • 27. Summary
      • Use inline fields for short and common bits
      • Use dynamic fields for prebuilt ANDs
      • Use keys that imply sort order
      • Use the same techniques for pagination
      • Our approach yields search throughput that is 100x better than out of the box (and better as you scale outward).
  • 29. We’re hiring!
      • www.clipboard.com/register (Invitation Code: just4u)
      • www.clipboard.com/jobs
      • Or talk to us right now! Thanks!