SlideShare ist ein Scribd-Unternehmen logo
1 von 53
Downloaden Sie, um offline zu lesen
Riak Search
    A Full-Text Search
   and Indexing Engine
         based on Riak


      Berlin Buzzwords· June 2010

               Basho Technologies
          Rusty Klophaus - @rklophaus
Why did we build it?
What are the major goals?
  How does it work?




                            2
Part One
Why did we build
 Riak Search?




                   3
Riak is
a scalable, highly-available, networked,
    open-source key/value store.




                                           4
Writing to a Key/Value Store




                 Key/Value




       CLIENT                  RIAK
                                      5
Writing to a Key/Value Store




                     Object




       CLIENT                  RIAK
                                      6
Querying a Key/Value Store




                       Key


                     Object

       CLIENT                 RIAK
                                     7
Querying Riak via LinkWalking

               Key + Instructions
                                     Walk to
                                     Related
                                      Keys


                  Object(s)

       CLIENT                       RIAK
                                               8
Querying Riak via Map/Reduce

            Key(s) + JS Functions
                                      Map

                                      Map

                                     Reduce
             Computed Value(s)

       CLIENT                       RIAK
                                              9
Key/Value Stores
       like
Key-Based Queries




                    10
Query by Secondary Index


            where Category == "Shoes"




                           WTF!? I'm a
                           KV store!



      CLIENT                            RIAK
                                               11
Full-Text Query


                  "Converse AND Shoes"




                                This is
                              getting old.



       CLIENT                             RIAK
                                                 12
These kinds of queries
   need an Index.

*Market Opportunity!*



                         13
Part Two
What are the major
goals of Riak Search?




                        14
An application built on Riak.




              Your
                                Riak
          Application




                                       15
Hrm... I need an index.




              Your
                             Riak
          Application     Index
                          Object




                                    16
Hrm... I need an index with more features.




       Your
                         ???           Riak
  Application




                                              17
Lucene should do the trick...




       Your
                      Lucene    Riak
   Application




                                       18
...shard to add more storage capacity...




      Your
   Application
                 Lucene   Lucene   Lucene   Riak




                                                   19
...replicate to add more throughput.




                 Lucene   Lucene   Lucene
      Your
   Application
                 Lucene   Lucene   Lucene   Riak
                 Lucene   Lucene   Lucene




                                                   20
...replicate to add more throughput.




                 Lucene   Lucene   Lucene
      Your
   Application
                 Lucene   Lucene   Lucene   Riak
                 Lucene   Lucene   Lucene




                  Operations nightmare!

                                                   21
What do we really want?




       Your       Riak-ified
                               Riak
   Application      Lucene




                                      22
What do we really want?




       Your          Riak
                            Riak
   Application     Search




                                   23
Functionality? Be like Lucene (and more).

• Lucene Syntax
• Leverages Java Lucene Analyzers
• Solr Endpoints
• Integration via Riak Post-Commit Hook (Index)
• Integration via Riak Map/Reduce (Query)
• Near-Realtime
• Schema-less


                                                  24
Operations? Be like Riak.

• No special nodes
• Add nodes, get more compute and storage
• Automatically load balance
• Replicas for durability and performance
• Index and query in parallel
• Swappable storage backends




                                            25
Part Three
How do we do it?




                   26
A Gentle Introduction to
  Document Indexing




                           27
The Inverted Index

    Document           Inverted Index



                           day, 1
                           dog, 1
  #1
       Every dog has
       his day.
                           every, 1
                           has, 1
                           his, 1




                                        28
The Inverted Index

   Documents            Combined Inverted Index
                                and, 4
 #1   Every dog has
      his day.
                                bag, 3
                                bark, 2
                                bite, 2
      The dog's bark
 #2
                                cat, 3
      is worse than
      his bite.                 cat, 4
                                day, 1
                                dog, 1
 #3
      Let the cat out
      of the bag.               dog, 2
                                dog, 4
                                every, 1
#4    It's raining
      cats and dogs.            has, 1
                                ...


                                                  29
At Query Time...

                   "dog AND cat"




                       AND



           dog                     cat
                                         30
At Query Time...

                   AND


           dog            cat


         dog, 1
                         cat, 3
         dog, 2
                         cat, 4
         dog, 4


                                  31
At Query Time...
                      Result: 4




                        AND
                 (Merge Intersection)




             1
                                        3
             2
                                        4
             4


                                            32
At Query Time...
                   Result: 1, 2, 3, 4




                           OR
                      (Merge Union)




             1
                                        3
             2
                                        4
             4


                                            33
Complex Behavior from Simple Structures




                                          34
Storage Approaches...




                        35
Riak Search uses
Consistent Hashing
 to store data on
     Partitions



                     36
Introduction to Consistent Hashing and Partitions

                Partitions = 10
                Number of Nodes = 5
                Partitions per Node = 2
                Replicas (NVal) = 2




                                                    37
Introduction to Consistent Hashing and Partitions


             Object




                                                    38
Document Partitioning
        vs.
  Term Partitioning




                        39
...and the
Resulting Tradeoffs




                      40
Document Partitioning @ Index Time


     #1   Every dog has
          his day.




                                     41
Document Partitioning @ Query Time

                          "dog OR cat"




                                         42
Term Partitioning @ Index Time
                                 day, 1
                                 dog, 1
     #1   Every dog has
          his day.
                                 every, 1
                                 has, 1
                                 his, 1




                                            43
Term Partitioning @ Index Time



                       dog, 1
     day, 1                      has, 1




     his, 1                      every, 1



                                            44
Term Partitioning @ Query Time

                           "dog OR cat"




                                          45
Tradeoffs...


    Document Partitioning       Term Partitioning

   + Lower Latency Queries   - Higher Latency Queries
   - Lower Throughput        + Higher Throughput
   - Lots of Disk Seeks      - Hotspots in Ring
                               (the "Obama" problem)




                                                        46
Riak Search: Term Partitioning

Term-partitioning is the most viable approach for our beta
clients’ needs: high throughput on Really Big Datasets.
Optimizations:
     • Term splitting to reduce hot spots
     • Bloom filters & caching to save query-time bandwidth
     • Batching to save query-time & index-time bandwidth
Support for either approach eventually.




                                                             47
Part Four
 Review




            48
Riak Search turns this...


                  "Converse AND Shoes"




                              WTF!? I'm a
                               KV store!



        CLIENT                             RIAK
                                                  49
...into this...


                  "Converse AND Shoes"




                                Gladly!



           CLIENT                         RIAK
                                                 50
...into this...


                  "Converse AND Shoes"




                    Keys or Objects




           CLIENT                        RIAK
                                                51
...while keeping operations easy.




        Your            Riak
                                    Riak
    Application       Search




                                           52
Thanks! Questions?

                        Search Team:

                        John Muellerleile - @jrecursive

                        Rusty Klophaus - @rklophaus

                        Kevin Smith - @kevsmith



Currently working with a small set of Beta users.
Open-source release planned for Q3.
www.basho.com

Weitere ähnliche Inhalte

Andere mochten auch

Andere mochten auch (6)

Riak Search - The Next Generation
Riak Search - The Next GenerationRiak Search - The Next Generation
Riak Search - The Next Generation
 
Riak Search 2: Yokozuna
Riak Search 2: YokozunaRiak Search 2: Yokozuna
Riak Search 2: Yokozuna
 
Building Distributed Systems With Riak and Riak Core
Building Distributed Systems With Riak and Riak CoreBuilding Distributed Systems With Riak and Riak Core
Building Distributed Systems With Riak and Riak Core
 
Riak Core: Building Distributed Applications Without Shared State
Riak Core: Building Distributed Applications Without Shared StateRiak Core: Building Distributed Applications Without Shared State
Riak Core: Building Distributed Applications Without Shared State
 
Masterless Distributed Computing with Riak Core - EUC 2010
Masterless Distributed Computing with Riak Core - EUC 2010Masterless Distributed Computing with Riak Core - EUC 2010
Masterless Distributed Computing with Riak Core - EUC 2010
 
Convergent Replicated Data Types in Riak 2.0
Convergent Replicated Data Types in Riak 2.0Convergent Replicated Data Types in Riak 2.0
Convergent Replicated Data Types in Riak 2.0
 

Ähnlich wie Riak Search - Berlin Buzzwords 2010

Essential action script 3.0
Essential action script 3.0Essential action script 3.0
Essential action script 3.0
UltimateCodeX
 
Introducing Riak
Introducing RiakIntroducing Riak
Introducing Riak
Kevin Smith
 
Introducing Riak
Introducing RiakIntroducing Riak
Introducing Riak
Kevin Smith
 
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Lucidworks
 
MonoRails - GoGaRuCo 2012
MonoRails - GoGaRuCo 2012MonoRails - GoGaRuCo 2012
MonoRails - GoGaRuCo 2012
jackdanger
 

Ähnlich wie Riak Search - Berlin Buzzwords 2010 (14)

Riak Search - Erlang Factory London 2010
Riak Search - Erlang Factory London 2010Riak Search - Erlang Factory London 2010
Riak Search - Erlang Factory London 2010
 
Riak - From Small to Large - StrangeLoop
Riak - From Small to Large - StrangeLoopRiak - From Small to Large - StrangeLoop
Riak - From Small to Large - StrangeLoop
 
Avatara: OLAP for Web-scale Analytics Products
Avatara: OLAP for Web-scale Analytics Products Avatara: OLAP for Web-scale Analytics Products
Avatara: OLAP for Web-scale Analytics Products
 
Essential action script 3.0
Essential action script 3.0Essential action script 3.0
Essential action script 3.0
 
Eclipse IDE for Scala (2.9 story)
Eclipse IDE for Scala (2.9 story)Eclipse IDE for Scala (2.9 story)
Eclipse IDE for Scala (2.9 story)
 
Personalized Search on the Largest Flash Sale Site in America
Personalized Search on the Largest Flash Sale Site in AmericaPersonalized Search on the Largest Flash Sale Site in America
Personalized Search on the Largest Flash Sale Site in America
 
Introducing Riak
Introducing RiakIntroducing Riak
Introducing Riak
 
Introducing Riak
Introducing RiakIntroducing Riak
Introducing Riak
 
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
Distributed Search in Riak - Integrating Search in a NoSQL Database: Presente...
 
MonoRails - GoGaRuCo 2012
MonoRails - GoGaRuCo 2012MonoRails - GoGaRuCo 2012
MonoRails - GoGaRuCo 2012
 
When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?
 
Jazoon 2011 - Smart EAI with Apache Camel
Jazoon 2011 - Smart EAI with Apache CamelJazoon 2011 - Smart EAI with Apache Camel
Jazoon 2011 - Smart EAI with Apache Camel
 
Riak CS in Cloudstack
Riak CS in CloudstackRiak CS in Cloudstack
Riak CS in Cloudstack
 
TorqueBox for Rubyists
TorqueBox for RubyistsTorqueBox for Rubyists
TorqueBox for Rubyists
 

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Riak Search - Berlin Buzzwords 2010