keyvi the key value index @ Cliqz

An introduction to keyvi and how Cliqz switched from Redis to keyvi to power its search.

Published in: Technology

  1. keyvi - the key value index. Or: how did we build a large-scale, low-latency search engine with keyvi? Hendrik Muhs <hendrik.muhs@gmail.com>
  2. We redesign the Internet: based in Munich; majority-owned by Hubert Burda Media; international team of 90 experts; we combine the power of data, search, and browsers to redesign the Internet for the user. http://cliqz.com/
  3. A key-value index based on finite state technology, so basically an immutable key-value store. Licence: Apache 2.0 (just keyvi, 3rdparty). Language: C++ (core), Python (binding). Runs on: Linux, Mac OS X (not tested on Windows). Link: www.keyvi.org. Author: me ;-)
  4. Cliqz Search Backend: Elasticsearch, used in the early days (2014) → Redis, own cluster implementation (before Redis Cluster), at peak over 100 Redis instances in 1 cluster, > 5 TB of data, all on AWS → keyvi, drop-in replacement for Redis, significantly reduced size (2 TB) and number of machines. Whether Redis or keyvi: average latency of 55 ms at the backend.
  5. Why replace Redis? Size: extremely efficient storing values; low-level access: msgpack & Redis fork to compress even more (zlib); implementation of auto-completion is expensive and slow. Runtime: single-threaded → contention, queuing, timeouts. Persistence: memory only, loading times of several minutes.
  6. Why replace Redis? → Redis is great! We still use it a lot! But for 1 of our (and only 1 of our) use cases, we can do better!
  7. Started as an auto-completion engine; caching layer for Redis; now providing the complete index (> 2 TB), distributed across multiple machines; multi-process, fast, reliable, stable.
  8. vs. Redis: shared memory model (mmap): multi-core, reliable, no loading (un-serializing); space efficient: compact key space, FSA minimization. BUT: keyvi is an immutable store, therefore an index (as the underlying data structure of Lucene is).
  9. Usage: the workflow has 2 steps: compile/build the index using keyvicompiler or via the Python bindings; dump/query using the C++ or Python API. Note: there is no SegmentWriter/Merger/Reader (yet)!
  10. More on features: exact matching / simple entity recognition: values can be None, integer, string or JSON; approximate matching: close/near match, e.g. for geo applications; scoring-based: Levenshtein & Co; completion matching: prefix, multi-word, fuzzy.
  11. The gist: it's fast! Extremely fast! It scales: it's compact/small and enables indexing GBs of data. It brings FSTs to the level of more established data structures like hash tables and B-trees on one side … and enables applications that are not possible, or hardly possible, with them (completions, approximate matching, etc.).
  12. Check it out! http://www.keyvi.org: lots of content, from crash course to in-depth.
  13. Check it out! Questions? Comments! Feedback. Contact: hendrik.muhs@gmail.com
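
The two-step workflow from slide 9 (build the index once, then only query it) and the mmap model from slide 8 can be illustrated with a small toy sketch. This is not keyvi's API or file format: the fixed-width record layout and the names `compile_index`, `MappedIndex`, `KEY_SIZE` and `RECORD` are assumptions made up for this example, and a sorted array with binary search stands in for keyvi's minimized finite state automaton. What it is meant to show is why an immutable, memory-mapped file needs no loading or un-serializing step and can be shared read-only by several processes.

```python
import mmap
import os
import struct
import tempfile

KEY_SIZE = 16                                   # hypothetical fixed key width in bytes
RECORD = struct.Struct("<%dsI" % KEY_SIZE)      # null-padded key + uint32 value


def compile_index(pairs, path):
    """Build step: write records sorted by key; the file is never modified afterwards."""
    with open(path, "wb") as f:
        for key, value in sorted(pairs.items()):
            encoded = key.encode("utf-8")
            if len(encoded) > KEY_SIZE:
                raise ValueError("key too long: %r" % key)
            f.write(RECORD.pack(encoded, value))


class MappedIndex:
    """Query step: binary search directly on the read-only memory-mapped file."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # ACCESS_READ maps the file read-only; the OS page cache is shared
        # between all processes that map the same file.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._count = len(self._map) // RECORD.size

    def get(self, key):
        target = key.encode("utf-8")
        lo, hi = 0, self._count
        while lo < hi:
            mid = (lo + hi) // 2
            raw_key, value = RECORD.unpack_from(self._map, mid * RECORD.size)
            current = raw_key.rstrip(b"\0")     # strip the fixed-width padding
            if current == target:
                return value
            if current < target:
                lo = mid + 1
            else:
                hi = mid
        return None                             # key not in the index

    def close(self):
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy.idx")
    compile_index({"munich": 1, "berlin": 2, "hamburg": 3}, path)
    index = MappedIndex(path)
    print(index.get("berlin"), index.get("cologne"))
    index.close()
```

Because the file never changes after the build step, updating the data means compiling a new file and swapping it in, which is the trade-off slide 8 points out when comparing with Redis.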

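
The completion and approximate matching features listed on slide 10 can be sketched in the same toy style. Again this is only an illustration under stated assumptions: `complete`, `levenshtein` and `fuzzy_lookup` are hypothetical helper names, and the sketch scans a sorted Python list, whereas keyvi traverses a finite state structure. The edit distance used here is the standard Levenshtein dynamic-programming algorithm.

```python
from bisect import bisect_left


def complete(sorted_keys, prefix):
    """Prefix completion: return all keys that start with `prefix`.

    `sorted_keys` must be sorted; binary search finds the first candidate,
    then matching keys follow contiguously.
    """
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(keys, query, max_distance=1):
    """Approximate matching: all keys within `max_distance` edits of `query`."""
    return sorted(k for k in keys if levenshtein(k, query) <= max_distance)


if __name__ == "__main__":
    keys = sorted(["berlin", "munchen", "munich", "munster"])
    print(complete(keys, "mun"))
    print(fuzzy_lookup(keys, "munick"))
```

The linear scan in `fuzzy_lookup` compares the query against every key, which is exactly the cost a finite state index is designed to avoid on large key sets.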