SlideShare ist ein Scribd-Unternehmen logo
1 von 35
Downloaden Sie, um offline zu lesen
Automata Invasion


          Michael McCandless
              Robert Muir




                  Lucene Revolution
                  May 9, 2012
Agenda



● Who we are
● Overview of finite state automata/transducers
● Three Lucene implementations
  ○ Automaton
  ○ FST
  ○ TokenStream
● Applications in Lucene
Who we are


● Lucene/Solr committer, PMC member
    ○ 6 years = old hat!
●   Tika committer, PMC member
●   Co-author Lucene in Action, 2nd ed.
●   http://blog.mikemccandless.com
●   Sponsored by IBM (thank you!)
Who we are


● Lucene/Solr committer, PMC member
   ○ 3 years = young hat, yet: less hair!
● Lucid Imagination employee
● Background in internationalization
● Tangled up in finite state
   ○ NLP tasks
   ○ One of my first contributions to Lucene
● http://twitter.com/rcmuir
Agenda



● Who we are
● Overview of finite state automata/transducers
● Three Lucene implementations
  ○ Automaton
  ○ FST
  ○ TokenStream
● Applications in Lucene
Finite State Automata (FSA)




● Set<int[]>:
    ○ mop, moth, pop, slop, sloth, stop, top
●   Start node, final node
●   Append arc labels as you traverse
●   Traverse all paths to get all strings
●   Determinize, minimize, union, intersect, ...
Finite State Transducer (FST)




● Adds optional output to each arc
  ○ Accumulate output as you traverse
● Map<int[],T>
  ○ mop: 0, moth: 1, pop: 2, slop: 3, sloth: 4, stop: 5, top: 6
● T is pluggable (defined by output algebra)
● Deep theory, many algorithms...
Agenda



● Who we are
● Overview of finite state automata/transducers
● Three Lucene implementations
  ○ Automaton
  ○ FST
  ○ TokenStream
● Applications in Lucene
Lucene's FSA Implementation

● org.apache.lucene.util.automaton.*
  ○ Poached from Anders Møller's Brics http://www.
    brics.dk/automaton
● Build up any FSA one node/arc at a time
  ○ Automaton, Transition, State
  ○ Transition has min/max label
● Many standard algorithms
  ○   Minimize
  ○   Determinize
  ○   Intersect
  ○   Union
  ○   Regexp -> Automaton
● RunAutomaton for stepping
Lucene's FST Implementation

● org.apache.lucene.util.fst.*
● FST encoded as a byte[]
● Write-once API
   ○   From Mihov & Maurel paper
   ○   Build minimal, acyclic FST from pre-sorted inputs
   ○   Fast (linear time with input size), low memory
   ○   Optional two-pass packing can shrink by ~25%
● Traverse FST one arc at a time
   ○ Decode byte[] during lookup
● SortedMap<int[],T>: arcs are sorted by label
   ○ getByOutput also possible if outputs are sorted
● http://s.apache.org/LuceneFSTs
Lucene's FST Implementation

 Downside: Generics Policeman does not approve!




 @UweSays "I looked at the code, it was
 ununderstandable why this thing was generified"
Lucene's TokenStreams are FSAs!

● Streaming FSA, one arc at a time


● New PositionLengthAttribute in 3.6.0
  ○   How many positions does the token "span"
  ○   TokenStreamToAutomaton
  ○   Indexer ignores it (sausage!)
  ○   http://s.apache.org/TokenGraphs
Lucene's TokenStreams are FSAs!

● Fixing all analyzers to behave with graphs?
Agenda



● Who we are
● Overview of finite state automata/transducers
● Three Lucene implementations
  ○ Automaton
  ○ FST
  ○ TokenStream
● Applications in Lucene
AutomatonQuery


● Automaton specifies which terms match
● New Terms.intersect(Automaton) method
  ○ Visits all matching terms & docs
● Example
  ○ RegexpQuery
  ○ WildcardQuery
  ○ FuzzyQuery
● Other possible uses
  ○ Stemming at query time via expansion
How to implement algorithm
from 67-page paper
Hands-On
FuzzyQuery



● Lev1(dogs)
● 100X faster!
● Schulz and Mihov
  algo is hairy
● Help from Jean-Philippe
  Barrette-LaPierre
● Includes transpositions
● UTF32toUTF8 conversion
● http://s.apache.org/FuzzyQuery
DirectSpellChecker

● Use LevT automata to find respellings
● Decent performance
     ○ ~47 QPS on Wikipedia, single thread
● No side-car index required!


"I'm confident that in three
weeks, I'll be done."
JapaneseTokenizer (Kuromoji)


● Donated by Atilika Inc. (アティリカ株式会社)
● Hard problem! (no whitespace, mixed Kanji,
  Hiragana, Katakana, compounds)
● Dictionaries are stored as FST
  ○ System dictionary and user dictionaries
  ○ 11.8X smaller than double array trie
● Viterbi search finds least-cost segmentation
● Token graph for compound words
  ○ ショッピングセンター (shopping center),
  ○ ショッピング(shopping), センター (center)
Query Suggesters


● E.g: w e a suggests weather
● Compile all suggestible queries + weights
  into FST(s)
● Two FST based suggesters
   ○ FSTSuggester quantizes weights into buckets
   ○ WFSTSuggester doesn't
● Find all paths after user's prefix
● Fast: ~240K lookups/sec
● Active work in progress
   ○ Fuzzy, analyzing, ngram
WFSTSuggester example

  wacky|1      "I don't think this will work but I can't
  waffle|4     provide a counterexample right now"

  wealthy|3
  weather|10
  weaver|7
SynonymFilter




● Apply at index time or query time or both
● First version used recursive maps
● New version (in 3.4.0) uses FST
  ○ 5X faster filter time
  ○ 14X faster build time
  ○ 59X less RAM
● Multi-token synonyms mess up graph
● Cannot consume token graph
● http://s.apache.org/TokenGraphs
BlockTree Terms Dictionary




 http://s.apache.org/LuceneRespellPerf
BlockTree Terms Dictionary


● Terms dict maps terms to metadata/postings
   ○ SortedMap<Term,Postings>
   ○ Pluggable (per codec)
● Term blocks on disk; FST index in memory
   ○ Variable number of terms per block (vs 3.x)
● Variant of a burst trie (Heinz, Zobel, Williams)
   ○ Terms assigned to blocks by shared prefix, e.g.
     http://www.*
● Fast intersect(Automaton)
● Fast "term can't exist"
BlockTree Example

able
above
apple
perfect
preface
prefecture
prefix
previous
profit
programmer
project
zoo
BlockTree Example

             ble
a*           bove
perfect      pple
preface
prefecture
prefix
previous
profit
programmer
project
zoo
BlockTree Example

         ble
a*       bove
p*       pple
zoo

         erfect
         reface
         refecture
         refix
         revious
         rofit
         rogrammer
         roject
BlockTree Example

         ble
a*       bove
p*       pple
zoo                 face
                    fecture
         erfect     fix
         re*        vious
         ro*
                    fit
                    grammer
                    ject
BlockTree Example

         ble
a*       bove
p*       pple
zoo                 face
                    fecture
         erfect     fix
         re*        vious
         ro*
                    fit
                    grammer
                    ject
MemoryPostingsFormat




 http://s.apache.org/LucenePKPerf
MemoryPostingsFormat



● Postings format is pluggable per-field
● Store all terms + postings in an FST
   ○ FST is saved to disk
● Great match for primary key fields, date filters
   ○ 2.8X faster PK lookup
● http://s.apache.org/MemPostingFormat
Future FSA/T Usages


● MappingCharFilter (in progress...)
● FieldCache/DocValues (prototype patch)
● FSTQuery?
● More suggesters
● Top-N most frequent terms for approx
  distributed IDF
● .... patches welcome!
Conclusions

● FSA/Ts are a good match for Lucene
● Lucene has three wildly different
  implementations
● FSA/Ts are now used in a number of
  places...
● ... and more coming
More Information


● Finite State Automata in Lucene
  ○ Dawid Weiss: Lucene Revolution 2011
  ○ http://slidesha.re/vKtpVA
● FST API code samples
  ○ http://s.apache.org/pR

Weitere ähnliche Inhalte

Was ist angesagt?

整数列圧縮
整数列圧縮整数列圧縮
整数列圧縮JAVA DM
 
Apache Sparkに手を出してヤケドしないための基本 ~「Apache Spark入門より」~ (デブサミ 2016 講演資料)
Apache Sparkに手を出してヤケドしないための基本 ~「Apache Spark入門より」~ (デブサミ 2016 講演資料)Apache Sparkに手を出してヤケドしないための基本 ~「Apache Spark入門より」~ (デブサミ 2016 講演資料)
Apache Sparkに手を出してヤケドしないための基本 ~「Apache Spark入門より」~ (デブサミ 2016 講演資料)NTT DATA OSS Professional Services
 
ソーシャルアプリにおけるRedisの活用事例とトラブル事例
ソーシャルアプリにおけるRedisの活用事例とトラブル事例ソーシャルアプリにおけるRedisの活用事例とトラブル事例
ソーシャルアプリにおけるRedisの活用事例とトラブル事例leverages_event
 
RSA暗号運用でやってはいけない n のこと #ssmjp
RSA暗号運用でやってはいけない n のこと #ssmjpRSA暗号運用でやってはいけない n のこと #ssmjp
RSA暗号運用でやってはいけない n のこと #ssmjpsonickun
 
「GraphDB徹底入門」〜構造や仕組み理解から使いどころ・種々のGraphDBの比較まで幅広く〜
「GraphDB徹底入門」〜構造や仕組み理解から使いどころ・種々のGraphDBの比較まで幅広く〜「GraphDB徹底入門」〜構造や仕組み理解から使いどころ・種々のGraphDBの比較まで幅広く〜
「GraphDB徹底入門」〜構造や仕組み理解から使いどころ・種々のGraphDBの比較まで幅広く〜Takahiro Inoue
 
第17回Lucene/Solr勉強会 #SolrJP – Apache Lucene Solrによる形態素解析の課題とN-bestの提案
第17回Lucene/Solr勉強会 #SolrJP – Apache Lucene Solrによる形態素解析の課題とN-bestの提案第17回Lucene/Solr勉強会 #SolrJP – Apache Lucene Solrによる形態素解析の課題とN-bestの提案
第17回Lucene/Solr勉強会 #SolrJP – Apache Lucene Solrによる形態素解析の課題とN-bestの提案Yahoo!デベロッパーネットワーク
 
WASM(WebAssembly)入門 ペアリング演算やってみた
WASM(WebAssembly)入門 ペアリング演算やってみたWASM(WebAssembly)入門 ペアリング演算やってみた
WASM(WebAssembly)入門 ペアリング演算やってみたMITSUNARI Shigeo
 
超実践 Cloud Spanner 設計講座
超実践 Cloud Spanner 設計講座超実践 Cloud Spanner 設計講座
超実践 Cloud Spanner 設計講座Samir Hammoudi
 
ラムダ計算入門
ラムダ計算入門ラムダ計算入門
ラムダ計算入門Eita Sugimoto
 
マイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPCマイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPCdisc99_
 
Elasticsearch の検索精度のチューニング 〜テストを作って高速かつ安全に〜
Elasticsearch の検索精度のチューニング 〜テストを作って高速かつ安全に〜Elasticsearch の検索精度のチューニング 〜テストを作って高速かつ安全に〜
Elasticsearch の検索精度のチューニング 〜テストを作って高速かつ安全に〜Takahiko Ito
 
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方Yoshiyasu SAEKI
 
"SRv6の現状と展望" ENOG53@上越
"SRv6の現状と展望" ENOG53@上越"SRv6の現状と展望" ENOG53@上越
"SRv6の現状と展望" ENOG53@上越Kentaro Ebisawa
 
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜Ryoma Sin'ya
 
マイクロにしすぎた結果がこれだよ!
マイクロにしすぎた結果がこれだよ!マイクロにしすぎた結果がこれだよ!
マイクロにしすぎた結果がこれだよ!mosa siru
 
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...NTT DATA Technology & Innovation
 
情報科学における18のメタテクニック
情報科学における18のメタテクニック情報科学における18のメタテクニック
情報科学における18のメタテクニックnakano_lab
 

Was ist angesagt? (20)

整数列圧縮
整数列圧縮整数列圧縮
整数列圧縮
 
Apache Sparkに手を出してヤケドしないための基本 ~「Apache Spark入門より」~ (デブサミ 2016 講演資料)
Apache Sparkに手を出してヤケドしないための基本 ~「Apache Spark入門より」~ (デブサミ 2016 講演資料)Apache Sparkに手を出してヤケドしないための基本 ~「Apache Spark入門より」~ (デブサミ 2016 講演資料)
Apache Sparkに手を出してヤケドしないための基本 ~「Apache Spark入門より」~ (デブサミ 2016 講演資料)
 
Oss貢献超入門
Oss貢献超入門Oss貢献超入門
Oss貢献超入門
 
ソーシャルアプリにおけるRedisの活用事例とトラブル事例
ソーシャルアプリにおけるRedisの活用事例とトラブル事例ソーシャルアプリにおけるRedisの活用事例とトラブル事例
ソーシャルアプリにおけるRedisの活用事例とトラブル事例
 
RSA暗号運用でやってはいけない n のこと #ssmjp
RSA暗号運用でやってはいけない n のこと #ssmjpRSA暗号運用でやってはいけない n のこと #ssmjp
RSA暗号運用でやってはいけない n のこと #ssmjp
 
Marp Tutorial
Marp TutorialMarp Tutorial
Marp Tutorial
 
「GraphDB徹底入門」〜構造や仕組み理解から使いどころ・種々のGraphDBの比較まで幅広く〜
「GraphDB徹底入門」〜構造や仕組み理解から使いどころ・種々のGraphDBの比較まで幅広く〜「GraphDB徹底入門」〜構造や仕組み理解から使いどころ・種々のGraphDBの比較まで幅広く〜
「GraphDB徹底入門」〜構造や仕組み理解から使いどころ・種々のGraphDBの比較まで幅広く〜
 
第17回Lucene/Solr勉強会 #SolrJP – Apache Lucene Solrによる形態素解析の課題とN-bestの提案
第17回Lucene/Solr勉強会 #SolrJP – Apache Lucene Solrによる形態素解析の課題とN-bestの提案第17回Lucene/Solr勉強会 #SolrJP – Apache Lucene Solrによる形態素解析の課題とN-bestの提案
第17回Lucene/Solr勉強会 #SolrJP – Apache Lucene Solrによる形態素解析の課題とN-bestの提案
 
WASM(WebAssembly)入門 ペアリング演算やってみた
WASM(WebAssembly)入門 ペアリング演算やってみたWASM(WebAssembly)入門 ペアリング演算やってみた
WASM(WebAssembly)入門 ペアリング演算やってみた
 
超実践 Cloud Spanner 設計講座
超実践 Cloud Spanner 設計講座超実践 Cloud Spanner 設計講座
超実践 Cloud Spanner 設計講座
 
ラムダ計算入門
ラムダ計算入門ラムダ計算入門
ラムダ計算入門
 
マイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPCマイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPC
 
Elasticsearch の検索精度のチューニング 〜テストを作って高速かつ安全に〜
Elasticsearch の検索精度のチューニング 〜テストを作って高速かつ安全に〜Elasticsearch の検索精度のチューニング 〜テストを作って高速かつ安全に〜
Elasticsearch の検索精度のチューニング 〜テストを作って高速かつ安全に〜
 
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方
 
"SRv6の現状と展望" ENOG53@上越
"SRv6の現状と展望" ENOG53@上越"SRv6の現状と展望" ENOG53@上越
"SRv6の現状と展望" ENOG53@上越
 
RDF Semantic Graph「RDF 超入門」
RDF Semantic Graph「RDF 超入門」RDF Semantic Graph「RDF 超入門」
RDF Semantic Graph「RDF 超入門」
 
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
AVX2時代の正規表現マッチング 〜半群でぐんぐん!〜
 
マイクロにしすぎた結果がこれだよ!
マイクロにしすぎた結果がこれだよ!マイクロにしすぎた結果がこれだよ!
マイクロにしすぎた結果がこれだよ!
 
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
 
情報科学における18のメタテクニック
情報科学における18のメタテクニック情報科学における18のメタテクニック
情報科学における18のメタテクニック
 

Andere mochten auch

Text tagging with finite state transducers
Text tagging with finite state transducersText tagging with finite state transducers
Text tagging with finite state transducerslucenerevolution
 
Block Emulation and Computation in One-dimensional Cellular Automata: Breakin...
Block Emulation and Computation in One-dimensional Cellular Automata: Breakin...Block Emulation and Computation in One-dimensional Cellular Automata: Breakin...
Block Emulation and Computation in One-dimensional Cellular Automata: Breakin...Jurgen Riedel
 
online payment system using Steganography and Visual cryptography
online payment system using Steganography and Visual cryptographyonline payment system using Steganography and Visual cryptography
online payment system using Steganography and Visual cryptographyShahrukh Ali
 
Visual Cryptography Industrial Training Report
Visual Cryptography Industrial Training ReportVisual Cryptography Industrial Training Report
Visual Cryptography Industrial Training ReportMohit Kumar
 
Scene Text Detection on Images using Cellular Automata
Scene Text Detection on Images using Cellular AutomataScene Text Detection on Images using Cellular Automata
Scene Text Detection on Images using Cellular AutomataKonstantinos Zagoris
 
類義語検索と類義語ハイライト
類義語検索と類義語ハイライト類義語検索と類義語ハイライト
類義語検索と類義語ハイライトShinichiro Abe
 
Visual Cryptography
Visual CryptographyVisual Cryptography
Visual CryptographyAneeshGKumar
 
steganography using genetic algorithm along with visual cryptography for wire...
steganography using genetic algorithm along with visual cryptography for wire...steganography using genetic algorithm along with visual cryptography for wire...
steganography using genetic algorithm along with visual cryptography for wire...Aparna Nk
 
Automata
AutomataAutomata
AutomataGaditek
 
Online paymentusingsteganographt&Visualcryptography
Online paymentusingsteganographt&VisualcryptographyOnline paymentusingsteganographt&Visualcryptography
Online paymentusingsteganographt&VisualcryptographyNagarjuna mahanti
 
Steganography using visual cryptography: Report
Steganography using visual cryptography: ReportSteganography using visual cryptography: Report
Steganography using visual cryptography: ReportAparna Nk
 
Cellular automata by Devdutta Chakrabarti
Cellular automata by Devdutta ChakrabartiCellular automata by Devdutta Chakrabarti
Cellular automata by Devdutta ChakrabartiDevdutta Chakrabarti
 
Steganography using visual cryptography
Steganography using visual cryptographySteganography using visual cryptography
Steganography using visual cryptographySaurabh Nambiar
 

Andere mochten auch (20)

Text tagging with finite state transducers
Text tagging with finite state transducersText tagging with finite state transducers
Text tagging with finite state transducers
 
Block Emulation and Computation in One-dimensional Cellular Automata: Breakin...
Block Emulation and Computation in One-dimensional Cellular Automata: Breakin...Block Emulation and Computation in One-dimensional Cellular Automata: Breakin...
Block Emulation and Computation in One-dimensional Cellular Automata: Breakin...
 
online payment system using Steganography and Visual cryptography
online payment system using Steganography and Visual cryptographyonline payment system using Steganography and Visual cryptography
online payment system using Steganography and Visual cryptography
 
Visual Cryptography Industrial Training Report
Visual Cryptography Industrial Training ReportVisual Cryptography Industrial Training Report
Visual Cryptography Industrial Training Report
 
Visual cryptography
Visual cryptographyVisual cryptography
Visual cryptography
 
Finite automata
Finite automataFinite automata
Finite automata
 
Scene Text Detection on Images using Cellular Automata
Scene Text Detection on Images using Cellular AutomataScene Text Detection on Images using Cellular Automata
Scene Text Detection on Images using Cellular Automata
 
類義語検索と類義語ハイライト
類義語検索と類義語ハイライト類義語検索と類義語ハイライト
類義語検索と類義語ハイライト
 
Visual Cryptography
Visual CryptographyVisual Cryptography
Visual Cryptography
 
steganography using genetic algorithm along with visual cryptography for wire...
steganography using genetic algorithm along with visual cryptography for wire...steganography using genetic algorithm along with visual cryptography for wire...
steganography using genetic algorithm along with visual cryptography for wire...
 
Vc pred
Vc predVc pred
Vc pred
 
Automata
AutomataAutomata
Automata
 
Cellular Automata
Cellular AutomataCellular Automata
Cellular Automata
 
Online paymentusingsteganographt&Visualcryptography
Online paymentusingsteganographt&VisualcryptographyOnline paymentusingsteganographt&Visualcryptography
Online paymentusingsteganographt&Visualcryptography
 
Steganography using visual cryptography: Report
Steganography using visual cryptography: ReportSteganography using visual cryptography: Report
Steganography using visual cryptography: Report
 
Cellular automata by Devdutta Chakrabarti
Cellular automata by Devdutta ChakrabartiCellular automata by Devdutta Chakrabarti
Cellular automata by Devdutta Chakrabarti
 
Visual Cryptography
Visual CryptographyVisual Cryptography
Visual Cryptography
 
Finite automata
Finite automataFinite automata
Finite automata
 
Steganography using visual cryptography
Steganography using visual cryptographySteganography using visual cryptography
Steganography using visual cryptography
 
Visual cryptography1
Visual cryptography1Visual cryptography1
Visual cryptography1
 

Ähnlich wie Automata Invasion

Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Grigory Sapunov
 
Building a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and KibanaBuilding a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and KibanaMushfekur Rahman
 
MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103Linaro
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleMariaDB plc
 
Linux Locking Mechanisms
Linux Locking MechanismsLinux Locking Mechanisms
Linux Locking MechanismsKernel TLV
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...Rob Skillington
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2Dvir Volk
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingMatthew Dennis
 
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)Kai Chan
 
Code Generation Cambridge 2013 Introduction to Parsing with ANTLR4
Code Generation Cambridge 2013  Introduction to Parsing with ANTLR4Code Generation Cambridge 2013  Introduction to Parsing with ANTLR4
Code Generation Cambridge 2013 Introduction to Parsing with ANTLR4Oliver Zeigermann
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodbPGConf APAC
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartMukesh Singh
 
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...PROIDEA
 
FlinkML - Big data application meetup
FlinkML - Big data application meetupFlinkML - Big data application meetup
FlinkML - Big data application meetupTheodoros Vasiloudis
 
Go and Uber’s time series database m3
Go and Uber’s time series database m3Go and Uber’s time series database m3
Go and Uber’s time series database m3Rob Skillington
 
funcs, func expressions, closure, returning funcs, recursion, the stack -goph...
funcs, func expressions, closure, returning funcs, recursion, the stack -goph...funcs, func expressions, closure, returning funcs, recursion, the stack -goph...
funcs, func expressions, closure, returning funcs, recursion, the stack -goph...sangam biradar
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowEmanuel Di Nardo
 
Lint, coverage, doc, autocompletion, transpilation, minification... powered b...
Lint, coverage, doc, autocompletion, transpilation, minification... powered b...Lint, coverage, doc, autocompletion, transpilation, minification... powered b...
Lint, coverage, doc, autocompletion, transpilation, minification... powered b...Alexandre Morgaut
 
Java Concurrency in Practice
Java Concurrency in PracticeJava Concurrency in Practice
Java Concurrency in PracticeAlina Dolgikh
 

Ähnlich wie Automata Invasion (20)

Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)
 
Building a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and KibanaBuilding a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
 
MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScale
 
Linux Locking Mechanisms
Linux Locking MechanismsLinux Locking Mechanisms
Linux Locking Mechanisms
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data Modeling
 
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
 
Code Generation Cambridge 2013 Introduction to Parsing with ANTLR4
Code Generation Cambridge 2013  Introduction to Parsing with ANTLR4Code Generation Cambridge 2013  Introduction to Parsing with ANTLR4
Code Generation Cambridge 2013 Introduction to Parsing with ANTLR4
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
 
FlinkML - Big data application meetup
FlinkML - Big data application meetupFlinkML - Big data application meetup
FlinkML - Big data application meetup
 
Presentation alfonso romero
Presentation alfonso romeroPresentation alfonso romero
Presentation alfonso romero
 
Go and Uber’s time series database m3
Go and Uber’s time series database m3Go and Uber’s time series database m3
Go and Uber’s time series database m3
 
funcs, func expressions, closure, returning funcs, recursion, the stack -goph...
funcs, func expressions, closure, returning funcs, recursion, the stack -goph...funcs, func expressions, closure, returning funcs, recursion, the stack -goph...
funcs, func expressions, closure, returning funcs, recursion, the stack -goph...
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
 
Lint, coverage, doc, autocompletion, transpilation, minification... powered b...
Lint, coverage, doc, autocompletion, transpilation, minification... powered b...Lint, coverage, doc, autocompletion, transpilation, minification... powered b...
Lint, coverage, doc, autocompletion, transpilation, minification... powered b...
 
Java Concurrency in Practice
Java Concurrency in PracticeJava Concurrency in Practice
Java Concurrency in Practice
 

Mehr von lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

Mehr von lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Kürzlich hochgeladen

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Kürzlich hochgeladen (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Automata Invasion

  • 1. Automata Invasion Michael McCandless Robert Muir Lucene Revolution May 9, 2012
  • 2. Agenda ● Who we are ● Overview of finite state automata/transducers ● Three Lucene implementations ○ Automaton ○ FST ○ TokenStream ● Applications in Lucene
  • 3. Who we are ● Lucene/Solr committer, PMC member ○ 6 years = old hat! ● Tika committer, PMC member ● Co-author Lucene in Action, 2nd ed. ● http://blog.mikemccandless.com ● Sponsored by IBM (thank you!)
  • 4. Who we are ● Lucene/Solr committer, PMC member ○ 3 years = young hat, yet: less hair! ● Lucid Imagination employee ● Background in internationalization ● Tangled up in finite state ○ NLP tasks ○ One of my first contributions to Lucene ● http://twitter.com/rcmuir
  • 5. Agenda ● Who we are ● Overview of finite state automata/transducers ● Three Lucene implementations ○ Automaton ○ FST ○ TokenStream ● Applications in Lucene
  • 6. Finite State Automata (FSA) ● Set<int[]>: ○ mop, moth, pop, slop, sloth, stop, top ● Start node, final node ● Append arc labels as you traverse ● Traverse all paths to get all strings ● Determinize, minimize, union, intersect, ...
  • 7. Finite State Transducer (FST) ● Adds optional output to each arc ○ Accumulate output as you traverse ● Map<int[],T> ○ mop: 0, moth: 1, pop: 2, slop: 3, sloth: 4, stop: 5, top: 6 ● T is pluggable (defined by output algebra) ● Deep theory, many algorithms...
  • 8. Agenda ● Who we are ● Overview of finite state automata/transducers ● Three Lucene implementations ○ Automaton ○ FST ○ TokenStream ● Applications in Lucene
  • 9. Lucene's FSA Implementation ● org.apache.lucene.util.automaton.* ○ Poached from Anders Møller's Brics http://www. brics.dk/automaton ● Build up any FSA one node/arc at a time ○ Automaton, Transition, State ○ Transition has min/max label ● Many standard algorithms ○ Minimize ○ Determinize ○ Intersect ○ Union ○ Regexp -> Automaton ● RunAutomaton for stepping
  • 10. Lucene's FST Implementation ● org.apache.lucene.util.fst.* ● FST encoded as a byte[] ● Write-once API ○ From Mihov & Maurel paper ○ Build minimal, acyclic FST from pre-sorted inputs ○ Fast (linear time with input size), low memory ○ Optional two-pass packing can shrink by ~25% ● Traverse FST one arc at a time ○ Decode byte[] during lookup ● SortedMap<int[],T>: arcs are sorted by label ○ getByOutput also possible if outputs are sorted ● http://s.apache.org/LuceneFSTs
  • 11. Lucene's FST Implementation Downside: Generics Policeman does not approve! @UweSays "I looked at the code, it was ununderstandable why this thing was generified"
  • 12. Lucene's TokenStreams are FSAs! ● Streaming FSA, one arc at a time ● New PositionLengthAttribute in 3.6.0 ○ How many positions does the token "span" ○ TokenStreamToAutomaton ○ Indexer ignores it (sausage!) ○ http://s.apache.org/TokenGraphs
  • 13. Lucene's TokenStreams are FSAs! ● Fixing all analyzers to behave with graphs?
  • 14. Agenda ● Who we are ● Overview of finite state automata/transducers ● Three Lucene implementations ○ Automaton ○ FST ○ TokenStream ● Applications in Lucene
  • 15. AutomatonQuery ● Automaton specifies which terms match ● New Terms.intersect(Automaton) method ○ Visits all matching terms & docs ● Example ○ RegexpQuery ○ WildcardQuery ○ FuzzyQuery ● Other possible uses ○ Stemming at query time via expansion
  • 16. How to implement algorithm from 67-page paper Hands-On
  • 17.
  • 18. FuzzyQuery ● Lev1(dogs) ● 100X faster! ● Schulz and Mihov algo is hairy ● Help from Jean-Philippe Barrette-LaPierre ● Includes transpositions ● UTF32toUTF8 conversion ● http://s.apache.org/FuzzyQuery
  • 19. DirectSpellChecker ● Use LevT automata to find respellings ● Decent performance ○ ~47 QPS on Wikipedia, single thread ● No side-car index required! "I'm confident that in three weeks, I'll be done."
  • 20. JapaneseTokenizer (Kuromoji) ● Donated by Atilika Inc. (アティリカ株式会社) ● Hard problem! (no whitespace, mixed Kanji, Hiragana, Katakana, compounds) ● Dictionaries are stored as FST ○ System dictionary and user dictionaries ○ 11.8X smaller than double array trie ● Viterbi search finds least-cost segmentation ● Token graph for compound words ○ ショッピングセンター (shopping center), ○ ショッピング(shopping), センター (center)
  • 21. Query Suggesters ● E.g: w e a suggests weather ● Compile all suggestible queries + weights into FST(s) ● Two FST based suggesters ○ FSTSuggester quantizes weights into buckets ○ WFSTSuggester doesn't ● Find all paths after user's prefix ● Fast: ~240K lookups/sec ● Active work in progress ○ Fuzzy, analyzing, ngram
  • 22. WFSTSuggester example wacky|1 "I don't think this will work but I can't waffle|4 provide a counterexample right now" wealthy|3 weather|10 weaver|7
  • 23. SynonymFilter ● Apply at index time or query time or both ● First version used recursive maps ● New version (in 3.4.0) uses FST ○ 5X faster filter time ○ 14X faster build time ○ 59X less RAM ● Multi-token synonyms mess up graph ● Cannot consume token graph ● http://s.apache.org/TokenGraphs
  • 24. BlockTree Terms Dictionary http://s.apache.org/LuceneRespellPerf
  • 25. BlockTree Terms Dictionary ● Terms dict maps terms to metadata/postings ○ SortedMap<Term,Postings> ○ Pluggable (per codec) ● Term blocks on disk; FST index in memory ○ Variable number of terms per block (vs 3.x) ● Variant of a burst trie (Heinz, Zobel, Williams) ○ Terms assigned to blocks by shared prefix, e.g. http://www.* ● Fast intersect(Automaton) ● Fast "term can't exist"
  • 27. BlockTree Example ble a* bove perfect pple preface prefecture prefix previous profit programmer project zoo
  • 28. BlockTree Example ble a* bove p* pple zoo erfect reface refecture refix revious rofit rogrammer roject
  • 29. BlockTree Example ble a* bove p* pple zoo face fecture erfect fix re* vious ro* fit grammer ject
  • 30. BlockTree Example ble a* bove p* pple zoo face fecture erfect fix re* vious ro* fit grammer ject
  • 32. MemoryPostingsFormat ● Postings format is pluggable per-field ● Store all terms + postings in an FST ○ FST is saved to disk ● Great match for primary key fields, date filters ○ 2.8X faster PK lookup ● http://s.apache.org/MemPostingFormat
  • 33. Future FSA/T Usages ● MappingCharFilter (in progress...) ● FieldCache/DocValues (prototype patch) ● FSTQuery? ● More suggesters ● Top-N most frequent terms for approx distributed IDF ● .... patches welcome!
  • 34. Conclusions ● FSA/Ts are a good match for Lucene ● Lucene has three wildly different implementations ● FSA/Ts are now used in a number of places... ● ... and more coming
  • 35. More Information ● Finite State Automata in Lucene ○ Dawid Weiss: Lucene Revolution 2011 ○ http://slidesha.re/vKtpVA ● FST API code samples ○ http://s.apache.org/pR