Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Sylvain Utard, VP of Engineering at Algolia, presents how they are building a realtime search engine.

Published in: Technology, Design

  1. Build Realtime Search
     From mobile SDK to SaaS, a tech POV
     Sylvain Utard, SaaSisBeautiful #2, June 2014
  2. Search
     • Today, search means Google
     • Search is a daily activity
     • Search is complex
     • Databases do not handle text queries
     • Speed and relevance are key
     • Fuzzy matching (typo tolerance)
  3. Why Search Engines?
     • Databases
       • Optimized for INSERT/UPDATE/DELETE/SELECT (that's a lot)
       • Structured query syntax (mostly SQL)
       • Some operations scan all your rows
  4. Why Search Engines?
     • Search engines
       • HIGHLY optimized for “SELECT” (only)
       • Full-text queries: they understand what a word is
       • Query execution time is driven by the number of matching documents
       • And obviously, LIKE '%foo%' is not full-text search
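The point on slide 4 about LIKE '%foo%' can be illustrated with a small sketch. It uses Python's built-in sqlite3 module; the table name `items` and the sample rows are made up for illustration and do not come from the talk. The substring pattern matches inside other words, while a word-aware match does not.

```python
import sqlite3

# In-memory database with a hypothetical "items" table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [("foo fighters",), ("seafood platter",), ("bar stool",)],
)

# Substring matching: finds "foo" anywhere, including inside "seafood".
like_hits = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM items WHERE name LIKE '%foo%' ORDER BY name"
    )
]

# Word-aware matching: split each value into words and compare whole words.
word_hits = sorted(
    name
    for (name,) in conn.execute("SELECT name FROM items")
    if "foo" in name.split()
)

print(like_hits)
print(word_hits)
```

The LIKE query also has to scan every row, which is the cost slide 3 refers to; a search engine looks the word up in an index instead.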
  5. How does it work?
     • Indexing (input = documents)
       • Multiple attributes (textual, numerical, geo)
     • Search (input = query, output = documents)
       • Full-text queries and/or numerical filters
       • Understandable results: score (ranking) + highlighting
  6. Implementation
     • 2 distinct processes
       • Indexing
         • Storing documents in a highly optimized way
       • Query
         • Matching documents
         • Ranking matched documents
  7. Implementation: Indexing process
     • Indexing means building an “index” or “inverted lists”
     • A dedicated data structure optimized for search (only)
     • Input = a set of documents containing words
     • Output = a set of words associated with documents
  8. Implementation: Indexing process
     Documents:
       Doc 1: foo bar baz
       Doc 2: bar foo
       Doc 3: baz baz qux
     Indexing produces the index (inverted lists):
       foo → Doc 1, Doc 2
       bar → Doc 1, Doc 2
       baz → Doc 1, Doc 3
       qux → Doc 3
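The indexing step on slides 7 and 8 can be sketched in a few lines. This is a minimal illustration using the three example documents from slide 8, not the engine described in the talk; the function name `build_inverted_index` and the whitespace tokenizer are assumptions.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    # Sets remove duplicates, so "baz baz qux" lists Doc 3 only once under "baz".
    return {word: sorted(ids) for word, ids in index.items()}


documents = {1: "foo bar baz", 2: "bar foo", 3: "baz baz qux"}
index = build_inverted_index(documents)

for word, doc_ids in sorted(index.items()):
    print(word, doc_ids)
```

Input is a set of documents containing words; output is a set of words associated with documents, as slide 7 states.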
  9. Implementation: Query Process
     • Queries
       • Goal = retrieve all documents matching a user query
       • Order results from the highest ranked to the lowest
  10. Implementation: Query Process
      Index (inverted lists):
        foo → Doc 1, Doc 2
        bar → Doc 1, Doc 2
        baz → Doc 1, Doc 3
        qux → Doc 3
      User query "baz" → sort matching documents → pagination
  11. Implementation: Query Process
      Index (inverted lists):
        foo → Doc 1, Doc 2
        bar → Doc 1, Doc 2
        baz → Doc 1, Doc 3
        qux → Doc 3
      User query "baz qux" → intersect inverted lists → sort matching documents → pagination
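The query process on slides 9 to 11 can be sketched as follows, assuming an index built from the three example documents (Doc 1 "foo bar baz", Doc 2 "bar foo", Doc 3 "baz baz qux"). The function name `search`, its parameters, and the placeholder ranking by document id are assumptions for illustration; the talk's real ranking is described on a later slide.

```python
def search(index, query, page=0, hits_per_page=10):
    """Return one page of ids of the documents containing every query word."""
    words = query.lower().split()
    if not words:
        return []
    # One inverted list per query word; an unknown word has an empty list.
    postings = [set(index.get(word, ())) for word in words]
    # A document matches only if it appears in every list.
    matches = set.intersection(*postings)
    # Placeholder ranking (assumption): ascending document id.
    ranked = sorted(matches)
    start = page * hits_per_page
    return ranked[start:start + hits_per_page]


# Inverted lists derived from the three example documents.
index = {"foo": [1, 2], "bar": [1, 2], "baz": [1, 3], "qux": [3]}

print(search(index, "baz"))
print(search(index, "baz qux"))
```

A single-word query reads one inverted list; a multi-word query intersects several, so the work depends on the length of those lists rather than on the total number of documents.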
  12. Database Search
      [Diagram labels: Documents, Database entries]
  13. Title
      • Founded in 2012
      • 2012 → Mar 2013
        • Mobile-oriented
      • Now: SaaS-oriented
        • Search engine as a Service
  14. Mobile first
      • Embed a search engine in your app
        • iOS, Android, Windows Phone
        • SDK/library provider
        • Offline
      • Ideal customers
        • Evernote, Contacts, POI, …
  15. Mobile focus
      • Search as you type
      • Typo tolerance
      • High performance
      • Target most phones
        • Starting from the cheapest Android phone
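The slides do not say how typo tolerance is implemented. One common way to approximate it, shown here purely as an assumption, is to accept index words within a small Levenshtein edit distance of the query word. The names `edit_distance`, `fuzzy_lookup`, and `max_typos` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(
                min(
                    prev[j] + 1,               # delete ca
                    cur[j - 1] + 1,            # insert cb
                    prev[j - 1] + (ca != cb),  # substitute, or keep if equal
                )
            )
        prev = cur
    return prev[-1]


def fuzzy_lookup(query, vocabulary, max_typos=1):
    """Return the vocabulary words within max_typos edits of the query."""
    return [word for word in vocabulary if edit_distance(query, word) <= max_typos]


vocabulary = ["search", "speech", "saas"]
print(fuzzy_lookup("seach", vocabulary))
```

Comparing the query against every word costs time proportional to the vocabulary size, so this brute-force loop is only an illustration of the matching rule, not something that would meet the performance targets on the slides.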
  16. Mobile Performance
      • 10-20 queries / sec
      • Realtime if <100 ms
      • 1 sec to build a 10K-entry index
      • C++ engine + Objective-C/C#/Java interfaces
      • <100 KB of RAM, whatever the index size
  17. What about hosted search?
      • Same issues on websites & apps
      • Users are used to Google/Amazon: it just works
      • Poor search experience everywhere
      • SQL/NoSQL technologies do not provide a working solution
  18. Hosted Search
      1. Push a copy of your data
      2. Get blazing fast search
  19. Alternatives
      • Open-source
        • ElasticSearch, Solr, Sphinx
      • Commercial
        • Hosted ElasticSearch/Solr/Sphinx
        • Enterprise-oriented on-premise engines
  20. Alternatives
      • Mostly document-oriented
      • Designed to search in “big” documents
      • Statistical ranking algorithm
      • No instant-search capabilities
  21. Database Search
      • Database search
        • Semi-structured objects (multiple attributes)
        • Give importance to the right attributes
        • Combine text relevance & record popularity
  22. Record rank
      • No stats, no TF-IDF, no “score”
      • Tie-breaking based: one criterion after another
        1. Number of typos
        2. Geo
        3. Proximity
        4. Attribute weight
        5. Exact match
        6. Custom
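The tie-breaking ranking on slide 22 maps naturally onto a lexicographic sort: compare on the first criterion, and move to the next one only when two records are tied. The sketch below follows the order of criteria from the slide; the field names, the direction of each comparison (for example, fewer typos is better, higher custom value is better), and the sample records are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    typos: int           # fewer typos ranks higher (assumption)
    geo_distance_m: int  # closer ranks higher (assumption)
    proximity: int       # smaller gap between matched words ranks higher (assumption)
    attribute_rank: int  # 0 = most important attribute (assumption)
    exact_words: int     # more exactly matched words ranks higher (assumption)
    custom: int          # e.g. popularity, higher ranks higher (assumption)


def ranking_key(hit):
    # Tuples compare element by element, so a later criterion is consulted
    # only when all earlier ones are tied. "Higher is better" fields are negated.
    return (
        hit.typos,
        hit.geo_distance_m,
        hit.proximity,
        hit.attribute_rank,
        -hit.exact_words,
        -hit.custom,
    )


hits = [
    Hit("A", typos=1, geo_distance_m=0, proximity=1, attribute_rank=0, exact_words=1, custom=900),
    Hit("B", typos=0, geo_distance_m=0, proximity=2, attribute_rank=0, exact_words=1, custom=10),
    Hit("C", typos=0, geo_distance_m=0, proximity=1, attribute_rank=1, exact_words=1, custom=50),
    Hit("D", typos=0, geo_distance_m=0, proximity=1, attribute_rank=0, exact_words=1, custom=5),
    Hit("E", typos=0, geo_distance_m=0, proximity=1, attribute_rank=0, exact_words=1, custom=70),
]

ranked = sorted(hits, key=ranking_key)
print([hit.name for hit in ranked])
```

Record A has by far the highest custom value but still ranks last, because it loses on the first criterion. No weighted score is ever computed, which matches the "no score" point on the slide.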
  23. Repackaging + Improvements
      • C++ mobile SDK → C++ backend search engine
        • Hosted as an NGINX module
        • Multi-tenant (shared resources)
        • Fault-tolerant (SLA 99.99%)
        • Faceting, synonyms, analytics, …
  24. SaaS Architecture
      • Each cluster = 3 machines
      • Distributed consensus (SLA)
      • Multiple datacenters (EU, US, Asia)
      • Bare-metal servers
        • 6 cores (12 threads), 3.5 GHz
        • 128 GB RAM
        • 2 x 480 GB SSD (RAID-0)
  25. SaaS Architecture v1
  26. SaaS Architecture v2
      • More and more users
      • API slaughter
      • Too much I/O
      • Writes / sec
      • Consensus
  27. SaaS Architecture v2
  28. SaaS Security
      • Data privacy
        • Send us only non-critical data
        • Dedicated cluster
      • Per end-user security
        • Restrict the result set per end-user, per tag, …
      • Crawling
        • Built-in rate limits
  29. What about scalability?
      • 2B operations in June
      • 30% month-over-month growth in MRR
      • 40+ servers
  30. Monitoring
      • ServerDensity
      • Custom probes
      • Alerts
        • SMS
        • Email
  31. RAM
      • RAM over-booking
      • Small memory footprint per index
      • All indexes are memory-mapped (mmap)
      • Lazy loading (no query = no RAM consumption)
      • SSD
      • Disable swapping
      • Set up a new cluster if the current one is full
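The memory-mapping and lazy-loading points on slide 31 can be illustrated with Python's standard mmap module. This is a toy sketch, not the engine's storage format: the class name `MappedIndex`, the file layout (a flat array of little-endian 32-bit integers), and the temporary file are all assumptions.

```python
import mmap
import os
import struct
import tempfile

# Write a toy "index" file: 1000 little-endian unsigned 32-bit integers.
path = os.path.join(tempfile.mkdtemp(), "toy.idx")
with open(path, "wb") as f:
    f.write(struct.pack("<1000I", *range(1000)))


class MappedIndex:
    """Read-only view of an index file that is mapped on first access."""

    def __init__(self, path):
        self.path = path
        self._map = None  # nothing is mapped until the first query

    def _ensure_mapped(self):
        if self._map is None:
            with open(self.path, "rb") as f:
                # Length 0 maps the whole file; the OS pages data in on demand.
                self._map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        return self._map

    def get(self, position):
        """Return the integer stored at the given position."""
        mapped = self._ensure_mapped()
        return struct.unpack_from("<I", mapped, position * 4)[0]

    def close(self):
        if self._map is not None:
            self._map.close()
            self._map = None


idx = MappedIndex(path)
print(idx._map is None)  # no query yet, so no mapping
print(idx.get(42))
```

With a memory-mapped file the operating system decides which pages stay in RAM, so an index that receives no queries does not hold on to physical memory.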
  32. Network
      • Do NOT trust your default system configuration
        • I/O: not optimized for SSD
        • Memory: not optimized for 128 GB RAM
        • Network: not optimized for 10K+ keep-alive connections
  33. Deployment
      • Automatic
      • Ability to roll back
      • Ability to test on a “fake” production environment
  34. Hardware
      • Your server will:
        • reboot
        • crash
        • explode
      • Make it happen now!
  35. Questions?
