SlideShare a Scribd company logo
1 of 48
Realtime revolution at work REAL-TIME SEARCH AT YAMMER May 25, 2011 By Boris Aleksandrovsky  http://www.linkedin.com/in/baleksan Yammer, Inc. http://www.linkedin.com/in/baleksan
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Challenges - From information to knowledge Information Facts Knowledge Attention Engagement Retention Messages Metadata Personalized Search
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
: Putting Social Media to Work ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Knowledge Management: Document-oriented Enterprise Collaboration: Outcome-focused Social Media: People-centric
Yammer: The Enterprise Social Network  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Easy. Shared. Searchable. Real-time. Where your company’s knowledge lives.
100,000+ companies, including 85% of the Fortune 500 – and growing.
What do you discuss at work, and with whom? ,[object Object],[object Object],[object Object],What do our employees think of our 401K program? Is everybody saving? What’s the latest with the XYZ account? What are our recommendations for financial and regulatory reform given the latest news about…?  What will be discussed at our Quarterly Sales Kickoff? Where can I find out more about customer events here at the ABC conference? Who’s free to meet up? How can my team better prepare for our next product release? Who has any fresh ideas for… Who will I be working with on this new project?
Search use case -  Transient Awareness ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Search use case -  Knowledge Exploration ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Challenges for Yammer’s search engine ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Team ,[object Object],[object Object],[object Object]
Indexing ,[object Object]
Replication ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing ,[object Object]
30s
Why is it hard? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How do we cope? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Delete-update race  ,[object Object],[object Object],[object Object],id timestamp tombstone 5 12:34:39 no 5 12:45:01 yes
Multiple update race ,[object Object],[object Object],[object Object],id timestamp text 5 12:34:39 hello 5 12:45:01 hello there now
Dupes ,[object Object],[object Object],[object Object],[object Object],id timestamp numLikes 5 12:34:39 0 5 12:45:01 1 5 12:45:02 1 5 12:45:04 0
Thread example
Zoie ,[object Object],[object Object],[object Object],[object Object],[object Object]
Zoie ,[object Object],[object Object],[object Object]
Indexing HA ,[object Object],[object Object],[object Object],[object Object]
Dual indexing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Index consistency problems ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Search ,[object Object]
Goal ,[object Object],[object Object],[object Object],[object Object]
REST-full API over HTTP ,[object Object]
Payload ,[object Object],[object Object],[object Object]
Payload
Web Server ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Search master ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Partitioning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Testing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Production ,[object Object],[object Object],[object Object],[object Object]
Metrics ,[object Object]
Metrics ,[object Object]
Metrics ,[object Object]
Metrics ,[object Object]
Metrics ,[object Object]
Lessons ,[object Object],[object Object],[object Object],[object Object],[object Object]
Future ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Q&A Session: What’s On Your Mind?

More Related Content

Similar to Real-time Search at Yammer - By Aleksandrovsky Boris

Puppet Camp Duesseldorf 2014: Luke Kanies - Puppet Keynote
Puppet Camp Duesseldorf 2014: Luke Kanies - Puppet KeynotePuppet Camp Duesseldorf 2014: Luke Kanies - Puppet Keynote
Puppet Camp Duesseldorf 2014: Luke Kanies - Puppet KeynoteNETWAYS
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Yahoo Developer Network
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...HostedbyConfluent
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDBDenny Lee
 
Jive Software - Clearspace Overview
Jive Software - Clearspace OverviewJive Software - Clearspace Overview
Jive Software - Clearspace OverviewMeganRossFarrell
 
A Tale of Contemporary Software
A Tale of Contemporary SoftwareA Tale of Contemporary Software
A Tale of Contemporary SoftwareYun Zhi Lin
 
Puppet Camp Amsterdam 2015: Keynote
Puppet Camp Amsterdam 2015: KeynotePuppet Camp Amsterdam 2015: Keynote
Puppet Camp Amsterdam 2015: KeynotePuppet
 
Super Sizing Youtube with Python
Super Sizing Youtube with PythonSuper Sizing Youtube with Python
Super Sizing Youtube with Pythondidip
 
ML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time SeriesML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time SeriesSigmoid
 
From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018Christophe Rochefolle
 
DockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability WorkshopDockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability WorkshopKevin Crawley
 
2016 - 10 questions you should answer before building a new microservice
2016 - 10 questions you should answer before building a new microservice2016 - 10 questions you should answer before building a new microservice
2016 - 10 questions you should answer before building a new microservicedevopsdaysaustin
 
Spca2014 navigating clouds sp_con14_mackie
Spca2014 navigating clouds sp_con14_mackieSpca2014 navigating clouds sp_con14_mackie
Spca2014 navigating clouds sp_con14_mackieNCCOMMS
 
Moving to Microservices with the Help of Distributed Traces
Moving to Microservices with the Help of Distributed TracesMoving to Microservices with the Help of Distributed Traces
Moving to Microservices with the Help of Distributed TracesKP Kaiser
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsDirecti Group
 
Making Observability Actionable At Scale - DBS DevConnect 2019
Making Observability Actionable At Scale - DBS DevConnect 2019Making Observability Actionable At Scale - DBS DevConnect 2019
Making Observability Actionable At Scale - DBS DevConnect 2019Squadcast Inc
 
Bluemix paas 기반 saas 개발 사례
Bluemix paas 기반 saas 개발 사례Bluemix paas 기반 saas 개발 사례
Bluemix paas 기반 saas 개발 사례uEngine Solutions
 
Introduction to Puppet Enterprise - Jan 30, 2019
Introduction to Puppet Enterprise - Jan 30, 2019Introduction to Puppet Enterprise - Jan 30, 2019
Introduction to Puppet Enterprise - Jan 30, 2019Puppet
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesIvo Andreev
 

Similar to Real-time Search at Yammer - By Aleksandrovsky Boris (20)

Puppet Camp Duesseldorf 2014: Luke Kanies - Puppet Keynote
Puppet Camp Duesseldorf 2014: Luke Kanies - Puppet KeynotePuppet Camp Duesseldorf 2014: Luke Kanies - Puppet Keynote
Puppet Camp Duesseldorf 2014: Luke Kanies - Puppet Keynote
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDB
 
Jive Software - Clearspace Overview
Jive Software - Clearspace OverviewJive Software - Clearspace Overview
Jive Software - Clearspace Overview
 
A Tale of Contemporary Software
A Tale of Contemporary SoftwareA Tale of Contemporary Software
A Tale of Contemporary Software
 
Puppet Camp Amsterdam 2015: Keynote
Puppet Camp Amsterdam 2015: KeynotePuppet Camp Amsterdam 2015: Keynote
Puppet Camp Amsterdam 2015: Keynote
 
Os Solomon
Os SolomonOs Solomon
Os Solomon
 
Super Sizing Youtube with Python
Super Sizing Youtube with PythonSuper Sizing Youtube with Python
Super Sizing Youtube with Python
 
ML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time SeriesML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time Series
 
From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018
 
DockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability WorkshopDockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability Workshop
 
2016 - 10 questions you should answer before building a new microservice
2016 - 10 questions you should answer before building a new microservice2016 - 10 questions you should answer before building a new microservice
2016 - 10 questions you should answer before building a new microservice
 
Spca2014 navigating clouds sp_con14_mackie
Spca2014 navigating clouds sp_con14_mackieSpca2014 navigating clouds sp_con14_mackie
Spca2014 navigating clouds sp_con14_mackie
 
Moving to Microservices with the Help of Distributed Traces
Moving to Microservices with the Help of Distributed TracesMoving to Microservices with the Help of Distributed Traces
Moving to Microservices with the Help of Distributed Traces
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale Systems
 
Making Observability Actionable At Scale - DBS DevConnect 2019
Making Observability Actionable At Scale - DBS DevConnect 2019Making Observability Actionable At Scale - DBS DevConnect 2019
Making Observability Actionable At Scale - DBS DevConnect 2019
 
Bluemix paas 기반 saas 개발 사례
Bluemix paas 기반 saas 개발 사례Bluemix paas 기반 saas 개발 사례
Bluemix paas 기반 saas 개발 사례
 
Introduction to Puppet Enterprise - Jan 30, 2019
Introduction to Puppet Enterprise - Jan 30, 2019Introduction to Puppet Enterprise - Jan 30, 2019
Introduction to Puppet Enterprise - Jan 30, 2019
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best Practices
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Recently uploaded

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 

Recently uploaded (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Real-time Search at Yammer - By Aleksandrovsky Boris

  • 1. Realtime revolution at work REAL-TIME SEARCH AT YAMMER May 25, 2011 By Boris Aleksandrovsky http://www.linkedin.com/in/baleksan Yammer, Inc. http://www.linkedin.com/in/baleksan
  • 2.
  • 3.
  • 4. Challenges - From information to knowledge Information Facts Knowledge Attention Engagement Retention Messages Metadata Personalized Search
  • 5.
  • 6.
  • 7.
  • 8. 100,000+ companies, including 85% of the Fortune 500 – and growing.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19. 30s
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48. Q&A Session: What’s On Your Mind?

Editor's Notes

  1. Similar to how a single malt is made, knowledge is distilled from information, facts and experience. The role of the search engine is to capture the process and make it readily available.
  2. private and secure enterprise social network for coworkers and colleagues to communicate, collaborate, and coordinate An interactive online knowledge base that connects dispersed workers in ways that are easy, real-time, social, and searchable A way to share what’s relevant to the right colleagues , by drawing attention to and discussing important issues “ The Social Glue” to an organization , driving better collaboration and process improvements while preserving institutional knowledge Real-time communication, coordination Business continuity and relevance Global connectivity, accessible anywhere
  3. “ Introducing Yammer: combining the new ways we communicate, with the consumerization of enterprise software to achieve faster communications, better collaboration, and more productivity.” Overview of the key features but emphasize this is a Knowledge Base: Search for answers and topics, identify collaborators and experts, Messaging and Feeds: Ask questions, start discussions. Share news, links, opinions, and ideas. Streamline communication, understand context in threaded conversations.  My Feed, Company Feed, RSS Feeds: follow what and who are of most interest to you, stay on top of company news, add RSS to stay informed. Direct Messaging: Send private direct messages to co-workers, reduce email volume, add others who can catch up by reading thread histories. User Profiles: Each user creates a profile with their photo, title, and background. Easily connect with co-workers and expertise Company Directory: Upgrade to enterprise for additional security and admin features, including company directory integration. Help new employees quickly get up to speed. Groups and Communities: build engagement by creating internal Groups around projects and topics, and external Communities with partners and customers. Applications: Share files, enhance productivity, and increase collaboration through Yammer’s suite of core apps and a la carte Third Party Apps for document sharing, tracking, helpdesk ticketing, and more. Integrations: SharePoint 2007 and 2010, Outlook, Salesforce, soon: Box Access and Mobility: Access Yammer anywhere, through the web, Desktop client, IM, SMS, Microsoft Sharepoint, and mobile applications (iPhone, Blackberry, Android, Windows Mobile). Translations: soon available in 100 languages Network Consultation and Support: included with enterprise upgrade OTHER stuff to talk about if you like: @People and #Topics: Quickly loop co-workers into conversations and tag topics for further information discovery and sharing. Connectivity and Crisis Communications: connect your dispersed workforce, crowdsource ideas, and broadcast company-wide in times of critical need.
  4. “ We know our product inside and out from our work with over 100K+ company networks. From product iterations to customer use cases, to deployment and engagement services, we have a depth of expertise that has made us the market leader.”
  5. Before getting into the product – let’s get at the problem(s) Yammer is attempting to address…
  6. From the perspective of search, people use Yammer today in two modes. First, they want to simply capture the information which might have scrolled out of view in their Yammer feed. This is very similar to Twitter - I check it once in a while, but what have I missed since the last time? For this use-case we want to present search results in reverse chronological order and answer simple queries. The second mode is the knowledge exploration mode. Yammer is a knowledge base created by interactions between colleagues over time within a company. Yammer can help with the on-boarding process, faq's, tips, computer setup, company procedures and processes, practices and culture. For this, search is an entry point and quite possibly the most important interaction element. We need to answer complicated queries and present results based on textual similarity, popularity, engagement and social distance.
  7. From the perspective of search, people use Yammer today in two modes. First, they want to simply capture the information which might have scrolled out of view in their Yammer feed. This is very similar to Twitter - I check it once in a while, but what have I missed since the last time? For this use-case we want to present search results in reverse chronological order and answer simple queries. The second mode is the knowledge exploration mode. Yammer is a knowledge base created by interactions between colleagues over time within a company. Yammer can help with the on-boarding process, faq's, tips, computer setup, company procedures and processes, practices and culture. For this, search is an entry point and quite possibly the most important interaction element. We need to answer complicated queries and present results based on textual similarity, popularity, engagement and social distance. Wikis working in groups - people are creating some connections but they are not well organized.
  8. he biggest challenges for search at Yammer is the real time nature of the information and the complicated relevancy story. Information on Yammer should be indexed and available for users to search in real time, virtually in less then a second. This makes the Yammer indexing system similar to Twitter where tweets are indexed in real time. Search results likewise are available in reverse chronological order which is based on the assumption that for certain types of events, timeliness is the most pertinent characteristic. This maps really well into types of content like news where relevancy declines fairly rapidly as time passes, or for types of content which are more transient in nature, like events and meetings. There are other types of content where the relationship between the creator of the content and the searcher is important, and also the sheer popularity of the content is important. This is more of a Facebook newsfeed case, which tries to present content from people you value or interact with most. A good example will be communications from your boss, or an expert opinion you trust. Popular discussion threads which capture the attention of the company are important to find since they usually encompass the "company culture".   There are however other types of content that are much more knowledge heavy and with the retrieval of each textual similarity, reputation and potential for engagement are more important then timeliness. For instance when the sales representative is searching for a relevant approach to a particular client industry, then he would be interested in the experiences of all other sales people who tried to sell to that industry, and he would want to look back as far as the records go. This is a case where Yammer's search system is trying to act more like Google search system
  9. Out of order delivery source of all (most) evil Easily 50% of complexity is there. Solution Garanteee in-order delivery - buffer and wait - degrades performance, availability and only garantees very eventual consistency Minimize the probability and forget - ts precesion - clock skew Solution Arbitrate - based on ts / vector clocks (ts+versions) - based on semantics - based on business cases - need to index tombstones (mark-for-delete)
  10. Out of order delivery source of all (most) evil Easily 50% of complexity is there. Solution Garanteee in-order delivery - buffer and wait - degrades performance, availability and only garantees very eventual consistency Minimize the probability and forget - ts precesion - clock skew Solution Arbitrate - based on ts / vector clocks (ts+versions) - based on semantics - based on business cases - need to index tombstones (mark-for-delete)
  11. Editable TOC or bullet slide
  12. Editable TOC or bullet slide
  13. - Dual indexing - primary index for serving out - secondary index for reindexing - Verify secondary index consistency - foreach replica in turn - shutdown - mv secondary to primary - restart - Availability should not be affected except for slight chance of system failure on the serving replica.
  14. - Indexing problems Detect - index integrity tool checks against the :source of truth: - identifies patches Reindex - gaps - whole - reindex into secondary, swap with primary Repair job - patch in place
  15. Call all - more predictable latency profile, index warmup advantage Round robin - when under load stress Least busy - most complicated, requires metrics poll, prone to errors when burstable activity
  16. - Testing Indexing Idempotent Out-of-order delivery 10K docs delivered in random order with X% of dupes Search Build small manual index by recording events Create unit-test style tests with Asserts
  17. - Production Metrics Alerts via Zabbix (Zabbix is awesome) Puppet Ganglia for machine level diagnostics Have enough redundancy
  18. Gauges are instantaneous readings of values (e.g., a queue depth). Counters are 64-bit integers which can be incremented or decremented. Meters are increment-only counters which keep track of the rate of events. They provide mean rates, plus exponentially-weighted moving averages which use the same formula that the UNIX 1-, 5-, and 15-minute load averages use. Histograms capture distribution measurements about a metric: the count, maximum, minimum, mean, standard deviation, median, 75th percentile, 95th percentile, 98th percentile, 99th percentile, and 99.9th percentile of the recorded values. (They do so using a method called reservoir sampling which allows them to efficiently keep a small, statistically representative sample of all the measurements.) Timers record the duration as well as the rate of events. In addition to the rate information that meters provide, timers also provide the same metrics as histograms about the recorded durations. (The samples that timers keep in order to calculate percentiles and such are biased towards more recent data, since you probably care more about how your application is doing now as opposed to how it's done historically.)
  19. Gauges are instantaneous readings of values (e.g., a queue depth). Counters are 64-bit integers which can be incremented or decremented. Meters are increment-only counters which keep track of the rate of events. They provide mean rates, plus exponentially-weighted moving averages which use the same formula that the UNIX 1-, 5-, and 15-minute load averages use. Histograms capture distribution measurements about a metric: the count, maximum, minimum, mean, standard deviation, median, 75th percentile, 95th percentile, 98th percentile, 99th percentile, and 99.9th percentile of the recorded values. (They do so using a method called reservoir sampling which allows them to efficiently keep a small, statistically representative sample of all the measurements.) Timers record the duration as well as the rate of events. In addition to the rate information that meters provide, timers also provide the same metrics as histograms about the recorded durations. (The samples that timers keep in order to calculate percentiles and such are biased towards more recent data, since you probably care more about how your application is doing now as opposed to how it's done historically.)
  20. Gauges are instantaneous readings of values (e.g., a queue depth). Counters are 64-bit integers which can be incremented or decremented. Meters are increment-only counters which keep track of the rate of events. They provide mean rates, plus exponentially-weighted moving averages which use the same formula that the UNIX 1-, 5-, and 15-minute load averages use. Histograms capture distribution measurements about a metric: the count, maximum, minimum, mean, standard deviation, median, 75th percentile, 95th percentile, 98th percentile, 99th percentile, and 99.9th percentile of the recorded values. (They do so using a method called reservoir sampling which allows them to efficiently keep a small, statistically representative sample of all the measurements.) Timers record the duration as well as the rate of events. In addition to the rate information that meters provide, timers also provide the same metrics as histograms about the recorded durations. (The samples that timers keep in order to calculate percentiles and such are biased towards more recent data, since you probably care more about how your application is doing now as opposed to how it's done historically.)
  21. Gauges are instantaneous readings of values (e.g., a queue depth). Counters are 64-bit integers which can be incremented or decremented. Meters are increment-only counters which keep track of the rate of events. They provide mean rates, plus exponentially-weighted moving averages which use the same formula that the UNIX 1-, 5-, and 15-minute load averages use. Histograms capture distribution measurements about a metric: the count, maximum, minimum, mean, standard deviation, median, 75th percentile, 95th percentile, 98th percentile, 99th percentile, and 99.9th percentile of the recorded values. (They do so using a method called reservoir sampling which allows them to efficiently keep a small, statistically representative sample of all the measurements.) Timers record the duration as well as the rate of events. In addition to the rate information that meters provide, timers also provide the same metrics as histograms about the recorded durations. (The samples that timers keep in order to calculate percentiles and such are biased towards more recent data, since you probably care more about how your application is doing now as opposed to how it's done historically.)
  22. Gauges are instantaneous readings of values (e.g., a queue depth). Counters are 64-bit integers which can be incremented or decremented. Meters are increment-only counters which keep track of the rate of events. They provide mean rates, plus exponentially-weighted moving averages which use the same formula that the UNIX 1-, 5-, and 15-minute load averages use. Histograms capture distribution measurements about a metric: the count, maximum, minimum, mean, standard deviation, median, 75th percentile, 95th percentile, 98th percentile, 99th percentile, and 99.9th percentile of the recorded values. (They do so using a method called reservoir sampling which allows them to efficiently keep a small, statistically representative sample of all the measurements.) Timers record the duration as well as the rate of events. In addition to the rate information that meters provide, timers also provide the same metrics as histograms about the recorded durations. (The samples that timers keep in order to calculate percentiles and such are biased towards more recent data, since you probably care more about how your application is doing now as opposed to how it's done historically.)