SlideShare ist ein Scribd-Unternehmen logo
1 von 39
Downloaden Sie, um offline zu lesen
1 
Box + Solr = Content Search for 
Business 
June 2014
2 
Wei Zhao 
Box backend engineer 
wzhao@box.com
3 
Box mission 
to make organizations more productive, 
competitive and collaborative by connecting 
people and their most important information
4 
25MM+ 
Users 
225K+ 
Businesses 
99% 
Fortune 500
5 
Box search mission is to make user content 
easy to discover.
6 
10Billion+ 
Documents 
10TB+ 
Index size 
100M+ 
Daily requests 
Box uses Solr for search
7 
Quick Search
8 
Quick Search
9 
Full Search
10 
Sharding – splitting the index 
Agenda 
Highly available search 
A few more things 
1 
2 
3 
4 
5 Q&A 
Currently working on
11 
We shard things
12 
Shard ID = File ID % Total Shards
13 
Multi-tenant – One big logical index for all users 
Solr index 
Shard1 Shard2 Shard3 ShardN
14 
Search scope
15 
A typical Solr Document 
File ID: 12345 
OwnerID: user1 
Parent Folders IDs: folder1, folder2 
File Name: Solr.ppt 
File Content: blah 
......
16 
File 1 File 2 
Owner: User1 
Parent: Folder1 
File 3 File 4 
Owner: User2 
Parent: Folder3 
Owner: User2 
Parent: Folder2 
Owner: User1 
Parent: 
Folder1 
Folder4
17 
User1 with no share folder 
File 1 File 2 
Owner: User1 
Parent: Folder1 
File 3 File 4 
Owner: User2 
Parent: Folder3 
Owner: User2 
Parent: Folder2 
Owner: User1 
Parent: 
Folder1 
Folder4
18 
User2 shares Folder2 with User1 
File 1 File 2 
Owner: User1 
Parent: Folder1 
File 3 File 4 
Owner: User2 
Parent: Folder3 
Owner: User2 
Parent: Folder2 
Owner: User1 
Parent: 
Folder1 
Folder4
19 
User2 shares Folder2 with User1 
File 1 File 2 
Owner: User1 
Parent: Folder1 
File 3 File 4 
Owner: User2 
Parent: Folder3 
Owner: User2 
Parent: Folder2 
Owner: User1 
Parent: 
Folder1 
Folder4
20 
User2 shares Folder2 with User1 
File 1 File 2 
Owner: User1 
Parent: Folder1 
File 3 File 4 
Owner: User2 
Parent: Folder3 
Owner: User2 
Parent: Folder5 
Owner: User1 
Parent: 
Folder1 
Folder4 
Removed 
out of 
Folder2
21 
User2 shares Folder2 with User1 
File 1 File 2 
Owner: User1 
Parent: Folder1 
File 3 File 4 
Owner: User2 
Parent: Folder3 
Owner: User2 
Parent: Folder5 
Owner: User1 
Parent: 
Folder1 
Folder4 
Removed 
out of 
Folder2
22 
Highly Available Search
23 
• Index is highly available 
• Search functionality is highly available
24 
Index workflow
25 
Box 
Front 
End 
Upload 
Index Queue 
Queue 1 
Queue 2 
Queue 3 
Indexer 1 
Indexer 2 
Indexer 3 
MySQL 
Index1 
Index2 
Index2
26 
Search workflow
27 
query HA 
Box 
Front 
End 
Proxy 
Head 
node 
1 2 3 N 
HA Proxy 
Box 
Front 
End 
query 
HA 
Proxy 
Data center boundary 
Head 
node 
HA Proxy 
1 2 3 N
28 
A few more things
29 
File Content Search
30 
Box 
Front 
End 
Upload 
MySQL Box File 
Storage 
Indexer 
Solr 
Index 
Text Extraction 
Extracted 
Text
31 
Multi-language support
32 
Raw file 
content 
Language 
detector 
English tokenizer 
Spanish tokenizer 
Japanese tokenizer 
German tokenizer 
file_content_en 
File_content_es 
{hola} 
file_content_ja 
. 
. 
. 
. 
File_content_de
33 
To Dos 
• Scale language support 
• Support document with mixed languages
34 
Search Warm-up
35 
• Front end informs backend to warm up on keyboard focus 
• Backend prepares the search filter and caches it in a search session 
• Backend sends a warm-up query to Solr
36 
What we are working on
37 
• Search suggestions 
• Search operators 
• Use machine learning to influence ranking 
• Logical sharding 
Things we are working on
38 
Question?
39 
We are hiring! 
Contact: wzhao@box.com

Weitere ähnliche Inhalte

Ähnlich wie Box + Solr = Content Search for Business

Extracts from AS/400 Concepts & Tools workshop
Extracts from AS/400 Concepts & Tools workshopExtracts from AS/400 Concepts & Tools workshop
Extracts from AS/400 Concepts & Tools workshopRamesh Joshi
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Databricks
 
[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage Reduction[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage ReductionPerforce
 
How to write memory efficient code?
How to write memory efficient code?How to write memory efficient code?
How to write memory efficient code?Tier1 app
 
A Practical Introduction to Apache Solr
A Practical Introduction to Apache SolrA Practical Introduction to Apache Solr
A Practical Introduction to Apache SolrAngel Borroy López
 
File handling.pptx
File handling.pptxFile handling.pptx
File handling.pptxVishuSaini22
 
Dancing faster in the datasphere
Dancing faster in the datasphereDancing faster in the datasphere
Dancing faster in the datasphereJ T "Tom" Johnson
 
Report blocking ,management of files in secondry memory , static vs dynamic a...
Report blocking ,management of files in secondry memory , static vs dynamic a...Report blocking ,management of files in secondry memory , static vs dynamic a...
Report blocking ,management of files in secondry memory , static vs dynamic a...NoorMustafaSoomro
 
Powerpoint versiebeheer there is no such thing as a final version 1
Powerpoint versiebeheer there is no such thing as a final version 1Powerpoint versiebeheer there is no such thing as a final version 1
Powerpoint versiebeheer there is no such thing as a final version 1Hugo Besemer
 
Feeding the Elephant: Approaching 1PB/Day
Feeding the Elephant: Approaching 1PB/DayFeeding the Elephant: Approaching 1PB/Day
Feeding the Elephant: Approaching 1PB/DayDataWorks Summit
 
Introduction to Plone (November 2003)
Introduction to Plone (November 2003)Introduction to Plone (November 2003)
Introduction to Plone (November 2003)Kiran Jonnalagadda
 
What the git? - SAP Inside Track Munich 2016
What the git?  - SAP Inside Track Munich 2016What the git?  - SAP Inside Track Munich 2016
What the git? - SAP Inside Track Munich 2016Hendrik Neumann
 
Building a digital repository for the NAi
Building a digital repository for the NAiBuilding a digital repository for the NAi
Building a digital repository for the NAidatable_be
 
[India Merge World Tour] IC Manage
[India Merge World Tour] IC Manage[India Merge World Tour] IC Manage
[India Merge World Tour] IC ManagePerforce
 
Realtimestream and realtime fastcatsearch
Realtimestream and realtime fastcatsearchRealtimestream and realtime fastcatsearch
Realtimestream and realtime fastcatsearch상욱 송
 
Do Something Now: Why Perfect is the Enemy of Good (Enough) in Digital Preser...
Do Something Now: Why Perfect is the Enemy of Good (Enough) in Digital Preser...Do Something Now: Why Perfect is the Enemy of Good (Enough) in Digital Preser...
Do Something Now: Why Perfect is the Enemy of Good (Enough) in Digital Preser...Artefactual Systems - Archivematica
 

Ähnlich wie Box + Solr = Content Search for Business (20)

Extracts from AS/400 Concepts & Tools workshop
Extracts from AS/400 Concepts & Tools workshopExtracts from AS/400 Concepts & Tools workshop
Extracts from AS/400 Concepts & Tools workshop
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
 
[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage Reduction[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage Reduction
 
How to write memory efficient code?
How to write memory efficient code?How to write memory efficient code?
How to write memory efficient code?
 
A Practical Introduction to Apache Solr
A Practical Introduction to Apache SolrA Practical Introduction to Apache Solr
A Practical Introduction to Apache Solr
 
INT 1010 04-5.pdf
INT 1010 04-5.pdfINT 1010 04-5.pdf
INT 1010 04-5.pdf
 
File handling.pptx
File handling.pptxFile handling.pptx
File handling.pptx
 
Dancing faster in the datasphere
Dancing faster in the datasphereDancing faster in the datasphere
Dancing faster in the datasphere
 
Report blocking ,management of files in secondry memory , static vs dynamic a...
Report blocking ,management of files in secondry memory , static vs dynamic a...Report blocking ,management of files in secondry memory , static vs dynamic a...
Report blocking ,management of files in secondry memory , static vs dynamic a...
 
Powerpoint versiebeheer there is no such thing as a final version 1
Powerpoint versiebeheer there is no such thing as a final version 1Powerpoint versiebeheer there is no such thing as a final version 1
Powerpoint versiebeheer there is no such thing as a final version 1
 
Feeding the Elephant: Approaching 1PB/Day
Feeding the Elephant: Approaching 1PB/DayFeeding the Elephant: Approaching 1PB/Day
Feeding the Elephant: Approaching 1PB/Day
 
Introduction to Plone (November 2003)
Introduction to Plone (November 2003)Introduction to Plone (November 2003)
Introduction to Plone (November 2003)
 
What the git? - SAP Inside Track Munich 2016
What the git?  - SAP Inside Track Munich 2016What the git?  - SAP Inside Track Munich 2016
What the git? - SAP Inside Track Munich 2016
 
Building a digital repository for the NAi
Building a digital repository for the NAiBuilding a digital repository for the NAi
Building a digital repository for the NAi
 
[India Merge World Tour] IC Manage
[India Merge World Tour] IC Manage[India Merge World Tour] IC Manage
[India Merge World Tour] IC Manage
 
Big Search 4 Big Data War Stories
Big Search 4 Big Data War StoriesBig Search 4 Big Data War Stories
Big Search 4 Big Data War Stories
 
LOD2 Webinar Series: Zemanta / Open refine
LOD2 Webinar Series: Zemanta / Open refine LOD2 Webinar Series: Zemanta / Open refine
LOD2 Webinar Series: Zemanta / Open refine
 
Introduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIsIntroduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIs
 
Realtimestream and realtime fastcatsearch
Realtimestream and realtime fastcatsearchRealtimestream and realtime fastcatsearch
Realtimestream and realtime fastcatsearch
 
Do Something Now: Why Perfect is the Enemy of Good (Enough) in Digital Preser...
Do Something Now: Why Perfect is the Enemy of Good (Enough) in Digital Preser...Do Something Now: Why Perfect is the Enemy of Good (Enough) in Digital Preser...
Do Something Now: Why Perfect is the Enemy of Good (Enough) in Digital Preser...
 

Mehr von Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchLucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondLucidworks
 

Mehr von Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Kürzlich hochgeladen

Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 

Kürzlich hochgeladen (20)

Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 

Box + Solr = Content Search for Business

  • 1. 1 Box + Solr = Content Search for Business June 2014
  • 2. 2 Wei Zhao Box backend engineer wzhao@box.com
  • 3. 3 Box mission to make organizations more productive, competitive and collaborative by connecting people and their most important information
  • 4. 4 25MM+ Users 225K+ Businesses 99% Fortune 500
  • 5. 5 Box search mission is to make user content easy to discover.
  • 6. 6 10Billion+ Documents 10TB+ Index size 100M+ Daily requests Box uses Solr for search
  • 10. 10 Sharding – splitting the index Agenda Highly available search A few more things 1 2 3 4 5 Q&A Currently working on
  • 11. 11 We shard things
  • 12. 12 Shard ID = File ID % Total Shards
  • 13. 13 Multi-tenant – One big logical index for all users Solr index Shard1 Shard2 Shard3 ShardN
  • 15. 15 A typical Solr Document File ID: 12345 OwnerID: user1 Parent Folders IDs: folder1, folder2 File Name: Solr.ppt File Content: blah ......
  • 16. 16 File 1 File 2 Owner: User1 Parent: Folder1 File 3 File 4 Owner: User2 Parent: Folder3 Owner: User2 Parent: Folder2 Owner: User1 Parent: Folder1 Folder4
  • 17. 17 User1 with no share folder File 1 File 2 Owner: User1 Parent: Folder1 File 3 File 4 Owner: User2 Parent: Folder3 Owner: User2 Parent: Folder2 Owner: User1 Parent: Folder1 Folder4
  • 18. 18 User2 shares Folder2 with User1 File 1 File 2 Owner: User1 Parent: Folder1 File 3 File 4 Owner: User2 Parent: Folder3 Owner: User2 Parent: Folder2 Owner: User1 Parent: Folder1 Folder4
  • 19. 19 User2 shares Folder2 with User1 File 1 File 2 Owner: User1 Parent: Folder1 File 3 File 4 Owner: User2 Parent: Folder3 Owner: User2 Parent: Folder2 Owner: User1 Parent: Folder1 Folder4
  • 20. 20 User2 shares Folder2 with User1 File 1 File 2 Owner: User1 Parent: Folder1 File 3 File 4 Owner: User2 Parent: Folder3 Owner: User2 Parent: Folder5 Owner: User1 Parent: Folder1 Folder4 Removed out of Folder2
  • 21. 21 User2 shares Folder2 with User1 File 1 File 2 Owner: User1 Parent: Folder1 File 3 File 4 Owner: User2 Parent: Folder3 Owner: User2 Parent: Folder5 Owner: User1 Parent: Folder1 Folder4 Removed out of Folder2
  • 23. 23 • Index is highly available • Search functionality is highly available
  • 25. 25 Box Front End Upload Index Queue Queue 1 Queue 2 Queue 3 Indexer 1 Indexer 2 Indexer 3 MySQL Index1 Index2 Index2
  • 27. 27 query HA Box Front End Proxy Head node 1 2 3 N HA Proxy Box Front End query HA Proxy Data center boundary Head node HA Proxy 1 2 3 N
  • 28. 28 A few more things
  • 29. 29 File Content Search
  • 30. 30 Box Front End Upload MySQL Box File Storage Indexer Solr Index Text Extraction Extracted Text
  • 32. 32 Raw file content Language detector English tokenizer Spanish tokenizer Japanese tokenizer German tokenizer file_content_en File_content_es {hola} file_content_ja . . . . File_content_de
  • 33. 33 To Dos • Scale language support • Support document with mixed languages
  • 35. 35 • Front end informs backend to warm up on keyboard focus • Backend prepares the search filter and caches it in a search session • Backend sends a warm-up query to Solr
  • 36. 36 What we are working on
  • 37. 37 • Search suggestions • Search operators • Use machine learning to influence ranking • Logical sharding Things we are working on
  • 39. 39 We are hiring! Contact: wzhao@box.com

Hinweis der Redaktion

  1. This has changed many of the access patterns to data in the enterprise.
  2. This has changed many of the access patterns to data in the enterprise.
  3. This has changed many of the access patterns to data in the enterprise.