1. How to build a small distributed search engine using open source software
2. Building a distributed search engine
Search engine subsystems:
● Page database
● List of pages to retrieve
● Page retrieval and storage
● Page content parsing
● Full-text indexing of page contents
● Graph database of links, used for ranking
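To show how these subsystems fit together, here is a minimal single-process sketch. It is not the implementation behind these slides. Everything in it is a hypothetical stand-in: `FAKE_WEB` replaces real HTTP fetching, dictionaries replace the page database, index and link graph, and plain power-iteration PageRank is used as one possible ranking over the link graph.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web" used instead of real HTTP requests.
FAKE_WEB = {
    "a.html": '<p>open source search</p> <a href="b.html">b</a> <a href="c.html">c</a>',
    "b.html": '<p>distributed search engine</p> <a href="c.html">c</a>',
    "c.html": '<p>open source engine</p> <a href="a.html">a</a>',
}

LINK_RE = re.compile(r'href="([^"]+)"')
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[a-z0-9]+")


def crawl(seeds):
    page_db = {}                  # page database: url -> raw HTML
    frontier = deque(seeds)       # list of pages still to retrieve
    index = defaultdict(set)      # full-text index: term -> urls containing it
    links = defaultdict(set)      # link graph: url -> urls it points to
    while frontier:
        url = frontier.popleft()
        if url in page_db or url not in FAKE_WEB:
            continue              # already fetched, or unreachable
        html = FAKE_WEB[url]      # retrieval (stand-in for an HTTP GET)
        page_db[url] = html       # save the raw page
        outlinks = LINK_RE.findall(html)        # parsing: links
        text = TAG_RE.sub(" ", html).lower()    # parsing: visible text
        for term in WORD_RE.findall(text):      # full-text indexing
            index[term].add(url)
        links[url].update(outlinks)             # link graph for ranking
        frontier.extend(outlinks)               # schedule newly found pages
    return page_db, index, links


def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over the link graph."""
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            outs = links.get(u)
            if outs:
                for v in outs:
                    new[v] += damping * rank[u] / len(outs)
            else:                 # dangling page: spread its rank evenly
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank


def search(query, index, rank):
    """AND-query over the index, results ordered by PageRank."""
    terms = WORD_RE.findall(query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits, key=lambda u: rank[u], reverse=True)
```

For example, after `crawl(["a.html"])` and `pagerank(links)`, `search("open source", index, rank)` returns `['c.html', 'a.html']`: both pages contain the terms, but c.html ranks higher because both other pages link to it. In a distributed engine, each dictionary here would become a separate, scalable store.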
3. Open Source Software
• Apache Hadoop
  • MapReduce
  • HDFS
  • HBase
• Apache Lucene
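The MapReduce model listed above can be illustrated without a cluster. The sketch below builds an inverted index (the kind of structure Lucene maintains) in map, shuffle and reduce steps. It is plain Python that only mimics the data flow; it does not use Hadoop's Java API. The function names are hypothetical, and `zlib.crc32` partitioning stands in for the hash partitioning Hadoop applies to map output keys.

```python
import re
import zlib
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9]+")


def map_phase(doc_id, text):
    """Mapper: emit one (term, doc_id) pair per distinct term in a document."""
    for term in set(WORD_RE.findall(text.lower())):
        yield term, doc_id


def partition(key, num_reducers):
    """Choose a reducer by hashing the key (crc32 keeps this deterministic)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Group values by key inside each reducer's partition."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_phase(term, doc_ids):
    """Reducer: turn the grouped document ids into a sorted posting list."""
    return term, sorted(set(doc_ids))


def run_job(documents, num_reducers=2):
    pairs = (pair for doc_id, text in documents.items()
             for pair in map_phase(doc_id, text))
    index = {}
    for bucket in shuffle(pairs, num_reducers):  # each bucket = one reducer's input
        for term in sorted(bucket):              # reducers see keys in sorted order
            key, postings = reduce_phase(term, bucket[term])
            index[key] = postings
    return index
```

Because every occurrence of a term hashes to the same partition, each reducer owns a disjoint slice of the vocabulary, so the reducers can run on different machines without coordinating. The final index is the same whatever `num_reducers` is.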
5. HDFS – Assumptions and goals
● Hardware failure is the norm, not the exception
● Large data sets (big data)
● Write once, read many
● Moving computation is cheaper than moving data
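A rough sketch of how these goals interact: HDFS splits files into large blocks (128 MB by default in recent Hadoop versions) and keeps several replicas of each block (3 by default), so a task can still run next to its data after a node fails. The functions below are hypothetical illustrations. The rotating replica placement is a deliberate simplification, since real HDFS placement is rack-aware.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # bytes; matches the usual dfs.blocksize default
REPLICATION = 3                 # matches the usual dfs.replication default


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) for each block; only the last block may be shorter."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Put each block on `replication` distinct datanodes, rotating the first node.
    Simplification: real HDFS placement is rack-aware."""
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the requested replication")
    return {b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
            for b in range(num_blocks)}


def schedule_tasks(placement, live_nodes):
    """Move computation to the data: run each block's task on a live node that
    already stores a replica. A block whose replicas are all on failed nodes
    is unavailable and maps to None."""
    assignment = {}
    for block, holders in placement.items():
        alive = [n for n in holders if n in live_nodes]
        assignment[block] = alive[0] if alive else None
    return assignment
```

For a 300 MB file on four datanodes, this yields three blocks (128, 128 and 44 MB). If `dn1` fails, every block still has a live replica, so each task is scheduled on a node that already holds its block and no block data crosses the network. A real scheduler would also balance load across those nodes.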