A customized web search engine is a graduation project. This presentation explains what a search engine is and describes the open-source software used in this project.
A customized web search engine
1. Supervised By
Dr. Mohamed A. El-Rashidy
Eng. Ahmed Ghozia
Dept. of Computer Science & Engineering
Faculty of Electronic Engineering,
Menoufiya University.
2. The main purpose of this project is to build our own
search engine that suffices for our needs as a
nation.
This project also attempts to add customized
features to the search engine, such as building and
developing a time-based search engine that is meant
to deal with local and international news.
3. Question: What is a search engine?
How does a web search engine work?
Web crawler, indexing, ranking
Lucene, Nutch, Solr
Who uses Solr?
Set up Nutch for web crawling
Set up Solr for search
Run Nutch in Eclipse for development
Experiments
4. Answer: Software that
builds an index on text
answers queries using that index
A search engine offers
Scalability
Relevance ranking
Integration of different data sources (email,
web pages, files, databases, ...)
5. A search engine operates in the following order:
1. Web crawling
2. Indexing
3. Ranking
6. A web crawler is a program or automated script that browses the
World Wide Web.
It is used to create a copy of all the visited pages for later
processing by a search engine.
It starts with a list of URLs to visit, called the seeds.
URLs are visited recursively according to a set of policies:
A selection policy
A re-visit policy
A politeness policy
A parallelization policy
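The crawl loop described on this slide can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not how Nutch implements crawling: the in-memory FAKE_WEB graph, the fetch_links callable, and the max_pages, delay, and allowed_hosts parameters are hypothetical names introduced only for this example. It covers a simple selection policy and a politeness policy; re-visit and parallelization policies are left out.

```python
import time
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": URL -> list of outgoing links.
# A real crawler would download and parse each page instead.
FAKE_WEB = {
    "http://a.example/": ["http://a.example/news", "http://b.example/"],
    "http://a.example/news": ["http://a.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": [],
}


def crawl(seeds, fetch_links, max_pages=10, delay=0.0, allowed_hosts=None):
    """Breadth-first crawl starting from the seed URLs.

    fetch_links(url) must return the list of links found on that page.
    Returns the URLs in the order they were visited.
    """
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, to avoid loops
    visited = []
    last_hit = {}             # host -> time of the last request

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        host = urlparse(url).netloc

        # Selection policy (assumed): only visit hosts on the allow list.
        if allowed_hosts is not None and host not in allowed_hosts:
            continue

        # Politeness policy (assumed): wait `delay` seconds per host.
        elapsed = time.monotonic() - last_hit.get(host, float("-inf"))
        if elapsed < delay:
            time.sleep(delay - elapsed)
        last_hit[host] = time.monotonic()

        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


def fake_fetch(url):
    """Look up links in the in-memory graph instead of the network."""
    return FAKE_WEB.get(url, [])


all_pages = crawl(["http://a.example/"], fake_fetch)
only_a = crawl(["http://a.example/"], fake_fetch, allowed_hosts={"a.example"})
```

With the sample graph, the unrestricted crawl visits the four pages in breadth-first order, while the restricted crawl stays on a.example.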
7. The indexing process covers how data is collected, parsed,
and stored to facilitate fast and accurate search query
evaluation.
The process involves the following steps:
Data collection
Data traversal
Indexing
8. Indexing process:
Convert document
Extract text and meta data
Normalize text (stop-word removal, stemming)
Write (inverted) index
Example:
Document 1: "Apache Lucene at Jazoon"
Document 2: "Jazoon conference"
Index:
apache -> 1
conference -> 2
jazoon -> 1, 2
lucene -> 1
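The example above can be reproduced with a short Python sketch. It is only an illustration of the idea of an inverted index, not Lucene's implementation: the STOP_WORDS set and the build_inverted_index function are assumptions made for this example, and stemming is omitted.

```python
import re
from collections import defaultdict

# Assumed stop-word list, just large enough for this example.
STOP_WORDS = {"at"}


def build_inverted_index(docs):
    """Map each normalized term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Normalize: lowercase, split into word tokens, drop stop words.
        for token in re.findall(r"\w+", text.lower()):
            if token not in STOP_WORDS:
                index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}


docs = {1: "Apache Lucene at Jazoon", 2: "Jazoon conference"}
index = build_inverted_index(docs)
for term, ids in index.items():
    print(term, "->", ", ".join(map(str, ids)))
```

The printed lines match the index on the slide: "at" is dropped as a stop word and "Jazoon" is lowercased, so it is found in both documents.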
9. The web search engine responds to a query that a user
enters in order to satisfy his or her
information needs.
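As a rough illustration of answering a query from an index, the sketch below looks up each query term and ranks documents by how many query terms they contain. The INDEX literal and the search function are hypothetical, and counting matched terms is a deliberately simple stand-in; real engines such as Lucene use more elaborate relevance scoring.

```python
from collections import Counter

# Small hand-written inverted index: term -> ids of documents containing it.
INDEX = {"apache": [1], "conference": [2], "jazoon": [1, 2], "lucene": [1]}


def search(query, index):
    """Return document ids, best match first.

    The score of a document is the number of query terms it contains
    (an assumed, simplistic ranking used only for this example).
    """
    scores = Counter()
    for term in query.lower().split():
        for doc_id in index.get(term, []):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


results = search("Jazoon conference", INDEX)
```

Here document 2 matches both query terms and is ranked ahead of document 1, which matches only one.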
10.
11. Lucene is a high-performance, scalable information retrieval
(IR) library
lets you add searching capabilities to your
applications.
free, open source project implemented in Java
With Lucene, you can index and search email
messages, mailing-list archives, instant messenger
chats, your wiki pages…the list goes on.
12. Nutch: web search engine software
Open source web crawler
Coded entirely in the Java programming language
Advantages
Scalability
Crawler Politeness
Crawler Management
Quality
13. Solr: an open-source enterprise search platform based on
the Apache Lucene project.
Powerful full-text search, hit highlighting, faceted
search
Database integration, and rich document (e.g.,
Word, PDF) handling
14.
15. Download a binary package (apache-nutch-bin.zip)
cd apache-nutch-1.X/
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
Now you should be able to see the following directories
created:
crawl/crawldb
crawl/linkdb
crawl/segments
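A quick way to confirm that the crawl produced the directories listed above is a small check script. This is only a convenience sketch: the missing_crawl_dirs helper is a hypothetical name, and it assumes the crawl was run with "-dir crawl" from the current working directory.

```python
from pathlib import Path

# Subdirectories the slide says the crawl should create.
EXPECTED = ("crawldb", "linkdb", "segments")


def missing_crawl_dirs(crawl_dir="crawl"):
    """Return the expected subdirectories that do not exist under crawl_dir."""
    root = Path(crawl_dir)
    return [name for name in EXPECTED if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_crawl_dirs("crawl")
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means all three directories are present.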
16. If you already have a Solr core set up and wish to index
to it, use:
bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
The following slides show how to set up your Solr instance
and index your crawl data.
17. Download binary file (apache-Solr-bin.zip)
cd ${APACHE_SOLR_HOME}/example
java -jar start.jar
After starting Solr, you should be
able to access the admin console at the following link:
http://localhost:8983/solr/admin/
Integrate Solr with Nutch
cp ${NUTCH_RUNTIME_HOME}/conf/schema.xml ${APACHE_SOLR_HOME}/example/solr/conf/
18. Restart Solr with the command "java -jar start.jar"
under ${APACHE_SOLR_HOME}/example
Run the Solr index command:
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
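Once indexing has finished, the index can be queried over HTTP. The sketch below builds a request to Solr's select handler and reads the JSON response. The helper names solr_select_url and search_solr are hypothetical, and the example assumes the single-core layout used above, where the handler is reachable at /select under the base URL; with a named core, the base URL would need to include the core name.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_select_url(base_url, query, rows=10):
    """Build a select URL asking Solr for a JSON response."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return base_url.rstrip("/") + "/select?" + params


def search_solr(base_url, query, rows=10):
    """Run the query against a running Solr instance and return the matching documents."""
    with urlopen(solr_select_url(base_url, query, rows)) as resp:
        return json.load(resp)["response"]["docs"]


url = solr_select_url("http://127.0.0.1:8983/solr/", "nutch", rows=5)
print(url)
```

Calling search_solr requires Solr to be running; building the URL alone does not touch the network, which makes it easy to paste the result into a browser first.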