Instasearch -- Eclipse IT 2010
1. An Eclipse Plug-in for Code Search
using Full-text
Information Retrieval Engine
Andrejs Jermakovics
Francesco Di Cerbo
Free University of Bolzano-Bozen
Bolzano-Bozen, Italy
2. Introduction
● Search is becoming an important aspect of
software development motivated by
growing code sizes and open-source
availability.
● Moreover, users (and also developers) are
getting used to full-text searches for
emails and file contents.
● Why not create a plugin to support
full-text searches in Eclipse?
3. Searches in Java: Lucene
● Lucene is an information retrieval library.
● It is FLOSS and is used in many other software projects:
  ● Alfresco
  ● Solr
  ● Eclipse
  ● ...
4. Lucene in Eclipse
● Lucene is part of Eclipse.
● NOT for the Search functions!
● It is mostly used for the Eclipse Help!
● … Why not use it for CODE?
5. Instasearch
● InstaSearch provides powerful and flexible
code search with high performance.
● It offers flexibility through Lucene query
syntax, defining a set of specific fields for
file searches.
● The currently available fields are:
Field     Description
file      Full path of the file
name      Name of the file
ext       Extension of the file
proj      Name of the project containing the file
jar       Name of the jar if the file is stored in a jar
contents  Contents of the file (default search field)
ws        Working set containing projects (virtual field)
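
To make the field list concrete, here is a minimal sketch of how some of these values could be derived for one file. It assumes a workspace-relative path of the form /Project/folder/File.ext; the class and method names are hypothetical and not taken from InstaSearch, and the jar and ws fields are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not InstaSearch code: derives field values from a path.
class FileFieldsSketch {

    // Assumes a workspace-relative path such as "/MyProject/src/app/Main.java",
    // where the first segment is the project name.
    static Map<String, String> fieldsFor(String workspacePath, String contents) {
        Map<String, String> fields = new LinkedHashMap<>();
        String name = workspacePath.substring(workspacePath.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        String[] segments = workspacePath.split("/");

        fields.put("file", workspacePath);
        fields.put("name", name);
        // Files without a dot, or dotfiles such as ".project", get an empty extension.
        fields.put("ext", dot > 0 ? name.substring(dot + 1) : "");
        // segments[0] is empty because the path starts with '/'.
        fields.put("proj", segments.length > 1 ? segments[1] : "");
        fields.put("contents", contents);
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(fieldsFor("/MyProject/src/app/Main.java", "class Main {}"));
    }
}
```

In a real index each entry would become one field of the indexed document, so that a query such as proj:MyProject ext:java can filter on it.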
6. Instasearch searches
● Supported search types:
  ● wildcard searches that match on a substring:
    ● app* initialize
  ● searches on field values:
    ● proj:MyProject ext:java,xml application init
    ● ws:MyWorkingSet application init
  ● fuzzy searches to find similar matches:
    ● application init~
  ● advanced queries:
    ● index AND (directory OR dir)
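
Fuzzy searches such as init~ rely on Lucene matching terms that lie within a small edit distance of the query term. The sketch below computes plain Levenshtein distance to illustrate the idea; it is not Lucene's implementation, and the exact metric and threshold depend on the Lucene version. The class name is hypothetical.

```java
// Illustrative only: classic two-row Levenshtein distance.
class FuzzyMatchSketch {

    // Number of single-character insertions, deletions, and substitutions
    // needed to turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(editDistance("init", "unit"));
    }
}
```

A term one edit away from the query, such as unit for init, would be a plausible fuzzy match, while a term many edits away would not.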
11. Instasearch components (1/2)
● Analyzer
  ● Reads files from the workspace and splits the
    text into a set of tokens. Both the original
    word and its split parts are indexed, so a
    search can match part of an identifier as well
    as the exact identifier.
● Indexer
  ● Collects files with their tokens and writes
    them to the Lucene index. The metadata
    associated with each file is stored in several
    fields, which can later be used in a Lucene
    search query to filter results.
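
The Analyzer's behaviour of indexing both the whole word and its parts can be sketched as follows. This is a minimal illustration, assuming identifiers are split on underscores and camelCase boundaries and that tokens are lower-cased; the slides do not state the actual splitting rules, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not the InstaSearch analyzer.
class IdentifierTokenSketch {

    // Returns the lower-cased identifier followed by its lower-cased parts.
    // An identifier that cannot be split yields a single token.
    static List<String> tokens(String identifier) {
        List<String> out = new ArrayList<>();
        out.add(identifier.toLowerCase(Locale.ROOT));
        // Split on underscores, on lower-to-upper transitions (fileName),
        // and between an acronym and a following word (XMLParser).
        String[] parts = identifier.split(
                "_+|(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");
        if (parts.length > 1) {
            for (String part : parts) {
                if (!part.isEmpty()) {
                    out.add(part.toLowerCase(Locale.ROOT));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("getFileName"));
        System.out.println(tokens("MAX_SIZE"));
    }
}
```

With tokens like these in the index, a query for file can find getFileName, while a query for the full identifier still matches exactly.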
12. Instasearch components (2/2)
● Query analyzer
  ● Parses the search text entered by the user and
    creates a search query, which is used to
    retrieve the files from the index. It also
    processes the search queries further, e.g. to
    handle virtual fields.
● Instasearch View
  ● Handles all UI interactions for getting the
    search text and displaying a list of matching
    files. The search is performed while the search
    text is being typed, which lets the user refine
    it quickly for more relevant results.
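
One way a virtual field such as ws could be handled is to rewrite it into filters on real fields before the query reaches the index. The sketch below assumes that a working set is simply a named list of projects and that ws:Name expands to an OR over proj terms; both the approach and all names are assumptions for illustration, not the plug-in's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of virtual-field expansion.
class VirtualFieldSketch {

    // Replaces each "ws:Name" term with "(proj:A OR proj:B ...)" using the
    // supplied working-set table; all other terms pass through unchanged.
    static String expandWorkingSets(String query, Map<String, List<String>> workingSets) {
        List<String> out = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            List<String> projects =
                    term.startsWith("ws:") ? workingSets.get(term.substring(3)) : null;
            if (projects != null && !projects.isEmpty()) {
                List<String> alternatives = new ArrayList<>();
                for (String project : projects) {
                    alternatives.add("proj:" + project);
                }
                out.add("(" + String.join(" OR ", alternatives) + ")");
            } else {
                out.add(term); // unknown working set or ordinary term
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        Map<String, List<String>> sets = Map.of("MyWorkingSet", List.of("Core", "UI"));
        System.out.println(expandWorkingSets("ws:MyWorkingSet application init", sets));
    }
}
```

The rewritten text uses only fields that exist in the index, so it can be handed to the regular Lucene query parser.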
13. Conclusions
● Instasearch is very fast and, thanks to a
dynamic update mechanism, does not interfere
with usual Eclipse tasks.
● It is released under the EPL license, is
available on the Eclipse Market, and is
hosted on the Free University of
Bolzano-Bozen FLOSS forge.
15. Thanks for your attention!
Andrejs Jermakovics
andrejs.jermakovics@unibz.it
Francesco Di Cerbo
fdicerbo@unibz.it
https://code.inf.unibz.it/projects/instasearch
Mark Instasearch as favourite on
the Eclipse Market!