Using Lucene for Search within XIS

Allex Lyons, a programmer at Access Innovations, Inc., explains the company's decision to search XIS docsets through a Lucene index, which is faster, more reliable, and more efficient than the random access file search used previously.

Published in: Technology

  1. XIS Lucene Indexing and Search
  2. What is XIS?
       • XIS is an XML schema-based database system used to store user data
       • All records are stored in individual XML files
       • Option to zip XML files available with XIS Project DTD
  3. How XIS Data Is Stored
       • Docsets
           • Store records with multiple fields (similar to a SQL table)
           • Can also have subfields and lists of field values nested within a record
           • Can look up values from other fields in other Docsets or other tables
       • Tables
           • Store a single list of values
           • Can be referenced by other Docsets
           • Can be directly accessible for editing or kept hidden from user view
  4. How to Create a XIS Project
       • Create DTD file for XIS project
       • Specify MAI Thesaurus to link to project
       • Create Docset and Tables
       • Specify ID lengths for each Docset
       • Create fields for Docsets
       • Save DTD to dhserver/projects/projects/xml folder
       • Create XIS Project folder under dhserver/data
       • Create subfolders for each Docset under XIS Project folder as well as Tables directory
       • XIS Projects can only be created by administrators
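
The folder steps on this slide can be sketched in Java as follows. This is only an illustration of the layout under stated assumptions: the project name, the Docset names, and the literal folder name "Tables" are hypothetical, and the sketch writes under a temporary directory rather than a real Data Harmony installation. It does not create the DTD itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch only: creates the folder layout described on the slide.
// Project name, Docset names, and the "Tables" folder name are hypothetical.
class XisProjectLayoutSketch {

    static Path createProjectFolders(Path dhserver, String project, List<String> docsets)
            throws IOException {
        // Folder where the project DTD is saved.
        Files.createDirectories(dhserver.resolve("projects").resolve("projects").resolve("xml"));

        // XIS Project folder under dhserver/data.
        Path projectDir = dhserver.resolve("data").resolve(project);

        // One subfolder per Docset, plus a directory for Tables.
        for (String docset : docsets) {
            Files.createDirectories(projectDir.resolve(docset));
        }
        Files.createDirectories(projectDir.resolve("Tables"));
        return projectDir;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the real dhserver root.
        Path root = Files.createTempDirectory("dhserver-sketch");
        Path projectDir = createProjectFolders(root, "myproject", List.of("Articles", "Authors"));
        System.out.println("Created " + projectDir);
    }
}
```
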
  5. Starting a XIS Project
       • Start Data Harmony server where project is located
       • Log in to Admin module
       • Start MAI Thesaurus
       • Start XIS Project
       • Index XIS Project, especially if just created
       • Run startXis program
       • Enter server, port, thesaurus, username, and password to log in
  6. Indexing a XIS Project
  7. XIS Login Screen
  8. XIS Project View
  9. XIS Docset View
  10. XIS Table View
  11. XIS Record Format
       • Saved in XML file
       • Starts with tag to represent Docset name along with ID as attribute
       • Fields are listed within Docset tag along with values. Subfields are nested within their parent fields
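
To make the record format more concrete, the sketch below parses a made-up record with the JDK's DOM parser. The slides do not show an actual XIS file, so everything specific here is an assumption: the Docset name `Article`, the attribute name `id`, and the fields `Title`, `Author`, `FirstName`, and `LastName` are hypothetical. Only the overall shape follows the slide: a root tag named after the Docset, the ID as an attribute, fields as child elements, and subfields nested inside their parent field.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Sketch only: the record below is invented to match the shape described on the slide.
class XisRecordSketch {

    // Hypothetical record: root tag = Docset name, ID as attribute,
    // fields as children, subfields nested within their parent field.
    static final String SAMPLE =
          "<Article id=\"000042\">"
        + "<Title>Example title</Title>"
        + "<Author><FirstName>Jane</FirstName><LastName>Doe</LastName></Author>"
        + "</Article>";

    // Parses the XML text and returns the root (Docset) element.
    static Element parseRecord(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    // Returns the first direct child element with the given name, or null.
    static Element child(Element parent, String name) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node instanceof Element && node.getNodeName().equals(name)) {
                return (Element) node;
            }
        }
        return null;
    }

    // Returns the text of a direct child field, or null if the field is absent.
    static String fieldValue(Element parent, String fieldName) {
        Element field = child(parent, fieldName);
        return field == null ? null : field.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Element record = parseRecord(SAMPLE);
        System.out.println("Docset: " + record.getTagName());
        System.out.println("ID: " + record.getAttribute("id"));
        System.out.println("Title: " + fieldValue(record, "Title"));
        // Subfield lookup: step into the parent field first.
        System.out.println("Author last name: " + fieldValue(child(record, "Author"), "LastName"));
    }
}
```
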
  12. XIS Search View
  13. XIS Search Results
  14. Current XIS Indexing and Search
       • Uses text-based indexes
       • Creates large number of index files (one for each field)
       • Generates temporary files for results
       • Uses less reliable RandomAccessFile search
       • Has limited amount of search operands
       • Does not take into account numerical values
  15. Lucene vs. Current XIS Index
       • Fewer index files needed
       • Allows for broader searches
           • Fuzzy matching
           • Start and end wildcard searches
       • Recognizes numerical and date fields as such
       • Can be utilized to remove stopwords
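
Why it matters that numerical fields are recognized as such can be shown with a small, generic Java example. It is not taken from XIS or Lucene; it only illustrates, under the assumption that a text-based index compares values as strings, how sorting (and therefore range matching) goes wrong when numbers are treated as text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Generic illustration, not XIS or Lucene code: numbers compared as text vs. as numbers.
class NumericOrderSketch {

    // Sorts the values character by character, the way plain text is compared.
    static List<String> sortAsText(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Sorts the same values by their numeric value.
    static List<String> sortAsNumbers(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparingInt(Integer::parseInt));
        return copy;
    }

    public static void main(String[] args) {
        List<String> values = List.of("9", "10", "100", "25");
        System.out.println(sortAsText(values));    // prints [10, 100, 25, 9]
        System.out.println(sortAsNumbers(values)); // prints [9, 10, 25, 100]
    }
}
```

With text ordering, a range such as "9 to 100" would miss 25 and 10, because both sort before "9" as strings.
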
  16. New Lucene Search Process
       • Establish index reader to perform search
       • Submit query string containing fields and parameters
       • Return results
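
The slides do not show what a submitted query string looks like. As a rough sketch, the helper below assembles one in the syntax of Lucene's classic query parser (`field:term`, `~` for fuzzy matching, `*` for wildcards, `[a TO b]` for inclusive ranges). The helper class and the field names `author`, `title`, and `year` are hypothetical, and the sketch assumes simple values: it does not escape special characters or quote phrases. It builds the string only; opening the index reader and running the search are left out.

```java
// Sketch only: builds query strings in Lucene classic query parser syntax.
// The class, its methods, and the field names used in main are hypothetical.
class QueryStringSketch {

    // Exact term in a field, e.g. title:lucene
    static String term(String field, String value) {
        return field + ":" + value;
    }

    // Fuzzy match on a term, e.g. author:lyons~
    static String fuzzy(String field, String value) {
        return field + ":" + value + "~";
    }

    // End wildcard (terms starting with the given text), e.g. title:luc*
    static String startsWith(String field, String start) {
        return field + ":" + start + "*";
    }

    // Inclusive range, e.g. year:[2005 TO 2010]
    static String range(String field, String from, String to) {
        return field + ":[" + from + " TO " + to + "]";
    }

    // Requires all clauses to match.
    static String and(String... clauses) {
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        String query = and(
            fuzzy("author", "lyons"),
            startsWith("title", "luc"),
            range("year", "2005", "2010"));
        System.out.println(query); // prints author:lyons~ AND title:luc* AND year:[2005 TO 2010]
    }
}
```

One detail to be aware of for start wildcards: Lucene's classic `QueryParser` rejects a leading `*` or `?` unless `setAllowLeadingWildcard(true)` has been called on the parser.
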
  17. Other Lucene Functions
       • Will be used for adding, updating, and deleting XIS records
       • Indexes will be housed on Data Harmony server
  18. Any Questions?