Full-text searching with Marjory

Markus Wolff




                     
What's Marjory?

        A webservice for full-text indexing and searching of documents
        Written in PHP
        Based on Zend Framework
        (Very) roughly comparable to Solr
        BSD-licensed, available on Google Code
    




                                
How does Marjory work?
        Your application
            Sends document data or location via POST
            Sends search terms via GET
            Receives the result in the desired output format (default: XML)

        Marjory (ReST-based webservice)
            Stores document data in the search engine
            Queries the search engine
            Receives the query results

        Search engine (default: Lucene)
Features

        Search engine abstraction
            Use the engine that suits your needs, just write a small adaptor class
            Zend_Search_Lucene built-in by default
        Multiple search catalogs
            Index many sites with one dedicated search server
            Put all documents matching any criteria into separate search indexes to speed up search
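
A rough sketch of what "adaptor class plus multiple catalogs" could look like. Marjory itself is written in PHP and its real adaptor interface is not shown in these slides, so `SearchEngineAdapter`, `InMemoryEngine` and `CatalogRegistry` are hypothetical names, written in Python purely for illustration.

```python
from typing import Protocol


class SearchEngineAdapter(Protocol):
    """Hypothetical adaptor interface; Marjory's real PHP interface may differ."""

    def add_document(self, uri: str, fields: dict[str, str]) -> None: ...

    def search(self, query: str) -> list[str]: ...


class InMemoryEngine:
    """Toy engine: matches a query term against whitespace-split field text."""

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, str]] = {}

    def add_document(self, uri: str, fields: dict[str, str]) -> None:
        self._docs[uri] = fields

    def search(self, query: str) -> list[str]:
        term = query.lower()
        return sorted(
            uri
            for uri, fields in self._docs.items()
            if any(term in text.lower().split() for text in fields.values())
        )


class CatalogRegistry:
    """One engine instance per catalog, so each catalog has its own index."""

    def __init__(self, engine_factory) -> None:
        self._factory = engine_factory
        self._catalogs: dict[str, SearchEngineAdapter] = {}

    def catalog(self, name: str = "default") -> SearchEngineAdapter:
        if name not in self._catalogs:
            self._catalogs[name] = self._factory()
        return self._catalogs[name]


registry = CatalogRegistry(InMemoryEngine)
registry.catalog("site-a").add_document("doc-1", {"title": "Marjory search"})
registry.catalog("site-b").add_document("doc-2", {"title": "Other site"})
hits_a = registry.catalog("site-a").search("marjory")
hits_b = registry.catalog("site-b").search("marjory")
```

Swapping the engine then only means passing a different factory to the registry; the calling code stays the same.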

                                     
More features

        Two ways to index documents:
            Submit an XML snippet containing any content you want to index
            Or just submit a URI (valid PHP stream resource) and let Marjory extract the content from the document
                HTML supported by default (for now)
                Add your own document parser class to extract plain text from any other document format (or special markup structures)
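
To give an idea of what a document parser has to do, here is a minimal sketch that reduces HTML to plain text. It uses Python's standard `html.parser` and is not Marjory's PHP parser; the class name `HtmlTextExtractor` and the helper `extract_plain_text` are made up for this example.

```python
from html.parser import HTMLParser


class HtmlTextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_plain_text(html: str) -> str:
    parser = HtmlTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Marjory</h1><p>Search as a service</p></body></html>"
)
plain = extract_plain_text(sample)
```

A parser for another format would follow the same pattern: take the raw document, return the plain text that should end up in the index.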
                                         
Even more features

        Index documents asynchronously using Dropr as a messaging service
            Dropr: PHP-based durable messaging service
            Example webservice and Dropr client included with Marjory
            Application does not need to wait for document retrieval, parsing and adding to the index
            More info: www.dropr.org
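
The idea behind asynchronous indexing, sketched with Python's standard `queue` and `threading` modules as a stand-in. This is not Dropr's API, and unlike Dropr the in-memory queue here is not durable; `submit`, `worker` and the `index` dictionary are illustrative names only.

```python
import queue
import threading

index: dict[str, str] = {}       # stand-in for the search index
jobs: queue.Queue = queue.Queue()


def worker() -> None:
    """Consumes queued documents and adds them to the index."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel: stop the worker
            jobs.task_done()
            break
        uri, content = job
        index[uri] = content     # retrieval and parsing would happen here
        jobs.task_done()


def submit(uri: str, content: str) -> None:
    """Returns immediately; the caller does not wait for indexing."""
    jobs.put((uri, content))


thread = threading.Thread(target=worker, daemon=True)
thread.start()
submit("doc-1", "Lorem ipsum")
submit("doc-2", "dolor sit amet")
jobs.put(None)
jobs.join()                      # only for the demo: wait until the queue is drained
thread.join()
```

The application only pays for `submit`; everything slow happens in the worker.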
        




                                   
Latest additions

        Search results as a Dojo.Data compatible JSON data source
        API exposure via JSON-RPC as alternative to XML over ReST (experimental!)
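
For orientation: Dojo's `ItemFileReadStore` reads a JSON object with `identifier`, `label` and `items` keys. Whether Marjory emits exactly this envelope is an assumption; the sketch below only shows the general shape, with made-up sample documents and a hypothetical helper `to_dojo_data`.

```python
import json


def to_dojo_data(docs, identifier="id", label="name"):
    """Wraps result documents in an ItemFileReadStore-style envelope."""
    return json.dumps(
        {"identifier": identifier, "label": label, "items": list(docs)},
        sort_keys=True,
    )


payload = to_dojo_data([
    {"id": "MA147LL/A", "name": "Apple 60 GB iPod Black"},
    {"id": "EN7800GTX/2DHTV/256M", "name": "ASUS Extreme N7800GTX"},
])
decoded = json.loads(payload)
```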




                               
How to add a catalog

        Send a POST request to:
        http://marjory.example.com/rest/catalog/
        Containing this XML snippet:
        <add catalog="MyGloriousCatalog" />
        Et voilà, you've got yourself a new search index
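
A minimal client-side sketch of that request, using only Python's standard library. The host is the example host from the slide; the `Content-Type` header and the helper name `build_add_catalog_request` are assumptions, not something Marjory is known to require.

```python
import urllib.request
from xml.sax.saxutils import quoteattr

BASE_URL = "http://marjory.example.com/rest"  # example host from the slides


def build_add_catalog_request(name: str) -> urllib.request.Request:
    """Builds (but does not send) the POST request that creates a catalog."""
    # quoteattr() adds the surrounding quotes and escapes special characters.
    body = f"<add catalog={quoteattr(name)} />".encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/catalog/",
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},  # assumed header
        method="POST",
    )


request = build_add_catalog_request("MyGloriousCatalog")
# To actually send it: urllib.request.urlopen(request)
```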
    




                               
Adding a document

        Make a POST request to:
        http://marjory.example.com/rest/add/
        Send the document content as XML like this:

    <add catalog="default">
      <doc uri="MyUniqueDocumentId">
        <field name="title">Marjory: Search as a service</field>
        <field name="abstract">
          An epic novel about full-text indexing in an SOA environment
        </field>
        <field name="content">Lorem ipsum dolor sit amet... (to be continued)</field>
      </doc>
    </add>
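
Building that XML by string concatenation breaks as soon as a field contains `&` or `<`. A small sketch of generating it with Python's `xml.etree.ElementTree` instead; the function name `build_add_document` is hypothetical, and the element and attribute names are the ones from the slide.

```python
import xml.etree.ElementTree as ET


def build_add_document(catalog: str, uri: str, fields: dict[str, str]) -> bytes:
    """Serialises one document into the <add> envelope used by /rest/add/."""
    add = ET.Element("add", {"catalog": catalog})
    doc = ET.SubElement(add, "doc", {"uri": uri})
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode").encode("utf-8")


payload = build_add_document(
    "default",
    "MyUniqueDocumentId",
    {"title": "Marjory: Search as a service", "content": "Lorem & ipsum"},
)
root = ET.fromstring(payload)  # round-trip to check the result is well-formed
```

The resulting bytes would be the body of the POST request to `/rest/add/`.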


                                                
Adding a document, the easy way

        Or, if Marjory should retrieve and parse the document:
    <add catalog="default">
      <doc src="http://my.website.tld/my/document.html" />
    </add>
        If you have many and/or complex documents, better use Dropr to send messages to Marjory
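
The short form is just as easy to generate. A sketch with Python's `xml.etree.ElementTree`; `build_add_by_source` is a hypothetical helper name, the `src` attribute is the one from the slide.

```python
import xml.etree.ElementTree as ET


def build_add_by_source(catalog: str, src: str) -> str:
    """Builds the short <add> form that only names the document's location."""
    add = ET.Element("add", {"catalog": catalog})
    ET.SubElement(add, "doc", {"src": src})
    return ET.tostring(add, encoding="unicode")


snippet = build_add_by_source("default", "http://my.website.tld/my/document.html")
parsed = ET.fromstring(snippet)
```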


                                
Searching for documents

        Make a GET request including the query terms:
        http://marjory.example.com/rest/select?q=Marjory
        Additional parameters to...
            Limit number of results
            Include only specific fields in response
            Specify a search catalog
                Default catalog name: "default" - who would have guessed?
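
A sketch of assembling the search URL with proper escaping. Only the `q` parameter appears in the slides; the names `rows`, `fl` and `catalog` for the additional parameters are placeholders chosen for this example, so check Marjory's documentation for the real ones.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SELECT_URL = "http://marjory.example.com/rest/select"  # example host from the slides


def build_search_url(query, limit=None, fields=None, catalog=None):
    """Builds the GET URL; only `q` is taken from the slides."""
    params = {"q": query}
    if limit is not None:
        params["rows"] = str(limit)      # hypothetical parameter name
    if fields:
        params["fl"] = ",".join(fields)  # hypothetical parameter name
    if catalog is not None:
        params["catalog"] = catalog      # hypothetical parameter name
    return f"{SELECT_URL}?{urlencode(params)}"


simple_url = build_search_url("Marjory")
full_url = build_search_url("full text", limit=10, fields=["title", "abstract"])
full_params = parse_qs(urlsplit(full_url).query)
```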


                                         
Search response format

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <responseHeader>
    <status>0</status><QTime>1</QTime>
  </responseHeader>

  <result numFound="2" start="0">
   <doc>
    <str name="id">MA147LL/A</str>
    <str name="name">Apple 60 GB iPod Black</str>
   </doc>
   <doc>
    <str name="id">EN7800GTX/2DHTV/256M</str>
    <str name="name">ASUS Extreme N7800GTX</str>
   </doc>
  </result>
</response>
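
On the client side, a response of this shape can be unpacked in a few lines. A sketch with Python's `xml.etree.ElementTree`; the helper `parse_response` is hypothetical and assumes every field arrives as a `<str>` element, as in the sample.

```python
import xml.etree.ElementTree as ET

RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<response>
  <responseHeader>
    <status>0</status><QTime>1</QTime>
  </responseHeader>
  <result numFound="2" start="0">
   <doc>
    <str name="id">MA147LL/A</str>
    <str name="name">Apple 60 GB iPod Black</str>
   </doc>
   <doc>
    <str name="id">EN7800GTX/2DHTV/256M</str>
    <str name="name">ASUS Extreme N7800GTX</str>
   </doc>
  </result>
</response>
"""


def parse_response(xml_bytes: bytes):
    """Returns (status, number of hits, list of field dictionaries)."""
    root = ET.fromstring(xml_bytes)
    status = int(root.findtext("responseHeader/status"))
    result = root.find("result")
    docs = [
        {field.get("name"): field.text for field in doc.findall("str")}
        for doc in result.findall("doc")
    ]
    return status, int(result.get("numFound")), docs


status, num_found, docs = parse_response(RESPONSE)
```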


                                             
Looks familiar?

        Blatantly stolen from Solr :-)
        Why reinvent the wheel?
        Makes switching between the two projects easy if need be
        Don't like it? Try JSON-RPC instead.
    




                                 
Access control

        No access control provided by Marjory
        Use your webserver's authentication and ACL capabilities
        There are currently no plans to add anything built-in, unless someone convinces me otherwise :-)



                               
Things to do

        Fully unit-test the beast
        Add a nice admin GUI (currently in progress)
        Add other engines
        Support more document formats out of the box (PDF likely to be next addition)
        Fine-tuning (how about renaming or removing catalogs, for example?)

                                 
Is it production-ready?

        Yes, and it's already being used on production websites




                               
That's all, folks!

        More information:
            http://code.google.com/p/marjory/
            http://www.dropr.org/
            http://blog.wolff-hamburg.de/
        




                                     

Search As A Service

  • 2. What's Marjory?
      A webservice for full-text indexing and searching of documents
      Written in PHP
      Based on Zend Framework
      (Very) roughly comparable to Solr
      BSD-licensed, available on Google Code
  • 3. How does Marjory work?
      [Diagram: Your application <-> Marjory (ReST-based webservice) <-> Search engine (default: Lucene)]
      Your application sends document data or a document location to Marjory via POST
      Marjory stores the document data in the search engine
      Your application sends search terms to Marjory via GET
      Marjory queries the search engine, which returns the query results
      Marjory returns the result in the desired output format (default: XML)
  • 4. Features
      Search engine abstraction
        Use the engine that suits your needs; just write a small adaptor class
        Zend_Search_Lucene built in by default
      Multiple search catalogs
        Index many sites with one dedicated search server
        Put all documents matching any criteria into separate search indexes to speed up search
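The slides do not show what an adaptor class looks like, and Marjory's own adaptors are PHP classes. As a language-neutral illustration of the idea only, here is a minimal Python sketch in which every class and method name is hypothetical: an abstract adaptor interface, plus a toy in-memory engine that keeps one separate index per catalog.

```python
from abc import ABC, abstractmethod


class SearchEngineAdapter(ABC):
    """Hypothetical adaptor interface; not Marjory's actual PHP API."""

    @abstractmethod
    def add_document(self, catalog: str, uri: str, fields: dict) -> None:
        ...

    @abstractmethod
    def search(self, catalog: str, query: str) -> list:
        ...


class InMemoryAdapter(SearchEngineAdapter):
    """Toy engine: one dict per catalog, naive substring matching."""

    def __init__(self):
        self._catalogs = {}

    def add_document(self, catalog, uri, fields):
        # Each catalog gets its own index, so searches never cross catalogs.
        self._catalogs.setdefault(catalog, {})[uri] = fields

    def search(self, catalog, query):
        needle = query.lower()
        docs = self._catalogs.get(catalog, {})
        return [
            uri
            for uri, fields in docs.items()
            if any(needle in str(value).lower() for value in fields.values())
        ]


engine = InMemoryAdapter()
engine.add_document("default", "doc-1", {"title": "Marjory: Search as a service"})
engine.add_document("blog", "doc-2", {"title": "Unrelated post"})
hits = engine.search("default", "marjory")
```

Swapping in another engine would then mean writing one more subclass, while the calling code stays unchanged.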
  • 5. More features
      Two ways to index documents:
        Submit an XML snippet containing any content you want to index
        Or just submit a URI (valid PHP stream resource) and let Marjory extract the content from the document
          HTML supported by default (for now)
          Add your own document parser class to extract plain text from any other document format (or special markup structures)
  • 6. Even more features
      Index documents asynchronously using Dropr as a messaging service
        Dropr: PHP-based durable messaging service
        Example webservice and Dropr client included with Marjory
        Application does not need to wait for document retrieval, parsing and adding to the index
        More info: www.dropr.org
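The asynchronous pattern described on this slide can be sketched generically. The example below does not use Dropr or its API; it stands in an in-memory `queue.Queue` for the messaging service (so, unlike Dropr, it is not durable) and a placeholder function for retrieval and parsing. All names are illustrative assumptions.

```python
import queue
import threading

index = {}            # stands in for the search index
jobs = queue.Queue()  # stands in for the message queue (not durable)


def fetch_and_parse(uri):
    """Placeholder for slow retrieval and parsing of a document."""
    return "content of " + uri


def worker():
    # Consume jobs until a None sentinel arrives.
    while True:
        uri = jobs.get()
        if uri is None:
            jobs.task_done()
            break
        index[uri] = fetch_and_parse(uri)
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# The application only enqueues the job and can carry on immediately.
jobs.put("http://my.website.tld/my/document.html")

# For this demo only: wait until the queue is drained, then stop the worker.
jobs.join()
jobs.put(None)
thread.join()
```

The point of the design is that the application's request path contains only the cheap `put`; the slow work happens in the consumer.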
  • 7. Latest additions
      Search results as a Dojo.Data compatible JSON data source
      API exposure via JSON-RPC as an alternative to XML over ReST (experimental!)
  • 8. How to add a catalog
      Send a POST request to:
        http://marjory.example.com/rest/catalog/
      Containing this XML snippet:
        <add catalog="MyGloriousCatalog" />
      Et voilà, you've got yourself a new search index
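A client could prepare that request roughly as follows. This is a sketch using only the Python standard library; the URL and XML come from the slide, while the function name and the `Content-Type` header are assumptions (the slides do not say which content type Marjory expects). The request is built but not sent.

```python
import urllib.request
from xml.etree import ElementTree as ET

CATALOG_URL = "http://marjory.example.com/rest/catalog/"  # from the slide


def build_add_catalog_request(name):
    """Build (but do not send) the POST request that creates a catalog."""
    root = ET.Element("add", {"catalog": name})
    body = ET.tostring(root, encoding="utf-8")
    return urllib.request.Request(
        CATALOG_URL,
        data=body,
        method="POST",
        # Assumption: the slides do not name a required Content-Type.
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


request = build_add_catalog_request("MyGloriousCatalog")
# urllib.request.urlopen(request) would send it to a running Marjory instance.
```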
  • 9. Adding a document
      Make a POST request to:
        http://marjory.example.com/rest/add/
      Send the document content as XML like this:
        <add catalog="default">
          <doc uri="MyUniqueDocumentId">
            <field name="title">Marjory: Search as a service</field>
            <field name="abstract">
              An epic novel about full-text indexing in an SOA environment
            </field>
            <field name="content">Lorem ipsum dolor sit amet... (to be continued)</field>
          </doc>
        </add>
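Building this payload by string concatenation is error-prone once field values contain characters such as `&` or `<`. A minimal sketch that generates the same structure with Python's `xml.etree.ElementTree`, which escapes text on output, might look like this; the helper name is hypothetical, the element and attribute names are the ones on the slide.

```python
from xml.etree import ElementTree as ET


def build_add_document(catalog, uri, fields):
    """Return the <add> payload from the slide as UTF-8 bytes."""
    root = ET.Element("add", {"catalog": catalog})
    doc = ET.SubElement(root, "doc", {"uri": uri})
    for name, text in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = text  # ElementTree escapes &, < and > on output
    return ET.tostring(root, encoding="utf-8")


payload = build_add_document(
    "default",
    "MyUniqueDocumentId",
    {
        "title": "Marjory: Search as a service",
        "content": "Lorem ipsum & <more>",
    },
)
```

The resulting bytes would be sent as the body of the POST request to the `/rest/add/` URL shown on the slide.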
  • 10. Adding a document, the easy way
      Or, if Marjory should retrieve and parse the document:
        <add catalog="default">
          <doc src="http://my.website.tld/my/document.html" />
        </add>
      If you have many and/or complex documents, it is better to use Dropr to send messages to Marjory
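The `src` variant can be generated the same way. In this sketch the helper name is hypothetical, and the scheme check is a client-side sanity check only: which URIs Marjory can actually fetch depends on the PHP stream wrappers available on the server.

```python
from urllib.parse import urlparse
from xml.etree import ElementTree as ET


def build_add_by_src(catalog, src):
    """Return an <add> payload that asks Marjory to fetch `src` itself."""
    # Client-side sanity check only; the server decides what it can fetch.
    if not urlparse(src).scheme:
        raise ValueError("src must be an absolute URI: %r" % src)
    root = ET.Element("add", {"catalog": catalog})
    ET.SubElement(root, "doc", {"src": src})
    return ET.tostring(root, encoding="utf-8")


payload = build_add_by_src("default", "http://my.website.tld/my/document.html")
```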
  • 11. Searching for documents
      Make a GET request including the query terms:
        http://marjory.example.com/rest/select?q=Marjory
      Additional parameters to...
        Limit the number of results
        Include only specific fields in the response
        Specify a search catalog
          Default catalog name: "default" (who would have guessed?)
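Query terms need URL encoding before they go into the request. The sketch below builds the search URL with the standard library; the function name is hypothetical, and it only sets `q` because that is the one parameter the slides name (the names of the additional parameters are not given, so none are guessed here).

```python
from urllib.parse import urlencode

SELECT_URL = "http://marjory.example.com/rest/select"  # from the slide


def build_search_url(terms):
    """Return the GET URL for a query, with the terms URL-encoded."""
    return SELECT_URL + "?" + urlencode({"q": terms})


url = build_search_url("full-text search")
```

Here the space in the query is encoded as `+`, which is how `urlencode` handles form-style query strings.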
  • 13. Looks familiar?
      Blatantly stolen from Solr :-)
        Why reinvent the wheel?
        Makes switching between the two projects easy if need be
      Don't like it? Try JSON-RPC instead.
  • 14. Access control
      No access control provided by Marjory
        Use your webserver's authentication and ACL capabilities
      There are currently no plans to add anything built in, unless someone convinces me otherwise :-)
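If the webserver in front of Marjory is configured for HTTP Basic authentication (one common choice; the slides do not prescribe a scheme), the client has to send credentials with each request. A minimal sketch, with a hypothetical helper name and made-up credentials:

```python
import base64
import urllib.request


def with_basic_auth(request, username, password):
    """Attach an HTTP Basic Authorization header to a urllib request."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    return request


request = with_basic_auth(
    urllib.request.Request("http://marjory.example.com/rest/select?q=Marjory"),
    "indexer",  # hypothetical credentials
    "s3cret",
)
```

Basic authentication sends credentials in an easily decoded form, so it is normally combined with HTTPS.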
  • 15. Things to do
      Fully unit-test the beast
      Add a nice admin GUI (currently in progress)
      Add other engines
      Support more document formats out of the box (PDF likely to be the next addition)
      Fine-tuning (how about renaming or removing catalogs, for example?)
  • 16. Is it production-ready?
      Yes, and it's already being used on production websites
  • 17. That's all, folks!
      More information:
        http://code.google.com/p/marjory/
        http://www.dropr.org/
        http://blog.wolff-hamburg.de/