23 April 2012
SEARCH SYSTEM ARCHITECTURE & MEASURING SEARCH EFFECTIVENESS
Introduction to text mining – Warsaw University of Technology
Plan

 Findwise – who we are, what we do.
 General architecture of search engines
 Data sources
 Content processing
 Search index
 Query and result processing
 Security in search engines
 Applications based on search
 Leading search technologies
 The concept of Findability
 Differences in online and enterprise search
 Measuring search effectiveness
 Questions and answers
Findwise – Search Driven Solutions

 • Founded in 2005


 • Offices in Sweden, Denmark,
     Norway and Poland


 • 75+ employees


 Our objective is to be a leading provider of Findability solutions utilising
 the full potential of search technology to create customer business value.


 •   Paweł Wróblewski – search enthusiast
General architecture of search engines
Important terms

 Feeding
 Indexing
 Searching
 Latency
Data sources

 Anything that holds information is a good source!
 We need a connector to feed the data into a search system:
         Take the content
         Take the metadata
         Take the security information
 Different strategies to feed the data:
         Push – external applications invoke the search system connector’s API
           to feed the content (e.g. transactional systems)
         Pull – the connector periodically scans the source and takes the data
           (e.g. web crawler, file system)
         Hybrid – external systems dump the data, which is then pulled by a
           connector
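
 The push and pull strategies above can be sketched as follows. This is a minimal illustration, not the API of any particular search product: the Document and Connector classes, their fields and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Document:
    """What a connector takes from a source (hypothetical structure)."""
    doc_id: str
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)
    acl: List[str] = field(default_factory=list)  # security information


class Connector:
    """Hypothetical connector that hands documents to an indexing callback."""

    def __init__(self, index: Callable[[Document], None]) -> None:
        self._index = index

    def push(self, doc: Document) -> None:
        # Push: an external application calls the connector's API.
        self._index(doc)

    def pull(self, source: Iterable[Document]) -> int:
        # Pull: the connector scans the source itself (run periodically).
        count = 0
        for doc in source:
            self._index(doc)
            count += 1
        return count


indexed: List[Document] = []
connector = Connector(indexed.append)
connector.push(Document("order-1", "New order", {"type": "order"}, ["sales"]))
pulled = connector.pull([Document("file-1", "Report"), Document("file-2", "Memo")])
```

 A hybrid setup would combine the two: the external system writes documents to a drop location and the same pull method reads them from there.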
Content Processing – the idea

 Document → processing stages → index

 Processing stages shown in the diagram:
         Format conversion
         Language detection
         Synonyms
         Spell checking
         Lemmas (tenses, forms)
         Taxonomy classification
         Vectorizer
         Custom PLUG-IN
         Entities (geography, companies, people)
         Scopifier

 Example document:
         PARIS (Reuters) - Venus Williams raced into the second round of
         the $11.25 million French Open Monday, brushing aside
         Bianka Lamade, 6-3, 6-3, in 65 minutes.

         The Wimbledon and U.S. Open champion, seeded second, breezed
         past the German on a blustery center court to become the
         first seed to advance at Roland Garros. "I love being here, I
         love the French Open and more than anything I'd love to do
         well here," the American said.

Input: byte stream
Output: structured document ready to be indexed
Content Processing – the implementation

 Hydra is used to refine content before it reaches the index. Every
 document fetched from a source runs through a targeted pipeline
 made up of a number of stages. A stage can be thought of as an
 “app” in the App Store or the Android Market. Findwise has created
 a large number of such stages, each with one small purpose: to
 enhance the content of the item. Additional stages can be created
 to provide customer-specific functionality.
Hydra - example

 Select the stages to use in the pipeline. The left column corresponds to the
 “market”; the right column lists the stages in use.
Hydra - example

 Modify the format of the date to only include year.
Hydra - example

 The new year meta-data can be used as a facet
Hydra - example

 Map every author field to a metadata field called author.
 [Figure: Pipeline A and Pipeline B]
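
 The two example stages above (reduce a date to its year, map author-like fields to one author field) can be sketched as plain functions chained into a pipeline. This is only an illustration of the idea and not Hydra's actual API; the function names, the source field names and the sample document are assumptions.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]
Stage = Callable[[Doc], Doc]


def year_only(doc: Doc) -> Doc:
    # Derive a "year" field from an ISO-style date such as "2012-04-23".
    out = dict(doc)
    if "date" in out:
        out["year"] = out["date"][:4]
    return out


def map_author(doc: Doc) -> Doc:
    # Move the first author-like field found into a single "author" field.
    out = dict(doc)
    for name in ("creator", "written_by", "Author"):  # hypothetical field names
        if name in out:
            out["author"] = out.pop(name)
            break
    return out


def run_pipeline(doc: Doc, stages: List[Stage]) -> Doc:
    # Each stage receives the output of the previous one.
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(
    {"title": "Search 101", "date": "2012-04-23", "creator": "Jane Doe"},
    [year_only, map_author],
)
```

 Because every stage has the same signature, a different pipeline is just a different list of stages, and the derived year field is then available as a facet.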
Hydra - example

 In the search result…
Search index – the problem

 Input: structured document (content + metadata)
 Output: binary representation of an inverted index optimised for speed
 and accuracy
 The search index has a flat structure – no internal relations
 Changes to the index structure usually require an index rebuild
 (re-indexing)
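
 As a reminder of what the output looks like, here is a toy in-memory inverted index. It is a sketch under simplifying assumptions (whitespace tokenisation, lower-casing only, AND semantics, no binary encoding); the function names and sample documents are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    # Map every term to the set of document ids that contain it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index: Dict[str, Set[str]], query: str) -> List[str]:
    # Return the documents that contain every query term.
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


docs = {"d1": "LCD monitor sale", "d2": "plasma TV sale", "d3": "LCD TV"}
index = build_inverted_index(docs)
hits = search_all(index, "lcd")
```

 The structure is flat, as noted above: terms point at document ids and nothing else, so adding a new field or changing tokenisation means building the index again.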
Search index – the problem

 Inverted index
         Theory in previous lectures
 How to achieve
         Petabytes of indexed data
         Thousands of queries per second
         Thousands of index updates per second
 FAST Enterprise Search Platform – search cluster example
         [Figure: Search Cluster, a grid of Indexing / Search nodes
         numbered Node 00 to Node NM; one axis is labelled "Index split",
         the other "Index mirror"]
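
 One way to read the cluster grid is that splitting the index spreads documents over several nodes, while mirroring keeps full copies of each split to share the query load. The sketch below illustrates that reading only. It is not how FAST ESP routes traffic; the hash-based placement, the (column, row) node naming and the function names are assumptions.

```python
import zlib
from typing import List, Tuple

Node = Tuple[int, int]  # (column = index split, row = index mirror), assumed naming


def column_for(doc_id: str, columns: int) -> int:
    # A stable hash decides which index split holds the document.
    return zlib.crc32(doc_id.encode("utf-8")) % columns


def nodes_for_update(doc_id: str, columns: int, rows: int) -> List[Node]:
    # An update goes to one split, on every mirror of that split.
    col = column_for(doc_id, columns)
    return [(col, row) for row in range(rows)]


def nodes_for_query(query_no: int, columns: int, rows: int) -> List[Node]:
    # A query must visit every split, but only one mirror (round robin here).
    row = query_no % rows
    return [(col, row) for col in range(columns)]


update_targets = nodes_for_update("doc-42", columns=3, rows=2)
query_targets = nodes_for_query(5, columns=3, rows=2)
```

 Under this reading, more splits let the cluster hold more data and more mirrors let it answer more queries per second, which matches the two scaling goals listed on the slide.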
Search index – the implementation

 To keep updates (index rebuilds) efficient, the index is split into
 several partitions of different sizes

 [Figure: four index partitions]

 A small partition rebuilds quickly, unlike a big one
 Rebuilding a larger partition involves merging in the index from the
 smaller one(s)
 Rebuilds can be triggered by: number of rebuild operations, number
 of documents, percent of total volume
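
 The merge behaviour can be sketched with lists standing in for partitions. This is a simplified illustration that uses only the "number of documents" trigger; the limits, the function name and the document ids are hypothetical.

```python
from typing import List


def add_document(partitions: List[List[str]], limits: List[int], doc_id: str) -> None:
    """Add a document to the smallest partition and merge upwards when full.

    partitions[0] is the smallest partition; limits[i] is the maximum number
    of documents partition i may hold (the last partition is unlimited).
    """
    partitions[0].append(doc_id)
    for level in range(len(partitions) - 1):
        if len(partitions[level]) <= limits[level]:
            break
        # Rebuild the larger partition by merging in the smaller one.
        partitions[level + 1].extend(partitions[level])
        partitions[level].clear()


partitions: List[List[str]] = [[], [], []]
limits = [2, 4]
for number in range(1, 8):
    add_document(partitions, limits, f"d{number}")
sizes = [len(p) for p in partitions]
```

 Most additions touch only the smallest partition, so the expensive rebuild of the largest partition happens rarely.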
Query processing

 Query: “Do you have an LCD monitor under $900?”

 Processing stages shown in the diagram, with the examples attached to them:
         Tokenizer
         Spell-checking
         Phrasing
         Anti-phrasing: “Do you have a”
         Normalization
         NLQ: “Under $900?” → price < 900
         Lemmas, Synonyms: LCD monitors, TFT monitors
         Thesaurus: Flat TV, Plasma TV
         PLUG-IN: YES! X = LCD monitor, BUY( X )
         Geography
         Adaptive Evaluation
 Use “Product” collection
 Rank profile = “Profit margin”
 Output: modified query
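
 A few of these stages can be sketched on the example query. The code below covers only anti-phrasing, the price limit and a toy synonym lookup; the regular expressions, the synonym table and the output format are assumptions made for illustration, not the behaviour of a specific engine.

```python
import re
from typing import Dict, List, Optional

# Anti-phrasing: a leading phrase that carries no search intent.
ANTI_PHRASE = re.compile(r"^\s*do you have an?\s+", re.IGNORECASE)
# Natural language query: "under $900" becomes a numeric price filter.
PRICE_LIMIT = re.compile(r"\bunder \$(\d+)", re.IGNORECASE)
# Toy synonym table (hypothetical content).
SYNONYMS: Dict[str, List[str]] = {"lcd monitor": ["tft monitor"]}


def rewrite(query: str) -> Dict[str, object]:
    text = ANTI_PHRASE.sub("", query.strip().rstrip("?"))
    max_price: Optional[int] = None
    match = PRICE_LIMIT.search(text)
    if match:
        max_price = int(match.group(1))
        text = PRICE_LIMIT.sub("", text)
    # Normalization: lower-case and collapse whitespace.
    phrase = " ".join(text.lower().split())
    return {"terms": [phrase] + SYNONYMS.get(phrase, []), "max_price": max_price}


modified = rewrite("Do you have an LCD monitor under $900?")
```

 The result is a structured query (search terms plus a price filter) rather than the original sentence, which is what the slide calls the modified query.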
Result processing

 The following issues might apply to result processing:
 Ranking generation
         Factors that can be considered: number of hits, proximity of hits,
           freshness (date), web measures (e.g. PageRank), business and context
           factors (boosting or blocking)
 Search federation
         Integration of results from multiple search engines: round robin,
            normalized ranks, searchlets (multiple result lists presented in
            different ways).
 Security trimming
         Filtering out the results that do not match the user’s credentials
         Last-second check
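
 Two of these steps, round-robin federation and security trimming, can be sketched as follows. This is a minimal illustration; the group-based access lists, the function names and the sample data are assumptions.

```python
from itertools import zip_longest
from typing import Dict, List, Set


def round_robin(*result_lists: List[str]) -> List[str]:
    # Take the first hit of every engine, then the second, and so on,
    # skipping documents that were already returned by another engine.
    merged: List[str] = []
    seen: Set[str] = set()
    for tier in zip_longest(*result_lists):
        for doc_id in tier:
            if doc_id is not None and doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


def security_trim(results: List[str], acl: Dict[str, Set[str]],
                  user_groups: Set[str]) -> List[str]:
    # Keep a hit only if the user shares a group with the document's access
    # list; documents without an access list are dropped.
    return [doc_id for doc_id in results if acl.get(doc_id, set()) & user_groups]


merged = round_robin(["a", "b", "c"], ["b", "d"])
acl = {"a": {"staff"}, "b": {"hr"}, "c": {"staff", "hr"}, "d": {"board"}}
visible = security_trim(merged, acl, {"staff"})
```

 Trimming after federation keeps the check as late as possible, which is the point of the last-second check: permissions are evaluated against what is about to be shown.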
Security in search solution

 Content-level Security
 Search Application Security
 Secure Server Environment
Search Based Applications




Search Driven Solutions = Customisation of search system components
Catalogue of Search Based Applications

 Corporate Search
         • Intranets/portals
         • Information gateways
         • Expertise location
         • ECM repositories
         • Collaboration
         • Knowledge Management
         • Enterprise apps
 Intelligence System
         • Market intelligence
         • Customer intelligence
         • Surveillance
         • IP protection
         • Fraud detection
         • eDiscovery
         • Quality Management
         • Information risk management
 Database Offloading
         • Data warehouse
         • Data transformation
         • Data caches
 Commerce Systems
         • Search merchandising
         • Customer analytics
         • Campaign management
         • Call centre enablement
         • Customer self-service
 Media Systems
         • Public news syndication
         • Multimedia search
         • Proprietary research and publications
         • Libraries

 All of the above are built on:
         Search subsystem
         Data connectors – out of the box, custom made
         Repositories – Web, Databases, Files, Enterprise systems
Leading search engine technologies

 • HP / Autonomy IDOL
 • Microsoft (SharePoint and FAST Search products)
 • Google Search Appliance (GSA)
 • IBM Content Analytics/OmniFind
 • Oracle Secure Enterprise Search/Endeca
 • Apache Lucene/Solr (Open source)
 • Exalead CloudView

 • and more…
Comparison of different technology vendors
  What is the goal of Enterprise Findability (EF)?
  How should EF improve business?
  What user groups are targeted?
  What do the users want and need?
  What information is available and where is it stored?
  How should EF be rolled out and governed?
  What costs are involved?
  Are there any IT strategy considerations?
  Vendor mapping provides an answer to which EF platform best matches the
   overall requirements in the short and long term

  Comparison dimensions shown in the diagram:
         Core search technology
         Usability
         Vendor capabilities
         Connectivity and security
         Total cost of ownership
Findability – what is it?

 Dimensions of Findability shown in the figure:
         Business (needs & goals)
         Users (needs & capabilities)
         Search Technology (labelled "SEARCH <simple>" in the figure)
         Information (quality & structure)
         Organisation (ownership & governance)
 Scales in the figure:
         Business value gained from search technology: Negligible to High
         Use of search technology/platform: Basic to Advanced

 – a holistic approach to leverage business value with search technology
Online vs. Enterprise Search

 According to Stephen E. Arnold, “The New Landscape of Enterprise
 Search”, Pandia, July 2011
Measuring the search effectiveness

 Enterprise case
 Relevance of search results is highly subjective
 Search must be closely tied to the business, otherwise it is not worth
 considering
         Increase income or reduce costs
 Take into consideration all the dimensions of Findability:
         Business: Needs & Goals
         Users: Needs & Capabilities
         Information: Quality & Structure
         Organization: Ownership & Governance
         Search Technology: correctness of implementation
 Tools: reviews, workshops, presentations, strategy drafting, audits,
 etc.
Measuring the search effectiveness

 Online case
 Relevance of search results is highly subjective
 Search must be closely tied to the business, otherwise it is not worth
 considering
         Increase conversion rate
 Verification of search functions and their impact on conversion rate
         Run an isolated test for each identified feature
         Create a score based on a weighted average
Measuring the search effectiveness

 Online case
 Example: excerpt from an audit final report. Parts of the pages are cut off
 in the source; missing text is marked […].

 Overall benchmark
 Test categories designed for the purpose of the audit are generally
 applicable to any kind of search service or solution. Nevertheless, some of
 them are less and some are more important in a specific application such as
 an online Yellow Pages (YP) catalogue. That is why each test is assigned a
 weight that represents its importance and influence on the whole YP
 solution. The defined weights are described in the following table.

 Test    Name                         Weight [1-5]  Remarks
 I.a     Keyword match                5             This is a basic feature of any full-text search system and it mostly influences the overall precision of search.
 I.b     Wildcard expansion           2             Users of YP solutions rarely use such features.
 I.c     Accuracy of result           4             The importance of properly assigned categories to registered entries is high since it influences usability and relevance of categories.
         categories
 I.d     Query operators              1             Users of YP solutions hardly ever use such features.
 I.e     Exact phrases                3             It might be important to catch an exact phrase in a search, preventing any background processing.
 II.a    Lemmatization                5             This is a must-have for any kind of search, especially for the Polish language.
 II.b    Synonym expansion            3             It is useful to improve recall of search, thus preventing zero results.
 II.c    Spellchecking                4             Very useful feature as people tend to make simple spelling mistakes while typing at a keyboard.
 II.d    Anti-phrasing                3             It is useful not to search for irrelevant and meaningless terms.
 II.e    Name and phrase recognition  3             It is useful to capture some multi-word expressions or names as a whole – in single meaning.
 II.f    Natural Language Processing  2             Very advanced yet hard to implement feature.
 III.a   Navigation                   4             Very useful feature enabling easy to use and intuitive […]

 Further rows of the weight table (test names cut off in the source):
 […]                                                […] actively find and filter items in a map.
 […]g by […]ce […]ased                3             It is a useful feature that aids in finding items closest to the selected position.
 […]stions                            3             Useful feature enabling mining the neighborhood of the selected item.
 […]h starting […]                    4             As important as first impression and encouraging users to interact with the service.
 […]esult page                        3             It is important not to miss any category to offer opportunity another kind of search, content or advertisements.
 […]h […]mance                        5             Extremely important factor in online search solutions.

 […] the results reported for each single test are composed of the two
 following elements:
         Overall benchmark
         Cumulated results for test groups
 […]mark score is presented in the following chart.
 [Chart: Overall weighted scores; vertical axis 0 to 6; series: iFind, PKT, PF]
 […]alculation the overall benchmark can be expressed as cumulative weighted
 score […]es 1-10. The ideal hypothetic search system should achieve score 10.
 […]re as follows for the conducted tests: […]
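
 The weighted average can be computed as in the sketch below. The weights are the ones listed for tests I.a to I.e in the audit excerpt; the per-test scores and the assumption that each test is scored on a scale up to 10 are hypothetical and are not results from the audit.

```python
from typing import Dict

# Weights [1-5] for tests I.a to I.e, as listed in the audit excerpt.
WEIGHTS: Dict[str, int] = {"I.a": 5, "I.b": 2, "I.c": 4, "I.d": 1, "I.e": 3}


def weighted_score(scores: Dict[str, float], weights: Dict[str, int]) -> float:
    # Weighted average: sum(score * weight) / sum(weight) over the scored tests.
    total_weight = sum(weights[test] for test in scores)
    if total_weight == 0:
        raise ValueError("at least one scored test must have a non-zero weight")
    return sum(scores[test] * weights[test] for test in scores) / total_weight


# Hypothetical per-test scores for one search system (not audit data).
example_scores = {"I.a": 8, "I.b": 2, "I.c": 6, "I.d": 1, "I.e": 10}
overall = weighted_score(example_scores, WEIGHTS)
```

 With these made-up scores the overall value is 99 / 15 = 6.6. A poor result on a low-weight test such as I.d barely moves the total, while I.a and I.c dominate it.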
QUESTIONS?
Paweł Wróblewski
pawel.wroblewski@findwise.com

 
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...Findwise
 
Findability Day 2015 - Martin White - The future is search!
Findability Day 2015 - Martin White - The future is search!Findability Day 2015 - Martin White - The future is search!
Findability Day 2015 - Martin White - The future is search!Findwise
 
Findability Day 2015 Liam Holley - Dassault systems - Insight and discovery...
Findability Day 2015   Liam Holley - Dassault systems - Insight and discovery...Findability Day 2015   Liam Holley - Dassault systems - Insight and discovery...
Findability Day 2015 Liam Holley - Dassault systems - Insight and discovery...Findwise
 
Findability Day 2015 Joachim Dahl - Virtual Works - 360 degree view of the ...
Findability Day 2015   Joachim Dahl - Virtual Works - 360 degree view of the ...Findability Day 2015   Joachim Dahl - Virtual Works - 360 degree view of the ...
Findability Day 2015 Joachim Dahl - Virtual Works - 360 degree view of the ...Findwise
 
Findability Day 2015 Anders Fors - Volvo Bus - A cost efficient R&D with EX...
Findability Day 2015   Anders Fors - Volvo Bus - A cost efficient R&D with EX...Findability Day 2015   Anders Fors - Volvo Bus - A cost efficient R&D with EX...
Findability Day 2015 Anders Fors - Volvo Bus - A cost efficient R&D with EX...Findwise
 
Logganalys med Elastic & Findwise
Logganalys med Elastic & FindwiseLogganalys med Elastic & Findwise
Logganalys med Elastic & FindwiseFindwise
 
BigData med logganalys
BigData med logganalysBigData med logganalys
BigData med logganalysFindwise
 
Intranet focus search strategy a z - from Findability Day 2014
Intranet focus search strategy a z - from Findability Day 2014Intranet focus search strategy a z - from Findability Day 2014
Intranet focus search strategy a z - from Findability Day 2014Findwise
 
Findability Day 2014 Neo4j how graph data boost your insights
Findability Day 2014 Neo4j how graph data boost your insightsFindability Day 2014 Neo4j how graph data boost your insights
Findability Day 2014 Neo4j how graph data boost your insightsFindwise
 
Martin White it's not the technology it's the content
Martin White it's not the technology it's the contentMartin White it's not the technology it's the content
Martin White it's not the technology it's the contentFindwise
 
Models and beer Findability Day 2014
Models and beer Findability Day 2014Models and beer Findability Day 2014
Models and beer Findability Day 2014Findwise
 
Designing the search experience the language of discovery - Findability Day 2014
Designing the search experience the language of discovery - Findability Day 2014Designing the search experience the language of discovery - Findability Day 2014
Designing the search experience the language of discovery - Findability Day 2014Findwise
 

More from Findwise (20)

White Arkitekter - Findability Day Roadshow 2017
White Arkitekter - Findability Day Roadshow 2017White Arkitekter - Findability Day Roadshow 2017
White Arkitekter - Findability Day Roadshow 2017
 
AI och maskininlärning - Findability Day Roadshow 2017
AI och maskininlärning - Findability Day Roadshow 2017AI och maskininlärning - Findability Day Roadshow 2017
AI och maskininlärning - Findability Day Roadshow 2017
 
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017De kognitiva eran med IBM Watson - Findability Day Roadshow 2017
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017
 
Findwise and IBM Watson
Findwise and IBM WatsonFindwise and IBM Watson
Findwise and IBM Watson
 
Digital workplace och informationshantering i office 365
Digital workplace och informationshantering i office 365Digital workplace och informationshantering i office 365
Digital workplace och informationshantering i office 365
 
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...
 
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any mess
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any messFindability Day 2015 - Abby Covert - Keynote - How to make sense of any mess
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any mess
 
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...
 
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
 
Findability Day 2015 - Martin White - The future is search!
Findability Day 2015 - Martin White - The future is search!Findability Day 2015 - Martin White - The future is search!
Findability Day 2015 - Martin White - The future is search!
 
Findability Day 2015 Liam Holley - Dassault systems - Insight and discovery...
Findability Day 2015   Liam Holley - Dassault systems - Insight and discovery...Findability Day 2015   Liam Holley - Dassault systems - Insight and discovery...
Findability Day 2015 Liam Holley - Dassault systems - Insight and discovery...
 
Findability Day 2015 Joachim Dahl - Virtual Works - 360 degree view of the ...
Findability Day 2015   Joachim Dahl - Virtual Works - 360 degree view of the ...Findability Day 2015   Joachim Dahl - Virtual Works - 360 degree view of the ...
Findability Day 2015 Joachim Dahl - Virtual Works - 360 degree view of the ...
 
Findability Day 2015 Anders Fors - Volvo Bus - A cost efficient R&D with EX...
Findability Day 2015   Anders Fors - Volvo Bus - A cost efficient R&D with EX...Findability Day 2015   Anders Fors - Volvo Bus - A cost efficient R&D with EX...
Findability Day 2015 Anders Fors - Volvo Bus - A cost efficient R&D with EX...
 
Logganalys med Elastic & Findwise
Logganalys med Elastic & FindwiseLogganalys med Elastic & Findwise
Logganalys med Elastic & Findwise
 
BigData med logganalys
BigData med logganalysBigData med logganalys
BigData med logganalys
 
Intranet focus search strategy a z - from Findability Day 2014
Intranet focus search strategy a z - from Findability Day 2014Intranet focus search strategy a z - from Findability Day 2014
Intranet focus search strategy a z - from Findability Day 2014
 
Findability Day 2014 Neo4j how graph data boost your insights
Findability Day 2014 Neo4j how graph data boost your insightsFindability Day 2014 Neo4j how graph data boost your insights
Findability Day 2014 Neo4j how graph data boost your insights
 
Martin White it's not the technology it's the content
Martin White it's not the technology it's the contentMartin White it's not the technology it's the content
Martin White it's not the technology it's the content
 
Models and beer Findability Day 2014
Models and beer Findability Day 2014Models and beer Findability Day 2014
Models and beer Findability Day 2014
 
Designing the search experience the language of discovery - Findability Day 2014
Designing the search experience the language of discovery - Findability Day 2014Designing the search experience the language of discovery - Findability Day 2014
Designing the search experience the language of discovery - Findability Day 2014
 

Recently uploaded

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 

Recently uploaded (20)

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 

Architecture of Search Systems and Measuring the Search Effectiveness

  • 2. SEARCH SYSTEM ARCHITECTURE & MEASURING SEARCH EFFECTIVENESS Introduction to text mining – Warsaw University of Technology
  • 3. Plan: Findwise – who we are, what we do; General architecture of search engines; Data sources; Content processing; Search index; Query and result processing; Security in search engines; Applications based on search; Leading search technologies; The concept of Findability; Differences in online and enterprise search; Measuring search effectiveness; Questions and answers
  • 4. Findwise – Search Driven Solutions • Founded in 2005 • Offices in Sweden, Denmark, Norway and Poland • 75+ employees Our objective is to be a leading provider of Findability solutions utilising the full potential of search technology to create customer business value. • Paweł Wróblewski – search enthusiast
  • 5. General architecture of search engines
  • 6. Important terms: feeding, indexing, searching, latency
  • 7. Data sources. Anything that holds information is a good source! We need a connector to feed the data into a search system: it takes the content, the metadata and the security information. Different strategies to feed the data: Push – external applications invoke the search system connector’s API to feed the content (e.g. transactional systems). Pull – the connector periodically scans the source and takes the data (e.g. web crawler, file system). Hybrid – external systems dump the data, which is then pulled by a connector.
  • 8. Content Processing – the idea. Stages shown in the diagram, applied to a document on its way to the index: format conversion, language detection, synonyms, spell checking, lemmas (tenses, forms), taxonomy classification, vectorizer, custom plug-in, entities (geography, companies, people), scopifier. Example document: PARIS (Reuters) - Venus Williams raced into the second round of the $11.25 million French Open Monday, brushing aside Bianka Lamade, 6-3, 6-3, in 65 minutes. The Wimbledon and U.S. Open champion, seeded second, breezed past the German on a blustery center court to become the first seed to advance at Roland Garros. "I love being here, I love the French Open and more than anything I'd love to do well here," the American said. Input: byte stream. Output: structured document ready to be indexed.
  • 9. Content Processing – the implementation. Hydra is used to refine content before it hits the index. Every document fetched from a source runs through a targeted pipeline, which consists of a number of stages. A stage can be thought of as an “app” in the App Store or the Android market. Findwise has created a large number of such stages, each with one small purpose: to enhance the content of the item. Additional stages can be created to serve customer-specific functionality.
  • 10. Hydra - example. Select the stages to use in the pipeline: the left column corresponds to the “market”, and the right column lists the stages in use.
  • 11. Hydra - example Modify the format of the date to only include year.
  • 12. Hydra - example The new year meta-data can be used as a facet
  • 13. Hydra - example Map every author field to a metadata field called author. Pipeline A Pipeline B
  • 14. Hydra - example In the search result…
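The pipeline idea from the Hydra slides can be sketched in a few lines of Python. This is not Hydra's actual API: the stage functions, the `run_pipeline` helper, the assumed `YYYY-MM-DD` date format and the source field names (`creator`, `written_by`, `dc_author`) are all hypothetical and chosen only to mirror the two examples above (reducing a date to a year, and mapping every author field to one `author` field).

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Stage = Callable[[Document], Document]


def year_only(doc: Document) -> Document:
    """Add a 'year' field taken from a date assumed to look like 'YYYY-MM-DD'."""
    out = dict(doc)  # copy, so the input document is left untouched
    if "date" in out:
        out["year"] = out["date"][:4]
    return out


def map_author(doc: Document) -> Document:
    """Move the first author-like field found into a single 'author' field."""
    out = dict(doc)
    for key in ("creator", "written_by", "dc_author"):  # hypothetical field names
        if key in out:
            out["author"] = out.pop(key)
            break
    return out


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Run the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


raw = {"title": "Search basics", "date": "2012-04-23", "creator": "P. Wroblewski"}
processed = run_pipeline(raw, [year_only, map_author])
print(processed)
```

Because each stage only reads and returns a document, stages can be reordered or reused across pipelines, which is the property the “app market” analogy relies on.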
  • 15. Search index – the problem. Input: structured document (content + metadata). Output: binary representation of an inverted index optimised for speed and accuracy. The search index has a flat structure – no internal relations. Changes to the index structure usually require an index rebuild (re-indexing).
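As a reminder of what an inverted index is, here is a minimal in-memory sketch. It is only an illustration: a real engine stores a compact binary structure, and the whitespace tokenisation, the function names and the sample documents are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lower-cased term to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


docs = {
    1: "Venus Williams French Open",
    2: "French Open second round",
    3: "LCD monitor",
}
index = build_inverted_index(docs)
print(search_all_terms(index, "french open"))
```

The structure is flat, as the slide says: each term points straight at document ids, with no relations between documents.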
  • 16. Search index – the problem. Inverted index: theory in previous lectures. How to achieve: petabytes of indexed data; thousands of queries per second; thousands of index updates per second. Diagram: FAST Enterprise Search Platform search cluster example, a grid of Indexing / Search Nodes numbered 00 … NM, labelled “Index split” along one dimension and “Index mirror” (M) along the other.
  • 17. Search index – the implementation. In order to perform effective updates (index rebuilds), several index partitions are produced. A small partition rebuilds quickly, unlike a big one. Rebuilding a larger partition involves merging in the index from the smaller one(s). Rebuilds can be triggered by: number of rebuild operations, number of documents, percent of total volume.
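The partitioning scheme can be sketched as follows. This is a simplified illustration with only two partitions and a single trigger (number of documents in the small partition); the class name, the `max_small_docs` parameter and the in-memory dictionaries are assumptions, not the design of any particular engine.

```python
from typing import Dict, List, Set

Index = Dict[str, Set[int]]


def merge_indexes(small: Index, large: Index) -> Index:
    """Build a new large partition that also contains the small partition."""
    merged = {term: set(ids) for term, ids in large.items()}
    for term, ids in small.items():
        merged.setdefault(term, set()).update(ids)
    return merged


class PartitionedIndex:
    def __init__(self, max_small_docs: int = 2) -> None:
        self.small: Index = {}
        self.large: Index = {}
        self.small_doc_count = 0
        self.max_small_docs = max_small_docs  # hypothetical rebuild trigger

    def add(self, doc_id: int, text: str) -> None:
        """Index into the small partition; merge once it holds enough documents."""
        for term in text.lower().split():
            self.small.setdefault(term, set()).add(doc_id)
        self.small_doc_count += 1
        if self.small_doc_count >= self.max_small_docs:
            self.large = merge_indexes(self.small, self.large)
            self.small, self.small_doc_count = {}, 0

    def search(self, term: str) -> List[int]:
        """A query has to consult every partition."""
        key = term.lower()
        return sorted(self.small.get(key, set()) | self.large.get(key, set()))


idx = PartitionedIndex(max_small_docs=2)
idx.add(1, "LCD monitor")
idx.add(2, "TFT monitor")   # reaches the threshold and triggers a merge
idx.add(3, "Plasma TV")     # stays in the small partition for now
print(idx.search("monitor"), idx.search("tv"))
```

New documents become searchable after a cheap update of the small partition, while the expensive rebuild of the large partition happens only when the trigger fires.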
  • 18. Query processing. Query: “Do you have an LCD monitor under $900?” Stages shown in the diagram: tokenizer, spell-checking, anti-phrasing, phrasing, normalization, lemmas, synonyms, NLQ, thesaurus, geography, custom plug-in, adaptive evaluation. Labels shown in the diagram: “Do you have a”; “Under $900?”; price < 900; X = LCD monitor; BUY( X ); LCD monitors, TFT monitors, Flat TV, Plasma TV. Modified query: use “Product” collection; rank profile = “Profit margin”.
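Two of the steps on this slide, anti-phrasing and turning “under $900” into a price filter, can be sketched with the slide's own example query. The anti-phrase list, the regular expression and the output field names are assumptions made for illustration; a production query pipeline would rely on dictionaries and linguistic components rather than a hard-coded list.

```python
import re
from typing import Dict

# Hypothetical anti-phrases: leading words that carry no search intent.
ANTI_PHRASES = ("do you have an", "do you have a", "do you have")

# Matches e.g. "under $900" and captures the amount.
PRICE_PATTERN = re.compile(r"\bunder\s+\$(\d+)", re.IGNORECASE)


def process_query(raw: str) -> Dict[str, object]:
    """Split a natural-language query into search terms and a price limit."""
    text = raw.strip().rstrip("?").lower()

    max_price = None
    match = PRICE_PATTERN.search(text)
    if match:
        max_price = int(match.group(1))
        text = (text[:match.start()] + text[match.end():]).strip()

    for phrase in ANTI_PHRASES:
        if text.startswith(phrase + " "):
            text = text[len(phrase):].strip()
            break

    return {"terms": text, "max_price": max_price}


result = process_query("Do you have an LCD monitor under $900?")
print(result)
```

The remaining terms would then go through the other stages (lemmas, synonyms, thesaurus) before the modified query is sent to the index.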
  • 19. Result processing. The following issues might apply to result processing: Ranking generation – factors that can be considered: number of hits, proximity of hits, freshness (date), web measures (e.g. page rank), business and context factors (boosting or blocking). Search federation – integration of results from multiple search engines: round robin, normalized ranks, searchlets (multiple result lists presented in different ways). Security trimming – filtering out the results that do not match the user’s credentials; last-second check.
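Round-robin federation and security trimming can be sketched as two small functions. The function names, the ACL representation (document id mapped to a set of allowed groups) and the deny-by-default rule for documents without an ACL entry are assumptions for this example.

```python
from itertools import zip_longest
from typing import Dict, List, Set


def round_robin(*result_lists: List[str]) -> List[str]:
    """Interleave ranked lists from several engines, dropping duplicates."""
    merged: List[str] = []
    seen: Set[str] = set()
    for tier in zip_longest(*result_lists):  # shorter lists are padded with None
        for doc_id in tier:
            if doc_id is not None and doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


def security_trim(results: List[str],
                  acl: Dict[str, Set[str]],
                  user_groups: Set[str]) -> List[str]:
    """Keep only results whose ACL shares at least one group with the user.

    Documents without an ACL entry are hidden (deny by default, an assumption).
    """
    return [doc_id for doc_id in results if acl.get(doc_id, set()) & user_groups]


federated = round_robin(["a", "b", "c"], ["b", "d"])
acl = {"a": {"staff"}, "b": {"hr"}, "d": {"staff", "hr"}}
visible = security_trim(federated, acl, {"staff"})
print(federated, visible)
```

Trimming runs after federation here so that the user's credentials are checked against the final list, in the spirit of the “last-second check” on the slide.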
  • 20. Security in search solution: search application security; content-level security; secure server environment
  • 21. Search Based Applications Search Driven Solutions = Customisation of search system components
  • 22. Catalogue of Search Based Applications
      Corporate Search: intranets/portals; information gateways; expertise location; ECM repositories; collaboration; knowledge management; enterprise apps
      Intelligence Systems: market intelligence; customer intelligence; surveillance; IP protection; fraud detection; eDiscovery; quality management; information risk management
      Database Offloading: data warehouse; data transformation; data caches
      Commerce Systems: search merchandising; customer analytics; campaign management; call centre enablement; customer self-service
      Media Systems: public news syndication; multimedia search; proprietary research and publications; libraries
      Underlying layers: Search subsystem; Data connectors – out of the box, custom made; Repositories – Web, Databases, Files, Enterprise systems
  • 23. Leading search engine technologies • HP / Autonomy IDOL • Microsoft (SharePoint and FAST Search products) • Google Search Appliance (GSA) • IBM Content Analytics/OmniFind • Oracle Secure Enterprise Search/Endeca • Apache Lucene/Solr (Open source) • Exalead CloudView • and more…
  • 24. Comparison of different technology vendors. Questions to answer: What is the goal of Enterprise Findability (EF)? How should EF improve business? What user groups are targeted? What do the users want and need? What information is available and where is it stored? How should EF be rolled out and governed? What costs are involved? Are there any IT strategy considerations? Comparison dimensions shown in the diagram: core search technology, usability, vendor capabilities, total cost of ownership, connectivity and security. Vendor mapping provides an answer to which EF platform best matches the overall requirements in the short and long term.
  • 25. Findability – what is it? A holistic approach to leverage business value with search technology. Dimensions: Business (needs & goals); Users (needs & capabilities); Information (quality & structure); Organisation (ownership & governance); Search Technology. Diagram axes: business value gained from search technology (negligible to high) against use of search technology/platform (basic to advanced).
  • 26. Online vs. Enterprise Search According to Stephen E. Arnold, „The New Landscape of Enterprise Search”, Pandia, July 2011
  • 27. Online vs. Enterprise Search According to Stephen E. Arnold, „The New Landscape of Enterprise Search”, Pandia, July 2011
  • 28. Measuring the search effectiveness: Enterprise case. Relevance of search results is highly subjective. Search is tightly bound to the business; otherwise it is not important to consider. The goal: increase income or reduce costs. Take into consideration all the dimensions of Findability: Business (needs & goals); Users (needs & capabilities); Information (quality & structure); Organization (ownership & governance); Search Technology (correctness of implementation). Tools: reviews, workshops, presentations, strategy drafting, audits etc.
  • 29. Measuring the search effectiveness: Online case. Relevance of search results is highly subjective. Search is tightly bound to the business; otherwise it is not important to consider. The goal: increase the conversion rate. Verify the search functions and their impact on the conversion rate. Run isolated tests for each identified feature. Create a score based on a weighted average.
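A score based on a weighted average can be computed as the sum of weight times test score, divided by the sum of weights. The sketch below assumes weights on a 1 to 5 scale and per-test scores on a 0 to 10 scale; the test names and all numbers are hypothetical, not measured results.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, int]) -> float:
    """Weighted average of per-test scores; every scored test needs a weight."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("at least one test must have a non-zero weight")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical weights (importance of each feature) and test scores.
weights = {"keyword_match": 5, "wildcard_expansion": 2,
           "lemmatization": 5, "spellchecking": 4}
scores = {"keyword_match": 9.0, "wildcard_expansion": 4.0,
          "lemmatization": 7.0, "spellchecking": 6.0}

overall = weighted_score(scores, weights)
print(overall)  # (45 + 8 + 35 + 24) / 16 = 7.0
```

Dividing by the sum of weights keeps the overall score on the same scale as the individual test scores, so a system that scores 10 on every test also scores 10 overall.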
  • 30. Measuring the search effectiveness – Online case. Example: audit, the Final Report.
      The results reported for each single test are composed of the two following elements: overall benchmark; cumulated results for test groups.
      Overall benchmark. Test categories designed for the purpose of the audit are generally applicable to any kind of search service or solution. Nevertheless, some of them are less and some are more important in a specific application such as an online Yellow Pages (YP) catalogue. That is why each test is assigned a weight that represents its importance and influence on the whole YP solution. The defined weights are described in the following table.
      Rows from a second, partly cut-off page (test names are truncated in the source):
      … | … | … actively find and filter items in a map.
      … | 3 | It is a useful feature that aids in finding items closest to the selected position.
      … | 3 | Useful feature enabling mining the neighborhood of the selected item.
      … | 4 | As important as the first impression, and encouraging users to interact with the service.
      … | 3 | It is important not to miss any opportunity to offer another kind of search, content or advertisements.
      … | 5 | Extremely important factor in online search solutions.
      … score is presented in the following chart. [Chart: Overall weighted scores; vertical axis 0 to 6; labels iFind, PKT, PF]
      … the overall benchmark can be expressed as a cumulative weighted score … 1-10. The ideal hypothetical search system should achieve a score of 10. …re as follows for the conducted tests:
      Test name | Weight [1-5] | Remarks
      I.a Keyword match | 5 | This is a basic feature of any full-text search system and it mostly influences the overall precision of search.
      I.b Wildcard expansion | 2 | Users of YP solutions rarely use such features.
      I.c Accuracy of result categories | 4 | The importance of properly assigned categories to registered entries is high since it influences usability and relevance of categories.
      I.d Query operators | 1 | Users of YP solutions hardly ever use such features.
      I.e Exact phrases | 3 | It might be important to catch an exact phrase in a search, preventing any background processing.
      II.a Lemmatization | 5 | This is a must for any kind of search, especially for the Polish language.
      II.b Synonym expansion | 3 | It is useful to improve recall of search, thus preventing zero results.
      II.c Spellchecking | 4 | Very useful feature as people tend to make simple spelling mistakes while typing at the keyboard.
      II.d Anti-phrasing | 3 | It is useful not to search for irrelevant and meaningless terms.
      II.e Name and phrase recognition | 3 | It is useful to capture some multi-word expressions or names as a whole – in a single meaning.
      II.f Natural Language Processing | 2 | Very advanced yet hard to implement feature.
      III.a Navigation | 4 | Very useful feature enabling easy to use and intuitive