Search Engine Face-Off
        Keyword Search versus Metadata Search
Don Miller, VP of Business Development   Val Orekhov, Chief Architect
(408) 828-3400                           (240) 450-2166 x103
donm@conceptsearching.com                val@portalsolutions.net
Agenda
• Introductions
• Concept Searching:
   • What is Metadata
   • Keyword vs. Metadata Search
   • Keyword vs. Metadata Costs
   • Google vs. SharePoint vs. FAST
• Portal Solutions:
   • Enterprise Search – Google vs. FAST in SharePoint 2010
   • Indexing Options
   • Approach to Security Trimming
   • Ranking Algorithms & Sorting Options
   • Metadata & Search Refinements
• Concept Searching - How Do I Apply Metadata:
   • Microsoft's approach to applying metadata
   • How to automate the Microsoft approach with conceptClassifier for SharePoint 2010
• Demo
Concept Searching, Inc.
• Company founded in 2002
   • Product launched in 2003
   • Focus on management of structured and unstructured information
• Technology
   • Automatic concept identification, content tagging, auto-classification, taxonomy management
   • Only statistical vendor that can extract conceptual metadata
• 2009 and 2010 "100 Companies that Matter in KM" (KMWorld Magazine)
• KMWorld "Trend Setting Product" of 2009 and 2010
• Locations: US, UK, & South Africa
• Client base: Fortune 500/1000 organizations
• Managed Partner under Microsoft global ISV Program - "go to partner" for Microsoft for auto-classification and taxonomy management
• Microsoft Enterprise Search ISV, FAST Partner
• Product Suite: conceptSearch, conceptTaxonomyManager, conceptClassifier, conceptClassifier for SharePoint, contentTypeUpdater for SharePoint
What is metadata

•   Metadata is a means to apply structure to unstructured or structured content or
    information. Metadata describes what the document is about.
•   Metadata makes it easier to find information.
•   There are usually multiple metadata terms per item or document.
•   Metadata can also be used for rights management, governance, retention code policies,
    sensitive information removal and of course improved findability.
What Is Keyword vs. Metadata Costing You?
Pre Migration
• Problem:
   • 60% of stored documents are obsolete
   • 50% of documents are duplicates
   • Requires resources to identify what should/should not be migrated
• Solution:
   • Eliminate duplicate documents
   • Identify privacy data exposures
   • Identify and declare records that were not previously identified
   • Identify high value content
   • Migrate required content to a structure
• Benefit:
   • Reduces migration costs
   • Ensures compliance and protection of content assets

Search
• Problem:
   • "It's not about better search"
   • Less than 50% of content is correctly indexed, meta tagged or efficiently searchable
   • 85% of relevant documents are never retrieved in search
• Solution:
   • Eliminate manual tagging & replace with automatic identification of multi-word concepts
   • Provide guided navigation via the taxonomy structure (i.e. concepts)
   • Go beyond dynamic clustering with conceptual clustering based on the taxonomies
• Benefit:
   • Taxonomy navigation is 36% - 48% faster
   • Savings of 2.5 hours per user per day

Records Management
• Problem:
   • 67% of data loss in Records Management is due to end user error
   • It costs an organization $180 per document to recreate it when it is not tagged correctly and cannot be found
• Solution:
   • Eliminate inconsistent end user tagging
   • Automatically declare documents of record based on vocabulary and retention codes
   • Automatically change the Content Type and route to the Records Management repository
• Benefit:
   • Savings of $4.00 - $7.04 per record by eliminating manual tagging
   • Ensures compliance and reduces potential litigation exposures

Data Privacy Protection
• Problem:
   • Average cost per exposed record is $197 and ranges from $90-$305 per record
   • 70% of breaches are due to a mistake or malicious intent by an organization's own staff
• Solution:
   • Identify any type of organizationally defined privacy data
   • Combines pattern matching with associated vocabulary
   • Automatic Content Type updating enabling workflows and rights management
• Benefit:
   • Average cost runs from $225K to $35M
USAF Human Performance Clearinghouse
GOAL: Leverage Existing USAF, AFDW, and AFMS License Agreements to Enable IM, RM, & Privacy & Security Compliance
Requirements
• DoDD 8320 (Data Sharing in a Net-Centric DoD)
• DoDD 5015 (Records Management)
• USAF Privacy Act Program & HIPAA
• Freedom of Information Act (FOIA)
[Diagram: Migration, Records Management, Data Privacy, Search, eDiscovery & FOIA]
Tel: 703.246.9360 | Fax: 240.465.1182
Distribution Statement A: Approved for public release; distribution is unlimited.
311 ABG/PA No. 09-488, 16 Oct 2009
What Type of Search or Information Architecture Do You Need?


Keyword Search = ~66%+ of results (Recall)
• Simple
• No administration
• Good enough

Metadata Search = 100% of results (Recall)
• Guided Navigation
• Records Management
• Sensitive Information Removal
• Collaboration
• Improved Precision and Recall
• Evolution of Keyword Search

Recall (information retrieval): a statistical measure (contrasted with precision), the fraction of all relevant material that is returned by a search query.
Precision (information retrieval): the percentage of documents returned that are relevant.
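
The two definitions above can be sketched as a small calculation. This is a minimal illustration only; the document IDs and the counts are made up for the example and do not come from the slides.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of all relevant documents that the query returned."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of returned documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical data: six relevant documents exist; the query returns five
# documents, four of which are relevant.
relevant_docs = {"d1", "d2", "d3", "d4", "d5", "d6"}
returned_docs = {"d1", "d2", "d3", "d4", "d9"}

print(f"recall    = {recall(returned_docs, relevant_docs):.2f}")
print(f"precision = {precision(returned_docs, relevant_docs):.2f}")
```

In this made-up case the query finds four of six relevant documents (recall of about 0.67) while four of its five results are relevant (precision of 0.80), which shows why the two measures have to be read together.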
Metadata Search vs. Keyword and Guided Navigation: "Proposal"

[Diagram: results (also known as "Recall") plotted against cost (time, money and complexity) for a search on "Proposal"]
• Entity Extraction: 20-33% of results
• Keyword Search: 33%
• Keyword + Synonym Search: 66%
• Metadata Search: 100% of results
• Related terms shown in the diagram: "Proposals", "Contract", "Documents of Record", "License Agreement", "License", "Software License", "SLA", "Licensee", "Addendum"

Entity extraction without complex rules is ineffective. It is just keyword match, which is what keyword search is, which is 33% effective.
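
A toy comparison may make the three tiers concrete. The sketch below is an assumption-laden illustration: the three-document corpus, the synonym list and the "Proposal" tag are invented so that the results line up with the 33% / 66% / 100% figures on the slide. It is not a measurement of any real engine.

```python
# Hypothetical corpus: every document belongs to the "Proposal" topic, but
# only one of them contains the word itself.
DOCS = {
    "doc1": {"text": "proposal for the new data center", "tags": {"Proposal"}},
    "doc2": {"text": "draft contract sent to the vendor", "tags": {"Proposal"}},
    "doc3": {"text": "software license addendum", "tags": {"Proposal"}},
}
SYNONYMS = {"proposal": {"proposal", "contract"}}  # assumed synonym list


def keyword_search(term):
    """Match documents whose text contains the exact term."""
    return {d for d, doc in DOCS.items() if term in doc["text"].split()}


def synonym_search(term):
    """Match documents whose text contains the term or one of its synonyms."""
    words = SYNONYMS.get(term, {term})
    return {d for d, doc in DOCS.items() if words & set(doc["text"].split())}


def metadata_search(tag):
    """Match documents tagged with the concept, whatever words they use."""
    return {d for d, doc in DOCS.items() if tag in doc["tags"]}


for name, hits in [
    ("keyword", keyword_search("proposal")),
    ("keyword + synonym", synonym_search("proposal")),
    ("metadata", metadata_search("Proposal")),
]:
    print(f"{name}: {len(hits)}/{len(DOCS)} documents")
```

The metadata lookup only reaches all three documents because each one was tagged consistently, which is the condition the rest of the deck argues for.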
Similar Features Against Total Number of Documents Returned

Index
• Google: 500 M+
• SharePoint: 100 M
• FAST: 500 M+

Key Word – 33% of results
• Google: Yes
• SharePoint: Yes – as good as Google or FAST
• FAST: Yes

Synonyms – up to 50-66%+ of results for topic
• Google: Yes
• SharePoint: Yes
• FAST: Yes

Ranking Algorithm + Best Bets (does not improve the number of results, only how they are presented)
• Google: Somewhat Tunable
• SharePoint: Somewhat Tunable
• FAST: Very Tunable
What Is Missing To Get to 100% of Relevant Results in Every Search?


Metadata:

Auto Classification
• Google: No – Missing 33-50% of results on any particular topic
• SharePoint: No – Missing 33-50% of results on any particular topic
• FAST: Entity extraction, which is the same as keyword search (33% of results). Provides some refinement capabilities.

Taxonomy Management
• Google: No
• SharePoint: Yes, but not used for auto classification in this release.
• FAST: Same as SharePoint
Miscellaneous Items to Review

SharePoint Refiners and Navigators with counts
• Google: Hard
• SharePoint: Yes – Easy to use for standard search. No counts on results.
• FAST: Medium – Initial release, does not leverage Term Store yet. XML – PowerShell based

Customization
• Google: Limited
• SharePoint: Limited
• FAST: Extendable
Summary

•   Google – Best for no administration, install and walk away. However, keyword
    approach usually missing 33%-50% of results on any given topic because of missing
    metadata. Not easy to integrate refiners or navigators into SharePoint UI.

•   SharePoint Search – Cost effective, comes free with SharePoint. Also very easy to
    install. Search Algorithm is as good as FAST or Google. Limited extensibility. Easy
    integration for refiners and navigators (no counts). However, keyword approach still
    missing 50% of results on any topic.

•   FAST – Extremely customizable, but requires training or professional services to
    customize. Most likely Microsoft long term platform for search. Very scalable and
    can provide refiner counts. However, keyword approach still missing 33-50% of
    results from any given search because of metadata inconsistency.

•   However, they are all missing a true metadata strategy which is the only way to
    ensure 100% of results (Recall).
Take it away Val
Google Search Appliance 6.8 vs. FAST Search Server for SharePoint 2010
For metadata-driven search scenarios in a SharePoint environment
Val Orekhov, Chief Architect
Portal Solutions
Email: val@portalsolutions.net
Phone: (240) 450-2166 x 103
www.portalsolutions.net
Agenda
• Enterprise Search Technologies
   •   Google Search Appliance 6.8 and FAST for SharePoint
   •   Content Indexing Options
   •   Approach to Security Trimming
   •   Ranking Algorithms and Searching Options
   •   Index Schema Management, Metadata & Search Refinements
• Conclusions
• Q&A
Enterprise Search Technologies
• Heterogeneous content sources:
   • HTML, Documents and LOBs records
   • Located on Portals, File Systems and in Databases
• Required Security Trimming:
   • Integrate with Identity Providers (AD, LDAP, SQL)
   • Implement authorization decision logic
• Able to take advantage of metadata stored with
  documents and LOBs
Introducing the Contenders
Google Search Appliance (GSA)
    •   Search Appliance, Google.com in a box
    •   Hardware & Software Solution
    •   Pre-packaged functionality ready to work
    •   “Black box” approach to search results


FAST Search Server for SharePoint 2010
  • Spin off of the earlier FAST ESP
  • Software-only solution
  • Allows customization of many aspects of the engine functionality,
    down to relevancy tuning algorithms
  • Platform rather than a product
Comparing FS4SP and GSA
•   Indexing Options
•   Approach to Security Trimming
•   Ranking Algorithms & Sorting Options
•   Metadata & Search Refinements
Content Crawl Options
Content Pull
• GSA: HTTP Crawler
• FAST: SharePoint Crawler, Enterprise Crawler
• SharePoint: SharePoint Crawler

Content Push
• GSA: XML Feed API
• FAST: Feed API
• SharePoint: -

Indexing LOBs (Pull)
• GSA: Onboard Database Connector
• FAST: Databases & Web Services via SharePoint BCS
• SharePoint: Databases & Web Services via SharePoint BCS

Connectors
• GSA: SharePoint, Documentum, LiveLink, FileNet, File System, LDAP
• FAST: OTB: File System, Exchange Public Folders. Custom: Documentum, Lotus Notes
• SharePoint: OTB: File System, Exchange Public Folders

External Metadata
• GSA: Push through XML Feed API
• FAST: Custom Stages in the processing pipeline
• SharePoint: -

Cloud Connectivity
• GSA: Google Apps & Sites; Twitter
• FAST: Custom connectors
• SharePoint: -
Security Trimming
• Answers the “Who Am I” and “What Results Can I See”
  questions
• Required with most Enterprise Search scenarios
• Approaches include Late & Early Authorization/Binding

Late Authorization
• Access rights (ACLs): checked at run time against the system of record
• Pros: up-to-date presentation
• Cons: slow on larger sections of result sets

Early Authorization
• Access rights (ACLs): information stored in the index at item level
• Pros: fast; facilitates metadata clustering
• Cons: duplicates info; potential for outdated results
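
The trade-off between the two approaches can be sketched in a few lines. This is a minimal illustration, not how GSA, FAST or SharePoint implement trimming; `IndexedItem`, `SOURCE_ACLS` and the group names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedItem:
    doc_id: str
    # Early binding: a copy of the ACL captured at crawl time.
    acl: frozenset = field(default_factory=frozenset)


# Hypothetical system of record holding the current ACLs.
SOURCE_ACLS = {"a": {"sales"}, "b": {"hr"}, "c": {"sales", "hr"}}


def early_trim(results, user_groups):
    """Filter using ACLs stored in the index; no call to the source system."""
    return [r.doc_id for r in results if r.acl & user_groups]


def late_trim(results, user_groups):
    """Check every hit against the system of record at query time."""
    return [
        r.doc_id
        for r in results
        if SOURCE_ACLS.get(r.doc_id, set()) & user_groups
    ]


# Crawl: the index takes a snapshot of each item's ACL.
index = [IndexedItem(d, frozenset(groups)) for d, groups in SOURCE_ACLS.items()]

# Permissions change after the crawl: "c" is no longer visible to sales.
SOURCE_ACLS["c"] = {"hr"}

print(early_trim(index, {"sales"}))  # uses the stale snapshot
print(late_trim(index, {"sales"}))   # reflects the current ACL
```

Early trimming keeps returning document "c" until the next crawl, which is the "potential for outdated results" listed above. Late trimming is correct but makes one permission lookup per hit, which is where the slowdown on large result sets comes from.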
Security Trimming Options Support
Late Authorization
• GSA:
   • "Default" option in many scenarios
   • Via Kerberos, SAML Bridge or Connector
• FAST: ?
• SharePoint 2010: Custom

Early Authorization
• GSA:
   • Rel. 6.0 – High level Policy ACLs configured by admins or through a remote API *
   • Rel. 6.8 – Item-level ACLs **
• FAST:
   • Item-level ACLs for Windows and SharePoint security principals supported natively
   • Allows setup of multiple user property stores and mapping of user principals
• SharePoint 2010: Native support for item-level ACLs with Windows and SharePoint security principals

* Best applied to enterprises with a manageable number of high level policies, or able to invest in custom ACL sync tools
** SharePoint Connector Rel. 2.6.4 sends SharePoint Site Groups with the feed but the Groups are not expanded properly by GSA
Search Engine Internals
Result Set Ranking
• Fidelity of keyword matches (All Engines)
     • Proximity
     • Frequency
     • Completeness
• Hyper Text Matching (GSA only)
     • Analyzes keyword location on a rendered page and related pages
• Hub and Spoke Algorithm (All engines)
     • Driven by linkages between web pages
     • Pages receiving or providing most links have higher rankings
     • GSA – PageRank; FAST – Document authority;
• Static rank biasing, document importance
     • Document, Site, Metadata -based promotion / demotion (All engines)
     • User-tagged documents receive higher importance (FAST, SharePoint search)
• Adaptive ranking
     • User clicks in search results (FAST, SharePoint search)
 • Custom Ranking
     • Build custom ranking models w/ FAST
Result Set Sorting
• GSA
   • Date/Time only (Document Modification Date, or a date extracted
     from Title, Metadata or Body of a document)
• FAST
   • Any property marked as Sortable
   • Supported data types: String, Number, Date/Time
Comparing FS4SP and GSA
•   Indexing Options
•   Approach to Security Trimming
•   Ranking Algorithms & Sorting Options
•   Index Schema Management, Metadata & Search
    Refinements
Index Schema Management
• GSA (All-inclusive)
    • All discovered metadata (Crawled Properties) are stored in the index by default
    • Metadata from MS Office documents stored in the index results. (GSA Feature
      Request ID# 1371024)
    • All string-type metadata is associated with FTI by default, matches on metadata
      controlled through query time (allintext:, allintitle: keyword filters)
    • Metadata in results limited to 1,500 chars per field (Rel. 6.8; prev. releases – 320
      chars)
• FAST (Opt-in)
    • Crawled properties have to be associated with Managed Properties (MPs) to be
      stored in the index
    • MPs represent a level of abstraction from Content Sources
    • MPs can be configured to be used as:
        •   Stored in the index (Queryable)
        •   Associated with FTI (Searchable)
        •   Sortable
        •   Refiner-enabled
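
The opt-in model can be pictured as a mapping from crawled properties to managed properties, each carrying the four flags listed above. The sketch below is an illustration under assumptions: the `ManagedProperty` class, the schema and the crawled property names are hypothetical and are not the FAST administration API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagedProperty:
    name: str
    crawled_sources: tuple      # crawled property names mapped to this MP
    queryable: bool = True      # stored in the index
    searchable: bool = False    # associated with the full-text index
    sortable: bool = False
    refinable: bool = False


# Hypothetical schema; property names are illustrative only.
SCHEMA = [
    ManagedProperty("Author", ("ows_Author", "dc:creator"),
                    searchable=True, refinable=True),
    ManagedProperty("Modified", ("ows_Modified",), sortable=True),
]


def to_index_record(crawled: dict) -> dict:
    """Keep only crawled values that map to a managed property (opt-in)."""
    record = {}
    for mp in SCHEMA:
        for source in mp.crawled_sources:
            if source in crawled:
                record[mp.name] = crawled[source]
                break
    return record


crawled_doc = {
    "dc:creator": "Val",
    "ows_Modified": "2010-11-02",
    "ows_Unmapped": "dropped because no managed property claims it",
}
record = to_index_record(crawled_doc)
sortable_names = [mp.name for mp in SCHEMA if mp.sortable]

print(record)
print(sortable_names)
```

Two different crawled properties feed the single `Author` managed property here, which is the "level of abstraction from Content Sources" mentioned above. An all-inclusive engine would instead keep `ows_Unmapped` in the index by default.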
Search Refinement with Metadata
Run-time clustering / Shallow refiners
• Completeness: smaller sample of a much larger set; top 50-100 query results
• Pros: smaller index size
• Cons: degraded performance w/ larger samples; no cluster counts

Index-based clustering / Deep refiners
• Completeness: entire result set stored in the index
• Pros: fast; allows for precise cluster counts
• Cons: increases index size
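
The difference in completeness can be shown with a small count. This is a sketch only: the 300-item result set and its content types are synthetic, and real engines compute deep refiners from index structures rather than by scanning results.

```python
from collections import Counter

# Hypothetical ranked result set: (doc_id, content_type), best match first.
results = [
    (f"d{i}", "Contract" if i % 3 == 0 else "Proposal")
    for i in range(1, 301)
]


def shallow_refiner(ranked, sample_size=50):
    """Run-time clustering: only the top N hits are inspected."""
    return Counter(ctype for _, ctype in ranked[:sample_size])


def deep_refiner(ranked):
    """Index-based clustering: counts cover the entire result set."""
    return Counter(ctype for _, ctype in ranked)


print("shallow:", dict(shallow_refiner(results)))
print("deep:   ", dict(deep_refiner(results)))
```

The shallow counts describe only the sample, so they cannot be shown to users as totals. That is why the slide lists "no cluster counts" as a drawback of run-time clustering.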
Search Refinement with Metadata
Run-time clustering / Shallow refiners
• GSA: The only option prior to Rel. 6.8 (Custom)
• FAST: OTB
• SharePoint 2010: OTB

Index-based clustering / Deep refiners
• GSA: "Preview" status in Rel. 6.8 (OTB)
• FAST:
   • OTB for MPs marked as Refinable
   • Inverted Index and Metadata Property Store combined into a high performance OLAP cube
• SharePoint 2010: Not available
Conclusions*


FAST
• SharePoint intranet as a hub + document libraries, LOBs
• Search results served from the SharePoint portal
• Active Directory-tied systems w/ content security policies applied broadly
• Fine level of control over index schema and document processing
• Custom search results ranking / relevancy models
• High complexity metadata-based search scenarios
• Full & Mini Search-driven applications

GSA
• Heterogeneous content sources dominated by web pages
• Search UI served by GSA
• Predominantly keyword-driven search experience
• Custom run-time search refiners for protected content; OTB "Dynamic Navigation" for LOB / public data
• Result biasing via URL patterns, metadata values
• Medium complexity metadata-based search scenarios

* Usage scenarios best aligned with OTB functionality, minimum possible customizations.
Questions
Back to Don
In Summary: Enterprise Search Comparison for SharePoint vs. Google vs. FAST



Why Enterprise Search needs Metadata and Taxonomy Management
    –   Recall – Ensures you bring back 100% of Results
    –   Enhances Precision – Fastest way to filter to the right results so that you are looking at the
        documents that matter the most
    –   Boosts the relevancy of documents
    –   Drives Records Management, Sensitive Information Removal, Retention Code Policies


MUST HAVES:
    –   Heterogeneous content sources:
         • HTML, Documents and LOBs records
         • Located on Portals, File Systems and in Databases

    –   Required Security Trimming:
         • Integrate with Identity Providers (AD, LDAP, SQL)
         • Implement authorization decision logic

    –   Able to take advantage of metadata stored in documents and LOBs
How do I apply metadata to content?
Microsoft's approach to solving the metadata problem for Records Management, Governance Policies, Sensitive Information Removal and Findability:

Content Types, The Term Store and Enterprise Managed Metadata Services
What is a content type
•   A Content Type is a means to apply structure to unstructured or structured content within
    SharePoint. Content Types inherit from their parent content types.
•   This is usually a combination of a term or terms from a single or multiple term sets.
•   Terms are metadata and metadata is information about information.
•   Terms can also include governance and retention code policies and also can be for the
    sole purpose of improved findability
•   However, it is best to align Content Types with business goals and business use cases.
Introducing EMM, The Term Store and Term Store Management Definitions




[Diagram]
• conceptClassifier for SharePoint 2010: Term Store Management, Auto Classification, Content Type Updating
• SharePoint 2010 Enterprise Managed Metadata Service: Subscription Service, Content Type Hub, Term Store
• SharePoint 2010 Farm: Site Collection, Records Library
The Managed Metadata Service
Managed Metadata Service
• Manages Enterprise Content Types via the Content Type Hub
• Manages Term Store
• Term Sets (taxonomies) and terms can be shared across multiple SharePoint site collections
• Multiple managed metadata services can be created
• Enables search filtering
• Two types of terms:
   • Managed terms – pre-defined by an enterprise administrator and may be hierarchical. Surfaced in the "managed metadata" column type
   • Managed keywords – non-hierarchical words or phrases that have been added to SharePoint 2010 items by users (folksonomy)

Enterprise Managed Metadata Service
• 30,000 Terms per Term Set (1 Taxonomy)
• 1,000 Term Sets
• Tested to 1,000,000 Preferred Terms
conceptClassifier for SharePoint is the only native Term Store Management tool for 2010

[Diagram: Term Set > Parent Term > Child Term > Grand Child Term]
Build term sets/taxonomies here in SharePoint 2010 EMM. Plan for 30,000 values.

A content type can contain one or many taxonomies based on specific business user requirements. The values can be shown as columns or can be hidden from users for administrative or governance purposes only.
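
The term set hierarchy in the diagram can be modeled as a simple tree. The sketch below is illustrative only: `Term` and `TermSet` are hypothetical classes, not the SharePoint taxonomy object model, and the labels are made up. The 30,000 limit is the figure quoted on the slides.

```python
from dataclasses import dataclass, field

MAX_TERMS_PER_TERM_SET = 30_000  # limit quoted on the slides


@dataclass
class Term:
    label: str
    children: list = field(default_factory=list)

    def add(self, label):
        """Create a child term and return it."""
        child = Term(label)
        self.children.append(child)
        return child

    def count(self):
        """Number of terms in this subtree, including this term."""
        return 1 + sum(c.count() for c in self.children)

    def paths(self, prefix=()):
        """Yield the full path of every term, root first."""
        here = prefix + (self.label,)
        yield here
        for c in self.children:
            yield from c.paths(here)


@dataclass
class TermSet:
    name: str
    roots: list = field(default_factory=list)

    def size(self):
        return sum(r.count() for r in self.roots)

    def within_limit(self):
        return self.size() <= MAX_TERMS_PER_TERM_SET


term_set = TermSet("Agreements")          # hypothetical term set
parent = Term("License Agreement")        # Parent Term
child = parent.add("Software License")    # Child Term
child.add("Addendum")                     # Grand Child Term
term_set.roots.append(parent)

for path in parent.paths():
    print(" > ".join(path))
print(term_set.size(), term_set.within_limit())
```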
Traditional manual approach is subjective, cumbersome and overwhelming




End user must select values from multiple term sets. Up to 30,000 values per term set and 1,000 term sets per term store. Manual approach is impractical.
conceptClassifier for SharePoint 2010
An automated solution for applying metadata and providing term store management to enhance SharePoint 2010 capabilities for Records Management, Governance Policies, Rights Management, Sensitive Information Removal and Findability.
A Manual Metadata Approach Will Fail 95%+ Of The Time

Issue: Organizational Impact
• Inconsistent: Less than 50% of content is correctly indexed, meta-tagged or efficiently searchable, rendering it unusable to the organization (IDC)
• Subjective: Highly trained Information Specialists will agree on meta tags between 33% - 50% of the time. (C. Cleverdon)
• Cumbersome - Expensive: Average cost of manually tagging one item runs from $4 - $7 per document and does not factor in the accuracy of the meta tags nor the repercussions from mis-tagged content (Hoovers)
• Malicious Compliance: End users select first value in list (Perspectives on Metadata, Sarah Courier)
• No perceived value for end user: What's in it for me? End user creates document, does not see value for organization nor risks associated with litigation and non-conformance to policies.
• What have you seen: Metadata will continue to be a problem due to inconsistent human behavior

The answer to consistent metadata is an automated approach that can extract the meaning from content, eliminating manual metadata generation yet still providing the ability to manage knowledge assets in alignment with the unique corporate knowledge infrastructure.
conceptClassifier for SharePoint 2010 provides an automated metadata approach for an immediate ROI and to drive business value
• Create enterprise automated metadata framework/model
   • Average return on investment minimum of 38% and runs as high as 600% (IDC)
• Apply consistent meaningful metadata to enterprise content
   • Incorrect meta tags cost an organization $2,500 per user per year – in addition to potential costs for non-compliance (IDC)
• Guide users to relevant content with taxonomy navigation
   • Savings of $8,965 per year per user based on an $80K salary (Chen & Dumais)
   • 100% "Recall" of content, 35% faster access to content "Precision"
• Use automatic conceptual metadata generation to improve Records Management
   • Eliminate inconsistent end user tagging at $4-$7 per record (Hoovers)
   • Improve compliance processes, eliminate potential privacy exposures
[Cycle diagram: 1. Model and Validate; 2. Automate Tagging; 3. Findability; 4. Business Processes; 5. Records Management and PII; 6. Life Cycle Management]
conceptClassifier provides a native integration into Term Store



• Native integration into Term Store: No Service Pack Updates, no custom code. conceptClassifier is a native integration.
• No custom property types: Every item is synchronized with the term store and is a part of the managed metadata service. All search features work natively as they should. No custom search property values which require custom code updates and additional custom search controls. conceptClassifier is a native integration.
• Why do we work with the term store natively: Because it is the natural place to store metadata if you are driving economies of scale by leveraging the Microsoft stack. That is Microsoft's road map for metadata management.
• Easy Upgrade: If you want to go back to a pure manual application, there is no code rewrite. conceptClassifier is a native integration. You just unplug and you are back to native.
Automated Multi Word Term Suggestions for Term Store

 Concept Searching‟s unique statistical concept identification underpins all technologies.

 Multi word suggestion is explicitly more valuable than single term suggestion algorithms.


Concept Searching provides automatic concept term extraction.

    Keyword     Meanings when taken alone
    Triple      Baseball, Three
    Heart       Organ, Center
    Bypass      Highway, Avoid

• conceptClassifier generates conceptual metadata by extracting multi-word terms, identifying ‘triple heart bypass’ as a single concept rather than as three separate keywords.

• The metadata can be used by any search engine index or any application or process that uses metadata.
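To make the idea of multi-word term suggestion concrete, here is a minimal sketch in Python. It is not Concept Searching's algorithm, which the slides do not describe; it simply counts runs of two or three consecutive non-stopword tokens across documents and suggests the phrases that recur. The stopword list, the `min_count` threshold and the sample documents are all assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"a", "an", "the", "of", "and", "for", "to", "in", "was", "after", "is"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_phrases(tokens, max_len=3):
    """Yield every run of 2..max_len consecutive tokens that contains no stopword."""
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(t in STOPWORDS for t in gram):
                yield " ".join(gram)


def suggest_terms(documents, min_count=2):
    """Return phrases seen at least min_count times, most frequent and longest first."""
    counts = Counter()
    for doc in documents:
        counts.update(candidate_phrases(tokenize(doc)))
    return sorted(
        (p for p, c in counts.items() if c >= min_count),
        key=lambda p: (-counts[p], -len(p.split()), p),
    )


docs = [
    "The patient recovered after a triple heart bypass.",
    "Triple heart bypass surgery was scheduled for Monday.",
    "A highway bypass was built around the city center.",
]
suggestions = suggest_terms(docs)
print(suggestions)
```

With these sample documents the recurring phrase "triple heart bypass" ranks first, while "highway bypass" appears only once and is not suggested. A production system would need far more than raw frequency, but the sketch shows why a phrase carries meaning that the separate words "triple", "heart" and "bypass" do not.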
conceptClassifier for SharePoint 2010 drives immediate value for end users for
Search, Records Management and Sensitive Information Removal


conceptClassifier for SharePoint 2010
• Automatically applies metadata
• Automatically applies content types
• Automatically applies retention code policies
• Automatically applies Windows Rights Management policies
• Automatic term boosting for FAST
• Pulls the hierarchy directly from the Term Store, so updates are immediate and accurate for guided taxonomy navigation in FAST
Enterprise Taxonomy Management and Auto-classification

• Multi-user distributed branch and term support for the enterprise
• Native Term Store integration for SharePoint 2010
• Accelerates building out taxonomies by 75% with automatic term/clue suggestion
• Enables information architects to build, model and validate
• Automatic term boosting for FAST/search platforms
• Pragmatic ontology features for subject matter experts (you don’t need to be a librarian)
• Broad to narrow
• Preferred terms
• Non-preferred terms
• Polyhierarchies (not supported in Term Store)
• Relations (not supported in Term Store)
conceptClassifier for FAST Search

• Improves search outcomes by placing conceptual metadata in the FAST Search index to increase the relevancy of search results
• Enables import of FAST Entities into the conceptClassifier taxonomy manager to fine-tune them with metadata generated from your own content and nomenclature
• Runs natively as a FAST Pipeline Stage, eliminating integration and customization issues
• Eliminates vocabulary normalization issues across global boundaries through controlled vocabularies
• Improves faceted search results, as facets are based on concepts aligned with the taxonomy
• Provides taxonomy browse capabilities based on the nodes within the corporate taxonomy(s)
• Provides accurate metadata filters such as numeric range searching and wildcard alphanumeric matching
• Removes confidential/sensitive documents from search results through automatic Content Type updating and routing to a secure server
• Automatically tags content with both vocabulary and retention codes, and respects SharePoint security that could prevent access to the document once it has been declared a record
Product Screen Shots
Traditional manual approach is subjective, cumbersome and ineffective




End users must select values from multiple term sets, with up to 30,000
values per term set and 1,000 term sets per term store. A manual approach
is impractical.
An automated approach ensures accurate Records Management, Sensitive
Information Removal and improved Search/Findability




Metadata is automatically applied to content by conceptClassifier via
TaxonomyManager. contentTypeUpdater can take this a step further and modify the
content type, redirecting the document or object to a different content type or
migrating it to another site collection or document library. In this example the
documents are being changed from the Document content type to the PII or Records
Center content type.
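The content type change described above can be pictured as a small rule: if a document matches a privacy pattern it becomes PII, and if it carries a records-related term it is routed to the Records Center. The Python sketch below illustrates that rule only. The `Document` class, the SSN-like regular expression and the `record_terms` vocabulary are hypothetical and are not part of contentTypeUpdater's actual interface.

```python
import re
from dataclasses import dataclass, field

# Illustrative pattern for a US Social Security number (an assumption).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class Document:
    name: str
    text: str
    content_type: str = "Document"
    tags: list = field(default_factory=list)  # metadata terms already applied


def update_content_type(doc, record_terms):
    """Reassign the content type using pattern matching first, then vocabulary."""
    if SSN_PATTERN.search(doc.text):
        doc.content_type = "PII"
    elif any(term in doc.tags for term in record_terms):
        doc.content_type = "Records Center"
    return doc


record_terms = {"license agreement", "contract"}
docs = [
    Document("hr-form.docx", "Employee SSN 123-45-6789 on file"),
    Document("deal.docx", "Signed copy", tags=["license agreement"]),
    Document("notes.docx", "Lunch menu for Friday"),
]
for d in docs:
    update_content_type(d, record_terms)
    print(d.name, "->", d.content_type)
```

Checking the privacy pattern before the vocabulary is a design choice made for this sketch, so that a record containing personal data is treated as PII first.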
Term Store Management is provided by Taxonomy Manager and
conceptClassifier




TaxonomyManager is an intuitive and elegant tool to manage how and when
term sets are applied within SharePoint 2010 and what new terms to add to
the term store.

Deep capabilities to build out rule-based classification approaches,
including: standard term, phonetics, metadata, class ID, language, case
sensitive, regular expression and boosting.
An automated approach ensures accurate Records Management, Sensitive
Information Removal and improved Search/Findability




The documents with 10 in front of them have had their content types updated.
In this example the documents are being changed from document content type
to PII or Records Center Content Type. They could have also been moved to
a different folder if that was the desired outcome.
conceptClassifier for FAST and SharePoint 2010 Search
 conceptClassifier for 2010 Product Suite provides intuitive guided navigation for FAST




Multi-value select within a term set is the single fastest approach you can provide for end
users to get access to the correct content. It is just like picking values when you are on
Best Buy or Amazon, but with your personalized corporate term set vocabulary.
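As a rough illustration of multi-value refinement, the Python sketch below filters items by the values a user has ticked. It assumes the convention common on retail sites: values selected within one term set are combined with OR, and different term sets are combined with AND. The slides do not spell out this logic, and the term set names and sample items are invented for the example.

```python
def refine(items, selections):
    """Keep items that match at least one selected value in every active term set."""
    active = {ts: set(vals) for ts, vals in selections.items() if vals}
    return [
        item
        for item in items
        if all(item["metadata"].get(ts, set()) & vals for ts, vals in active.items())
    ]


items = [
    {"title": "Q3 software license",
     "metadata": {"Document Type": {"License Agreement"}, "Region": {"US"}}},
    {"title": "UK proposal",
     "metadata": {"Document Type": {"Proposal"}, "Region": {"UK"}}},
    {"title": "US SLA",
     "metadata": {"Document Type": {"SLA"}, "Region": {"US"}}},
]

# Two values ticked in "Document Type", one in "Region".
hits = refine(items, {"Document Type": {"License Agreement", "Proposal"}, "Region": {"US"}})
titles = [item["title"] for item in hits]
print(titles)
```

Only the US license agreement survives both refinements here: the UK proposal matches the document type selection but not the region, and the US SLA matches the region but not the document type.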
Demo – How to automate the process of applying metadata in a SharePoint 2010
native term store environment to improve Findability and Records Management
Q&A
Thank You


Don Miller, VP of Business Development     Val Orekhov, Chief Architect
(408) 828-3400                             (240) 450-2166 x103
donm@conceptsearching.com                  val@portalsolutions.net

Search Engine Face-Off: Keyword vs Metadata Search Costs and Effectiveness

  • 1. Search Engine Face-Off Keyword Search versus Metadata Search Don Miller, VP of Business Development Val Orekhov, Chief Architect (408) 828-3400 (240) 450-2166 x103 donm@conceptsearching.com val@portalsolutions.net
  • 2. Agenda  Introductions  Concept Searching:  What is Metadata  Keyword vs.. Metadata Search  Keyword vs.. Metadata Costs  Google vs.. SharePoint vs.. FAST  Portal Solutions:  Enterprise Search – Google vs. FAST in SharePoint 2010  Indexing Options  Approach to Security Trimming  Ranking Algorithms & Sorting Options  Metadata & Search Refinements  Concept Searching - How Do I apply metadata:  Microsoft‟s approach to applying metadata  How to automate the Microsoft approach with conceptClassifier for SharePoint 2010  Demo
  • 3. Concept Searching, Inc. Company founded in 2002  Product launched in 2003  Focus on management of structured and unstructured information  Technology  Automatic concept identification, content tagging, auto- classification, taxonomy management  Only statistical vendor that can extract conceptual metadata  2009 and 2010 „100 Companies that Matter in KM‟ (KM World Magazine)  KMWorld „Trend Setting Product‟ of 2009 and 2010  Locations: US, UK, & South Africa Client base: Fortune 500/1000 organizations  Managed Partner under Microsoft global ISV Program - “go to partner” for Microsoft for auto-classification and taxonomy management  Microsoft Enterprise Search ISV , FAST Partner  Product Suite: conceptSearch, conceptTaxonomyManager, conceptClassifier, conceptClassifier for SharePoint, contentTypeUpdater for SharePoint
  • 4. What is metadata • Metadata is a means to apply structure to unstructured or structured content or information. Metadata describes what the document is about. • Metadata makes it easier to find information. • There are usually multiple metadata terms per item or document. • Metadata can also be used for rights management, governance, retention code policies, sensitive information removal and of course improved findability.
  • 5. What Is Keyword vs. Metadata Costing You? Problem Pre Migration Search Records Management Data Privacy Protection •60% of stored •“It‟s not about better •67% of data loss in •Average cost per documents are search” Records Management is exposed record is $197 obsolete •Less than 50% of content due to end user error and ranges from $90- •50% of documents are is correctly indexed, meta •It costs and organization $305 per record duplicates tagged or efficiently $180 per document to •70% of breaches are due •Requires resources to searchable recreate it when it is not to a mistake or malicious identify what •85% of relevant tagged correctly and intent by an should/not be migrated documents are never cannot be found organization‟s own staff retrieved in search •Eliminate duplicate •Eliminate manual tagging •Eliminate inconsistent •Identify any type of Solution end user tagging organizationally defined documents & replace with automatic •Identify privacy data identification of multi- •Automatically declare privacy data exposures word concepts documents of record •Combines pattern •Identify and declare •Provide guided based on vocabulary and matching with associated records that were not navigation via the retention codes vocabulary previously identified taxonomy structure (i.e. •Automatically change the •Automatic Content Type •Identify high value concepts) Content Type and route updating enabling content •Go beyond dynamic to the Records workflows and rights •Migrating required clustering with Management repository management content to a structure conceptual clustering based on the taxonomies Benefit •Reduces migration •Taxonomy navigation •Savings of $4.00 - $7.04 •Average cost runs from costs is 36% - 48% faster per record by eliminating $225K to $35M •Ensures •Savings 2.5 hours manual tagging compliance and per user per day •Ensures compliance and protection of reduces potential content assets litigation exposures
  • 6. USAF Human Performance Clearinghouse GOAL: Leverage Existing USAF, AFDW, and AFMS License Agreements to Enable IM, RM, & Privacy & Security Compliance Requirements • DoDD 8320 (Data Sharing in a Net-Centric DoD) • DoDD 5015 (Records Management) • Data Privacy: USAF Privacy Act Program & HIPAA • Freedom of Information Act (FOIA) [Diagram labels: Migration, Records Management, Search, eDiscovery & FOIA] Tel: 703.246.9360 | Fax: 240.465.1182 Distribution Statement A: Approved for public release; distribution is unlimited. 311 ABG/PA No. 09-488, 16 Oct 2009
  • 7. What Type of Search or Information Architecture Do You Need?
      Keyword Search = ~66%+ of results (Recall)
        • Simple
        • No administration
        • Good enough
      Metadata Search = 100% of results (Recall)
        • Guided Navigation
        • Records Management
        • Sensitive Information Removal
        • Collaboration
        • Improved Precision and Recall
        • Evolution of Keyword Search
      Recall (information retrieval): a statistical measure (contrasted with precision), the fraction of all relevant material that is returned by a search query.
      Precision (information retrieval): the percentage of documents returned that are relevant.
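The recall and precision definitions on this slide can be made concrete with a small calculation. The sketch below is illustrative only: the document IDs and the `precision_recall` helper are hypothetical, and the 2-of-6 outcome is chosen simply to mirror the roughly 33% keyword-recall figure quoted in the deck, not to validate it.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a single query."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: six documents are truly relevant to the topic,
# but a keyword query returns only two of them plus one irrelevant hit.
relevant_docs = {"d1", "d2", "d3", "d4", "d5", "d6"}
keyword_hits = {"d1", "d2", "d9"}

precision, recall = precision_recall(keyword_hits, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Recall is measured against everything relevant in the corpus, so it can only rise if the engine finds documents the query words alone would miss, which is the role the deck assigns to metadata.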
  • 8. Metadata Search vs. Keyword and Guided Navigation
      [Diagram: results ("Recall") plotted against cost (time, money and complexity)]
        • Metadata Search: 100% of results
        • Keyword + Synonym Search: 66%
        • Keyword Search: 33%
        • Entity Extraction: 20-33% of results
      Example terms shown: "Proposal", "Proposals", "Contract", "Software License", "SLA", "Licensee", "Addendum", "License Agreement", "License", "Documents of Record"
      Entity extraction without complex rules is ineffective. It is just keyword match, which is what keyword search is, which is 33% effective.
  • 9. Similar Features Against Total Number of Documents Returned
      • Index: Google 500 M+; SharePoint 100 M; FAST 500 M+
      • Keyword (33% of results): Google Yes; SharePoint Yes, as good as Google or FAST; FAST Yes
      • Synonyms (up to 50-66%+ of results for topic): Google Yes; SharePoint Yes; FAST Yes
      • Ranking Algorithm + Best Bets (does not improve number of results, only how they are presented): Google Somewhat Tunable; SharePoint Somewhat Tunable; FAST Very Tunable
  • 10. What Is Missing To Get to 100% of Relevant Results in Every Search?
      Metadata
      • Auto Classification: Google No, missing 33-50% of results on any particular topic; SharePoint No, missing 33-50% of results on any particular topic; FAST Entity extraction, which is the same as keyword search (33% of results), provides some refinement capabilities
      • Taxonomy Management: Google No; SharePoint Yes, but not used for auto classification in this release; FAST Same as SharePoint
  • 11. Miscellaneous Items to Review
      • SharePoint Refiners and Navigators with counts: Google Hard; SharePoint Yes, easy to use for standard search, no counts on results; FAST Medium, initial release does not leverage Term Store yet, XML and PowerShell based
      • Customization: Google Limited; SharePoint Limited; FAST Extendable
      [Graphic label: RECALL]
  • 12. Summary • Google – Best for no administration, install and walk away. However, keyword approach usually missing 33%-50% of results on any given topic because of missing metadata. Not easy to integrate refiners or navigators into SharePoint UI. • SharePoint Search – Cost effective, comes free with SharePoint. Also very easy to install. Search Algorithm is as good as FAST or Google. Limited extensibility. Easy integration for refiners and navigators (no counts). However, keyword approach still missing 50% of results on any topic. • FAST – Extremely customizable, but requires training or professional services to customize. Most likely Microsoft long term platform for search. Very scalable and can provide refiner counts. However, keyword approach still missing 33-50% of results from any given search because of metadata inconsistency. • However, they are all missing a true metadata strategy which is the only way to ensure 100% of results (Recall).
  • 14. Google Search Appliance 6.8 vs. FAST Search Server for SharePoint 2010 For metadata-driven search scenarios in a SharePoint environment Val Orekhov, Chief Architect Portal Solutions Email: val@portalsolutions.net Phone: (240) 450-2166 x 103 www.portalsolutions.net
  • 15. Agenda • Enterprise Search Technologies • Google Search Appliance 6.8 and FAST for SharePoint • Content Indexing Options • Approach to Security Trimming • Ranking Algorithms and Searching Options • Index Schema Management, Metadata & Search Refinements • Conclusions • Q&A
  • 16. Enterprise Search Technologies • Heterogeneous content sources: • HTML, Documents and LOBs records • Located on Portals, File Systems and in Databases • Required Security Trimming: • Integrate with Identity Providers (AD, LDAP, SQL) • Implement authorization decision logic • Able to take advantage of metadata stored with documents and LOBs
  • 17. Introducing the Contenders
      Google Search Appliance (GSA)
        • Search Appliance, Google.com in a box
        • Hardware & Software Solution
        • Pre-packaged functionality ready to work
        • "Black box" approach to search results
      FAST Search Server for SharePoint 2010
        • Spin-off of the earlier FAST ESP
        • Software-only solution
        • Allows customization of many aspects of the engine functionality, down to relevancy tuning algorithms
        • Platform rather than a product
  • 18. Comparing FS4SP and GSA • Indexing Options • Approach to Security Trimming • Ranking Algorithms & Sorting Options • Metadata & Search Refinements
  • 19. Content Crawl Options
      • Content Pull: GSA HTTP Crawler; FAST SharePoint Crawler, Enterprise Crawler; SharePoint SharePoint Crawler
      • Content Push: GSA XML Feed API; FAST Feed API; SharePoint -
      • Indexing LOBs (Pull): GSA Onboard Database Connector; FAST Databases & Web Services via SharePoint BCS; SharePoint Databases & Web Services via SharePoint BCS
      • Connectors: GSA SharePoint, Documentum, LiveLink, FileNet, File System, LDAP; FAST OTB: File System, Exchange Public Folders, Custom: Documentum, Lotus Notes; SharePoint OTB: File System, Exchange Public Folders
      • External Metadata: GSA Push through XML Feed API; FAST Custom Stages in the processing pipeline; SharePoint -
      • Cloud Connectivity: GSA Google Apps & Sites, Twitter; FAST Custom connectors; SharePoint -
  • 20. Comparing FS4SP and GSA • Indexing Options • Approach to Security Trimming • Ranking Algorithms & Sorting Options • Metadata & Search Refinements
  • 21. Security Trimming
      • Answers the "Who Am I" and "What Results Can I See" questions
      • Required with most Enterprise Search scenarios
      • Approaches include Late & Early Authorization/Binding
      Authorization approaches
        • Late: access rights (ACLs) checked at run time against the system of record. Pros: up-to-date presentation. Cons: slow on larger sections of result sets.
        • Early: access rights (ACLs) stored in the index at item level. Pros: fast; facilitates metadata clustering. Cons: duplicates info; potential for outdated results.
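The trade-off between the two approaches can be sketched in a few lines. This is a toy model, not how GSA, FAST, or SharePoint implement trimming: `IndexedItem`, `LIVE_ACLS`, and both trim functions are hypothetical names, and the "system of record" is just an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass
class IndexedItem:
    doc_id: str
    acl: frozenset  # principals copied into the index at crawl time


# Hypothetical system of record holding the live, authoritative ACLs.
LIVE_ACLS = {
    "plan.docx": {"alice", "finance"},
    "memo.docx": {"bob"},
}

# Hypothetical index; the memo's stored ACL is stale because alice
# lost access after the last crawl.
INDEX = [
    IndexedItem("plan.docx", frozenset({"alice", "finance"})),
    IndexedItem("memo.docx", frozenset({"bob", "alice"})),
]


def early_trim(hits, principals):
    """Early binding: filter on ACLs stored in the index (fast, may be outdated)."""
    return [h.doc_id for h in hits if h.acl & principals]


def late_trim(hits, principals):
    """Late binding: check every hit against the system of record at query time."""
    return [h.doc_id for h in hits if LIVE_ACLS.get(h.doc_id, set()) & principals]


alice = {"alice"}
early = early_trim(INDEX, alice)  # still shows the memo
late = late_trim(INDEX, alice)    # memo correctly hidden
print(early, late)
```

In the toy model the late check is a dictionary lookup; in practice it is a call to the source system per hit, which is where the "slow on larger result sets" cost comes from.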
  • 22. Security Trimming Options Support
      Late Authorization
        • GSA: "Default" option in many scenarios; via Kerberos, SAML Bridge or Connector
        • FAST: ?
        • SharePoint 2010: Custom
      Early Authorization
        • GSA: Rel. 6.0 – high level Policy ACLs configured by admins or through a remote API *; Rel. 6.8 – item-level ACLs **
        • FAST: item-level ACLs for Windows and SharePoint security principals supported natively; allows setting up multiple user property stores and mapping user principals
        • SharePoint 2010: native support for item-level ACLs with Windows and SharePoint security principals
      * Best applied to enterprises with a manageable number of high level policies, or able to invest in custom ACL sync tools
      ** SharePoint Connector Rel. 2.6.4 sends SharePoint Site Groups with the feed but the Groups are not expanded properly by GSA
  • 23. Comparing FS4SP and GSA • Indexing Options • Approach to Security Trimming • Ranking Algorithms & Sorting Options • Metadata & Search Refinements
  • 25. Result Set Ranking • Fidelity of keyword matches (All Engines) • Proximity • Frequency • Completeness • Hyper Text Matching (GSA only) • Analyzes keyword location on a rendered page and related pages • Hub and Spoke Algorithm (All engines) • Driven by linkages between web pages • Pages receiving or providing most links have higher rankings • GSA – PageRank; FAST – Document authority; • Static rank biasing, document importance • Document, Site, Metadata -based promotion / demotion (All engines) • User-tagged documents receive higher importance (FAST, SharePoint search) • Adaptive ranking • User clicks in search results (FAST, SharePoint search) • Custom Ranking • Build custom ranking models w/ FAST
  • 26. Result Set Sorting • GSA • Date/Time only (Document Modification Date, or a date extracted from Title, Metadata or Body of a document) • FAST • Any property marked as Sortable • Supported data types: String, Number, Date/Time
  • 27. Comparing FS4SP and GSA • Indexing Options • Approach to Security Trimming • Ranking Algorithms & Sorting Options • Index Schema Management, Metadata & Search Refinements
  • 28. Index Schema Management • GSA (All-inclusive) • All discovered metadata (Crawled Properties) are stored in the index by default • Metadata from MS Office documents stored in the index results. (GSA Feature Request ID# 1371024) • All string-type metadata is associated with FTI by default, matches on metadata controlled through query time (allintext:, allintitle: keyword filters) • Metadata in results limited to 1,500 chars per field (Rel. 6.8; prev. releases – 320 chars) • FAST (Opt-in) • Crawled properties have to be associated with Managed Properties (MPs) to be stored in the index • MPs represent a level of abstraction from Content Sources • MPs can be configured to be used as: • Stored in the index (Queryable) • Associated with FTI (Searchable) • Sortable • Refiner-enabled
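The opt-in model described for FAST can be illustrated with a small mapping sketch. It assumes nothing about the real FAST schema API: `ManagedProperty`, `SCHEMA`, `map_crawled`, and the crawled property names are all hypothetical and only mirror the flags listed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagedProperty:
    name: str
    crawled_sources: tuple    # crawled properties mapped to this managed property
    queryable: bool = True    # stored in the index
    searchable: bool = False  # associated with the full-text index (FTI)
    sortable: bool = False
    refinable: bool = False


# Hypothetical schema with two managed properties.
SCHEMA = [
    ManagedProperty("ContractType", ("ows_ContractType", "Office:ContractType"),
                    searchable=True, refinable=True),
    ManagedProperty("SignedDate", ("ows_SignedDate",), sortable=True),
]


def map_crawled(crawled):
    """Opt-in mapping: keep only crawled properties tied to a managed property."""
    indexed = {}
    for mp in SCHEMA:
        for source in mp.crawled_sources:
            if source in crawled:
                indexed[mp.name] = crawled[source]
                break  # first matching source wins
    return indexed


doc = {"ows_ContractType": "SLA", "ows_SignedDate": "2010-06-01", "ows_Scratch": "x"}
indexed = map_crawled(doc)  # ows_Scratch is dropped: no managed property claims it
refiners = [mp.name for mp in SCHEMA if mp.refinable]
print(indexed, refiners)
```

An all-inclusive engine in this toy model would simply index `doc` as-is; the opt-in layer is what gives the administrator a stable property name that is independent of the content source.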
  • 29. Search Refinement with Metadata
      • Run-time clustering / Shallow refiners. Completeness: smaller sample of a much larger set, top 50-100 query results. Pros: smaller index size. Cons: degraded performance w/ larger samples; no cluster counts.
      • Index-based clustering / Deep refiners. Completeness: entire result set stored in the index. Pros: fast; allows for precise cluster counts. Cons: increases index size.
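The completeness difference between shallow and deep refiners shows up clearly in a toy count. The result set, the field values, and both functions below are hypothetical; a real deep refiner reads counts from index structures rather than iterating over hits as this sketch does.

```python
from collections import Counter

# Hypothetical ranked result set of 200 hits: (doc_id, document type).
# Every fourth document is a "Proposal", the rest are "Contract".
results = [(f"d{i}", "Contract" if i % 4 else "Proposal") for i in range(1, 201)]


def shallow_refiner(hits, sample_size=50):
    """Run-time clustering: count values over the top-N hits only."""
    return Counter(value for _, value in hits[:sample_size])


def deep_refiner(hits):
    """Index-based clustering: count values over the entire result set."""
    return Counter(value for _, value in hits)


shallow = shallow_refiner(results)
deep = deep_refiner(results)
print("shallow:", dict(shallow))
print("deep:   ", dict(deep))
```

The shallow counts describe only the sample, so they cannot be shown to users as totals, which matches the "no cluster counts" entry on the slide.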
  • 30. Search Refinement with Metadata
      Run-time clustering / Shallow refiners
        • GSA: the only option prior to Rel. 6.8 (Custom)
        • FAST: OTB
        • SharePoint 2010: OTB
      Index-based clustering / Deep refiners
        • GSA: "Preview" status in Rel. 6.8 (OTB)
        • FAST: OTB for MPs marked as Refinable; Inverted Index and Metadata Property Store combined into a high performance OLAP cube
        • SharePoint 2010: not available
  • 31. Conclusions*
      FAST
        • SharePoint intranet as a hub + document libraries, LOBs
        • Search results served from the SharePoint portal
        • Active Directory-tied systems w/ content security policies applied
        • Fine level of control over index schema and document processing
        • Custom search results ranking / relevancy models
        • High complexity metadata-based search scenarios
        • Full & Mini Search-driven applications
      GSA
        • Heterogeneous content sources dominated by web pages
        • Search UI served by GSA
        • Predominantly keyword-driven search experience
        • Custom run-time search refiners for broadly protected content; OTB "Dynamic Navigation" for LOB / public data
        • Result biasing via URL patterns, metadata values
        • Medium complexity metadata-based search scenarios
      * Usage scenarios best aligned with OTB functionality, minimum possible customizations.
  • 34. In Summary: Enterprise Search Comparison for SharePoint vs. Google vs. FAST Why Enterprise Search needs Metadata and Taxonomy Management – Recall – Ensures you bring back 100% of Results – Enhances Precision – Fastest way to filter to the right results so that you are looking at the documents that matter the most – Boosts the relevancy of documents – Drives Records Management, Sensitive Information Removal, Retention Code Policies MUST HAVES: – Heterogeneous content sources: • HTML, Documents and LOBs records • Located on Portals, File Systems and in Databases – Required Security Trimming: • Integrate with Identity Providers (AD, LDAP, SQL) • Implement authorization decision logic – Able to take advantage of metadata stored in documents and LOBs
  • 35. How do I apply metadata to content?
  • 36. Microsoft's approach to solving the metadata problem for Records Management, Governance Policies, Sensitive Information Removal and Findability: Content Types, The Term Store and Enterprise Managed Metadata Services
  • 37. What is a content type • A Content Type is a means to apply structure to unstructured or structured content within SharePoint. Content Types inherit from their parent content types. • This is usually a combination of a term or terms from a single or multiple term sets. • Terms are metadata, and metadata is information about information. • Terms can also include governance and retention code policies, and can also exist for the sole purpose of improved findability. • However, it is best to align Content Types with business goals and business use cases.
  • 38. Introducing EMM, The Term Store and Term Store Management Definitions [Diagram labels: SharePoint 2010 Enterprise Managed Metadata Service; conceptClassifier for SharePoint 2010; SharePoint 2010 Farm; Subscription Service; Term Store Management; Auto Classification; Content Type Hub; Content Type Updating; Term Store; Site Collection; Records Library]
  • 39. The Managed Metadata Service
      Managed Metadata Service
        • Manages Enterprise Content Types via the Content Type Hub
        • Manages Term Store
        • Term Sets (taxonomies) and terms can be shared across multiple SharePoint site collections
        • Multiple managed metadata services can be created
        • Enables search filtering
      Two types of terms:
        • Managed terms – pre-defined by an enterprise administrator and may be hierarchical. Surfaced in the "managed metadata" column type
        • Managed keywords – non-hierarchical words or phrases that have been added to SharePoint 2010 items by users (folksonomy)
      [Diagram: Enterprise Managed Metadata Service; 30,000 Terms per Term Set (1 Taxonomy); 1,000 Term Sets; tested to 1,000,000 Preferred Terms]
  • 40. conceptClassifier for SharePoint is the only native Term Store Management tool for 2010
      [Screenshot labels: Term Set > Parent Term > Child Term > Grand Child Term]
      Build term sets/taxonomies here in SharePoint 2010 EMM. Plan for 30,000 values.
      A content type can contain one or many taxonomies based on specific business user requirements. The values can be shown as columns or can be hidden from users for administrative or governance purposes only.
  • 41. Traditional manual approach is subjective, cumbersome and overwhelming End user must select values from multiple term sets. Up to 30,000 values per term set and 1,000 term sets per term store. Manual approach is impractical.
  • 42. conceptClassifier for SharePoint 2010 An automated solution for applying metadata and providing term store management to enhance SharePoint 2010 capabilities for Records Management, Governance Policies, Rights Management, Sensitive Information Removal and Findability.
  • 43. A Manual Metadata Approach Will Fail 95%+ Of The Time
      Issue and organizational impact
        • Inconsistent: less than 50% of content is correctly indexed, meta-tagged or efficiently searchable, rendering it unusable to the organization (IDC)
        • Subjective: highly trained Information Specialists will agree on meta tags between 33% - 50% of the time (C. Cleverdon)
        • Cumbersome, expensive: average cost of manually tagging one item runs from $4 - $7 per document and does not factor in the accuracy of the meta tags nor the repercussions from mis-tagged content (Hoovers)
        • Malicious compliance: end users select first value in list (Perspectives on Metadata, Sarah Courier)
        • No perceived value for end user: "What's in it for me?" End user creates document, does not see value for organization nor risks associated with litigation and non-conformance to policies
        • What have you seen: metadata will continue to be a problem due to inconsistent human behavior
      The answer to consistent metadata is an automated approach that can extract the meaning from content, eliminating manual metadata generation yet still providing the ability to manage knowledge assets in alignment with the unique corporate knowledge infrastructure.
  • 44. conceptClassifier for SharePoint 2010 provides an automated metadata approach for an immediate ROI and to drive business value
      • Create enterprise automated metadata framework/model
          • Average return on investment minimum of 38% and runs as high as 600% (IDC)
      • Apply consistent meaningful metadata to enterprise content
          • Incorrect meta tags cost an organization $2,500 per user per year, in addition to potential costs for non-compliance (IDC)
      • Guide users to relevant content with taxonomy navigation
          • Savings of $8,965 per year per user based on an $80K salary (Chen & Dumais)
          • 100% "Recall" of content, 35% faster access to content ("Precision")
      • Use automatic conceptual metadata generation to improve Records Management
          • Eliminate inconsistent end user tagging at $4-$7 per record (Hoovers)
          • Improve compliance processes, eliminate potential privacy exposures
      [Cycle diagram: 1. Model and Validate; 2. Automate Tagging; 3. Findability; 4. Business Processes; 5. Records Management and PII; 6. Life Cycle Management]
  • 45. conceptClassifier provides a native integration into Term Store
      • Native integration into Term Store: no Service Pack updates, no custom code. conceptClassifier is a native integration.
      • No custom property types: every item is synchronized with the term store and is a part of the managed metadata service. All search features work natively as they should. No custom search property values which require custom code updates and additional custom search controls. conceptClassifier is a native integration.
      • Why do we work with the term store natively: because it is the natural place to store metadata if you are driving economies of scale by leveraging the Microsoft stack. That is Microsoft's road map for metadata management.
      • Easy Upgrade: if you want to go back to a pure manual application, there is no code rewrite. conceptClassifier is a native integration. You just unplug and you are back to native.
  • 46. Automated Multi Word Term Suggestions for Term Store
      Concept Searching's unique statistical concept identification underpins all technologies. Multi-word suggestion is explicitly more valuable than single-term suggestion algorithms.
      Concept Searching provides Automatic Concept Term Extraction
      [Diagram: "Triple Heart Bypass" as one concept versus its single words read separately: Triple (Baseball, Three); Heart (Organ, Center); Bypass (Highway, Avoid)]
      • conceptClassifier will generate conceptual metadata by extracting multi-word terms, identifying 'triple heart bypass' as a concept as opposed to single keywords. Metadata can be used by any search engine index or any application/process that uses metadata.
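Why a multi-word term carries more meaning than its parts can be shown with plain n-gram counting. This is a generic frequency sketch and makes no claim to reproduce Concept Searching's statistical method, which the deck does not describe; the sample sentences and the `ngram_counts` helper are hypothetical.

```python
import re
from collections import Counter


def ngram_counts(texts, n):
    """Count word n-grams across texts (lowercased, punctuation stripped)."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts


docs = [
    "Patient scheduled for triple heart bypass on Monday.",
    "Recovery after a triple heart bypass takes weeks.",
    "The highway bypass reduced traffic near the heart of town.",
]

single = ngram_counts(docs, 1)
triples = ngram_counts(docs, 3)
top_phrase, top_count = triples.most_common(1)[0]

# "heart" and "bypass" each match all three documents, including the
# road-traffic one; the three-word phrase matches only the medical ones.
print(single["heart"], single["bypass"], top_phrase, top_count)
```

A production extractor would need more than raw counts (stop-word handling, statistical significance, normalization), so this only illustrates the disambiguation effect the slide describes.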
  • 47. conceptClassifier for SharePoint 2010 drives immediate value for end users for Search, Records Management and Sensitive Information Removal
      conceptClassifier for SharePoint 2010
        • Automatically applies Metadata
        • Automatically applies Content Types
        • Automatically applies Retention Code Policies
        • Automatically applies Windows Rights Management Policies
        • Automatic Term Boosting for FAST
        • Pulls hierarchy directly from Term Store, therefore updates are immediate and accurate for guided taxonomy navigation in FAST
  • 48. Enterprise Taxonomy Management and Auto-classification
      • Multi User Distributed Branch and Term Support for Enterprise
      • Native Term Store Integration for SharePoint 2010
      • Accelerate building out taxonomies by 75% with automatic Term/Clue Suggestion
      • Enables information architects to build, model and validate
      • Automatic Term Boosting for FAST/Search Platforms
      • Pragmatic Ontology Features for subject matter experts (you don't need to be a librarian)
          • Broad to Narrow
          • Preferred Term
          • Non-preferred terms
          • Poly hierarchies – not supported in Term Store
          • Relations – not supported in Term Store
  • 49. conceptClassifier for FAST Search
      • Improves search outcomes by placing conceptual metadata in the FAST Search index to increase relevancy of search results
      • Enables import of FAST Entities into the conceptClassifier taxonomy manager to fine-tune them with metadata generated from your own content and nomenclature
      • Runs natively as a FAST Pipeline Stage, eliminating integration and customization issues
      • Eliminates vocabulary normalization issues across global boundaries through controlled vocabularies
      • Improves faceted search results as facets are based on concepts aligned with the taxonomy
      • Provides taxonomy browse capabilities based on the nodes within the corporate taxonomy(s)
      • Provides accurate metadata filters such as numeric range searching and wildcard alphanumeric matching
      • Removes documents from search results that are confidential/sensitive through automatic Content Type updating and routing to secure server
      • Automatically tags content with both vocabulary and retention codes and respects SharePoint security that could prevent access to the document once it has been declared a record
  • 51. Traditional manual approach is subjective, cumbersome and ineffective End user must select values from multiple term sets. Up to 30,000 values per term set and 1,000 term sets per term store. Manual approach is impractical.
  • 52. An automated approach ensures accurate Records Management, Sensitive Information Removal and improved Search/Findability Metadata is automatically applied to content by conceptClassifier via TaxonomyManager. contentTypeUpdater can take it a step further and modify the content type to redirect the document/object to a different content type or migrate it to another site collection or document library. In this example the documents are being changed from document content type to PII or Records Center Content Type.
  • 53. Term Store Management is provided by Taxonomy Manager and conceptClassifier
      • TaxonomyManager is an intuitive and elegant classification tool to manage how and when term sets are applied within SharePoint 2010 and what new terms to add to the term store.
      • Deep capabilities to build out rules; approaches include standard term, phonetics, metadata, class ID, language, case sensitive, regular expression and boosting.
  • 54. An automated approach ensures accurate Records Management, Sensitive Information Removal and improved Search/Findability The documents with 10 in front of them have had their content types updated. In this example the documents are being changed from document content type to PII or Records Center Content Type. They could have also been moved to a different folder if that was the desired outcome.
  • 55. conceptClassifier for FAST and SharePoint 2010 Search conceptClassifier for 2010 Product Suite provides intuitive guided navigation for FAST. Multi-value select within a term set is the single fastest approach you can provide for end users to get access to the correct content. It is just like picking values when you are on Best Buy or Amazon, but with your personalized corporate term set vocabulary.
  • 56. Demo – How to automate the process of applying metadata in a SharePoint 2010 native term store environment to improve Findability and Records Management
  • 57. QA
  • 58. Thank You Don Miller, VP of Business Development Val Orekhov, Chief Architect (408) 828-3400 (240) 450-2166 x103 donm@conceptsearching.com val@portalsolutions.net