Putting Structured Business Vocabularies to Work

                                                                                November 4, 2008
                                               Data Management and Information Quality Conference
                                                                                          IRM UK

                                                                                        Ian Davis
                                                     Global Project Manager, Dow Jones & Company




© Copyright 2008 Dow Jones and Company, Inc.
What we’ll cover today:

        Understanding the challenges of controlled versus
         uncontrolled vocabularies

        Developing a strategy to create and maintain
         controlled vocabularies

        Identifying how you want to integrate your controlled
         vocabularies into your systems

        Understanding the requirements of integrating
         controlled vocabularies into multiple applications




Setting the Context




Once upon a time…


        Most of the business was IT enabled.
        There was some degree of “sharing” of information
         and content; there were even some large,
         well-structured document repositories.
        Yet, no one could find anything.
        Actually, they found things,
          but not what they wanted when they wanted it
          and they were never sure they found the “best” or “saw
            it all”.




Once upon a time…


        The C-level executives were a bit irritated.
          They’d spent lots on the technology
          and people really weren’t much more efficient,
          the pinch point in the workflow had simply
           moved further downstream.
        So, what happened next?




Once upon a time…


        They SPENT <more> MONEY and bought the
         best-in-class search utilities.
        Yet, no one could find anything.
        Actually, they found things,
          but not what they wanted when they wanted it
          and they were never sure they found the “best”
           or “saw it all”.




Once upon a time…


        The C-level executives became a bit more
         irritated.
        Everyone was a bit frustrated.
        What was missing?




Optimized?

                 Is the search utility optimized using all the
                  bells and whistles it came with?
                  Relevancy rankings
                  “Thesaurus” files (synonym lists)
                  Multi-lingual capabilities
                  Common searches saved and presented to
                     users
                  Logs reviewed to understand user issues




Usable?
                 Is the user interface considerate to users?
                  Was it designed with YOUR users in mind?
                     Designed for occasional users?
                     Designed for power users?
                  Was it designed with YOUR business in mind?
                     Task-based views for context sensitive
                       searches
                     Present results in a format readily used
                       within work flows



Metadata?

                 Are there required metadata fields within the CMS?
                  Author, Title, Language, Topic, Product/Service, etc.
                 Are the entry values to those fields controlled?
                  Lookups against authority files, taxonomies, thesauri
                 Does the search utility support fielded searches?
                 Does the search utility weight terms within metadata
                  fields higher than free-text?




Metadata?
                 For example:
                  If a financial analyst enters the query term “stock”
                     within the company’s knowledge base,
                  Will he get back results with the documents
                     specifically discussing “stock” as a financial
                     instrument listed first?
                 Or will he have to look through hundreds of documents,
                  with the relevant ones buried among every document
                  whose free-text body happens to mention:
                     soup stock (food industry),
                     livestock such as cows (livestock industry),
                 or stock car racing (professional sports industry)?
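
The “stock” example can be sketched in a few lines of Python. This is a minimal illustration only, not the behavior of any particular search product: the sample documents, the `topic` and `body` field names, and the weight values are all assumptions made for the sketch.

```python
# Hypothetical field weights: a hit in a controlled metadata field
# counts for more than a hit in the free-text body.
FIELD_WEIGHTS = {"topic": 5.0, "body": 1.0}

def score(doc, term):
    """Sum the weights of every field in which the query term occurs."""
    term = term.lower()
    return sum(
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if term in doc.get(field, "").lower().split()
    )

def rank(docs, term):
    """Return the ids of matching documents, highest score first."""
    scored = [(score(d, term), d) for d in docs]
    hits = [(s, d) for s, d in scored if s > 0]
    return [d["id"] for s, d in sorted(hits, key=lambda pair: -pair[0])]

# Illustrative documents; "topic" stands in for a controlled metadata field.
docs = [
    {"id": "soup", "topic": "food", "body": "simmer the stock for two hours"},
    {"id": "equity", "topic": "stock", "body": "the stock rose after earnings"},
    {"id": "racing", "topic": "motorsport", "body": "stock car racing season"},
]
ranked = rank(docs, "stock")
```

All three documents still match, but the one tagged with the controlled value is listed first, which is the behavior the analyst is hoping for.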


Metadata?
                 Precise and comprehensive searches
                  Only if controlled vocabularies have been used to
                     populate metadata fields
                 AND
                  The search utility takes advantage of that by giving
                     priority to query term occurrence within controlled
                     value metadata fields
                 OR
                  Fielded searches are enabled
                     e.g. <Author = Smith> + <Service = Consulting> +
                        <Industry = Automotive> + <Date = January 2006>
                        + <Content Type = Proposal>
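
A fielded search of this kind amounts to an exact match on every requested field. The sketch below assumes in-memory records with hypothetical field names and a `YYYY-MM` date format; a real search utility would run the same logic against its index.

```python
def fielded_search(records, **criteria):
    """Return records whose metadata matches every field=value pair."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in criteria.items())
    ]

# Illustrative records; the values are assumed to come from controlled vocabularies.
records = [
    {"id": 1, "author": "Smith", "service": "Consulting",
     "industry": "Automotive", "date": "2006-01", "content_type": "Proposal"},
    {"id": 2, "author": "Smith", "service": "Consulting",
     "industry": "Retail", "date": "2006-01", "content_type": "Proposal"},
    {"id": 3, "author": "Jones", "service": "Consulting",
     "industry": "Automotive", "date": "2006-01", "content_type": "Report"},
]

matches = fielded_search(
    records,
    author="Smith", service="Consulting", industry="Automotive",
    date="2006-01", content_type="Proposal",
)
```

Exact matching only works this cleanly because the field values are controlled; with free-text entry, “Automotive”, “automotive industry” and “Cars” would each need separate handling.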


Challenges:
                Controlled versus Uncontrolled




Controlled Vocabularies Explained


        Authority files
           e.g. a company’s Active Directory, ISO standard for languages (ISO 639)
           Typically a flat list of allowed values
        Taxonomies
           e.g. Linnaean Classification (kingdom, phylum, class, order,
            family, genus, and species )
           Typically includes only hierarchical relationships between terms
        Thesauri
           e.g. NASA Thesaurus (http://www.sti.nasa.gov/thesfrm1.htm)
           Includes full set of semantic relationships defined between terms
            (hierarchical, associative, equivalence)
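
The first two kinds differ mainly in how much structure they carry, which a small Python sketch can make concrete. The sample values below are illustrative, and the child-to-parent dictionary is just one assumed way to hold a hierarchy.

```python
# Authority file: a flat list of allowed values (here, a few ISO 639-1 codes).
LANGUAGES = {"en", "de", "fr"}

def is_allowed(value):
    """Validate a metadata entry against the authority file."""
    return value in LANGUAGES

# Taxonomy: hierarchical relationships only, stored as child -> parent.
PARENT = {
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Hominidae": "Primates",
    "Primates": "Mammalia",
}

def ancestors(term):
    """Walk up the hierarchy from a term towards its root."""
    path = []
    while term in PARENT:
        term = PARENT[term]
        path.append(term)
    return path

lineage = ancestors("Homo sapiens")
```

A thesaurus would add equivalence and associative links on top of the same kind of hierarchy.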




NASA Thesaurus – Sample Entry




Semantic Relationships

        Hierarchical
           Superordination - representing a class or a whole, and
            subordination - referring to members or parts
              e.g. vertebrates (class) and mammals (members)
              e.g. cherry pie (whole) and cherry pie slices (parts)
        Equivalence
           One concept expressed by two or more terms
              e.g. dogs and canines
        Associative
           Terms that are conceptually linked, but not through
            hierarchy or equivalence
              e.g. accounting and accountant
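
One way to picture the three relationship types is as fields on a thesaurus entry. The sketch below is an assumed in-memory model, not the format of any specific vocabulary tool; the comments use the conventional thesaurus abbreviations BT, NT, RT and UF.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    label: str
    broader: set = field(default_factory=set)    # BT: hierarchical, upward
    narrower: set = field(default_factory=set)   # NT: hierarchical, downward
    related: set = field(default_factory=set)    # RT: associative
    used_for: set = field(default_factory=set)   # UF: equivalence (non-preferred terms)

thesaurus = {}

def add_term(label):
    """Fetch the entry for a label, creating it on first use."""
    return thesaurus.setdefault(label, Term(label))

def link_hierarchy(broader, narrower):
    """Record a hierarchical link in both directions."""
    add_term(broader).narrower.add(narrower)
    add_term(narrower).broader.add(broader)

def link_related(a, b):
    """Associative links are symmetric."""
    add_term(a).related.add(b)
    add_term(b).related.add(a)

# The examples from the list above.
link_hierarchy("vertebrates", "mammals")
link_related("accounting", "accountant")
add_term("dogs").used_for.add("canines")
```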

Challenges – Uncontrolled Vocabularies

        Uncontrolled vocabularies are:
          Comprehensive but noisy
             Only comprehensive if synonym lists are
              used
          Limited in their precision and relevancy
             Time lost scanning through hundreds of
              “miss” hits
          Less effective for cross-repository
           searches
             Limited ways to disambiguate ‘soup stock’
              from ‘stock car’
Challenges - Controlled Vocabularies

        Controlled vocabularies come with:
          Potentially significant overhead effort (manual
           and technical)
          Organizational politics that can add YEARS to
           establishing an initial set of controlled
           vocabularies
          A lack of basic understanding of what the
           controlled vocabularies are and how they work,
           which impedes effective development and utilization



Challenges - Controlled Vocabularies

        Controlled vocabularies:
                  Richness and power come from a full set of semantic
                   relationships, not just hierarchical ones
                     Hierarchy supports the ability to narrow and broaden
                      search queries
                     Association supports “did you mean” and “you might
                      also want to look at”
                     Equivalence lets users retrieve, in their own familiar
                      language, content which is conceptually on target but
                      never uses their term
                          e.g. user enters dog and search utility expands
                           query to include “canine, k-9, puppy”
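
How each relationship type feeds a search can be sketched as a set of small look-ups. The reference tables below are hypothetical; apart from the “dog” equivalents taken from the example above, the terms are invented for illustration.

```python
# Hypothetical reference tables; a real thesaurus would supply these.
EQUIVALENT = {"dog": ["canine", "k-9", "puppy"]}
NARROWER = {"dog": ["terrier", "retriever"]}
BROADER = {"dog": ["mammal"]}
RELATED = {"dog": ["veterinary care"]}

def expand(term):
    """Equivalence: search the user's term plus its equivalents."""
    return [term] + EQUIVALENT.get(term, [])

def narrow(term):
    """Hierarchy: offer more specific terms."""
    return NARROWER.get(term, [])

def broaden(term):
    """Hierarchy: offer more general terms."""
    return BROADER.get(term, [])

def suggest(term):
    """Association: 'you might also want to look at'."""
    return RELATED.get(term, [])

query = expand("dog")
```

A term with no entry in a table simply falls back to itself (for `expand`) or to an empty list, so an incomplete vocabulary degrades to plain keyword search rather than failing.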


Challenges - Controlled Vocabularies

        Controlled vocabularies:
          Richness and power come at the cost of
           added complexity of development,
           implementation, integration and maintenance
          Utilization of controlled vocabularies can
           produce performance issues
             During search index creation
             During query run time




Tackling the Challenges




Strategy – Creation and Maintenance


                 State the business case clearly
                  Benefits
                     Reduced time for knowledge discovery
                     Increased richness of knowledge discovery
                     Decreased risk to firm of making business
                       decisions with partial information
                  Scope
                     One business unit or enterprise-wide?
                  Resource requirements
                     Skill sets (IS, IT, business knowledge)
                     Time commitment


Strategy – Creation and Maintenance


                 Tackle organizational politics head-on
                  Gain credibility and ensure usability by establishing a
                    cross-functional working committee that will become
                    the Review Committee
                  Include all major stakeholder groups and any
                    interested parties (even the non-supporters)
                  Establish methods of broadly soliciting end-user input
                    that will become a source of change requests during
                    maintenance phases




Strategy – Creation and Maintenance


                 Additional considerations before you start:
                  How rigorous does it need to be?
                    What external standards should be adopted?
                        ANSI/NISO Z39.19-2005
                        British Standard – BS 8723
                    What internal standards should be developed?
                        Editorial Guidelines
                        Usage Guidelines
                  How extensive will it be?
                    Depth and breadth within and across facets
                  What about adaptability and flexibility?
                    Will there be a need for local extensions?


Strategy – Creation and Maintenance


                 Additional considerations before you start:
                  Projected frequency of revisions
                    How quickly does the content base change with
                       respect to concepts; is there significant content
                       drift?
                    How volatile is the language?
                        Management consulting vs. accounting
                  Vocabulary Management Software
                    DON’T spend money just to spend money
                    However, you CAN’T manage controlled
                       vocabularies in a spreadsheet
                    Buy the tool you need based on your documented
                       functional requirements
Strategy – Integration Choices

        Performance trade-offs
           Store UIDs within content, then use look-up table at
            query run time
           Store full-text of a term, then touch all content when
            taxonomy value changes (must re-assign new term
            value)
        Version control
           Use static versions of controlled vocabularies within
            CMS and search utilities, releasing new versions
            periodically
           Use dynamic version of controlled vocabularies with
            continuous revisions occurring
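
The performance trade-off between storing UIDs and storing term text shows up most clearly when a term is renamed. The following is a minimal sketch under assumed data: the UID `T001`, the labels and the two-document content store are all hypothetical.

```python
# Option 1: content stores UIDs; labels live in one look-up table.
LABELS = {"T001": "Automotive"}
docs_by_uid = [{"id": 1, "industry": "T001"}, {"id": 2, "industry": "T001"}]

def rename_uid(uid, new_label):
    """One write to the look-up table; no content record is touched."""
    LABELS[uid] = new_label
    return 1  # number of records written

def display_label(doc):
    """The cost moves to query run time: every display needs a look-up."""
    return LABELS[doc["industry"]]

# Option 2: content stores the full text of the term.
docs_by_text = [
    {"id": 1, "industry": "Automotive"},
    {"id": 2, "industry": "Automotive"},
]

def rename_text(old, new):
    """Every tagged document must be re-assigned the new value."""
    touched = 0
    for doc in docs_by_text:
        if doc["industry"] == old:
            doc["industry"] = new
            touched += 1
    return touched

uid_writes = rename_uid("T001", "Automotive & Mobility")
text_writes = rename_text("Automotive", "Automotive & Mobility")
```

With two documents the difference is trivial; with millions of tagged records, the second option turns every vocabulary revision into a bulk content update.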


Strategy – Integration Choices

        Utilizing semantic relationships
          Store full set (term values or UIDs) within
           content record
         OR
          Store single UID and have search utility use
           reference tables to determine related terms
        Display of semantic relationships
          User interface considerations for effective
           presentation of non-hierarchically related terms
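
The two storage choices for semantic relationships can be compared in a short sketch. The UIDs and the `BROADER` reference table are hypothetical, and only hierarchical links are shown; the same pattern would apply to associative or equivalence links.

```python
# Hypothetical reference table: UID -> UIDs of broader terms.
BROADER = {"T_SEDAN": ["T_CAR"], "T_CAR": ["T_VEHICLE"]}

def closure(uid):
    """The term plus all of its broader terms."""
    seen, stack = set(), [uid]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(BROADER.get(current, []))
    return seen

# Option A: store the full set in the content record at tagging time.
record_a = {"id": 1, "terms": closure("T_SEDAN")}

def matches_a(record, query_uid):
    return query_uid in record["terms"]          # simple set look-up at query time

# Option B: store a single UID; consult the reference table at query time.
record_b = {"id": 1, "term": "T_SEDAN"}

def matches_b(record, query_uid):
    return query_uid in closure(record["term"])  # traversal on every query
```

Both options answer a query for `T_VEHICLE` the same way. Option A pays at indexing time and must be recomputed whenever relationships change; option B keeps content records small and always reflects the current vocabulary, but does extra work on every query.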


Strategy – Integration Choices


        [Figure: search interface layout]
           Top: Query entry (including ability to broaden or narrow
            current search results)
           Left: Browse navigation options
           Center: Previous query statement user entered, plus any
            auto-expansion done by engine
           Right: Related topics (defined through associative
            relationships)
           Bottom: Query results listing




Strategy – Multiple Applications

        Expanding the adoption and use of controlled
         vocabularies
           Know the business objectives of the applications
               In conjunction with the search utility, does the
                controlled vocabulary enable this objective?
           Are there metadata fields available within current
            application for the controlled vocabulary?
           Does the business have resources to assign the
            controlled vocabulary?
           What format does the controlled vocabulary need to be
            in to be integrated with the application?



Strategy – Multiple Applications

        Additional considerations
          Will there be conflicting version management
           needs?
          How does search currently index these
           applications and will that change with the use
           of controlled vocabularies?




Five Key Points

       1. Controlled vocabularies are a lever to improve
          precision and comprehensiveness
       2. Controlled vocabularies are never finished – they are
          always a work in progress
       3. Search utilities can only be tweaked so far
       4. Tapping into the richness of the semantic
          relationships between terms can be extremely
          powerful
       5. There are lots of options for implementing and
          integrating controlled vocabularies




Thank you for your attention!

                           Ian Davis
                           ian.davis@dowjones.com





Putting Controlled Vocabulary To Work I Davis 2008

  • 1. Putting Structured Business Vocabularies to Work November 4, 2008 Data Management and Information Quality Conference IRM UK Ian Davis Global Project Manger, Dow Jones & Company © Copyright 2008 Dow Jones and Company, Inc.
  • 2. What we’ll cover today:  Understanding the challenges of controlled versus uncontrolled vocabularies  Developing a strategy to create and maintain controlled vocabularies  Identifying how you want to integrate your controlled vocabularies into your systems  Understanding the requirements of integrating controlled vocabularies into multiple applications © Copyright 2008 Dow Jones and Company, Inc. 2
  • 3. Setting the Context © Copyright 2008 Dow Jones and Company, Inc.
  • 4. Once upon a time…  Most of the business was IT enabled.  There was some degree of “sharing” of information and content, there were even some large, well structured document repositories.  Yet, no one could find anything.  Actually, they found things,  but not what they wanted when they wanted it  and they were never sure they found the “best” or “saw it all”. © Copyright 2008 Dow Jones and Company, Inc. 4
  • 5. Once upon a time…  The C-level executives were a bit irritated.  They’d spent lots on the technology  and people really weren’t much more efficient,  the pinch point in the workflow had simply moved further downstream.  So, what happened next? © Copyright 2008 Dow Jones and Company, Inc. 5
  • 6. Once upon a time…  They SPENT <more> MONEY and bought the best in class search utilities.  Yet, no one could find anything.  Actually, they found things,  but not what they wanted when they wanted it  and they were never sure they found the “best” or “saw it all”. © Copyright 2008 Dow Jones and Company, Inc. 6
  • 7. Once upon a time…  The C-level executives became a bit more irritated.  Everyone was a bit frustrated.  What was missing? © Copyright 2008 Dow Jones and Company, Inc. 7
  • 8. Optimized?
      • Is the search utility optimized using all the bells and whistles it came with?
          • Relevancy rankings
          • “Thesaurus” files (synonym lists)
          • Multi-lingual capabilities
          • Common searches saved and presented to users
          • Logs reviewed to understand user issues
  • 9. Usable?
      • Is the user interface considerate to users?
          • Was it designed with YOUR users in mind?
              • Designed for occasional users?
              • Designed for power users?
          • Was it designed with YOUR business in mind?
              • Task-based views for context-sensitive searches
              • Present results in a format readily used within workflows
  • 10. Metadata?
      • Are there required metadata fields within the CMS?
          • Author, Title, Language, Topic, Product/Service, etc.
      • Are the entry values to those fields controlled?
          • Lookups against authority files, taxonomies, thesauri
      • Does the search utility support fielded searches?
      • Does the search utility weight terms within metadata fields higher than free-text?
  • 11. Metadata?
      • For example:
          • If a financial analyst enters the query term “stock” within the company’s knowledge base,
              • Will he get back results with the documents specifically discussing “stock” as a financial instrument listed first?
              • Or will he have to look through hundreds of documents, both those relevant to him and every document whose free-text body mentions:
                  • soup stock (food industry),
                  • cows (livestock industry),
                  • or stock car racing (professional sports industry)?
  • 12. Metadata?
      • Precise and comprehensive searches
          • Only if controlled vocabularies have been used to populate metadata fields AND
              • The search utility takes advantage of that by giving priority to query term occurrence within controlled value metadata fields OR
              • Fielded searches are enabled
                  • e.g. <Author = Smith> + <Service = Consulting> + <Industry = Automotive> + <Date = January 2006> + <Content Type = Proposal>
  • 13. Challenges: Controlled versus Uncontrolled
  • 14. Controlled Vocabularies Explained
      • Authority files
          • e.g. Company’s active directory, ISO standard for Languages
          • Typically a flat list of allowed values
      • Taxonomies
          • e.g. Linnaean Classification (kingdom, phylum, class, order, family, genus, and species)
          • Typically includes only hierarchical relationships between terms
      • Thesauri
          • e.g. NASA Thesaurus (http://www.sti.nasa.gov/thesfrm1.htm)
          • Includes full set of semantic relationships defined between terms (hierarchical, associative, equivalence)
  • 15. NASA Thesaurus – Sample Entry
  • 16. Semantic Relationships
      • Hierarchical
          • Superordination - representing a class or a whole, and subordination - referring to members or parts
              • e.g. mammals and vertebrates
              • e.g. cherry pie and cherry pie slices
      • Equivalence
          • One concept expressed by two or more terms
              • e.g. dogs and canines
      • Associative
          • Terms that are conceptually linked, but not through hierarchy or equivalence
              • e.g. accounting and accountant
  • 17. Challenges – Uncontrolled Vocabularies
      • Uncontrolled vocabularies are:
          • Comprehensive but noisy
              • Only comprehensive if synonym lists are used
          • Limited in their precision and relevancy
              • Time lost scanning through hundreds of “miss” hits
              • Reduced effectiveness of cross-repository searches
              • Limited ways to disambiguate ‘soup stock’ from ‘stock car’
  • 18. Challenges - Controlled Vocabularies
      • Controlled vocabularies can produce:
          • Potentially significant overhead effort (manual and technical)
          • Organizational politics can add YEARS to establishing an initial set of controlled vocabularies
          • A lack of basic understanding of what the controlled vocabularies are and how they work impedes effective development and utilization
  • 19. Challenges - Controlled Vocabularies
      • Controlled vocabularies:
          • Richness and power come from a full set of semantic relationships, not just hierarchical ones
              • Hierarchy supports the ability to narrow and broaden search queries
              • Association supports “did you mean” and “you might also want to look at”
              • Equivalence lets users retrieve content in their own familiar language, even when that content is conceptually on target but never uses their term
                  • e.g. user enters dog and search utility expands query to include “canine, k-9, puppy”
  • 20. Challenges - Controlled Vocabularies
      • Controlled vocabularies:
          • Richness and power come at the cost of added complexity of development, implementation, integration and maintenance
          • Utilization of controlled vocabularies can produce performance issues
              • During search index creation
              • During query run time
  • 21. Tackling the Challenges
  • 22. Strategy – Creation and Maintenance
      • State the business case clearly
          • Benefits
              • Reduced time for knowledge discovery
              • Increased richness of knowledge discovery
              • Decreased risk to firm of making business decisions with partial information
          • Scope
              • One business unit or enterprise-wide?
          • Resource requirements
              • Skill sets (IS, IT, business knowledge)
              • Time commitment
  • 23. Strategy – Creation and Maintenance
      • Tackle organizational politics head-on
          • Gain credibility and ensure usability by establishing a cross-functional working committee that will become the Review Committee
              • Include all major stakeholder groups and any interested parties (even the non-supporters)
          • Establish methods of broadly soliciting end-user input that will become a source of change requests during maintenance phases
  • 24. Strategy – Creation and Maintenance
      • Additional considerations before you start:
          • How rigorous does it need to be?
              • What external standards should be adopted?
                  • ANSI/NISO Z39.19-2005
                  • British Standard – BS 8723
              • What internal standards should be developed?
                  • Editorial Guidelines
                  • Usage Guidelines
          • How extensive will it be?
              • Depth and breadth within and across facets
          • What about adaptability and flexibility?
              • Will there be a need for local extensions?
  • 25. Strategy – Creation and Maintenance
      • Additional considerations before you start:
          • Projected frequency of revisions
              • How quickly does the content base change with respect to concepts; is there significant content drift?
              • How volatile is the language?
                  • Management consulting vs. accounting
          • Vocabulary Management Software
              • DON’T spend money just to spend money
              • However, you CAN’T manage controlled vocabularies in a spreadsheet
              • Buy the tool you need based on your documented functional requirements
  • 26. Strategy – Integration Choices
      • Performance trade-offs
          • Store UIDs within content, then use look-up table at query run time
          • Store full-text of a term, then touch all content when taxonomy value changes (must re-assign new term value)
      • Version control
          • Use static versions of controlled vocabularies within CMS and search utilities, releasing new versions periodically
          • Use dynamic version of controlled vocabularies with continuous revisions occurring
  • 27. Strategy – Integration Choices
      • Utilizing semantic relationships
          • Store full set (term values or UIDs) within content record OR
          • Store single UID and have search utility use reference tables to determine related terms
      • Display of semantic relationships
          • User interface considerations for effective presentation of non-hierarchically related terms
  • 28. Strategy – Integration Choices
      Labelled regions of the search interface shown on the slide:
      • Query entry (including ability to broaden or narrow current search results)
      • Previous query statement user entered plus any auto-expansion done by engine
      • Related topics (defined through Associative relationships)
      • Browse navigation options
      • Query results listing
  • 29. Strategy – Multiple Applications
      • Expanding the adoption and use of controlled vocabularies
          • Know the business objectives of the applications
              • In conjunction with the search utility, does the controlled vocabulary enable this objective?
          • Are there metadata fields available within current application for the controlled vocabulary?
          • Does the business have resources to assign the controlled vocabulary?
          • What format does the controlled vocabulary need to be in to be integrated with the application?
  • 30. Strategy – Multiple Applications
      • Additional considerations
          • Will there be conflicting version management needs?
          • How does search currently index these applications and will that change with the use of controlled vocabularies?
  • 31. Five Key Points
      1. Controlled vocabularies are a lever to improve precision and comprehensiveness
      2. Controlled vocabularies are never finished – they are always a work in process
      3. Search utilities can only be tweaked so far
      4. Tapping into the richness of the semantic relationships between terms can be extremely powerful
      5. There are lots of options for implementing and integrating controlled vocabularies
  • 32. Thank you for your attention!
      Ian Davis
      ian.davis@dowjones.com
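
The integration choice on slides 26 and 27 (store a single UID in each content record, then resolve terms through a look-up table at query run time) and the equivalence example on slide 19 ("dog" expanding to "canine, k-9, puppy") can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenter's implementation: the `Concept` and `Vocabulary` classes, the `search` function, the UIDs (`C1` to `C5`) and the document IDs are all hypothetical names invented here, and a real search utility would use an index instead of scanning every record.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One vocabulary entry; every field name here is an illustrative assumption."""
    uid: str
    preferred: str
    equivalents: set[str] = field(default_factory=set)  # equivalence relationship
    broader: set[str] = field(default_factory=set)      # hierarchical (parent UIDs)
    related: set[str] = field(default_factory=set)      # associative (related UIDs)


class Vocabulary:
    def __init__(self, concepts):
        self.by_uid = {c.uid: c for c in concepts}
        # Look-up table: every label (preferred or equivalent) maps to one UID.
        self.by_label = {}
        for c in concepts:
            for label in {c.preferred, *c.equivalents}:
                self.by_label[label.lower()] = c.uid

    def resolve(self, term):
        """Map whatever the user typed to a concept UID, or None if unknown."""
        return self.by_label.get(term.strip().lower())

    def expand(self, term, include_narrower=False):
        """Return the set of UIDs a query term should match."""
        uid = self.resolve(term)
        if uid is None:
            return set()
        uids = {uid}
        frontier = [uid] if include_narrower else []
        while frontier:  # walk the hierarchy downwards to broaden the query
            parent = frontier.pop()
            for c in self.by_uid.values():
                if parent in c.broader and c.uid not in uids:
                    uids.add(c.uid)
                    frontier.append(c.uid)
        return uids

    def suggestions(self, term):
        """Associative terms, usable for "you might also want to look at"."""
        uid = self.resolve(term)
        if uid is None:
            return []
        return sorted(self.by_uid[r].preferred for r in self.by_uid[uid].related)


def search(vocab, documents, term, include_narrower=False):
    """Return IDs of documents whose stored UIDs match the expanded query."""
    wanted = vocab.expand(term, include_narrower)
    return [doc_id for doc_id, tags in documents.items() if wanted & tags]


# Hypothetical sample data built from the examples on slides 16 and 19.
VOCAB = Vocabulary([
    Concept("C1", "vertebrates"),
    Concept("C2", "mammals", broader={"C1"}),
    Concept("C3", "dog", equivalents={"canine", "k-9", "puppy"}, broader={"C2"}),
    Concept("C4", "accounting", related={"C5"}),
    Concept("C5", "accountant", related={"C4"}),
])

# Content records store UIDs only, never the term text.
DOCUMENTS = {"doc-1": {"C3"}, "doc-2": {"C2"}, "doc-3": {"C4"}}

print(search(VOCAB, DOCUMENTS, "canine"))
print(search(VOCAB, DOCUMENTS, "vertebrates", include_narrower=True))
print(VOCAB.suggestions("accounting"))
```

Because the records hold only UIDs, renaming a preferred term changes the look-up table and leaves the content untouched; the price, as slide 26 notes, is the extra look-up at query run time.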