1. Putting Structured Business Vocabularies to Work
November 4, 2008
Data Management and Information Quality Conference
IRM UK
Ian Davis
Global Project Manager, Dow Jones & Company
© Copyright 2008 Dow Jones and Company, Inc.
2. What we’ll cover today:
Understanding the challenges of controlled versus
uncontrolled vocabularies
Developing a strategy to create and maintain
controlled vocabularies
Identifying how you want to integrate your controlled
vocabularies into your systems
Understanding the requirements of integrating
controlled vocabularies into multiple applications
4. Once upon a time…
Most of the business was IT-enabled.
There was some degree of “sharing” of information
and content; there were even some large,
well-structured document repositories.
Yet, no one could find anything.
Actually, they found things,
but not what they wanted when they wanted it
and they were never sure they found the “best” or “saw
it all”.
5. Once upon a time…
The C-level executives were a bit irritated.
They’d spent lots on the technology
and people really weren’t much more efficient,
the pinch point in the workflow had simply
moved further downstream.
So, what happened next?
6. Once upon a time…
They SPENT <more> MONEY and bought the
best-in-class search utilities.
Yet, no one could find anything.
Actually, they found things,
but not what they wanted when they wanted it
and they were never sure they found the “best”
or “saw it all”.
7. Once upon a time…
The C-level executives became a bit more
irritated.
Everyone was a bit frustrated.
What was missing?
8. Optimized?
Is the search utility optimized using all the
bells and whistles it came with?
Relevancy rankings
“Thesaurus” files (synonym lists)
Multi-lingual capabilities
Common searches saved and presented to
users
Logs reviewed to understand user issues
9. Usable?
Is the user interface considerate of users?
Was it designed with YOUR users in mind?
Designed for occasional users?
Designed for power users?
Was it designed with YOUR business in mind?
Task-based views for context-sensitive
searches
Results presented in a format readily used
within workflows
10. Metadata?
Are there required metadata fields within the CMS?
Author, Title, Language, Topic, Product/Service, etc.
Are the entry values to those fields controlled?
Lookups against authority files, taxonomies, thesauri
Does the search utility support fielded searches?
Does the search utility weight terms within metadata
fields higher than free-text?
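To make “controlled entry values” concrete, here is a minimal sketch of a lookup against authority files at the point of metadata entry. The field names come from the slide; the allowed values and the function name `validate_metadata` are hypothetical, and a real CMS would hook a check like this into its own entry form or ingest pipeline.

```python
# Hypothetical authority files: flat lists of allowed values per required field.
AUTHORITY_FILES: dict[str, set[str]] = {
    "Language": {"de", "en", "fr"},  # e.g. ISO 639-1 language codes
    "Topic": {"Equities", "Food Industry", "Livestock", "Motorsport"},
}


def validate_metadata(record: dict[str, str]) -> list[str]:
    """Check a CMS record's metadata against the authority files.

    Returns a list of problems; an empty list means the record passes.
    """
    problems: list[str] = []
    for field_name, allowed in AUTHORITY_FILES.items():
        value = record.get(field_name)
        if value is None:
            problems.append(f"missing required field: {field_name}")
        elif value not in allowed:
            # Free-typed values ("English" instead of "en") are rejected here.
            problems.append(f"{field_name}={value!r} is not in the authority file")
    return problems


print(validate_metadata({"Language": "en", "Topic": "Equities"}))
print(validate_metadata({"Language": "English", "Topic": "Equities"}))
```

Rejecting “English” in favour of “en” is the whole point: every record ends up using the same value, so a fielded search or a metadata boost can rely on it.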
11. Metadata?
For example:
If a financial analyst enters the query term “stock”
within the company’s knowledge base,
Will he get back results with the documents
specifically discussing “stock” as a financial
instrument listed first?
Or will he have to look through hundreds of documents:
the ones relevant to him, plus every document that
matches the term only in body text about:
soup stock (food industry),
cows (livestock industry),
or stock car racing (professional sports industry)?
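The “stock” example can be sketched as a toy scoring function in which a match in a controlled metadata field outweighs any number of free-text hits. The weights, document texts, and the concept label “Stocks (financial instruments)” are all hypothetical, and the sketch assumes the analyst’s query has already been resolved to that concept (for example, through a task-based view). Real search utilities use far richer relevance models.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    body: str
    topics: set[str] = field(default_factory=set)  # controlled metadata values


# Hypothetical weights: a hit in a controlled metadata field far outweighs free text.
METADATA_WEIGHT = 10.0
FREE_TEXT_WEIGHT = 1.0


def score(doc: Document, query_term: str, concept: str) -> float:
    """Score a document for a query term that has been resolved to a concept."""
    s = METADATA_WEIGHT if concept in doc.topics else 0.0
    # Naive free-text matching counts substrings, so "livestock" also matches.
    s += FREE_TEXT_WEIGHT * doc.body.lower().count(query_term.lower())
    return s


docs = [
    Document("Soup basics", "Simmer the bones for a rich stock, then strain the stock."),
    Document("Cattle report", "Livestock prices fell this quarter."),
    Document("Equity outlook", "Stock prices rose.", {"Stocks (financial instruments)"}),
    Document("Race recap", "The stock car field was crowded."),
]

concept = "Stocks (financial instruments)"
ranked = sorted(docs, key=lambda d: score(d, "stock", concept), reverse=True)
print([d.title for d in ranked])
```

On free text alone the soup document would rank first because it mentions “stock” twice; the metadata weight is what lifts the equity document to the top.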
12. Metadata?
Precise and comprehensive searches
Only if controlled vocabularies have been used to
populate metadata fields
AND
The search utility takes advantage of that by giving
priority to query term occurrence within controlled
value metadata fields
OR
Fielded searches are enabled
e.g. <Author = Smith> + <Service = Consulting> +
<Industry = Automotive> + <Date = January 2006>
+ <Content Type = Proposal>
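The fielded query above can be read as a logical AND over metadata fields. A minimal sketch, assuming records are plain dictionaries whose field names match the slide; the records themselves and the helper names `fielded_search` and `in_january_2006` are hypothetical. A criterion is either an exact value or a predicate, which handles a range such as “January 2006”.

```python
from datetime import date
from typing import Any, Optional

records = [
    {"id": "doc-1", "Author": "Smith", "Service": "Consulting", "Industry": "Automotive",
     "Date": date(2006, 1, 17), "Content Type": "Proposal"},
    {"id": "doc-2", "Author": "Smith", "Service": "Consulting", "Industry": "Banking",
     "Date": date(2006, 1, 9), "Content Type": "Proposal"},
    {"id": "doc-3", "Author": "Jones", "Service": "Consulting", "Industry": "Automotive",
     "Date": date(2006, 3, 2), "Content Type": "Report"},
]


def fielded_search(records: list[dict[str, Any]], criteria: dict[str, Any]) -> list[dict[str, Any]]:
    """Return records where every field satisfies its criterion (logical AND).

    A criterion is either an exact value or a predicate function.
    """
    def matches(rec: dict[str, Any]) -> bool:
        for field_name, wanted in criteria.items():
            value = rec.get(field_name)
            if callable(wanted):
                if not wanted(value):
                    return False
            elif value != wanted:
                return False
        return True

    return [rec for rec in records if matches(rec)]


def in_january_2006(d: Optional[date]) -> bool:
    return d is not None and (d.year, d.month) == (2006, 1)


query = {"Author": "Smith", "Service": "Consulting", "Industry": "Automotive",
         "Date": in_january_2006, "Content Type": "Proposal"}
print([rec["id"] for rec in fielded_search(records, query)])
```

Exact-value matching only works because the field values were controlled at entry time; with free-typed values, “Automotive”, “Auto” and “Car industry” would all need to be anticipated by the query.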
13. Challenges:
Controlled versus Uncontrolled
14. Controlled Vocabularies Explained
Authority files
e.g. the company’s Active Directory, the ISO standard for languages (ISO 639)
Typically a flat list of allowed values
Taxonomies
e.g. Linnaean Classification (kingdom, phylum, class, order,
family, genus, and species)
Typically includes only hierarchical relationships between terms
Thesauri
e.g. NASA Thesaurus (http://www.sti.nasa.gov/thesfrm1.htm)
Includes full set of semantic relationships defined between terms
(hierarchical, associative, equivalence)
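The difference between the first two structures can be sketched in a few lines: an authority file is just a list, while a taxonomy adds parent links that let you walk up (broaden) or down (narrow). The data below is an illustrative fragment of the Linnaean hierarchy; the function names are hypothetical. A thesaurus layers equivalence and associative relationships on top of this.

```python
# Authority file: a flat list of allowed values (here, ISO 639-1 language codes).
LANGUAGES = ["de", "en", "fr", "ja"]

# Taxonomy: hierarchical relationships only, stored as child -> parent links.
PARENT = {
    "Chordata": "Animalia",   # phylum -> kingdom
    "Mammalia": "Chordata",   # class -> phylum
    "Carnivora": "Mammalia",  # order -> class
    "Canidae": "Carnivora",   # family -> order
    "Canis": "Canidae",       # genus -> family
    "Canis lupus": "Canis",   # species -> genus
}


def lineage(term: str) -> list[str]:
    """Walk parent links up to the root and return the path root-first."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def children(term: str) -> list[str]:
    """Terms one level below `term` in the taxonomy."""
    return sorted(child for child, parent in PARENT.items() if parent == term)


print("en" in LANGUAGES)
print(" > ".join(lineage("Canis lupus")))
print(children("Canidae"))
```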
15. NASA Thesaurus – Sample Entry
16. Semantic Relationships
Hierarchical
Superordination - representing a class or a whole, and
subordination - referring to members or parts
e.g. mammals and vertebrates
e.g. cherry pie and cherry pie slices
Equivalence
One concept expressed by two or more terms
e.g. dogs and canines
Associative
Terms that are conceptually linked, but not through
hierarchy or equivalence
e.g. accounting and accountant
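The three relationship types map naturally onto the tags used in thesaurus standards such as ANSI/NISO Z39.19: BT/NT (broader/narrower, hierarchical), RT (related, associative), and USE/UF (equivalence between a non-preferred and a preferred term). The class below is a minimal in-memory sketch using the slide’s examples; the class and method names are hypothetical, not the API of any vocabulary management tool.

```python
class Thesaurus:
    """Minimal in-memory thesaurus holding the three relationship types."""

    def __init__(self) -> None:
        self.broader: dict[str, set[str]] = {}   # BT: term -> broader terms
        self.narrower: dict[str, set[str]] = {}  # NT: term -> narrower terms
        self.related: dict[str, set[str]] = {}   # RT: symmetric associations
        self.use: dict[str, str] = {}            # USE: non-preferred -> preferred

    def add_hierarchical(self, narrower: str, broader: str) -> None:
        # Record both directions so BT and NT always stay reciprocal.
        self.broader.setdefault(narrower, set()).add(broader)
        self.narrower.setdefault(broader, set()).add(narrower)

    def add_associative(self, a: str, b: str) -> None:
        self.related.setdefault(a, set()).add(b)
        self.related.setdefault(b, set()).add(a)

    def add_equivalence(self, non_preferred: str, preferred: str) -> None:
        self.use[non_preferred] = preferred

    def preferred(self, term: str) -> str:
        return self.use.get(term, term)

    def entry(self, term: str) -> dict[str, object]:
        """Return a display entry for the preferred form of `term`."""
        t = self.preferred(term)
        return {
            "term": t,
            "BT": sorted(self.broader.get(t, set())),
            "NT": sorted(self.narrower.get(t, set())),
            "RT": sorted(self.related.get(t, set())),
            "UF": sorted(k for k, v in self.use.items() if v == t),  # "used for"
        }


th = Thesaurus()
th.add_hierarchical("Mammals", "Vertebrates")
th.add_hierarchical("Dogs", "Mammals")
th.add_equivalence("Canines", "Dogs")
th.add_associative("Accounting", "Accountants")
print(th.entry("Canines"))
```

Looking up the non-preferred term “Canines” returns the entry for “Dogs”, which is how equivalence steers everyone to the same indexing term.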
17. Challenges – Uncontrolled Vocabularies
Uncontrolled vocabularies are:
Comprehensive but noisy
Only comprehensive if synonym lists are
used
Limited in their precision and relevancy
Time lost scanning through hundreds of
“miss” hits
Reduced effectiveness of cross-repository
searches
Limited ways to disambiguate ‘soup stock’
from ‘stock car’
18. Challenges - Controlled Vocabularies
Controlled vocabularies can produce:
Potentially significant overhead effort (manual
and technical)
Organizational politics can add YEARS to
establishing an initial set of controlled
vocabularies
A lack of basic understanding of what the
controlled vocabularies are and how they work
impedes effective development and utilization
19. Challenges - Controlled Vocabularies
Controlled vocabularies:
Richness and power come from a full set of semantic
relationships, not just hierarchical ones
Hierarchy supports the ability to narrow and broaden
search queries
Association supports “did you mean” and “you might
also want to look at”
Equivalence enables the use of familiar language to
retrieve content which is conceptually on target but
never uses the user’s term
e.g. user enters dog and search utility expands
query to include “canine, k-9, puppy”
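Here is a sketch of how a search utility might use each relationship type: equivalence (and, here, narrower terms) to expand the query, hierarchy to offer broaden/narrow options, and association to offer “you might also want to look at” suggestions. The thesaurus fragment and function names are hypothetical. The sketch files “puppy” as a narrower term of “dog” rather than an equivalent, since a puppy is a kind of dog; expanding with both equivalents and narrower terms yields the same expansion as the slide’s example.

```python
# Hypothetical thesaurus fragment keyed by preferred term.
EQUIVALENT = {"dog": {"canine", "k-9"}}                 # equivalence (entry terms)
NARROWER = {"dog": {"puppy"}, "mammal": {"dog"}}         # hierarchy, downward
BROADER = {"dog": {"mammal"}, "puppy": {"dog"}}          # hierarchy, upward
RELATED = {"dog": {"dog training", "veterinary care"}}   # association
PREFERRED = {"canine": "dog", "k-9": "dog"}              # entry term -> preferred


def expand_query(user_term: str) -> str:
    """Expand a user's term with its equivalents and narrower terms (OR-ed)."""
    term = PREFERRED.get(user_term.lower(), user_term.lower())
    terms = {term} | EQUIVALENT.get(term, set()) | NARROWER.get(term, set())
    return " OR ".join(sorted(terms))


def suggestions(user_term: str) -> dict[str, list[str]]:
    """Options a results page could offer alongside the results."""
    term = PREFERRED.get(user_term.lower(), user_term.lower())
    return {
        "broaden": sorted(BROADER.get(term, set())),
        "narrow": sorted(NARROWER.get(term, set())),
        "you might also want to look at": sorted(RELATED.get(term, set())),
    }


print(expand_query("dog"))
print(expand_query("K-9"))
print(suggestions("dog"))
```

Because “K-9” is first mapped to its preferred term, a user who types any of the equivalent forms gets the same expansion.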
20. Challenges - Controlled Vocabularies
Controlled vocabularies:
Richness and power come at the cost of
added complexity of development,
implementation, integration and maintenance
Utilization of controlled vocabularies can
produce performance issues
During search index creation
During query run time
22. Strategy – Creation and Maintenance
State the business case clearly
Benefits
Reduced time for knowledge discovery
Increased richness of knowledge discovery
Decreased risk to firm of making business
decisions with partial information
Scope
One business unit or enterprise-wide?
Resource requirements
Skill sets (IS, IT, business knowledge)
Time commitment
23. Strategy – Creation and Maintenance
Tackle organizational politics head-on
Gain credibility and ensure usability by establishing a
cross-functional working committee that will become
the Review Committee
Include all major stakeholder groups and any
interested parties (even the non-supporters)
Establish methods of broadly soliciting end-user input
that will become a source of change requests during
maintenance phases
24. Strategy – Creation and Maintenance
Additional considerations before you start:
How rigorous does it need to be?
What external standards should be adopted?
ANSI/NISO Z39.19-2005
British Standard – BS 8723
What internal standards should be developed?
Editorial Guidelines
Usage Guidelines
How extensive will it be?
Depth and breadth within and across facets
What about adaptability and flexibility?
Will there be a need for local extensions?
25. Strategy – Creation and Maintenance
Additional considerations before you start:
Projected frequency of revisions
How quickly does the content base change with
respect to concepts; is there significant content
drift?
How volatile is the language?
Management consulting vs. accounting
Vocabulary Management Software
DON’T spend money just to spend money
However, you CAN’T manage controlled
vocabularies in a spreadsheet
Buy the tool you need based on your documented
functional requirements
26. Strategy – Integration Choices
Performance trade-offs
Store UIDs within content, then use a lookup table at
query run time
Store the full text of a term, then touch all content whenever
a taxonomy value changes (the new term value must be
re-assigned)
Version control
Use static versions of controlled vocabularies within
CMS and search utilities, releasing new versions
periodically
Use a dynamic version of controlled vocabularies with
continuous revisions occurring
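The UID-versus-full-text trade-off can be sketched by counting how many rows a single term rename touches under each approach. Everything here is hypothetical (the UID “T042”, the labels, the helper names); the point is the shape of the cost, not a specific CMS.

```python
# Option 1: content stores stable UIDs; a lookup table supplies the labels.
vocabulary = {"T042": "Auto Racing"}
docs_with_uids = [{"id": f"doc-{i}", "topic": "T042"} for i in range(1, 4)]

# Option 2: content stores the label text itself.
docs_with_labels = [{"id": f"doc-{i}", "topic": "Auto Racing"} for i in range(1, 4)]


def rename_with_uids(vocab: dict[str, str], uid: str, new_label: str) -> int:
    """Relabel a concept; only the lookup table changes. Returns rows touched."""
    vocab[uid] = new_label
    return 1


def rename_with_labels(docs: list[dict[str, str]], old: str, new: str) -> int:
    """Relabel a concept by rewriting every content record. Returns rows touched."""
    touched = 0
    for doc in docs:
        if doc["topic"] == old:
            doc["topic"] = new
            touched += 1
    return touched


def display_topic(doc: dict[str, str], vocab: dict[str, str]) -> str:
    # The price of Option 1: every display or query needs a lookup.
    return vocab[doc["topic"]]


print(rename_with_uids(vocabulary, "T042", "Motorsport"))                # 1
print(rename_with_labels(docs_with_labels, "Auto Racing", "Motorsport"))  # 3
print(display_topic(docs_with_uids[0], vocabulary))                       # Motorsport
```

With three documents the difference is trivial; with millions of records, Option 2 turns every rename into a bulk update and re-index, while Option 1 pays a small lookup cost on every query instead.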
27. Strategy – Integration Choices
Utilizing semantic relationships
Store the full set (term values or UIDs) within the
content record
OR
Store a single UID and have the search utility use
reference tables to determine related terms
Display of semantic relationships
User interface considerations for effective
presentation of non-hierarchically related terms
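The two options for using semantic relationships can be sketched side by side. Option A copies the related UIDs into each content record (fast matching, but records go stale when relationships change); Option B stores one UID and resolves relationships from reference tables at query time (smaller records, extra lookups). The UIDs, tables, and function names are hypothetical.

```python
# Hypothetical reference tables maintained alongside the vocabulary.
BROADER = {"T10": {"T01"}}
RELATED = {"T10": {"T11", "T12"}}

# Option A: the full set of related UIDs is copied into the content record.
record_a = {"id": "doc-1", "concepts": {"T10", "T01", "T11", "T12"}}

# Option B: the record holds one UID; relations are looked up at query time.
record_b = {"id": "doc-1", "concept": "T10"}


def concepts_of_b(record: dict) -> set[str]:
    """Resolve a single-UID record to its full concept set via reference tables."""
    uid = record["concept"]
    return {uid} | BROADER.get(uid, set()) | RELATED.get(uid, set())


def matches_a(record: dict, uid: str) -> bool:
    return uid in record["concepts"]  # a plain set membership test


def matches_b(record: dict, uid: str) -> bool:
    return uid in concepts_of_b(record)  # table lookups on every query


print(matches_a(record_a, "T12"), matches_b(record_b, "T12"))
```

Both options answer the same question; they differ in when the work is done. If `RELATED` gains a new entry, Option B picks it up immediately, whereas Option A needs every affected record rewritten.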
28. Strategy – Integration Choices
Search results screen (diagram), with labeled regions:
Query entry (including ability to broaden or narrow current search results)
Previous query statement the user entered, plus any auto-expansion done by the engine
Browse navigation options
Related topics (defined through associative relationships)
Query results listing
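One way to think about the screen layout is as a page model in which each region is fed by a different relationship type. The sketch below is hypothetical throughout (the thesaurus fragment, the `ResultsPage` fields, the `build_page` function) and only illustrates which relationship drives which region.

```python
from dataclasses import dataclass

# Hypothetical thesaurus fragment (keyed by preferred term).
BROADER = {"Equities": ["Securities"]}
NARROWER = {"Equities": ["Common Stock", "Preferred Stock"]}
RELATED = {"Equities": ["Dividends", "Stock Exchanges"]}
ENTRY_TERMS = {"Equities": ["Stocks", "Shares"]}  # equivalence


@dataclass
class ResultsPage:
    query_statement: str       # what the user typed plus any auto-expansion
    broaden_options: list[str]  # hierarchy, upward
    narrow_options: list[str]   # hierarchy, downward
    related_topics: list[str]   # associative relationships
    results: list[str]


def build_page(user_query: str, concept: str, results: list[str]) -> ResultsPage:
    """Assemble the regions of a results page for a resolved concept."""
    expansion = [concept] + ENTRY_TERMS.get(concept, [])
    statement = f"{user_query} (expanded to: {' OR '.join(expansion)})"
    return ResultsPage(
        query_statement=statement,
        broaden_options=BROADER.get(concept, []),
        narrow_options=NARROWER.get(concept, []),
        related_topics=RELATED.get(concept, []),
        results=results,
    )


page = build_page("stocks", "Equities", ["doc-7", "doc-2"])
print(page.query_statement)
print(page.related_topics)
```

Showing the expanded query statement back to the user is what lets them see, and trust, what the engine actually searched for.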
29. Strategy – Multiple Applications
Expanding the adoption and use of controlled
vocabularies
Know the business objectives of the applications
In conjunction with the search utility, does the
controlled vocabulary enable this objective?
Are there metadata fields available within current
application for the controlled vocabulary?
Does the business have the resources to assign
controlled vocabulary terms to content?
What format does the controlled vocabulary need to be
in to be integrated with the application?
30. Strategy – Multiple Applications
Additional considerations
Will there be conflicting version management
needs?
How does search currently index these
applications and will that change with the use
of controlled vocabularies?
31. Five Key Points
1. Controlled vocabularies are a lever to improve
precision and comprehensiveness
2. Controlled vocabularies are never finished – they are
always a work in progress
3. Search utilities can only be tweaked so far
4. Tapping into the richness of the semantic
relationships between terms can be extremely
powerful
5. There are lots of options for implementing and
integrating controlled vocabularies
32. Thank you for your attention!
Ian Davis
ian.davis@dowjones.com