Mapping, Merging, and
Multilingual Taxonomies

Heather Hedden
Taxonomy Consultant
Hedden Information Management

SLA 2012 Conference Presentation
© 2012 Hedden Information Management
Heather Hedden
     Taxonomy consultant, Hedden Information Management
     Continuing education instructor with Simmons College Graduate
     School of Library and Information Science
     Author of The Accidental Taxonomist (Information Today, 2010)


Previously worked as:
  Controlled vocabulary editor, IAC/Gale/Cengage Learning
  Internal taxonomy manager for an energy company
  Taxonomy consultant with consulting firms
  Taxonomist in product development at a search software vendor




Agenda

    Background
    Mapping Taxonomies
    Merging Taxonomies
    Multilingual Taxonomies




Background: Taxonomies


Controlled Vocabulary/Taxonomy/Thesaurus
    An authoritative, restricted list of terms (words or phrases)
    Each term for a single unambiguous concept
    (synonyms/nonpreferred terms, as cross-references, may be
    included)
    Policies (control) for who, when, and how new terms can be
    added
    Typically has structured relationships between terms
    To support indexing/tagging/metadata management of
    content to facilitate content management and retrieval
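A minimal Python sketch of the structure described above, assuming a simple in-memory model: each term is one concept with a single preferred term, optional nonpreferred (USE) entries, and broader/related links. The `Term` and `ControlledVocabulary` names are hypothetical and do not reflect the data model of any particular taxonomy tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One unambiguous concept in a controlled vocabulary (hypothetical model)."""
    preferred: str                                         # the single preferred term
    nonpreferred: list[str] = field(default_factory=list)  # synonyms/variants (USE references)
    broader: list[str] = field(default_factory=list)       # preferred terms of broader concepts
    related: list[str] = field(default_factory=list)       # associative relationships


class ControlledVocabulary:
    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}
        self._entries: dict[str, str] = {}  # lowercased entry -> preferred term

    def add(self, term: Term) -> None:
        """Add a term; the 'control' here is simply refusing entries already in use."""
        labels = [term.preferred, *term.nonpreferred]
        clashes = [label for label in labels if label.lower() in self._entries]
        if clashes:
            raise ValueError(f"entries already in use: {clashes}")
        self.terms[term.preferred] = term
        for label in labels:
            self._entries[label.lower()] = term.preferred

    def resolve(self, entry: str) -> str | None:
        """Follow a USE reference: return the preferred term for any entry."""
        return self._entries.get(entry.lower())

    def narrower(self, preferred: str) -> list[str]:
        """Derive narrower terms from the broader links of other terms."""
        return [t.preferred for t in self.terms.values() if preferred in t.broader]


cv = ControlledVocabulary()
cv.add(Term("Vehicles"))
cv.add(Term("Cars", nonpreferred=["Automobiles", "Autos"], broader=["Vehicles"]))
print(cv.resolve("automobiles"))  # Cars
print(cv.narrower("Vehicles"))    # ['Cars']
```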



Examples: Hierarchical Taxonomy, Thesaurus, Faceted Taxonomy
[Figure: examples of a hierarchical taxonomy, a thesaurus, and a faceted taxonomy]

Background:
Mapping, Merging, & Multilingual Taxonomies
Taxonomies/Controlled Vocabularies (CVs) are:
1. Designed
2. Built
3. Maintained/Managed


But in time, a taxonomy may gain additional uses, and may
  need to be:
  Mapped or merged with another taxonomy
  Translated into another language or localized


Background:
Mapping, Merging, & Multilingual Taxonomies
Mapping, Merging, and Multilingual Taxonomies:
  Methods of combining taxonomies
  Different methods serve different purposes


     Mapping


     Merging


     Multilingual

Mapping Taxonomies

Mapping:
Enabling one controlled vocabulary (CV) to be used for another in the
same subject area
  Both are retained as continued, distinct vocabularies.
  One CV continues to be used to retrieve its own content as before, plus
additional content associated with the other CV.
  Mapping tables are also called “crosswalks.”
[Figure: “Something representing something else”]
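To make the retrieval side concrete, here is a small Python sketch assuming two hypothetical indexes (document ID to terms), one per CV, and a crosswalk from the enterprise CV to the public CV. Both vocabularies stay intact; the crosswalk is consulted only at query time. All data and names are illustrative.

```python
# Hypothetical content tagged with two distinct CVs.
PUBLIC_INDEX = {"web-7": ["Cars"], "web-8": ["Trucks"]}
ENTERPRISE_INDEX = {"doc-1": ["Motor vehicles"], "doc-2": ["Passenger cars"]}

# Crosswalk (mapping table): enterprise CV term -> public CV term.
CROSSWALK = {"Passenger cars": "Cars", "Motor vehicles": "Vehicles"}


def retrieve(public_term: str) -> list[str]:
    """Content tagged with the public term, plus enterprise content mapped to it."""
    hits = [doc for doc, terms in PUBLIC_INDEX.items() if public_term in terms]
    hits += [
        doc
        for doc, terms in ENTERPRISE_INDEX.items()
        if any(CROSSWALK.get(term) == public_term for term in terms)
    ]
    return hits


print(retrieve("Cars"))  # ['web-7', 'doc-2']
```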
Mapping Taxonomies
Situations:
     Selected content with an enterprise taxonomy is made
     available on a public web site with a different public-
     facing taxonomy
     A content provider with a CV partners with a third-party
     information vendor with its own CV
     A provider of scientific/technical/medical content with a
     technical CV creates a simpler CV aimed at laypeople
     Search log query terms need to be integrated into the
     CV as additional nonpreferred (variant/synonym) terms.
     “Federated search” needs to span multiple
     taxonomies

Mapping Taxonomies

 Mapping from a CV used to index content to a retrieval/user-interface CV:
 Use a software tool or scripts to compare the vocabularies and obtain
 matches in successive passes.
 Human review confirms and approves the automatically proposed
 matching terms.
 Unmatched terms cannot be utilized by the retrieval CV.
 Narrower-to-broader matches are acceptable.
 Set automatic matching to also match words/phrases of the retrieval
 taxonomy that occur within a term of the indexing CV:
     Indexing taxonomy             Retrieval/UI taxonomy
     HDTV Television sets          Television sets
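The successive passes could be scripted roughly as follows. This Python sketch assumes two passes: an exact, case-insensitive match, and then a match where a retrieval term appears as a whole-word phrase inside an indexing term (the HDTV example). Every proposal is only a candidate for the taxonomist to approve; the function name and pass labels are illustrative.

```python
import re


def words(term: str) -> list[str]:
    return re.findall(r"\w+", term.lower())


def propose_matches(indexing_terms, retrieval_terms):
    """Return ({indexing term: (retrieval term, pass)}, unmatched indexing terms)."""
    proposals = {}
    by_lower = {r.lower(): r for r in retrieval_terms}

    # Pass 1: exact matches, ignoring case.
    for term in indexing_terms:
        if term.lower() in by_lower:
            proposals[term] = (by_lower[term.lower()], "exact")

    # Pass 2: a retrieval term occurs as a contiguous word sequence inside the
    # indexing term. Longer retrieval terms are tried first.
    longest_first = sorted(retrieval_terms, key=lambda r: len(words(r)), reverse=True)
    for term in indexing_terms:
        if term in proposals:
            continue
        tw = words(term)
        for r in longest_first:
            rw = words(r)
            if rw and any(tw[i:i + len(rw)] == rw for i in range(len(tw) - len(rw) + 1)):
                proposals[term] = (r, "contained")
                break

    unmatched = [t for t in indexing_terms if t not in proposals]
    return proposals, unmatched


proposals, unmatched = propose_matches(
    ["Television sets", "HDTV Television sets", "Radios"],
    ["Television sets", "Audio equipment"],
)
print(proposals)
print(unmatched)  # ['Radios']
```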

Mapping Taxonomies




 Indexing CV in column A, retrieval CV in column C, taxonomist notes in column B:
 “ok” means equivalent; “b” means the second term is broader, so it is also
 acceptable; “n” means narrower or otherwise not acceptable.
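Once the taxonomist has filled in column B, the approved pairs can be pulled out mechanically. A minimal Python sketch, assuming each spreadsheet row arrives as an (indexing term, note, retrieval term) tuple and that the three codes are exactly those described above; the sample rows are made up.

```python
ACCEPTED = {"ok", "b"}  # "ok" = equivalent, "b" = retrieval term is broader; "n" is rejected


def approved_mappings(rows):
    """rows: (column A indexing term, column B note, column C retrieval term)."""
    mapping = {}
    for indexing_term, note, retrieval_term in rows:
        if note.strip().lower() in ACCEPTED:
            mapping.setdefault(indexing_term, []).append(retrieval_term)
    return mapping


rows = [
    ("Television sets", "ok", "Television sets"),
    ("HDTV Television sets", "b", "Television sets"),
    ("Consumer electronics", "n", "Television sets"),
]
print(approved_mappings(rows))
```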
Mapping Taxonomies
Mapping user-entered search queries (column 2) to terms, in this case
the term “Type of Vehicles.”
If a query could be a (narrower) example of automobiles, put a “y” in
the CV_Terms_Y column. Some queries are too broad and vague.
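The flagged queries can then be turned into candidate nonpreferred terms. This Python sketch assumes the spreadsheet is exported as CSV with a `query` column next to the `CV_Terms_Y` column; the `count` column and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical CSV export of the query-review spreadsheet.
SAMPLE = """count,query,CV_Terms_Y
120,sedan,y
85,suv,y
40,stuff for sale,
"""


def candidate_variants(csv_text: str, target_term: str) -> dict[str, str]:
    """Queries marked "y" become candidate nonpreferred entries (USE target_term)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["query"].strip(): target_term
        for row in reader
        if (row.get("CV_Terms_Y") or "").strip().lower() == "y"
    }


print(candidate_variants(SAMPLE, "Type of Vehicles"))
```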

Mapping Taxonomies
Tools for mapping

     In commercial thesaurus/taxonomy software,
     designate a custom equivalence relationship:
        Example: USE-Map / UF-Map (in place of USE/UF)

     Import CSV mapping tables, such as those created in Excel
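Before importing, a CSV mapping table can be checked or converted with a few lines of code. A minimal Python sketch, assuming a two-column file (source term, target term) with no header row; it builds the USE-Map relationship and its reciprocal UF-Map. The relationship names follow the example above; the import format any particular tool expects is not assumed here.

```python
import csv
import io
from collections import defaultdict


def load_crosswalk(csv_text: str):
    """Return (use_map, uf_map) from a two-column CSV of source, target terms."""
    use_map = {}
    uf_map = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        source, target = (cell.strip() for cell in row)
        use_map[source] = target       # source USE-Map target
        uf_map[target].append(source)  # target UF-Map source (reciprocal)
    return use_map, dict(uf_map)


TABLE = "HDTV Television sets,Television sets\nPlasma televisions,Television sets\n"
use_map, uf_map = load_crosswalk(TABLE)
print(use_map["Plasma televisions"])  # Television sets
print(uf_map["Television sets"])      # ['HDTV Television sets', 'Plasma televisions']
```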




Merging Taxonomies

     Merging:
     Combining two or more
     redundant vocabularies in
     the same subject area into one

        The vocabularies are no
        longer retained as distinct.
        Legacy content is
        retrieved through added
        equivalence relationships.
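The idea that legacy content stays retrievable can be sketched in Python. Assuming each CV is a dictionary from every entry (preferred or nonpreferred) to its preferred term, and that the taxonomist has approved which merging-CV concepts match which primary-CV concepts, the merge turns every merging-CV entry into an entry of the primary CV. The function and data are illustrative; a real conflict (an entry already used for a different concept) would need review rather than the silent "primary wins" rule used here.

```python
def merge_into(primary, merging, approved):
    """
    primary, merging: {entry: preferred term}; preferred terms map to themselves.
    approved: {merging-CV preferred term: primary-CV preferred term}.
    Returns one vocabulary; the merging CV is no longer kept separately.
    Existing primary entries are never overwritten.
    """
    merged = dict(primary)
    for entry, preferred in merging.items():
        target = approved.get(preferred, preferred)  # unmatched concepts come over as new terms
        merged.setdefault(target, target)            # make sure the target term exists
        merged.setdefault(entry, target)             # legacy entry -> equivalence (USE) to target
    return merged


primary = {"Cars": "Cars", "Autos": "Cars"}
merging = {"Automobiles": "Automobiles", "Motorcars": "Automobiles", "Trucks": "Trucks"}
merged = merge_into(primary, merging, {"Automobiles": "Cars"})
print(merged["Automobiles"])  # Cars
```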



Merging Taxonomies
Situations
     An enterprise taxonomy replaces multiple CVs of
     separate administrative departments
     An organization acquires or merges with another
     organization, and their redundant vocabularies are
     merged
     A folksonomy is incorporated into a CV
     An internally created CV is combined with a
     purchased/licensed CV




Merging Taxonomies

Merging – Which Direction?
Designate a dominant/primary CV into which
 to merge the other:

  If an organization acquires another, then the acquirer’s
  CV is dominant.
Or choose:
  The larger CV
  The CV with greater breadth
  The CV with greater depth
  The more structured CV
  The “better” CV

Merging Taxonomies
  Use a software tool or scripts to compare vocabularies and obtain
  matches in successive passes:

Merging CV (will go away)        Primary CV (kept; grows)          Taxonomist reviews?
Exact matches of:
  Preferred term: Cars             Preferred term: Cars              No need
  Preferred term: Automobiles      Nonpreferred term: Automobiles    No need
                                     USE Cars
  Nonpreferred term: Cars          Preferred term: Cars              Yes
    USE Automobiles
  Nonpreferred term: Cars          Nonpreferred term: Cars           Yes
    USE Automobiles                  USE Autos
Inexact matches of:
  Preferred term: Automobile       Preferred term: Automobiles       Yes

Merging Taxonomies
Rules can be created for automatic inexact (“fuzzy”) matches, which are
   then subject to human review:

Match type:                                      Examples:
Hyphens, parentheses, punctuation, and spaces    Healthcare / Health care
Plural/singular                                  Teaching method / Teaching methods
Common abbreviations and acronyms                and / &
                                                 Dept. / Department
Word order                                       Photography, digital / Digital photography
Addition of specified words (industry,           Healthcare industry / Healthcare services
  services, etc.)
Grammatical endings                              Production / Producing

Merging Taxonomies
Tools for merging

     Commercial thesaurus/taxonomy software with merge
     vocabularies feature
        Synaptica
        Wordmap
     Custom scripting (Perl, etc.) to compare vocabularies




Mapping and Merging Summary


     Mapping
       Overlapping controlled vocabularies remain distinct; one is
       used for the other in a specific application
       (indexing vs. retrieval CVs)



     Merging
       Overlapping controlled vocabularies are combined
       permanently, removing duplicates




Mapping and Merging Summary

 Compare two closely redundant vocabularies side-by-
 side, term-by-term
 First pass is automatic, followed by taxonomist review of
 matches
 Taxonomy software may have this feature; otherwise, write
 your own scripts
 Taxonomist reviews, discerns distinction between
 equivalent, broader/narrower, related terms to approve
 matches
 Taxonomist deals with terms more than structure.


Agenda

     Background
     Mapping Taxonomies
     Merging Taxonomies
     Multilingual Taxonomies
     1.   Multilingual Taxonomy Goals
     2.   Multilingual Taxonomy Design
     3.   Taxonomy Translation Management




Multilingual Taxonomy Goals
Bilingual/Multilingual Taxonomies can enable:
1. A user to search and retrieve content that is in multiple languages
    through a single taxonomy in their own language
[Figure: an English-speaking user searches content in French (Français),
German (Deutsch), and Spanish (Español). Taxonomy: single-language user
interface (UI); translations into multiple languages are not displayed.]




Multilingual Taxonomy Goals

Bilingual/Multilingual Taxonomies can enable:
2. Different users who speak different languages to search the same
    body of content (in a single other language), each using a taxonomy
    in the user interface in their native language

[Figure: Spanish, French, and German speakers search English content
through multiple, different language UIs.]

Multilingual Taxonomy Goals
Bilingual/Multilingual Taxonomies can enable:
3. Different users who speak different languages to search the same
    body of content that is in multiple languages.

[Figure: Spanish, French, and German speakers search content in French
(Français), German (Deutsch), and Spanish (Español) through multiple,
different language UIs.]

Multilingual Taxonomy Goals

Goals #1 or #2: Users of one language can access content
  in a different language.
     Taxonomy in one language with equivalent translated terms
     The taxonomy needs to function in only one direction.


Goal #3: Multilingual users can access multilingual content.
     Fully multilingual taxonomy or distinct taxonomies for each language
     linked at equivalent-meaning terms
     The taxonomy needs to function in both/all language directions.




Multilingual Taxonomy Goals
Different scenario: Multiple language taxonomies, each connected to its
own language content, such as for separate websites.

[Figure: Spanish, French, and German speakers each search the content in
their own language (Español, Français, Deutsch) through multiple,
different language UIs.]
Multilingual Taxonomy Design
Design the multilingual taxonomy to meet the taxonomy goals.

 In a one-direction translated taxonomy:
          The taxonomy in the searcher's language needs a structure to display.
          The taxonomy in the content's language may not need structure.
          Translations may work in one direction only (a user/display term
          may be used for a content/index term, but not vice versa).

 For a fully bidirectional multilingual taxonomy:
          Both language taxonomies need structure.
          Translations must be exact matches in both directions.

 For separate taxonomies in different languages:
          Taxonomies are not translated but each created and managed
          separately.
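The difference between one-direction and bidirectional translation can be checked mechanically. In this Python sketch (an illustration with made-up term pairs), a list of translation pairs is acceptable for a one-direction taxonomy as long as each user-language term has one target, but a bidirectional taxonomy also needs each target to map back to exactly one source term. The Spanish seguridad (safety or security) case fails the second test.

```python
def translation_conflicts(pairs, bidirectional=True):
    """
    pairs: (term in language A, term in language B) translation pairs.
    Returns descriptions of terms with more than one translation. Only A -> B is
    checked for a one-direction taxonomy; both directions for a bidirectional one.
    """
    problems = []
    a_to_b, b_to_a = {}, {}
    for a, b in pairs:
        if a_to_b.setdefault(a, b) != b:
            problems.append(f"{a!r} has two translations: {a_to_b[a]!r}, {b!r}")
        if bidirectional and b_to_a.setdefault(b, a) != a:
            problems.append(f"{b!r} has two translations: {b_to_a[b]!r}, {a!r}")
    return problems


PAIRS = [("Safety", "Seguridad"), ("Security", "Seguridad"), ("Vehicles", "Vehículos")]
print(translation_conflicts(PAIRS, bidirectional=False))  # []
print(translation_conflicts(PAIRS))  # ["'Seguridad' has two translations: 'Safety', 'Security'"]
```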
Multilingual Taxonomy Design

Dedicated taxonomy/thesaurus management software tools
   provide varying multilingual capabilities.

1.   Customized text field used for term translations
       No vocabulary control of second language(s)

2.   Second language taxonomy mirroring first, linked at each
     translated term
        Vocabulary control of second language(s)
        Copying taxonomy structure of primary language

3.   Multiple taxonomies in different languages linked at
     equivalent term translations
       Each language may have its own structure (requires
       additional work to build)
Multilingual Taxonomy Design

1.    Customized field used for term translations

[Figure: a hierarchy (Term; Child Term 1 and 2; Grandchild 1 to 4) in
which each term carries its translation in a customized field]
Multilingual Taxonomy Design
2.     Second language taxonomy mirroring first, linked at each
       translated term. Inter-term relationships replicate.

[Figure: two identical hierarchies (Term; Child Term 1 and 2;
Grandchild 1 to 4), one per language, linked term by term]
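Method #2 can be pictured as copying the hierarchy and swapping in translations, so that the second-language taxonomy has exactly the same parent/child relationships. A Python sketch with the hierarchy stored as {term: parent}; that representation and the French terms are assumptions for illustration.

```python
def mirror_taxonomy(parent_of, translate):
    """
    parent_of: {term: parent term, or None for a top term} in the primary language
    translate: {primary-language term: second-language term}
    Returns the second-language taxonomy with the same relationships replicated.
    """
    missing = [term for term in parent_of if term not in translate]
    if missing:
        raise KeyError(f"untranslated terms: {missing}")
    return {
        translate[term]: (translate[parent] if parent is not None else None)
        for term, parent in parent_of.items()
    }


ENGLISH = {"Vehicles": None, "Cars": "Vehicles", "Trucks": "Vehicles"}
TO_FRENCH = {"Vehicles": "Véhicules", "Cars": "Voitures", "Trucks": "Camions"}
print(mirror_taxonomy(ENGLISH, TO_FRENCH))
```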




Multilingual Taxonomy Design
  3.      Multiple taxonomies in different languages linked at equivalent
          term translations. Inter-term relationships may differ.

[Figure: two hierarchies with different structures, one with two child
terms and three grandchildren, the other with three child terms and a
different set of grandchildren, linked at equivalent terms]
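With Method #3, each language keeps its own hierarchy, so a linked term may sit under a different parent in each. This Python sketch (hypothetical data and function name) lists linked pairs whose parents are not themselves linked, which a taxonomist might want to look at; in this method such differences are allowed, not errors.

```python
def structural_differences(parent_a, parent_b, links):
    """
    parent_a, parent_b: {term: parent, or None for a top term}, one per language
    links: {term in taxonomy A: equivalent term in taxonomy B}
    Returns linked pairs whose parents are not linked to each other.
    """
    diffs = []
    for a, b in links.items():
        pa, pb = parent_a.get(a), parent_b.get(b)
        top_mismatch = (pa is None) != (pb is None)  # top term in one, not the other
        if top_mismatch or (pa is not None and links.get(pa) != pb):
            diffs.append((a, b))
    return diffs


ENGLISH = {"Vehicles": None, "Cars": "Vehicles", "Electric cars": "Cars"}
GERMAN = {"Fahrzeuge": None, "Autos": "Fahrzeuge", "Elektroautos": "Fahrzeuge"}
LINKS = {"Vehicles": "Fahrzeuge", "Cars": "Autos", "Electric cars": "Elektroautos"}
print(structural_differences(ENGLISH, GERMAN, LINKS))  # [('Electric cars', 'Elektroautos')]
```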




Multilingual Taxonomy Design & Tools

Dedicated taxonomy/thesaurus management
software tool screenshot examples from:

     Data Harmony Thesaurus Master (Access Innovations, Inc.)
     Synaptica (Synaptica, LLC)
     MultiTes (Multisystems)
     Semaphore Ontology Manager (Smartlogic)

Additional tools also provide similar capabilities.




Multilingual Taxonomy Design & Tools



[Screenshot: Data Harmony Thesaurus Master]
Method #1: Create a user-defined text field and enter the translation.




Multilingual Taxonomy Design & Tools




[Screenshot: Synaptica]
Method #1


Multilingual Taxonomy Design & Tools
[Screenshot: MultiTes]
Method #2: Create a second-language taxonomy mirroring the first, linked
at each translated term. Inter-term relationships replicate.




Multilingual Taxonomy Design & Tools
[Screenshot: Smartlogic Semaphore Ontology Manager]
Method #2




Multilingual Taxonomy Design & Tools



[Screenshot: Synaptica]
Method #3: Link equivalent terms in different languages by a
user-defined associative relationship.




Multilingual Taxonomy Design
     Translations of a term may display as another kind of
     relationship.
     Similar to equivalence, but the terms in both languages are
     preferred; neither is nonpreferred.




From the bilingual European Training Thesaurus http://libserver.cedefop.europa.eu/ett


Taxonomy Translation Management

     Taxonomy translations are typically created from scratch,
     translating each term.
     It is also possible to map an existing, separately created
     foreign-language taxonomy to another, if their coverage is
     nearly identical.

     For Goals #1 or #2 (users of one language accessing content
     in a different language), translations may suffice.
     For Goal #3 (multilingual users accessing multilingual content),
     mapping separately created taxonomies in each language is
     better.



Taxonomy Translation Management
     User interface taxonomies in one language may be mapped to
     indexing taxonomies in another language.
        The retrieval taxonomy is in the language of the searcher.
        The indexing taxonomy is in the language of the content.

     The role of the different language taxonomies is typically dynamic
        depending on the language of the user
        depending on the language of the content
     The taxonomy of either language could be the retrieval taxonomy or
     the indexing taxonomy.


     Mapping has to go in both directions.
     Matches between terms in both languages have to be exact
     translations.
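Because the roles switch with the user and the content, it helps to link the taxonomies at the concept level and look up in whichever direction is needed. A minimal Python sketch, assuming a hypothetical table of concept IDs with one preferred term per language:

```python
# Hypothetical concept table: one preferred term per language for each concept.
CONCEPTS = {
    "c1": {"en": "Vehicles", "fr": "Véhicules"},
    "c2": {"en": "Cars", "fr": "Voitures"},
}


def index_term_for(user_term, user_lang, content_lang):
    """Map a term from the retrieval taxonomy (the user's language) to the
    indexing taxonomy (the content's language)."""
    for labels in CONCEPTS.values():
        if labels.get(user_lang) == user_term:
            return labels.get(content_lang)
    return None


print(index_term_for("Cars", "en", "fr"))      # Voitures
print(index_term_for("Voitures", "fr", "en"))  # Cars
```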
Taxonomy Translation Management

     Matches are for concepts, not terms.
       Translations are for the concept and not necessarily
       for the preferred term.



     Nonpreferred (variant/synonym) terms may vary.
       Some can be translated
       Some cannot be translated
       Additional nonpreferred terms may be created in the
       second language(s)

Taxonomy Translation Management

Translating taxonomies/thesauri is different from translating
  documents.

     Pay by hour/project, not by word.
     Translators should have experience with translating in
     both directions.
     Translators should be familiar with using taxonomies, if
     not also taxonomists.
     If not using a translator who is also a taxonomist, have a
     taxonomist/information-specialist native speaker of target
     languages review the translated taxonomy.

Taxonomy Translation Management

Taxonomy Translation Issues
     Lack of an equivalent translation
     A term in one language having two meanings with two
     terms in another language
     (e.g., seguridad = safety or security)
     Term length
     Use of definite articles
     Use of abbreviations
     Use of plural
     Use of capitalization
     Alphabetical sorting rules
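Sorting rules show up as soon as non-ASCII terms are sorted. In this Python sketch, a plain sort by code point puts an accented initial after "Z"; an accent- and case-folding key fixes that particular case. The folding key is only an approximation and is not language-aware (for example, it sorts Spanish ñ together with n, although Spanish treats ñ as a separate letter after n); proper language-specific ordering needs a real collation implementation such as ICU or a locale-based sort.

```python
import unicodedata


def fold(term: str) -> str:
    """Sort key that ignores case and accents: decompose, drop combining marks."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


TERMS = ["Zoologie", "Éducation", "Droit"]
print(sorted(TERMS))            # ['Droit', 'Zoologie', 'Éducation']
print(sorted(TERMS, key=fold))  # ['Droit', 'Éducation', 'Zoologie']
```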



Taxonomy Translation Management

Translation projects end, but taxonomy management
  does not.
Taxonomy management issues:
  Taxonomy growth
  Taxonomy change
  Taxonomy management/ownership responsibility
  Merging or combining additional taxonomies

     Translations/additional language versions will need
     frequent reviewing and updating.
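Keeping language versions current is partly a bookkeeping problem. A Python sketch, assuming each source-language concept carries a revision number and each translation records the source revision it was made from (both are assumptions, not features of any named tool):

```python
def translation_backlog(source, translated):
    """
    source:     {concept ID: revision number of the source-language term}
    translated: {concept ID: source revision the current translation was based on}
    Returns (never translated, translated but out of date), as sorted ID lists.
    """
    untranslated = sorted(cid for cid in source if cid not in translated)
    stale = sorted(cid for cid, rev in source.items()
                   if cid in translated and translated[cid] < rev)
    return untranslated, stale


SOURCE = {"c1": 3, "c2": 1, "c3": 1}  # c1 was edited after translation; c3 is new
TRANSLATED = {"c1": 2, "c2": 1}
print(translation_backlog(SOURCE, TRANSLATED))  # (['c3'], ['c1'])
```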

Conclusions

     Mapping Taxonomies
     Merging Taxonomies
     Making Multilingual Taxonomies

In all cases:
   Be proactive: anticipate and plan for
   the future
   Bring in additional experts: subject matter
   experts, technology experts, translators


Additional Taxonomy Resources/Training
Book: The Accidental Taxonomist
2010, Information Today, Inc.
www.accidental-taxonomist.com

Taxonomies & Controlled Vocabularies 5-week online workshop
Simmons College Graduate School of Library & Information Science
Starting November 2012 and January 2013
http://alanis.simmons.edu/ceweb

SLA Taxonomy Division
http://taxonomy.sla.org




Contact




Heather Hedden
Hedden Information Management
Carlisle, MA
heather@hedden.net
www.hedden-information.com
accidental-taxonomist.blogspot.com
Twitter: @hhedden
978-467-5195




C* Summit 2013: Adaptive Data Convergence for Life Sciences by Manish SoodDataStax Academy
 
Semic 2012 highlights report
Semic 2012 highlights report Semic 2012 highlights report
Semic 2012 highlights report Semic.eu
 
Case Study: Data Harmony Custom Features as Implemented for Triumph Learning
Case Study:  Data Harmony Custom Features as Implemented for Triumph LearningCase Study:  Data Harmony Custom Features as Implemented for Triumph Learning
Case Study: Data Harmony Custom Features as Implemented for Triumph LearningAccess Innovations, Inc.
 
Terminology Management
Terminology ManagementTerminology Management
Terminology ManagementUwe Muegge
 
Taxonomies for E-commerce
Taxonomies for E-commerceTaxonomies for E-commerce
Taxonomies for E-commerceHeather Hedden
 
Presentation businesscase
Presentation businesscasePresentation businesscase
Presentation businesscasebeezelbub
 
BLI Learning Leaders Symposium - Bersin Trends in Learning
BLI Learning Leaders Symposium - Bersin Trends in LearningBLI Learning Leaders Symposium - Bersin Trends in Learning
BLI Learning Leaders Symposium - Bersin Trends in LearningBusiness Learning Institute
 
Veda Semantics - introduction document
Veda Semantics - introduction documentVeda Semantics - introduction document
Veda Semantics - introduction documentrajatkr
 
18. Alessandro Cattelan (Translated) Terminology
18. Alessandro Cattelan (Translated) Terminology18. Alessandro Cattelan (Translated) Terminology
18. Alessandro Cattelan (Translated) TerminologyRIILP
 
Synonyms, Alternative Labels, and Nonpreferred Terms
Synonyms, Alternative Labels, and Nonpreferred TermsSynonyms, Alternative Labels, and Nonpreferred Terms
Synonyms, Alternative Labels, and Nonpreferred TermsHeather Hedden
 
Dynamic Potential of Semantic Enrichment
Dynamic Potential of Semantic EnrichmentDynamic Potential of Semantic Enrichment
Dynamic Potential of Semantic Enrichmentpharley
 
Forrester Hr Wave 2012
Forrester Hr Wave 2012Forrester Hr Wave 2012
Forrester Hr Wave 2012jlindley
 
Forrester Wave Human Resource Management Systems Q1 2012
Forrester Wave Human Resource Management Systems Q1 2012Forrester Wave Human Resource Management Systems Q1 2012
Forrester Wave Human Resource Management Systems Q1 2012JYack
 

Ähnlich wie Mapping, Merging, and Multilingual Taxonomies (20)

Taxonomies for Human vs Auto-Indexing
Taxonomies for Human vs Auto-IndexingTaxonomies for Human vs Auto-Indexing
Taxonomies for Human vs Auto-Indexing
 
Closing the Gap between Corpora and Termbases, CHAT2013
Closing the Gap between Corpora and Termbases, CHAT2013Closing the Gap between Corpora and Termbases, CHAT2013
Closing the Gap between Corpora and Termbases, CHAT2013
 
Terminology management as fitness v.2 iti
Terminology management as fitness v.2 itiTerminology management as fitness v.2 iti
Terminology management as fitness v.2 iti
 
Challenges of Agile Software Development
Challenges of Agile Software DevelopmentChallenges of Agile Software Development
Challenges of Agile Software Development
 
Mapping Taxonomies, Thesauri, and Ontologies
Mapping Taxonomies, Thesauri, and OntologiesMapping Taxonomies, Thesauri, and Ontologies
Mapping Taxonomies, Thesauri, and Ontologies
 
Arch CoP - Domain Driven Design.pptx
Arch CoP - Domain Driven Design.pptxArch CoP - Domain Driven Design.pptx
Arch CoP - Domain Driven Design.pptx
 
C* Summit 2013: Adaptive Data Convergence for Life Sciences by Manish Sood
C* Summit 2013: Adaptive Data Convergence for Life Sciences by Manish SoodC* Summit 2013: Adaptive Data Convergence for Life Sciences by Manish Sood
C* Summit 2013: Adaptive Data Convergence for Life Sciences by Manish Sood
 
Semic 2012 highlights report
Semic 2012 highlights report Semic 2012 highlights report
Semic 2012 highlights report
 
Case Study: Data Harmony Custom Features as Implemented for Triumph Learning
Case Study:  Data Harmony Custom Features as Implemented for Triumph LearningCase Study:  Data Harmony Custom Features as Implemented for Triumph Learning
Case Study: Data Harmony Custom Features as Implemented for Triumph Learning
 
Terminology Management
Terminology ManagementTerminology Management
Terminology Management
 
Taxonomies for E-commerce
Taxonomies for E-commerceTaxonomies for E-commerce
Taxonomies for E-commerce
 
Presentation businesscase
Presentation businesscasePresentation businesscase
Presentation businesscase
 
BLI Learning Leaders Symposium - Bersin Trends in Learning
BLI Learning Leaders Symposium - Bersin Trends in LearningBLI Learning Leaders Symposium - Bersin Trends in Learning
BLI Learning Leaders Symposium - Bersin Trends in Learning
 
User-Driven Taxonomies
User-Driven TaxonomiesUser-Driven Taxonomies
User-Driven Taxonomies
 
Veda Semantics - introduction document
Veda Semantics - introduction documentVeda Semantics - introduction document
Veda Semantics - introduction document
 
18. Alessandro Cattelan (Translated) Terminology
18. Alessandro Cattelan (Translated) Terminology18. Alessandro Cattelan (Translated) Terminology
18. Alessandro Cattelan (Translated) Terminology
 
Synonyms, Alternative Labels, and Nonpreferred Terms
Synonyms, Alternative Labels, and Nonpreferred TermsSynonyms, Alternative Labels, and Nonpreferred Terms
Synonyms, Alternative Labels, and Nonpreferred Terms
 
Dynamic Potential of Semantic Enrichment
Dynamic Potential of Semantic EnrichmentDynamic Potential of Semantic Enrichment
Dynamic Potential of Semantic Enrichment
 
Forrester Hr Wave 2012
Forrester Hr Wave 2012Forrester Hr Wave 2012
Forrester Hr Wave 2012
 
Forrester Wave Human Resource Management Systems Q1 2012
Forrester Wave Human Resource Management Systems Q1 2012Forrester Wave Human Resource Management Systems Q1 2012
Forrester Wave Human Resource Management Systems Q1 2012
 

Mehr von Heather Hedden

Introduction to Knowledge Graphs for Information Architects.pdf
Introduction to Knowledge Graphs for Information Architects.pdfIntroduction to Knowledge Graphs for Information Architects.pdf
Introduction to Knowledge Graphs for Information Architects.pdfHeather Hedden
 
Benefits of Taxonomies
Benefits of TaxonomiesBenefits of Taxonomies
Benefits of TaxonomiesHeather Hedden
 
Thesauri for Indexing Support / Thesauri zur Unterstützung der Registererstel...
Thesauri for Indexing Support / Thesauri zur Unterstützung der Registererstel...Thesauri for Indexing Support / Thesauri zur Unterstützung der Registererstel...
Thesauri for Indexing Support / Thesauri zur Unterstützung der Registererstel...Heather Hedden
 
Taxonomies in Support of Search
Taxonomies in Support of SearchTaxonomies in Support of Search
Taxonomies in Support of SearchHeather Hedden
 
A Brief Introduction to SKOS
A Brief Introduction to SKOSA Brief Introduction to SKOS
A Brief Introduction to SKOSHeather Hedden
 
A Brief Introduction to Knowledge Graphs
A Brief Introduction to Knowledge GraphsA Brief Introduction to Knowledge Graphs
A Brief Introduction to Knowledge GraphsHeather Hedden
 
Managing Taxonomy Tagging
Managing Taxonomy TaggingManaging Taxonomy Tagging
Managing Taxonomy TaggingHeather Hedden
 
Taxonomies, Categories, and Tags in WordPress
Taxonomies, Categories, and Tags in WordPressTaxonomies, Categories, and Tags in WordPress
Taxonomies, Categories, and Tags in WordPressHeather Hedden
 
Customer-Focused Thesauri
Customer-Focused ThesauriCustomer-Focused Thesauri
Customer-Focused ThesauriHeather Hedden
 
Managing Mature Taxonomies: Resolving Orphan Terms
Managing Mature Taxonomies: Resolving Orphan TermsManaging Mature Taxonomies: Resolving Orphan Terms
Managing Mature Taxonomies: Resolving Orphan TermsHeather Hedden
 
Making Decisions in Creating Taxonomies
Making Decisions in Creating TaxonomiesMaking Decisions in Creating Taxonomies
Making Decisions in Creating TaxonomiesHeather Hedden
 

Mehr von Heather Hedden (11)

Introduction to Knowledge Graphs for Information Architects.pdf
Introduction to Knowledge Graphs for Information Architects.pdfIntroduction to Knowledge Graphs for Information Architects.pdf
Introduction to Knowledge Graphs for Information Architects.pdf
 
Benefits of Taxonomies
Benefits of TaxonomiesBenefits of Taxonomies
Benefits of Taxonomies
 
Thesauri for Indexing Support / Thesauri zur Unterstützung der Registererstel...
Thesauri for Indexing Support / Thesauri zur Unterstützung der Registererstel...Thesauri for Indexing Support / Thesauri zur Unterstützung der Registererstel...
Thesauri for Indexing Support / Thesauri zur Unterstützung der Registererstel...
 
Taxonomies in Support of Search
Taxonomies in Support of SearchTaxonomies in Support of Search
Taxonomies in Support of Search
 
A Brief Introduction to SKOS
A Brief Introduction to SKOSA Brief Introduction to SKOS
A Brief Introduction to SKOS
 
A Brief Introduction to Knowledge Graphs
A Brief Introduction to Knowledge GraphsA Brief Introduction to Knowledge Graphs
A Brief Introduction to Knowledge Graphs
 
Managing Taxonomy Tagging
Managing Taxonomy TaggingManaging Taxonomy Tagging
Managing Taxonomy Tagging
 
Taxonomies, Categories, and Tags in WordPress
Taxonomies, Categories, and Tags in WordPressTaxonomies, Categories, and Tags in WordPress
Taxonomies, Categories, and Tags in WordPress
 
Customer-Focused Thesauri
Customer-Focused ThesauriCustomer-Focused Thesauri
Customer-Focused Thesauri
 
Managing Mature Taxonomies: Resolving Orphan Terms
Managing Mature Taxonomies: Resolving Orphan TermsManaging Mature Taxonomies: Resolving Orphan Terms
Managing Mature Taxonomies: Resolving Orphan Terms
 
Making Decisions in Creating Taxonomies
Making Decisions in Creating TaxonomiesMaking Decisions in Creating Taxonomies
Making Decisions in Creating Taxonomies
 

Mapping, Merging, and Multilingual Taxonomies

  • 1. Mapping, Merging, and Multilingual Taxonomies. Heather Hedden, Taxonomy Consultant, Hedden Information Management. SLA 2012 Conference Presentation. © 2012 Hedden Information Management
  • 2. Heather Hedden: taxonomy consultant, Hedden Information Management; continuing education instructor with Simmons College Graduate School of Library and Information Science; author of The Accidental Taxonomist (Information Today, 2010). Previously worked as: controlled vocabulary editor, IAC/Gale/Cengage Learning; internal taxonomy manager for an energy company; taxonomy consultant with consulting firms; taxonomist in product development at a search software vendor.
  • 3. Agenda: Background; Mapping Taxonomies; Merging Taxonomies; Multilingual Taxonomies
  • 4. Agenda: Background; Mapping Taxonomies; Merging Taxonomies; Multilingual Taxonomies
  • 5. Background: Taxonomies. Controlled vocabulary/taxonomy/thesaurus: an authoritative, restricted list of terms (words or phrases); each term represents a single unambiguous concept (synonyms/nonpreferred terms may be included as cross-references); policies (control) govern who can add new terms, when, and how; typically has structured relationships between terms; supports indexing/tagging/metadata management of content to facilitate content management and retrieval.
  • 6. [Figure: examples of a faceted taxonomy, a hierarchical taxonomy, and a thesaurus]
  • 7. Background: Mapping, Merging, & Multilingual Taxonomies. Taxonomies/controlled vocabularies (CVs) are 1. designed, 2. built, and 3. maintained/managed. But in time, a taxonomy may gain additional uses and may need to be mapped or merged with another taxonomy, or translated into another language or localized.
  • 8. Background: Mapping, Merging, & Multilingual Taxonomies. Mapping, merging, and multilingual taxonomies are all methods of combining taxonomies. Different methods serve different purposes: mapping, merging, multilingual.
  • 9. Agenda: Background; Mapping Taxonomies; Merging Taxonomies; Multilingual Taxonomies
  • 10. Mapping Taxonomies. Mapping: enabling one controlled vocabulary (CV) to be used for another in the same subject area, while retaining both as continued, distinct vocabularies. A CV continues to be used to retrieve its content as before, plus additional content associated with something else that represents the other CV. Mapping tables are also called “crosswalks.”
  • 11. Mapping Taxonomies. Situations: selected content tagged with an enterprise taxonomy is made available on a public website with a different, public-facing taxonomy; a content provider with a CV partners with a third-party information vendor that has its own CV; a provider of scientific/technical/medical content with a technical CV creates a simpler CV aimed at laypeople; search log query terms need to be integrated into the CV as additional nonpreferred (variant/synonym) terms; “federated search” involving multiple taxonomies needs to be supported.
  • 12. Mapping Taxonomies. From a CV used to index content to a retrieval/user-interface CV: use a software tool or scripts to compare the vocabularies and obtain matches in successive passes. Human review confirms and approves the automatically proposed matching terms. Unmatched terms cannot be utilized. Narrower-to-broader matches are fine. Set automatic matching to also include words/phrases of the retrieval taxonomy that occur within a term from the indexing CV. Example (indexing taxonomy → retrieval/UI taxonomy): HDTV → Television sets; Television sets → Television sets.
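The multi-pass comparison on slide 12 can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular tool: the function names are hypothetical, pass 1 treats case- and whitespace-insensitive equality as an exact match, and pass 2 proposes a retrieval term when it appears as a whole phrase inside a longer indexing term (a narrower-to-broader match). A pair like HDTV → Television sets shares no words, so string matching leaves it unmatched for the taxonomist to map by hand.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(term.lower().split())


def propose_matches(indexing_terms, retrieval_terms):
    """Hypothetical helper: propose indexing-CV -> retrieval-CV matches for human review.

    Returns {indexing_term: (retrieval_term, pass_name)}; terms absent from the
    result are unmatched and need a taxonomist's decision.
    """
    retrieval_by_key = {normalize(t): t for t in retrieval_terms}
    proposals = {}

    # Pass 1: exact matches after normalization.
    for term in indexing_terms:
        hit = retrieval_by_key.get(normalize(term))
        if hit is not None:
            proposals[term] = (hit, "exact")

    # Pass 2: a retrieval term occurs as a whole phrase inside a longer indexing
    # term, e.g. "HDTV television sets" -> "Television sets" (narrower to broader).
    for term in indexing_terms:
        if term in proposals:
            continue
        padded = f" {normalize(term)} "
        contained = [t for key, t in retrieval_by_key.items() if f" {key} " in padded]
        if contained:
            # Prefer the longest contained phrase as the most specific candidate.
            proposals[term] = (max(contained, key=len), "contained")

    return proposals


if __name__ == "__main__":
    indexing = ["Television sets", "HDTV television sets", "HDTV", "Radios"]
    retrieval = ["Television sets", "Audio equipment"]
    proposals = propose_matches(indexing, retrieval)
    for term in indexing:
        print(term, "->", proposals.get(term, "UNMATCHED (review manually)"))
```

Every proposal, including the exact ones, still goes to the human review step the slide describes.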
  • 13. Mapping Taxonomies. Indexing CV in column A, retrieval CV in column C, taxonomist notes in column B (“ok” means equivalent; “b” means the second term is broader, so also acceptable; “n” means narrower or otherwise not acceptable).
  • 14. Mapping Taxonomies. Mapping user-entered search queries (column 2) to terms, in this case the term “Type of Vehicles.” If a query could be a (narrower) example of automobiles, put a “y” in the CV_Terms_Y column. Some queries are too broad and vague.
  • 15. Mapping Taxonomies. Tools for mapping: in commercial thesaurus/taxonomy software, designate a custom equivalence relationship, for example USE-Map/UF-Map (in place of USE/UF); import CSV mapping tables, such as those created in Excel.
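Combining slides 13 and 15, a short Python sketch can turn reviewed spreadsheet rows into a CSV mapping table for import. The review codes follow slide 13 ("ok" and "b" accepted, "n" rejected); the column headers and the use of "USE-Map" as a relationship label inside the file are assumptions, since each taxonomy tool defines its own import format.

```python
import csv
import io

# Review codes from the taxonomist's notes column: "ok" = equivalent,
# "b" = retrieval term is broader (also acceptable), "n" = not acceptable.
ACCEPTED_CODES = {"ok", "b"}


def build_crosswalk(reviewed_rows):
    """reviewed_rows: (indexing_term, code, retrieval_term) tuples, i.e. columns A, B, C.

    Returns CSV text containing only the approved mappings. The header and the
    "USE-Map" relationship label are hypothetical; match your tool's import format.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Term", "Relationship", "Mapped term"])
    for indexing_term, code, retrieval_term in reviewed_rows:
        if code.strip().lower() in ACCEPTED_CODES:
            writer.writerow([indexing_term, "USE-Map", retrieval_term])
    return out.getvalue()


if __name__ == "__main__":
    rows = [
        ("HDTV", "b", "Television sets"),
        ("Television sets", "ok", "Television sets"),
        ("Camcorders", "n", "Television sets"),
    ]
    print(build_crosswalk(rows), end="")
```

Rejected rows are simply dropped here; in practice a taxonomist may prefer to keep them in a separate file for a second look.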
  • 16. Agenda: Background; Mapping Taxonomies; Merging Taxonomies; Multilingual Taxonomies
  • 17. Merging Taxonomies. Merging: combining two or more redundant vocabularies in the same subject area into one, no longer retaining them as distinct. Legacy content is retrieved through added equivalence relationships.
  • 18. Merging Taxonomies. Situations: an enterprise taxonomy replaces multiple CVs of separate administrative departments; an organization acquires or merges with another organization, and their redundant vocabularies are merged; a folksonomy is incorporated into a CV; an internally created CV is combined with a purchased/licensed CV.
  • 19. Merging Taxonomies. Merging: which direction? Designate a dominant/primary CV into which to merge the other. If an organization acquires another, the acquirer’s CV is dominant. Otherwise, choose: the larger CV; the CV with greater breadth; the CV with greater depth; the more structured CV; the “better” CV.
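Where no acquisition settles the question, some of slide 19's criteria can be measured. The Python sketch below computes size, a breadth figure, and depth for each CV and picks the primary CV on one chosen criterion. Treating breadth as the number of top-level terms is an assumption (breadth of subject coverage is really a judgment call), the hierarchy is assumed to be acyclic, and "more structured" and "better" are left to the taxonomist.

```python
from dataclasses import dataclass


@dataclass
class CVProfile:
    name: str
    size: int      # total number of distinct terms
    breadth: int   # number of top-level terms (an assumed proxy for breadth)
    depth: int     # number of levels in the deepest branch


def profile(name, narrower):
    """narrower: {term: [narrower terms]}; assumed acyclic (no term is its own ancestor)."""
    children = {child for kids in narrower.values() for child in kids}
    terms = set(narrower) | children
    tops = terms - children

    def depth_of(term):
        return 1 + max((depth_of(kid) for kid in narrower.get(term, [])), default=0)

    return CVProfile(name, len(terms), len(tops), max((depth_of(t) for t in tops), default=0))


def pick_primary(a, b, criterion="size"):
    """Return whichever profile scores higher on the criterion; ties go to a."""
    return a if getattr(a, criterion) >= getattr(b, criterion) else b


if __name__ == "__main__":
    cv1 = profile("cv1", {"Vehicles": ["Cars", "Trucks"], "Cars": ["Sedans"]})
    cv2 = profile("cv2", {"Cars": [], "Boats": [], "Planes": []})
    for criterion in ("size", "breadth", "depth"):
        print(criterion, "->", pick_primary(cv1, cv2, criterion).name)
```

Different criteria can pick different CVs, which is why the choice of criterion is itself a decision to make explicitly.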
  • 20. Merging Taxonomies. Use a software tool or scripts to compare vocabularies, obtaining matches in successive passes:
      Merging CV (will go away) | Primary CV (keeps and grows) | Taxonomist reviews?
      Exact matches:
      Preferred term: Cars | Preferred term: Cars | no need
      Preferred term: Automobiles | Nonpreferred term: Automobiles USE Cars | no need
      Nonpreferred term: Cars USE Automobiles | Preferred term: Cars | yes
      Nonpreferred term: Cars USE Automobiles | Nonpreferred term: Cars USE Autos | yes
      Inexact matches:
      Preferred term: Automobile | Preferred term: Automobiles | yes
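The match pass in the table above can be sketched in Python. Each CV is modeled, hypothetically, as a dictionary from label to its USE target (None for a preferred term). The review rule is inferred from the slide's examples rather than stated on it: an exact match on a merging-CV preferred term needs no review, while a match on a nonpreferred term, or any inexact match, is flagged. The inexact key here only lowercases and drops a trailing "s", which is just enough for Automobile/Automobiles.

```python
def crude_key(label: str) -> str:
    """Very rough inexact-match key: lowercase and drop one trailing 's'."""
    key = label.strip().lower()
    return key[:-1] if key.endswith("s") else key


def classify_matches(merging, primary):
    """merging/primary: {label: USE target, or None if the label is preferred}.

    Returns (merging_label, primary_label, match_type, needs_review) tuples.
    Labels with no match are omitted; presumably they become candidates for
    adding to the primary CV.
    """
    loose = {crude_key(label): label for label in primary}
    rows = []
    for label, use_target in merging.items():
        if label in primary:
            # Exact match: flag for review only if the merging label is nonpreferred.
            rows.append((label, label, "exact", use_target is not None))
        elif crude_key(label) in loose:
            rows.append((label, loose[crude_key(label)], "inexact", True))
    return rows


if __name__ == "__main__":
    merging = {"Automobiles": None, "Cars": "Automobiles", "Truck": None}
    primary = {"Cars": None, "Automobiles": "Cars", "Trucks": None}
    for row in classify_matches(merging, primary):
        print(row)
```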
  • 21. Merging Taxonomies. Rules can be created for automatic inexact or “fuzzy” matches, which are then subject to human review:
      Match type | Examples
      Hyphens, parentheses, punctuation, and spaces | Healthcare / Health care
      Plural/singular | Teaching method / Teaching methods
      Common abbreviations and acronyms | and / &; Dept. / Department
      Word order | Photography, digital / Digital photography
      Addition of specified words (industry, services, etc.) | Healthcare industry / Healthcare services
      Grammatical endings | Production / Producing
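The rule types in this table can be approximated by computing a normalized match key for every term and pairing terms whose keys collide. This Python sketch is one possible reading of those rules: the abbreviation list, the ignored words, and the suffix list are small illustrative assumptions, and the suffix stripping is far cruder than a real stemmer, so every proposed pair still goes to human review.

```python
import re

ABBREVIATIONS = {"&": "and", "dept.": "department", "dept": "department"}  # illustrative only
IGNORED_WORDS = {"industry", "services"}  # "specified words" that should not block a match
SUFFIXES = ("tion", "ing", "s")           # crude grammatical-ending and plural stripping


def stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def match_key(term):
    t = term.lower().strip()
    # Word order: an inverted term "Photography, digital" -> "digital photography".
    if "," in t:
        head, _, modifier = t.partition(",")
        t = f"{modifier} {head}"
    parts = []
    for token in re.split(r"[\s\-]+", t):       # split on spaces and hyphens
        token = ABBREVIATIONS.get(token, token)  # expand abbreviations before stripping dots
        if token in IGNORED_WORDS:
            continue
        token = re.sub(r"[^\w]", "", token)      # drop parentheses and other punctuation
        if token:
            parts.append(stem(token))
    return "".join(parts)                        # joining without spaces: "health care" == "healthcare"


def fuzzy_pairs(merging_terms, primary_terms):
    """Propose (merging term, primary term) pairs whose keys collide but whose labels differ."""
    index = {}
    for term in primary_terms:
        index.setdefault(match_key(term), []).append(term)
    return [(m, p) for m in merging_terms for p in index.get(match_key(m), []) if m != p]


if __name__ == "__main__":
    pairs = fuzzy_pairs(
        ["Health care", "Teaching method", "Photography, digital", "Production"],
        ["Healthcare", "Teaching methods", "Digital photography", "Producing"],
    )
    for merging_term, primary_term in pairs:
        print(f"{merging_term!r} ~ {primary_term!r}")
```

A key this aggressive will also produce false positives, which is exactly why the slide keeps the taxonomist in the loop.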
  • 22. Merging Taxonomies. Tools for merging: commercial thesaurus/taxonomy software with a merge-vocabularies feature (e.g., Synaptica, Wordmap); custom scripting (Perl, etc.) to compare vocabularies.
  • 23. Mapping and Merging Summary. Mapping: overlapping controlled vocabularies remain distinct, one used for the other in a specific application (indexing vs. retrieval CVs). Merging: overlapping controlled vocabularies are combined permanently, removing duplicates.
  • 24. Mapping and Merging Summary. Compare two closely redundant vocabularies side by side, term by term. The first pass is automatic, followed by taxonomist review of the matches. Taxonomy software may have this feature; otherwise, do your own scripting. The taxonomist reviews the matches, distinguishing equivalent, broader/narrower, and related terms, to approve them. The taxonomist deals with terms more than with structure.
  • 25. Agenda: Background; Mapping Taxonomies; Merging Taxonomies; Multilingual Taxonomies (1. Multilingual Taxonomy Goals, 2. Multilingual Taxonomy Design, 3. Taxonomy Translation Management)
  • 26. Multilingual Taxonomy Goals. Bilingual/multilingual taxonomies can enable: 1. A user to search and retrieve content that is in multiple languages through a single taxonomy in their own language. [Diagram: an English-speaking user; a taxonomy with a single-language user interface (UI) and multiple language translations, not displayed; content in Français, Deutsch, and Español.]
  • 27. Multilingual Taxonomy Goals. Bilingual/multilingual taxonomies can enable: 2. Different users who speak different languages to search the same body of content (in one other language), each using a taxonomy in the user interface in their native language. [Diagram: Spanish, French, and German speakers; English content; multiple, different-language UIs.]
  • 28. Multilingual Taxonomy Goals. Bilingual/multilingual taxonomies can enable: 3. Different users who speak different languages to search the same body of content that is in multiple languages. [Diagram: Spanish, French, and German speakers; content in Français, Deutsch, and Español; multiple, different-language UIs.]
  • 29. Multilingual Taxonomy Goals. Goals #1 or #2: users of one language can access content in a different language, using a taxonomy in one language with equivalent translated terms; the taxonomy needs to function in only one direction. Goal #3: multilingual users can access multilingual content, using a fully multilingual taxonomy or distinct taxonomies for each language linked at equivalent-meaning terms; the taxonomy needs to function in both/all language directions.
  • 30. Multilingual Taxonomy Goals. A different scenario: multiple language taxonomies, each connected to its own language content, such as for separate websites. [Diagram: Spanish, French, and German speakers, each connected to Español, Français, or Deutsch content respectively; multiple, different-language UIs.]
  • 31. Multilingual Taxonomy Design. Design the multilingual taxonomy to meet the taxonomy goals. In a one-direction translated taxonomy: the language of the searcher has structure to display; the language of the content may not need structure; translations may be in one direction (a user/display term may be used for a content/index term, but not vice versa). For a fully bidirectional multilingual taxonomy: both language taxonomies need structure; translations must be exact matches in both directions. For separate taxonomies in different languages: the taxonomies are not translated but are each created and managed separately.
  • 32. Multilingual Taxonomy Design. Dedicated taxonomy/thesaurus management software tools provide varying multilingual capabilities: 1. A customized text field used for term translations (no vocabulary control of the second language(s)). 2. A second-language taxonomy mirroring the first, linked at each translated term (vocabulary control of the second language(s); the taxonomy structure of the primary language is copied). 3. Multiple taxonomies in different languages linked at equivalent term translations (each language may have its own structure, which requires additional work to build).
  • 33. Multilingual Taxonomy Design. 1. Customized field used for term translations. [Diagram: one hierarchy (Term; Child terms 1-2; Grandchildren 1-4) with a translation field attached to each term.]
  • 34. Multilingual Taxonomy Design. 2. Second-language taxonomy mirroring the first, linked at each translated term; inter-term relationships replicate. [Diagram: two identical hierarchies (Term; Child terms 1-2; Grandchildren 1-4), one per language.]
  • 35. Multilingual Taxonomy Design. 3. Multiple taxonomies in different languages linked at equivalent term translations; inter-term relationships may differ. [Diagram: two hierarchies, one per language, with differing child and grandchild terms, linked at equivalent terms.]
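Method #2 can be pictured as copying the primary hierarchy through a translation table, so that inter-term relationships replicate automatically. The Python sketch below assumes a hierarchy stored as {broader term: [narrower terms]} and a complete term-to-translation dictionary, both hypothetical structures; it refuses to build the mirror if any term lacks a translation. Method #3 would instead keep two independently edited hierarchies and store only the links between them.

```python
def mirror(narrower, translations):
    """Build a second-language hierarchy that mirrors the first.

    narrower: {broader term: [narrower terms]} in the primary language.
    translations: {primary-language term: second-language term}.
    Raises KeyError listing any term without a translation.
    """
    all_terms = {t for parent, kids in narrower.items() for t in (parent, *kids)}
    missing = sorted(all_terms - set(translations))
    if missing:
        raise KeyError(f"untranslated terms: {missing}")
    # Same shape, translated labels: the inter-term relationships replicate.
    return {translations[parent]: [translations[kid] for kid in kids]
            for parent, kids in narrower.items()}


if __name__ == "__main__":
    en = {"Vehicles": ["Cars", "Trucks"]}
    es = {"Vehicles": "Vehículos", "Cars": "Automóviles", "Trucks": "Camiones"}
    print(mirror(en, es))
```

The trade-off the slides describe shows up directly: the mirror is cheap to build, but it can never hold a relationship that exists only in the second language.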
  • 36. Multilingual Taxonomy Design & Tools. Screenshot examples from dedicated taxonomy/thesaurus management software tools: Data Harmony Thesaurus Master (Access Innovations, Inc.), Synaptica (Synaptica, LLC), MultiTes (Multisystems), and Semaphore Ontology Manager (Smartlogic). Additional tools also provide similar capabilities.
  • 37. Multilingual Taxonomy Design & Tools. Method #1: create a user-defined text field and enter the translation. [Screenshot: Data Harmony Thesaurus Master]
  • 38. Multilingual Taxonomy Design & Tools. Method #1. [Screenshot: Synaptica]
  • 39. Multilingual Taxonomy Design & Tools. Method #2: create a second-language taxonomy mirroring the first, linked at each translated term; inter-term relationships replicate. [Screenshot: MultiTes]
  • 40. Multilingual Taxonomy Design & Tools. Method #2. [Screenshot: Smartlogic Semaphore Ontology Manager]
  • 41. Multilingual Taxonomy Design & Tools. Method #3: link equivalent terms in different languages by a user-defined associative relationship. [Screenshot: Synaptica]
  • 42. Multilingual Taxonomy Design. Translations of a term may display as another kind of relationship: similar to equivalence, but the terms in both languages are preferred and neither is nonpreferred. From the bilingual European Training Thesaurus, http://libserver.cedefop.europa.eu/ett
  • 43. Taxonomy Translation Management. Taxonomy translations are typically created from scratch, translating each term. It is also possible to map existing, separately created foreign-language taxonomies to one another, if their coverage is nearly identical. For Goals #1 or #2 (users of one language accessing content in a different language), translations may suffice. For Goal #3 (multilingual users accessing multilingual content), mapping separately created taxonomies in each language is better.
  • 44. Taxonomy Translation Management. User interface taxonomies in one language may be mapped to indexing taxonomies in another language. The retrieval taxonomy is in the language of the searcher; the indexing taxonomy is in the language of the content. The roles of the different language taxonomies are typically dynamic, depending on the language of the user and on the language of the content: the taxonomy of either language could be the retrieval taxonomy or the indexing taxonomy. Mapping has to go in both directions, and matches between terms in the two languages have to be exact translations.
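Because the mapping must work in both directions with exact translations, a simple consistency check is that every link resolves to exactly one term each way. This Python sketch, with a hypothetical function name and an input of (term, translation) pairs, reports any term linked to more than one term in the other language, such as the seguridad = safety or security case listed among the translation issues.

```python
from collections import defaultdict


def check_bidirectional(pairs):
    """pairs: iterable of (term in language A, term in language B) links.

    Returns a list of problems; an empty list means each term maps to exactly
    one term in the other language, so the mapping works in both directions.
    """
    a_to_b = defaultdict(set)
    b_to_a = defaultdict(set)
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)
    problems = [f"{a!r} -> {sorted(bs)}" for a, bs in a_to_b.items() if len(bs) > 1]
    problems += [f"{b!r} <- {sorted(sources)}" for b, sources in b_to_a.items() if len(sources) > 1]
    return problems


if __name__ == "__main__":
    links = [("safety", "seguridad"), ("security", "seguridad"), ("vehicles", "vehículos")]
    for problem in check_bidirectional(links):
        print("not bidirectional:", problem)
```

Such a check only finds the conflicts; resolving them (for example, by qualifying the Spanish term) remains a taxonomist's and translator's decision.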
  • 45. Taxonomy Translation Management. Matches are for concepts, not terms: translations are for the concept and not necessarily for the preferred term. Nonpreferred (variant/synonym) terms may vary: some can be translated, some cannot be translated, and additional nonpreferred terms may be created in the second language(s).
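Concept-based matching can be modeled by attaching all labels to a concept rather than linking term to term. In this Python sketch, the Concept fields and the language codes are assumptions: each concept has one preferred term per language (all equally preferred) plus language-specific nonpreferred terms, so a query in any language resolves to the concept and is displayed in the user's language.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str
    preferred: dict[str, str]  # language code -> preferred term; all languages equally preferred
    nonpreferred: dict[str, list[str]] = field(default_factory=dict)  # variants differ per language


def build_index(concepts):
    """Map (language, lowercased label) to a concept ID for every preferred and nonpreferred term."""
    index = {}
    for c in concepts:
        for lang, label in c.preferred.items():
            index[(lang, label.lower())] = c.concept_id
        for lang, labels in c.nonpreferred.items():
            for label in labels:
                index[(lang, label.lower())] = c.concept_id
    return index


def display_term(query, query_lang, display_lang, concepts):
    """Resolve a query to its concept, then return that concept's preferred term in display_lang."""
    by_id = {c.concept_id: c for c in concepts}
    concept_id = build_index(concepts).get((query_lang, query.lower()))
    if concept_id is None:
        return None
    return by_id[concept_id].preferred.get(display_lang)


if __name__ == "__main__":
    cars = Concept("c1", {"en": "Cars", "fr": "Voitures"},
                   {"en": ["Automobiles", "Autos"], "fr": ["Automobiles"]})
    print(display_term("autos", "en", "fr", [cars]))
    print(display_term("Voitures", "fr", "en", [cars]))
```

Keying the index by language keeps a variant such as "Automobiles", which happens to exist in both English and French, from being confused across languages.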
  • 46. Taxonomy Translation Management. Translating taxonomies/thesauri is different from translating documents: pay by the hour/project, not by the word. Translators should have experience translating in both directions. Translators should be familiar with using taxonomies, if not also be taxonomists. If not using a translator who is also a taxonomist, have a taxonomist/information-specialist native speaker of the target languages review the translated taxonomy.
  • 47. Taxonomy Translation Management. Taxonomy translation issues: lack of an equivalent translation; a term in one language having two meanings that correspond to two terms in another language (e.g., seguridad = safety or security); term length; use of definite articles; use of abbreviations; use of plurals; use of capitalization; alphabetical sorting rules.
  • 48. Taxonomy Translation Management. Translation projects end, but taxonomy management does not. Taxonomy management issues: taxonomy growth; taxonomy change; taxonomy management/ownership responsibility; merging or combining additional taxonomies. Translations/additional language versions will need frequent reviewing and updating.
  • 49. Conclusions. Mapping taxonomies, merging taxonomies, making multilingual taxonomies. In all cases, you need to be proactive, anticipating and planning for the future, and you need to bring in additional experts: subject matter experts, technology experts, translators.
  • 50. Additional Taxonomy Resources/Training. Book: The Accidental Taxonomist (Information Today, Inc., 2010), www.accidental-taxonomist.com. Taxonomies & Controlled Vocabularies, 5-week online workshop, Simmons College Graduate School of Library & Information Science, starting November 2012 and January 2013, http://alanis.simmons.edu/ceweb. SLA Taxonomy Division, http://taxonomy.sla.org
  • 51. Contact: Heather Hedden, Hedden Information Management, Carlisle, MA; heather@hedden.net; www.hedden-information.com; accidental-taxonomist.blogspot.com; Twitter: @hhedden; 978-467-5195