Taxonomy Assessments -
                                 Part Two
                                 February 9, 2012




                                  Access Innovations, Inc.
             Leveraging Your Content Semantically
                                             Jay Ven Eman, Ph.D., CEO
                                                  j_ven_eman@accessinn.com
                                                      www.accessinn.com
                                                     www.dataharmony.com
                                                        +1.505.998.0800
                                                       Albuquerque, NM




© 2012. Access Innovations, Inc. All rights reserved.
Indexing
     Subject term assignment
     Permanent metadata attached to the indexed object
     Used for retrieval and evaluation
     Processes
      •     Manual
            •     Publisher
            •     Third-party aggregators
            •     Authors
      •     Automated methods


Integration / workflow
     Content sources shown in the diagram
      •     Author submission system
      •     Books
      •     Conference proceedings
      •     Web sites
      •     Etc.
     Classification system: Data Harmony MAIstro Server
      •     Thesaurus Master
      •     M.A.I.
     Interfaces
      •     APIs, client/server, web services, HTTP-TCP/IP
     Content destinations shown in the diagram
      •     Content repository “A” or intermediate processes
      •     Content repository “B”, etc.

Select the document collection
                                                                 CMS



                               Please select the database and the document directory to load




CMS




Sample unstructured document




Run the documents through a metadata extraction process to create well-formed, rich XML
      •     Automatic (per doc template)
      •     E.g. Dublin Core Metadata
      •     Bibliographic citation




Automatically add the taxonomy terms
      •     Entity extraction: People, Places, Things
      •     Conceptual indexing: using the taxonomy




Classification Process or Assigned Indexing
     Unstructured
          09-14-11
          “Solving the Challenge”
          By Jay Ven Eman
          The process of indexing a content object begins with…
     Structured
          <Anchor>
            <Date>09-14-11</Date>
            <TI>“Solving the Challenge”</TI>
            <BLH>By</BLH>
            <Author>
              <AU_FN>Jay</AU_FN>
              <AU_MI></AU_MI>
              <AU_LN>Ven Eman</AU_LN>
            </Author>
            <Body>The process of indexing a content object begins with…</Body>
            <Subject>Indexing</Subject>
            <Subject>Thesauri</Subject>
            <Subject>Standards</Subject>
            <Subject>Classification</Subject>
          </Anchor>
     Components shown in the diagram
      •     Classification system: Data Harmony MAIstro Server (Thesaurus Master, M.A.I.)
      •     Content repository, e.g. database
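
The structured record on this slide can be sketched in a few lines of Python. This is only an illustration of wrapping extracted fields and assigned subject terms in the element names shown above (Anchor, Date, TI, BLH, Author, Body, Subject). The function name `build_record` and its parameters are hypothetical, and nothing here reflects how Data Harmony itself produces its output.

```python
import xml.etree.ElementTree as ET


def build_record(date, title, first, middle, last, body, subjects):
    """Wrap extracted fields and assigned subject terms in the slide's XML layout.

    Hypothetical helper for illustration; element names are copied from the slide.
    """
    root = ET.Element("Anchor")
    ET.SubElement(root, "Date").text = date
    ET.SubElement(root, "TI").text = title
    ET.SubElement(root, "BLH").text = "By"
    author = ET.SubElement(root, "Author")
    ET.SubElement(author, "AU_FN").text = first
    ET.SubElement(author, "AU_MI").text = middle
    ET.SubElement(author, "AU_LN").text = last
    ET.SubElement(root, "Body").text = body
    # One <Subject> element per taxonomy term assigned to the document.
    for term in subjects:
        ET.SubElement(root, "Subject").text = term
    return root


record = build_record(
    date="09-14-11",
    title="“Solving the Challenge”",
    first="Jay",
    middle="",
    last="Ven Eman",
    body="The process of indexing a content object begins with…",
    subjects=["Indexing", "Thesauri", "Standards", "Classification"],
)
print(ET.tostring(record, encoding="unicode"))
```

Keeping each subject term in its own element means a repository can index the terms separately from the body text.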
Indexing
     Indexing measures
      •     Indexing experts
      •     Subject matter experts (SME)
      •     Hits, misses, & noise
      •     85% hits
     In conjunction with taxonomy measures
      •     Over & under used terms
      •     Over & under indexed content
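
One way to look for over- and under-used terms and over- and under-indexed content is a simple count over the assigned terms. The sketch below is an assumption about how such a report could be built; the function name `usage_report` and every threshold (`max_share`, `min_terms`, `max_terms`) are hypothetical defaults, not figures from the slides.

```python
from collections import Counter


def usage_report(doc_terms, taxonomy, max_share=0.5, min_terms=1, max_terms=10):
    """Flag over/under-used terms and over/under-indexed documents.

    doc_terms maps a document id to the list of terms assigned to it.
    All thresholds are hypothetical and should be tuned per collection.
    """
    # Count each term once per document, however often it was assigned.
    counts = Counter(t for terms in doc_terms.values() for t in set(terms))
    n_docs = len(doc_terms)
    return {
        "under_used_terms": sorted(t for t in taxonomy if counts[t] == 0),
        "over_used_terms": sorted(t for t, c in counts.items() if c / n_docs > max_share),
        "under_indexed_docs": sorted(d for d, ts in doc_terms.items() if len(set(ts)) < min_terms),
        "over_indexed_docs": sorted(d for d, ts in doc_terms.items() if len(set(ts)) > max_terms),
    }


doc_terms = {
    "doc1": ["Indexing", "Thesauri"],
    "doc2": ["Indexing"],
    "doc3": ["Indexing", "Standards"],
    "doc4": [],
}
taxonomy = {"Indexing", "Thesauri", "Standards", "Classification"}
report = usage_report(doc_terms, taxonomy)
print(report)
```

In this toy collection "Indexing" appears on three of four documents and is flagged as over-used, "Classification" is never assigned, and "doc4" has no terms at all.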



Indexing & Search Metrics
     Hit, Miss, Noise
     Subjective
      •     Relevance
      •     Aboutness
     Statistical
      •     Precision
      •     Recall
      •     Level of effort



Hit, Miss, Noise
     Hit – exactly what a human indexer would use
     Miss – human indexer would use, but system
      did not assign
     Noise – system assigned, but human did not
      •     Relevant noise – could have been assigned
      •     Irrelevant noise – just plain wrong
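
The three categories above map directly onto set operations when the human indexer's terms are treated as the reference. The sketch below assumes that convention and also derives precision and recall from the same counts; the function name `score_indexing` is hypothetical. Splitting noise into relevant and irrelevant still needs an editorial review and is not automated here.

```python
def score_indexing(human_terms, system_terms):
    """Compare system-assigned terms with a human indexer's terms for one document."""
    human, system = set(human_terms), set(system_terms)
    hits = human & system      # assigned by both
    misses = human - system    # human would use, system did not assign
    noise = system - human     # system assigned, human did not
    # Assumed mapping: precision = hits / (hits + noise), recall = hits / (hits + misses).
    precision = len(hits) / len(system) if system else 0.0
    recall = len(hits) / len(human) if human else 0.0
    return {
        "hits": hits,
        "misses": misses,
        "noise": noise,
        "precision": precision,
        "recall": recall,
    }


human = ["Indexing", "Thesauri", "Standards", "Classification"]
system = ["Indexing", "Thesauri", "Standards", "Taxonomies", "Databases"]
scores = score_indexing(human, system)
print(scores)
```

Here three of the five system terms are hits, so precision is 3/5, and three of the four human terms were found, so recall is 3/4.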




Subjective
     Relevance
      •     Reflects how closely it matches the user's request
     “Aboutness”
      •     Reflects the topical match between the document content and the term
      •     How well the topic describes what the document is about
     Varies with the level of conceptual terms vs. factual terms in the thesaurus




Indexing
     All content types & sources
      •     Inventory control
      •     Everything in, everything out
     Document types
      •     Articles
      •     Proceedings
      •     Corporate




Link to Community Resources
(Source: Helen Atkins, AACR)
     Linked resources shown in the diagram
      •     Journal article on Topic A
      •     Other journal articles on Topic A
      •     CME activity on Topic A
      •     Upcoming conference on Topic A
      •     Job posting for expert on Topic A
      •     Grant available for researchers working on Topic A
      •     Podcast interview with researcher working on Topic A
      •     Author networks, social networking, SME – Topic A

Indexing with Data Harmony® M.A.I.™
     Rule base development
      •     80/20 rule
      •     Indexing objectives
     GUI
     Time-to-market
      •     Level of effort to build
      •     Level of effort to maintain
      •     Less than all other alternatives when
            indexing for high precision & recall


Updating Rule Base
     Automatic for matching rules when using Data Harmony MAIstro™
     80/20 rule
     Re-index when 5% to 10% of the taxonomy has changed – arbitrary ranges by database size:
      •     Monthly with small databases – 5k to 20k
      •     Quarterly with medium – 20k to 1 million
      •     Annually with large – greater than 1 million
     Depends on search software, too

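
The re-indexing guidance on the "Updating Rule Base" slide can be written down as two small checks. This is a rough sketch under stated assumptions: the slide does not name the unit behind "5k to 20k", so the code assumes a record count; collections under 5k are treated as small; and a size that falls exactly on a boundary goes to the larger bucket at 20k and stays quarterly at 1 million. The function names are hypothetical.

```python
def reindex_interval(record_count):
    """Suggest a re-indexing interval from database size (assumed to be a record count)."""
    if record_count > 1_000_000:
        return "annual"      # large: greater than 1 million
    if record_count >= 20_000:
        return "quarterly"   # medium: 20k to 1 million
    return "monthly"         # small: 5k to 20k (anything smaller is assumed small too)


def taxonomy_change_due(changed_terms, total_terms, threshold=0.05):
    """True once the share of changed taxonomy terms reaches the threshold.

    The slide gives a 5% to 10% range; the 5% default is an arbitrary pick from it.
    """
    return total_terms > 0 and changed_terms / total_terms >= threshold


print(reindex_interval(12_000))         # small database
print(reindex_interval(250_000))        # medium database
print(taxonomy_change_due(60, 1_000))   # 6% of terms changed
```

As the slide notes, the right interval also depends on the search software, so these values are a starting point only.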
NAMES




What’s in a name?
     Juliet:
      "What's in a name? That which we call a rose
      By any other name would smell as sweet."
     Romeo and Juliet (II, ii, 1-2)




Magnitude of the Problem:
Facebook - 700 Million Users Projected for 2011 (Open-First)




         700 Million Names

        How will your boss, peers,
        anyone ever find you?


What’s in a name?
     My name
      •     Jay Ven Eman
      •     Ven Eman, Jay
      •     <First_Name>Jay</First_Name>
            <Last_Name>Ven Eman</Last_Name>
     Name variants
      •     Jay Von Eman
      •     Jay Van Eman
      •     Jay van Eman
      •     Jay ven Eman
      •     Jay Veneman
      •     Jay Venema
     Aliases
      •     William Henry McCarty
      •     Henry Antrim
      •     William H. Bonney
      •     Billy the Kid
     National & Cultural Conventions
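
The variants and aliases on this slide call for different handling, which a small sketch can show. Spelling and spacing variants can be caught by a normalized key plus a fuzzy score, while aliases share no spelling and need a maintained lookup table. Everything below is an assumption for illustration: the helper names, the choice of "William Henry McCarty" as the canonical form, and the use of Python's standard `difflib` module rather than any Data Harmony feature.

```python
from difflib import SequenceMatcher

# Alias table built from the slide's example; the canonical form is an arbitrary choice.
ALIASES = {
    "henry antrim": "William Henry McCarty",
    "william h. bonney": "William Henry McCarty",
    "billy the kid": "William Henry McCarty",
}


def match_key(name):
    """Lowercase, flip 'Last, First' order, and drop spaces and punctuation."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return "".join(ch for ch in name.casefold() if ch.isalnum())


def similarity(a, b):
    """Fuzzy score between 0 and 1 computed on the normalized keys."""
    return SequenceMatcher(None, match_key(a), match_key(b)).ratio()


def resolve_alias(name):
    """Map a known alias to its canonical name; unknown names pass through unchanged."""
    return ALIASES.get(name.casefold().strip(), name)


print(match_key("Ven Eman, Jay") == match_key("Jay Veneman"))
print(similarity("Jay Ven Eman", "Jay Von Eman"))
print(resolve_alias("Billy the Kid"))
```

The key collapses case, spacing, and name order, so "Jay ven Eman", "Ven Eman, Jay", and "Jay Veneman" all match exactly. "Von", "Van", and "Venema" only score as close, and a person still has to decide whether they are the same author.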
Names
     Computationally & editorially intense
     Author submissions
     Membership records & the like
     Industry initiatives – ORCID, VIVO
     Subject term disambiguation
     Inventory control basics apply here, too
     Difficulty level is high
     Constant maintenance needed


                                 Thank you! Questions?


Editor's notes

  1. PDF
  2. Post processing. “Labels” the content item, but also classifies the author.
  3. Thanks to Helen Atkins of AACR for this illustration. The real power of this is that the links can go in all directions, so we take advantage of having the user’s attention regardless of how they step into our “web”. CME stands for Continuing Medical Education.
  4. Johnny Carson