Taxonomy Assessments - Part Two
February 9, 2012

Access Innovations, Inc.
Leveraging Your Content Semantically
Jay Ven Eman, Ph.D., CEO
j_ven_eman@accessinn.com
www.accessinn.com
www.dataharmony.com
+1.505.998.0800
Albuquerque, NM

© 2012. Access Innovations, Inc. All rights reserved.
Indexing
     Subject term assignment
     Permanent metadata attached to the indexed object
     Used for retrieval and evaluation
     Processes
      •     Manual
            •     Publisher
            •     3rd party aggregators
            •     Authors
      •     Automated methods


    © 2011. Access Innovations, Inc. All rights reserved.
Integration / workflow
     Content sources
      •     Author submission system
      •     Books
      •     Conference proceedings
      •     Web sites
      •     Etc.
     Integration: APIs, client/server, web services, HTTP-TCP/IP
     Data Harmony MAIstro Server
      •     Thesaurus Master
      •     M.A.I.
      •     Classification system
     Destinations
      •     Content Repository “A” or intermediate processes
      •     Content Repository “B”, etc.

Select the document collection
     (Screenshot of the CMS: “Please select the database and the document directory to load”)

CMS




Sample unstructured document




Run the documents through a metadata extraction process to create well-formed, rich XML
      •     Automatic (per document template)
      •     E.g., Dublin Core metadata
      •     Bibliographic citation

Automatically add the taxonomy terms
      •     Entity extraction: people, places, things
      •     Conceptual indexing: using the taxonomy

Classification Process or Assigned Indexing
     Unstructured

09-14-11
“Solving the Challenge”
By Jay Ven Eman
The process of indexing a content object begins with…

     Structured

<Anchor>
<Date>09-14-11</Date>
<TI>“Solving the Challenge”</TI>
<BLH>By</BLH>
<Author>
<AU_FN>Jay</AU_FN>
<AU_MI></AU_MI>
<AU_LN>Ven Eman</AU_LN>
</Author>
<Body>The process of indexing a content object begins with…</Body>
<Subject>Indexing</Subject>
<Subject>Thesauri</Subject>
<Subject>Standards</Subject>
<Subject>Classification</Subject>
</Anchor>

     Components shown
      •     Data Harmony MAIstro Server: Thesaurus Master, M.A.I., Classification System
      •     Content Repository, e.g. database
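
As a rough illustration of why the structured form is easier to work with downstream, the sketch below parses a record shaped like the one on this slide using Python's standard library and pulls out the title, author, and subject terms. The element names come from the slide; the helper name `summarize_record` is hypothetical and is not part of Data Harmony.

```python
import xml.etree.ElementTree as ET

# Record modeled on the slide's structured example (element names from the slide).
RECORD = """
<Anchor>
  <Date>09-14-11</Date>
  <TI>Solving the Challenge</TI>
  <BLH>By</BLH>
  <Author>
    <AU_FN>Jay</AU_FN>
    <AU_MI></AU_MI>
    <AU_LN>Ven Eman</AU_LN>
  </Author>
  <Body>The process of indexing a content object begins with...</Body>
  <Subject>Indexing</Subject>
  <Subject>Thesauri</Subject>
  <Subject>Standards</Subject>
  <Subject>Classification</Subject>
</Anchor>
"""


def summarize_record(xml_text):
    """Hypothetical helper: extract title, author, and subjects from one record."""
    root = ET.fromstring(xml_text.strip())
    author = root.find("Author")
    # Empty elements such as <AU_MI></AU_MI> yield "", which is skipped below.
    name_parts = [author.findtext(tag, default="") for tag in ("AU_FN", "AU_MI", "AU_LN")]
    return {
        "title": root.findtext("TI"),
        "author": " ".join(part for part in name_parts if part),
        "subjects": [subject.text for subject in root.findall("Subject")],
    }


if __name__ == "__main__":
    print(summarize_record(RECORD))
```

Because the subject terms sit in their own repeated element, a repository or search system can filter on them directly instead of guessing from the body text.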
Indexing
     Indexing measures
      •     Indexing experts
      •     Subject matter experts (SME)
      •     Hits, misses, & noise
      •     85% hits
     In conjunction with taxonomy measures
      •     Over & under used terms
      •     Over & under indexed content



Indexing & Search Metrics
     Hit, Miss, Noise
     Subjective
      •     Relevance
      •     Aboutness
     Statistical
      •     Precision
      •     Recall
      •     Level of effort



Hit, Miss, Noise
     Hit – exactly what a human indexer would use
     Miss – human indexer would use, but system
      did not assign
     Noise – system assigned, but human did not
      •     Relevant noise – could have been assigned
      •     Irrelevant noise – just plain wrong
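
The definitions above can be sketched as set operations. This is a minimal example, assuming the human indexer's terms are the reference set and that precision and recall follow the usual information retrieval definitions (hits over everything the system assigned, and hits over everything the human assigned). The function name `score_indexing` and the sample terms are illustrative only.

```python
def score_indexing(human_terms, system_terms):
    """Compare system-assigned terms against a human indexer's terms."""
    human, system = set(human_terms), set(system_terms)
    hits = human & system      # both human and system assigned the term
    misses = human - system    # human would use it, system did not assign it
    noise = system - human     # system assigned it, human did not
    # Assumption: standard IR definitions of precision and recall.
    precision = len(hits) / len(system) if system else 0.0
    recall = len(hits) / len(human) if human else 0.0
    return {
        "hits": hits,
        "misses": misses,
        "noise": noise,
        "precision": precision,
        "recall": recall,
    }


if __name__ == "__main__":
    human = ["Indexing", "Thesauri", "Standards", "Classification"]
    system = ["Indexing", "Thesauri", "Standards", "Metadata"]
    print(score_indexing(human, system))
```

Splitting noise into relevant and irrelevant still needs a human judgment on each noise term; the code only identifies which terms to review.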




Subjective
     Relevance
      •     Reflects how closely the term matches the user's request
     “Aboutness”
      •     Reflects the topical match between the document
            content and the term
      •     How well the topic describes what the document is
            about
     Varies with level of conceptual terms vs. factual
      terms in the thesaurus




Indexing
     All content types & sources
      •     Inventory control
      •     Everything in, everything out
     Document types
      •     Articles
      •     Proceedings
      •     Corporate




Link to Community Resources
(Source: Helen Atkins, AACR)
     Center: journal article on Topic A
     Linked resources
      •     Other journal articles on Topic A
      •     CME activity on Topic A
      •     Upcoming conference on Topic A
      •     Job posting for expert on Topic A
      •     Podcast interview with researcher working on Topic A
      •     Grant available for researchers working on Topic A
      •     Author networks, social networking, SME – Topic A

Indexing with Data Harmony® M.A.I.™
     Rule base development
      •     80/20 rule
      •     Indexing objectives
     GUI
     Time-to-market
      •     Level of effort to build
      •     Level of effort to maintain
      •     Less than all other alternatives when
            indexing for high precision & recall


Updating Rule Base
     Automatic for matching rules when using
      Data Harmony MAIstro™
     80/20 rule
     Re-index when 5% to 10% of the taxonomy has
      changed. Suggested schedules (the ranges are arbitrary):
      •     Monthly with small databases – 5k to 20k
      •     Quarterly with medium – 20k to 1 million
      •     Annual with large – greater than 1 million
     Depends on search software, too
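
The re-indexing guidance above could be encoded along the following lines. This is only a sketch of the slide's rule of thumb: the function names are hypothetical, the 5% default is the lower end of the slide's 5% to 10% range, and the handling of collections under 5k records and of the exact 20k boundary is an assumption, since the slide does not specify either.

```python
def needs_reindex(changed_terms, total_terms, threshold=0.05):
    """True once the share of changed taxonomy terms reaches the threshold."""
    if total_terms <= 0:
        return False
    return changed_terms / total_terms >= threshold


def reindex_interval(record_count):
    """Map database size to the slide's suggested re-indexing schedule."""
    if record_count > 1_000_000:
        return "annual"      # large: greater than 1 million
    if record_count >= 20_000:
        return "quarterly"   # medium: 20k to 1 million (20k boundary is an assumption)
    return "monthly"         # small: 5k to 20k (below 5k treated the same, an assumption)


if __name__ == "__main__":
    print(needs_reindex(60, 1000), reindex_interval(500_000))
```

As the slide notes, the right schedule also depends on the search software, so these thresholds are starting points rather than fixed rules.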

NAMES




What’s in a name?
     Juliet: "What's in a name? That which we call a rose
     By any other name would smell as sweet."
     Romeo and Juliet (II, ii, 1-2)




Magnitude of the Problem:
Facebook - 700 Million Users Projected for 2011 (Open-First)




         700 Million Names

        How will your boss, peers,
        anyone ever find you?


What’s in a name?
     My name
      •     Jay Ven Eman
      •     Ven Eman, Jay
      •     <First_Name>Jay</First_Name>
      •     <Last_Name>Ven Eman</Last_Name>
     Name variants
      •     Jay Von Eman
      •     Jay Van Eman
      •     Jay van Eman
      •     Jay ven Eman
      •     Jay Veneman
      •     Jay Venema
     Aliases
      •     William Henry McCarty
      •     Henry Antrim
      •     William H. Bonney
      •     Billy the Kid
     National & cultural conventions
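
One simple way to catch spelling variants like these is to normalize each name and compare the results with a string similarity score. The sketch below uses Python's standard `difflib` and `unicodedata` modules; the function names and the 0.85 threshold are assumptions chosen for illustration, not a description of how any particular product disambiguates names.

```python
import difflib
import unicodedata


def name_key(name):
    """Normalize a name: reorder "Last, First", strip accents, lowercase."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def likely_same_person(a, b, threshold=0.85):
    """Flag two name strings as probable variants (threshold is an assumption)."""
    ratio = difflib.SequenceMatcher(None, name_key(a), name_key(b)).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    for variant in ["Ven Eman, Jay", "Jay Von Eman", "Jay Veneman", "Billy the Kid"]:
        print(variant, likely_same_person("Jay Ven Eman", variant))
```

String similarity only helps with spelling variants. Aliases such as William Henry McCarty and Billy the Kid share almost no characters, so linking them requires a curated authority record or an external identifier, and a similarity match on its own cannot tell apart two different people with similar names.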
Names
     Computationally & editorially intense
     Author submissions
     Membership records & the like
     Industry initiatives – ORCID, VIVO
     Subject term disambiguation
     Inventory control basics apply here, too
     Difficulty level is high
     Constant maintenance needed


Taxonomy Assessments - Part Two
February 9, 2012

Thank you! Questions?

Access Innovations, Inc.
Leveraging Your Content Semantically
Jay Ven Eman, Ph.D., CEO
j_ven_eman@accessinn.com
www.accessinn.com
www.dataharmony.com
+1.505.998.0800
Albuquerque, NM


Editor's Notes

  1. PDF
  2. Post-processing. “Labels” the content item, but also classifies the author.
  3. Thanks to Helen Atkins of AACR for this illustration. The real power of this is that the links can all go in all directions, so we take advantage of having the user’s attention regardless of how they step into our “web.” CME stands for Continuing Medical Education.
  4. Johnny Carson