1. © 2021. Access Innovations, Inc. All Rights Reserved.
Synonymy and AI
Monday 4 October 2021
Jay Ven Eman, Ph.D., CEO
Access Innovations, Inc. / Data Harmony
j_ven_eman@accessinn.com
www.accessinn.com
+1.505.998.0800
Albuquerque, NM USA
Access Innovations, Inc.
The Science behind the Semantics™
www.accessinn.com
2.
Synonymy
Synonym
• A word or phrase that means exactly or nearly the same as another word or phrase in the same language…
Synonymy
• The state of being synonymous.
• Why is it important?
Synonymy breaks search!
3.
Non-intuitive Synonyms
• Invasive breast cancer
• Metastatic breast cancer
• Stage IV breast cancer
These all mean the same thing in MeSH (Medical Subject Headings of the US National Library of Medicine).
4.
Differences in search results due to synonymy
• Invasive breast cancer: 520 results
• Metastatic breast cancer: 1803 results
• Stage IV breast cancer: 73 results
• Stage IV breast cancer: 46,400,000 results
Lack of Synonymy Control Breaks Search
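The effect of synonymy control on result counts can be sketched in a few lines of Python. This is only an illustration: the three variants come from the slide, but the choice of preferred term, the tiny document collection, and the function names are assumptions made for the example.

```python
# Hypothetical synonym ring: every variant maps to one preferred term.
SYNONYM_RING = {
    "invasive breast cancer": "metastatic breast cancer",
    "metastatic breast cancer": "metastatic breast cancer",
    "stage iv breast cancer": "metastatic breast cancer",
}

# Made-up documents, each using a different variant.
DOCS = {
    1: "Outcomes in invasive breast cancer patients",
    2: "Metastatic breast cancer treatment review",
    3: "Stage IV breast cancer survival data",
    4: "Corn yield under drought stress",
}


def literal_search(query, docs):
    """Return IDs of documents that contain the query string verbatim."""
    q = query.lower()
    return sorted(doc_id for doc_id, text in docs.items() if q in text.lower())


def controlled_search(query, docs, ring):
    """Expand the query to every variant of its preferred term, then search."""
    preferred = ring.get(query.lower(), query.lower())
    variants = [v for v, p in ring.items() if p == preferred] or [preferred]
    hits = set()
    for variant in variants:
        hits.update(literal_search(variant, docs))
    return sorted(hits)
```

With literal matching, each variant finds only the document that happens to use it. With the ring applied, any of the three queries returns the same three documents.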
5.
Another example
• Organochlorine
• Chlorinated hydrocarbon
• Chlorocarbon
• Organochloride
Lack of Synonymy Control Breaks Search
6.
Source: Synonym.com
7.
Scientific classification of Corn
Kingdom: Plantae
Clade: Tracheophytes
Clade: Angiosperms
Clade: Monocots
Clade: Commelinids
Order: Poales
Family: Poaceae
Subfamily: Panicoideae
Genus: Zea
Species: Z. mays
Source: Wikipedia
8.
• How do you deal with it?
Use search that leverages the taxonomy with type-ahead and merged synonymy.
9531 results
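Type-ahead with merged synonymy might look roughly like the following Python sketch. The entry vocabulary, the function name, and the `limit` parameter are hypothetical; the idea is only that every entry term a user types resolves to one preferred taxonomy term.

```python
from bisect import bisect_left

# Hypothetical entry vocabulary: entry term (lowercase) -> preferred term.
ENTRY_TERMS = {
    "corn": "Zea mays",
    "maize": "Zea mays",
    "zea mays": "Zea mays",
    "grasses": "Poaceae",
    "poaceae": "Poaceae",
}
SORTED_ENTRIES = sorted(ENTRY_TERMS)


def type_ahead(prefix, limit=5):
    """Suggest preferred terms whose entry terms start with the typed prefix."""
    p = prefix.lower()
    # Binary search for the first entry that could match the prefix.
    start = bisect_left(SORTED_ENTRIES, p)
    suggestions = []
    for entry in SORTED_ENTRIES[start:]:
        if not entry.startswith(p):
            break
        preferred = ENTRY_TERMS[entry]
        if preferred not in suggestions:  # merge synonyms into one suggestion
            suggestions.append(preferred)
        if len(suggestions) == limit:
            break
    return suggestions
```

Typing "co", "ma", or "z" would all suggest the same preferred term, so the search that follows runs against one concept rather than three spellings.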
9.
Semantic control and content enrichment
• Controlled vocabularies, authority files, taxonomies, thesauri, ontologies, triple stores, and knowledge graphs
• Follow the standards
  ◦ Use an accepted structure and format
    ▪ ANSI/NISO Z39.19
    ▪ ISO 2788
    ▪ BS 5723
    ▪ ISO 25964, Parts 1 and 2
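As a rough illustration of what an accepted structure means in practice, the Python sketch below models a thesaurus term with the relationship types these standards describe (UF, BT, NT, RT) and checks that broader/narrower links are reciprocal. The class, the field names, and the sample entries are assumptions for the example, not taken from any standard's text or any product.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term with the usual thesaurus relationship types."""
    label: str
    used_for: list = field(default_factory=list)   # UF: non-preferred synonyms
    broader: list = field(default_factory=list)    # BT: broader terms
    narrower: list = field(default_factory=list)   # NT: narrower terms
    related: list = field(default_factory=list)    # RT: related terms


def missing_reciprocals(terms):
    """List BT/NT links that lack the matching link in the other direction."""
    by_label = {t.label: t for t in terms}
    problems = []
    for t in terms:
        for b in t.broader:
            if b not in by_label or t.label not in by_label[b].narrower:
                problems.append((t.label, "BT", b))
        for n in t.narrower:
            if n not in by_label or t.label not in by_label[n].broader:
                problems.append((t.label, "NT", n))
    return problems


# Hypothetical entries based on the corn classification shown earlier.
zea = Term("Zea", narrower=["Zea mays"])
zea_mays = Term("Zea mays", used_for=["Corn", "Maize"], broader=["Zea"])
```

A check like `missing_reciprocals([zea, zea_mays])` returns an empty list when every broader link has its narrower counterpart.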
10.
Beware the confusing terminology
Keyword
Key phrase
Key term
Thesaurus term
Taxonomy term
Descriptor
Tag
Preferred term
Use term
Index term
• Controlled vocabulary
• Ontology
• Knowledge organization system (KOS)
• Thesaurus
• Taxonomy
• Knowledge graph
• Index
• Semantic enrichment
Refer to the standards!
11.
What is indexing?
• Computer Science
  ◦ The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query.
  ◦ Search engine indexing is the collecting, parsing, and storing of data to facilitate fast and accurate information retrieval.
  ◦ Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science.
  ◦ An alternate name for the process, in the context of search engines designed to find web pages on the Internet, is web indexing.
  ◦ Popular engines focus on the full-text indexing of online, natural language documents.
• Index data structures
  ◦ Search engine architectures vary in the way indexing is performed and in the methods of index storage used to meet the various design factors.
  ◦ Suffix tree - a tree of term suffixes; storing the branches in hash tables speeds up lookup.
  ◦ Inverted index - stores a list of occurrences of each term, typically in the form of a hash table or binary tree.
  ◦ Citation index - stores the citations or hyperlinks between documents to support citation analysis, a subject of bibliometrics.
  ◦ n-gram index - stores sequences of a fixed length n to support other types of retrieval or text mining.
  ◦ Document-term matrix - used in latent semantic analysis; stores the occurrences of words in documents in a two-dimensional sparse matrix.
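Three of the structures listed above can be sketched compactly in Python. This is a minimal illustration on two made-up documents, using whitespace tokenization and a dictionary-based sparse matrix as simplifying assumptions; production search engines do considerably more.

```python
from collections import defaultdict

# Two made-up documents keyed by document ID.
DOCS = {
    0: "organochlorine residues in corn",
    1: "corn yield and soil health",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def char_ngrams(term, n=3):
    """Return the character n-grams of a term, in order."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def build_document_term_matrix(docs):
    """Return (vocabulary, sparse matrix) with counts keyed by (doc_id, column)."""
    vocab = sorted({t for text in docs.values() for t in text.lower().split()})
    column = {term: j for j, term in enumerate(vocab)}
    matrix = defaultdict(int)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            matrix[(doc_id, column[term])] += 1
    return vocab, dict(matrix)
```

Only non-zero cells are stored in the matrix, which is what makes it sparse: a term that never occurs in a document simply has no entry for that pair.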
12.
Information Science Indexing
We live in a world of overlapping synonymy
• Confusion between the indexes we build and those of the IT people who use our work
Knowledge organization systems (KOS)
• Bibliographic and database indexing
• Legal indexing
• Periodical and newspaper indexing
• Subject gateways
• Website and metadata indexing
13.
Why auto index?
Tagging records with terms or pointers.
Indexing makes records searchable online
Discovery
Navigation
Findability
NOT relevance
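Tagging records with terms can be pictured with a very small rule-based indexer in Python. The rules, the preferred terms, and the function name below are hypothetical, and the sketch is not a description of how any particular product works; it only shows text patterns being mapped to controlled terms.

```python
import re

# Hypothetical rule base: text pattern -> preferred term to assign.
RULES = {
    r"\borganochlor(?:ine|ide)s?\b": "Organochlorine compounds",
    r"\bchlorinated hydrocarbons?\b": "Organochlorine compounds",
    r"\bchlorocarbons?\b": "Organochlorine compounds",
    r"\b(?:corn|maize)\b": "Zea mays",
}


def auto_index(text):
    """Return the sorted preferred terms whose rules match the text."""
    terms = set()
    for pattern, preferred in RULES.items():
        if re.search(pattern, text, flags=re.IGNORECASE):
            terms.add(preferred)
    return sorted(terms)
```

Whichever synonym the author used, the record receives the same tag, which is what makes it findable under one term.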
14.
Besides improving search…
• Website navigation
• Journal article indexing
• Peer review selection
• Conference paper sorting
  ◦ For tracks
  ◦ For review
  ◦ For attendees' personal meeting selection
• And more!
15.
80,000 papers to sort
• Two weeks to
  ◦ Accept
  ◦ Reject
  ◦ Add to tracks
• Add to conference app
  ◦ Attendees' stated interests
  ◦ Deliver tailored content
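Once papers carry index terms, delivering tailored content can be as simple as ranking by overlap with an attendee's stated interests. The Python sketch below assumes the papers are already tagged; the paper titles, terms, and the `rank_papers` function are invented for illustration.

```python
def rank_papers(interests, papers, top_n=3):
    """Rank papers by how many index terms they share with the stated interests."""
    wanted = set(interests)
    scored = []
    for title, terms in papers.items():
        overlap = len(wanted & set(terms))
        if overlap:
            scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


# Hypothetical papers, already tagged with controlled-vocabulary terms.
PAPERS = {
    "Paper A": ["Breast neoplasms", "Neoplasm metastasis"],
    "Paper B": ["Zea mays", "Crop yield"],
    "Paper C": ["Breast neoplasms", "Clinical trials"],
}
```

The same overlap score could be used to assign papers to tracks by treating each track as a set of terms.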
16.
Some onomatopoeia
• Synonym - Cinnamon
• Cinnamon rolls
• Synonym rolls?
• Rolls downhill
• Rolls-Royce
• Friend named Royce
Source: Frontier Menu
17.
‘Association’ problem in AI search
• The Synonym - Cinnamon Challenge
  ◦ AI needs extensive training to avoid taking you down the proverbial rabbit hole
  ◦ Correlations and co-occurrence
• An applied thesaurus is key
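The rabbit hole can be reproduced with a toy Python example. The sketch below hops between phrases whenever they share a word, a crude stand-in for co-occurrence-driven association, and compares that with a lookup restricted to a thesaurus. The phrase list echoes the cinnamon rolls joke; the thesaurus entries and both functions are hypothetical.

```python
import re

PHRASES = [
    "cinnamon rolls",
    "rolls downhill",
    "Rolls-Royce",
    "friend named Royce",
]


def tokens(phrase):
    """Lowercase word tokens; hyphens split words."""
    return set(re.findall(r"[a-z]+", phrase.lower()))


def association_chain(start, phrases):
    """Greedily hop to the next unvisited phrase that shares any token."""
    chain = [start]
    current = start
    remaining = [p for p in phrases if p != start]
    while True:
        nxt = next((p for p in remaining if tokens(p) & tokens(current)), None)
        if nxt is None:
            return chain
        chain.append(nxt)
        remaining.remove(nxt)
        current = nxt


# Hypothetical thesaurus: only phrases mapped to the same concept are related.
THESAURUS = {"cinnamon rolls": "Baked goods", "Rolls-Royce": "Automobiles"}


def controlled_related(start, phrases, thesaurus):
    """Return phrases that the thesaurus assigns to the same concept as start."""
    concept = thesaurus.get(start)
    if concept is None:
        return []
    return [p for p in phrases if p != start and thesaurus.get(p) == concept]
```

Starting from "cinnamon rolls", the word-sharing chain walks through every phrase and ends at a friend named Royce, while the thesaurus-constrained lookup returns nothing unrelated.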
18.
Semantically Enriching Your Content Helps
• 5% to 10% “improvement”
• Ying-Hsang Liu, “University Metadata and Retrieval: The Death of the Library Catalog?” DC-2016, Copenhagen, Denmark
• One survey:
  ◦ 75% higher book sales with more complete metadata
  ◦ 34% higher with just semantic enrichment
• “50% reduction in retrieval time.” The Weather Channel
19.
Our Software and our Team of Experts
• Data Harmony
  ◦ XIS (XML Intranet System)®
  ◦ M.A.I.® (Machine Aided Indexer)
  ◦ Thesaurus Master®
  ◦ Administration Module
  ◦ MAIChem
  ◦ Smart Submit
MAIstro™
Data Harmony Suite
20.
Clients
Publishing & Media
Education
Government
Non-profits & Societies
Health/Pharma
Manufacturing & Retail
21. Thank you!