AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology, USA)

Copyright ©1997-2022 Search Technology, Inc. TheVantagePoint.com
Nils Newman | October 10, 2022
Finding the WHAT
Will AI help?

The WHAT - How to find concepts in text
• For a computer, finding
concepts within text is an
ongoing struggle
• How can machines help us
find concepts without us
reading?
• What can machines actually
find?
• How will AI change things?
NOUNS
Machines do not understand
what they are “reading”

Two Main Approaches
• There are two main
approaches to finding
WHAT in a document
➢ Natural Language
Processing (NLP)
➢ Machine Learning (ML)

Natural Language Processing
• NLP is about finding WHAT
through the structure of language
• Based on knowledge of language
structure, either programmed in
as rules or learned from
documents
• Uses semantic and syntactic rules
to “understand” text
• Usually language specific
• Projects are trying to generalize
across languages
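
As a rough illustration of the rule-based side of NLP described above, the following sketch extracts candidate noun phrases with a single syntactic rule (zero or more adjectives followed by one or more nouns). The `POS` lookup table and the `noun_phrases` helper are hypothetical and exist only for this example; a real system would use a trained, language-specific tagger rather than a hand-written word list.

```python
# Toy part-of-speech lexicon (hypothetical, English only). A real system
# would use a trained tagger instead of a lookup table.
POS = {
    "the": "DET", "a": "DET",
    "lithium": "NOUN", "battery": "NOUN",
    "electrode": "NOUN", "coating": "NOUN",
    "thin": "ADJ", "porous": "ADJ",
    "uses": "VERB", "has": "VERB",
}


def noun_phrases(sentence):
    """Return maximal word runs matching the rule ADJ* NOUN+."""
    phrases, current, has_noun = [], [], False
    # The trailing empty string is a sentinel that flushes the last phrase.
    for word in sentence.lower().replace(".", "").split() + [""]:
        tag = POS.get(word, "OTHER")
        if tag == "NOUN":
            current.append(word)
            has_noun = True
        elif tag == "ADJ" and not has_noun:
            current.append(word)
        else:
            # Any other word ends the current run; keep it only if it
            # contains at least one noun.
            if has_noun:
                phrases.append(" ".join(current))
            current, has_noun = [], False
            if tag == "ADJ":
                current.append(word)  # an adjective may start a new run
    return phrases


print(noun_phrases("The lithium battery uses a thin porous electrode coating."))
```

Because the lexicon and the rule are both English-specific, the sketch also shows why this kind of approach is usually tied to one language.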

Natural Language Processing
• NLP requires training!
• Even if it was done by someone
else, as with Google’s pretrained
Parsey Universal models
• Training is particularly important
if you are interested in technical
topics which do not adhere to
normal sentence structure (for
instance – a patent)
• Some of this training might have
to be supervised (humans)

NER – NLP’s Concept Shortcut
• Named Entity
Recognition (NER) targets
specific types of entities
such as:
➢ People
➢ Places
➢ Things
• For example:
• Geographic Names
• Chemical Names
• Pharma Concepts

NER
• NER still requires training
but if you are working in
an area with a
constrained vocabulary,
NER can save a lot of time
and effort
*Text Courtesy of Wikipedia
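
To make the constrained-vocabulary point concrete, here is a minimal sketch of dictionary-based entity spotting. It is a simple stand-in for NER, not a trained model: the `GAZETTEER` contents and the `find_entities` helper are assumptions made up for this example, and a real vocabulary would come from a curated source.

```python
import re

# Hypothetical gazetteer: a small constrained vocabulary mapping surface
# forms to entity types.
GAZETTEER = {
    "aspirin": "CHEMICAL",
    "acetylsalicylic acid": "CHEMICAL",
    "ibuprofen": "CHEMICAL",
    "atlanta": "PLACE",
    "new york": "PLACE",
}


def find_entities(text):
    """Return (surface form, entity type, start offset) for each gazetteer hit."""
    # Put longer names first so multi-word names are preferred over any
    # shorter name that happens to be a prefix of them.
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE
    )
    return [
        (m.group(0), GAZETTEER[m.group(0).lower()], m.start())
        for m in pattern.finditer(text)
    ]


for hit in find_entities(
    "Aspirin (acetylsalicylic acid) was shipped from New York to Atlanta."
):
    print(hit)
```

A lookup like this needs no model training, but it only finds what is already in the list, which is why it suits narrow domains and not open-ended text.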

Machine Learning: AKA Alphabet Soup
• Machine Learning in Concept
Extraction is all about finding patterns
• Decades of research have produced
many different approaches:
• LSI
• LSA
• PCA
• SVM
• MI
• TM
• etc.

Machine Learning: Patterns via Math
• The core of many of these techniques is finding
patterns using math with little explicit instruction
(no rules given)
• The math runs on your data to look for
connections between items and will find them on
its own
• The advantage of this approach is you do not have
to know what you are looking for
• The disadvantage is sometimes the output is
rubbish
• Another issue is that many of these approaches
return a collection of related terms, but naming
that collection is left to the human
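
The idea of finding connections with math alone can be sketched with plain co-occurrence statistics. The example below is an illustrative assumption, not one of the specific techniques named on the previous slide: it represents each term by the set of documents it appears in and pairs up terms whose binary occurrence vectors have a high cosine similarity. The `related_terms` function and the sample documents are hypothetical.

```python
import math
from itertools import combinations


def related_terms(docs, min_sim=0.5):
    """Pair up terms whose document-occurrence vectors are similar."""
    # Represent each term by the set of document indices it occurs in.
    occurs = {}
    for i, doc in enumerate(docs):
        for term in set(doc.lower().split()):
            occurs.setdefault(term, set()).add(i)

    pairs = []
    for a, b in combinations(sorted(occurs), 2):
        shared = len(occurs[a] & occurs[b])
        # Cosine similarity of two binary vectors.
        sim = shared / math.sqrt(len(occurs[a]) * len(occurs[b]))
        if sim >= min_sim:
            pairs.append((a, b, round(sim, 2)))
    # Highest similarity first.
    return sorted(pairs, key=lambda p: -p[2])


docs = [
    "battery anode lithium",
    "battery lithium cathode",
    "solar panel silicon",
    "solar silicon wafer",
]
for pair in related_terms(docs, min_sim=0.9):
    print(pair)
```

No rules are given, yet the terms fall into two groups. Nothing in the output says that one group is about energy storage and the other about photovoltaics; that label still has to come from a person. Lowering `min_sim` also shows how quickly weaker, less useful pairs creep in.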

Impact of AI on NLP
• Natural Language Processing is now
merging with AI
• NLP was transformed by the BERT
family of language models (SciBERT,
BioBERT, FinBERT, RoBERTa, ALBERT, etc.)
• GPT is also impactful, but its largest
models (e.g., GPT-3) are not open source
• The technique works because
enormous training sets form the
foundation
• Original BERT used BookCorpus
(800 million words) and English
Wikipedia (2,500 million words)

Impact of AI on Machine Learning
• Machine Learning can be considered a
branch of AI
• The distinction is in the level of
training
• The latest round of AI development,
combined with access to large
amounts of unlabeled data, means
that ML-based concept extraction
may be drawing on prior training
without you knowing it
• For example: Deep Learning

AI + ML + NLP
• AI has facilitated the fusion of ML
with NLP to improve concept
identification
• NLP has the language structure, AI
gives the ability to learn, and ML
enhances that learning by looking
for patterns, particularly patterns
not seen before
• For example, NER systems, given
some initial training, can learn on
their own using ML techniques + AI
learning models
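
One classic and much simpler stand-in for the "learn on its own after initial training" idea is pattern bootstrapping. The sketch below is an assumption chosen for illustration, not the method used by any particular NER system: starting from a few seed names, it learns which two-word contexts precede known entities, then treats any word following those contexts as a new entity, and repeats. The `bootstrap_entities` function and the sample sentences are hypothetical.

```python
def bootstrap_entities(sentences, seeds, rounds=2):
    """Grow a set of entity names from seeds using shared left contexts."""
    known = {s.lower() for s in seeds}
    tokenized = [s.lower().rstrip(".").split() for s in sentences]
    for _ in range(rounds):
        # Step 1: collect the two-word contexts that precede known entities.
        patterns = {
            (toks[i - 2], toks[i - 1])
            for toks in tokenized
            for i in range(2, len(toks))
            if toks[i] in known
        }
        # Step 2: any word that follows a learned context is a candidate.
        found = {
            toks[i]
            for toks in tokenized
            for i in range(2, len(toks))
            if (toks[i - 2], toks[i - 1]) in patterns
        }
        if found <= known:
            break  # nothing new was learned in this round
        known |= found
    return known


sentences = [
    "Patients were treated with aspirin.",
    "Patients were treated with ibuprofen daily.",
    "A dose of ibuprofen was given.",
    "A dose of naproxen was given.",
    "The study ran for months.",
]
print(sorted(bootstrap_entities(sentences, {"aspirin"})))
```

Each round can pick up wrong entities as easily as right ones, so real systems score or filter the learned patterns; this sketch omits that step.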

Beware the easy WHAT
• Finding the WHAT in records is still a real challenge
• Is WHAT a Concept or a Word?
➢ The Analyst’s WHAT
• An analyst with Subject Matter Expertise has an expected WHAT in mind
when they look at data based on their own knowledge. So their WHAT is
sometimes not represented in the data. They are often looking for higher
order concepts.
➢ The Data WHAT
• Algorithms let the data speak for itself. The WHAT is the word in the data.
• The two WHATs often do not agree
• But AI is working to solve that as well…

Words vs. Concept
• Looking at a set of words and
associating them with a concept is
not beyond the scope of AI - with
proper training
• In constrained lexicons, it is very
possible now – for example,
screening existing drugs to
repurpose for COVID-19, or Google’s
ill-fated human-impersonating Duplex
• However, a general model is not on
the horizon
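
In a constrained lexicon, mapping a set of words to a concept can be as simple as measuring overlap with known vocabularies. The following is a minimal sketch under that assumption, far simpler than a trained model: the `CONCEPTS` table, its labels and terms, and the `name_concept` helper are all hypothetical and not taken from any real thesaurus.

```python
# Hypothetical concept lexicon for a narrow domain.
CONCEPTS = {
    "energy storage": {"battery", "anode", "cathode", "electrolyte", "lithium"},
    "photovoltaics": {"solar", "silicon", "wafer", "panel", "cell"},
}


def name_concept(terms, min_overlap=0.5):
    """Return the best-matching concept label, or None if nothing fits well."""
    terms = {t.lower() for t in terms}
    best_label, best_score = None, 0.0
    for label, vocab in CONCEPTS.items():
        # Fraction of the input terms that belong to this concept's vocabulary.
        score = len(terms & vocab) / len(terms) if terms else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_overlap else None


print(name_concept(["lithium", "anode", "battery"]))
print(name_concept(["market", "price"]))
```

The second call returns `None` because the words fall outside the lexicon, which is the constrained-versus-general gap in miniature.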

Questions?
