Getting beyond Keywords:
Using Conceptual
Representations of Text in
Search, Analysis, and Discovery
Roger Bradford
Agilex Technologies
16 April 2012
2
What’s Wrong with Keyword Search?
6.9 Million Documents
3
• Synonymy:
Common English Nouns have 6-8 Close Synonyms
Common English Verbs have 9-11
• Polysemy: The English Word Strike has >30
Common Meanings
• Implicit Context (e.g., John’s Project)
• Data Irregularities:
Missing
Erroneous
Contradictory
• Cross-document Relationships
Complexities of Text
4
The Problem of Synonymy
Author Terminology: Auto
User Terminology: Car? Automobile? Motor Vehicle?
5
Large Collections ⇒ Many Terms to Choose from
6
The Problem of Polysemy
User Terminology: Strike
Intent: Miss (Baseball)? Hit (Bowling)? Labor Action? Military Operation? …
7
Cross-document Relationships
John ↔ Bob Relationship:
First Order: JOHN ↔ BOB
Second Order: JOHN ↔ TOM, TOM ↔ BOB
Third Order: JOHN ↔ TOM, TOM ↔ DAVE, DAVE ↔ BOB
# of Relations in 5,998 Documents:
First Order: 51,474
Second Order: 11,026,553
Third Order: 68,070,600
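The idea of first-, second-, and third-order relationships can be illustrated with a small sketch. It assumes each document has already been reduced to a list of the names it mentions; the function names `build_cooccurrence_graph` and `relationship_order` are hypothetical and are not taken from the talk. It finds the order of the shortest chain between two names; it does not reproduce the relation counts quoted on the slide.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_graph(docs):
    """Link every pair of names that appear together in one document."""
    graph = defaultdict(set)
    for names in docs:
        for a, b in combinations(sorted(set(names)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def relationship_order(graph, source, target):
    """Return the length of the shortest co-occurrence chain, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Three documents, mirroring the third-order chain on the slide.
docs = [["JOHN", "TOM"], ["TOM", "DAVE"], ["DAVE", "BOB"]]
graph = build_cooccurrence_graph(docs)
order = relationship_order(graph, "JOHN", "BOB")
```

Here JOHN and BOB never share a document, so a keyword search for both names returns nothing, yet the chain through TOM and DAVE links them.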
8
How do Semantic Representations Help?
Top 10 Benefits:
1. Improve Efficiency
2. Overcome Errors
3. Deal with Complex Information Requirements
4. Organize Information
5. Interpret Results
6. Discover New Information
7. Work across Languages
8. Search Multiple Databases Simultaneously
9. Search Multimedia Collections
10. Support Direct Analytics
9
Implementation Approaches
• Capture Semantics Manually (Labor-intensive):
Taxonomies
Dictionaries
Ontologies
Grammars
• Approximate Semantics using Local Co-occurrence Statistics (Low Accuracy)
• Employ Matrix Decomposition Techniques (Computationally Intensive):
LSI
PLSI
LRA
SDD-LSI
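As a rough illustration of the matrix decomposition approach, the following sketch applies LSI (a truncated singular value decomposition of a term-document matrix) to a four-document toy corpus. The corpus, the choice of k = 2, and the helper names are assumptions made for this example; production systems use weighted matrices and far larger collections. NumPy is assumed to be available.

```python
import numpy as np

# Toy corpus: "car" and "automobile" never co-occur, but share context terms.
docs = [
    "car engine wheel",
    "automobile engine wheel",
    "strike labor union",
    "labor union wage",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}


def to_vector(text):
    """Raw term-count vector over the toy vocabulary."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v


# Term-document matrix: one row per term, one column per document.
A = np.column_stack([to_vector(d) for d in docs])

# Truncated SVD: keep only the k strongest latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k = U[:, :k]


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def keyword_scores(query):
    """Cosine similarity in the original term space (keyword matching)."""
    q = to_vector(query)
    return [cosine(q, A[:, j]) for j in range(A.shape[1])]


def lsi_scores(query):
    """Cosine similarity after projecting query and documents onto U_k."""
    q_k = U_k.T @ to_vector(query)
    return [cosine(q_k, U_k.T @ A[:, j]) for j in range(A.shape[1])]
```

In the term space the query "car" scores zero against the "automobile" document because they share no words. In the reduced space the two land close together, since both terms occur with "engine" and "wheel", while the labor-related documents stay far away.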
10
8 of the Top 10 Litigation Support Platforms
use it for Analytics
6 of the Top 8 Essay Scoring Packages use
it for Holistic Scoring (SAT, GMAT, etc.)
Most Spam Filters are Based on it
It is used for Survey Analysis for Major
Corporations (Marriott, Home Depot, etc.)
It is used in the Most Advanced Literature-
based Discovery Systems
Semantic Processing: Commercial
Application Examples
11
Conceptual Search
[Figure: the query "Car" passes through the Semantic Space and matches a document containing "automobile".]
12
Deep Conceptual Generalization is Possible
[Figure: the query "War Crimes" passes through the Semantic Space and matches a document containing "methods of armed struggle not accepted internationally".]
13
Automated Terminology Variant Identification
Query = Muammar Qadaffi
Variants Found Automatically in 6.1 M News Articles:
Muammar Kaddafi Muammar Qadhdhafi Muammar Qadhafi
Muammar Ghadafi Muammar Gaddafi Moammar Qaddafi
Muammar Gaddaffi Muamar Qaddafi Muammar Qadhaffi
Muammar Kadafi Muammer Gadaffi Muamar Ghadaffi
Muammar Gadhafi Muammar Qhadhafi Muammar Qathafi
Muammar Kadhafi Muammar al Qadaffi Moammar Gadaffi
Mouammar Qaddafi Muammar al Qadhaffi Moammar Qadafi
Muamar Gadaffi Moammar Qadhafi Muammar Qaddafi
14
Comprehensive Variant Clustering
Query Term X, clustered with its variants: Transcription Errors, Nicknames, Synonyms, OCR Errors, Trade Names, Speech Conversion Errors, Aliases, Transliteration Differences
15
Interpreting Results
Today MOX is widely used in Europe and is
planned to be used in Japan. Currently over
30 reactors in Europe (Belgium, Switzerland,
Germany and France) are using MOX and a
further 20 have been licensed to do so. Japan
also plans to use MOX in around a third of its
reactors by 2010. Most reactors use it as
about one third of their core, but some will
accept up to 50% MOX assemblies. France
aims to have all its 900 MWe series of
reactors running with at least one third MOX.
Japan aims to have one third of its reactors
using MOX by 2010, and has
16
Instant Context Display
Context terms: uranium-plutonium, mixed-oxide, bnfl, reactor, reactor's, plutonium-uranium, bnfl's, nuclear-fuel, vver
17
Conceptual Generalization + Context “Understanding” ⇒ Accurate Categorization
Incoming Document Stream → Immediate Action, Queue for Review, or No Further Action
18
Semantic Processing ⇒ Noise Tolerance
[Plot: Categorization Accuracy Compared to Baseline (0% to 100%) vs. Percentage of Words Corrupted (0% to 100%)]
19
Sample Document with 70% Word Error Rate
– Still Categorized Correctly
EORE MA4EUE uIqTED AT CAJFMARHUILLA
Peru's stnte minerals markeing Arm, Minero
PFou CVvercial SA (Mi1peco)C ifted a force majeure n
zinc iWgotshipmenfwfrom 2he country's biggesD zioc
sefintry at CajUmarquivla,Vaspekus an said.
The spofesmanBsaid the p oblems affectiKg
sulphurxg a iT and roast j plants that Uad
hgltedWproduction sinUe May p had bsen resolvFd.
D Howvjr,Khe BTid producti6 of znc ingots
thisLyearPwas expe ted to fall to aro nd 86,000 tonnes
thistyear at CajamruuiwpT, from 9wT000 Tonne7 in
198x becauNe of1the f5oppage
O Thm refVuery has an opKimum annuv pjoduotibn
capwcity of 100,000 tonn s but its highest qroueio ca
969000 tonnes of refined zRnc zLgots inw1985, the
spokebman sa d.
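A noise-tolerance experiment of this kind needs a way to corrupt a controlled percentage of words. The sketch below is one hypothetical way to do that, replacing a single character in each selected word; the function name `corrupt_words`, the single-character substitution, and the fixed seed are assumptions for illustration and are not the procedure used to produce the sample above.

```python
import random
import string


def corrupt_words(text, rate, seed=0):
    """Corrupt a fraction `rate` of the words by changing one character each."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    words = text.split()
    n_corrupt = round(rate * len(words))
    targets = set(rng.sample(range(len(words)), n_corrupt))
    out = []
    for i, word in enumerate(words):
        if i in targets:
            pos = rng.randrange(len(word))
            # Exclude the original character so the word always changes.
            choices = [c for c in string.ascii_letters if c != word[pos]]
            word = word[:pos] + rng.choice(choices) + word[pos + 1:]
        out.append(word)
    return " ".join(out)


clean = "zinc ingot shipments from the refinery"
noisy = corrupt_words(clean, 0.5)
```

Running a categorizer on `corrupt_words(text, rate)` for increasing values of `rate` and comparing accuracy with the uncorrupted baseline would yield a curve like the one plotted on the noise-tolerance slide.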
20
Application: Sentiment Detection
Mideast
Sentiment
0400 hrs
5 August 2002
Automatic
Detection in
News Articles
21
Good, Quality, Standard
Nice hotel, Clean
Lobby, Small
Enjoyed stay
Bathrooms, Hair, Plug
Showers, Water, Head, Floor
Smoking room, Pool area, Cigarettes
Light, Poor, Dark, Bedside
Work properly, Flushing
Excellent staff
Property, View, Beautiful hotel
Noise, Traffic, Construction, Sleep
5,000 Surveys
from Large
Hotel Chain
Automatic Taxonomy Generation
426
253
410
348
281
260
231
251
250
242
198
192
Internet access, High-speed internet 172
22
Application: Summarization
• So I ask you to join me in expanding
their access to child care, creating new
hiring preferences for military spouses
across the federal government, and
allowing our troops to transfer their
unused education benefits to their
spouses or children.
• And tonight, I ask Congress to support
an innovative proposal to provide food
assistance by purchasing crops directly
from farmers in the developing world, so
we can build up local agriculture and
help break the cycle of famine.
45-minute Speech
23
Represent Complex Information Requirements (I)
Job
Descriptions
E-mail Alert
to Hiring
Manager
Résumé
For One Fortune 500 Company ⇒ Savings of >3 Days in Average Applicant Processing Time
24
Represent Complex Information Requirements (II)
Inputs: RFP Subsection, Sections of Previous Proposals
Draft Proposal
One Fortune 500 Company Estimates Savings of >$10 Million per Year
25
Zukas, A., GO-Driven Literature-Based Discovery using Semantic Analysis, MS Thesis, George Mason University, 2007.
• PubMed Abstracts
• Gene – Function Relationships
Derived Semantically
• 98,074,359 Potential Gene-function Associations.
Literature-based Discovery (I)
26
Latent Gene and Function
Relationships from the June 2006
Gene Ontology - Later Documented
in the January 2007 Gene Ontology
Literature-based Discovery (II)
27
Social Media – Goldmine of Public Opinion?
28
Volume – 200 Million
Tweets per Day
Worldwide
Velocity – Minutes to
Hours
Variety - 17 Languages
Variability – Very Low
Signal to Noise Ratio
Processing Social Media = Big Data Problem
29
Cross-lingual Retrieval
Farsi
English
Arabic
English
English
Doc
Retrieved Documents
in Correct Relevance
Order
Documents in
Multiple
Languages
30
Interleaved Results Display
31
Federation of Queries
Database #1, Database #2, Database #3 → Merged Result List
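One way to picture the merge step is the sketch below. It assumes each database returns hits as `(score, doc_id)` pairs already sorted by descending score, and that scores are comparable across databases; both are assumptions for this example, and `merge_results` is a hypothetical name.

```python
import heapq


def merge_results(*result_lists, limit=None):
    """Merge per-database ranked lists into a single ranked list.

    Each input must already be sorted by descending score.
    """
    merged = list(
        heapq.merge(*result_lists, key=lambda hit: hit[0], reverse=True)
    )
    return merged if limit is None else merged[:limit]


db1 = [(0.9, "a"), (0.4, "b")]
db2 = [(0.8, "c"), (0.1, "d")]
db3 = [(0.5, "e")]
ranked = merge_results(db1, db2, db3)
```

If the databases score on different scales, the scores would have to be normalized before merging; this sketch does not attempt that.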
32
Combined Search of Structured and Unstructured Information
RDBMS (Rows = Mini-documents):
Buyer        Seller         Material       Amount   Date
John Smith   Ace Jewelers   Diamond Ring   $3,000   8/18/11
RDBMS + Text → Semantic Representation Space
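Treating rows as mini-documents can be sketched as follows. The flattening format (column name followed by value) and the function name `row_to_mini_document` are assumptions for illustration; the slide does not say how the rows are rendered as text.

```python
def row_to_mini_document(row):
    """Flatten one database row into a short text 'document'."""
    return " ".join(f"{column} {value}" for column, value in row.items())


# The example row shown on the slide.
row = {
    "Buyer": "John Smith",
    "Seller": "Ace Jewelers",
    "Material": "Diamond Ring",
    "Amount": "$3,000",
    "Date": "8/18/11",
}
mini_doc = row_to_mini_document(row)
```

The resulting string can then be indexed alongside ordinary text documents, so a single query can reach both structured and unstructured sources.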
33
Where is Semantic Analysis Going?
1. Larger Databases (100s of Millions of Documents +)
2. Multimedia Collections: Text, Relational Data, Audio, Video
3. Visualization
4. Analytics
34
Hardware Advances ⇒ Large Applications
[Plot: Build Time (hrs), 0 to 500, vs. Millions of Documents, 0 to 100; curves for 2007, 2009, 2010, and 2011]
35
Integration of Multimedia Data
Integrated Analytics: Structured Data, Images, Multi-lingual Text, Audio, Sensor Data, Video
Buyer        Seller         Material       Amount    Date
John Smith   Ace Jewelers   Diamond Ring   3 Carat   8/18/06
36
Analytics + Visualization
37
Questions or Comments
Roger Bradford
Agilex Technologies Inc
703-889-3916
r.bradford@agilex.com

APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
 

II-SDV 2012 Getting beyond Keywords: Using Conceptual Representations of Text in Search, Analysis and Discovery

  • 1. Getting beyond Keywords: Using Conceptual Representations of Text in Search, Analysis, and Discovery. Roger Bradford, Agilex Technologies, 16 April 2012
  • 2. What’s Wrong with Keyword Search? 6.9 Million Documents
  • 3. Complexities of Text. • Synonymy: Common English Nouns have 6-8 Close Synonyms; Common English Verbs have 9-11 • Polysemy: The English Word "Strike" has >30 Common Meanings • Implicit Context (e.g., John’s Project) • Data Irregularities: Missing, Erroneous, Contradictory • Cross-document Relationships
  • 4. The Problem of Synonymy. Author Terminology: Auto. User Terminology: Car? Automobile? Motor Vehicle?
  • 5. Large Collections ⇒ Many Terms to Choose from
  • 6. The Problem of Polysemy. User Terminology: Strike. Intent: Miss (Baseball)? Hit (Bowling)? Labor Action? Military Operation? …
  • 7. Cross-document Relationships. John ↔ Bob Relationship: First Order: JOHN ↔ BOB. Second Order: JOHN ↔ TOM, TOM ↔ BOB. Third Order: JOHN ↔ TOM, TOM ↔ DAVE, DAVE ↔ BOB. # of Relations in 5,998 Documents: 51,474 (First Order); 11,026,553 (Second Order); 68,070,600 (Third Order)
  • 8. How do Semantic Representations Help? Top 10 Benefits: 1. Improve Efficiency 2. Overcome Errors 3. Deal with Complex Information Requirements 4. Organize Information 5. Interpret Results 6. Discover New Information 7. Work across Languages 8. Search Multiple Databases Simultaneously 9. Search Multimedia Collections 10. Support Direct Analytics
  • 9. Implementation Approaches. • Capture Semantics Manually (Labor-intensive): Taxonomies, Dictionaries, Ontologies, Grammars • Approximate Semantics using Local Co-occurrence Statistics (Low Accuracy) • Employ Matrix Decomposition Techniques (Computationally Intensive): LSI, PLSI, LRA, SDD-LSI
  • 10. Semantic Processing: Commercial Application Examples. 8 of the Top 10 Litigation Support Platforms use it for Analytics. 6 of the Top 8 Essay Scoring Packages use it for Holistic Scoring (SAT, GMAT, etc.). Most Spam Filters are Based on it. It is used for Survey Analysis for Major Corporations (Marriott, Home Depot, etc.). It is used in the Most Advanced Literature-based Discovery Systems.
  • 12. Deep Conceptual Generalization is Possible. Semantic Space: "methods of armed struggle not accepted internationally" ⇒ War Crimes
  • 13. Automated Terminology Variant Identification. Query = Muammar Qadaffi. Variants Found Automatically in 6.1 M News Articles: Muammar Kaddafi, Muammar Qadhdhafi, Muammar Qadhafi, Muammar Ghadafi, Muammar Gaddafi, Moammar Qaddafi, Muammar Gaddaffi, Muamar Qaddafi, Muammar Qadhaffi, Muammar Kadafi, Muammer Gadaffi, Muamar Ghadaffi, Muammar Gadhafi, Muammar Qhadhafi, Muammar Qathafi, Muammar Kadhafi, Muammar al Qadaffi, Moammar Gadaffi, Mouammar Qaddafi, Muammar al Qadhaffi, Moammar Qadafi, Muamar Gadaffi, Moammar Qadhafi, Muammar Qaddafi
  • 14. Comprehensive Variant Clustering. Query Term: Transcription Errors, Nicknames, Synonyms, OCR Errors, Trade Names, Speech Conversion Errors, Aliases, Transliteration Differences
  • 15. Interpreting Results. Today MOX is widely used in Europe and is planned to be used in Japan. Currently over 30 reactors in Europe (Belgium, Switzerland, Germany and France) are using MOX and a further 20 have been licensed to do so. Japan also plans to use MOX in around a third of its reactors by 2010. Most reactors use it as about one third of their core, but some will accept up to 50% MOX assemblies. France aims to have all its 900 MWe series of reactors running with at least one third MOX. Japan aims to have one third of its reactors using MOX by 2010, and has
  • 16. Instant Context Display. Same MOX passage as slide 15, shown with context terms: uranium-plutonium, mixed-oxide, bnfl, reactor, reactor's, plutonium-uranium, bnfl's, nuclear-fuel, vver
  • 17. Conceptual Generalization + Context “Understanding” ⇒ Accurate Categorization. Incoming Document Stream: Immediate Action; Queue for Review; No Further Action
  • 18. Semantic Processing ⇒ Noise Tolerance. [Chart: Categorization Accuracy Compared to Baseline (0% to 100%) vs. Percentage of Words Corrupted (0% to 100%)]
  • 19. Sample Document with 70% Word Error Rate – Still Categorized Correctly: EORE MA4EUE uIqTED AT CAJFMARHUILLA Peru's stnte minerals markeing Arm, Minero PFou CVvercial SA (Mi1peco)C ifted a force majeure n zinc iWgotshipmenfwfrom 2he country's biggesD zioc sefintry at CajUmarquivla,Vaspekus an said. The spofesmanBsaid the p oblems affectiKg sulphurxg a iT and roast j plants that Uad hgltedWproduction sinUe May p had bsen resolvFd. D Howvjr,Khe BTid producti6 of znc ingots thisLyearPwas expe ted to fall to aro nd 86,000 tonnes thistyear at CajamruuiwpT, from 9wT000 Tonne7 in 198x becauNe of1the f5oppage O Thm refVuery has an opKimum annuv pjoduotibn capwcity of 100,000 tonn s but its highest qroueio ca 969000 tonnes of refined zRnc zLgots inw1985, the spokebman sa d.
  • 20. Application: Sentiment Detection. Automatic Detection in News Articles. Mideast Sentiment, 0400 hrs, 5 August 2002
  • 21. Automatic Taxonomy Generation. 5,000 Surveys from Large Hotel Chain. Cluster terms: Good, Quality, Standard Nice hotel, Clean Lobby, Small Enjoyed stay Bathrooms, Hair, Plug Showers, Water, Head, Floor Smoking room, Pool area, Cigarettes Light, Poor, Dark, Bedside Work properly, Flushing Excellent staff Property, View, Beautiful hotel Noise, Traffic, Construction, Sleep. Cluster sizes: 426 253 410 348 281 260 231 251 250 242 198 192. Internet access, High-speed internet: 172
  • 22. Application: Summarization (45-minute Speech). • So I ask you to join me in expanding their access to child care, creating new hiring preferences for military spouses across the federal government, and allowing our troops to transfer their unused education benefits to their spouses or children. • And tonight, I ask Congress to support an innovative proposal to provide food assistance by purchasing crops directly from farmers in the developing world, so we can build up local agriculture and help break the cycle of famine.
  • 23. Represent Complex Information Requirements (I). Job Descriptions; Résumé; E-mail Alert to Hiring Manager. For One Fortune 500 Company ⇒ Savings of >3 Days in Average Applicant Processing Time
  • 24. Represent Complex Information Requirements (II). Sections of Previous Proposals; Draft Proposal Inputs; RFP Subsection. One Fortune 500 Company Estimates Savings of >$10 Million per Year
  • 25. Literature-based Discovery (I). • PubMed Abstracts • Gene – Function Relationships Derived Semantically • 98,074,359 Potential Gene-function Associations. Source: Zukas, A., GO-Driven Literature-Based Discovery using Semantic Analysis, MS Thesis, George Mason University, 2007.
  • 26. Literature-based Discovery (II). Latent Gene and Function Relationships from the June 2006 Gene Ontology - Later Documented in the January 2007 Gene Ontology
  • 27. Social Media – Goldmine of Public Opinion?
  • 28. Processing Social Media = Big Data Problem. Volume – 200 Million Tweets per Day Worldwide. Velocity – Minutes to Hours. Variety – 17 Languages. Variability – Very Low Signal to Noise Ratio
  • 32. Combined Search of Structured and Unstructured Information. RDBMS row: Buyer: John Smith; Seller: Ace Jewelers; Material: Diamond Ring; Amount: $3,000; Date: 8/18/11. Inputs: RDBMS (Rows = Mini-documents) and Text. Semantic Representation Space
  • 33. Where is Semantic Analysis Going? 1. Larger Databases (100s of Millions of Documents +) 2. Multimedia Collections: Text, Relational Data, Audio, Video 3. Visualization 4. Analytics
  • 34. Hardware Advances ⇒ Large Applications. [Chart: Build Time (hrs), 0 to 500, vs. Millions of Documents, 0 to 100; series labeled 2007, 2009, 2010, 2011]
  • 35. Integration of Multimedia Data. Integrated Analytics: Structured Data, Images, Multilingual Text, Audio, Sensor Data, Video. Structured record: Buyer: John Smith; Seller: Ace Jewelers; Material: Diamond Ring; Amount: 3 Carat; Date: 8/18/06
  • 37. Questions or Comments. Roger Bradford, Agilex Technologies Inc, 703-889-3916, r.bradford@agilex.com
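
The first-, second-, and third-order relationships on slide 7 can be illustrated with a small sketch. This is not the method used in the talk: the function names `cooccurrence_pairs` and `relations_by_order` and the three toy documents are hypothetical. It also assumes that a k-th order relation means the shortest chain of document co-occurrences between two names has k links; the slides do not say how the reported counts were computed.

```python
from itertools import combinations


def cooccurrence_pairs(docs):
    """Return the set of unordered name pairs that share at least one document."""
    pairs = set()
    for names in docs:
        # Sort so each pair is stored once, in a canonical order.
        for a, b in combinations(sorted(set(names)), 2):
            pairs.add((a, b))
    return pairs


def relations_by_order(docs, max_order=3):
    """Map order k to the pairs whose shortest co-occurrence chain has k links."""
    neighbors = {}
    for a, b in cooccurrence_pairs(docs):
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    result = {k: set() for k in range(1, max_order + 1)}
    for start in neighbors:
        # Breadth-first expansion, one order per step.
        seen = {start}
        frontier = {start}
        for k in range(1, max_order + 1):
            reached = set()
            for node in frontier:
                reached |= neighbors[node] - seen
            for other in reached:
                result[k].add(tuple(sorted((start, other))))
            seen |= reached
            frontier = reached
    return result


# Toy collection (assumed): each list holds the names mentioned in one document.
docs = [
    ["JOHN", "TOM"],
    ["TOM", "DAVE"],
    ["DAVE", "BOB"],
]

orders = relations_by_order(docs)
for k in sorted(orders):
    print(k, sorted(orders[k]))
```

In this toy collection JOHN and BOB never appear in the same document, so a keyword search for both names returns nothing, yet they are linked at third order through TOM and DAVE. The number of such indirect links grows quickly with order, which is the effect the counts on slide 7 show.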