The global data sphere, consisting of machine data and human data, is growing exponentially and has reached the order of zettabytes. In comparison, the processing power of computers has been stagnating for many years. Artificial Intelligence, a newer variant of Machine Learning, bypasses the need to understand a system when modelling it; however, this convenience comes at the cost of extremely high energy consumption.
The complexity of language makes statistical Natural Language Understanding (NLU) models particularly energy-hungry. Since most of the zettabyte data sphere consists of human data, such as texts or social networks, we face four major obstacles:
1. Findability of Information – when truth is hard to find, fake news rule
2. Von Neumann Gap – when processors cannot get any faster, we need more of them, which costs energy
3. Stuck in the Average – when statistical models generate a bias toward the majority, innovation has a hard time
4. Privacy – if user profiles are created “passively” on the server side instead of “actively” on the client side, we lose control
The current approach to overcoming these limitations is to train on ever larger data sets using ever more processing nodes. Instead, AI algorithms should be optimized for efficiency rather than precision. By this criterion, statistical modelling is disqualified as a brute-force approach for language applications. As a replacement for statistical modelling and arithmetic, set theory and geometry seem to be a much better choice, as they allow words to be processed directly instead of via their occurrence counts, which is exactly what the human brain does with language, using only 7 watts!
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision
1. Efficiency is the New Precision
Semantic Supercomputing in the Zettabyte age
Francisco De Sousa Webber
Co-Founder & CEO
f.webber@cortical.io
2. Big Bang: Data Explosion
[Figure: data volume growing from terabytes to zettabytes across the Mainframe/Mini, PC/Client, Internet and Virtualisation eras up to 2025, made up of transactional data, human files, social interactions and machine-generated data (IoT)]
3. Current Status
ML & AI is the Answer … Is it?
• Productivity decreases
• (Text) content increases
• Performance of the current Von Neumann computer platform stagnates
6. Von Neumann Computing Limitations
[Diagram: a computing unit whose processor (control unit, arithmetic unit) and memory (address logic) communicate with each other and with the I/O world over a single bus]
• Address bottleneck: every instruction passes over the bus
• Sequential access
• 1000+ instructions
8. Statistical Machine Learning Limitations
[Diagram: use-case data is annotated to produce training data; a training engine trains a use-case model, which is checked against test data; an inference engine applies the model to the data in-flow and produces the data out-flow]
Limitations: Insufficient Data, Slow Training, Model Imprecision, Inference Latency, Manual Effort
9. Statistical AI & ML Problem: Efficiency
We need to improve the principle, not just increase the computational power:
• Findability Issue
• Von Neumann Gap
• Exponential Power Consumption
• Million Model Multiverse
Both are cars for people, powered by “water gas”; the difference between them is efficiency.
Initial principle: steam. Latest principle: hydrogen fuel cell.
10. Statistical Modelling: Findability Issue
[Diagram: the user sees the actual internet only through the Google field of view, a “virtual view” of the internet; everything outside it is a “blind spot”]
• The internet is growing
• The visible internet is growing slowly
• The invisible internet is growing fast
The number of new pages grows faster than the number of keywords pointing at them.
11. Statistical Modelling: Von Neumann Gap
[Chart, 1980 to 2020: the data amount outgrows processing speed, opening a gap]
The current computing paradigm is insufficient for the growing data load:
• Increased Error Rates
• Increased Power Consumption
• Increased Processing Delays
12. Statistical Modelling: Exponential Energy Need
[Chart: energy consumption of computing devices, 0% to 8% scale, 2018 to 2030]
The current global energy consumption of computing devices equals that of global air transport.
By 2030, the global energy consumption of computing devices will reach that of global automobile transportation.
13. Statistical Modelling: Million Model Multiverse
[Diagram: every local use case comes with its own local statistical model, local training data and local gold standard]
• Individually labeled data for supervised learning
• Individually collected and prepared training data
• Individually trained model
• No network effects
14. Statistical Modelling: Technology Impact
• Findability Issue: Fake News. When it's hard to find information, it's also hard to find the truth.
• Von Neumann Gap: Climate Change. Green computing is beyond the Von Neumann gap.
• Stuck in the Average: Innovation Gap. Statistical averages: innovation is not made by majorities.
• Phased ML-User Profiles: Populism. Statistical ML models facilitate opinion meddling.
15. The Solution: Semantic Folding
Based on recent findings in Neuroscience
Implemented as an Unsupervised Machine Learning approach
Replaces complex statistical modelling with Analogical Computation
16. Semantic Folding: Analogical Computation
• “signed contract” vs. “done deal”: 36% overlap (similar meanings)
• “star trek” vs. “done deal”: 1% overlap (different meanings)
Context: Bank, Account, Holder, Payment, Tax, In-house, Manager
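The overlap figures on this slide compare which positions two fingerprints share. The sketch below is a minimal illustration, assuming a fingerprint is stored as a set of active bit positions and that overlap is normalized by the smaller fingerprint; the deck does not state its normalization, and the function name and toy fingerprints are hypothetical, not Retina output.

```python
def overlap(fp_a: set[int], fp_b: set[int]) -> float:
    """Share of active positions two fingerprints have in common.

    Normalized by the smaller fingerprint (an assumption; Jaccard or
    cosine normalization would be equally plausible choices).
    """
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / min(len(fp_a), len(fp_b))


# Hypothetical toy fingerprints: positions 0-9 stand for "contract" contexts,
# positions 100-109 for "science fiction" contexts.
signed_contract = set(range(0, 10)) | {50, 51}
done_deal = set(range(2, 10)) | {60, 61, 62, 63}
star_trek = set(range(100, 110)) | {9}

print(f"{overlap(signed_contract, done_deal):.0%}")  # 67%
print(f"{overlap(star_trek, done_deal):.0%}")        # 9%
```

Because the score depends only on set intersection, comparing two meanings needs no trained model, just counting shared positions.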
17. Semantic Fingerprinting
• Training of the Semantic Space: reference material → semantic word fingerprint dictionary
• Converting Text into Semantic Fingerprints: use-case data → semantic text fingerprint
• Comparing Semantic Fingerprints
18. Level 1: Word Fingerprints
Fingerprint generation for “organ”, whose contexts include:
• Context 1: liver, heart, muscle, endothelia, body, anatomy
• Context 2a: composer, baroque, music, score, Johann Sebastian Bach
• Context 2b: piano, guitar, trombone, flute, trumpet, quartet, music
• Context 3: church, altar, baroque, architecture, renaissance
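One way to read this slide: a word fingerprint is the set of map positions of the contexts in which the word occurs. A minimal sketch, assuming the reference snippets have already been placed on a small two-dimensional semantic map so that similar contexts sit close together (that placement step is not shown); the grid size, snippets, coordinates and `word_fingerprint` helper are all hypothetical.

```python
GRID_WIDTH = 8  # hypothetical 8x8 semantic map; real maps are much larger

# Hypothetical reference contexts, each already assigned a map cell (row, col)
# so that thematically similar snippets sit near each other.
contexts = {
    (0, 0): "the liver and heart are vital organ tissue",
    (0, 1): "organ transplant anatomy of the body",
    (4, 4): "bach composed baroque music for the organ",
    (4, 5): "piano flute and organ in a quartet",
    (7, 7): "the church organ stands near the altar",
    (7, 6): "renaissance church architecture",
}


def word_fingerprint(word: str) -> set[int]:
    """Map positions (row * width + col) of every context that contains the word."""
    return {
        row * GRID_WIDTH + col
        for (row, col), text in contexts.items()
        if word in text.split()
    }


print(sorted(word_fingerprint("organ")))  # [0, 1, 36, 37, 63]
```

The result falls into three clusters (anatomy near 0, music near 36, church near 63), mirroring the separate contexts of “organ” listed on the slide.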
19. Level 2: Text Fingerprints
Text: “organs and pianos are musical instruments”
The word fingerprints of the sentence are combined into one text fingerprint by Aggregation + Sparsification.
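A minimal sketch of aggregation and sparsification, assuming aggregation means counting how many word fingerprints activate each position and sparsification means keeping a fixed number of the highest-scoring positions. The function name, the `max_active` parameter and the toy fingerprints are hypothetical; stop words such as “and” and “are” are left out for brevity.

```python
from collections import Counter


def text_fingerprint(word_fps: list[set[int]], max_active: int) -> set[int]:
    """Aggregate word fingerprints, then keep only the most frequently hit positions.

    Aggregation: count how many word fingerprints activate each position.
    Sparsification: keep the max_active highest-scoring positions so the text
    fingerprint stays as sparse as a word fingerprint. Ties are broken by the
    lower position index (an arbitrary but deterministic choice).
    """
    counts = Counter(pos for fp in word_fps for pos in fp)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {pos for pos, _ in ranked[:max_active]}


# Hypothetical word fingerprints for "organs", "pianos", "musical", "instruments"
organs = {1, 2, 36, 37, 63}      # anatomy (1, 2), music (36, 37), church (63)
pianos = {36, 37, 40}
musical = {36, 37, 38, 41}
instruments = {37, 38, 40, 50}

print(sorted(text_fingerprint([organs, pianos, musical, instruments], max_active=4)))
# [36, 37, 38, 40]
```

The anatomy and church positions of “organs” drop out because no other word in the sentence supports them, which is how the sparsification step disambiguates in this toy example.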
20. Many Languages - One Semantic Fingerprint
Concepts & their Representations are Stable Across Languages
philosophy (EN), philosophie (FR), filosofía (ES), философия (RU), فلسفة (AR), 哲學 (ZH)
21. NLU Primitive 1: Semantic Search
[Diagram: an example document serves as the query; the similarity engine compares it with the document index and returns the most similar documents as a result set, ranked along the user's information need]
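The search step can be sketched as ranking every indexed fingerprint by its overlap with the example document's fingerprint. This is a minimal sketch under stated assumptions: similarity is the raw count of shared positions, the index is an in-memory dictionary, and all names and fingerprints are hypothetical rather than the similarity engine's actual interface.

```python
def overlap_count(fp_a: set[int], fp_b: set[int]) -> int:
    """Number of shared active positions (a simple, assumed similarity score)."""
    return len(fp_a & fp_b)


def semantic_search(
    query_fp: set[int], index: dict[str, set[int]], top_k: int = 3
) -> list[tuple[str, int]]:
    """Rank indexed documents by fingerprint overlap with the query fingerprint."""
    scored = [(doc_id, overlap_count(query_fp, fp)) for doc_id, fp in index.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # highest overlap first
    return scored[:top_k]


# Hypothetical document index: document id -> text fingerprint
document_index = {
    "doc_music": {36, 37, 38, 40},
    "doc_anatomy": {0, 1, 2, 3},
    "doc_church": {37, 62, 63},
    "doc_mixed": {1, 36, 63},
}

example_document_fp = {36, 37, 38, 63}
print(semantic_search(example_document_fp, document_index))
# [('doc_music', 3), ('doc_church', 2), ('doc_mixed', 2)]
```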
22. NLU Primitive 2: Semantic Classification
[Diagram: in a semantic space trained for compliance, a semantic filter fingerprint is defined by the expression SH1_email AND SH2_email AND SH3_email; incoming emails (Email 12, Email 189, Email 2443) are compared against the filter and sorted into a positive class and a negative class]
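The expression SH1_email AND SH2_email AND SH3_email suggests a filter built by combining example fingerprints. A minimal sketch, assuming AND means set intersection (bitwise AND) and that an email joins the positive class when it covers at least a threshold share of the filter's positions; `build_filter`, `classify`, the 0.5 threshold, the fingerprints and the resulting class assignments are all hypothetical.

```python
from functools import reduce


def build_filter(example_fps: list[set[int]]) -> set[int]:
    """Filter fingerprint as the bitwise AND (set intersection) of example fingerprints."""
    return reduce(lambda acc, fp: acc & fp, example_fps)


def classify(email_fp: set[int], filter_fp: set[int], threshold: float = 0.5) -> str:
    """Positive class if the email covers at least `threshold` of the filter's positions."""
    if not filter_fp:
        return "negative"
    coverage = len(email_fp & filter_fp) / len(filter_fp)
    return "positive" if coverage >= threshold else "negative"


# Hypothetical fingerprints of three example emails flagged by compliance
sh1_email = {10, 11, 12, 13, 20}
sh2_email = {10, 11, 12, 13, 30}
sh3_email = {10, 11, 12, 14, 40}
filter_fp = build_filter([sh1_email, sh2_email, sh3_email])  # {10, 11, 12}

incoming = {
    "Email 12": {10, 11, 50},
    "Email 189": {60, 61, 62},
    "Email 2443": {10, 11, 12, 70},
}
for name, fp in incoming.items():
    print(name, classify(fp, filter_fp))
```

Intersecting the examples keeps only the positions all of them share, so the filter captures what the flagged emails have in common rather than their individual details.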
23. NLU Primitive 3: Keyword Extraction
Socrates (470/469 – 399 BC) was a classical Greek (Athenian) philosopher credited as one of the founders of Western philosophy. He is an enigmatic figure known chiefly through the accounts of classical writers, especially the writings of his students Plato and Xenophon and the plays of his contemporary Aristophanes. Plato's dialogues are among the most comprehensive accounts of Socrates to survive from antiquity, though it is unclear the degree to which Socrates himself is hidden behind his best disciple, Plato.
Word fingerprints → aggregation → text fingerprint; extract keywords by maximizing for similarity with the text fingerprint:
["plato", "socrates", "philosopher", "aristophanes", "antiquity", "writings", "xenophon", "dialogues", "disciple", "philosophy"]
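Keyword extraction can be sketched as ranking the words of a text by how much their fingerprints overlap with the text fingerprint. A minimal sketch, assuming the text fingerprint is already given (for example, produced by aggregation) and that similarity is the count of shared positions; `extract_keywords` and all fingerprints below are hypothetical.

```python
def extract_keywords(
    text: str, word_fps: dict[str, set[int]], text_fp: set[int], top_k: int = 3
) -> list[str]:
    """Return the words whose fingerprints overlap most with the text fingerprint."""
    # Normalize tokens and keep only words that have a fingerprint.
    candidates = {w.strip(".,").lower() for w in text.split()} & set(word_fps)
    # Highest overlap first; alphabetical order breaks ties deterministically.
    ranked = sorted(candidates, key=lambda w: (-len(word_fps[w] & text_fp), w))
    return ranked[:top_k]


sample_text = "Plato wrote dialogues about Socrates, the philosopher."
word_fps = {  # hypothetical word fingerprint dictionary
    "plato": {1, 2, 3, 4},
    "socrates": {1, 2, 3, 5},
    "philosopher": {2, 3, 4, 6},
    "dialogues": {3, 7, 8},
    "wrote": {9, 10},
    "about": {11},
}
sample_text_fp = {1, 2, 3, 4, 5, 6}  # hypothetical aggregated text fingerprint

print(extract_keywords(sample_text, word_fps, sample_text_fp))
# ['philosopher', 'plato', 'socrates']
```

Function words like “wrote” and “about” share no positions with the text fingerprint, so they rank last without needing a stop-word list.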
24. Semantic Folding-based Machine Learning
There are a number of remedies for snoring, but few are proven clinically effective. Popular treatments include:
• Mechanical devices. Many splints, braces, and other devices are available which reposition the nose, jaw, and/or mouth in order to clear the airways.
• Nasal strips that attach like an adhesive bandage to the bridge of the nose are available at most drugstores, and can help stop snoring in some individuals.
• Continuous positive airway pressure.
• Several surgical procedures are available for treating chronic snoring.
Snoring usually worsens when an individual sleeps on his or her back, so sleeping on one's side may alleviate the problem. Those who have difficulty staying in a side sleeping position may find that sleeping with pillows behind them helps them maintain the position longer.
[Diagram: the Retina Engine turns such text into fingerprints that feed standard machine learning algorithms (SVM, Random Forest, DL Network, Algorithm 1 to Algorithm n) for:]
• Classification
• Clustering
• Prediction
• Generating
• Computing
• Analyzing
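The diagram places the Retina Engine in front of standard algorithms such as SVMs, random forests or deep networks, which means fingerprints become ordinary binary feature vectors. A minimal sketch of that hand-off, using a nearest-centroid classifier as a stand-in for those algorithms (a swapped-in technique, not one named in the deck), with a hypothetical 16-position fingerprint space and invented training fingerprints.

```python
FP_SIZE = 16  # hypothetical fingerprint length; real fingerprints are far longer


def to_vector(fp: set[int]) -> list[int]:
    """Dense 0/1 feature vector, the input format most ML libraries expect."""
    return [1 if i in fp else 0 for i in range(FP_SIZE)]


def centroid(vectors: list[list[int]]) -> list[float]:
    """Per-position mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train(labeled: dict[str, list[set[int]]]) -> dict[str, list[float]]:
    """One centroid per class, computed from that class's fingerprints."""
    return {label: centroid([to_vector(fp) for fp in fps]) for label, fps in labeled.items()}


def predict(model: dict[str, list[float]], fp: set[int]) -> str:
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    vec = to_vector(fp)
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, model[label])))


training = {  # hypothetical labeled fingerprints
    "snoring remedies": [{0, 1, 2, 3}, {1, 2, 3, 4}],
    "sleep positions": [{8, 9, 10, 11}, {9, 10, 11, 12}],
}
model = train(training)
print(predict(model, {1, 2, 4, 5}))  # snoring remedies
```

Any library classifier that accepts binary feature vectors could replace the centroid step; the fingerprint conversion stays the same.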
25. Cortical.io Engines
• Semantic Engine
• Semantic Search
• Semantic Annotation
Capabilities and applications: Document Classification/Clustering, Keyword Extraction, Context Term Generation, Information Discovery, Expert Finding, Text Analytics, Risk Analysis, Business Intelligence, Lease/Credit Agreements
26. Hardware Acceleration for Semantic Folding
Match one query fingerprint against an unlimited number of document fingerprints:
• Enterprise Search
• Discovery Search
• Web Search
• Social Media Profile Search
Match one filter fingerprint against a stream of incoming fingerprints:
• Real-time Document Classification
• Email Filtering - Routing
• Deep Packet Inspection
• Social Media Topic Detection
Throughput:
• Desktop: each board searches up to 1 billion fingerprints per second
• Enterprise: each server searches up to 10 billion fingerprints per second
• Web Scale: each rack searches up to 100 billion fingerprints per second
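Both matching modes reduce to the same primitive: a bitwise AND followed by a population count, an operation that parallelizes well in hardware. A minimal CPU sketch in plain Python, assuming fingerprints are packed into integers with one bit per position; it illustrates the operation only, says nothing about how the FPGA implementation works, and all function names are hypothetical.

```python
from collections.abc import Iterable, Iterator


def pack(fp: set[int]) -> int:
    """Pack a set of active positions into an integer bitmask."""
    mask = 0
    for pos in fp:
        mask |= 1 << pos
    return mask


def popcount(x: int) -> int:
    return bin(x).count("1")  # int.bit_count() does the same on Python 3.10+


def match_query(query: int, documents: list[int], top_k: int) -> list[tuple[int, int]]:
    """One query fingerprint against many document fingerprints: (doc index, overlap)."""
    scores = [(i, popcount(query & doc)) for i, doc in enumerate(documents)]
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_k]


def filter_stream(filter_fp: int, stream: Iterable[int], min_overlap: int) -> Iterator[int]:
    """One filter fingerprint against a stream: yield the fingerprints that pass."""
    for fp in stream:
        if popcount(filter_fp & fp) >= min_overlap:
            yield fp


docs = [pack({1, 2, 3}), pack({2, 3, 4, 5}), pack({7, 8})]
query = pack({2, 3, 5})
print(match_query(query, docs, top_k=2))  # [(1, 3), (0, 2)]
print(list(filter_stream(query, iter(docs), min_overlap=2)) == docs[:2])  # True
```

Search and filtering differ only in which side is fixed: search holds the query and scans a stored index, while filtering holds the filter and scans arriving data.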
27. Semantic Super Computing Platform
• Retina Engine: Converter Module, Similar Term Module, Context Module, Compare Module
• Retina Database
• Retina Search: Document Index, Fingerprint Matcher, Search Re-ranker
• Retina Filter: Filter Bank, Fingerprint Matcher, Filter Re-ranker
• Host System: X86 Server with Storage, CPU-cores and Memory, plus Xilinx Alveo accelerator
• Application Server (Retina System): Administration App, Email-Filter App, Semantic-Search App, Next App
• Integration Layer: Identity Access Management; SMTP, RDB, CMS, DMS, File Service, Web Service and BPM Connectors
• APIs: Email-Filter API, Semantic-Search API, Next Application API, Management API & Monitoring API
28. Comparing the Leading NLU Approaches
Classification on the Enron email dataset: Farmer Set
[Chart: precision (70% to 90%) versus speed (logarithmic scale, 0% to 1000%, faster to the right)]
• Retina Engine (CPU)
• Retina Engine (FPGA): 1 x Xilinx Alveo 250 +
• Pure Keyword Baseline (CPU): Scikit-learn TfidfVectorizer
• FastText (CPU): official pre-trained model, Facebook
• Doc2Vec (CPU): Gensim implementation
• Word2Vec (CPU): pre-trained Google model
• BERT (GPU): PyTorch (bert-base-uncased), AWS g3.8xlarge EC2
• BERT (CPU): PyTorch (bert-base-uncased)
Dataset: "The Enron Email Corpus Archived 2011-03-08 at the Wayback Machine", retrieved March 5, 2011.
29. Demonstrated Semantic Folding Use Cases
• Banking: E-mail & Chat Compliance Monitoring; Credit Risk Analysis
• CRM: Customer Intent Analysis
• Legal: Contract Intelligence; Regulatory Process Optimization
• Financial Services: Investment Signal Extraction from News Streams
• Life Sciences: Information Discovery
• Media: Viewer Stream Analytics
• Automotive: Handbook Search; Car Supplier Management; Consolidation of Car Terminology
• Technical Support: Support Intelligence
• Social Media: Organic Topic Mining
• Commerce: Catalogue Management & Automation
• Human Resources: Job Description - Resume Matching
30. NLU by Semantic Folding
• Simplicity: One Algorithm, One Operator, One Data Format
• Compositionality: Words, Sentences, Paragraphs, Documents
• Analogy: Normalized Representation, Bitwise Similarity
• Modelability: Unsupervised Semantic Model Generation
• Efficiency: Small Amounts of Reference Data
• Scalability: One Semantic Model, Many Use-Cases
• Replicability: Same Use-Case in New Domain
• Inspectability: Refinement, Debugging, Verification
• Robustness: “Graceful Failing”
31. Market Potential - Semantic Folding
[Figure: the Semantic Folding potential market within the global data sphere (zettabytes), which comprises transactional data, machine-generated data, human-generated data and social media data; further labels: ML data, text-ML data, log data, sensor data, text data]
info@cortical.io