Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017


Representation Learning @ Red Hat:
For many companies, the vast majority of their data is unstructured and unlabeled; however, the data often contains information that could be useful in a variety of scenarios. Representation learning is the process of extracting meaningful features from unlabeled data so that it can be used in other tasks. In this talk, you’ll hear about how Red Hat is using deep learning to discover meaningful entity representations in a number of different settings, including: (1) identifying duplicate documents on the Customer Portal, (2) finding contextually similar URLs with word2vec, and (3) clustering behaviorally similar customers with doc2vec. To close, we will walk through an example demonstrating how representation learning can be applied to Major League Baseball players.

Bio: Michael first developed his data crunching chops as an undergraduate at Auburn University (War Eagle!) where he used a number of different statistical techniques to investigate various aspects of salamander biology (work that led to several publications). He then went on to earn a M.S. in evolutionary biology from The University of Chicago (where he wrote a thesis on frog ecomorphology) before changing directions and earning a second M.S. in computer science (with a focus on intelligent systems) from The University of Texas at Dallas. As a Machine Learning Engineer – Information Retrieval at Red Hat, Michael is constantly looking for ways to use the latest and greatest machine learning technology to improve search.



  1. REPRESENTATION LEARNING @ RED HAT. Michael A. Alcorn (malcorn@redhat.com), Machine Learning Engineer - Information Retrieval. https://sites.google.com/view/michaelaalcorn/
  2. Outline: Background; word2vec/url2vec; doc2vec/account2vec; Duplicate Detection; (batter|pitcher)2vec; MLconf Blog
  3. Background. Why? Small amount (zero?) of labeled data for the task, but lots of unlabeled data (labeled data for a different task?). Can we use large amounts of unlabeled data to make better predictions? Not the same as traditional unsupervised learning! See the excellent representation learning chapter in Goodfellow et al.'s Deep Learning textbook and the transfer learning article by Bengio et al.
  4. word2vec. Figure from NVIDIA's "Introduction to Neural Machine Translation with GPUs (Part 2)".
  5. word2vec. Figure from Deeplearning4j's "Word2vec"; Mikolov et al. (2013).
  6. word2vec Analogies. "x is to y as ? is to z": x - y + z = ?. Examples: bash - shellshock + heartbleed = openssl; firefox - linux + windows = internet_explorer; openshift - cloud + storage = gluster; rhn_register - rhn + rhsm = subscription-manager
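These analogy queries map directly onto gensim's built-in vector arithmetic. A minimal sketch, assuming a word2vec model already trained on Red Hat text (the file path is hypothetical):

```python
from gensim.models import KeyedVectors

# Load pretrained word vectors (path is hypothetical).
vectors = KeyedVectors.load("redhat_word2vec.kv")

# "bash is to shellshock as ? is to heartbleed": ? = bash - shellshock + heartbleed.
# most_similar adds the positive vectors, subtracts the negative ones, and
# returns the nearest remaining words by cosine similarity.
print(vectors.most_similar(positive=["bash", "heartbleed"],
                           negative=["shellshock"], topn=3))
```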
  7. Naming Colors. Mapping RGB values to color names (blog post by Janelle Shane); the results are pretty underwhelming for those in the know. Can word embeddings improve them (GitHub)?
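One way embeddings could plug in, purely as a sketch of a plausible approach (not necessarily what the linked GitHub repo does, and all data below is stand-in): regress from RGB to the embedding of the color's name, then name a new color by its nearest word vector.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-in data: rgb is (N, 3) in [0, 1], names is a list of N color words,
# and word_vecs maps each color word to a (pretend) word2vec embedding.
rgb = np.random.rand(500, 3)
vocab = ["red", "green", "blue", "teal", "maroon"]
names = np.random.choice(vocab, size=500)
word_vecs = {w: np.random.randn(50) for w in vocab}

# Regress from RGB to the embedding of the color's name.
model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500)
model.fit(rgb, np.stack([word_vecs[n] for n in names]))

# Name a new color by finding the word embedding closest to the prediction.
query = model.predict(np.array([[0.9, 0.1, 0.1]]))[0]
best = max(vocab, key=lambda w: np.dot(word_vecs[w], query) /
           (np.linalg.norm(word_vecs[w]) * np.linalg.norm(query)))
print(best)
```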
  8. url2vec. Tasks concerning URLs: search (returning relevant content) and troubleshooting (recommending related articles). Obvious method: look at the text. Alternative/enhanced method: use customer browsing behavior as additional contextual clues.
  9. url2vec. How? Treat each day of browsing activity as a "sentence", treat each URL as a "word", and run word2vec!
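In gensim this is literally a word2vec call over the session lists. A minimal sketch with hypothetical session data (the third URL is invented for illustration):

```python
from gensim.models import Word2Vec

# Each inner list is one account's URLs for one day, ordered by visit time:
# the "sentences" of URL "words".
sessions = [
    ["https://access.redhat.com/solutions/25190",
     "https://access.redhat.com/solutions/10107"],
    ["https://access.redhat.com/solutions/10107",
     "https://access.redhat.com/articles/1234"],  # hypothetical URL
]

# Plain skip-gram word2vec over the sessions; URLs that appear in similar
# browsing contexts end up with nearby vectors.
model = Word2Vec(sessions, vector_size=100, window=5, min_count=1, sg=1)
print(model.wv.most_similar("https://access.redhat.com/solutions/10107", topn=2))
```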
  10. url2vec. https://access.redhat.com/solutions/25190; https://access.redhat.com/solutions/10107. Application: ScatterPlot3D
  11. doc2vec. Le and Mikolov (2014); "NLP 05: From Word2vec to Doc2vec: a simple example with Gensim"
  12. customer2vec. Why? Data-driven segmentation. Same idea as url2vec, except now we treat each account as a "document" of many "sentences" (the different browsing days).
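A minimal doc2vec sketch of this setup, with hypothetical account data (the tags are account IDs and the "words" are the URLs from all of an account's browsing days):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical accounts mapped to the URLs they browsed.
accounts = {
    "account_123": ["https://access.redhat.com/solutions/25190",
                    "https://access.redhat.com/solutions/10107"],
    "account_456": ["https://access.redhat.com/solutions/10107"],
}
docs = [TaggedDocument(words=urls, tags=[acct]) for acct, urls in accounts.items()]

# Doc2Vec learns a vector per tag (account) alongside the URL vectors;
# behaviorally similar accounts get nearby vectors, which can then be clustered.
model = Doc2Vec(docs, vector_size=100, min_count=1, epochs=20)
print(model.dv.most_similar("account_123", topn=1))
```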
  14. customer2vec
  15. Duplicate Detection. There are a number of "duplicate" KCS solutions on the Customer Portal, which muddy search results. How can we identify candidate duplicate documents? Obvious approach: compare text (e.g., tf-idf), but bag-of-words loses any structural meaning behind the text. Can we learn better representations? The title is essentially a summary of the solution's content, so learn representations of the body that are similar to the title representations (like the DSSM; my code).
  16. Deep Semantic Similarity Model. Figure from Jianfeng Gao's "Deep Learning for Web Search and Natural Language Processing".
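A minimal DSSM-style sketch of the title/body idea, under assumptions that are mine rather than the talk's (hashed bag-of-words inputs, in-batch negatives, illustrative sizes): two towers map titles and bodies into a shared space, and each body is trained to be most similar to its own title.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, EMB = 30000, 300, 128  # illustrative sizes

class Tower(nn.Module):
    """Maps a sparse text vector to a unit-length semantic embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(VOCAB, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, EMB),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

title_tower, body_tower = Tower(), Tower()

def loss_fn(titles, bodies):
    # Cosine similarities between every body and every title in the batch;
    # train each body to rank its own title first (in-batch negatives).
    sims = body_tower(bodies) @ title_tower(titles).T
    targets = torch.arange(sims.size(0))
    return F.cross_entropy(sims / 0.05, targets)  # 0.05 = temperature

# Toy batch of random "hashed bag-of-words" vectors just to show the shapes.
titles, bodies = torch.rand(8, VOCAB), torch.rand(8, VOCAB)
print(loss_fn(titles, bodies))
```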
  17. (batter|pitcher)2vec (GitHub). Can we learn meaningful representations of MLB players? Accurate representations could be used to simulate games and inform trades, and to find undervalued/overvalued players.
  19. (batter|pitcher)2vec (GitHub). Player analogy figure (player images via SI.com and NBCSports.com).
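A hedged sketch of how player representations like these can be learned (the sizes and outcome classes below are illustrative; see the GitHub repo for the actual architecture): embed each batter and pitcher, and train the embeddings to predict at-bat outcomes.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not the model's exact configuration.
N_BATTERS, N_PITCHERS, EMB, N_OUTCOMES = 1000, 500, 9, 52

class PlayerModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.batters = nn.Embedding(N_BATTERS, EMB)
        self.pitchers = nn.Embedding(N_PITCHERS, EMB)
        self.out = nn.Linear(2 * EMB, N_OUTCOMES)
    def forward(self, batter_ids, pitcher_ids):
        pair = torch.cat([self.batters(batter_ids),
                          self.pitchers(pitcher_ids)], dim=-1)
        return self.out(pair)  # logits over at-bat outcomes

model = PlayerModel()
logits = model(torch.tensor([0, 1]), torch.tensor([2, 3]))
loss = nn.functional.cross_entropy(logits, torch.tensor([5, 7]))  # observed outcomes
loss.backward()  # the learned embeddings are the player representations
```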
  20. (batter|pitcher)2vec. "Learning to Coach Football", Wang and Zemel (2016).
  21. THANK YOU!
