Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017


Representation Learning @ Red Hat:
For many companies, the vast majority of their data is unstructured and unlabeled; however, the data often contains information that could be useful in a variety of scenarios. Representation learning is the process of extracting meaningful features from unlabeled data so that it can be used in other tasks. In this talk, you’ll hear about how Red Hat is using deep learning to discover meaningful entity representations in a number of different settings, including: (1) identifying duplicate documents on the Customer Portal, (2) finding contextually similar URLs with word2vec, and (3) clustering behaviorally similar customers with doc2vec. To close, we will walk through an example demonstrating how representation learning can be applied to Major League Baseball players.

Bio: Michael first developed his data crunching chops as an undergraduate at Auburn University (War Eagle!) where he used a number of different statistical techniques to investigate various aspects of salamander biology (work that led to several publications). He then went on to earn a M.S. in evolutionary biology from The University of Chicago (where he wrote a thesis on frog ecomorphology) before changing directions and earning a second M.S. in computer science (with a focus on intelligent systems) from The University of Texas at Dallas. As a Machine Learning Engineer – Information Retrieval at Red Hat, Michael is constantly looking for ways to use the latest and greatest machine learning technology to improve search.



  1. REPRESENTATION LEARNING @ RED HAT. Michael A. Alcorn (malcorn@redhat.com), Machine Learning Engineer - Information Retrieval. https://sites.google.com/view/michaelaalcorn/
  2. Outline: Background; word2vec/url2vec; doc2vec/account2vec; Duplicate Detection; (batter|pitcher)2vec; MLconf Blog
  3. Background. Why? Small amount (zero?) of labeled data for the task, but lots of unlabeled data (labeled data for a different task?). Can we use large amounts of unlabeled data to make better predictions? Not the same as traditional unsupervised learning! See the excellent representation learning chapter in Goodfellow et al.'s Deep Learning textbook and the transfer learning article by Bengio et al.
  4. word2vec. Figure from NVIDIA's "Introduction to Neural Machine Translation with GPUs (Part 2)".
  5. word2vec. Figure from Deeplearning4j's "Word2vec"; Mikolov et al. (2013).
  6. word2vec Analogies. "x is to y as ? is to z": x - y + z = ?. Examples: bash - shellshock + heartbleed = openssl; firefox - linux + windows = internet_explorer; openshift - cloud + storage = gluster; rhn_register - rhn + rhsm = subscription-manager
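These analogy queries map directly onto gensim's built-in vector arithmetic. A minimal sketch, assuming a word2vec model already trained on Red Hat text (the file path is hypothetical):

```python
from gensim.models import KeyedVectors

# Load pretrained word vectors (path is hypothetical).
vectors = KeyedVectors.load("redhat_word2vec.kv")

# "bash is to shellshock as ? is to heartbleed": ? = bash - shellshock + heartbleed.
# most_similar adds the positive vectors, subtracts the negative ones, and
# returns the nearest remaining words by cosine similarity.
print(vectors.most_similar(positive=["bash", "heartbleed"],
                           negative=["shellshock"], topn=3))
```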
  7. Naming Colors. Mapping RGB values to color names (blog post by Janelle Shane); the results are pretty underwhelming for those in the know. Can word embeddings improve them (GitHub)?
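One way embeddings could plug in, purely as a sketch of a plausible approach (not necessarily what the linked GitHub repo does, and all data below is stand-in): regress from RGB to the embedding of the color's name, then name a new color by its nearest word vector.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-in data: rgb is (N, 3) in [0, 1], names is a list of N color words,
# and word_vecs maps each color word to a (pretend) word2vec embedding.
rgb = np.random.rand(500, 3)
vocab = ["red", "green", "blue", "teal", "maroon"]
names = np.random.choice(vocab, size=500)
word_vecs = {w: np.random.randn(50) for w in vocab}

# Regress from RGB to the embedding of the color's name.
model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500)
model.fit(rgb, np.stack([word_vecs[n] for n in names]))

# Name a new color by finding the word embedding closest to the prediction.
query = model.predict(np.array([[0.9, 0.1, 0.1]]))[0]
best = max(vocab, key=lambda w: np.dot(word_vecs[w], query) /
           (np.linalg.norm(word_vecs[w]) * np.linalg.norm(query)))
print(best)
```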
  8. url2vec. Tasks concerning URLs: search (returning relevant content) and troubleshooting (recommending related articles). Obvious method: look at the text. Alternative/enhanced method: use customer browsing behavior as additional contextual clues.
  9. url2vec. How? Treat each day of browsing activity as a "sentence", treat each URL as a "word", and run word2vec!
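In gensim this is literally a word2vec call over the session lists. A minimal sketch with hypothetical session data (the third URL is invented for illustration):

```python
from gensim.models import Word2Vec

# Each inner list is one account's URLs for one day, ordered by visit time:
# the "sentences" of URL "words".
sessions = [
    ["https://access.redhat.com/solutions/25190",
     "https://access.redhat.com/solutions/10107"],
    ["https://access.redhat.com/solutions/10107",
     "https://access.redhat.com/articles/1234"],  # hypothetical URL
]

# Plain skip-gram word2vec over the sessions; URLs that appear in similar
# browsing contexts end up with nearby vectors.
model = Word2Vec(sessions, vector_size=100, window=5, min_count=1, sg=1)
print(model.wv.most_similar("https://access.redhat.com/solutions/10107", topn=2))
```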
  10. url2vec. https://access.redhat.com/solutions/25190; https://access.redhat.com/solutions/10107. Application: ScatterPlot3D
  11. doc2vec. Le and Mikolov (2014); "NLP 05: From Word2vec to Doc2vec: a simple example with Gensim"
  12. customer2vec. Why? Data-driven segmentation. Same idea as url2vec, except now we treat each account as a "document" of many "sentences" (the different browsing days).
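A minimal doc2vec sketch of this setup, with hypothetical account data (the tags are account IDs and the "words" are the URLs from all of an account's browsing days):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical accounts mapped to the URLs they browsed.
accounts = {
    "account_123": ["https://access.redhat.com/solutions/25190",
                    "https://access.redhat.com/solutions/10107"],
    "account_456": ["https://access.redhat.com/solutions/10107"],
}
docs = [TaggedDocument(words=urls, tags=[acct]) for acct, urls in accounts.items()]

# Doc2Vec learns a vector per tag (account) alongside the URL vectors;
# behaviorally similar accounts get nearby vectors, which can then be clustered.
model = Doc2Vec(docs, vector_size=100, min_count=1, epochs=20)
print(model.dv.most_similar("account_123", topn=1))
```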
  14. customer2vec
  15. Duplicate Detection. There are a number of "duplicate" KCS solutions on the Customer Portal, which muddy search results. How can we identify candidate duplicate documents? Obvious approach: compare text (e.g., tf-idf), but bag-of-words loses any structural meaning behind the text. Can we learn better representations? The title is essentially a summary of the solution's content, so learn representations of the body that are similar to the title representations (like the DSSM; my code).
  16. Deep Semantic Similarity Model. Figure from Jianfeng Gao's "Deep Learning for Web Search and Natural Language Processing".
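A minimal DSSM-style sketch of the title/body idea, under assumptions that are mine rather than the talk's (hashed bag-of-words inputs, in-batch negatives, illustrative sizes): two towers map titles and bodies into a shared space, and each body is trained to be most similar to its own title.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, EMB = 30000, 300, 128  # illustrative sizes

class Tower(nn.Module):
    """Maps a sparse text vector to a unit-length semantic embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(VOCAB, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, EMB),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

title_tower, body_tower = Tower(), Tower()

def loss_fn(titles, bodies):
    # Cosine similarities between every body and every title in the batch;
    # train each body to rank its own title first (in-batch negatives).
    sims = body_tower(bodies) @ title_tower(titles).T
    targets = torch.arange(sims.size(0))
    return F.cross_entropy(sims / 0.05, targets)  # 0.05 = temperature

# Toy batch of random "hashed bag-of-words" vectors just to show the shapes.
titles, bodies = torch.rand(8, VOCAB), torch.rand(8, VOCAB)
print(loss_fn(titles, bodies))
```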
  17. (batter|pitcher)2vec (GitHub). Can we learn meaningful representations of MLB players? Accurate representations could be used to simulate games and inform trades, and to find undervalued/overvalued players.
  19. (batter|pitcher)2vec (GitHub). Player analogy figure (player images via SI.com and NBCSports.com).
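A hedged sketch of how player representations like these can be learned (the sizes and outcome classes below are illustrative; see the GitHub repo for the actual architecture): embed each batter and pitcher, and train the embeddings to predict at-bat outcomes.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not the model's exact configuration.
N_BATTERS, N_PITCHERS, EMB, N_OUTCOMES = 1000, 500, 9, 52

class PlayerModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.batters = nn.Embedding(N_BATTERS, EMB)
        self.pitchers = nn.Embedding(N_PITCHERS, EMB)
        self.out = nn.Linear(2 * EMB, N_OUTCOMES)
    def forward(self, batter_ids, pitcher_ids):
        pair = torch.cat([self.batters(batter_ids),
                          self.pitchers(pitcher_ids)], dim=-1)
        return self.out(pair)  # logits over at-bat outcomes

model = PlayerModel()
logits = model(torch.tensor([0, 1]), torch.tensor([2, 3]))
loss = nn.functional.cross_entropy(logits, torch.tensor([5, 7]))  # observed outcomes
loss.backward()  # the learned embeddings are the player representations
```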
  20. (batter|pitcher)2vec. "Learning to Coach Football", Wang and Zemel (2016).
  21. THANK YOU!
