Deriving Conversational Insight by Learning Emoji Representations by Jeff Weintraub

Deriving Conversational Insight by Learning Emoji Representations
Jeff Weintraub, VP, Technology
a You & Mr Jones company
//BigDataLA2017
AGENDA
1. Emoji Adoption
2. Emojineering
3. Conversational Insight
1. Emoji Adoption
Emoji Adoption - Instagram
October 2011: Emoji keyboard launches on iOS
10% of Instagram comments contained emoji (Nov 2011)
50%+ of Instagram comments contained emoji (March 2015)
See https://engineering.instagram.com/emojineering-part-1-machine-learning-for-emoji-trends-7f5f9cb979ad
Emoji Adoption - Instagram
2,666 emojis in the Unicode Standard as of May 2017
-0.93 correlation coefficient within respective cohorts
See https://engineering.instagram.com/emojineering-part-1-machine-learning-for-emoji-trends-7f5f9cb979ad
2. Emojineering
Emojineering
Ford GTs are the [emoji]
Ford GTs are [emoji]
Emojineering
Ford GTs are the [emoji] (Pos)
Ford GTs are [emoji] (Neg)
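The Pos/Neg labels above turn on the emoji alone flipping the sentence's sentiment (the specific emojis did not survive extraction). A minimal sketch of that idea, using a hypothetical hand-made polarity lexicon rather than anything from the talk:

```python
# Hypothetical emoji polarity lexicon; entries and values are illustrative only.
EMOJI_POLARITY = {"😍": 1.0, "🔥": 0.8, "😂": 0.5, "💩": -0.8}

def emoji_sentiment(text):
    """Average polarity of known emojis in `text`, or None if none appear."""
    scores = [EMOJI_POLARITY[ch] for ch in text if ch in EMOJI_POLARITY]
    return sum(scores) / len(scores) if scores else None

# The same sentence stem reads positive or negative depending only on the emoji.
assert emoji_sentiment("Ford GTs are the 🔥") > 0
assert emoji_sentiment("Ford GTs are 💩") < 0
```

A fixed lexicon is the crudest possible version; the rest of the talk replaces it with learned vector representations.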
Emojineering
NLP Semantic Analysis
- N-gram Neural Network Language Model (NNLM)
Q = training complexity; the goal is to minimize Q so the model can be trained efficiently on more data
C is the maximum distance of the words
V is the size of the vocabulary (output-layer dimensionality)
Continuous Skip-gram Model
- Trained with stochastic gradient descent (SGD) and backpropagation
- Maximizes classification of a word based on another word in the same sentence
See Mikolov, et al., Efficient Estimation of Word Representations in Vector Space, 2013
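For reference, the complexity terms the slide names can be written out as in Mikolov et al. (2013); D (embedding dimensionality) and H and N (hidden-layer size and context length of the NNLM) are symbols the slide does not define:

```latex
% NNLM per-example training complexity:
% N context words, D-dimensional projections, H hidden units, V-word vocabulary
Q = N \times D + N \times D \times H + H \times V

% Continuous skip-gram with hierarchical softmax:
% C is the maximum word distance; \log_2 V replaces the V-way output layer
Q = C \times \left( D + D \times \log_2 V \right)
```

The dominant H × V term of the NNLM is what the skip-gram model avoids, which is why it can be trained on far more data.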
Emojineering
Skip-gram Model
- If we choose C = 5, then for each training word we randomly select a number R in the range <1; C> and use R words from the history and R words from the future of the current word as correct labels.
- Increasing the range improves the quality of the resulting word vectors, but it also increases the computational complexity.
See Mikolov, et al., Efficient Estimation of Word Representations in Vector Space, 2013
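The dynamic-window sampling described above can be sketched directly (a toy implementation for illustration, not Instagram's or Google's code):

```python
import random

def skipgram_pairs(tokens, C=5, seed=0):
    """Generate (center, context) skip-gram training pairs.

    For each position, draw R uniformly from 1..C and use the R tokens
    before and R tokens after the center word as correct labels.
    """
    rng = random.Random(seed)
    pairs = []
    for i, center in enumerate(tokens):
        r = rng.randint(1, C)
        for j in range(max(0, i - r), min(len(tokens), i + r + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["ford", "gts", "are", "the", "🔥"], C=2)
```

Because small R values are drawn as often as large ones, nearby context words are effectively weighted more heavily than distant ones.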
Emojineering
Distributional Hypothesis
Words that occur in similar contexts tend to have similar meanings (Harris, 1954; Firth, 1957; Deerwester et al., 1990)
Training Accuracy
- 300-dimensional vectors for words and emojis
- 3 million phrases
- 6B tokens
Example contexts: the, Ford, GT | cars, Ford, :)
See https://engineering.instagram.com/emojineering-part-1-machine-learning-for-emoji-trends-7f5f9cb979ad
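Under the distributional hypothesis, an emoji like ":)" that appears in the same contexts as "cars" should land near it in vector space. A toy cosine-similarity check makes the point (the vectors below are invented 3-d stand-ins, not the real 300-d embeddings):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "embeddings", invented for illustration only.
vectors = {
    "GT":   [0.9, 0.1, 0.0],
    "cars": [0.8, 0.2, 0.1],
    ":)":   [0.7, 0.3, 0.2],
    "the":  [0.0, 0.1, 0.9],
}

# The emoticon sits closer to "cars" than to the function word "the".
assert cosine(vectors[":)"], vectors["cars"]) > cosine(vectors[":)"], vectors["the"])
```

With real trained vectors the same comparison surfaces which emojis cluster with which topic words.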
Emojineering - Visualization
the, Ford, GT
cars, Ford, :)
Emojineering
Distributional Hypothesis
Words that occur in similar contexts tend to have similar meanings (Harris, 1954; Firth, 1957; Deerwester et al., 1990)
100 Billion Words
The model contains 300-dimensional vectors for 3 million words and phrases
Example contexts: the, Ford, GT | cars, Ford, :)
3.  Conversational  Insight
Conversational Insight - Entertainment Vertical
65.23% of emojis used were Top 10 emojis
34.7% of emoji uses were 😂 and 😍
30.01% of emojis used were semantically relevant to key words
Conversational Insight - Retail Vertical
58.14% of emojis used were Top 10 emojis
22.5% of emoji uses were ❤ and 😍
11.78% of emojis used were semantically relevant to key words
Conversational Insight - Beauty Vertical
71.22% of emojis used were Top 10 emojis
37.8% of emoji uses were 😂 and 😍
4% of emojis used were semantically relevant to key words
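The "semantically relevant to key words" figures suggest a rule along these lines: count an emoji occurrence as relevant when its vector sits within some cosine-similarity threshold of at least one campaign keyword. This is an assumed methodology sketched for illustration; the talk does not spell out the threshold or the exact rule, and the vectors below are invented 2-d stand-ins:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pct_semantically_relevant(emoji_uses, keywords, vectors, threshold=0.5):
    """Percent of emoji occurrences within `threshold` cosine similarity
    of at least one keyword vector (hypothetical relevance rule)."""
    relevant = sum(
        1 for e in emoji_uses
        if any(cosine(vectors[e], vectors[k]) >= threshold for k in keywords)
    )
    return 100 * relevant / len(emoji_uses)

# Toy vectors and usage log, invented for illustration.
vectors = {"😍": [1.0, 0.0], "💄": [0.9, 0.3], "⚽": [0.0, 1.0], "beauty": [0.8, 0.4]}
pct = pct_semantically_relevant(["😍", "💄", "⚽", "😍"], ["beauty"], vectors)  # 75.0
```

Varying the threshold trades precision for recall in what counts as "relevant", which is one plausible reason the reported percentages differ so widely across verticals.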
AGENDA
1. Emoji Adoption
2. Emojineering
3. Conversational Insight
Thank You!
jeff@theamplify.com
@jeff_weintraub
a You & Mr Jones company