Language Independent Methods of Clustering Similar Contexts (with applications)
Ted Pedersen, University of Minnesota, Duluth
http://www.d.umn.edu/~tpederse
The Problem
Language Independent Methods
Outline (Tutorial)
Outline (Practical Session)
SenseClusters
Many thanks…
Practical Session
Background and Motivations
Headed and Headless Contexts
Headed Contexts (input)
Headed Contexts (output)
Headless Contexts (input)
Headless Contexts (output)
Applications
Underlying Premise…
Identifying Lexical Features: Measures of Association and Tests of Significance
What are features?
Where do features come from?
Feature Selection
Lexical Features
Bigrams
Co-occurrences
Bigrams and Co-occurrences
“occur together more often than expected by chance…”
2x2 Contingency Table (known counts only; "-" marks cells not yet filled in)
              Intelligence   !Intelligence    Totals
Artificial             100               -       400
!Artificial              -               -         -
Totals                 300               -   100,000

2x2 Contingency Table (all observed counts)
              Intelligence   !Intelligence    Totals
Artificial             100             300       400
!Artificial            200          99,400    99,600
Totals                 300          99,700   100,000

2x2 Contingency Table (observed counts, expected counts in parentheses)
              Intelligence      !Intelligence          Totals
Artificial    100.0 (1.2)       300.0 (398.8)             400
!Artificial   200.0 (298.8)     99,400.0 (99,301.2)    99,600
Totals        300               99,700                100,000
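The expected counts in the last table follow from the row and column totals under the assumption that the two words occur independently. The sketch below recomputes them and then derives two common association scores, Pearson's chi-squared and the log-likelihood ratio. Which measures the original slides used at this point is not preserved, so the choice of these two is an assumption, and all variable names are illustrative.

```python
import math

# Observed counts from the 2x2 table.
# Rows: Artificial, !Artificial. Columns: Intelligence, !Intelligence.
observed = [[100, 300],
            [200, 99_400]]

row_totals = [sum(row) for row in observed]          # 400, 99600
col_totals = [sum(col) for col in zip(*observed)]    # 300, 99700
total = sum(row_totals)                              # 100000

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

cells = [(o, e)
         for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row)]

# Pearson's chi-squared: sum of (observed - expected)^2 / expected.
x2 = sum((o - e) ** 2 / e for o, e in cells)

# Log-likelihood ratio: 2 * sum of observed * ln(observed / expected).
g2 = 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)

print(expected)
print(round(x2, 2), round(g2, 2))
```

Both scores grow as the observed counts move away from the expected ones, which is the sense in which a bigram occurs "more often than expected by chance".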
Measures of Association
Interpreting the Scores…
Measures of Association
Measures Supported in NSP
NSP
Summary
Related Work
Context Representations: First and Second Order Methods
Once features selected…
First Order Representation
Contexts
Unigram Feature Set
First Order Vectors of Unigrams
      island  black  curse  magic  child
C1         1      1      1      1      1
C2         0      1      0      1      1
C3         0      0      0      0      0
C4         1      0      1      0      1
Bigram Feature Set
First Order Vectors of Bigrams
      black magic  island curse  military might  serious error  voodoo child
C1              1             1               0              0             1
C2              1             0               0              0             1
C3              0             0               1              1             0
C4              0             1               1              0             1
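A first order vector simply records which selected features occur in a context. The sketch below builds the unigram and bigram vectors for one sentence. The original contexts C1 to C4 are not preserved in this transcript, so the example sentence from the second order slides is used as a stand-in, and bigrams are assumed to be adjacent word pairs; both are assumptions for illustration.

```python
import re

# Stand-in context (assumption: the original C1-C4 texts are not available).
context = "There was an island curse of black magic cast by that voodoo child."

unigram_features = ["island", "black", "curse", "magic", "child"]
bigram_features = [("black", "magic"), ("island", "curse"),
                   ("military", "might"), ("serious", "error"),
                   ("voodoo", "child")]

# Lowercase and keep alphabetic tokens only.
tokens = re.findall(r"[a-z]+", context.lower())
token_set = set(tokens)
adjacent_pairs = set(zip(tokens, tokens[1:]))

# 1 if the feature occurs in the context, 0 otherwise.
unigram_vector = [int(word in token_set) for word in unigram_features]
bigram_vector = [int(pair in adjacent_pairs) for pair in bigram_features]

print(unigram_vector)
print(bigram_vector)
```

For this sentence the two vectors coincide with the C1 rows of the unigram and bigram tables.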
First Order Vectors
Second Order Representation
Word by Word Matrix
           magic  curse  might  error  child
black      123.5      0      0      0   43.2
island         0  189.2      0      0   73.2
military       0      0  100.3   54.9      0
serious        0   21.2      0   89.2      0
voodoo         0      0   69.4      0  120.0
Word by Word Matrix
There was an island curse of black magic cast by that voodoo child.
           magic  curse  might  error  child
black      123.5      0      0      0   43.2
island         0  189.2      0      0   73.2
voodoo         0      0   69.4      0  120.0
Second Order Representation
There was an island curse of black magic cast by that voodoo child.
           magic  curse  might  error  child
C1          41.2   63.1   24.4      0   78.8
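The C1 vector can be reproduced by averaging the word vectors of the context words that have a row in the word by word matrix (black, island and voodoo). The sketch below assumes plain averaging, which reproduces the magic, curse, error and child entries; the might entry comes out at about 23.1 rather than the 24.4 shown on the slide. Names are illustrative.

```python
import re

columns = ["magic", "curse", "might", "error", "child"]

# Rows of the word by word matrix that are relevant for this context.
word_vectors = {
    "black":  [123.5, 0.0, 0.0, 0.0, 43.2],
    "island": [0.0, 189.2, 0.0, 0.0, 73.2],
    "voodoo": [0.0, 0.0, 69.4, 0.0, 120.0],
}

context = "There was an island curse of black magic cast by that voodoo child."
tokens = re.findall(r"[a-z]+", context.lower())

# Collect the vector of every context word that has a row in the matrix.
rows = [word_vectors[t] for t in tokens if t in word_vectors]

# Second order context vector: column-wise average of those rows.
context_vector = [sum(col) / len(rows) for col in zip(*rows)]

for name, value in zip(columns, context_vector):
    print(f"{name}: {value:.1f}")
```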
First versus Second Order
Second Order Co-occurrences
Summary
Related Work
Dimensionality Reduction: Singular Value Decomposition
Motivation
Many Methods
Effect of SVD
How can SVD be used?
Word by Word Matrix
        apple  blood  cells  ibm  data  tissue  graphics  plasma  box  memory  organ
pc          2      0      0    1     3       0         0       0    1       0      0
body        0      3      0    0     0       2         0       1    0       0      2
disk        1      0      0    2     0       0         1       0    3       2      0
petri       0      2      1    0     0       2         0       1    0       1      0
lab         0      0      3    0     2       2         0       3    0       2      1
sales       0      0      0    2     3       0         1       0    0       2      0
linux       2      0      0    1     3       0         1       0    2       1      0
debt        0      0      0    2     3       0         2       0    4       0      0
Singular Value Decomposition A=UDV’
U -.52 .39 -.48 .02 .09 .41 -.09 .40 -.30 .08 .31 .43 -.26 -.39 -.6 .20 .00 -.00 -.00 -.02 -.01 .00 -.02 -.00 -.07 -.3 .14 -.49 -.07 .30 .25 .56 -.01 .08 .05 -.01 .24 -.08 .11 .46 .08 .03 -.04 .72 .09 -.31 -.01 .37 -.07 .01 -.21 -.31 -.34 -.45 -.68 .29 .00 .05 .83 .17 -.02 .25 -.45 .08 .03 .20 -.22 .31 -.60 .39 .13 .35 -.01 -.04 -.44 .08 .44 .59 -.49 .05 -.02 .63 .02 -.09 .52 -.2 .09 .35
D (singular values, largest first) 9.19 6.36 3.99 3.25 2.52 2.30 1.26 0.66 0.00 0.00 0.00
V -.20 .22 -.07 -.10 -.87 -.07 -.06 .17 .19 -.26 .04 .03 .17 -.32 .02 .13 -.26 -.17 .06 -.04 .86 .50 -.58 .12 .09 -.18 -.27 -.18 -.12 -.47 .11 -.03 .12 .31 -.32 -.04 .64 -.45 -.14 -.23 .28 .07 -.23 -.62 -.59 .05 .02 -.12 .15 .11 .25 -.71 -.31 -.04 .08 .29 -.05 .05 .20 -.51 .09 -.03 .12 .31 -.01 .02 -.45 -.32 .50 .27 .49 -.02 .08 .21 -.06 .08 -.09 .52 -.45 -.01 .63 .03 -.12 -.31 .71 -.13 .39 -.12 .12 .15 .37 .07 .58 -.41 .15 .17 -.30 -.32 -.27 -.39 .11 .44 .25 .03 -.02 .26 .23 .39 .57 -.37 .04 .03 -.12 -.31 -.05 -.05 .04 .28 -.04 .08 .21
Word by Word Matrix After SVD
        apple  blood  cells  ibm  data  tissue  graphics  plasma  memory  organ
pc        .73    .00    .11  1.3   2.0     .01       .86     .09     .77    .00
body      .00    1.2    1.3  .00   .33     1.6       .00     1.5     .85    .84
disk      .76    .00    .01  1.3   2.1     .00       .91     .00     .72    .00
germ      .00    1.1    1.2  .00   .49     1.5       .00     1.4     .86    .77
lab       .21    1.7    2.0  .35   1.7     2.5       .18     2.3     1.7    1.2
sales     .73    .15    .39  1.3   2.2     .35       .85     .41     .98    .17
linux     .96    .00    .16  1.7   2.7     .03       1.1     .13     1.0    .00
debt      1.2    .00    .00  2.1   3.2     .00       1.5     .00     1.1    .00
Second Order Representation
        apple  blood  cells  ibm  data  tissue  graphics  plasma  memory  organ
disk      .76    .00    .01  1.3   2.1     .00       .91     .00     .72    .00
linux     .96    .00    .16  1.7   2.7     .03       1.1     .13     1.0    .00
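A minimal sketch of the decomposition A = UDV' and a reduced-rank reconstruction, assuming NumPy is available. The counts are transcribed from the Word by Word Matrix slide as an 8 x 11 matrix; the column order and the choice k = 2 are assumptions for illustration and are not taken from the slides.

```python
import numpy as np

rows = ["pc", "body", "disk", "petri", "lab", "sales", "linux", "debt"]
cols = ["apple", "blood", "cells", "ibm", "data", "tissue",
        "graphics", "plasma", "box", "memory", "organ"]

# Word by word co-occurrence counts (8 words x 11 features).
A = np.array([
    [2, 0, 0, 1, 3, 0, 0, 0, 1, 0, 0],   # pc
    [0, 3, 0, 0, 0, 2, 0, 1, 0, 0, 2],   # body
    [1, 0, 0, 2, 0, 0, 1, 0, 3, 2, 0],   # disk
    [0, 2, 1, 0, 0, 2, 0, 1, 0, 1, 0],   # petri
    [0, 0, 3, 0, 2, 2, 0, 3, 0, 2, 1],   # lab
    [0, 0, 0, 2, 3, 0, 1, 0, 0, 2, 0],   # sales
    [2, 0, 0, 1, 3, 0, 1, 0, 2, 1, 0],   # linux
    [0, 0, 0, 2, 3, 0, 2, 0, 4, 0, 0],   # debt
], dtype=float)

# Thin SVD: U is 8x8, s holds the 8 singular values, Vt is 8x11.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values (k = 2 is an assumed setting).
k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

np.set_printoptions(precision=2, suppress=True)
print(s)
print(A_k)
```

One consistency check that does not depend on column order: the squared singular values sum to the sum of squared counts, which is 165 for this matrix and agrees with the values listed on the D slide to within rounding.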
Clustering Methods: Agglomerative and Partitional
Many many methods…
General Methodology
Agglomerative Clustering
Measuring Similarity
Agglomerative Clustering
Average Link Clustering
      S1  S2  S3  S4
S1     -   3   4   2
S2     3   -   2   0
S3     4   2   -   1
S4     2   0   1   -
Merge sequence: S1 and S3 form S1S3; S2 joins to form S1S3S2; S4 joins last.
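The merge sequence can be traced with a few lines of code. This is a sketch under the assumption that the slide shows pairwise similarities (higher means more alike) and that the similarity between two clusters is the mean of all pairwise similarities across them; function and variable names are illustrative.

```python
from itertools import combinations

# Pairwise similarities between the four contexts.
pairwise = {
    ("S1", "S2"): 3, ("S1", "S3"): 4, ("S1", "S4"): 2,
    ("S2", "S3"): 2, ("S2", "S4"): 0, ("S3", "S4"): 1,
}
sim = {frozenset(pair): value for pair, value in pairwise.items()}


def average_link(c1, c2):
    """Mean similarity over all pairs with one member in each cluster."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(sim[frozenset(p)] for p in pairs) / len(pairs)


clusters = [("S1",), ("S2",), ("S3",), ("S4",)]
merges = []

# Repeatedly merge the two most similar clusters until one remains.
while len(clusters) > 1:
    a, b = max(combinations(clusters, 2), key=lambda pair: average_link(*pair))
    merges.append((set(a + b), average_link(a, b)))
    clusters = [c for c in clusters if c not in (a, b)] + [a + b]

for members, score in merges:
    print(sorted(members), score)
```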
Partitional Methods
Cluster Labeling
Results of Clustering
Label Types
Evaluation Techniques: Comparison to gold standard data
Evaluation
Baseline Algorithm
Baseline Performance
          S1   S2   S3   Totals
C1         0    0    0        0
C2         0    0    0        0
C3        80   35   55      170
Totals    80   35   55      170

          S3   S2   S1   Totals
C1         0    0    0        0
C2         0    0    0        0
C3        55   35   80      170
Totals    55   35   80      170
Evaluation
          S1   S2   S3   Totals
C1        10   30    5       45
C2        20    0   40       60
C3        50    5   10       65
Totals    80   35   55      170
Evaluation
          S2   S3   S1   Totals
C1        30    5   10       45
C2         0   40   20       60
C3         5   10   50       65
Totals    35   55   80      170
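The second table is the first one with its columns reordered so that the diagonal holds as many contexts as possible. A minimal sketch of that step follows, assuming a one-to-one mapping between clusters and senses and a brute-force search over all mappings (fine for three senses). The scoring as simple accuracy is an assumption, since the slide text describing the measure is not preserved.

```python
from itertools import permutations

senses = ["S1", "S2", "S3"]
clusters = ["C1", "C2", "C3"]

# confusion[c][s]: number of contexts in cluster c whose gold sense is s.
confusion = [
    [10, 30, 5],    # C1
    [20, 0, 40],    # C2
    [50, 5, 10],    # C3
]
total = sum(map(sum, confusion))


def correct(mapping):
    """Contexts on the diagonal when cluster i is mapped to sense mapping[i]."""
    return sum(confusion[c][s] for c, s in enumerate(mapping))


# Try every one-to-one assignment of clusters to senses.
best = max(permutations(range(len(senses))), key=correct)
accuracy = correct(best) / total

# Baseline: put every context in one cluster, labelled with the largest sense.
baseline = max(sum(col) for col in zip(*confusion)) / total

for c, s in enumerate(best):
    print(clusters[c], "->", senses[s])
print(f"accuracy {accuracy:.3f}, baseline {baseline:.3f}")
```

With these counts the best mapping puts 120 of the 170 contexts on the diagonal, against 80 of 170 for the single-cluster baseline.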
Analysis
Practical Session: Experiments with SenseClusters
Experimental Data
Creating Experimental Data
Name Conflation Data
Clustering Contexts
Name Discrimination
George Millers!
Headed Clustering
Headless Contexts
If after all these matrices you crave knowledge-based resources… read on…
WordNet-Similarity
Many thanks!
Vector measure
Many other measures
Thank you!

Eurolan 2005 Pedersen
