Learning with the Web: Spotting Named Entities on the intersection of NERD and Machine Learning
Marieke van Erp, Giuseppe Rizzo, Raphaël Troncy
@giusepperizzo
May 13, 2013, Making Sense of Microposts (#MSM2013)
NERD-ML @ MSM'13
Preprocessing
➢ Dataset is converted to CoNLL IOB format
➢ Applied 10-fold cross-validation
➢ Split the set of tweets into 50 KB chunks to comply with the NERD file-size limit
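The three preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the NERD-ML code: the function names are hypothetical, the tagging uses the variant that puts a B- prefix on every entity start (the slides do not say which IOB variant was used), the folds are a simple interleaved split, and 50 KB is taken to mean 50 * 1024 bytes of UTF-8 text.

```python
from typing import Iterable, Iterator, List, Tuple


def to_iob(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[Tuple[str, str]]:
    """Pair each token with an IOB tag, given (start, end_exclusive, class) token spans."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return list(zip(tokens, tags))


def k_folds(items: List, k: int = 10) -> Iterator[Tuple[List, List]]:
    """Yield (train, test) pairs; fold i tests on every k-th item starting at index i."""
    for i in range(k):
        test = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test


def chunk_by_size(tweets: Iterable[str], max_bytes: int = 50 * 1024) -> Iterator[List[str]]:
    """Group tweets into chunks whose UTF-8 size (one tweet per line) stays within max_bytes."""
    chunk: List[str] = []
    size = 0
    for tweet in tweets:
        n = len(tweet.encode("utf-8")) + 1  # +1 for the newline separator
        if chunk and size + n > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(tweet)
        size += n
    if chunk:
        yield chunk
```

Chunking on whole tweets rather than on raw byte offsets keeps every tweet intact, so entity offsets returned for a chunk can still be traced back to a single tweet.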
NERD extractors
➢ Retrieves named entities from 10 extractors (Web APIs)
➢ Harmonizes the classification according to the NERD Ontology v0.5 (http://nerd.eurecom.fr/ontology)
➢ 75 entity classes mapped to 4 MSM'13 classes
http://nerd.eurecom.fr
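The class harmonization can be pictured as a lookup table from ontology classes to the challenge classes. The sketch below is only an illustration: it lists five entries instead of the 75 mentioned on the slide, the class names and their targets are assumptions rather than the actual NERD-ML mapping, and the four target labels (PER, LOC, ORG, MISC) are inferred from the classes named elsewhere in the deck.

```python
# Target label set, inferred from the classes named in the slides.
MSM_CLASSES = {"PER", "LOC", "ORG", "MISC"}

# Illustrative subset only: these entries are assumptions, not the real mapping.
NERD_TO_MSM = {
    "nerd:Person": "PER",
    "nerd:Location": "LOC",
    "nerd:Organization": "ORG",
    "nerd:Product": "MISC",
    "nerd:Event": "MISC",
}


def to_msm_class(nerd_type, default=None):
    """Map an ontology class to a challenge class; unmapped types fall back to `default`."""
    return NERD_TO_MSM.get(nerd_type, default)
```

Returning `None` for unmapped types is one possible design choice: it lets the caller decide whether such a mention is dropped or treated as outside any entity.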
Ritter et al. (2011)
➢ Off-the-shelf tool tailored to a Twitter stream, based on:
– LabeledLDA (+CRF)
– Textual features (POS, capitalization, suffix, etc.)
– Freebase gazetteers (names of PER, ORG, LOC)
➢ 10 entity classes mapped to 4 classes
Ritter, A., Clark, S., Mausam, Etzioni, O.: Named Entity Recognition in Tweets: An Experimental Study. In: Empirical Methods in Natural Language Processing (EMNLP'11) (2011)
Stanford CRF
➢ Re-trained on the MSM'13 corpora
➢ Parameters based on the english.conll.4class.distsim.crf.ser.gz properties file provided with the Stanford distribution
➢ Baseline of our approach
Jenny Rose Finkel, Trond Grenager, and Christopher Manning. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. In: 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05) (2005)
Textual features
➢ POS
➢ Capitalisation information
– initial capital
– all capitalized
– proportion of token capitals
➢ Prefix (first three letters of the token)
➢ Suffix (last three letters of the token)
➢ Whether the token is at the beginning or at the end of the micropost
Ritter, A., Clark, S., Mausam, Etzioni, O.: Named Entity Recognition in Tweets: An Experimental Study. In: Empirical Methods in Natural Language Processing (EMNLP'11) (2011)
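The textual features listed above could be computed per token along these lines. This is a hedged sketch: the function and key names are hypothetical, POS tags are assumed to come from an external tagger and are passed in, and "proportion of token capitals" is interpreted here as the share of uppercase characters in the token.

```python
from typing import Dict, List


def token_features(tokens: List[str], pos_tags: List[str], i: int) -> Dict[str, object]:
    """Build the textual feature map for the token at position i of a micropost."""
    tok = tokens[i]
    letters = [c for c in tok if c.isalpha()]
    return {
        "pos": pos_tags[i],                                   # supplied by an external tagger
        "initial_capital": tok[:1].isupper(),
        "all_capitalized": bool(letters) and all(c.isupper() for c in letters),
        "capital_proportion": sum(c.isupper() for c in tok) / len(tok) if tok else 0.0,
        "prefix": tok[:3],                                    # first three characters
        "suffix": tok[-3:],                                   # last three characters
        "is_first": i == 0,                                   # start of the micropost
        "is_last": i == len(tokens) - 1,                      # end of the micropost
    }
```

For example, `token_features(["NASA", "launches", "Orion"], ["NNP", "VBZ", "NNP"], 0)` marks the first token as initial-capital, all-capitalized, and at the start of the micropost.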
ML settings
Run01: 7 textual features (POS, initial capital, proportion of capitals, prefix, suffix, end/start token); 0 extractors; ML = k-NN, k = 1, Euclidean distance
Run02: 0 textual features; 12 extractors (AlchemyAPI, DBpedia Spotlight, Extractiv, Lupedia, OpenCalais, Saplo, Yahoo, TextRazor, Wikimeta, Zemanta, Stanford NER, Ritter et al.); ML = SVM, polynomial kernel, SMO
Run03: 4 textual features (POS, initial capital, suffix, proportion of capitals); 8 extractors (AlchemyAPI, DBpedia Spotlight, Extractiv, OpenCalais, TextRazor, Wikimeta, Stanford NER, Ritter et al.); ML = SVM, polynomial kernel, SMO
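To make the Run01 setting concrete, here is a minimal pure-Python sketch of a k-NN classifier with k = 1 and Euclidean distance. It is not the toolkit used by the authors, the function name is hypothetical, and it assumes the features have already been encoded as numeric vectors (categorical features such as POS or prefix would first need an encoding such as one-hot).

```python
import math
from typing import List, Sequence, Tuple


def predict_1nn(train: List[Tuple[Sequence[float], str]], x: Sequence[float]) -> str:
    """Return the label of the training vector closest to x under Euclidean distance."""
    nearest = min(train, key=lambda example: math.dist(example[0], x))
    return nearest[1]


# Toy vectors: (initial_capital, capital_proportion); the labels are IOB tags.
train = [((1.0, 0.2), "B-PER"), ((0.0, 0.0), "O")]
label = predict_1nn(train, (1.0, 0.25))
```

With k = 1 there is no voting step, so the prediction is simply the label of the single nearest training example.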
Precision: MSM'13 training, 10-fold cross-validation
Recall: MSM'13 training, 10-fold cross-validation
F1: MSM'13 training, 10-fold cross-validation
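For reference, the three measures reported on these slides can be computed from sets of gold and predicted entities as sketched below. The slides do not state the matching criterion, so exact match on position and class is an assumption here, as are the function name and the entity tuple layout.

```python
from typing import Set, Tuple

# (tweet id, start token, end token, class); this layout is an assumption.
Entity = Tuple[int, int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 using exact matches between the two sets."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

In a 10-fold setting, this function would be called once per fold and the per-fold scores averaged.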
Lessons learned
➢ MISC class is ambiguously defined
➢ 8.1% of the named entities from the training data occur in the test data
➢ Best run is Run03: a subset of the extractors plus some textual features
➢ For the next challenge, what about entity linking?
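An overlap figure like the one in the second bullet can be obtained with a simple set computation. The sketch below assumes the overlap is measured on distinct entity surface forms and normalized by the number of distinct training entities; the slides do not spell out the exact definition, and the function name is hypothetical.

```python
from typing import Iterable


def train_test_overlap(train_entities: Iterable[str], test_entities: Iterable[str]) -> float:
    """Fraction of distinct training entities that also appear in the test data."""
    train, test = set(train_entities), set(test_entities)
    return len(train & test) / len(train) if train else 0.0
```

A low overlap means a model cannot rely on memorizing entity names seen during training.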
Thanks for your time and attention
http://www.slideshare.net/giusepperizzo
NERD-ML
http://github.com/giusepperizzo/nerdml
