Perfect Text Analytics
Seth Redmore, VP, Product Management
Perfect: per·fect [adj., n. pur-fikt; v. per-fekt] 1. conforming absolutely to the description or definition of an ideal type: a perfect sphere; a perfect gentleman. 2. excellent or complete beyond practical or theoretical improvement: There is no perfect legal code. The proportions of this temple are almost perfect. All rights reserved © 2010 Lexalytics Inc.
Text Analytics: “The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources.” (Wikipedia) In other words, enhancing the value of text content by extracting entities, features, context, relationships and emotion.
Perfect is Fast: Average human reading speed: 250 wpm. Conservative computer reading speed: 6000 wpm/core (our speed on a moderate single core). Each core is equivalent to the reading bandwidth of 12 people. Modern machines have 8 cores. That’s just about 100 people in a box. Nice.
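The comparison on this slide is a simple ratio of reading speeds. The sketch below works it through with the two speeds quoted on the slide; the function and constant names are illustrative only. Taken at face value, the two speeds give 24 reader-equivalents per core, so the slide's figure of 12 per core (and roughly 100 per 8-core box) is half of that and may include a safety margin that the slide does not spell out.

```python
# Figures quoted on the slide.
HUMAN_WPM = 250               # average human reading speed
MACHINE_WPM_PER_CORE = 6000   # "conservative" machine speed on one core
CORES_PER_MACHINE = 8


def reader_equivalents(machine_wpm: float, human_wpm: float, cores: int = 1) -> float:
    """Number of human readers whose combined speed matches `cores` cores."""
    return machine_wpm * cores / human_wpm


per_core = reader_equivalents(MACHINE_WPM_PER_CORE, HUMAN_WPM)
per_machine = reader_equivalents(MACHINE_WPM_PER_CORE, HUMAN_WPM, CORES_PER_MACHINE)
print(f"{per_core:.0f} readers per core, {per_machine:.0f} per machine")
```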
Perfect is Useable: “I don’t like the results” is not the same as “the results are incorrect.” Understanding the behavior is key to usefulness. Can you make better decisions? Can you make more money or save money? What is the most controversial area of text analytics? Thomson Reuters: trading with sentiment analysis increased alpha (profit over market) by 80 basis points.
Useable: How much can you differ? “In my shop, that up until now has relied exclusively on human coding, we consider anything below 90% to be unacceptably inaccurate…. There is no doubt that automated sentiment is getting much much better, but to suggest that people should be okay with 20% of their data being wrong is just absurd.” (Katie Delahaye Paine) Why is 10% “wrong” so much less absurd than 20% “wrong”? [Figure labels: 20% Error, 10% Error]
Perfect is Consistent: Same results for same content, every time. University of Pittsburgh “Multi-Perspective Question Answering” Corpus: 535 documents, 11k+ sentences; 40 hours of training for each rater; ~80% inter-rater agreement.
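To make "inter-rater agreement" concrete, here is a minimal sketch of two common ways to measure it: raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The slide does not say which measure lies behind the ~80% figure, and the two label lists below are invented for illustration.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of items on which two raters chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must label the same non-empty set of items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if expected == 1.0:  # both raters used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical sentence-level sentiment labels from two human raters.
rater_a = ["pos", "pos", "neg", "neu", "neg", "pos", "neu", "neg", "pos", "neu"]
rater_b = ["pos", "neg", "neg", "neu", "neg", "pos", "neu", "neg", "pos", "pos"]

print(f"agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"kappa:     {cohens_kappa(rater_a, rater_b):.2f}")
```

A deterministic system scored against itself on the same content always reaches an agreement of 1.0, which is the sense of "consistent" the slide uses.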
Perfect is (new) Knowledge: Discover the stuff you don’t know. Text analytics is really, really great at telling you the who, the what, and the where. Sometimes the “how.” You have to supply the “why,” but that question is way easier to answer when you know the other “w’s and the h.”
Perfect Includes Everything: Running our top-of-the-line software flat out across one year will cost you about $0.002 per document analyzed (news-article-sized content, assuming 3 docs/core-second on an 8-core machine). The more data, the better, and the greater the worth of your text analytics.
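The per-document figure rests on a throughput assumption that is easy to work through. The sketch below uses only the numbers quoted on the slide; the annual total it prints is derived by multiplying those numbers together and is not stated in the source, so treat it as an implication of taking both figures literally.

```python
# Figures quoted on the slide.
DOCS_PER_CORE_SECOND = 3
CORES = 8
COST_PER_DOC_USD = 0.002

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # running "flat out" for a non-leap year

# Documents one 8-core machine could process in a year at the quoted rate.
docs_per_year = DOCS_PER_CORE_SECOND * CORES * SECONDS_PER_YEAR

# Derived, not from the source: total spend implied by the per-document cost.
implied_annual_cost = docs_per_year * COST_PER_DOC_USD

print(f"{docs_per_year:,} documents per year")
print(f"${implied_annual_cost:,.0f} implied annual cost")
```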
Perfect is Trainable: Can you solve YOUR business problem with it? Can you optimize to suit different kinds of content and roll those results up into a single reporting system?
Perfect Text Analytics: Fast; Useable; Consistent; Knowledge (that is) Inclusive; Trainable
Customer Snapshots (or, “rubber, meet road”)
Reputation Management
Politics
Market Intelligence [architecture diagram; labels: Client Employee; User Authentication; Single Sign-on; External Content Providers; SinglePoint; Client Company; User Authentication; Web 2.0 Collaboration; Search Results; Secondary Research Suppliers; User Authentication; MI Analyst; Text Analytics; Integrated Index; News & Journals; NL Search Engine; FIREWALL; Internal Document Repository; Optional Document Repository; Financial analyst reports; Internal research; Content Processing; Custom Web Crawls & Gov. Databases; Trash can crawl, FTP or CD]
Hospitality
Financial Services: Turns news into numbers for automatic trading systems. Resilient server product.
Algorithmic Trading (QED firm) [diagram labels: Financial data; Indicators; Buy/Sell; RNSE Server; Indicators]
QED (Quantitative and Event-Driven Trading): Banks, hedge funds. JPMorgan, SocGen, Alpha Equities… and others.
Pharma
The Next Year…
Opinion Mining: Who said what about whom?
Sarcasm, Twitter: Model trained to detect sarcasm. Once detected, you can decide what to do with it, because actually determining the sentiment is going to be unreliable. New model trained on Twitter content. Moving towards a concept of text analytics driven by business logic.
Thesaurus-based Theme Rollup: Machine-generated conceptual taxonomy. Gas/Electric Hybrid and EV might roll up to EV. Fewer themes, but very useful to detect patterns across content.
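The rollup idea can be sketched as a lookup from specific themes to broader parent concepts, followed by a count. This is only an illustration: the slide describes a machine-generated taxonomy, whereas the small `PARENT` table here is hand-written and hypothetical (only the hybrid-to-EV pairing comes from the slide), and the function names are made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical thesaurus: specific theme -> broader parent concept.
# Only the "gas/electric hybrid" -> "ev" pairing is taken from the slide.
PARENT: Dict[str, str] = {
    "gas/electric hybrid": "ev",
    "plug-in hybrid": "ev",
}


def roll_up(theme: str, parent: Dict[str, str] = PARENT) -> str:
    """Follow parent links until a top-level concept is reached."""
    current = theme.lower()
    seen = set()
    while current in parent and current not in seen:  # guard against cycles
        seen.add(current)
        current = parent[current]
    return current


def rolled_up_counts(themes: Iterable[str]) -> Counter:
    """Count themes after rolling each one up to its top-level concept."""
    return Counter(roll_up(t) for t in themes)


themes = ["Gas/Electric Hybrid", "EV", "EV", "Fuel Economy", "gas/electric hybrid"]
counts = rolled_up_counts(themes)
print(counts.most_common())
```

Five raw themes collapse to two rolled-up ones, which is the "fewer themes" trade-off the slide mentions.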
Foreign Language Support: French is first, followed by other Romance languages. New stemmer; new summarization algorithm; new part-of-speech tagger; automatic language detection; new sentiment/entity extraction algorithms (also applicable to vertical-specific content). Confidence scoring by algorithm. Use business logic to meld the results.
Trainable Entity Sentiment: New technique for entity sentiment. Initial results from testing in English extremely promising. Average human scoring overlap of >> 90% for scored sentences. Initially used only for French.
Tool Enhancements: Eventually use on English content: Twitter, customer satisfaction, others… Entity Management Toolkit (EMT): part-of-speech tagset training; using to train Salience on French. Sentiment Toolkit: build your own entity sentiment models; French (first). New Sentiment Toolkit + Maximum Entropy model builder allows new Entity and Sentiment modules. New EMT helps us build a new French PoS tagger. [Diagram labels: Doc; POS Tagger; Fully Tagged Document; Entity Extraction & Sentiment Models; Themes & Summaries]

Perfect Text Analytics Provides Fast, Useable, Consistent and Inclusive Insights

Business Logic + TA Algorithms [diagram labels: Content Source; Search; Business Logic; Other TA System; Sarcasm; Route On: Sports, Finance, Unknown; $; ?; A; B; C; D; Entity: Cisco; Probability Scores; Cisco: Positive]
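One possible reading of this diagram, combined with the earlier points about sarcasm detection and per-algorithm confidence scoring, is sketched below: content is set aside if flagged as sarcastic, otherwise routed by category to one or more algorithms whose scored results are then melded. Everything here is hypothetical, including the `Score` type, the stub algorithms A to D, the route table, and the confidence threshold; none of it is taken from the actual Lexalytics software.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Score:
    entity: str
    label: str         # e.g. "positive", "neutral", "negative"
    confidence: float  # probability-like score reported by the algorithm
    algorithm: str


Algorithm = Callable[[str, str], Score]


def stub(name: str, label: str, confidence: float) -> Algorithm:
    """Build a placeholder algorithm that always returns a fixed score."""
    def run(text: str, entity: str) -> Score:
        return Score(entity, label, confidence, name)
    return run


def meld(scores: List[Score], min_confidence: float = 0.6) -> Optional[Score]:
    """Business rule (assumed): keep the most confident result above a threshold."""
    usable = [s for s in scores if s.confidence >= min_confidence]
    return max(usable, key=lambda s: s.confidence) if usable else None


def analyze(text: str, entity: str, category: str, sarcastic: bool,
            routes: Dict[str, List[Algorithm]],
            fallback: List[Algorithm]) -> Optional[Score]:
    """Route content by category; sarcastic content gets no sentiment score."""
    if sarcastic:
        return None  # sentiment on sarcastic text is treated as unreliable
    algorithms = routes.get(category, fallback)
    return meld([run(text, entity) for run in algorithms])


ROUTES = {
    "finance": [stub("A", "positive", 0.9), stub("B", "neutral", 0.7)],
    "sports": [stub("C", "neutral", 0.8)],
}
FALLBACK = [stub("D", "neutral", 0.5)]  # used for unknown categories

result = analyze("Cisco beat expectations.", "Cisco", "finance", False, ROUTES, FALLBACK)
print(result)
```

Keeping the routing table and the melding rule outside the algorithms is the design point: the business logic can change without retraining any model.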
Summary: Lots of people making money with text analytics, in lots of different verticals. Next 12 months brings online a whole host of features to make our software even more flexible. Check out tas.lexalytics.com. Check out www.lexalytics.com/lexascope.