Hadoop Based Intelligent Text Processing System
October 12, 2010
Hadoop World, NYC
Page 2
Who are we?
•Vaijanath N. Rao
•AOL
•Contact: vaijanath.rao@teamaol.com
•Rohini Uppuluri
•AOL
•Contact: rohini.uppuluri@teamaol.com
Page 3
Agenda
1. Introduction
2. Problem Statement
3. Our Intelligent Text Processing System
   1. Overview
   2. Detailed
   3. Application(s)
4. Q and A
Page 4
Introduction
Page 5
Introduction (Continued…)
• Information Extraction - Extracting information from text
• Part-of-Speech Analysis
Ex: BlackBeauty<noun> is<verb> a<det> pretty<adjective> horse<noun>
• Named Entity Extraction
Ex: The CEO <Person>Mr. A</Person> of <Location>New York</Location> based Firm
<Organization>Foo.Inc</Organization> announced its new Product
<date>today</date>
• Sentiment Analysis
Ex: Watch this film. AVATAR is an achievement in many technical departments. It is a
beautiful experience
• Sentence Detection
Ex: <Start Sentence>BlackBeauty is a pretty horse <End Sentence>
• Some Tools: OpenNLP[5], LingPipe[6], GATE[7], NLTK[8], etc.
• Categorization/Classification - Categorize items into one of the predefined
classes
Ex: An article about a baseball match is a “Sports” article.
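The categorization example above can be illustrated with a toy sketch. It is not the method used in the talk (which, per the next slide, relies mostly on machine learning); it swaps in a plain keyword-overlap score. The category names, keyword sets, and the `categorize` function are hypothetical and chosen only for illustration.

```python
# Toy categorizer: keyword overlap stands in for a trained model.
# CATEGORY_KEYWORDS and categorize() are illustrative assumptions.
CATEGORY_KEYWORDS = {
    "Sports": {"baseball", "match", "inning", "team", "score"},
    "Finance": {"stock", "market", "shares", "earnings"},
}


def categorize(text, keywords=CATEGORY_KEYWORDS):
    """Return the category whose keywords overlap the text most, or None."""
    # Lowercase each token and strip common surrounding punctuation.
    tokens = {tok.strip(".,!?\"'").lower() for tok in text.split()}
    scores = {label: len(tokens & words) for label, words in keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(categorize("The baseball match went to the ninth inning."))
```

A real system would replace the keyword sets with a classifier trained on labeled articles; the interface (text in, one predefined class out) stays the same.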
Page 6
Introduction (Continued…)
• Challenges
• Processing large amounts of data
• Most approaches use machine learning methods
• Models need to be trained on large amounts of data
• Need a way to perform the computations in a scalable manner
• Domain Dependency
Page 7
Problem Statement
• What do we want to do?
• Build Large Scale applications (processing text)
• Why is this useful?
• Analyze the large volume of content available at AOL
• Applications: User Interest Mining, Ad Targeting, Personalization, etc.
• We need
• A Large Scale NLP System
• A pipeline-style architecture in which users can plug components in or
out
• Abstraction or Transparency of the algorithms used, as requested by the user
Page 8
Our Intelligent Text Processing System
• Overview
• Pipelined Architecture
• Pluggable components
• Work Flow Manager
• Recovery Manager
• Job Manager
• Applications
• Large Scale Applications built on a scalable way of applying NLP Models
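The "pluggable components" idea in this overview can be sketched as follows. This is a minimal illustration under stated assumptions, not the system presented in the talk: the `Pipeline` class, its `plug_in`/`plug_out` methods, and the two stand-in stages are all hypothetical names, and the stages are deliberately naive.

```python
# Minimal pluggable pipeline sketch; every name here is an assumption.
class Pipeline:
    def __init__(self):
        self._stages = []  # ordered (name, callable) pairs

    def plug_in(self, name, stage):
        """Append a stage; each stage maps a document dict to a document dict."""
        self._stages.append((name, stage))
        return self

    def plug_out(self, name):
        """Remove every stage registered under the given name."""
        self._stages = [(n, s) for n, s in self._stages if n != name]
        return self

    def run(self, text):
        doc = {"text": text}
        for _, stage in self._stages:
            doc = stage(doc)
        return doc


def split_sentences(doc):
    # Naive stand-in for a real sentence detector: split on periods.
    parts = [s.strip() for s in doc["text"].split(".") if s.strip()]
    return {**doc, "sentences": parts}


def count_tokens(doc):
    # Naive stand-in for a tokenizer: whitespace-separated tokens.
    return {**doc, "token_count": len(doc["text"].split())}


pipeline = Pipeline().plug_in("sentences", split_sentences).plug_in("tokens", count_tokens)
result = pipeline.run("BlackBeauty is a pretty horse. Watch this film.")
print(result["sentences"], result["token_count"])
```

Because each stage only reads and extends a shared document dict, a stage can be removed with `plug_out` without touching the others, provided no later stage depends on its output.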
Page 9
Overview
Page 10
Job Manager
•Creates a series of parallel and sequentially dependent jobs (takes a configuration
file)
•Example:
Jobs A, B, C, D, E and F
Job B depends on Job A; Job E depends on Job D
•The Job Manager creates the following tree
•Jobs A, D and F are executed in parallel
•Jobs B and E are executed in parallel once their parent jobs
complete.
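The scheduling described above can be sketched as grouping jobs into "waves", where every job in a wave has all of its parents finished and can therefore run in parallel. This is only a sketch of the idea, not the Job Manager's actual code; `execution_waves` is a hypothetical name. Job C is left out because the slide text does not say what it depends on.

```python
# Sketch: group jobs into waves that can each run in parallel.
# execution_waves() is an illustrative name, not from the talk.
def execution_waves(jobs, depends_on):
    """Return a list of waves; each wave lists jobs whose parents are all done."""
    remaining = {job: set(depends_on.get(job, ())) for job in jobs}
    waves = []
    while remaining:
        # A job is ready once it has no unfinished parents left.
        ready = sorted(job for job, parents in remaining.items() if not parents)
        if not ready:
            raise ValueError("unresolvable dependencies: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        for job in ready:
            del remaining[job]
        for parents in remaining.values():
            parents.difference_update(ready)
    return waves


# Dependencies stated on the slide: B depends on A, E depends on D.
waves = execution_waves(["A", "B", "D", "E", "F"], {"B": ["A"], "E": ["D"]})
print(waves)  # [['A', 'D', 'F'], ['B', 'E']]
```

The first wave matches the slide (A, D and F in parallel). Note that a wave-based schedule makes B wait for the whole first wave; a scheduler that starts B as soon as A alone completes would be a refinement of the same dependency data.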
Page 11
Recovery Manager
•Each job writes its configuration, start time, and end time
(if completed) to the status file
•The Recovery Manager periodically checks the status file for updates to see
whether any job has failed; if so, it restarts the job by calling the Job
Manager with the required configuration
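One pass of such a recovery check might look like the sketch below. The slides do not specify the status file format or how a failure is detected, so two assumptions are made here and should be read as such: status records are dicts with `name`, `config`, `start_time` and `end_time` keys, and a job counts as failed when it has no end time after a timeout. The `restart` callback stands in for the call to the Job Manager.

```python
import time


# Assumption: a job with no end_time after timeout_seconds is treated as failed.
def find_stalled_jobs(status_records, now, timeout_seconds):
    """Return records that started more than timeout_seconds ago and never ended."""
    return [
        rec for rec in status_records
        if rec.get("end_time") is None and now - rec["start_time"] > timeout_seconds
    ]


def recover(status_records, restart, now=None, timeout_seconds=3600):
    """Run one recovery pass; restart(config) is a stand-in for the Job Manager call."""
    now = time.time() if now is None else now
    restarted = []
    for rec in find_stalled_jobs(status_records, now, timeout_seconds):
        restart(rec["config"])
        restarted.append(rec["name"])
    return restarted


# Hypothetical status records (times in seconds, chosen for illustration).
records = [
    {"name": "postagger", "config": {"jar": "postagger.jar"}, "start_time": 0, "end_time": 120},
    {"name": "nounphrase", "config": {"jar": "chunker.jar"}, "start_time": 130, "end_time": None},
]
resubmitted = []
restarted = recover(records, restart=resubmitted.append, now=5000, timeout_seconds=3600)
print(restarted)  # ['nounphrase']
```

Calling `recover` on a timer gives the periodic check described above. A production version would also need to distinguish a slow job from a dead one, which a timeout alone cannot do.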
Page 12
Sample Configuration
<job name="keyphrase">
  <mapreduce depends="none" name="postagger">
    <inputargs>input arguments as string</inputargs>
    <output>$hdfsoutputLocation</output>
    <jar>postagger.jar</jar>
    <mainClass>com.aol.datalayer.nlp.postagger</mainClass>
  </mapreduce>
  <mapreduce depends="postagger" name="nounphrase">
    <inputargs>input arguments as string</inputargs>
    <output>$hdfsoutputlocation</output>
    <jar>chunker.jar</jar>
    <mainClass>com.aol.datalayer.nlp.chunker</mainClass>
  </mapreduce>
</job>
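A configuration of this shape could be read into a dependency map roughly as follows. This is a sketch using Python's standard `xml.etree.ElementTree`, not the parser used in the talk. It assumes that `depends="none"` means "no parent" and that `depends` names at most one parent, since the sample shows nothing else; `read_steps` is a hypothetical name, and the embedded sample is abbreviated to the fields the sketch reads.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample configuration (inputargs/output omitted).
SAMPLE = """
<job name="keyphrase">
  <mapreduce depends="none" name="postagger">
    <jar>postagger.jar</jar>
    <mainClass>com.aol.datalayer.nlp.postagger</mainClass>
  </mapreduce>
  <mapreduce depends="postagger" name="nounphrase">
    <jar>chunker.jar</jar>
    <mainClass>com.aol.datalayer.nlp.chunker</mainClass>
  </mapreduce>
</job>
"""


def read_steps(xml_text):
    """Return (job name, {step name: {depends, jar, main_class}})."""
    root = ET.fromstring(xml_text.strip())
    steps = {}
    for step in root.findall("mapreduce"):
        parent = step.get("depends")
        steps[step.get("name")] = {
            # Assumption: the literal value "none" marks a step with no parent.
            "depends": None if parent in (None, "none") else parent,
            "jar": step.findtext("jar"),
            "main_class": step.findtext("mainClass"),
        }
    return root.get("name"), steps


job_name, steps = read_steps(SAMPLE)
print(job_name, {name: info["depends"] for name, info in steps.items()})
```

The resulting map says that `nounphrase` must wait for `postagger`, which is the information a job scheduler needs to order the two MapReduce steps.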
Page 13
Overview
Page 14
NLP Modeling Engine
Page 15
Detailed
Page 16
Applications
Page 17
Application 1 - Location Aware Contextual Advertising - Example
Page 18
Location Aware Contextual Advertising - Overview
Page 19
Application 2 - User Aware Ad Targeting - Example
This is an illustrative example and does not represent any real user
Page 20
User Aware Ad Targeting
Page 21
Conclusions
• Pipelined Architecture
• NLP System
• Large Scale Applications
• Location Aware Contextual Ad Targeting
• User Aware Ad Targeting
Page 22
Future Work
• Developing distributed algorithms for
• POS Tagger
• Sentiment Analyzer models
• Exploring whether integrating with an open source
distributed ML/TM framework would be useful
Page 23
References
1. Part-of-Speech Tagging: en.wikipedia.org/wiki/Part-of-speech_tagging
2. Coreference Resolution: en.wikipedia.org/wiki/Coreference
3. Named Entity Recognition: en.wikipedia.org/wiki/Named_entity_recognition
4. Sentiment Analysis: en.wikipedia.org/wiki/Sentiment_analysis
5. OpenNLP: http://opennlp.sourceforge.net/
6. LingPipe: http://alias-i.com/lingpipe/
7. GATE: http://gate.ac.uk/ie/
8. NLTK: www.nltk.org
Page 24
Q & A
Thank You 
