Using Graph Theory to understand Intent & Concepts – January 2013	
  



                               tumra.com	
  
UNDERSTANDING INTENT & CONCEPTS	
  
•  Use case:
    -  Enhancing Social TV user experience
    -  Matching users to content that interests them

•  Topics we’ll cover:
    -  Natural Language Processing
    -  Graph Theory
    -  Machine Learning


  
USE CASE ENHANCED SOCIAL TV	
  
•  Objectives:
    -  Increase engagement with content
    -  Enhance multi-channel user experience

•  We built a prototype solution:
    -  Mines unstructured data in real-time
    -  Understands:
      -  What interests individual users
      -  Entities & Concepts (People, Places, Events)


  
THE CHALLENGE	
  


  
 Help users to “follow the story” regardless of the
 news outlet, integrated web / second-screen	
  




  
                                             Photo Credit: byrion on Flickr (cc)
THE PROBLEM	
  


Unstructured Data -> Magic?!?! -> Awesomeness!

  
THE PROBLEM	
  
•  Little useful data to work with…
    -  Streams of continuous live TV
    -  Have to create metadata

•  Where did we start?
    -  Ingest several live news channels
    -  Extract whatever data was available:
      -  In-video text using OCR
      -  Subtitles / Closed Captions


  
STEP 1 NAMED ENTITY RECOGNITION	
  


We used a simple N-Gram model for exact matches;
    then Apache Lucene for everything else…	
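
The exact-match step could be sketched roughly as follows. This is a minimal illustration, not the production code: the `KB` dictionary, its entries and the function names are assumptions made up for the example, and the Lucene fallback for inexact matches is not shown.

```python
# Hypothetical knowledge base: lowercase surface form -> entity type.
KB = {
    "david cameron": "Person",
    "angela merkel": "Person",
    "german chancellor": "Concept",
    "debt": "Concept",
    "eurozone": "Place",
}


def ngrams(tokens, max_n):
    """Yield (start, n, phrase) for every n-gram, longest n first."""
    for n in range(max_n, 0, -1):
        for start in range(len(tokens) - n + 1):
            yield start, n, " ".join(tokens[start:start + n])


def exact_matches(text, kb=KB, max_n=3):
    """Return (phrase, type) pairs found in text, preferring longer matches."""
    tokens = [t.strip(".,;:!?\"'“”").lower() for t in text.split()]
    used = set()   # token positions already claimed by a longer match
    found = []
    for start, n, phrase in ngrams(tokens, max_n):
        span = range(start, start + n)
        if phrase in kb and not used.intersection(span):
            used.update(span)
            found.append((start, phrase, kb[phrase]))
    # Report matches in the order they appear in the text.
    return [(phrase, kind) for _, phrase, kind in sorted(found)]


if __name__ == "__main__":
    sentence = ("David Cameron and the German Chancellor Angela Merkel meet "
                "to discuss the debt crisis and signal their approval for "
                "greater eurozone integration.")
    for phrase, kind in exact_matches(sentence):
        print(phrase, "->", kind)
```

Trying the longest n-grams first keeps “German Chancellor” from being split into shorter, less specific matches.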
  




  
EXAMPLE N.E.R.	
  

“David Cameron and the German Chancellor Angela Merkel meet to discuss the debt crisis and signal their approval for greater eurozone integration.”

  
  
INITIAL SOLUTION	
  

[Diagram: Unstructured Data -> NER, NoSQL -> Awesomeness!]

  
OH NO!!!
 *facepalm*	
  




     Photo Credit: cesarastudillo on Flickr (cc)
DISAMBIGUATION	
  
•  Which “David Cameron”?
    -  We have many in our Knowledgebase
    -  Sportsmen, actors, painters & characters…

•  Our initial simplistic approach was naïve
    -  Works great with unambiguous matches
    -  At best, it returns the top-scoring entity

•  We needed a smarter approach
  
RECAP	
  
•  We have an effectively ‘flat’ KB of Entities
    -    “David Cameron” -> Politician (Person)
    -    “Angela Merkel” -> Politician (Person)
    -    “German Chancellor” -> Political office (Concept)
    -    “Debt” -> Economic concept (Concept)
    -    “Eurozone” -> Economic area (Place)


•  We needed a way to find relationships
   between Entities

  
THE BIG IDEA	
  




Graphs allow us to store relationships between entities, and
graph algorithms allow us to interrogate those connections…	
  
GRAPH DATABASES	
  
Neo4j, GraphLab, Apache Giraph, GoldenOrb

… of course there are many more open-source & proprietary ones

  
SO, WHICH ONE?	
  


                       ???
… it had to be fast, scalable and under active development

  
STEP 2 BUILDING RELATIONSHIPS	
  

We had 250 million Nodes, and 4 billion Edges…
great initial results but horrendously inefficient!

  Example: “David Cameron” & “Angela Merkel”	
  



  
INITIAL IMPROVEMENTS	
  
•  We didn’t need everything… just:
    -    People: “David Cameron”, “Angela Merkel”
    -    Places: “London”, “Downing Street”, “Eurozone”
    -    Concepts: “Debt”, “President”, “Eurozone”
    -    Things: Companies, Products etc.


•  Pruned the graph using Map/Reduce

•  This reduced the number of Entities…
    -  … but we still had billions of connections
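
As a rough illustration of the pruning idea, the sketch below keeps only nodes of the wanted types and drops every edge that touches a removed node. It is a single-process stand-in for the Map/Reduce job mentioned above; the type labels, the `prune` function and the sample data are assumptions invented for the example.

```python
# Hypothetical type labels for the entity classes listed on the slide.
KEEP_TYPES = {"Person", "Place", "Concept", "Thing"}


def prune(node_types, edges, keep=KEEP_TYPES):
    """Return (kept_nodes, kept_edges) restricted to the wanted types.

    node_types: dict mapping node name -> type label
    edges:      iterable of (node_a, node_b) pairs
    """
    kept_nodes = {name for name, kind in node_types.items() if kind in keep}
    # An edge survives only if both of its endpoints survive.
    kept_edges = [(a, b) for a, b in edges
                  if a in kept_nodes and b in kept_nodes]
    return kept_nodes, kept_edges


if __name__ == "__main__":
    node_types = {
        "David Cameron": "Person",
        "London": "Place",
        "Debt": "Concept",
        "Some category page": "Category",   # made-up node that gets pruned
    }
    edges = [
        ("David Cameron", "London"),
        ("David Cameron", "Some category page"),
        ("Debt", "Some category page"),
    ]
    nodes, remaining = prune(node_types, edges)
    print(sorted(nodes))
    print(remaining)
```

Filtering by type shrinks the node set quickly, but as the slide notes, the surviving nodes can still be very densely connected.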
  
EXAMPLE PEOPLE, PLACES, CONCEPTS	
  

“David Cameron and the German Chancellor Angela Merkel meet to discuss the debt crisis and signal their approval for greater eurozone integration.”

  
[Same sentence, with each matched entity labelled as People, Places or Concepts]

  
DISAMBIGUATION	
  
[Diagram: Angela Merkel and four candidates, David Cameron (painter), David Cameron (footballer), David Cameron (actor) and David Cameron (politician), connected through shared nodes such as Living Person, Politician and Head of State]




Possibilities: shortest path, number of common connections etc.	
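
Both possibilities can be tried on a toy graph. The sketch below is only an illustration: the node names come from the slide, but the edges are invented for the example (they are not factual claims or the real knowledge base), and the function names are hypothetical.

```python
from collections import deque

# Invented edges for illustration only; node names follow the slide.
EDGES = [
    ("Angela Merkel", "Living Person"),
    ("Angela Merkel", "Politician"),
    ("Angela Merkel", "Head of State"),
    ("David Cameron (politician)", "Living Person"),
    ("David Cameron (politician)", "Politician"),
    ("David Cameron (politician)", "Head of State"),
    ("David Cameron (actor)", "Living Person"),
    ("David Cameron (footballer)", "Living Person"),
    ("David Cameron (painter)", "Painter"),
]


def build(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def common_connections(graph, a, b):
    """Nodes that are direct neighbours of both a and b."""
    return graph.get(a, set()) & graph.get(b, set())


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns hop count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    graph = build(EDGES)
    for name in sorted(n for n in graph if n.startswith("David Cameron")):
        shared = common_connections(graph, name, "Angela Merkel")
        hops = shortest_path_length(graph, name, "Angela Merkel")
        print(name, len(shared), hops)
```

On this toy data, shortest path alone cannot separate the politician from the actor (both are two hops away), while the number of common connections can.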
  
STEP 3 SIMPLIFYING THE GRAPH	
  

Sure, all that extra metadata was tasty, but we didn’t
           need it all to solve the use case…

   So we used Map/Reduce to count the common
                  connections	
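
The counting step can be imitated in memory with a map phase and a reduce phase. This is a sketch of the general pattern under assumed data, not the actual job: the adjacency input, the function names and the sample entities are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(shared_node, neighbours):
    """For one shared node, emit ((a, b), 1) for every pair of its neighbours."""
    for a, b in combinations(sorted(neighbours), 2):
        yield (a, b), 1


def reduce_phase(pairs):
    """Sum the emitted values per key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def count_common(adjacency):
    """adjacency maps a shared node -> set of entities connected to it."""
    emitted = (kv
               for node, neighbours in adjacency.items()
               for kv in map_phase(node, neighbours))
    return reduce_phase(emitted)


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    adjacency = {
        "Living Person": {"Angela Merkel", "David Cameron (politician)",
                          "David Cameron (actor)"},
        "Politician": {"Angela Merkel", "David Cameron (politician)"},
        "Head of State": {"Angela Merkel", "David Cameron (politician)"},
    }
    for pair, count in sorted(count_common(adjacency).items()):
        print(pair, count)
```

Every pair of entities that shares a node gets one count per shared node, so the summed value is the number of common connections between that pair.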
  


  
SIMPLIFIED	
  
[Diagram: the David Cameron candidates (painter, footballer, actor, politician) linked directly to Angela Merkel by edges weighted with their number of common connections; the weights shown are 1, 1 and 3]




       Woah … that looks a lot like a Least Cost Routing problem	
  
LEAST COST PATH	
  
[Diagram: the same graph with each edge weight replaced by its cost; the costs shown are 1/1, 1/1 and 1/3]




               1 / number of common connections = cost	
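
A minimal sketch of that cost function, combined with Dijkstra's algorithm to pick the cheapest candidate. Which candidate carries which count is an assumption made for the example (the slide only shows the values 1, 1 and 3), and the function names are hypothetical.

```python
import heapq

# Assumed common-connection counts with "Angela Merkel"; illustrative only.
COMMON = {
    "David Cameron (politician)": 3,
    "David Cameron (actor)": 1,
    "David Cameron (footballer)": 1,
    "David Cameron (painter)": 0,
}


def build_costs(context, common):
    """Weighted graph {node: {neighbour: cost}} with cost = 1 / common."""
    return {context: {c: 1.0 / n for c, n in common.items() if n > 0}}


def least_cost(graph, source, target):
    """Dijkstra's algorithm; returns float('inf') if target is unreachable."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue   # stale heap entry
        for nxt, weight in graph.get(node, {}).items():
            cand = dist + weight
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                heapq.heappush(heap, (cand, nxt))
    return float("inf")


def disambiguate(context, common):
    """Pick the candidate with the lowest path cost from the context entity."""
    graph = build_costs(context, common)
    return min(common, key=lambda c: least_cost(graph, context, c))


if __name__ == "__main__":
    print(disambiguate("Angela Merkel", COMMON))
```

Inverting the count turns “most common connections” into “cheapest edge”, so any standard least-cost routine can rank the candidates, including across multi-hop paths in a larger graph.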
  
UPDATED SOLUTION	
  

[Diagram: Unstructured Data -> NER, Disambiguation, Neo4j, NoSQL -> Awesomeness!]

  
RECAP	
  
•  Graphs allow us to interrogate relationships
    -  Disambiguate when faced with multiple possibilities
    -  Infer more about the context of what’s happening


•  Went through iterations of improvements
    -  Kept our Entity data in NoSQL = TBs
    -  Used the Graph as an index of sorts = GBs


•  Neo4j was a great fit for our needs

  
STEP 4 MAKING IT WORK REAL-TIME	
  

Some queries were taking ‘seconds’ and we needed
 to go a lot faster because TV won’t wait for us …

 Do we really need to check the Graph every time?	
  



  
ENTER MACHINE LEARNING	
  
•  We can use simple predictors to estimate
   the likelihood of Entities occurring
    -  e.g. every time we’ve looked for “David Cameron” in
       the past the best match was the Politician


•  Keeping a ‘probabilistic context’ of recent
   Entities allows us to detect shifts in topics
    -  Works especially well on News channels
    -  Reduces the demand on Graph lookups
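
One way such a predictor might look is sketched below. The class name, the decay factor and the confidence threshold are all assumptions for illustration; the deck does not say how the real ‘probabilistic context’ was implemented.

```python
from collections import defaultdict


class ContextPredictor:
    """Toy predictor: decayed counts of past resolutions per surface form."""

    def __init__(self, decay=0.9, threshold=0.8):
        self.decay = decay            # how quickly old evidence fades (assumed)
        self.threshold = threshold    # minimum confidence to skip the graph (assumed)
        self.scores = defaultdict(lambda: defaultdict(float))

    def observe(self, surface, entity):
        """Record a confirmed resolution, fading older evidence first."""
        seen = self.scores[surface]
        for key in seen:
            seen[key] *= self.decay
        seen[entity] += 1.0

    def predict(self, surface):
        """Return (entity, probability), or None if a graph lookup is needed."""
        seen = self.scores.get(surface)
        if not seen:
            return None
        total = sum(seen.values())
        entity, score = max(seen.items(), key=lambda kv: kv[1])
        prob = score / total
        return (entity, prob) if prob >= self.threshold else None


if __name__ == "__main__":
    predictor = ContextPredictor()
    for _ in range(5):
        predictor.observe("David Cameron", "David Cameron (politician)")
    print(predictor.predict("David Cameron"))
```

When recent observations start disagreeing with the history, the top probability drops below the threshold and the predictor returns `None`, which is the cue to fall back to a full graph lookup.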

  
BAYES’ THEOREM	
  




Looks complicated, but it’s basically just counting & division	
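
The formula on the slide did not survive extraction. For reference, the standard statement of Bayes' theorem is given below, applied to entity resolution. The symbols are assumptions introduced here: $e$ is a candidate entity, $m$ is the mention text, and $N(\cdot)$ counts past observations out of $N$ in total.

```latex
P(e \mid m) = \frac{P(m \mid e)\, P(e)}{P(m)}

% Estimating each term by counting:
P(m \mid e) = \frac{N(m, e)}{N(e)}, \qquad
P(e) = \frac{N(e)}{N}, \qquad
P(m) = \frac{N(m)}{N}

% Substituting, the N(e) and N terms cancel:
P(e \mid m) = \frac{N(m, e)}{N(m)}
```

With count-based estimates, the posterior reduces to the fraction of past occurrences of the mention that resolved to that entity, which is the ‘counting & division’.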
  
                                                         Photo Credit: mattbuck007 on Flickr (cc)
STEP 5 MAKING IT WORK WORLDWIDE	
  


 We solved the problem for English, but what about
                 other languages?	
  




  
LANGUAGE	
  
•  Our core Entities of ‘People’, ‘Places’, &
   ‘Concepts’ are language agnostic…

•  We needed a way to ditch ‘language’ and
   jump straight to entities…
    -  The colour ‘Red’ means the same thing regardless of
       whether you call it ‘Rot’, ‘Rouge’ or ‘赤’


•  Again, Graphs could solve the problem
  
LANGUAGE INDEPENDENT	
  
[Diagram: the words Red, Rouge, Rot, Röd, Rojo, 赤, 紅 and أحمر all point to a single node, Color: Red]
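
The idea of jumping from a word in any language straight to one entity can be sketched as a set of edges from surface forms to a shared node. The `Color:Red` identifier and the function names are made up for this illustration.

```python
# Edges from surface forms to a language-independent entity node.
# The entity identifier "Color:Red" is a hypothetical naming scheme.
ALIAS_EDGES = [
    ("Red", "Color:Red"),
    ("Rouge", "Color:Red"),
    ("Rot", "Color:Red"),
    ("Röd", "Color:Red"),
    ("Rojo", "Color:Red"),
    ("赤", "Color:Red"),
    ("紅", "Color:Red"),
]


def build_index(edges):
    """Map each case-folded surface form to its entity node."""
    return {word.casefold(): entity for word, entity in edges}


def resolve(index, word):
    """Return the entity for a word in any indexed language, else None."""
    return index.get(word.casefold())


if __name__ == "__main__":
    index = build_index(ALIAS_EDGES)
    for word in ("rot", "ROUGE", "赤", "blue"):
        print(word, "->", resolve(index, word))
```

Because every alias lands on the same node, everything downstream (disambiguation, context tracking) can work on entities without knowing which language the text was in.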
PROBLEM SOLVED	
  


Typical response time ~30ms … relevancy improves
  over time, and the system learns new entities ‘online’	
  




  
FINAL SOLUTION	
  

[Diagram: Unstructured Data -> NER, Language Model, Machine Learning, Disambiguation, Neo4j, NoSQL -> Awesomeness!]

  
ABOUT US	
  
•  We’ve built a product…
    -  Our ‘Digital Marketing Optimization’ platform
       improves conversion rates & customer satisfaction
       for eCommerce & Marketing campaigns
    -  Launches Q1 2013

•  What else do we do?
    -  ‘Big Data’ & ‘Data Science’ professional services
    -  Bespoke prototype & solution development


         “TUMRA” is a transliteration of the Sanskrit word for “BIG”;
        we thought it was a great name … ( and the .COM was available )
  
TUMRA
                                   You?

THANKS FOR LISTENING	
  
         We’re hiring!
        Data Scientists & Developers
              work@tumra.com
  
THANKS FOR LISTENING
    Questions?	
  
          tumra.com
      hello@tumra.com
               	
  
      twitter.com/tumra
  

Using Graph theory to understand Intent & Concepts - Neo4j User Group (January 2013)
