Using Graph Theory to understand Intent & Concepts – January 2013	
  



                               tumra.com	
  
UNDERSTANDING INTENT & CONCEPTS	
  
•  Use case:
    -  Enhancing Social TV user experience
    -  Matching users to content that interests them

•  Topics we’ll cover:
    -  Natural Language Processing
    -  Graph Theory
    -  Machine Learning


  
USE CASE ENHANCED SOCIAL TV	
  
•  Objectives:
    -  Increase engagement with content
    -  Enhance multi-channel user experience

•  We built a prototype solution:
    -  Mines unstructured data in real-time
    -  Understands:
      -  What interests individual users
      -  Entities & Concepts (People, Places, Events)


  
THE CHALLENGE	
  


 Help users to “follow the story” regardless of the
 news outlet, integrated web / second-screen
  




  
                                             Photo Credit: byrion on Flickr (cc)
THE PROBLEM	
  


[Diagram: Unstructured Data → Magic?!?! → Awesomeness!]
  
THE PROBLEM	
  
•  Little useful data to work with…
    -  Streams of continuous live TV
    -  Have to create metadata

•  Where did we start?
    -  Ingest several live news channels
    -  Extract whatever data was available:
      -  In-video text using OCR
      -  Subtitles / Closed Captions


  
STEP 1 NAMED ENTITY RECOGNITION	
  


We used a simple N-Gram model for exact matches;
    then Apache Lucene for everything else…	
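
The exact-match half of this step can be sketched as below. This is a minimal illustration, not the authors' implementation: the `KNOWLEDGE_BASE` dictionary, the tokenizer and the greedy longest-first strategy are all assumptions, and the Lucene fallback for non-exact matches is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical knowledge base: lower-cased surface form -> entity type.
KNOWLEDGE_BASE: Dict[str, str] = {
    "david cameron": "Person",
    "angela merkel": "Person",
    "german chancellor": "Concept",
    "debt": "Concept",
    "eurozone": "Place",
}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and strip surrounding punctuation from each token."""
    tokens = (word.strip(".,;:!?\"'“”‘’()") for word in text.lower().split())
    return [t for t in tokens if t]


def match_entities(text: str, max_n: int = 3) -> List[Tuple[str, str]]:
    """Greedy, longest-first exact matching of n-grams against the knowledge base."""
    tokens = tokenize(text)
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in KNOWLEDGE_BASE:
                matches.append((candidate, KNOWLEDGE_BASE[candidate]))
                i += n  # skip past the matched n-gram
                break
        else:
            i += 1  # no n-gram starting here matched; move on one token
    return matches


sentence = ("David Cameron and the German Chancellor Angela Merkel meet to "
            "discuss the debt crisis and signal their approval for greater "
            "eurozone integration.")
entities = match_entities(sentence)
```

Trying the longest n-gram first means “German Chancellor” is matched as one unit before any shorter span is considered.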
  




  
EXAMPLE N.E.R.	
  

  “David Cameron and the German
Chancellor Angela Merkel meet to
 discuss the debt crisis and signal
their approval for greater eurozone
           integration.”
  


  
INITIAL SOLUTION	
  

[Diagram: Unstructured Data → (NER, NoSQL) → Awesomeness!]
  
OH NO!!!
 *facepalm*	
  




     Photo Credit: cesarastudillo on Flickr (cc)
DISAMBIGUATION	
  
•  Which “David Cameron”?
    -  We have many in our Knowledgebase
    -  Sportsmen, actors, painters & characters…

•  Our initial simplistic approach was naïve
    -  Works great with unambiguous matches
    -  Best-case returns top-scoring entity

•  We needed a smarter approach
  
RECAP	
  
•  We have an effectively ‘flat’ KB of Entities
    -    “David Cameron” -> Politician (Person)
    -    “Angela Merkel” -> Politician (Person)
    -    “German Chancellor” -> Political office (Concept)
    -    “Debt” -> Economic concept (Concept)
    -    “Eurozone” -> Economic area (Place)


•  We needed a way to find relationships
   between Entities

  
THE BIG IDEA	
  




Graphs allow us to store relationships between entities, and
graph algorithms allow us to interrogate those connections…	
  
GRAPH DATABASES	
  
•  Neo4j
•  GraphLab
•  Apache Giraph
•  GoldenOrb

… of course there are many more open-source & proprietary ones
  
SO, WHICH ONE?	
  


                       ???
… it had to be fast, scalable, and under active development
  
STEP 2 BUILDING RELATIONSHIPS	
  

We had 250 million Nodes, and 4 billion Edges…
great initial results but horrendously inefficient!

  Example: “David Cameron” & “Angela Merkel”	
  



  
INITIAL IMPROVEMENTS	
  
•  We didn’t need everything… just:
    -    People: “David Cameron”, “Angela Merkel”
    -    Places: “London”, “Downing Street”, “Eurozone”
    -    Concepts: “Debt”, “President”, “Eurozone”
    -    Things: Companies, Products etc.


•  Pruned the graph using Map/Reduce

•  This reduced the number of Entities…
    -  … but we still had billions of connections
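
A pruning pass of this kind can be sketched in map/reduce style as follows. The node types, the toy edges and the function names are hypothetical, and plain Python stands in for whichever Map/Reduce framework was actually used.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Entity types worth keeping, following the slide's four categories.
KEEP_TYPES = {"Person", "Place", "Concept", "Thing"}

# Hypothetical toy data: node -> type, plus a raw edge list.
NODE_TYPES: Dict[str, str] = {
    "David Cameron": "Person",
    "Angela Merkel": "Person",
    "Eurozone": "Place",
    "Debt": "Concept",
    "Infobox template": "Metadata",
}
EDGES: List[Tuple[str, str]] = [
    ("David Cameron", "Angela Merkel"),
    ("David Cameron", "Infobox template"),
    ("Angela Merkel", "Eurozone"),
    ("Debt", "Eurozone"),
    ("Infobox template", "Debt"),
]


def map_edge(edge: Tuple[str, str]) -> Iterator[Tuple[str, str]]:
    """Map: emit (source, target) only when both endpoints have a kept type."""
    src, dst = edge
    if NODE_TYPES.get(src) in KEEP_TYPES and NODE_TYPES.get(dst) in KEEP_TYPES:
        yield src, dst


def reduce_edges(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Reduce: group the surviving targets under each source node."""
    adjacency: Dict[str, List[str]] = defaultdict(list)
    for src, dst in pairs:
        adjacency[src].append(dst)
    return dict(adjacency)


pruned = reduce_edges(pair for edge in EDGES for pair in map_edge(edge))
```

Dropping a node type removes every edge that touches it, yet the edges between kept entities all survive, which fits the observation that billions of connections remained.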
EXAMPLE PEOPLE, PLACES, CONCEPTS	
  
                  	
  
             “David Cameron and the German
           Chancellor Angela Merkel meet to
            discuss the debt crisis and signal
           their approval for greater eurozone
                      integration.”

People:   David Cameron, Angela Merkel
Places:   eurozone
Concepts: German Chancellor, debt
  
DISAMBIGUATION	
  
[Diagram: “Angela Merkel” and four candidate entities, David Cameron (painter),
David Cameron (footballer), David Cameron (actor) and David Cameron (politician),
connected through shared nodes such as “Living Person”, “Politician” and “Head of State”]

Possibilities: shortest path, number of common connections etc.
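
The shortest-path option can be sketched with a breadth-first search. The edge list below is hypothetical: it reuses the node labels from the slide, but which nodes are actually connected is an assumption made for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical edges; the real graph's connections are not given in the slides.
EDGES: List[Tuple[str, str]] = [
    ("Angela Merkel", "Living Person"),
    ("Angela Merkel", "Politician"),
    ("Angela Merkel", "Head of State"),
    ("David Cameron (politician)", "Living Person"),
    ("David Cameron (politician)", "Politician"),
    ("David Cameron (politician)", "Head of State"),
    ("David Cameron (actor)", "Living Person"),
    ("David Cameron (footballer)", "Living Person"),
]


def build_graph(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from an edge list."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph: Dict[str, Set[str]],
                         start: str, goal: str) -> Optional[int]:
    """Return the number of hops from start to goal, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


graph = build_graph(EDGES)
candidates = [
    "David Cameron (politician)",
    "David Cameron (actor)",
    "David Cameron (footballer)",
    "David Cameron (painter)",
]
hops = {c: shortest_path_length(graph, c, "Angela Merkel") for c in candidates}
```

In this toy graph three candidates tie at two hops and only the painter is ruled out, so hop count alone does not settle the question; counting common connections is what separates the tied candidates.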
  
STEP 3 SIMPLIFYING THE GRAPH	
  

Sure all that extra metadata was tasty but we didn’t
           need it all to solve the use-case…

   So we used Map/Reduce to count the common
                  connections	
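
Counting common connections maps naturally onto map, shuffle and reduce phases. The sketch below uses plain Python rather than a real Map/Reduce framework, and the `CONNECTIONS` data is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical input: entity -> the nodes it is connected to.
CONNECTIONS: Dict[str, Set[str]] = {
    "Angela Merkel": {"Living Person", "Politician", "Head of State"},
    "David Cameron (politician)": {"Living Person", "Politician", "Head of State"},
    "David Cameron (actor)": {"Living Person"},
    "David Cameron (footballer)": {"Living Person"},
    "David Cameron (painter)": set(),
}


def map_phase(connections: Dict[str, Set[str]]) -> Iterator[Tuple[str, str]]:
    """Map: invert each adjacency entry into (shared node, entity) pairs."""
    for entity, nodes in connections.items():
        for node in nodes:
            yield node, entity


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle: group entities by the node they share."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for node, entity in pairs:
        grouped[node].append(entity)
    return grouped


def reduce_phase(grouped: Dict[str, List[str]]) -> Counter:
    """Reduce: every pair of entities under one node gains one common connection."""
    counts: Counter = Counter()
    for entities in grouped.values():
        for a, b in combinations(sorted(entities), 2):
            counts[(a, b)] += 1
    return counts


common = reduce_phase(shuffle(map_phase(CONNECTIONS)))
```

Sorting each group before pairing keeps the pair keys in a stable order, so `("Angela Merkel", "David Cameron (politician)")` is always looked up the same way. A pair with nothing in common simply has a count of zero.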
  


  
SIMPLIFIED	
  
[Diagram: the four David Cameron candidates (painter, footballer, actor, politician)
linked to “Angela Merkel” by edges weighted with the number of common connections: 1, 3 and 1]

       Woah … that looks a lot like a Least Cost Routing problem
  
LEAST COST PATH	
  
[Diagram: the same graph with each edge weight inverted into a cost: 1/1, 1/3 and 1/1]

               1 / number of common connections = cost
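
The cost rule above can be combined with a standard least-cost search. The sketch below uses Dijkstra's algorithm from the Python standard library's `heapq`; the connection counts and their assignment to candidates are hypothetical, and the slides do not say which routing algorithm was actually used.

```python
import heapq
import math
from typing import Dict, Tuple

# Hypothetical common-connection counts between candidates and the context entity.
COMMON_CONNECTIONS: Dict[Tuple[str, str], int] = {
    ("David Cameron (politician)", "Angela Merkel"): 3,
    ("David Cameron (actor)", "Angela Merkel"): 1,
    ("David Cameron (footballer)", "Angela Merkel"): 1,
}


def build_cost_graph(common: Dict[Tuple[str, str], int]) -> Dict[str, Dict[str, float]]:
    """Turn counts into an undirected graph whose edge cost is 1 / count."""
    graph: Dict[str, Dict[str, float]] = {}
    for (a, b), count in common.items():
        if count > 0:
            cost = 1.0 / count
            graph.setdefault(a, {})[b] = cost
            graph.setdefault(b, {})[a] = cost
    return graph


def least_cost(graph: Dict[str, Dict[str, float]], start: str, goal: str) -> float:
    """Dijkstra's algorithm; returns math.inf when goal is unreachable."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return math.inf


graph = build_cost_graph(COMMON_CONNECTIONS)
candidates = [
    "David Cameron (politician)",
    "David Cameron (actor)",
    "David Cameron (footballer)",
    "David Cameron (painter)",
]
costs = {c: least_cost(graph, c, "Angela Merkel") for c in candidates}
best_match = min(candidates, key=costs.get)
```

Because more common connections mean a lower cost, the candidate with the strongest link to the context entity wins, and a candidate with no link at all gets an infinite cost.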
  
UPDATED SOLUTION	
  

[Diagram: Unstructured Data → (NER, Disambiguation, Neo4J, NoSQL) → Awesomeness!]
  
RECAP	
  
•  Graphs allow us to interrogate relationships
    -  Disambiguate when faced with multiple possibilities
    -  Infer more about the context of what’s happening


•  Went through iterations of improvements
    -  Kept our Entity data in NoSQL = TBs
    -  Used the Graph as an index of sorts = GBs


•  Neo4j was a great fit for our needs

  
STEP 4 MAKING IT WORK REAL-TIME	
  

Some queries were taking ‘seconds’ and we needed
 to go a lot faster because TV won’t wait for us …

 Do we really need to check the Graph every time?
  
ENTER MACHINE LEARNING	
  
•  We can use simple predictors to estimate
   the likelihood of Entities occurring
    -  i.e. every time we’ve looked for “David Cameron” in
       the past the best match was the Politician


•  Keeping a ‘probabilistic context’ of recent
   Entities allows us to detect shifts in topics
    -  Works especially well on News channels
    -  Reduces the demand on Graph lookups
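
A simple predictor along these lines can be sketched as a frequency table over past resolutions. The class name, the thresholds and the example data are assumptions, and the ‘probabilistic context’ of recent entities is left out to keep the sketch short.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional


class EntityPredictor:
    """Remembers how each mention was resolved and predicts the usual winner."""

    def __init__(self, threshold: float = 0.9, min_observations: int = 5) -> None:
        self.threshold = threshold
        self.min_observations = min_observations
        self.history: DefaultDict[str, Counter] = defaultdict(Counter)

    def record(self, mention: str, entity: str) -> None:
        """Store one resolved lookup, e.g. the result of a graph query."""
        self.history[mention][entity] += 1

    def predict(self, mention: str) -> Optional[str]:
        """Return the most likely entity, or None if a graph lookup is still needed."""
        counts = self.history.get(mention)
        if not counts:
            return None
        total = sum(counts.values())
        if total < self.min_observations:
            return None
        entity, hits = counts.most_common(1)[0]
        return entity if hits / total >= self.threshold else None


predictor = EntityPredictor()
for _ in range(19):
    predictor.record("David Cameron", "David Cameron (politician)")
predictor.record("David Cameron", "David Cameron (actor)")

prediction = predictor.predict("David Cameron")
unknown = predictor.predict("Angela Merkel")
```

When `predict` returns `None` the caller falls back to the graph and records the result, so the table keeps learning and the graph is consulted less often over time.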

  
BAYES THEOREM	
  




Looks complicated, but it’s basically just counting & division
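
The formula on this slide is an image and is not part of the text, so the block below restates the standard form of Bayes' theorem. The symbols are assumptions chosen for this use case: e is a candidate entity and m is the mention seen in the text. The second line shows why it reduces to counting and division when the probabilities are estimated from N past observations.

```latex
P(e \mid m) = \frac{P(m \mid e)\, P(e)}{P(m)}

P(m \mid e)\, P(e) = \frac{\operatorname{count}(e, m)}{N},
\qquad
P(m) = \frac{\operatorname{count}(m)}{N}
\quad\Longrightarrow\quad
P(e \mid m) = \frac{\operatorname{count}(e, m)}{\operatorname{count}(m)}
```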
  
                                                         Photo Credit: mattbuck007 on Flickr (cc)
STEP 5 MAKING IT WORK WORLDWIDE	
  


 We solved the problem for English, but what about
                 other languages?	
  




  
LANGUAGE	
  
•  Our core Entities of ‘People’, ‘Places’, &
   ‘Concepts’ are language agnostic…

•  We needed a way to ditch ‘language’ and
   jump straight to entities…
    -  The colour ‘Red’ means the same thing regardless of
       whether you call it ‘Rot’, ‘Rouge’ or ‘赤’


•  Again, Graphs could solve the problem
  
LANGUAGE INDEPENDENT	
  
[Diagram: the words Red, Rouge, Rot, Röd, Rojo, 赤, 紅 and the Arabic word for red
all link to a single entity node, “Color: Red”]
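
The idea of jumping from any surface form straight to one entity can be sketched with a lookup table. The entity identifier `Color:Red` and the flat dictionary are assumptions; in the talk these links are stored as graph edges.

```python
from typing import Dict, Optional

# Hypothetical mapping: surface form in any language -> language-independent entity ID.
SURFACE_TO_ENTITY: Dict[str, str] = {
    "red": "Color:Red",    # English
    "rouge": "Color:Red",  # French
    "rot": "Color:Red",    # German
    "röd": "Color:Red",    # Swedish
    "rojo": "Color:Red",   # Spanish
    "赤": "Color:Red",
    "紅": "Color:Red",
}


def resolve(term: str) -> Optional[str]:
    """Normalise the term and return its entity ID, or None if it is unknown."""
    return SURFACE_TO_ENTITY.get(term.strip().casefold())


resolved = {term: resolve(term) for term in ["Red", "Rouge", "Rot", "Röd", "Rojo", "赤", "紅"]}
```

Everything downstream of `resolve` can then work on entity IDs alone, without caring which language the mention came from.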
PROBLEM SOLVED	
  


Typical response time ~30ms … relevancy improves
  over time and the system learns new entities ‘online’
  
FINAL SOLUTION	
  

[Diagram: Unstructured Data → (NER, Language Model, Machine Learning, Disambiguation, Neo4J, NoSQL) → Awesomeness!]
  
ABOUT US	
  
•  We’ve built a product…
    -  Our ‘Digital Marketing Optimization’ platform
       improves conversion rates & customer satisfaction
       for eCommerce & Marketing campaigns
    -  Launches Q1 2013

•  What else do we do?
    -  ‘Big Data’ & ‘Data Science’ professional services
    -  Bespoke prototype & solution development


         “TUMRA” is a transliteration of the Sanskrit word for “BIG”;
        we thought it was a great name … (and the .COM was available)
  
THANKS FOR LISTENING

         We’re hiring!
        Data Scientists & Developers
              work@tumra.com

    Questions?

          tumra.com
      hello@tumra.com
      twitter.com/tumra
  

Using Graph theory to understand Intent & Concepts - Neo4j User Group (January 2013)
