Discovering Emerging Technology Through Graph Analysis
Henry Hwangbo
GraphConnect | Chicago
June 2013
About Me
henry74@gmail.com || henry.hwangbo@us.pwc.com
@henry74
henry74
Founder / Director of PwC's Emerging Tech Lab
What is the Emerging Tech Lab?
We build stuff to help people get smart about applying technology to
solve problems
● Founded 3 years ago to identify and experiment with new
technologies relevant to but not widely adopted by the Enterprise
● Focuses on rapid prototyping & MVP build-outs for both
tactical internal projects and more creative, exploratory ideas
● Permanent core team, but operates a rotational program for
staff to provide them an opportunity for hands-on technical
experience, learning agile & lean principles, and exposure to a
startup-like environment
The Challenge
It usually starts with an idea…
“Build a platform to help discover emerging technologies.”
…followed by some pretty mock-ups…
…to raise expectations.
Envisioning success
● What are some emerging
technologies?
● How are they being used to solve
real problems?
● Who is talking about them?
● Who are the players?
● Are there related technologies?
● Get up to speed quickly
● Discover related topics
● Understand what is trending
● Find interesting applications
● See what's possible
What makes technology “emerging”?
● Cannot already be mainstream technology
● Needs to be more than a single event to be an emerging trend
● Must be growing in popularity, but not yet popular
● "Technology" could be a thing (e.g. nanotubes), but also an
aggregation or application of technologies (e.g. cloud
computing, quantified self)
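The criteria above can be sketched as a simple check over mention counts per time period. This is only an illustration: the function name, the input format, and the mainstream_threshold value are assumptions, not something the slides define.

```python
def is_emerging(mentions_per_period, mainstream_threshold=1000):
    """Apply the three criteria to a chronological list of mention counts.

    mainstream_threshold is a made-up cutoff for "already popular".
    """
    # More than a single event: mentions show up in at least two periods.
    active_periods = [count for count in mentions_per_period if count > 0]
    more_than_single_event = len(active_periods) > 1

    # Growing in popularity: the latest period beats the earliest one.
    growing = (
        len(mentions_per_period) >= 2
        and mentions_per_period[-1] > mentions_per_period[0]
    )

    # Not yet mainstream: no period reaches the assumed threshold.
    not_yet_mainstream = max(mentions_per_period, default=0) < mainstream_threshold

    return more_than_single_event and growing and not_yet_mainstream
```

A series such as [2, 5, 9] passes all three checks, while a single spike such as [0, 0, 40] fails the "more than a single event" test.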
The Journey
Initial design
[Diagram: Source (RSS data feeds) → Pull & Store Raw Data (MongoDB) → Analyze (?) → Visualize (Postgres)]
Breaking ground
● Natural Language Processing
● Named Entity Recognition
● ???
● ???
● ???
● ???
● ???
Extract Text → Understand Text → Discover Insights
A bit more clarity
[Diagram: Source (RSS data feeds) → Pull & Store Raw Data (MongoDB) → Tag & Update (3rd Party APIs) → Analyze (?) → Visualize (Postgres)]
Digging a little deeper
● Natural Language Processing
● Named Entity Recognition
● Collocation?
● K-means clustering?
● Information Ontology?
● ???
● ???
Extract Text → Understand Text → Discover Insights
The Eureka moment...
…took a bit longer than it should have
Graphs are everywhere
Final design
[Diagram: Source (RSS data feeds) → Pull & Store Raw Data (MongoDB) → Tag & Update (3rd Party API) → Analyze (Neo4j) → Visualize (Postgres)]
Lesson #1 - Graph data modeling is iterative…
…but more flexible than traditional data modeling
What should be a node, relationship, or a property? Depends on:
● What will you search on?
● How do you start your searches?
● How much data do you expect to have? What data?
Expect to change your graph based on:
● Experimentation
● Query syntax available to extract and aggregate graph data
● Query performance
TIP: Plan to reload your graph many times - save the raw data, start small,
use batch loading until you get it right
Modeling the data
[Diagram: two DOC nodes, each linked to surrounding nodes labeled P, C, K, T, and O]
Documents are described by their
entities, concepts, and keywords
through relationships
This means:
● Documents are related to other
documents through shared
entities, concepts, and keywords
● Concepts and entities are related
to each other through shared
documents
● Incoming relationships measure
the number of referring documents
Simple, yet powerful
Relationship types: TAGGED_AS, RELATES_TO, REFERS_TO, CONTAINS
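A minimal in-memory sketch of this model, using plain Python sets instead of Neo4j. The document ids and tag values are made up, and the slides do not say which relationship type links which node types, so the relationships are left untyped here.

```python
from collections import defaultdict

# Hypothetical sample data: each document maps to the tag nodes it points at.
# The node kinds (entity, concept, keyword) come from the slides; the values
# are invented for illustration.
documents = {
    "doc1": {("concept", "cloud computing"), ("entity", "Google"), ("keyword", "network")},
    "doc2": {("concept", "cloud computing"), ("keyword", "network")},
    "doc3": {("entity", "Google"), ("concept", "quantified self")},
}


def incoming_counts(docs):
    """Count referring documents per tag node (its incoming relationships)."""
    counts = defaultdict(int)
    for tags in docs.values():
        for tag in tags:
            counts[tag] += 1
    return dict(counts)


def related_documents(docs, doc_id):
    """Rank other documents by the number of tag nodes they share with doc_id."""
    shared = {
        other: len(docs[doc_id] & tags)
        for other, tags in docs.items()
        if other != doc_id and docs[doc_id] & tags
    }
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))


def related_tags(docs, tag):
    """Find tags that co-occur with `tag`, with the number of shared documents."""
    counts = defaultdict(int)
    for tags in docs.values():
        if tag in tags:
            for other in tags - {tag}:
                counts[other] += 1
    return dict(counts)
```

Here related_documents(documents, "doc1") ranks doc2 ahead of doc3 because doc1 and doc2 share two tag nodes, while doc1 and doc3 share one.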
Lesson #2 - Connections are important
Highly connected data creates richer
graphs and increases potential for
discovering greater insights
BUT unnecessary connections can
create noise & extra work
Don't create artificial connections, but clean up data before importing when it
makes sense (e.g. treat networking, networks, and network as one term)
Prevent duplication, which can distort insights based on aggregation (e.g.
# of relationships) or on certain patterns
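One way to do this kind of pre-import cleanup is to normalize each keyword before creating nodes. The sketch below uses crude suffix stripping as a stand-in for a real stemmer; the suffix list and function names are assumptions, not the prototype's code.

```python
import re

# Crude suffix stripping, used here only as a stand-in for a real stemmer
# such as Porter's algorithm. The suffix list is an assumption.
SUFFIXES = ("ing", "s")


def normalize_keyword(raw):
    """Lowercase the text and strip one common suffix from each word."""
    words = re.findall(r"[a-z0-9]+", raw.lower())
    stems = []
    for word in words:
        for suffix in SUFFIXES:
            # Keep at least three characters so short words stay intact.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        stems.append(word)
    return " ".join(stems)


def merge_keywords(raw_keywords):
    """Group raw keyword strings under their normalized form."""
    merged = {}
    for raw in raw_keywords:
        merged.setdefault(normalize_keyword(raw), []).append(raw)
    return merged
```

With this, merge_keywords(["networking", "Networks", "network"]) puts all three variants under the single key "network", so they would become one node instead of three.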
Keeping it clean
Techniques and their graph benefits:
Text extraction with readability scoring
● Better named entity extraction
● Improve neighbor relevance
● Minimize invalid nodes & relationships
Similarity Hashing
● Improve validity of relationships
● Increase graph connectedness
Porter Stemming
● Improve graph connectedness
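The slides do not say which similarity hashing scheme was used. As one possible illustration, here is a small SimHash sketch for spotting near-duplicate documents before they are loaded; the 64-bit size, the word-level tokens, and the max_distance default are all assumptions.

```python
import hashlib

BITS = 64  # fingerprint size; an assumption for this sketch


def simhash(text):
    """Compute a 64-bit SimHash fingerprint over the lowercase words of text."""
    weights = [0] * BITS
    for token in text.lower().split():
        # Take the first 8 bytes of the MD5 digest as a 64-bit token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            weights[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a, text_b, max_distance=3):
    """Treat two texts as duplicates when their fingerprints are close."""
    return hamming_distance(simhash(text_a), simhash(text_b)) <= max_distance
```

Documents flagged as near-duplicates can then be collapsed into a single node, which keeps relationship counts from being inflated by syndicated copies of the same article.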
Lesson #3 - Understand Cypher
● Cypher experimentation opens up the possible
● SQL users will be at home - tabular results, similar
syntax
● Start without parameters, check with Neo4j shell,
move to parameterized queries for security &
performance (caching)
● Don't forget Lucene syntax
● Continues to evolve for the better - check new release
changes (http://docs.neo4j.org/refcard/1.9/)
● Let Cypher do the work
Useful Cypher Syntax
START with an index
MATCH defines your universe
WHERE filters it down
WITH combines multiple statements
HAS checks if a property exists
AS lets you name your return values
IN checks against an array
COLLECT aggregates into an array
ORDER just like SQL
LIMIT for performance
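To show how these clauses fit together, here is a sketch of a parameterized query in the Cypher 1.9 style referenced above, held in a Python string. It has not been run against a server. The index name (keywords), the name property, and the direction of CONTAINS and TAGGED_AS are assumptions; only the relationship type names come from the slides.

```python
# Cypher text in the 1.9-era style. Index name, property name, and the
# relationship directions are hypothetical.
CONCEPTS_FOR_KEYWORD = """
START k=node:keywords({lucene_query})
MATCH k<-[:CONTAINS]-doc-[:TAGGED_AS]->concept
WHERE HAS(concept.name)
WITH concept, COUNT(doc) AS doc_count, COLLECT(ID(doc)) AS doc_ids
WHERE doc_count >= {min_docs}
RETURN concept.name AS concept, doc_count, doc_ids
ORDER BY doc_count DESC
LIMIT {limit}
"""


def build_request(user_text, min_docs=2, limit=10):
    """Return the query text plus a params dict.

    User input only ever goes into params, never into the query text.
    Lucene special characters in user_text are not escaped in this sketch.
    """
    return {
        "query": CONCEPTS_FOR_KEYWORD.strip(),
        "params": {
            # Lucene prefix search on the assumed name field.
            "lucene_query": "name:" + user_text.lower() + "*",
            "min_docs": min_docs,
            "limit": limit,
        },
    }
```

Because the query text never changes between calls, it can be cached, which is the performance benefit of parameterized queries mentioned in Lesson #3.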
Prototype highlights
● 4 people & 4 months (first version)
● Data Stores - Neo4j, MongoDB, Redis, Postgres
● Visuals - D3.js, Vivagraph.js, Twitter Bootstrap
● Key Languages/Libraries - Ruby, Rails, Cypher,
Knockout.js, Amplify.js, HTML5, CSS3, jQuery,
Neography gem, Resque gem
● 3rd Party - Alchemy, OpenCalais, RSS feeds,
Wikipedia
● Concepts - natural language processing, named
entity extraction, text cleansing & de-duplication
(map/reduce), similarity hashing, large-scale
information retrieval
● 1M+ nodes, 3M+ relationships, 6M+ properties after
6 months
Emerging Tech Radar Demo
Tag Cloud / Search
[Diagram: keyword (K) and concept (C) nodes linked to several DOC nodes]
● Index keywords and search across keywords (tip: use Lucene syntax)
● Identify documents with strong relationships to keywords
● Locate concepts with strongest relationships to relevant documents
● Popularity based on number of incoming relationships
Emerging Index / Popularity / Doc List
[Diagram: C and O nodes sharing DOC nodes, each DOC marked (E) or (NE)]
Cloud computing (Concept) and Google (Org)
● Strong relationships with documents shared between concepts to filter
and rank relevant content
● Ratio and strength of relationships to quantify emerging index
● Popularity based on number of incoming relationships of each type of
document (emerging versus non-emerging)
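The slides do not give the formula behind the emerging index. The sketch below shows one possible reading: popularity is a count of referring documents by type, and the index is the share of relationship weight that comes from emerging documents. The input format and both function names are assumptions.

```python
def popularity(refs):
    """Count referring documents by type: emerging (E) vs non-emerging (NE).

    refs is a list of (doc_type, weight) pairs, one per incoming relationship.
    """
    counts = {"E": 0, "NE": 0}
    for doc_type, _weight in refs:
        counts[doc_type] += 1
    return counts


def emerging_index(refs):
    """Share of total relationship weight that comes from emerging documents."""
    total = sum(weight for _doc_type, weight in refs)
    if total == 0:
        return 0.0
    emerging = sum(weight for doc_type, weight in refs if doc_type == "E")
    return emerging / total


# Hypothetical incoming relationships for a single concept node.
sample_refs = [("E", 0.5), ("E", 0.25), ("NE", 0.25)]
```

For sample_refs, two of the three referring documents are emerging and they carry 0.75 of the total weight, so the index under this reading is 0.75.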
Node Graph
[Diagram: C, K, and O nodes linked through shared DOC nodes]
● Existing relationships with documents shared between concepts to
filter relevant neighbors
● Strength of relationships based on # and weight for ranking relevance
(color)
The Takeaway
Final Thoughts
● Graphs make it simple to generate complex insights - you don't
need to be a data scientist
● Graphs are a natural fit for anything connected...which is most
things (e.g. social media, internet of things, sensor data)
● Experimentation is the best way to learn the power of graphs
● Make graph databases a first class citizen in your technology
toolkit - many things can be solved better with a graph
The best way to discover emerging technologies is to try
them out
Thanks for Listening - Q & A
Special thanks to Max De Marzi for his neography gem
(https://github.com/maxdemarzi/neography) and ongoing advice, suggestions, and
troubleshooting

08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013

  • 1. Discovering Emerging Technology Through Graph Analysis GraphConnect | Chicago June 2013
  • 2. About Me henry74@gmail.com || henry.hwangbo@us.pwc.com @henry74 henry74 Founder / Director of PwC's Emerging Tech Lab
  • 3. What is the Emerging Tech Lab?
       We build stuff to help people get smart about applying technology to solve problems
       ● Founded 3 years ago to identify and experiment with new technologies relevant to but not widely adopted by the Enterprise
       ● Focuses on rapid prototyping & MVP build-outs for both tactical internal projects and more creative, exploratory ideas
       ● Permanent core team, but operates a rotational program for staff to provide them an opportunity for hands-on technical experience, learning agile & lean principles, and exposure to a startup-like environment
  • 5. It usually starts with an idea… “Build a platform to help discover emerging technologies.”
  • 6. …followed by some pretty mock-ups… …to raise expectations.
  • 7. Envisioning success
       ● What are some emerging technologies?
       ● How are they being used to solve real problems?
       ● Who is talking about them?
       ● Who are the players?
       ● Are there related technologies?
       ● Get up to speed quickly
       ● Discover related topics
       ● Understand what is trending
       ● Find interesting applications
       ● See what's possible
  • 8. What makes technology “emerging”?
       ● Cannot already be mainstream technology
       ● Needs to be more than a single event to be an emerging trend
       ● Must be growing in popularity, but not yet popular
       ● "Technology" could be a thing (e.g. nanotubes), but also an aggregation or application of technologies (e.g. cloud computing, quantified self)
  • 10. Initial design
       [Diagram boxes: Source | Data Feeds (RSS) | Pull & Store Raw Data | MongoDB | Analyze | Visualize | ? | Postgres]
  • 11. Breaking ground
       ● Natural Language Processing
       ● Named Entity Recognition
       ● ???
       ● ???
       ● ???
       ● ???
       ● ???
       [Diagram stages: Extract Text | Understand Text | Discover Insights]
  • 12. A bit more clarity
       [Diagram boxes: Source | Data Feeds (RSS) | Pull & Store Raw Data | MongoDB | Analyze | Visualize | ? | 3rd Party APIs | Tag & Update | Postgres]
  • 13. Digging a little deeper
       ● Natural Language Processing
       ● Named Entity Recognition
       ● Collocation?
       ● K-means clustering?
       ● Information Ontology?
       ● ???
       ● ???
       [Diagram stages: Extract Text | Understand Text | Discover Insights]
  • 14. The Eureka moment... …took a bit longer than it should have Graphs are everywhere
  • 15. Final design
       [Diagram boxes: Source | Data Feeds (RSS) | Pull & Store Raw Data | MongoDB | Analyze | Visualize | 3rd Party API | Tag & Update | Neo4j | Postgres]
  • 17. Lesson #1 - Graph data modeling is iterative …but more flexible than traditional data modeling
       What should be a node, relationship, or a property? Depends on:
       ● What will you search on?
       ● How do you start your searches?
       ● How much data do you expect to have? What data?
       Expect to change your graph based on:
       ● Experimentation
       ● Query syntax available to extract and aggregate graph data
       ● Query performance
       TIP: Plan to reload your graph many times - save the raw data, start small, use batch loading until you get it right
  • 18. Modeling the data
       [Diagram: DOC nodes connected to nodes labeled P, C, K, O, and T; relationship types shown: TAGGED_AS, RELATES_TO, REFERS_TO, CONTAINS]
       Documents are described by their entities, concepts, and keywords through relationships
       This means:
       ● Documents are related to other documents through shared entities, concepts, and keywords
       ● Concepts and entities are related to each other through shared documents
       ● Incoming relationships measure the # of referring documents
       Simple, yet powerful
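
The model on this slide can be sketched with plain dictionaries. This is only an in-memory illustration, not the Neo4j implementation from the talk; the class `DocGraph`, its method names, and the sample documents are all hypothetical.

```python
from collections import defaultdict


class DocGraph:
    """Tiny in-memory stand-in for the document graph (not Neo4j)."""

    def __init__(self):
        self.doc_to_nodes = defaultdict(set)  # document -> (kind, name) nodes
        self.node_to_docs = defaultdict(set)  # (kind, name) node -> documents

    def tag(self, doc, kind, name):
        """Relate a document to an entity, concept, or keyword node."""
        node = (kind, name)
        self.doc_to_nodes[doc].add(node)
        self.node_to_docs[node].add(doc)

    def popularity(self, kind, name):
        """Incoming relationships = number of referring documents."""
        return len(self.node_to_docs.get((kind, name), ()))

    def related_documents(self, doc):
        """Other documents, ranked by how many nodes they share with `doc`."""
        shared = defaultdict(int)
        for node in self.doc_to_nodes.get(doc, ()):
            for other in self.node_to_docs[node]:
                if other != doc:
                    shared[other] += 1
        # Most shared nodes first; ties broken by document name.
        return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))


graph = DocGraph()
graph.tag("doc1", "concept", "cloud computing")
graph.tag("doc1", "entity", "Google")
graph.tag("doc2", "concept", "cloud computing")
graph.tag("doc2", "keyword", "graph")
graph.tag("doc3", "concept", "cloud computing")
graph.tag("doc3", "entity", "Google")
```

Here `graph.popularity("concept", "cloud computing")` counts the three referring documents, and `graph.related_documents("doc1")` ranks `doc3` ahead of `doc2` because it shares two nodes with `doc1` rather than one.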
  • 19. Lesson #2 - Connections are important
       Highly connected data creates richer graphs and increases potential for discovering greater insights
       BUT unnecessary connections can create noise & extra work
       Don't create artificial connections, but clean up data before importing when it makes sense (e.g. networking, networks, network)
       Prevent duplication which can impact your insights based on aggregation (e.g. # of relationships) or certain patterns
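
A minimal sketch of the clean-up step, using the slide's own example terms. The suffix stripping below is a deliberately crude stand-in and is not the Porter algorithm; `normalize_term` and `merge_counts` are hypothetical names, and the counts are made up for illustration.

```python
from collections import Counter


def normalize_term(term):
    """Lowercase a term and strip one common suffix (crude, not Porter)."""
    word = term.strip().lower()
    for suffix in ("ing", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def merge_counts(raw_counts):
    """Collapse variants of a term into one entry before importing."""
    merged = Counter()
    for term, count in raw_counts.items():
        merged[normalize_term(term)] += count
    return merged


raw = {"networking": 2, "networks": 3, "network": 5, "Graphs": 1}
merged = merge_counts(raw)
```

With this input, the three network variants collapse into a single `network` entry, so one node would be created instead of three and its relationship count is not split across duplicates.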
  • 20. Keeping it clean
       Techniques and their graph benefits:
       Text extraction with readability scoring
       ● Better named entity extraction
       ● Improve neighbor relevance
       ● Minimize invalid nodes & relationships
       Similarity Hashing
       ● Improve validity of relationships
       ● Increase graph connectedness
       Porter Stemming
       ● Improve graph connectedness
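
The slides do not say which similarity hashing scheme was used. As one possible illustration, here is a simhash-style fingerprint over whitespace tokens; the function names, the MD5-based token hash, and the 64-bit width are assumptions made for this sketch.

```python
import hashlib

BITS = 64  # fingerprint width (an assumption for this sketch)


def simhash(text):
    """Simhash-style fingerprint: similar token sets give nearby bit patterns."""
    weights = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")  # 64 bits per token
        for i in range(BITS):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two texts with the same tokens in a different order or case get the same fingerprint, and near-duplicates tend to differ in only a few bits, so a small Hamming distance can flag documents that should not be imported twice. The distance threshold would have to be tuned on real data.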
  • 21. Lesson #3 - Understand Cypher
       ● Cypher experimentation opens up the possible
       ● SQL users will be at home - tabular results, similar syntax
       ● Start without parameters, check with Neo4j shell, move to parameterized queries for security & performance (caching)
       ● Don't forget Lucene syntax
       ● Continues to evolve for the better - check new release changes (http://docs.neo4j.org/refcard/1.9/)
       ● Let Cypher do the work
  • 22. Useful Cypher Syntax
       START with an index
       MATCH defines your universe
       WHERE filters it down
       WITH combines multiple statements
       HAS checks if a property exists
       AS lets you name your return values
       IN checks against an array
       COLLECT aggregates into an array
       ORDER just like SQL
       LIMIT for performance
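
To show how several of these clauses fit together, here is a hedged sketch of a parameterized query in the Cypher 1.9 style that the refcard link points to. The index name `keywords`, the `name` and `title` properties, and the direction of the `CONTAINS` relationship are assumptions, not details from the talk. The query is held as a Python string and is not sent to a server here.

```python
# Hypothetical query: keywords matching a Lucene prefix, ranked by the
# number of documents that contain them. Written in Cypher 1.9 style.
TOP_KEYWORDS_QUERY = """
START k=node:keywords({lucene_query})
MATCH k<-[:CONTAINS]-doc
WHERE HAS(doc.title)
WITH k, COUNT(doc) AS doc_count, COLLECT(doc.title) AS titles
RETURN k.name AS keyword, doc_count, titles
ORDER BY doc_count DESC
LIMIT 10
"""


def keyword_query_params(prefix):
    """Build the parameter map; the Lucene query string is passed as a parameter."""
    # Note: Lucene special characters in `prefix` are not escaped in this sketch.
    return {"lucene_query": "name:%s*" % prefix.lower()}


params = keyword_query_params("Graph")
```

Passing the Lucene string as a parameter, rather than concatenating it into the query text, follows the slide's advice to move to parameterized queries once the shape of the query is settled.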
  • 23. Prototype highlights
       ● 4 people & 4 months (first version)
       ● Data Stores - Neo4j, MongoDB, Redis, Postgres
       ● Visuals - D3.js, Vivagraph.js, Twitter Bootstrap
       ● Key Languages/Libraries - Ruby, Rails, Cypher, Knockout.js, Amplify.js, HTML5, CSS3, jQuery, Neography gem, Resque gem
       ● 3rd Party - Alchemy, OpenCalais, RSS feeds, Wikipedia
       ● Concepts - natural language processing, named entity extraction, text cleansing & de-duplication (map/reduce), similarity hashing, large-scale information retrieval
       ● 1M+ nodes, 3M+ relationships, 6M+ properties after 6 months
  • 25. Tag Cloud / Search
       [Diagram: DOC nodes linked to C and K nodes]
       ● Index keywords and search across keywords (tip: use Lucene syntax)
       ● Identify documents with strong relationships to keywords
       ● Locate concepts with strongest relationships to relevant documents
       ● Popularity based on number of incoming relationships
  • 26. Emerging Index / Popularity / Doc List
       [Diagram: documents marked DOC (E) or DOC (NE), linked to Cloud computing (Concept) and Google (Org)]
       ● Strong relationships with documents shared between concepts to filter and rank relevant content
       ● Ratio and strength of relationships to quantify emerging index
       ● Popularity based on number of incoming relationships of each type of document (emerging versus non-emerging)
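
The slides do not give the formula behind the emerging index. One plausible reading of "ratio and strength of relationships" is the weighted share of a concept's incoming relationships that come from emerging documents. The sketch below assumes exactly that; `emerging_index` and the sample weights are hypothetical.

```python
def emerging_index(doc_links):
    """Weighted share of relationships that come from emerging documents.

    `doc_links` is a list of (is_emerging, weight) pairs, one per incoming
    relationship from a document to the concept being scored.
    """
    total = sum(weight for _, weight in doc_links)
    if total == 0:
        return 0.0  # no referring documents, so nothing to score
    emerging = sum(weight for is_emerging, weight in doc_links if is_emerging)
    return emerging / total


# Two emerging (E) documents and one non-emerging (NE) document.
links = [(True, 2.0), (True, 1.0), (False, 1.0)]
score = emerging_index(links)
```

In this made-up example the emerging documents carry 3.0 of the 4.0 total weight, giving a score of 0.75. Popularity, by contrast, would simply be the count of incoming relationships per document type.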
  • 27. Node Graph
       [Diagram: DOC nodes linked to C, K, and O nodes]
       ● Existing relationships with documents shared between concepts to filter relevant neighbors
       ● Strength of relationships based on # and weight for ranking relevance (color)
  • 29. Final Thoughts
       ● Graphs make it simple to generate complex insights - you don't need to be a data scientist
       ● Graphs are a natural fit for anything connected...which is most things (e.g. social media, internet of things, sensor data)
       ● Experimentation is the best way to learn the power of graphs
       ● Make graph databases a first class citizen in your technology toolkit - many things can be solved better with a graph
       The best way to discover emerging technologies is to try them out
  • 30. Thanks for Listening - Q & A
       Special thanks to Max De Marzi for his neography gem (https://github.com/maxdemarzi/neography) and ongoing advice, suggestions, troubleshooting