GraphAware®
SIGNALS FROM OUTER SPACE
Vlasta Kůs, Data Scientist @ GraphAware
graphaware.com
@graph_aware, @VlastaKus
How NASA Benefits from Graph-Powered NLP
‣ Database of learned knowledge across NASA’s programs & projects
‣ Unstructured text with basic metadata
‣ Collected since the late 1950s (100s of millions of documents)
‣ Public dataset: ~1600 documents
NASA’s Lessons Learned
"1406",420,"Roberts, J ",
"VO'75 Pressure Regulator Leakage and Work-Around Procedures (~1976)",
"The pressure regulator in the Viking Orbiter Propulsion
Subsystem started leaking following a pyro firing that
occurred prior to the near-Mars TCM. Likely causes were
corrosion or residue from propellant migration or pyro
valve blowby, or particulate contamination. Recommendations
included using separate regulators for the fuel and
oxidizer sides, incorporating a bellows in the pyro valve
to eliminate blowby, and adding a isolation valve between
the regulator and propellant tank.",
" The micro-scale effects of long-term propellant exposure
should be investigated in order to better critique
regulator design. ",
"JPL",1996-07-08,"",TRUE,"",1460,7,NA,"https://nen.nasa.gov/web/11/viewall/-/viewall/420"
NASA’s Lessons Learned Database
"673",1326,"Relvini, Kristine ",
"Lessons Learned Not Being Inputted Into Lessons Learned Information System (LLIS) Database",
"",
"If you don't document the lessons learned, you loose
knowledgeable, shared information and tracking capacity
across programs.",
"KSC",2002-10-11,"Aeronautics Research, Science, Exploration Systems, Space Operations, ",FALSE,"",702,6,NA,"https://nen.nasa.gov/web/11/viewall/-/viewall/1326"
NASA’s Lessons Learned
Graph database = isolated data silos -> connected knowledge
‣ Efficient search
‣ Relationships among various areas
Apollo, Space Shuttle, Orion, …
‣ Pattern recognition (clusters, communities, correlations, …)
Example: correlation between corrosion of valves & topics involving batteries
‣ Useful for planning future projects and preventing/solving issues
NASA’s Lessons Learned
What is a Graph?
G = (V, E)
WHY NEO4J?
It is a proper graph database
It is a proper database
Graph-Based Architecture: Knowledge Graph
EXAMPLE
‣ NLP = machine learning tools allowing computers to process - and
perhaps understand - human languages
‣ Basic steps
Sentence segmentation
Tokenisation
Lemmatisation
Part of Speech (POS) tagging
Parsing
Named Entity Recognition (NER)
Sentiment analysis
…
Natural Language Processing
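The first two basic steps, sentence segmentation and tokenisation, can be sketched with regular expressions. This is a deliberately naive illustration, not how Stanford CoreNLP or Apache OpenNLP work internally: abbreviations such as "Dr." would be split incorrectly. The function names and the sample text are assumptions made for this sketch.

```python
import re


def split_sentences(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenise(sentence):
    """Naive tokenisation: words (allowing inner ' or -) or single punctuation marks."""
    return re.findall(r"\w+(?:['-]\w+)*|[^\w\s]", sentence)


# Illustrative text, loosely modelled on the lesson shown earlier.
text = "The regulator started leaking. Likely causes were corrosion or residue!"
sentences = split_sentences(text)
tokens = [tokenise(s) for s in sentences]

for sentence, sentence_tokens in zip(sentences, tokens):
    print(sentence, "->", sentence_tokens)
```

Later steps (lemmatisation, POS tagging, parsing, NER) need trained models, which is where the toolkits come in.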
Currently supported toolkits for human language processing
‣ Stanford CoreNLP
‣ developed at Stanford University
‣ fast, robust, production ready
‣ many pre-built models
‣ license: GPL v3+
‣ Apache OpenNLP
‣ developed by volunteers
‣ many pre-built models
‣ license: Apache License v2.0
NLP: Text Processors
‣ Named Entity Recognition (NER) = classification of words into predefined
classes
‣ Examples: Dr. Who -> Person, May 2018 -> Date, EU -> Country …
‣ Stanford NLP default entities: Person, Location, Date, Organisation,
Number, Money, Percentage
‣ Custom NE classes -> training on large tokenised & labeled corpus
‣ Wikipedia, Wikidata - rich sources of multilingual training data that can
be extracted automatically
Named Entity Recognition
Custom Named Entities based on Wikipedia
NASA use case: identify names of space missions
Training - crawling Wikipedia & identifying relevant information
EXAMPLE
Universal Dependencies: cross-linguistically consistent grammatical relations
among words in a sentence
Examples:
‣ amod (adjectival modifier)
Matt likes red wine.
‣ appos (appositional modifier)
Mars Global Surveyor (MGS) was an American robotic spacecraft …
‣ conj (conjunct)
It failed to respond to messages and commands.
‣ …
Universal Dependencies
‣ Stanford CoreNLP: Dependency & Part of Speech analysis of a single sentence
Source: http://nlp.stanford.edu:8080/corenlp/process
Either find an efficient representation in some traditional database, or …
Graph-Powered NLP
NLP and property graphs: natural fit
… use a property graph!
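To illustrate why annotated text maps naturally onto a property graph, here is a minimal in-memory sketch: tokens become nodes carrying properties (value, part of speech), and dependency relations become typed relationships. The class names, node labels, property keys and relationship types are assumptions made for this example, not GraphAware's actual schema or the Neo4j API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int
    end: int
    rel_type: str


class PropertyGraph:
    """Tiny in-memory property graph (hypothetical helper, not a Neo4j client)."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, label, **properties):
        node = Node(len(self.nodes), label, properties)
        self.nodes[node.node_id] = node
        return node.node_id

    def add_relationship(self, start, end, rel_type):
        self.relationships.append(Relationship(start, end, rel_type))

    def outgoing(self, node_id, rel_type):
        """Nodes reached from `node_id` over relationships of type `rel_type`."""
        return [self.nodes[r.end] for r in self.relationships
                if r.start == node_id and r.rel_type == rel_type]


graph = PropertyGraph()
sentence = graph.add_node("Sentence", text="Matt likes red wine.")

# Tokens with their part-of-speech tags, attached to the sentence.
ids = {}
for word, pos in [("Matt", "PROPN"), ("likes", "VERB"), ("red", "ADJ"), ("wine", "NOUN")]:
    ids[word] = graph.add_node("Token", value=word, pos=pos)
    graph.add_relationship(sentence, ids[word], "HAS_TOKEN")

# Dependency relations as typed edges from head to dependent.
graph.add_relationship(ids["likes"], ids["Matt"], "NSUBJ")
graph.add_relationship(ids["likes"], ids["wine"], "OBJ")
graph.add_relationship(ids["wine"], ids["red"], "AMOD")

modifiers = [n.properties["value"] for n in graph.outgoing(ids["wine"], "AMOD")]
print(modifiers)
```

A question such as "which adjectives modify this noun?" becomes a one-hop traversal, which is the kind of query a graph database is built to answer.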
EXAMPLE
Unsupervised techniques tend to be underestimated, while …
‣ No need for time & money to get massive labeled training datasets
‣ Often faster to train & faster to predict
‣ Unsupervised deep learning
Unsupervised ML Algorithms
PageRank
PageRank = a measure of importance of a web page based on the quality
of links from other pages
The formula reflects a model of a random surfer.
Source: https://en.wikipedia.org/wiki/PageRank
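The random-surfer model can be sketched as a short power iteration: with probability `damping` the surfer follows a link from the current page, otherwise jumps to a random page. The damping value of 0.85, the iteration count and the four-page link structure are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict page -> rank; ranks sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link structure: C is linked to by A, B and D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page C ends up with the highest rank because three pages link to it, while D, which nothing links to, keeps only the random-jump share.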
Keyword Extraction: TextRank
Keywords = words/phrases that capture the semantic essence of a text
Graph-Based Unsupervised Algorithm:
‣ Construct a graph of word co-occurrences
‣ Assess the importance of words by PageRank algorithm
‣ Use top 1/3 of words as keyword candidates
‣ Use universal dependencies to construct key phrases
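The first three steps above can be sketched as follows. This is a simplified illustration rather than GraphAware's implementation: a small stopword list stands in for the part-of-speech filter used in the TextRank paper, co-occurrence is counted across sentence boundaries, and the window size, damping factor and sample text are assumptions. The final step (building key phrases from universal dependencies) is not covered.

```python
import re
from collections import defaultdict

# Assumed stopword list; a real pipeline would filter by POS tag instead.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "to", "was", "were",
             "for", "is", "by", "on", "with"}


def textrank_keywords(text, window=2, damping=0.85, iterations=30):
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # 1. Co-occurrence graph: undirected edge between words that appear
    #    at most `window` positions apart.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    # 2. PageRank over the co-occurrence graph.
    score = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        score = {
            w: (1 - damping)
            + damping * sum(score[n] / len(neighbours[n]) for n in neighbours[w])
            for w in neighbours
        }

    # 3. Keep the top third of words as keyword candidates.
    ranked = sorted(score, key=lambda w: (-score[w], w))
    return ranked[:max(1, len(ranked) // 3)]


# Illustrative text, not taken from the NASA dataset.
text = ("The space shuttle launch was delayed. "
        "The shuttle flight hardware failed a test. "
        "Shuttle hardware and launch hardware were inspected.")
keywords = textrank_keywords(text)
print(keywords)
```

Words that co-occur with many different neighbours, such as "shuttle" and "hardware" here, collect the highest scores.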
Rada Mihalcea, Paul Tarau. TextRank: Bringing Order into Texts. Proceedings of EMNLP 2004, pages 404–411, Barcelona,
Spain. Association for Computational Linguistics. http://www.aclweb.org/anthology/W04-3252.
Keyword Extraction: TextRank
Despite its simplicity, TextRank provides
state-of-the-art results on a wide range of
unstructured texts.
Leveraging universal dependencies allowed
us to surpass precision & recall of the
original TextRank paper.
NASA examples: “space shuttle”, “flight hardware”, “launch vehicle”, …
Automatic text summarisation
‣ Abstractive
‣ Extractive
TextRank can be adapted for efficient
sentence ranking for extractive summarisation.
Summarisation: TextRank
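A minimal sketch of sentence ranking for extractive summarisation: sentences are nodes, edge weights measure how similar two sentences are, and a weighted PageRank picks the most central ones. Jaccard word overlap is used as the similarity here purely for brevity; it is an assumption of this sketch, not the measure from the TextRank paper, and the sample sentences are illustrative.

```python
import re


def rank_sentences(sentences, damping=0.85, iterations=30):
    """Score each sentence by weighted PageRank over a similarity graph."""
    bags = [set(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    n = len(sentences)

    # Edge weight = Jaccard overlap between the two sentences' word sets.
    weight = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and bags[i] | bags[j]:
                weight[i][j] = len(bags[i] & bags[j]) / len(bags[i] | bags[j])

    out_sum = [sum(row) for row in weight]
    score = [1.0] * n
    for _ in range(iterations):
        score = [
            (1 - damping) + damping * sum(
                weight[j][i] / out_sum[j] * score[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return score


def summarise(sentences, top_k=1):
    """Return the `top_k` best-ranked sentences in their original order."""
    scores = rank_sentences(sentences)
    best = sorted(range(len(sentences)), key=lambda i: -scores[i])[:top_k]
    return [sentences[i] for i in sorted(best)]


sentences = [
    "The pressure regulator started leaking after a pyro firing.",
    "Corrosion or residue may have caused the regulator leak.",
    "The crew enjoyed lunch.",
    "A leaking pressure regulator can be isolated with a valve.",
]
print(summarise(sentences, top_k=2))
```

The off-topic sentence shares almost no vocabulary with the others, so it receives little score and is left out of the summary.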
EXAMPLE
ConceptNet 5 = semantic network for understanding the meaning of words
‣ Relational knowledge from MIT’s Open Mind Common Sense project
‣ DBPedia (information from Wikipedia info-boxes)
‣ Wiktionary (free multilingual dictionary)
‣ …
Knowledge Enrichment: ConceptNet 5
Microsoft Concept Graph = semantic network introducing knowledge
about concepts
‣ harnessed from billions of web pages and years’ worth of search logs
Expand the knowledge from external or other internal sources.
Knowledge Enrichment
‣ Latent Dirichlet Allocation (LDA) - generative statistical model that
describes documents as a probabilistic mixture of a small number of topics
‣ Each topic described by a list of most relevant words
‣ Sample of topics from the NASA dataset
["design", "failure", "test", "result", "flight", "hardware", "mission", "testing", "system", "due"]
["pressure", "system", "cause", "valve", "propellant", "leak", "operation", "shuttle", "space", "gas"]
["space", "shuttle", "NASA", "operation", "safety", "iss", "crew", "ISS", "astronaut", "program"]
Topic Extraction: Latent Dirichlet Allocation
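The "generative" reading of LDA can be sketched by sampling a document: draw a topic mixture from a Dirichlet distribution, then for each word position pick a topic from that mixture and a word from that topic. The topic words below are borrowed from the sample topics above, but their probabilities, the Dirichlet parameters and the document length are made up for illustration. In full LDA the topics themselves are also drawn from a Dirichlet prior; they are fixed here to keep the sketch short.

```python
import random

# Hypothetical topics: words from the slides, probabilities invented.
TOPICS = [
    {"valve": 0.4, "pressure": 0.3, "propellant": 0.2, "leak": 0.1},
    {"shuttle": 0.4, "crew": 0.3, "safety": 0.2, "astronaut": 0.1},
]


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution via normalised Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Sample a document; returns (topic mixture, list of words)."""
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    words = []
    for _ in range(length):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # pick a topic
        vocab, probs = zip(*topics[z].items())
        words.append(rng.choices(vocab, weights=probs)[0])     # pick a word
    return theta, words


rng = random.Random(42)
theta, words = generate_document(TOPICS, alpha=[0.5, 0.5], length=12, rng=rng)
print([round(t, 2) for t in theta], words)
```

Fitting LDA runs this process in reverse: given only the words, it infers the topic mixtures and the per-topic word lists.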
EXAMPLE
‣ Word embeddings = representation of words as multi-dimensional
semantic vectors which encode linguistic patterns
‣ Use cases: word sense disambiguation, new distance functions between
documents, starting point for further ML (e.g. NN classification)
‣ Word2vec = shallow two-layer neural network model for producing word
embeddings
‣ ConceptNet Numberbatch - consists of state-of-the-art word embeddings
Word Embeddings
Word Embeddings: word2vec
Tomas Mikolov et al.: https://arxiv.org/abs/1301.3781
Word Embeddings: word2vec
Kusner et al.: http://mkusner.github.io/publications/WMD.pdf
Document distance: min. cumulative distance that all words need to travel
Semantic patterns representable as linear translations:
distance(Oslo -> Norway) similar to distance(Berlin -> Germany)
vec(Germany) - vec(Berlin) + vec(Oslo) = vec(Norway)
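The linear-translation property can be tried out with toy vectors. The 3-dimensional vectors below are hand-made for illustration and are not real word2vec embeddings (which typically have hundreds of dimensions); with real embeddings the relation holds only approximately, so the answer is taken as the nearest word by cosine similarity, excluding the three input words.

```python
import math

# Hand-made toy vectors (assumption): dimensions loosely mean
# "is a country", "relates to Germany", "relates to Norway".
VECTORS = {
    "Berlin":  [0.1, 0.9, 0.0],
    "Germany": [0.9, 1.0, 0.1],
    "Oslo":    [0.0, 0.1, 0.9],
    "Norway":  [0.8, 0.1, 1.0],
    "France":  [0.9, 0.3, 0.3],
    "wine":    [0.2, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Word closest to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# "Berlin is to Germany as Oslo is to ...?"
result = analogy("Berlin", "Germany", "Oslo", VECTORS)
print(result)
```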
Document Embeddings
Q. Le, T. Mikolov: Distributed representations of sentences and documents, arXiv:1405.4053v2
Paragraph Vector (doc2vec): extension of word2vec
The additional paragraph node represents context (topic) of the current document.
Paragraph vectors have the same behaviour towards linear vector translations as
word vectors.
Document Embeddings
doc2vec vectors of dimension 300, NASA sentences -> dimensionality reduction (PCA + t-SNE)
Document Embeddings
doc2vec vectors of dimension 2000, 30k Wikipedia pages -> dimensionality reduction (PCA + t-SNE)
Some of the neural networks applicable to text processing
‣ Shallow networks (word & document embeddings)
‣ Deep Auto-Encoders
‣ Convolutional Neural Networks
‣ Recurrent Neural Networks (LSTMs)
Deep Learning for Text Processing
Self-supervised Auto-Encoders: useful for vector embeddings (images, texts)
DeepLearning4J - Java-based deep learning library
Example of auto-encoder (e.g. stacked RBMs) …
Deep Learning: Auto-encoders
Works well for images, but problematic for texts (sparsity).
Convolutional Neural Networks
Y. Zhang, B. Wallace: arXiv:1510.03820
Classification of documents based on word embeddings and CNN
Deep Learning: Summarisation
S. Narayan et al.: Ranking Sentences for Extractive Summarisation with Reinforcement learning, arXiv:1802.08636
Deep Learning: Summarisation
Extractive summarisation (sentence ranking) notably outperforms abstractive.
Knowledge Graphs are a powerful problem-solving tool
‣ Augmented search
‣ Actionable knowledge
‣ Machine Learning
‣ Chatbots and Question answering systems
‣ Foundational to AI
Conclusion
www.graphaware.com @graph_aware

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sectoritnewsafrica
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 

Kürzlich hochgeladen (20)

Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 

GraphAware Signals from Space: How NASA Benefits from Graph-Powered NLP

  • 1. GraphAware® SIGNALS FROM OUTER SPACE Vlasta Kůs, Data Scientist @ GraphAware graphaware.com @graph_aware, @VlastaKus How NASA Benefits from Graph-Powered NLP
  • 2. ‣ Database of learned knowledge across NASA’s programs & projects ‣ Unstructured text with basic metadata ‣ Collected since late 1950s (100s of millions of documents) ‣ Public dataset: ~1600 documents NASA’s Lessons Learned GraphAware®
  • 3. "1406",420,"Roberts, J “, "VO'75 Pressure Regulator Leakage and Work-Around Procedures (~1976)”, "The pressure regulator in the Viking Orbiter Propulsion Subsystem started leaking following a pyro firing that occurred prior to the near-Mars TCM. Likely causes were corrosion or residue from propellant migration or pyro valve blowby, or particulate contamination. Recommendations included using separate regulators for the fuel and oxidizer sides, incorporating a bellows in the pyro valve to eliminate blowby, and adding a isolation valve between the regulator and propellant tank.“, " The micro-scale effects of long-term propellant exposure should be investigated in order to better critique regulator design. “, "JPL",1996-07-08,"",TRUE,"",1460,7,NA,"https:// nen.nasa.gov/web/11/viewall/-/viewall/420" NASA’s Lessons Learned Database GraphAware®
  • 4. “673",1326,"Relvini, Kristine “, "Lessons Learned Not Being Inputted Into Lessons Learned Information System (LLIS) Database”, “", "If you don't document the lessons learned, you loose knowledgeable, shared information and tracking capacity across programs.“, "KSC",2002-10-11,"Aeronautics Research, Science, Exploration Systems, Space Operations, ",FALSE,"", 702,6,NA,"https://nen.nasa.gov/web/11/viewall/-/ viewall/1326" NASA’s Lessons Learned GraphAware®
  • 5. Graph database = isolated data silos -> connected knowledge ‣ Efficient search ‣ Relationships among various areas Apollo, Space Shuttle, Orion, … ‣ Pattern recognition (clusters, communities, correlations, …) Example: correlation between corrosion of valves & topics involving batteries ‣ Useful for planning future projects and preventing/solving issues NASA’s Lessons Learned GraphAware®
  • 6. What is a Graph? GraphAware® G = (V, E)
  • 7. WHY NEO4J? GraphAware® It is a proper graph database It is a proper database
  • 10. ‣ NLP = machine learning tools allowing computers to process - and perhaps understand - human languages ‣ Basic steps Sentence segmentation Tokenisation Lemmatisation Part of Speech (POS) tagging Parsing Named Entity Recognition (NER) Sentiment analysis … Natural Language Processing GraphAware®
  • 11. Currently supported toolkits for human language processing ‣ Stanford CoreNLP ‣ developed at Stanford University ‣ fast, robust, production ready ‣ many pre-built models ‣ license: GPL v3+ ‣ Apache OpenNLP ‣ developed by volunteers ‣ many pre-built models ‣ license: Apache License v2.0 NLP: Text Processors GraphAware®
  • 12. ‣ Named Entity Recognition (NER) = classification of words into predefined classes ‣ Examples: Dr. Who -> Person, May 2018 -> Date, EU -> Country … ‣ Stanford NLP default entities: Person, Location, Date, Organisation, Number, Money, Percentage ‣ Custom NE classes -> training on large tokenised & labeled corpus ‣ Wikipedia, Wikidata - rich sources of multilingual training data that can be extracted automatically Named Entity Recognition GraphAware®
  • 13. Custom Named Entities based on Wikipedia GraphAware® NASA use case: identify names of space missions Training - crawling Wikipedia & identifying relevant information
  • 15. Universal Dependencies: cross-linguistically consistent grammatical relations among words in a sentence Examples: ‣ amod (adjectival modifier) Matt likes red wine. ‣ appos (appositional modifier) Mars Global Surveyor (MGS) was an American robotic spacecraft … ‣ conj (conjunct) It failed to respond to messages and commands. ‣ … Universal Dependencies GraphAware®
  • 16. ‣ Stanford CoreNLP: Dependency & Part of Speech analysis of a single sentence Source: http://nlp.stanford.edu:8080/corenlp/process Either find an efficient representation in some traditional database, or … Graph-Powered NLP GraphAware®
  • 17. Graph-Powered NLP GraphAware® NLP and property graphs: natural fit … use a property graph!
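To make the "natural fit" concrete, here is a minimal sketch of one sentence stored as a property graph, using plain Python objects rather than a database. The `Node` and `Relationship` classes, the `NEXT` and `DEPENDENCY` relationship types, and the part-of-speech and dependency annotations are all illustrative assumptions written by hand for the example sentence "Matt likes red wine."; they are not output from Stanford CoreNLP and not the schema used in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int
    type: str
    end: int
    props: dict = field(default_factory=dict)


# Hand-annotated tokens (text, part of speech) for the example sentence.
tokens = [("Matt", "PROPN"), ("likes", "VERB"), ("red", "ADJ"), ("wine", "NOUN")]
nodes = [Node(i, "Token", {"text": t, "pos": p}) for i, (t, p) in enumerate(tokens)]

# Word order is kept as NEXT relationships between consecutive tokens.
rels = [Relationship(i, "NEXT", i + 1) for i in range(len(nodes) - 1)]

# Hand-annotated dependencies as (head index, relation, dependent index).
for head, dep_type, dependent in [(1, "nsubj", 0), (1, "obj", 3), (3, "amod", 2)]:
    rels.append(Relationship(head, "DEPENDENCY", dependent, {"type": dep_type}))


def dependents(head_text, dep_type):
    """Return the texts of tokens attached to `head_text` by `dep_type`."""
    by_id = {n.id: n for n in nodes}
    return [
        by_id[r.end].props["text"]
        for r in rels
        if r.type == "DEPENDENCY"
        and r.props["type"] == dep_type
        and by_id[r.start].props["text"] == head_text
    ]


print(dependents("wine", "amod"))    # adjectival modifiers of "wine"
print(dependents("likes", "nsubj"))  # subject of "likes"
```

Tokens, their order and their grammatical relations all live in the same structure, so a question such as "which adjectives modify this noun?" becomes a one-hop traversal.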
  • 19. Unsupervised techniques tend to be underestimated, while … ‣ No need for time & money to get massive labeled training datasets ‣ Often faster to train & faster to predict ‣ Unsupervised deep learning Unsupervised ML Algorithms GraphAware®
  • 20. PageRank GraphAware® PageRank = a measure of importance of a web page based on the quality of links from other pages The formula reflects a model of a random surfer. Source: https://en.wikipedia.org/wiki/PageRank
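The random-surfer model can be sketched as a short power iteration. This is an illustrative implementation only: the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the four-page `toy_web` graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                # The surfer follows one of the outgoing links at random.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no links spreads its rank evenly.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy web: C is linked to by A, B and D; nothing links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(score, 3))
```

In this toy graph page C collects links from three pages and ends up with the highest rank, while D, which nobody links to, keeps only the random-jump share.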
  • 21. Keyword Extraction: TextRank GraphAware® Keywords = words/phrases that capture the semantic essence of a text Graph-Based Unsupervised Algorithm: ‣ Construct a graph of word co-occurrences ‣ Assess the importance of words by PageRank algorithm ‣ Use top 1/3 of words as keyword candidates ‣ Use universal dependencies to construct key phrases
  • 22. GraphAware® Rada Mihalcea, Paul Tarau. TextRank: Bringing Order into Texts. Proceedings of EMNLP 2004, pages 404–411, Barcelona, Spain. Association for Computational Linguistics. http://www.aclweb.org/anthology/W04-3252. Keyword Extraction: TextRank Despite its simplicity, TextRank provides state-of-the-art results on a wide range of unstructured texts. Leveraging universal dependencies allowed us to surpass the precision & recall of the original TextRank paper. NASA examples: “space shuttle”, “flight hardware”, “launch vehicle”, …
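The first three steps of the keyword algorithm (co-occurrence graph, PageRank-style scoring, top third of words) can be sketched in plain Python. Several simplifications are assumptions of this sketch and not the approach from the talk: a small hand-picked stopword list stands in for part-of-speech filtering, the co-occurrence window is two words, and the final step of building key phrases from universal dependencies is left out. The sample text is a made-up paraphrase loosely based on the Viking regulator record.

```python
import re
from collections import defaultdict

# Assumption: a tiny stopword list replaces proper part-of-speech filtering.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in",
             "was", "were", "after", "between", "for"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank on an undirected co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Connect each word to the words that follow it within the window.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    # Unnormalised PageRank update as used in the TextRank paper.
    score = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        score = {
            w: (1.0 - damping)
            + damping * sum(score[v] / len(neighbours[v]) for v in neighbours[w])
            for w in neighbours
        }

    ranked = sorted(score, key=score.get, reverse=True)
    # Keep the top third of the distinct words as keyword candidates.
    return ranked[: max(1, len(ranked) // 3)]


sample = ("The pressure regulator started leaking after a pyro firing. "
          "Leaking was traced to the regulator and the pyro valve. "
          "A new regulator design adds an isolation valve.")
keywords = textrank_keywords(sample)
print(keywords)
```

Words that co-occur with many different neighbours, such as "regulator" in the sample, accumulate the most score, which is why no labelled training data is needed.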
  • 23. Automatic text summarisation ‣ Abstractive ‣ Extractive TextRank can be adapted for efficient sentence ranking for extractive summarisation. Summarisation: TextRank GraphAware®
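A minimal sketch of how TextRank can rank whole sentences for extractive summarisation follows. Sentences become nodes, and edge weights follow the word-overlap similarity from the TextRank paper (shared words divided by the sum of the log sentence lengths). The regex-based sentence splitter, the absence of stopword removal, and the four-sentence sample text are assumptions made for the illustration.

```python
import math
import re


def summarise(text, top_n=1, damping=0.85, iterations=50):
    """Return the `top_n` highest-ranked sentences in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    bags = [set(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    n = len(sentences)

    def similarity(a, b):
        # Overlap normalised by sentence length; guard against log(1) = 0.
        if len(a) < 2 or len(b) < 2:
            return 0.0
        return len(a & b) / (math.log(len(a)) + math.log(len(b)))

    weights = [[similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_weight = [sum(row) for row in weights]

    # Weighted PageRank: each sentence passes score along its edge weights.
    score = [1.0] * n
    for _ in range(iterations):
        score = [
            (1.0 - damping)
            + damping * sum(weights[j][i] * score[j] / out_weight[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]

    best = sorted(range(n), key=lambda i: score[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


sample = ("Regulator leakage followed pyro firing. "
          "Regulator leakage suggests valve corrosion near pyro hardware. "
          "Valve corrosion needs investigation. "
          "Lunch arrived late.")
print(summarise(sample, top_n=1))
```

The second sentence shares vocabulary with both the first and the third, so it sits at the centre of the similarity graph and is selected as the one-sentence summary; the unrelated last sentence receives only the baseline score.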
  • 25. ConceptNet 5 = semantic network for understanding the meaning of words ‣ Relational knowledge from MIT’s Open Mind Common Sense project ‣ DBPedia (information from Wikipedia info-boxes) ‣ Wiktionary (free multilingual dictionary) ‣ … Knowledge Enrichment: ConceptNet 5 GraphAware® Microsoft Concept Graph = semantic network introducing knowledge about concepts ‣ harnessed from billions of web pages and years’ worth of search logs
  • 26. Expand the knowledge from external or other internal sources. Knowledge Enrichment GraphAware®
  • 27. ‣ Latent Dirichlet Allocation (LDA) - generative statistical model that describes documents as a probabilistic mixture of a small number of topics ‣ Each topic described by a list of most relevant words ‣ Sample of topics from the NASA dataset ["design", "failure", "test", "result", "flight", "hardware", "mission", "testing", "system", "due"] ["pressure", "system", "cause", "valve", "propellant", "leak", "operation", "shuttle", "space", "gas"] ["space", "shuttle", "NASA", "operation", "safety", "iss", "crew", "ISS", "astronaut", "program"] Topic Extraction: Latent Dirichlet Allocation GraphAware®
  • 29. ‣ Word embeddings = representation of words as multi-dimensional semantic vectors which encode linguistic patterns ‣ Use cases: word sense disambiguation, new distance functions between documents, starting point for further ML (e.g. NN classification) ‣ Word2vec = shallow two-layer neural network model for producing word embeddings ‣ ConceptNet Numberbatch - consists of state-of-the-art word embeddings Word Embeddings GraphAware®
  • 30. Word Embeddings: word2vec GraphAware® Tomas Mikolov et al.: https://arxiv.org/abs/1301.3781
  • 31. Word Embeddings: word2vec GraphAware® Kusner et al.: http://mkusner.github.io/publications/WMD.pdf Document distance: min. cumulative distance that all words need to travel Semantic patterns representable as linear translations: distance(Oslo -> Norway) similar to distance(Berlin -> Germany) vec(Germany) - vec(Berlin) + vec(Oslo) = vec(Norway)
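The linear-translation property can be illustrated with a few lines of vector arithmetic. The three-dimensional vectors below are invented by hand so that the analogy works exactly; they are not real word2vec embeddings, which typically have hundreds of dimensions and only satisfy such relations approximately. The `analogy` helper is likewise a hypothetical name for this sketch.

```python
import math

# Hand-made toy vectors (assumption): dimensions loosely mean
# [German-ness, country-ness, Norwegian-ness].
toy_vectors = {
    "Berlin":  [1.0, 0.0, 0.0],
    "Germany": [1.0, 1.0, 0.0],
    "Oslo":    [0.0, 0.0, 1.0],
    "Norway":  [0.0, 1.0, 1.0],
    "wine":    [0.2, 0.1, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def analogy(a, b, c, vectors):
    """Solve 'a is to b as c is to ?' via vec(b) - vec(a) + vec(c)."""
    query = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


print(analogy("Berlin", "Germany", "Oslo", toy_vectors))
```

The query vector is compared with every remaining word by cosine similarity, and the input words are excluded from the candidates, which is the usual convention when evaluating such analogies.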
  • 32. Document Embeddings GraphAware® Q. Le, T. Mikolov: Distributed representations of sentences and documents, arXiv:1405.4053v2 Paragraph Vector (doc2vec): extension of word2vec The additional paragraph node represents context (topic) of the current document. Paragraph vectors have the same behaviour towards linear vector translations as word vectors.
  • 33. Document Embeddings GraphAware® doc2vec vectors of dimension 300, NASA sentences -> dimensionality reduction (PCA + t-SNE)
  • 34. Document Embeddings GraphAware® doc2vec vectors of dimension 2000, 30k Wikipedia pages -> dimensionality reduction (PCA + t-SNE)
  • 35. Some of the neural networks applicable to text processing ‣ Shallow networks (word & document embeddings) ‣ Deep Auto-Encoders ‣ Convolutional Neural Networks ‣ Recurrent Neural Networks (LSTMs) Deep Learning for Text Processing GraphAware®
  • 36. Self-supervised Auto-Encoders: useful for vector embeddings (images, texts) DeepLearning4J - Java-based deep learning library Example of auto-encoder (e.g. stacked RBMs) … Deep Learning: Auto-encoders GraphAware® Works well for images, but problematic for texts (sparsity).
  • 37. Convolutional Neural Networks GraphAware® Y. Zhang, B. Wallace: arXiv:1510.03820 Classification of documents based on word embeddings and CNN
  • 38. Deep Learning: Summarisation GraphAware® S. Narayan et al.: Ranking Sentences for Extractive Summarisation with Reinforcement learning, arXiv:1802.08636
  • 39. Deep Learning: Summarisation GraphAware® Extractive summarisation (sentence ranking) notably outperforms abstractive. S. Narayan et al.: Ranking Sentences for Extractive Summarisation with Reinforcement learning, arXiv:1802.08636
  • 40. Knowledge Graphs are a powerful problem-solving tool ‣ Augmented search ‣ Actionable knowledge ‣ Machine Learning ‣ Chatbots and Question answering systems ‣ Foundational to AI Conclusion GraphAware®