Graph-based machine learning is becoming an important trend in Artificial Intelligence, cutting across many other techniques. The world's largest companies are promoting this trend. For instance, Google's Expander platform combines semi-supervised machine learning with large-scale graph-based learning: it builds a multi-graph representation of the data, with nodes corresponding to objects or concepts and edges connecting concepts that share similarities.
Using graphs as the basic representation of data for machine learning purposes has several advantages: (i) the data is already modelled for further analysis, explicitly representing connections and relationships between things and concepts; (ii) graphs can easily combine multiple sources into a single graph representation and learn over them, creating Knowledge Graphs; (iii) many machine learning algorithms exploit graph structure to improve computational performance and result quality.
The presentation illustrates the advantages above and presents applications, such as recommendation engines and natural language processing, that use machine learning over a graph. Concrete scenarios, models, and end-to-end infrastructure will be discussed.
2. “We firmly believe that it's at the intersection of machine learning
and graph technology where the next evolution lies and where new
disruptive companies are emerging,”
Ash Damle, Founder and CEO @ Lumiata
GraphAware®
3. “There are a variety of use cases where a graph database is a better
fit than other database management systems including relational or
general NoSQL database systems.”
Matthias Broecheler, Chief Technologist @ DataStax
4. “Machine learning algorithms help data scientists discover meaning
in data sets […]. Graph databases enable efficient storage and
traversal of information about relationships. Therefore, graph data
can either be the input or the output of machine learning
processing.”
Jim Webber, Chief Scientist @ Neo4j
9. Machine Learning Challenges
Data Source Issues:
- Storing large amounts of labeled and unlabeled data
- Guaranteeing data quality
- Managing several data sources
Algorithm Issues:
- Result quality
- Computational efficiency
- Real-time operation (continuous model updates)
Model Issues:
- Storing the model built
- Providing fast access to the model
14. ML and Graphs Facts
- Machine Learning enables computer systems to solve complex real-world problems
- Deep Learning models demonstrate high predictive capacity when trained on large amounts of labeled data
- Graph-based machine learning is becoming an important trend in Artificial Intelligence
- The world's largest companies are promoting this trend
- Using graphs as the basic representation of data for machine learning has several advantages
15. Graphs in Machine Learning
Some usage patterns for graphs in machine learning applications:
- Storing data sources in a suitable way
- Tensors
- Centralizing multiple data sources (raw or processed)
- Lambda Architecture
- Knowledge Graphs
- Graph-based algorithms
- Storing the models produced
16. Storing data sources: Lambda Architecture
The Lambda Architecture is a scalable, fault-tolerant data-processing architecture that combines batch and real-time stream processing, making it suitable for fast streaming data.
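A minimal sketch of the lambda pattern in Python, assuming the classic three layers: a batch view recomputed over the immutable master dataset, a speed layer over events not yet absorbed by a batch run, and a serving layer that merges the two at query time. All names and data are illustrative.

```python
from collections import Counter

# Batch layer input: immutable, append-only log of (user, event) records.
master_log = [("alice", "click"), ("bob", "click"), ("alice", "view")]

def batch_view(log):
    """Batch layer: recompute event counts over the full master dataset."""
    return Counter(event for _, event in log)

# Speed layer: incremental counts over recent events not yet in a batch run.
recent_events = [("alice", "click")]
realtime_view = Counter(event for _, event in recent_events)

def query(event):
    """Serving layer: merge the precomputed batch view with the realtime view."""
    return batch_view(master_log)[event] + realtime_view[event]

print(query("click"))  # 2 from the batch view + 1 from the speed layer = 3
```

In a real deployment the batch view would be precomputed (e.g. by a periodic job) rather than rebuilt per query; the point is only the merge of the two views.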
17. Storing data sources: Lambda Architecture (2)
Continuous Cellular Tower Data Analysis
Eagle N., Quinn J.A., Clauset A. (2009). Methodologies for Continuous Cellular Tower Data Analysis. In: Tokuda H., Beigl M., Friday A., Brush A.J.B., Tobe Y. (eds) Pervasive Computing. Pervasive 2009. Lecture Notes in Computer Science, vol 5538. Springer, Berlin, Heidelberg.
21. Storing data sources: Tensor
Simple Recommendation
f: User x Item -> Relevance Score
Context Aware Recommendation
f: User x Item x Context1 x Context2 x Context3 -> Relevance Score
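The context-aware mapping above can be held directly in a dense tensor, one axis per argument of f. A sketch with NumPy, using a single context dimension for brevity; the shapes, ids, and scores are illustrative.

```python
import numpy as np

n_users, n_items, n_contexts = 3, 4, 2

# Context-aware relevance tensor: scores[u, i, c] is the relevance of
# item i for user u in context c (e.g. c = weekday vs. weekend).
scores = np.zeros((n_users, n_items, n_contexts))

# Observed feedback fills individual cells of the tensor.
scores[0, 2, 1] = 4.5   # user 0 rated item 2 highly at the weekend
scores[0, 2, 0] = 2.0   # ...but not on weekdays

def relevance(user, item, context):
    """f: User x Item x Context -> Relevance Score."""
    return scores[user, item, context]

print(relevance(0, 2, 1))  # 4.5
```

A simple user-item recommendation is the special case of a 2-D tensor (a matrix); each extra context variable adds one axis.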
27. Graph-Based ML algorithms
Some graph-theoretical algorithms that are relevant to machine learning processes:
- Random Walk
- PageRank
- Graph Matching
- Shortest Path
- Depth-First Graph Traversal
- Breadth-First Graph Traversal
- Minimum Spanning Tree
- node2vec
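As one concrete example from the list above, PageRank reduces to a short power iteration over an adjacency list. A minimal sketch; the damping factor and toy graph are illustrative.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over an adjacency-list graph {node: [successors]}."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node, out in graph.items():
            if not out:  # dangling node: spread its rank uniformly
                for n in nodes:
                    new_rank[n] += damping * rank[node] / len(nodes)
            else:
                for succ in out:
                    new_rank[succ] += damping * rank[node] / len(out)
        rank = new_rank
    return rank

toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy)
print(max(ranks, key=ranks.get))  # "c" collects links from both "a" and "b"
```

The ranks always sum to 1, so they can be read as the stationary distribution of a random walk with teleportation, which is what links PageRank to the Random Walk entry above.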
28. Graph-Based ML algorithms
Keywords Extraction
Rada Mihalcea, Paul Tarau. 2004. TextRank: Bringing Order into Texts. Proceedings of EMNLP 2004, pages 404–411, Barcelona, Spain. Association for Computational Linguistics. http://www.aclweb.org/anthology/W04-3252.
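The TextRank idea cited above can be approximated in a few lines: build a co-occurrence graph of words within a sliding window, then rank the nodes with an unweighted PageRank and keep the top-scoring words as keywords. The text, window size, and scoring below are an illustrative simplification, not the paper's exact setup.

```python
from collections import defaultdict

def keyword_graph(words, window=2):
    """Undirected co-occurrence graph: words within `window` positions share an edge."""
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[max(0, i - window):i]:
            if other != w:
                graph[w].add(other)
                graph[other].add(w)
    return graph

def textrank(graph, damping=0.85, iterations=30):
    """Unweighted PageRank over the word graph, as in the TextRank ranking step."""
    rank = {n: 1.0 for n in graph}
    for _ in range(iterations):
        rank = {
            n: (1 - damping) + damping * sum(rank[m] / len(graph[m]) for m in graph[n])
            for n in graph
        }
    return rank

words = "graph based ranking model for text processing graph ranking".split()
scores = textrank(keyword_graph(words))
top = sorted(scores, key=scores.get, reverse=True)[:3]
print(top)
```

Repeated, well-connected words such as "graph" and "ranking" accumulate score from their many neighbours, which is the intuition behind using PageRank for keyword extraction.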
30. Storing Models
The results of a machine learning process can be stored in a graph as well. Some examples are:
- Similarity (k-Nearest Neighbors)
- Clusters
- Spanning Trees
- Decision Trees
- Random Forests
- Markov Chains
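For the first item above, a k-nearest-neighbour result stored as a graph is simply a node per data point plus an edge to each of its k closest neighbours. A sketch using plain Euclidean distance; the data, relationship, and k are illustrative.

```python
import math

points = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (5.0, 5.0), "d": (5.1, 5.0)}

def knn_graph(points, k=1):
    """Store a k-NN model as a graph: edges[n] lists n's k nearest neighbours."""
    edges = {}
    for name, p in points.items():
        others = sorted(
            (other for other in points if other != name),
            key=lambda o: math.dist(p, points[o]),
        )
        edges[name] = others[:k]
    return edges

graph = knn_graph(points, k=1)
print(graph["a"])  # ['b'] -- b is a's nearest neighbour
```

Once materialised as edges (e.g. SIMILAR_TO relationships in a graph database), nearest-neighbour queries become constant-time traversals instead of repeated distance computations.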
36. Application: NLP and Graphs
- Natural Language Processing applications find efficient solutions within graph-theoretical frameworks.
- This idea is not new (Freud 1901; Schvaneveldt 1989).
- Text has a lot of structure; most of it just isn't explicit.
- Tokens, events, relationships, and references are extracted from the provided text.
- The extracted information can be processed further or extended by introducing new sources of knowledge, such as ontologies.
- A suitable model for representing all of this is a Graph Model.
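A minimal property-graph sketch of that model: tokens and an extracted entity as nodes, with typed edges linking them back to the sentence. The labels and schema here are illustrative, not the GraphAware NLP schema.

```python
# Nodes and edges of a tiny property graph for one sentence.
nodes = {
    1: {"label": "Sentence", "text": "Marie Curie won the Nobel Prize"},
    2: {"label": "Token", "value": "Marie"},
    3: {"label": "Token", "value": "Curie"},
    4: {"label": "Entity", "type": "PERSON", "value": "Marie Curie"},
}
edges = [
    (1, "HAS_TOKEN", 2),
    (1, "HAS_TOKEN", 3),
    (2, "PART_OF", 4),
    (3, "PART_OF", 4),
]

def neighbours(node_id, rel):
    """Follow edges of a given type from a node."""
    return [dst for src, r, dst in edges if src == node_id and r == rel]

tokens = [nodes[t]["value"] for t in neighbours(1, "HAS_TOKEN")]
print(tokens)  # ['Marie', 'Curie']
```

New knowledge sources (e.g. an ontology linking PERSON entities to external concepts) extend the graph by adding nodes and edges, without changing the existing schema.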
37. Application: NLP and Graphs
GraphAware NLP Framework
- End-to-end framework: functionality, services, and applications from low level to high level
- A suitable graph-based storage schema
- Distributed processing using Apache Spark
- Integration with other software and services