This document discusses a customer project for developing a trend analytics platform. It begins with an introduction and overview of the project goals. Then it covers data analytics techniques like word embeddings, topic modeling, and clustering. It demonstrates the trend analytics platform. Next it discusses semantic web technologies and knowledge graphs. It demonstrates entity extraction and semantic data enrichment using DBpedia Spotlight and Wikidata. Finally, it discusses next steps of understanding data through knowledge graphs and developing domain-specific ontologies.
4. Introduction
Customer Project "Trend-Analytics", 14.09.2018
Research project for the automatic search, analysis, and evaluation of sector-specific trend indicators in online publications
Cooperation with the Fraunhofer Institute and the Technische Hochschule Nuremberg
Develop a Trend-Analytics platform that enables users to
- follow developments in the field of new technologies in a targeted manner
- react adequately to trends and market changes
martechtoday.com, New B2B analytics platform called Proof
5. Challenges
- Research on modern machine-learning algorithms and Semantic Web technologies
- Identify trends and innovation opportunities in different domains
- Start with a relatively small amount of high-quality data, e.g. RSS feeds
- Continuously optimize the quantity and quality of the analytics
https://irishadvantage.com/news/irish-companies-making-iot-opportunity/
6. Overview Trend-Analytics Platform
Data Collection
- Implement crawling mechanism
- Implement database (e.g. in Azure)
- Create administration interface (web)
- Identify (manually) high-quality data (RSS feeds & articles)
Data Analysis
- Evaluate existing machine learning techniques
  - E.g. unsupervised learning, supervised learning, ...
- Investigate Semantic Web technologies
Data Presentation & Interpretation
- Create a web interface and illustrate the results
- Interpret results
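The data collection steps can be sketched in a few lines. This is only an illustration under stated assumptions: the feed content is hand-written sample data, the table layout and the function names `parse_items` and `store_items` are hypothetical, and an in-memory SQLite database stands in for whatever database the project actually uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hand-written sample feed (hypothetical content) so the sketch runs offline.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example technology feed</title>
  <item>
    <title>NB-IoT trial completed</title>
    <link>https://example.org/articles/1</link>
    <description>Short abstract of the first article.</description>
  </item>
  <item>
    <title>New analytics platform announced</title>
    <link>https://example.org/articles/2</link>
    <description>Short abstract of the second article.</description>
  </item>
</channel></rss>"""


def parse_items(rss_text):
    """Return (link, title, abstract) tuples for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("link", ""), item.findtext("title", ""), item.findtext("description", ""))
        for item in root.iter("item")
    ]


def store_items(connection, items):
    """Insert articles, skipping links that were already crawled."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS articles (link TEXT PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    connection.executemany("INSERT OR IGNORE INTO articles VALUES (?, ?, ?)", items)
    connection.commit()


connection = sqlite3.connect(":memory:")  # stand-in for the project database
items = parse_items(SAMPLE_RSS)
store_items(connection, items)
store_items(connection, items)  # a second crawl of the same feed adds nothing
article_count = connection.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
```

Using the article link as primary key is one simple way to make repeated crawls idempotent; a real crawler would also need scheduling and error handling.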
8. Word Embeddings
Word embedding is a technique that represents words as vectors whose relative similarities correlate with semantic similarity.
The similarity between two vectors can be measured with, for example, cosine similarity.
For example, subtracting the vector for "man" from the vector for "woman" yields an offset that points in nearly the same direction as the vector for "queen" minus the vector for "king", so the cosine distance between the two offsets is small.
https://www.oreilly.com/learning/capturing-semantic-meanings-using-deep-learning
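The offset comparison can be made concrete with a small sketch. The three-dimensional vectors below are invented toy values chosen for illustration, not output of a trained model; real embeddings typically have a hundred or more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def subtract(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


# Hypothetical toy embeddings; the second component loosely encodes gender.
toy_vectors = {
    "man": [1.0, 0.2, 0.1],
    "woman": [1.0, 0.9, 0.1],
    "king": [0.2, 0.25, 0.9],
    "queen": [0.2, 0.9, 0.9],
}

offset_woman_man = subtract(toy_vectors["woman"], toy_vectors["man"])
offset_queen_king = subtract(toy_vectors["queen"], toy_vectors["king"])

# Close to 1.0: both offsets point in (nearly) the same direction.
offset_similarity = cosine_similarity(offset_woman_man, offset_queen_king)
# Clearly below 1.0: the words themselves are not that similar.
man_king_similarity = cosine_similarity(toy_vectors["man"], toy_vectors["king"])
```

Cosine distance is simply `1 - cosine_similarity`, so a similarity near 1 corresponds to a distance near 0.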
9. Word2vec
Word2vec
Continuous Bag of Words (CBOW)
Model predicts the current word from
a window of surrounding context
words.
Same approach as recommender
systems with collaborative filtering.
(Customer => Item)
Instead of computing and storing large amounts of data, we create a neural network
model that will be able to learn the relationship between the words and do it efficiently.
https://www.oreilly.com/learning/capturing-semantic-meanings-using-deep-learning
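To illustrate what "predicting the current word from a window of surrounding context words" means for the training data, the following sketch builds the (context, target) pairs a CBOW model would be trained on. It deliberately leaves out the neural network itself; the function name `cbow_pairs` and the example sentence are chosen for illustration only.

```python
def cbow_pairs(tokens, window=2):
    """Build (context words, target word) training pairs for CBOW."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after
        context = left + right
        if context:
            pairs.append((context, target))
    return pairs


sentence = "nokia trials nb-iot technology in saudi arabia".split()
pairs = cbow_pairs(sentence, window=2)

for context, target in pairs:
    print(context, "->", target)
```

A larger window captures broader topical context, while a smaller one tends to emphasize syntactic neighbors; the value 2 here is arbitrary.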
10. Topic model
A topic model is a statistical model for discovering the abstract "topics" that occur in a collection of documents. It captures intuitions about word frequencies in a mathematical framework.
Li et al. Energies 2017, 10(11), 1913
11. Cluster of Term Frequencies
Latent Dirichlet Allocation
TF*IDF (term frequency * inverse document frequency)
Reduced to a vector space of only 1000 dimensions
https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
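The TF*IDF weighting mentioned above can be sketched as follows. This is a minimal version assuming the plain variant `tf * log(N / df)` without smoothing; libraries often use slightly different formulas, and the three tiny documents are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical mini corpus.
docs = [
    "iot sensor network iot".split(),
    "blockchain network ledger".split(),
    "iot platform network".split(),
]
weights = tf_idf(docs)
```

Terms that occur in every document (here "network") get weight zero, while terms concentrated in few documents are weighted up. Keeping only the highest-weighted terms is one way to reduce the vocabulary to a fixed number of dimensions before clustering or topic modeling.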
15. Semantic Web: a vision in 2001
Semantic Web, a visionary concept proposed in 2001 by Sir Tim Berners-Lee, the inventor of the World Wide Web
- "I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web - the content, links, and transactions between people and computers. ..."
Sir Tim Berners-Lee, fossbytes.com, 2016
Amazon illustrates Knowledge
Graph in Neptune
16. Semantic Web now
Linking Open Data cloud diagram showing 1,163 highly interconnected datasets (http://lod-cloud.net)
17. Semantic Data Enrichment for Trend-Analytics
Pipeline (diagram):
- Input: headlines, abstracts, articles
- Storage (RDBMS)
- Entity detection: Spotting, Candidate Selection, Disambiguation, Filtering
- Result: sets of matched resources
- Semantic Querying against the Knowledge Graph: Federated Querying, Filtering
- Result: specific metadata or neighborhood
- Storage (RDBMS)
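The entity detection stages (spotting, candidate selection, disambiguation, filtering) can be sketched with a toy example. Everything below is an assumption made for illustration: the `ex:` identifiers, the keyword sets, and the overlap-based scoring are hypothetical and far simpler than what a tool such as DBpedia Spotlight does. The input is the example sentence from the demo slide.

```python
import re

# Hypothetical candidates: surface form -> list of (resource id, context keywords).
KNOWLEDGE_BASE = {
    "nokia": [
        ("ex:Nokia_company", {"iot", "technology", "trial"}),
        ("ex:Nokia_town", {"finland", "river", "municipality"}),
    ],
    "mina": [
        ("ex:Mina_valley", {"makkah", "saudi", "province"}),
        ("ex:Mina_singer", {"album", "song", "italy"}),
    ],
}


def spot(text):
    """Spotting: tokenize and keep tokens that match a known surface form."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    return tokens, [token for token in tokens if token in KNOWLEDGE_BASE]


def disambiguate(surface_form, tokens):
    """Candidate selection and disambiguation: best keyword overlap wins."""
    context = set(tokens)
    scored = [
        (len(keywords & context) / len(keywords), resource)
        for resource, keywords in KNOWLEDGE_BASE[surface_form]
    ]
    return max(scored)


def enrich(text, min_score=0.5):
    """Run all stages and return {surface form: matched resource}."""
    tokens, mentions = spot(text)
    matched = {}
    for mention in mentions:
        score, resource = disambiguate(mention, tokens)
        if score >= min_score:  # Filtering: drop low-confidence matches
            matched[mention] = resource
    return matched


text = (
    "Nokia and Zain Saudi Arabia have taken a significant step towards the creation "
    "of an IoT ecosystem in the Kingdom of Saudi Arabia with the successful trial of "
    "NB-IoT technology at a live site in Mina area of Makkah Province."
)
matched = enrich(text)
```

The matched resources would then serve as starting points for the semantic querying step, which looks up metadata or neighboring nodes in a knowledge graph.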
20. Next Steps
Understanding data using Knowledge Graphs
- Pull identified entities through to the user interface and provide additional meta information on displayed terms (e.g. as popups, hover-links, ...)
- Objective: Improve the user's understanding of the results, e.g. what is meant by a specific term?
Complex Ontology-based Data Enrichment
- Develop a domain-specific ontology which integrates with Knowledge Graphs
- Objective: Restrict the search space in Knowledge Graphs & hide irrelevant entities
- Requires extensive use of Semantic Web technologies and semantic data modeling (triple stores, RDF, RDF-S, OWL, ...)
https://www.w3.org/TR/rdf11-primer/
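How a domain-specific ontology could restrict the search space can be sketched with plain subject-predicate-object triples. This is a toy in-memory stand-in for a triple store, not an RDF library; all `ex:` identifiers and the set of domain classes are hypothetical.

```python
# Each fact is a (subject, predicate, object) triple, as in RDF.
TRIPLES = [
    ("ex:NB-IoT", "rdf:type", "ex:Technology"),
    ("ex:NB-IoT", "rdfs:label", "Narrowband IoT"),
    ("ex:Nokia", "rdf:type", "ex:Company"),
    ("ex:Nokia", "ex:develops", "ex:NB-IoT"),
    ("ex:Mina", "rdf:type", "ex:Place"),
]

# Hypothetical domain ontology: only these classes are relevant to trend analysis.
DOMAIN_CLASSES = {"ex:Technology", "ex:Company"}


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        triple for triple in triples
        if (subject is None or triple[0] == subject)
        and (predicate is None or triple[1] == predicate)
        and (obj is None or triple[2] == obj)
    ]


def relevant_entities(triples, domain_classes):
    """Keep only entities whose type belongs to the domain ontology."""
    typed = match(triples, predicate="rdf:type")
    return sorted({subject for subject, _, cls in typed if cls in domain_classes})


entities = relevant_entities(TRIPLES, DOMAIN_CLASSES)
labels = match(TRIPLES, subject="ex:NB-IoT", predicate="rdfs:label")
```

In a real setup the same restriction would more likely be expressed as a SPARQL query against a triple store, with the class hierarchy defined in RDF-S or OWL.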
22. Demo: Semantic Data Enrichment
DBpedia Spotlight
https://www.dbpedia-spotlight.org/demo/
Nokia and Zain Saudi Arabia have taken a significant step towards the creation of an
IoT ecosystem in the Kingdom of Saudi Arabia with the successful trial of NB-IoT
technology at a live site in Mina area of Makkah Province.
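A text like the demo sentence can also be sent to DBpedia Spotlight programmatically. The sketch below assumes the public REST endpoint `https://api.dbpedia-spotlight.org/en/annotate` with `text` and `confidence` parameters and a JSON response containing a `Resources` list with `@surfaceForm` and `@URI` fields; these details should be checked against the current service documentation. The sample payload is hand-written for illustration and is not actual service output.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; verify against the DBpedia Spotlight documentation.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Build (but do not send) an annotation request asking for JSON."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        f"{SPOTLIGHT_URL}?{query}", headers={"Accept": "application/json"}
    )


def parse_resources(payload):
    """Extract (surface form, resource URI) pairs from a response payload."""
    return [
        (resource["@surfaceForm"], resource["@URI"])
        for resource in payload.get("Resources", [])
    ]


def annotate(text, confidence=0.5):
    """Send the request; needs network access, so it is not called here."""
    with urllib.request.urlopen(build_request(text, confidence), timeout=30) as response:
        return parse_resources(json.load(response))


# Hand-written sample in the assumed response shape (not real service output).
sample_payload = {
    "Resources": [
        {"@surfaceForm": "Nokia", "@URI": "http://dbpedia.org/resource/Nokia"},
    ]
}
sample_entities = parse_resources(sample_payload)
request = build_request("Nokia and Zain Saudi Arabia", confidence=0.4)
```

Raising `confidence` returns fewer but more reliable annotations, which corresponds to the filtering stage of the pipeline.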
24. Questions & Answers
Dr. Olaf Nimz
Principal Consultant
Olaf.Nimz@trivadis.com
Dr. Martin Zablocki
Consultant
Martin.Zablocki@trivadis.com