5. We started to be Polyglot
Big data architecture is not a vision
We hired Data Scientists
We started to index things (Lucene)
We started to use Solr, ElasticSearch, etc.
It became part of our Big Data architecture
We introduced Search Infrastructure
Evolution in corporate search
GraphAware®
7. They are aggregate-oriented databases, so they have limitations
when it comes to connected data
Typical setup: Two users searching for the same thing will get the
same results
They are in the search 3.0-4.0 phase
They are superstars of full-text search
We need to extend this with Graph-aided search
We have to boost some search hits (come on, it is a
recommender system)
We have to filter out some hits or degrade their score
We need Things, not Strings!
Challenges
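A minimal sketch of what boosting, filtering and degrading could look like once the full-text engine has returned its hits. Everything here is an assumption for illustration: the `Hit` type, the `rerank` function and the `boosts`, `demoted` and `blocked` inputs are hypothetical stand-ins for signals a graph query would supply per user. It is not the API of any search engine or graph plugin.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float  # relevance score as returned by the full-text engine


def rerank(hits, boosts, demoted=frozenset(), blocked=frozenset(), penalty=0.5):
    """Re-rank full-text hits with per-user, graph-derived signals.

    boosts  : doc_id -> multiplier, e.g. from a recommendation query (hypothetical)
    demoted : doc_ids whose score is degraded by `penalty`
    blocked : doc_ids that are filtered out entirely
    """
    reranked = []
    for hit in hits:
        if hit.doc_id in blocked:
            continue  # filter out
        factor = boosts.get(hit.doc_id, 1.0)  # boost
        if hit.doc_id in demoted:
            factor *= penalty  # degrade
        reranked.append(Hit(hit.doc_id, hit.score * factor))
    return sorted(reranked, key=lambda h: h.score, reverse=True)


# The same query and the same raw hits for two different users.
hits = [Hit("tent", 2.0), Hit("stove", 1.8), Hit("boots", 1.5)]
alice = rerank(hits, boosts={"boots": 2.0}, blocked={"stove"})
bob = rerank(hits, boosts={}, demoted={"tent"})

print([h.doc_id for h in alice])
print([h.doc_id for h in bob])
```

The point of the sketch is that the two users no longer get the same ordering for the same query, while the full-text engine itself stays unchanged.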
9. “A knowledge graph is a multi-relational graph
composed of entities as nodes and relationships as
edges with different types that describe facts in the
world.”
Knowledge graph
It is about “understanding the world as you and I do”.
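The definition above can be illustrated with a small in-memory sketch: entities are nodes, and each fact is an edge with a type. The `KnowledgeGraph` class and its method names are hypothetical and chosen only for this example; a real deployment would use a graph database instead.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Multi-relational graph: each fact is a typed edge between two entities."""

    def __init__(self):
        self._out = defaultdict(list)  # subject -> [(relation type, object)]

    def add_fact(self, subject, relation, obj):
        self._out[subject].append((relation, obj))

    def objects(self, subject, relation):
        """Entities reachable from `subject` over edges of type `relation`."""
        return [o for r, o in self._out.get(subject, []) if r == relation]


kg = KnowledgeGraph()
kg.add_fact("Neil Armstrong", "CREW_OF", "Apollo 11")
kg.add_fact("Apollo 11", "LANDED_ON", "Moon")
kg.add_fact("Apollo 11", "OPERATED_BY", "NASA")

# Two-hop question: where did the missions crewed by Neil Armstrong land?
missions = kg.objects("Neil Armstrong", "CREW_OF")
landing_sites = [site for m in missions for site in kg.objects(m, "LANDED_ON")]
print(landing_sites)
```

Because the edge types are explicit, the query follows relationships between things rather than matching strings.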
12. Search infrastructure should be easily integrated
into existing architecture
New data sources should be easily added
Should support the strategic goals
e.g. Search driven e-commerce
Scalable
Should provide personalised results
Simple interface
Requirements of searching and KG
13. Take a graph database (Neo4j, Cayley, OntoText GraphDB, etc.)
Graph construction:
Knowledge extraction
from the internet
open data
grabbing
from text (NLP)
from current databases (Master Data)
from logs
Knowledge Graph Construction
Have a good graph model
Connect the things together
Steps to build KG
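A rough sketch of the construction step, assuming only two of the listed sources: master data and logs. The record layouts (`sku`, `name`, `user`, `action`) and the `build_graph` function are invented for illustration. The idea it shows is that every source is mapped onto the same node keys, so the things get connected instead of duplicated.

```python
def build_graph(master_data, log_events):
    """Merge two sources into one graph, keyed by (label, business id)."""
    nodes = {}     # (label, id) -> properties
    edges = set()  # (source node, relationship type, target node)

    # Master data contributes the entities and their properties.
    for row in master_data:
        nodes.setdefault(("Product", row["sku"]), {}).update(name=row["name"])

    # Logs contribute the relationships; unseen entities are created on the fly.
    for event in log_events:
        user = ("User", event["user"])
        product = ("Product", event["sku"])
        nodes.setdefault(user, {})
        nodes.setdefault(product, {})
        edges.add((user, event["action"].upper(), product))

    return nodes, edges


master = [{"sku": "p1", "name": "Trail boots"}]
logs = [
    {"user": "alice", "sku": "p1", "action": "viewed"},
    {"user": "alice", "sku": "p1", "action": "viewed"},  # duplicate event
    {"user": "bob", "sku": "p2", "action": "bought"},    # product missing from master data
]
nodes, edges = build_graph(master, logs)
print(len(nodes), len(edges))
```

Keying nodes by label plus business identifier is one possible modelling choice; it keeps repeated events and overlapping sources from producing duplicate nodes or edges.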
16. Apache Kafka for streaming pipelines
Product topic
Search topic
Feedback topic
Spark on the processing side
Neo4j on the consuming side
CQRS (Command Query Responsibility Segregation) pattern
Push to ElasticSearch with GraphAware plugin
Neo4j Transaction Handler (afterCommit)
You can define mappings to ES
Parts of the architecture
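The CQRS part of this slide can be sketched in plain Python, under loud assumptions: `GraphStore` stands in for the graph database (write side), `SearchIndex` stands in for the search engine (read side), and `on_after_commit` imitates an after-commit transaction hook. None of these names come from Neo4j, ElasticSearch or the GraphAware plugin; the sketch only shows the shape of the pattern, including a mapping that decides which labels and properties are replicated.

```python
class GraphStore:
    """Write side (command model): stand-in for the graph database."""

    def __init__(self):
        self.nodes = {}
        self._after_commit = []

    def on_after_commit(self, handler):
        self._after_commit.append(handler)

    def commit(self, changes):
        self.nodes.update(changes)
        for handler in self._after_commit:
            handler(changes)  # runs only after the write has been applied


class SearchIndex:
    """Read side (query model): stand-in for the search engine."""

    def __init__(self, mapping):
        self.mapping = mapping  # label -> properties to index
        self.docs = {}

    def replicate(self, changes):
        for node_id, node in changes.items():
            fields = self.mapping.get(node["label"])
            if fields is None:
                continue  # label is not mapped, so it is not indexed
            self.docs[node_id] = {f: node[f] for f in fields if f in node}

    def search(self, term):
        term = term.lower()
        return sorted(
            doc_id
            for doc_id, doc in self.docs.items()
            if any(term in str(value).lower() for value in doc.values())
        )


store = GraphStore()
index = SearchIndex(mapping={"Product": ["name"]})
store.on_after_commit(index.replicate)

store.commit({
    "p1": {"label": "Product", "name": "Trail boots", "cost": 40},
    "u1": {"label": "User", "name": "Alice"},
})
print(index.search("boots"))
print(index.search("alice"))
```

Writes go only to the graph, and searches go only to the index. The hook keeps the index in sync, and the mapping keeps unmapped labels and properties (here `User` nodes and `cost`) out of it.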
17. Success story 1.
• Sharing Tribal Knowledge inside the company
• >20 offices
• >3000 employees
• Data sources:
• Tableau dashboards (4000)
• Knowledge posts (>1000)
• Superset charts and dashboards (>6000)
• Experiments and metrics (>5000)
https://www.slideshare.net/ChristopherWilliams24/20170108scaling-tribalknowledge
18. Success story 2.
• Half a century of collective NASA engineering knowledge
• It is called the Lessons Learned database
• They use it in the Mars mission project
Impact: “Neo4j saved well over two years of work and one
million dollars of taxpayer funds.”
“When we had the [Apollo 1] fire, we took a step back and said okay,
what lessons have we learned from this horrible tragedy?
Now let’s be doubly sure that we are going to do it right the next time.
And I think that fact right there is what allowed us to
get Apollo done in the ‘60s.”
—Dr. Christopher C. Kraft, Jr., Director of Flight Operations