Sebastian talks about how they use neo4j to protect data in business critical services running in production. The talk covers both high-level architecture, and detailed technical considerations.
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor - Sebastian Verheughe @ GraphConnect London 2013
1. Using Graph Databases in Real-Time to
Solve Resource Authorization at Telenor
Graph Connect London – 19 Nov 2013
by Sebastian Verheughe
2. Telenor Norway
Subsidiary of the Telenor Group
~1 billion GBP in mobile revenues 2012
Sebastian Verheughe
Lead Developer for Neo4j solution
Coding Architect
3. Disclaimer
The presentation is not identical to the implementation
due to security reasons but shows how we have
modeled and solved the problem in general.
However, all presented data (numbers & charts) are
real, unfiltered and extracted from the production logs
5. Telenor Norway Middleware Services
Channel
Channel
used by 42 channels
calls 35 sub-systems
10,000 code classes
500 requests/second
20,000 orders/day
Backend
Backend
Channel
Channel
MOBILE
MW
Channel
Channel
Providing business logic
and data for all channels
in the mobile value chain
BUSINESS
LOGIC
& DATA
Backend
Handles users with
access to X00,000
resources
Backend
Backend
Backend
6. Our Problem
20 minutes to calculate all accessible resources
1500 lines of SQL to implement the authorization logic
“solved” by caching data going stale
and the solution did not scale…
7. Why a Graph Database?
Access
Parent Company
User: R. Branson
Which resources
does the user
have access to?
Part of Company
Sales
Finance Production
HR
Sub
The questions we wanted answered
required traversal of tree structures.
Tablet
Subscription
Owner
Sub
Tablet
Uses
Subscription
Phone
8. Tailored Read Model
The Model makes read queries
as simple and efficient as possible.
First find your questions
then model your graph
graph model
=
relational model
10. Conditional Rules
ACCESS is given with the following include parameters:
access to subsidiaries and access to content
Only find children of PARENT COMPANY
given access to subsidiaries is allowed
User
Only look at PART OF COMPANY
given access to content is allowed
Only look at SUBSCRIPTION OWNER
given access to content is allowed
12. Graph Algorithm
Prerequisite: The user node
1. Follow all ACCESS relationships and
read the access parameters on the relationship
2. Follow all PARENT COMPANY relationships given
access to subsidiaries is allowed
3. Follow all PART OF COMPANY relationships given
access to content is allowed
4. Follow all SUBSCRIPTION OWNER relationships given
access to content is allowed
13. Solution Value
1. Performance optimized from minutes to seconds.
2. Simplicity of writing and understanding business rules
for the query traversal.
3. Scalability by performance allowing us to onboard
more corporate customers (project business case)
Autonomous Service
with it’s own life-cycle and data repository.
14. Authorization Complexity
• Not a collection of isolated customer trees *
• Not all users of a customer have equal access
• Not a fixed schema, form or size for all
customers
• Real-time updated with customer & product
data
The data form a highly connected living graph
* Covered later in Technical Details
15. How we Started with Neo4j
1. Searched the internet for articles about graph
database and different solutions.
2. Downloaded and quickly prototyped the solution we
liked that matched our requirements (Neo4j).
3. Workshop with Neo4j and our project developers to
quickly gain competence and ensure design QA.
4. Solution QA with Neo4j before production and help
with performance issues / tuning.
16. Lessons Learned
• Choose a solution/technology that fits your problem
• New way of thinking – build competence in org.
• Profile your java code to make it really fast
• Don’t put everything into the graph (functional creep)
• Need to know how traversal works (e.g. shortest
path)
• Benchmark the graph to evaluate your traversal
speed
17. Alternative In-Memory RDBMS
Option 1: Use existing database
- Performance issues due to shared data / suboptimal
structure
- Complexity since SQL not designed for traversal
Option 2: Separate database
+ Might reach same performance as graph db
+ Familiar technology
- Complexity since SQL not designed for traversal
Decided to go with our instinct
Graph Database
18. Different Graph Structures
get all accessible subscriptions
2 000
1 000
1 700 ms
Company X: 147 000
750 ms
Company Y: 52 000
1300 ms
Company Z: 95 000
Data from test – repeated prod sampling gave ~2.4 sec for 215,000 subscriptions
19. Different Graph Structures
check access to single subscription
2 000
1 000
1 ms
Company X: 147 000
1 ms
Company Y: 52 000
1 ms
Company Z: 95 000
20. Production Performance
retrieve all accessible resources
RDBMS Disk
RDBMS Mem Cached
Graph In-Heap
Company X
12 min
18 sec
< 2 sec
Company Y
22 min
58 sec
< 2 sec
Company Z
3 min
15 sec
< 2 sec
Check single resource access
1 ms
No operational problems in production
25. Implementing the Algorithm
Lets look at the Neo4j Traversal Framework
Iterable<Node> getAccessibleResources(…) {
Evaluator myEvaluator = …
Expander myExpander = …
return Traversal.description()
.evaluator(myEvaluator)
.expander(myExpander)
.traverse(startNode).nodes();
}
26. Implementing the Algorithm
Evaluator is a simple filter, e.g. for Node
type
class MyEvaluator implements Evaluator {
public Evaluation evaluate(Path path) {
if <I am interested in this node>
return Evaluation.INCLUDE_AND_CONTINUE;
else
return Evaluation.EXCLUDE_AND_CONTINUE;
}
}
27. Implementing the Algorithm
The custom Expander contains business
rules!
class ResAuthExpander implements PathExpander<PathExpander> {
…
public … expand(Path path, BranchState<…> state) {
if (path.lastRelationship rel == ACCESS)
accToSub = rel.getProperty(ACCESS_TO_SUBSIDIARIES);
accToCont = rel.getProperty(ACCESS_TO_CONTENT);
state.set( getExpander(accToSub, accToCont) );
}
return state.get().expand(…)
}
Single expander class to control business
28. Implementing the Algorithm
Generates the valid relationships to traverse.
public getExpander(boolean accToSub, boolean accToCont) {
PathExpander exp = StandardExpander.DEFAULT.add(ACCESS,…);
if (accToSub)
exp.add(PARENT_COMPANY,…)
if (accToCont)
exp.add(PART_OF_COMPANY,…).add(SUBSCRIPTION_OWNER,…);
return exp;
}
}
29. U-Turn Strategy
4.
Access
User
Does the user
have access to
subscription X?
3.
5.
Up to find path quickly
Down to check access
6.
2.
7.
1.
8.
X
Subscription
Reversing the traversal increases performance from n/2 to
2d where n and d are tree size and depth (we went from 1s to
30. The Zigzag Problem
What if we also have reversed access to the subscription
payer?
User
Op
IT
E
d
Jo
Subscriptions
Solvable by adding state to the traversal (or check path)
31. The Many-to-Many Problem
The nodes Op & IT may be connected through many
subscriptions
Does the user
have access to
department Op?
Op
IT
Access
User
Subscription
Traversal becomes time consuming (e.g. M2M market)
However, we only needed to implement the rule for direct access to
sub.
32. Deployment View
• Two equal instances of Neo4j embedded in Tomcat
• Access through Java API due to need for custom logic
• Using Neo4j 1.8 without HA (did not like ZooKeeper)
Resource
Authorization
Neo4j
tx log
RDBMS
Message Queue
Resource
Authorization
Neo4j
33. Dual Model Cost
There are some drawbacks with dual models
also
• Not possible to simply join the ACL with resource
tables in the relational database - queries needed
redesign
• The complexity added by code and infrastructure
necessary to manage an additional model.
• Not ordinary competence (in Norway at least)
34. Unexplored Areas
Combining Access Control List & Graph
• Best of both worlds (simple logic, fast lookup)
Algorithm
–
–
–
–
Find all affected users when the graph is updated
Invalidate users access control list
Calculate all accessible resources for each user
Store result in users access control list
Could then skip the U-turn and many-to-many problem.
35. Was is worth it?
Yes!
The user experience is important in
Telenor
37. Web References
• Telenor Norway
• The Project - How NOSQL Paid off for
Telenor
• JavaWorld - Graphs for Security
Hinweis der Redaktion
Why: access to secret numbers, access to modify/delete subscriptions, possibility to send/receive messages
We use Neo4j for our business critical services, both customer/product services , but also operational services.A channel is client type, e.g. the web solution for corporate customer, or helpdesk solution, or app, and may consist of many clients
The project business case was based on a future point in time where we could not any more onboard any large corporate customers
Drawing on the white board the required logic made us understand that a graph database might be a good solution
Take the hit on write, and make read easy! (for us, read performance is the problem – not write performance)Also, don’t blindly copy tables/foreign keys into nodes and relationships – drop what’s not needed and remember that relationships may have properties in graph
RDBMS is still mastering the data as it is used in many different use-cases where that is beneficial.
TIME LEFT: 30 minutes (10 used)
The last part is important to us. It was really hard to extract the resource authorization out of the relational database, but not we can much more easy replace the current implementation with another one in the future if neccesary.
Production logs does not contain user data, so just one big organization was sampled to get production data for a specific customer
TIME LEFT: 20 minutes (20 used)Graphperformance based on test environment, see charts in the technical section for production numbers not specific for a unique customer
We only have detailed logs for a short while back – so we cannot review all data since production.First production two years ago with limited traffic, full production since spring 2013
We always continue, since we also have our custom expander. This way, we have a clean separation of concern in our code.We also have more advanced filters peaking around the node before it decides to include or exclude the node
This is the most important part of the code, the one place where we now are able to write down the business logic in a simple and natural way.Note that we only have ONE class containing the business rules independently of which use-case we are running.
The relationships and directions that are allowed to traverse given the different switch parameters.
This is possible since we have a tree graph. Demonstrates the importance of understanding how a graph works, because than you may greatly improve performance by smart traversal strategies.
TIME LEFT: 10 minutes (30 used)
Extra knowledge, such as which subscription you are