CIS13: A Breakthrough in Directory Technology: Meet the Elephant in the Room with LDAP, Context and Big Data

HDAP:
A Breakthrough in Directory Technology
Bringing Together LDAP, Context, and Big Data

What We’ll Cover Today
• What Is HDAP?
• Why HDAP?
• Why even LDAP?
• Evaluating the models for structured data
• The hierarchical model and LDAP
• The requirements and drivers for more scalability
• Using Identity and Context Virtualization to build a Federated Identity Service (FID)
• Why FID is essential
• Powering a new use case: Contextual Search
• How HDAP works and how it performs

A Next-Gen LDAP Directory Driven by
Hadoop and Search Technology
• This highly available version of LDAP offers better performance and
increased scalability.
• Now, you may be thinking:
• LDAP is already very fast and scalable.
• And who needs LDAP anyway? Shouldn’t we do as Ian Glazer says and
“kill IdM in order to save it”?
• But HDAP goes beyond LDAP, delivering much more and doing it all
much faster.
7/15/2013

To Bring New Life to the Heart of IT:
People and What They Do
• Identity remains essential to IT because people are often at the center
of activities.
• While there are multiple use cases, one of the key functions of
identity is to act as an integration point.
• As such, identity management is at the center of application
integration.
• We need a way to store identities and their attributes, but is LDAP
still relevant?
• Do we really need a hierarchical system when the world is moving
toward these models?
• Path
• Graph
• Directed Graph
• Relational

Roadmap:
The Role of Identity and Context Virtualization
in the Technology Food Chain

Are the Hierarchies of LDAP Still
Necessary?
• The Protocol
• The Schema
• The Storage: Hierarchy
• Searching and Navigation: Traversing the Tree
• Searching by Attributes
• Navigation: one-level or subtree scope. There are not many ways to navigate
a tree:
• First, you enumerate the children.
• Then you repeat the enumeration for each child node.
• So either you believe that a hierarchical system is sufficient, or you don’t.
• The storage
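The two navigation scopes described above can be made concrete with a minimal Python sketch over an in-memory tree. The `Entry` class, the helper names, and the sample DNs are hypothetical illustrations, not part of any LDAP server; the scope semantics follow LDAP’s one-level (singleLevel) and subtree (wholeSubtree) search scopes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """A node in a hypothetical in-memory directory tree."""
    rdn: str
    children: list[Entry] = field(default_factory=list)


def child_dn(rdn: str, parent_dn: str) -> str:
    # A DN lists RDNs from the entry up to the root, comma-separated.
    return f"{rdn},{parent_dn}"


def search_one_level(base: Entry, base_dn: str) -> list[str]:
    """One-level scope: enumerate the immediate children of the base only."""
    return [child_dn(c.rdn, base_dn) for c in base.children]


def search_subtree(base: Entry, base_dn: str) -> list[str]:
    """Subtree scope: the base itself, then repeat the enumeration for each child."""
    results = [base_dn]
    for c in base.children:
        results.extend(search_subtree(c, child_dn(c.rdn, base_dn)))
    return results


root = Entry("o=example", [
    Entry("ou=people", [Entry("uid=alice"), Entry("uid=bob")]),
    Entry("ou=groups"),
])

print(search_one_level(root, "o=example"))
print(search_subtree(root, "o=example"))
```

The subtree search is simply the one-level enumeration applied recursively, which is why a tree offers so few distinct ways to navigate.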

The World of Data
• Structured (SQL)
• Unstructured (Search)

Structured Data: The Three Models and
Their Respective Installed Bases
• Relational: SQL databases
• Network/Graph: graph databases
• Hierarchical: hierarchical databases
The Three Models
• These three models are similar in what they can represent, but they are
optimized for different functions.
• Relational (SQL) is the most ubiquitous, for good reasons:
• It is the most complete model and extremely flexible.
• ACID properties make it great for capturing and updating information,
and it is optimized for non-redundant writes.
• But it is also slow for navigation, ad hoc queries, and search.
• Graphs and hierarchies belong to the same family; after all, trees
are DAGs (“directed acyclic graphs”):
• Slow for writes and updates (generally no ACID properties)
• Fast for navigation, ad hoc queries, and search

Object/Entity, Attribute, Value/Keyword
[Diagram: an entity with Attributes 1 to 4, each holding keyword/value pairs]

Object, Relationship, Data Model
[Diagram: objects connected by relationships]

Hierarchical Data Model
[Diagram: tree levels numbered 1 to 3]

Relational Data Model (ERM, ORM, & UML)
Tables/Entities/Object & Relations

From Graph to Functions to E/R

From E/R to Semantic Model
[Diagram: subject and object nodes linked by verbs]
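The move from E/R to a semantic model can be sketched by reading each foreign key as a verb that links a subject to an object. This is a minimal illustration, not the deck’s implementation: the table and column names (`sales_reps`, `rep_id`, `account_id`) are hypothetical, and the sample values reuse the sentences from the contextual search slides.

```python
# Hypothetical E/R rows; each foreign key (rep_id, account_id) is a relationship.
sales_reps = [{"id": 1, "name": "Nancy Davolio"}]
accounts = [{"id": 10, "name": "The Great Outdoors", "rep_id": 1}]
orders = [{"id": 21, "account_id": 10}]


def to_sentences():
    """Turn each foreign-key link into a (subject, verb, object) triple."""
    reps = {r["id"]: r for r in sales_reps}
    accts = {a["id"]: a for a in accounts}
    triples = []
    for acct in accounts:
        rep = reps[acct["rep_id"]]
        triples.append((f"SalesRep {rep['name']}", "has account", acct["name"]))
    for order in orders:
        acct = accts[order["account_id"]]
        triples.append((f"Account {acct['name']}", "purchased", f"Order {order['id']}"))
    return triples


for subject, verb, obj in to_sentences():
    print(subject, verb, obj)
```

Each join that SQL would perform at query time becomes an explicit, navigable sentence.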

How The Models Stack Up
• Write and update: relational is faster; graph/hierarchy is slower
• Query, search, and navigation/traversal: graph/hierarchy is faster; relational is slower

SQL is the Workhorse for Modern
Data Management
[Diagram: the data management chain (ETL, MDM/CDI, Data Warehouse, Analytics/BI, Search, Big Data), with integration via SQL, alongside unstructured data]

LDAP is Key to Identity Management
[Diagram: the identity management chain: sync engine and provisioning (the ETL of identity), metadirectory (MDM), Analytics/SIEM, Search, Big Data; integration via LDAP (along with web services and SQL) and virtualization]

Why Should Identity Management be
Separate from the Rest of the Chain?
[Diagram: identity management alongside ETL, MDM/CDI, Data Warehouse, Analytics/BI, Search, and Big Data (SIEM); integration via directory, web services, and SQL]

Identity and Context Virtualization Process

Foundation for an Identity Service:
Building a Global Virtual Identifier
and Global Virtual Registry

Solution:
Building a Global List with No Duplicates

Link Identity to Context, Regrouping Objects into
Sentences and Sentences into Contexts

Solution: Gather Attributes and Join Them
to Build a Virtualized Global Profile
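The two steps on these slides, building a global list with no duplicates and then joining attributes into a global profile, can be sketched as follows. This is a minimal illustration under stated assumptions: the source names (`ad`, `hr_sql`), the records, and the choice of a normalized email address as the correlation key are all hypothetical, and real correlation rules are usually richer than a single exact-match key.

```python
from collections import defaultdict

# Hypothetical identity silos; each source names its correlation attribute differently.
SOURCES = {
    "ad": [{"mail": "Alice@Example.com", "sAMAccountName": "alice"}],
    "hr_sql": [
        {"email": "alice@example.com", "title": "Engineer"},
        {"email": "bob@example.com", "title": "Analyst"},
    ],
}
CORRELATION_ATTR = {"ad": "mail", "hr_sql": "email"}


def build_global_profiles(sources):
    """Correlate records into one global list with no duplicates, then join attributes."""
    profiles = defaultdict(dict)
    for source, records in sources.items():
        key_attr = CORRELATION_ATTR[source]
        for record in records:
            # Normalizing the key makes "Alice@Example.com" and "alice@example.com" one identity.
            global_id = record[key_attr].strip().lower()
            for attr, value in record.items():
                if attr != key_attr:
                    # Later sources overwrite earlier ones on conflicting attributes.
                    profiles[global_id][attr] = value
    return dict(profiles)


print(build_global_profiles(SOURCES))
```

The "last source wins" rule for conflicting attributes is an arbitrary simplification; a production join would typically apply per-attribute precedence rules.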

Integration and Cache/Storage Layer
• A system made of two parts:
• Integration layer based on virtualization
• Storage layer (persistent cache):
• LDAP (up to R1 V 6.1)
• HDAP (based on Hadoop/Lucene/Solr, V 7.0)

Why We Need a Federated Identity
That’s Based on Virtualization and
Stored in HDAP Directories

The World of Access Keeps Expanding
• App sourcing and hosting: SaaS apps, apps in public clouds, partner apps,
apps in private clouds, on-premise enterprise apps
• App access channels: enterprise computers, enterprise-issued devices,
public computers, personal devices
• User populations: employees, contractors, customers, partners, members

The Challenges of Implementing an Enterprise IdP:
How to Handle Different Internal Security Domains?
[Diagram: cloud apps reach the IdP through federation; the IdP provides authentication and SSO, but its implementation over the enterprise identity data sources is an open question (“? ??”)]

A Federated Identity Hub Manages Authentication
and Attributes to Support the IdP
[Diagram: identity sources (AD forest/domain A, AD forest/domain B, databases, directories) behind a federated identity hub that supports the IdP; internal enterprise apps; cloud apps reached through federation]

Federated Identity Service and Provisioning
[Diagram: FID as the reference store between internal systems (legacy applications and their stores, such as AD and Sun LDAP, via LDAP/SQL/SPML) and external systems (cloud apps, via SPML and SCIM)]

Virtual View Based on Org Chart
[Diagram: the full management hierarchy under the top manager]
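An org-chart view like this can be derived from flat records that only store each person’s manager. The sketch below is a hypothetical illustration, not RadiantOne’s implementation: the `MANAGER_OF` data and the `uid=` naming are assumptions. It walks up the management chain to build a DN-style path and lists direct reports for one-level navigation.

```python
# Hypothetical flat records: each employee id maps to its manager's id (None = top manager).
MANAGER_OF = {
    "ceo": None,
    "vp_sales": "ceo",
    "nancy": "vp_sales",
    "vp_eng": "ceo",
}


def org_chart_dn(emp_id, manager_of):
    """Build a DN-style path by walking up the management chain to the top manager."""
    rdns, seen = [], set()
    current = emp_id
    while current is not None:
        if current in seen:
            raise ValueError(f"management cycle detected at {current!r}")
        seen.add(current)
        rdns.append(f"uid={current}")
        current = manager_of[current]
    return ",".join(rdns)


def direct_reports(manager_id, manager_of):
    """One-level navigation in the virtual view: the manager's direct reports."""
    return sorted(e for e, m in manager_of.items() if m == manager_id)


print(org_chart_dn("nancy", MANAGER_OF))
print(direct_reports("ceo", MANAGER_OF))
```

The hierarchy is computed from the existing data rather than copied into a new store, which is the point of a virtual view.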

Virtual View Based on Location
[Diagram: hierarchy of Country > State > City]

Virtual View Based on Role, Location,
and Territory
[Diagram: hierarchy of Role > Location > Territory]
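Views based on attributes, such as Country > State > City, can be generated from the same flat records by choosing which attributes become tree levels. This is a minimal sketch; the `PEOPLE` records and attribute names are hypothetical.

```python
# Hypothetical flat records; the attribute names are illustrative only.
PEOPLE = [
    {"uid": "alice", "country": "US", "state": "CA", "city": "San Jose"},
    {"uid": "bob", "country": "US", "state": "CA", "city": "San Jose"},
    {"uid": "carol", "country": "US", "state": "NY", "city": "New York"},
]


def build_view(records, levels):
    """Nest record uids under one tree level per attribute, in the order given."""
    if not levels:
        raise ValueError("at least one level is required")
    tree = {}
    for record in records:
        node = tree
        for attr in levels[:-1]:
            node = node.setdefault(record[attr], {})
        node.setdefault(record[levels[-1]], []).append(record["uid"])
    return tree


print(build_view(PEOPLE, ["country", "state", "city"]))
```

Passing `["role", "location", "territory"]` over records that carry those attributes would produce the role-based view in the same way; only the list of levels changes, not the data.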

New Use Case: Contextual Search

Webster’s Definition of “Context”
From Latin contextus, “a joining together,” past participle of contexere, “to weave
together.”
1. The parts of a sentence, paragraph, or discourse immediately next
to or surrounding a specified word or passage and determining
its exact meaning [to quote a remark out of context] (Language
Representation)
2. The whole situation, background, or environment relevant to a
particular event, personality, creation, etc. (Perception)

Trees as a Representation of Sentences

Trees as a Way to Represent Sentences
and Context

Diving into One Sentence from the
Contextual Search Result

Navigating the Different Sentences Returned in the
Contextual Search:
Account The Great Outdoors purchased Order 21

Navigating Sentences Returned in the Search:
SalesRep Nancy Davolio has account The Great
Outdoors
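Contextual search can be pictured as two operations: find every sentence that mentions a keyword, then navigate from an entity to the sentences it participates in. The sketch below is a hypothetical illustration built on the two sentences shown on these slides; the `SENTENCES` store and its `subject`/`object` fields are assumptions, not HDAP’s data model.

```python
from collections import deque

# Hypothetical sentence store modeled on the two sentences from the slides.
SENTENCES = [
    {"subject": "Nancy Davolio", "object": "The Great Outdoors",
     "text": "SalesRep Nancy Davolio has account The Great Outdoors"},
    {"subject": "The Great Outdoors", "object": "Order 21",
     "text": "Account The Great Outdoors purchased Order 21"},
]


def contextual_search(keyword):
    """Return every sentence whose text mentions the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [s["text"] for s in SENTENCES if kw in s["text"].lower()]


def navigate(entity):
    """Breadth-first walk from an entity, following subject -> object links."""
    chain, seen, queue = [], {entity}, deque([entity])
    while queue:
        current = queue.popleft()
        for s in SENTENCES:
            if s["subject"] == current:
                chain.append(s["text"])
                if s["object"] not in seen:
                    seen.add(s["object"])
                    queue.append(s["object"])
    return chain


print(contextual_search("great outdoors"))
print(navigate("Nancy Davolio"))
```

Starting from the sales rep, navigation reaches the account and then the order, reconstructing the context around the original search hit.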

HDAP:
RadiantOne High-Availability LDAP
Based on Lucene/ZooKeeper
(Sub-components of Hadoop)

Current Implementation of LDAP Servers
is Based on B-Tree Indexation
• An LDAP directory is a hierarchical database with this architecture:
• A set of entries, indexed by a main index: the directory tree
• A set of indexes to support attribute search (one per attribute)
• The core technology over the last 10 years has been to implement the tree as
a set of B-tree indexes. B-trees can scale to hundreds of millions of entries.
[Diagram: entries indexed by a B-tree]
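The "one index per attribute" idea can be sketched in a few lines. Python’s standard library has no B-tree, so this minimal sketch swaps in a sorted list with binary search (`bisect`): like a B-tree it keeps keys in order, so equality and prefix lookups touch one contiguous range, but insertion here is O(n) rather than O(log n) and there is no on-disk page structure. The `AttributeIndex` class and sample DNs are hypothetical; lowercasing mimics case-insensitive matching as used for attributes such as `cn`.

```python
import bisect


class AttributeIndex:
    """One ordered index per attribute: sorted (value, dn) pairs (B-tree stand-in)."""

    def __init__(self):
        self._pairs = []  # kept sorted by (normalized value, dn)

    def add(self, value, dn):
        bisect.insort(self._pairs, (value.lower(), dn))

    def _scan(self, start, matches):
        # (start, "") sorts before every pair whose value equals or extends start.
        i = bisect.bisect_left(self._pairs, (start, ""))
        out = []
        while i < len(self._pairs) and matches(self._pairs[i][0]):
            out.append(self._pairs[i][1])
            i += 1
        return out

    def equals(self, value):
        v = value.lower()
        return self._scan(v, lambda key: key == v)

    def prefix(self, prefix):
        p = prefix.lower()
        return self._scan(p, lambda key: key.startswith(p))


cn_index = AttributeIndex()
cn_index.add("Alice Smith", "uid=alice,ou=people,o=example")
cn_index.add("Alan Turner", "uid=alan,ou=people,o=example")
cn_index.add("Bob Jones", "uid=bob,ou=people,o=example")

print(cn_index.prefix("al"))
print(cn_index.equals("BOB JONES"))
```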

From Lucene to Hadoop to ZooKeeper
• Hadoop is an offshoot of the Lucene/Nutch project, which aimed to
create an open source search engine.
• Lucene is the search and indexing part of the search engine.
• Hadoop is the distributed storage (HDFS) and compute
(batch-oriented MapReduce) engine, offering very high
throughput on large clusters of commodity servers.
• Many components and sub-projects came out of the
Hadoop project.
• ZooKeeper is a low-level coordination service for managing configuration and
replication across a large number of nodes in a Hadoop cluster.

[Diagram: HDAP architecture, a cluster of commodity servers holding millions of entries for millions of users]
• Front end: LDAP and other protocols (XML/JSON/HTTP); LDAP front-end components (BER encoding, etc.)
• VDS core (a Java web app, one core per JVM): LDAP processing (add/update/delete), query processing, and caching
• Distributed VDS + Lucene index on each node, configured by the schema and VDS config files (XML with <fields> and <types>)
• ZooKeeper for VDS: a distributed configuration manager that manages configuration and state per node; it can add nodes, define a new leader, and swap nodes in and out dynamically
• Scale out: add more VDS nodes for faster queries and more documents
• Replication (leader/followers, replicas 1 to n): add more replicas (followers) for better throughput (queries/sec) and fault tolerance
• Indexing and queries: soft commit (in memory, near real-time) and hard commit (flushed to disk)
• Measured: 60,000 LDAP q/sec before VDS, 30,000 q/sec after VDS

Initial Performance Tests (LDAP Search)
• HDAP (VDS + Lucene), 10M entries:
• 1 node: 30k q/sec
• 2 nodes: 65k q/sec
• 3 nodes: 95k q/sec
• 4 nodes: 130k q/sec
• 5 nodes: 149k q/sec
• Google daily average load: 3 million q/minute, or 50,000 q/sec
[Chart: LDAP search throughput (queries/sec) for 1 to 5 nodes]
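To read the scaling numbers above, it helps to compute speedup (throughput relative to one node) and efficiency (speedup divided by node count, where 1.0 means perfectly linear scaling). The script below uses only the figures reported on the slide and also checks the Google conversion from queries per minute to queries per second.

```python
# Throughput figures from the slide (LDAP search, 10M entries), in queries/sec.
throughput = {1: 30_000, 2: 65_000, 3: 95_000, 4: 130_000, 5: 149_000}
baseline = throughput[1]

for nodes, qps in throughput.items():
    speedup = qps / baseline
    efficiency = speedup / nodes  # 1.0 would be perfectly linear scaling
    print(f"{nodes} node(s): {qps:>7,} q/s  speedup {speedup:.2f}x  efficiency {efficiency:.0%}")

# The slide's Google comparison: 3 million queries per minute.
google_qps = 3_000_000 / 60
print(f"Google average: {google_qps:,.0f} q/s")
```

Relative to the single-node baseline, the 2- to 4-node runs come out slightly above linear, and 5 nodes reach a speedup of about 4.97x (roughly 99% efficiency). At 149k q/sec, the 5-node cluster handles about three times the 50,000 q/sec Google average cited on the slide.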

The Architecture of the
RadiantOne Federated Identity Service:
• Acting as an abstraction layer between applications and the underlying identity
silos, virtualization isolates applications from the complexity of backends.
[Diagram: identity silos (LDAP, SQL, web services/SOA) holding populations A, B, and C, plus groups and roles, pass through aggregation, correlation, integration, and virtualization by model to serve apps A to F through contexts and services (REST)]

HDAP Uses a Key/Value System Based on
Search Technology: Inverted Tree
• Everything is automatically indexed in HDAP, so you can search the
directory the same way you search Google.
• An inverted tree is not necessarily balanced; some paths can be very
shallow while others are very deep.
[Diagram: inverted tree]
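The "search it like Google" idea rests on indexing every token of every attribute automatically. The slide calls the structure an inverted tree; the minimal sketch below uses a conventional inverted index (token to set of DNs), the structure at the core of Lucene, as a simplified stand-in. It is not HDAP’s implementation: real Lucene indexes add analyzers, term dictionaries, and relevance scoring. The `InvertedIndex` class and the sample entries are hypothetical.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps tokens to the DNs of the entries that contain them."""

    def __init__(self):
        self._by_field = defaultdict(set)  # (attribute, token) -> DNs
        self._by_token = defaultdict(set)  # token in any attribute -> DNs

    def add_entry(self, dn, attributes):
        """Index every token of every value of every attribute automatically."""
        for attr, values in attributes.items():
            for value in values:
                for token in value.lower().split():
                    self._by_field[(attr.lower(), token)].add(dn)
                    self._by_token[token].add(dn)

    def search_field(self, attr, token):
        """Attribute search: entries whose attr contains the whole word token."""
        return sorted(self._by_field.get((attr.lower(), token.lower()), set()))

    def search_text(self, text):
        """Google-style search: every word must appear in some attribute of the entry."""
        postings = [self._by_token.get(t, set()) for t in text.lower().split()]
        return sorted(set.intersection(*postings)) if postings else []


idx = InvertedIndex()
idx.add_entry("uid=nancy,ou=sales,o=example",
              {"cn": ["Nancy Davolio"], "title": ["Sales Representative"], "l": ["Seattle"]})
idx.add_entry("uid=jdoe,ou=sales,o=example",
              {"cn": ["Jane Doe"], "title": ["Sales Manager"], "l": ["Tacoma"]})

print(idx.search_text("sales seattle"))
print(idx.search_field("title", "sales"))
```

Because lookups go straight from a word to the matching entries, a free-text query needs no knowledge of where those entries sit in the directory tree.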
