Michel Prompt, Chairman & CEO, Radiant Logic
There's a sea change coming in how identity and access management scales. This session looks at what's next in directory technology, scalability, and possibility.
2. • What Is HDAP?
• Why HDAP?
• Why even LDAP?
• Evaluating the models for structured data
• Hierarchical model and LDAP
• The requirements/ drivers for more scalability
• Using Identity and Context Virtualization to build a Federated Identity Service (FID)
• Why FID is essential
• Powering a new use case: Contextual Search
• How HDAP works/Performance
What We’ll Cover Today
4. • This highly-available version of LDAP offers better performance and
increased scalability.
• Now, you may be thinking:
• LDAP is already very fast and scalable.
• And who needs LDAP anyway? Shouldn’t we do as Ian Glazer says, and
“kill IdM in order to save it”?
• But HDAP goes beyond LDAP, delivering much more and doing it all
much faster.
A Next-Gen LDAP Directory Driven by
Hadoop and Search Technology
7/15/2013
6. • Identity remains essential to IT because people are often the center
of activities.
• While there are multiple use cases, one of the key functions of
identity is to act as an integration point.
• As such, identity management is at the center of application
integration.
• We need a way to store identities and their attributes, but is LDAP
still relevant?
• Do we really need a hierarchical system, when the world is moving
toward these models?
• Path
• Graph
• Directed Graph
• Relational
To Bring New Life to the Heart of IT:
People and What They Do
7. Roadmap:
The Role of Identity and Context Virtualization
in the Technology Food Chain
Company Confidential
8. Are the Hierarchies of LDAP Still
Necessary?
• The Protocol
• The Schema
• The Storage: Hierarchy
• Searching and Navigation: Traversing the Tree
• Searching by Attributes
• Navigation: One level or sub-tree. There are not many ways to navigate
a tree:
• First, you enumerate the children.
• Then you reiterate for each child node.
• So you either believe that a hierarchical system is sufficient, or you don’t.
• The storage
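The two navigation steps above (enumerate the children, then repeat for each child node) can be sketched in a few lines. This is a minimal illustration, not code from any LDAP server: the sample DNs, the `ENTRIES` dictionary, and the helper names are all hypothetical, and the DN parsing is deliberately naive.

```python
from collections import defaultdict

# Hypothetical toy directory: each DN maps to its attributes.
ENTRIES = {
    "dc=example": {"objectClass": "domain"},
    "ou=people,dc=example": {"objectClass": "organizationalUnit"},
    "uid=alice,ou=people,dc=example": {"objectClass": "person", "cn": "Alice"},
    "uid=bob,ou=people,dc=example": {"objectClass": "person", "cn": "Bob"},
}


def parent_of(dn):
    """Return the parent DN, or None for the root (naive: ignores escaped commas)."""
    _, sep, rest = dn.partition(",")
    return rest if sep else None


# Main index of the tree: parent DN -> list of child DNs.
CHILDREN = defaultdict(list)
for dn in ENTRIES:
    parent = parent_of(dn)
    if parent is not None:
        CHILDREN[parent].append(dn)


def one_level(base):
    """One-level navigation: enumerate the immediate children of base."""
    return sorted(CHILDREN.get(base, []))


def subtree(base):
    """Sub-tree navigation: visit base, then repeat the enumeration for each child."""
    found = [base]
    for child in one_level(base):
        found.extend(subtree(child))
    return found
```

Calling `one_level("dc=example")` lists only the `ou=people` node, while `subtree("dc=example")` walks down to every user entry below it.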
9. The World of Data
Structured
(SQL)
Unstructured
(Search)
10. Relational
Structured Data: The Three Models and
Their Respective Installed Bases
Network/Graph
Graph
Database
Hierarchical
Database
SQL
Database
11. • These three models are similar in terms of what you can represent
with them. But they are optimized for different functions.
• Relational (SQL) is the most ubiquitous for good reasons:
• The most complete model and extremely flexible
• ACID properties make it great for capturing and updating information,
and it’s optimized for non-redundant write
• But it’s also slow to navigate and perform ad-hoc query and search
• Graphs and hierarchies belong to the same family; after all, trees
are “DAGs,” or “directed acyclic graphs”:
• Slow for write and update (no ACID properties in general)
• Fast in navigation and ad hoc query and search
The Three Models
18. From E/R to Semantic Model
Verb
Verb
Verb
Subject Object
19. How The Models Stack Up
• Write, Update: Relational is faster; Graph/Hierarchy is slower
• Query, Search, Navigation/Traversal: Relational is slower; Graph/Hierarchy is faster
20. SQL is the Workhorse for Modern
Data Management
Data Management
ETL
MDM/CDI
Data Warehouse
Analytics/BI
Search
Big Data
SQL
Integration
Unstructured Data
21. LDAP is Key to Identity Management
Identity Management
(ETL)
Sync engine
Provisioning
MDM
Metadirectory
Analytics/SIEM
Search
Big Data
(along with
Web Services
and SQL)
Integration
LDAP
Virtualization
22. Why Should Identity Management be
Separate from the Rest of the Chain?
Identity Management
ETL
MDM/CDI
Data Warehouse
Analytics/BI
Search
Big Data (SIEM)
Directory
Web Services
SQL
Integration
28. • A system made of two parts
• Integration layer based on virtualization
• Storage layer (Persistent Cache)
• LDAP (up to R1 V 6.1)
• HDAP (based on Hadoop/Lucene/Solr, V 7.0)
Integration and Cache/Storage Layer
29. Why We Need a Federated Identity
That’s Based on Virtualization and
Stored in HDAP Directories
30. The World of Access Keeps Expanding
• App sourcing and hosting: SaaS apps, apps in public clouds, partner apps, apps in private clouds, on-premise enterprise apps
• App access channels: enterprise computers, enterprise-issued devices, public computers, personal devices
• User populations: employees, contractors, customers, partners, members
31. The Challenges of Implementing an Enterprise IdP:
How to Handle Different Internal Security Domains?
Federation
Cloud Apps
IdP
Authentication and SSO
Enterprise Identity
Data Sources
? ??
Implementation
32. A Federated Identity Hub Manages Authentication
and Attributes to Support the IdP
AD
Forest/Domain A
AD
Forest/Domain B Databases
Internal
Enterprise
Apps
Directories
Federation
Cloud Apps
Identity
Sources
IdP
33. Federated Identity Service and Provisioning
Legacy Applications
(and respective stores)
AD Sun LDAP
Cloud Apps
LDAP/
SQL/
SPML
FID
as reference store
SPML
SCIM
Internal
Systems
External
Systems
38. Webster’s Definition of “Context”
Latin contextus: “a joining together,” originally the past participle of
contexere, “to weave together.”
1. The parts of a sentence, paragraph, or discourse immediately next
to or surrounding a specified word or passage and determining
its exact meaning [to quote a remark out of context] (Language
Representation)
2. The whole situation, background, or environment relevant to a
particular event, personality, creation, etc. (Perception)
46. • An LDAP directory is a hierarchical database with this architecture:
• A set of entries, indexed by a main index: the directory tree
• A set of indexes to support attribute search (one per attribute).
• The core technology over the last 10 years has been to implement the tree as
a set of B-tree indexes. B-trees can scale to hundreds of millions of entries.
Current Implementation of LDAP Servers
is Based on B-Tree Indexation
Entries
B Tree
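The layout described on this slide (a set of entries plus one ordered index per attribute) can be approximated as follows. This is a rough sketch under loud assumptions: a sorted list searched with `bisect` stands in for a real B-tree, the tree index over DNs is omitted, and the class names `SortedIndex` and `ToyDirectory` are invented for illustration.

```python
import bisect


class SortedIndex:
    """Ordered key -> DN index; a sorted list stands in for a B-tree here."""

    def __init__(self):
        self._pairs = []  # kept sorted as (key, dn) tuples

    def add(self, key, dn):
        bisect.insort(self._pairs, (key, dn))

    def _scan(self, start_key, keep):
        """Walk the ordered pairs from start_key while keep(key) holds."""
        lo = bisect.bisect_left(self._pairs, (start_key, ""))
        out = []
        for key, dn in self._pairs[lo:]:
            if not keep(key):
                break
            out.append(dn)
        return out

    def equals(self, key):
        return self._scan(key, lambda k: k == key)

    def prefix(self, text):
        return self._scan(text, lambda k: k.startswith(text))


class ToyDirectory:
    """Entries plus one ordered index per indexed attribute."""

    def __init__(self, indexed_attrs):
        self.entries = {}  # DN -> attributes
        self.indexes = {name: SortedIndex() for name in indexed_attrs}

    def add(self, dn, attrs):
        self.entries[dn] = attrs
        for name, index in self.indexes.items():
            if name in attrs:
                index.add(attrs[name].lower(), dn)

    def search_eq(self, attr, value):
        return self.indexes[attr].equals(value.lower())

    def search_prefix(self, attr, value):
        return self.indexes[attr].prefix(value.lower())
```

Note that only attributes named in `indexed_attrs` can be searched; every other attribute would need its own index to be declared first.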
47. From Lucene to Hadoop to ZooKeeper
• Hadoop is an offshoot of the Lucene/Nutch project, which aimed to
create an open source search engine.
• Lucene is the search and index part of the search engine.
• Hadoop is the distributed storage (HDFS) and compute
(batch-oriented Map/Reduce) engine, offering very high
throughput on a large cluster of commodity servers.
• There are many components and sub-projects that came out of the
Hadoop project.
• ZooKeeper is a low-level component for managing configuration and
replication for a large number of nodes in a Hadoop cluster.
48. Distributed VDS + Lucene Index on Each Node
• Scale: millions of entries and millions of users, served by a cluster of commodity servers.
• Front end: LDAP and other protocols. LDAP front-end components (BER encoding, etc.) handle LDAP; XML/JSON/HTTP carries indexing and queries.
• Per node: one core per JVM (Java web app). The VDS core handles LDAP processing (add/update/del) plus LDAP query processing and caching.
• VDS config: schema, etc., in XML (<fields>, <types>).
• Node management: ZooKeeper for VDS is the distributed configuration manager. It manages configuration and state, and can add a node, define a new leader, and swap nodes in and out dynamically.
• Scale out: add more VDS for faster queries and more documents.
• Replication (leader/followers): add more replicas (followers) for better throughput (queries/sec) and fault tolerance.
• Soft commit (in memory): near real-time. Hard commit: flushed to disk.
• Throughput: “We are getting 60000 LDAP q/sec before VDS, 30000 q/sec after VDS.”
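The difference between a soft commit (in memory, near real-time) and a hard commit (flushed to disk) can be illustrated with a toy index. This sketch assumes Solr-like semantics, where added entries become searchable only after a commit; it is not HDAP's implementation, and the `ToyIndex` class and its JSON file format are invented for the example.

```python
import json
import os


class ToyIndex:
    """Toy index distinguishing pending, searchable, and durable entries."""

    def __init__(self, path):
        self.path = path
        self.pending = {}  # added, but not yet searchable
        self.visible = {}  # searchable, held in memory

    def add(self, dn, attrs):
        self.pending[dn] = attrs

    def soft_commit(self):
        """Make pending entries searchable without touching disk."""
        self.visible.update(self.pending)
        self.pending.clear()

    def hard_commit(self):
        """Make pending entries searchable and flush everything to disk."""
        self.soft_commit()
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.visible, f)
            f.flush()
            os.fsync(f.fileno())

    def search(self, dn):
        return self.visible.get(dn)

    @classmethod
    def recover(cls, path):
        """Rebuild an index from the last hard commit, as after a restart."""
        index = cls(path)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                index.visible = json.load(f)
        return index
```

After a soft commit an entry is searchable on the running node, but `ToyIndex.recover` will not see it until a hard commit has written it out.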
50. The Architecture of the
RadiantOne Federated Identity Service:
• Acting as an abstraction layer between applications and the underlying identity
silos, virtualization isolates applications from the complexity of backends.
Aggregation
Correlation
Integration
Virtualization by model
Population
C
Population
B
Population
A
Groups Roles
LDAP
SQL
Web
Services
/SOA
App A
App B
App C
App D
App E
App F
Contexts
Services
REST
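As a rough illustration of the aggregation and correlation steps named on this slide, the sketch below merges records for the same person from several identity silos into one profile. It is not RadiantOne's algorithm: the `correlate` function, the choice of `mail` as the correlation key, the sample records, and the "first source wins" conflict rule are all assumptions made for the example.

```python
def correlate(sources, key):
    """Merge records from several sources into one profile per shared key value."""
    profiles = {}
    for source_name, records in sources.items():
        for record in records:
            value = record.get(key)
            if value is None:
                continue  # a record without the key cannot be correlated
            profile = profiles.setdefault(value.lower(), {"sources": []})
            profile["sources"].append(source_name)
            for attr, attr_value in record.items():
                # Assumption: the first source to supply an attribute wins.
                profile.setdefault(attr, attr_value)
    return profiles


# Hypothetical silos: an AD-like directory and an HR database.
ad_records = [
    {"mail": "Alice@example.com", "sAMAccountName": "alice", "department": "Sales"},
]
hr_records = [
    {"mail": "alice@example.com", "employee_id": "1001", "department": "Field Sales"},
    {"mail": "bob@example.com", "employee_id": "1002"},
]

profiles = correlate({"ad": ad_records, "hr": hr_records}, key="mail")
```

An application reading `profiles` sees one entry per person and does not need to know which backend supplied each attribute.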
51. • An LDAP directory is a hierarchical database with this architecture:
• A set of entries, indexed by a main index: the directory tree
• A set of indexes to support attribute search (one per attribute).
• The core technology over the last 10 years has been to implement the tree as
a set of B-tree indexes. B-trees can scale to hundreds of millions of entries.
Current Implementation of LDAP Servers
is Based on B-Tree Indexation
Entries
B Tree
52. • Everything is automatically indexed in HDAP so you can search the
directory the same way you search Google…
• An inverted tree is not necessarily balanced; you could have some
paths that are very shallow, while some are very deep.
HDAP Uses a Key/Value System Based on
Search Technology: Inverted Tree
Inverted Tree
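A minimal sketch of "everything is automatically indexed," assuming the slide's "inverted tree" refers to the kind of inverted index used by search engines such as Lucene. The `InvertedIndex` class and `tokenize` helper are hypothetical and far simpler than a real search engine: there is no ranking, no analysis chain, and no persistence.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Key/value index: token -> set of DNs whose attributes contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, dn, attrs):
        """Index every attribute value, with no per-attribute setup."""
        for value in attrs.values():
            for token in tokenize(str(value)):
                self.postings[token].add(dn)

    def search(self, query):
        """Return DNs that contain every query token, in any attribute."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        results = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            results &= self.postings.get(token, set())
        return results
```

A query such as `search("sales smith")` matches an entry whose `cn` contains "Smith" and whose `title` contains "Sales", without the caller naming either attribute.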