Videos from this conference can be found here: https://vimeo.com/album/2012142
This talk combines a brief presentation with a panel Q&A session featuring a number of key experts in the field of open source content acquisition and security. We'll start by familiarizing Solr users with the capabilities of the Apache Manifold Connector Framework (ManifoldCF, or MCF), concentrating on how MCF can be used to project a repository's security into Solr search results through the use of its Authority Service and a custom Solr search component. We'll then transition to a panel discussion that explores case studies of how this security architecture has worked out when deployed in the field, and take questions from the audience. If you have questions you would like us to consider for the panel discussion, we welcome them in advance. Questions on anything from how-to topics to the MCF roadmap can be sent to kwright(at)apache.org.
2. What I Will Cover
• What ManifoldCF does and the problem it is designed to solve
• ManifoldCF's way of mapping repository security to documents indexed by Solr/Lucene
• A Q&A panel session describing real-world usage of the ManifoldCF security projection model
3. Who am I?
• I am:
  - Karl Wright (kwright@apache.org)
  - Principal Software Engineer at Nokia, Inc.
  - Formerly Principal Software Engineer at MetaCarta, Inc.
• What I do:
  - Work at Nokia on making location search better
  - Designer and original implementer of ManifoldCF
  - Author of "ManifoldCF in Action"
  - Committer for ManifoldCF
  - Other interests include musical composition, quantum mechanics, and evolutionary biology
4. Let's search our repository using Solr!
• But first, we have to get our repository documents indexed by Solr
• And then… there's another obstacle… VINNY
5. Who is this Vinny guy??
• Chances are, you already know him
• "Vinny" protects your organization's content
• "Vinny" prevents unauthorized users from seeing what they aren't supposed to see
• "Vinny" isn't going to let you index his content unless you can control access in the same way
6. ManifoldCF to the Rescue!
• Plug-in architecture allows connectors to be written easily, if they don't exist already
• Existing repository connectors for web, RSS, JDBC, CIFS (shared file system), SharePoint, Meridio, FileNet, LiveLink, Documentum, CMIS
• Existing output connectors for Solr, GTS, and OpenSearchServer
• Includes a user-facing UI, an API, and an Authority Service
8. How ManifoldCF Implements Query Restriction
• Document access tokens are sent to the search index along with the document content
• Separate bins for "allow" tokens and "deny" tokens, at the "file", multiple "folder", and "share" levels
• In practice, only the "file" and "share" levels are needed
• The ManifoldCF Authority Service maps user names to a user's access tokens
• A Solr SearchComponent or QParserPlugin communicates with the MCF Authority Service and performs the query modification
10. What does the Pull-Agent daemon do?
• Pulls documents from various repositories, continuously or on a schedule, and hands them to the output search engine
• Incremental: does as little work as possible
• Also fetches and indexes each document's access tokens
12. Ok, what does the Authority Service REALLY do?
• User names go in (user@domain)
• Access tokens come out, for all active authority connections currently defined in that ManifoldCF instance
• HTTP based, line-by-line output, with helpful hints:
    curl http://localhost:8345/mcf-authority-service/UserACLs?username=foo@bar.com
    UNREACHABLEAUTHORITY:The+Spanish+Inquisition
    TOKEN:My+Authority:DEAD_AUTHORITY
    AUTHORIZED:Null+authority
    TOKEN:Null:foo%40bar.com
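The line-by-line format shown above is simple to consume on the client side. The following is a minimal Python sketch, not the parser shipped with ManifoldCF: the function name `parse_authority_response` is made up for illustration, and the only assumption about the format is what the sample shows, namely that each line is a keyword, a colon, and a value. Tokens are kept verbatim, because the slides do not say whether they should be URL-decoded before being compared with the indexed tokens.

```python
def parse_authority_response(body):
    """Split an Authority Service response into access tokens and status hints.

    Hypothetical helper: lines starting with TOKEN: are collected as tokens,
    every other keyword (AUTHORIZED, UNREACHABLEAUTHORITY, ...) is kept as a
    (keyword, value) status hint.
    """
    tokens = []
    statuses = []
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, value = line.partition(":")
        if keyword == "TOKEN":
            tokens.append(value)  # kept verbatim, no URL-decoding
        else:
            statuses.append((keyword, value))
    return tokens, statuses


# The response body from the slide, used as sample input.
sample = """UNREACHABLEAUTHORITY:The+Spanish+Inquisition
TOKEN:My+Authority:DEAD_AUTHORITY
AUTHORIZED:Null+authority
TOKEN:Null:foo%40bar.com
"""

tokens, statuses = parse_authority_response(sample)
print(tokens)
print(statuses)
```

A caller would typically log the status hints (for example, an unreachable authority) and pass only the tokens on to the query restriction step.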
13. What do you have to do to Solr to make this all work?
• Add fields to the schema to contain document access tokens
  - A field for document-level "allow" tokens
  - A field for document-level "deny" tokens
  - A field for share-level "allow" tokens
  - A field for share-level "deny" tokens
• Add something that authenticates a user and obtains a user name
• Add a SearchComponent or Query Parser to restrict incoming queries
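To make the four fields concrete, here is a small Python sketch of what an indexed document might look like. Everything named in it is an assumption for illustration: the field names (`allow_token_document` and so on), the default token value `__nosecurity__`, and the helper `to_solr_doc` are not taken from the slides, so substitute whatever your schema defines. A default token for empty fields would normally be supplied by Solr itself; it is filled in explicitly here only to make the effect visible.

```python
# Hypothetical field names, one per token bin; adjust to your own schema.
FIELDS = {
    "share_allow": "allow_token_share",
    "share_deny": "deny_token_share",
    "doc_allow": "allow_token_document",
    "doc_deny": "deny_token_document",
}

# Hypothetical marker stored when a bin has no tokens at all.
NO_SECURITY = "__nosecurity__"


def to_solr_doc(doc_id, content, acl):
    """Build a document dict carrying the four access-token fields.

    `acl` maps the keys of FIELDS to collections of tokens; a missing or
    empty bin is stored as the single default token.
    """
    doc = {"id": doc_id, "content": content}
    for key, field in FIELDS.items():
        doc[field] = sorted(acl.get(key, ())) or [NO_SECURITY]
    return doc


doc = to_solr_doc("Not_picky", "some text", {"doc_allow": {"T1", "T2", "T3"}, "doc_deny": {"T4"}})
print(doc)
```

Storing an explicit marker for "no tokens" lets the query side test for public documents with an ordinary term match rather than a "field is missing" test.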
14. The Solr component is NOT where the magic is…
• Each access token returned by the Authority Service adds a clause to a BooleanQuery
• It is rare for a user to have more than one hundred access tokens, except for Documentum!!
• ManifoldCF in Action provides an example Solr SearchComponent
• dist/solr-integration provides a Solr SearchComponent and QParserPlugin (MCF trunk)
15. How are the four token types related?
• Share and document levels are computed independently; an included document must pass both
• For each level, DENY tokens exclude and ALLOW tokens permit, but DENY tokens always win over ALLOW
• Special meaning for no tokens at all at a level: neither ALLOW nor DENY tokens means "public", which is handled by a default token in Solr
• Active Directory does it exactly the same way, oddly enough, using SIDs for tokens
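The rules above can be written down as a filter query. The sketch below builds a Solr query string in Python purely for illustration; the real SearchComponent builds a Lucene BooleanQuery instead, and the field names and the `__nosecurity__` default token used here are assumptions, not values taken from the slides. For each level, a document passes if it is public (both bins hold the default token) or if the user holds at least one ALLOW token and none of the DENY tokens.

```python
# Assumed default token and field names; replace with your schema's values.
NO_SECURITY = "__nosecurity__"
LEVELS = [
    ("allow_token_share", "deny_token_share"),
    ("allow_token_document", "deny_token_document"),
]


def quote(value):
    """Quote a token as a phrase, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def level_clause(allow_field, deny_field, user_tokens):
    """Restriction for one level: public, or allowed and not denied."""
    public = f"({allow_field}:{quote(NO_SECURITY)} AND {deny_field}:{quote(NO_SECURITY)})"
    if not user_tokens:
        return public  # a user without tokens sees only public documents
    allowed = " OR ".join(f"{allow_field}:{quote(t)}" for t in user_tokens)
    not_denied = "".join(f" AND NOT {deny_field}:{quote(t)}" for t in user_tokens)
    return f"({public} OR (({allowed}){not_denied}))"


def build_filter(user_tokens):
    """A document must pass the share level AND the document level."""
    return " AND ".join(level_clause(a, d, user_tokens) for a, d in LEVELS)


print(build_filter(["T1", "T2"]))
```

The resulting string could be sent as an `fq` parameter alongside the user's query. Each user token contributes one ALLOW clause and one DENY clause per level, which is why the number of tokens per user matters for query size.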
16. Example
Document      Share allow  Share deny  Doc allow   Doc deny
Look_at_me    (empty)      (empty)     (empty)     (empty)
Very_secret   (empty)      (empty)     (empty)     T1
Not_picky     (empty)      (empty)     T1, T2, T3  T4
Really_picky  (empty)      (empty)     T1          (empty)
Insane        T1, T2       T3          T3, T2      T1
Share_ctrl'd  T1, T2, T3   T4          (empty)     (empty)
• "Not_picky" and "Share_ctrl'd" are seen by the same people
• "Very_secret" is seen by nobody
• "Insane" is seen only by people who hold T2 but neither T1 nor T3
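The three observations can be checked mechanically. The Python sketch below encodes the table and the per-level rule (public when a level has no tokens at all, otherwise at least one ALLOW token and no DENY token) and evaluates every combination of T1 to T4. The function names are made up for illustration, and the rule is a reading of the slides rather than ManifoldCF's own code.

```python
from itertools import combinations


def level_allows(user, allow, deny):
    """One level passes if it is public, or the user is allowed and not denied."""
    if not allow and not deny:
        return True  # no tokens at all at this level means "public"
    return bool(user & allow) and not (user & deny)


def can_see(user, doc):
    """A document is visible only if both the share and document levels pass."""
    share_allow, share_deny, doc_allow, doc_deny = doc
    return level_allows(user, share_allow, share_deny) and level_allows(user, doc_allow, doc_deny)


def T(*numbers):
    """Shorthand: T(1, 2) is the token set {"T1", "T2"}."""
    return frozenset(f"T{n}" for n in numbers)


E = frozenset()  # an empty bin

# (share allow, share deny, doc allow, doc deny), as in the table.
DOCS = {
    "Look_at_me": (E, E, E, E),
    "Very_secret": (E, E, E, T(1)),
    "Not_picky": (E, E, T(1, 2, 3), T(4)),
    "Really_picky": (E, E, T(1), E),
    "Insane": (T(1, 2), T(3), T(3, 2), T(1)),
    "Share_ctrl'd": (T(1, 2, 3), T(4), E, E),
}

# Every possible user: each subset of the four tokens.
ALL_USERS = [frozenset(c) for r in range(5) for c in combinations(["T1", "T2", "T3", "T4"], r)]

for name, doc in DOCS.items():
    viewers = [sorted(u) for u in ALL_USERS if can_see(u, doc)]
    print(name, viewers)
```

Note that T4 plays no part in "Insane", so a user holding T2 and T4 can still see it; what matters is holding T2 without T1 or T3.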
17. What is still missing from the picture?
• Well, getting documents and authorization info into Solr is covered…
• Getting authorization information for a user is covered…
• Modifying the search to enforce authorization is covered…
• Authentication is NOT covered!
  - ManifoldCF does not help you with this problem (yet)
  - Consider JAAS in Tomcat
  - Apache web server's mod_auth_kerb also works
18. Do you think these people care about security?
19. Wrap Up
• ManifoldCF provides a great way to project repository security into Solr
• ManifoldCF effectively converts repository security into an AD-like token model
• As long as you can provide the authentication, MCF and Solr can provide the rest
• Nobody ever expects the Spanish Inquisition