Securing Solr Documents with ManifoldCF

How to Enforce Repository Authorization with Solr

What I Will Cover
§  What ManifoldCF does and the problem it is designed to solve
§  How ManifoldCF maps repository security onto documents indexed by Solr/Lucene
§  A Q&A panel session describing real-world usage of the ManifoldCF security projection model

Who am I?
§  I am:
   •  Karl Wright (kwright@apache.org)
   •  Principal Software Engineer at Nokia, Inc.
   •  Formerly Principal Software Engineer at MetaCarta, Inc.
§  What I do:
   •  Work at Nokia on making location search better
   •  Designer and original implementer of ManifoldCF
   •  Author of ‘ManifoldCF in Action’
   •  Committer for ManifoldCF
   •  Other interests include musical composition, quantum mechanics, and evolutionary biology

Let’s search our repository using Solr!

§  But first, we have to get our repository documents indexed by Solr
§  And then… there’s another obstacle… VINNY

Who is this Vinny guy??
§  Chances are, you already know him
§  “Vinny” protects your organization’s content
§  “Vinny” prevents unauthorized users from seeing what they aren’t supposed to see
§  “Vinny” isn’t going to let you index his content unless you can control access to it the same way he does

ManifoldCF to the Rescue!
§  Its plug-in architecture makes it easy to write new connectors where none exist yet
§  Existing repository connectors for web, RSS, JDBC, CIFS (shared file system), SharePoint, Meridio, FileNet, LiveLink, Documentum, and CMIS
§  Existing output connectors for Solr, GTS, and OpenSearchServer
§  Includes a user-facing UI, an API, and an Authority Service

Query Restriction Model




(From ManifoldCF in Action, Chapter 4. Reprinted with permission.)

How ManifoldCF Implements Query Restriction
§  Document access tokens are sent to the search index along with the document content
§  There are separate bins for “allow” tokens and “deny” tokens, at the “file” level, at multiple “folder” levels, and at the “share” level
§  In practice, only the “file” and “share” levels are needed
§  The ManifoldCF Authority Service maps user names to each user’s access tokens
§  A Solr SearchComponent or QParserPlugin communicates with the MCF Authority Service and performs the query modification

ManifoldCF Architecture

What does the Pull-Agent daemon do?
§  Pulls documents from various repositories, continuously or on a schedule, and hands them to the output search engine
§  Incremental: does as little work as possible on each pass
§  Also fetches and indexes each document’s access tokens
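One common way to make a crawler incremental is to remember a version string for each document and skip anything whose version has not changed. The sketch below assumes that approach. The class, method names, and version format are hypothetical, and the code models only the bookkeeping, not ManifoldCF's actual internals. It also folds the access tokens into the version string, which is an assumption made so that a permissions change alone is enough to trigger re-indexing.

```python
class IncrementalIndexer:
    """Hypothetical bookkeeping for an incremental crawl: only changed documents are re-sent."""

    def __init__(self):
        # document id -> version string last sent to the index
        self.seen_versions = {}

    def process(self, doc_id, version):
        """Return True if the document must be (re)fetched and (re)indexed on this pass."""
        if self.seen_versions.get(doc_id) == version:
            return False  # unchanged since the last pass: skip it entirely
        self.seen_versions[doc_id] = version
        return True

    def finish_pass(self, current_ids):
        """Return ids no longer present in the repository; they should be deleted from the index."""
        gone = sorted(d for d in self.seen_versions if d not in current_ids)
        for doc_id in gone:
            del self.seen_versions[doc_id]
        return gone


def version_string(modified, allow_tokens, deny_tokens):
    # Hypothetical format: fold the ACL into the version so a permissions change forces re-indexing.
    return "|".join([modified, ",".join(sorted(allow_tokens)), ",".join(sorted(deny_tokens))])
```

On a second pass over an unchanged document, `process` returns False and the document is neither fetched nor re-sent. Adding T2 to its allow tokens changes the version string, so the document is indexed again.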

What does the Authority Service do?

OK, what does the Authority Service REALLY do?
§  User names go in (user@domain)
§  Access tokens come out, for all active authority connections currently defined in that ManifoldCF instance
§  HTTP-based, line-by-line output, with helpful hints:
curl "http://localhost:8345/mcf-authority-service/UserACLs?username=foo@bar.com"
UNREACHABLEAUTHORITY:The+Spanish+Inquisition
TOKEN:My+Authority:DEAD_AUTHORITY
AUTHORIZED:Null+authority
TOKEN:Null:foo%40bar.com
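The response format above is simple enough to consume in a few lines of code. This sketch assumes one `KEY:value` entry per line, as in the sample output. Lines starting with `TOKEN:` carry access tokens, and every other line is treated as a status hint about an authority connection. The token text is kept verbatim, on the assumption that it is already in the same form that was indexed with the documents. The function names are hypothetical. The base URL in `fetch_user_tokens` is copied from the curl example and is illustrative only.

```python
from urllib.parse import quote
from urllib.request import urlopen


def parse_authority_response(text):
    """Split a UserACLs response into (access tokens, [(status, authority), ...])."""
    tokens, statuses = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("TOKEN:"):
            tokens.append(line[len("TOKEN:"):])  # keep the token text verbatim
        else:
            # e.g. "AUTHORIZED:Null+authority" or "UNREACHABLEAUTHORITY:..."
            status, _, authority = line.partition(":")
            statuses.append((status, authority))
    return tokens, statuses


def fetch_user_tokens(username, base_url="http://localhost:8345/mcf-authority-service"):
    """Query the Authority Service for one user (URL layout copied from the curl example)."""
    url = base_url + "/UserACLs?username=" + quote(username)
    with urlopen(url, timeout=10) as response:
        return parse_authority_response(response.read().decode("utf-8"))
```

`fetch_user_tokens` needs a running ManifoldCF instance. `parse_authority_response` can be tried directly on the sample output above, where it yields the two `TOKEN:` values and two status entries.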

What do you have to do to Solr to make this all work?
§  Add fields to the schema to contain document access tokens:
   •  A field for document-level “allow” tokens
   •  A field for document-level “deny” tokens
   •  A field for share-level “allow” tokens
   •  A field for share-level “deny” tokens
§  Add something that authenticates a user and obtains a user name
§  Add a SearchComponent or QParserPlugin to restrict incoming queries
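The four token fields are easiest to picture from the indexing side. Below is a minimal sketch of the document the Solr output might carry. It assumes multi-valued string fields named `allow_token_document`, `deny_token_document`, `allow_token_share`, and `deny_token_share`. It also assumes a placeholder value `__nosecurity__` that stands in for an empty bin. Both the names and the placeholder are assumptions for illustration; use whatever your schema and Solr component actually expect. With this layout, a level counts as public only when both of its fields hold nothing but the placeholder.

```python
NO_SECURITY = "__nosecurity__"  # hypothetical placeholder for an empty token bin

# Hypothetical field names: one multi-valued string field per (level, kind) bin.
TOKEN_FIELDS = {
    ("document", "allow"): "allow_token_document",
    ("document", "deny"): "deny_token_document",
    ("share", "allow"): "allow_token_share",
    ("share", "deny"): "deny_token_share",
}


def build_solr_doc(doc_id, content, acls):
    """acls maps (level, kind) to a list of tokens; missing or empty bins get the placeholder."""
    doc = {"id": doc_id, "content": content}
    for key, field_name in TOKEN_FIELDS.items():
        tokens = acls.get(key) or []
        doc[field_name] = list(tokens) if tokens else [NO_SECURITY]
    return doc
```

Consider a document with only a document-level deny token, such as “Very_secret” in the example later in the deck. It keeps T1 in its deny bin, so it is not mistaken for a public document even though its allow bin holds the placeholder.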
The Solr component is NOT where the magic is…
§  Each access token returned by the Authority Service adds a clause to a BooleanQuery
§  It is rare for a user to have more than one hundred access tokens, except with Documentum!
§  ManifoldCF in Action provides an example Solr SearchComponent
§  dist/solr-integration provides a Solr SearchComponent and QParserPlugin (MCF trunk)
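The sketch below illustrates how each access token adds a clause. For readability, it builds the restriction as a Solr filter-query string rather than as a Lucene `BooleanQuery` object; the string form is a swapped-in representation, not the MCF component's code. It assumes multi-valued token fields named `allow_token_share`, `deny_token_share`, `allow_token_document`, and `deny_token_document`, with empty bins indexed as the placeholder `__nosecurity__` (all hypothetical names). At each level, a document matches if both bins are empty or if one of the user's tokens is in the allow bin. It is rejected if any of the user's tokens is in the deny bin.

```python
NO_SECURITY = "__nosecurity__"  # hypothetical placeholder indexed for an empty token bin
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/ ')


def escape(term):
    """Backslash-escape characters that are special in the standard Solr/Lucene query syntax."""
    return "".join("\\" + c if c in LUCENE_SPECIAL else c for c in term)


def level_clause(allow_field, deny_field, tokens):
    """Restriction for one level: public or an ALLOW match, and never a DENY match."""
    # Public case: both bins were indexed with the placeholder.
    should = [f"({allow_field}:{NO_SECURITY} AND {deny_field}:{NO_SECURITY})"]
    # One optional clause per user token against the allow bin...
    should += [f"{allow_field}:{escape(t)}" for t in tokens]
    clause = "(" + " OR ".join(should) + ")"
    # ...and one exclusion per user token against the deny bin.
    for t in tokens:
        clause += f" AND NOT {deny_field}:{escape(t)}"
    return "(" + clause + ")"


def security_filter(tokens):
    """Both levels must pass, so the two level clauses are ANDed (hypothetical field names)."""
    share = level_clause("allow_token_share", "deny_token_share", tokens)
    document = level_clause("allow_token_document", "deny_token_document", tokens)
    return share + " AND " + document
```

The result would be passed as an `fq` parameter alongside the user's `q`. Each token adds two clauses per level. Users with very many tokens (the Documentum case above) can therefore run into Solr's `maxBooleanClauses` limit, which is configured in solrconfig.xml and defaults to 1024.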
How are the four token types related?
§  Share and document levels are computed independently; a document is included only if it passes both
§  At each level, DENY tokens exclude and ALLOW tokens permit: a user needs at least one matching ALLOW token and no matching DENY token, so DENY always wins over ALLOW
§  Having no tokens at all at a level (neither ALLOW nor DENY) has a special meaning, “public”, which is handled by a default token in Solr
§  Oddly enough, Active Directory does it exactly the same way, using SIDs as tokens
Example
Document       Share allow   Share deny   Doc allow    Doc deny
Look_at_me     (empty)       (empty)      (empty)      (empty)
Very_secret    (empty)       (empty)      (empty)      T1
Not_picky      (empty)       (empty)      T1, T2, T3   T4
Really_picky   (empty)       (empty)      T1           (empty)
Insane         T1, T2        T3           T3, T2       T1
Share_ctrl’d   T1, T2, T3    T4           (empty)      (empty)


§  “Not_picky” and “Share_ctrl’d” are seen by the same people
§  “Very_secret” is seen by nobody
§  “Insane” is seen only by people who hold T2 but neither T1 nor T3
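The token rules on the previous slide are small enough to check exhaustively against this table. The sketch below uses plain Python set logic, with no Solr involved. It encodes the table and applies the per-level rule: no tokens at all means public; otherwise the user needs a matching ALLOW token and no matching DENY token. It then enumerates every combination of T1 to T4 that a user could hold. The function names are hypothetical.

```python
from itertools import combinations


def passes_level(allow, deny, user):
    """One level (share or document): empty means public, DENY beats ALLOW."""
    if not allow and not deny:
        return True
    if user & deny:
        return False
    return bool(user & allow)


def visible(acl, user):
    share_allow, share_deny, doc_allow, doc_deny = acl
    # Both levels are evaluated independently and both must pass.
    return passes_level(share_allow, share_deny, user) and passes_level(doc_allow, doc_deny, user)


NONE = frozenset()
# (share allow, share deny, doc allow, doc deny), copied from the example table
DOCS = {
    "Look_at_me":   (NONE, NONE, NONE, NONE),
    "Very_secret":  (NONE, NONE, NONE, {"T1"}),
    "Not_picky":    (NONE, NONE, {"T1", "T2", "T3"}, {"T4"}),
    "Really_picky": (NONE, NONE, {"T1"}, NONE),
    "Insane":       ({"T1", "T2"}, {"T3"}, {"T3", "T2"}, {"T1"}),
    "Share_ctrl'd": ({"T1", "T2", "T3"}, {"T4"}, NONE, NONE),
}


def all_users(tokens=("T1", "T2", "T3", "T4")):
    """Every possible set of the example tokens a user could hold (16 sets)."""
    return [set(c) for r in range(len(tokens) + 1) for c in combinations(tokens, r)]
```

For example, `[name for name, acl in DOCS.items() if visible(acl, {"T2"})]` gives `["Look_at_me", "Not_picky", "Insane", "Share_ctrl'd"]` for a user who holds only T2.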

What is still missing from the picture?
§  Well, getting documents and authorization information into Solr is covered…
§  Getting authorization information for a user is covered…
§  Modifying the search to enforce authorization is covered…
§  Authentication is NOT covered!
   •  ManifoldCF does not (yet) help you with this problem
   •  Consider JAAS in Tomcat
   •  The Apache web server’s mod_auth_kerb module also works

Do you think these people care about security?

Wrap Up
§  ManifoldCF provides a great way to project repository security into Solr
§  ManifoldCF effectively converts repository security into an Active Directory-like token model
§  As long as you can provide the authentication, MCF and Solr can provide the rest
§  Nobody ever expects the Spanish Inquisition

Our Panel Today
§  Karl Wright
§  Eric Pugh
§  Shinichiro Abe

Sources
§  ManifoldCF in Action
   •  http://www.manning.com/wright
   •  http://manifoldcfinaction.googlecode.com/svn/trunk/edition_1/security_example

Contacts
§  Shinichiro Abe
   •  shinichiro@apache.org
   •  http://www.rondhuit.com/apache-manifoldcf.html (in Japanese)
§  Eric Pugh
   •  epugh@opensourceconnections.com
§  Karl Wright
   •  kwright@apache.org
   •  http://manifoldcfinaction.blogspot.com