Presentation given by Rene Wiermer and Jeffrey van der Hoeven at ELAG2017 about automating rights decisions for digital content at the KB, National Library of the Netherlands
2. The dream and reality
- The dream: open access to everything for everybody!
- In reality: limited access due to copyright & contracts
3. Examples of restrictions (1)
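[Figure: timelines (Time ->) of access status for digitized newspapers (axis marks 1600, 1930, 1945, 1980, 2017; labels "open", "closed") and digitized books (axis marks 1400, 1900, 1995, 2017; labels "open", "restricted", "no download")]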
4. Examples of restrictions (2)
[Figure: access types for scientific articles and datasets per journal title (Publisher A, Publisher B, Publisher Z): open, API key, account, reading room only]
12. Needs 1: more information to the end user
- How do I get access?
- What can I do with it?
Improve UX by standardizing rights decisions
13. Needs 2: One system for multiple applications
- Several websites: Delpher, Geheugen van Nederland, Staten Generaal Digitaal
- Several APIs: URN-Resolver, OAI-PMH, Search services …
Centralize access decisions for better compliance, management and reporting
One change = immediately visible in every application
14. Needs 3: reducing our digitization backlog
- We have a lot of digital content that requires certain restrictions
- How can we make this accessible to anybody who is allowed to see it?
- We had an “on/off” infrastructure for most of our content
- Either accessible for everybody or not at all
- Not flexible enough, blocked workflows
Automation of rights decisions based on
- Metadata (publication date, authors, publisher, type of material, …)
- Location (e.g. reading room)
- Type of user (e.g. researcher)
16. Simple approach: an extra metadata field?
- For example
- <rights> FREE|RESTRICTED|CLOSED|... </rights>
- <license> CC0|CustomContract|... </license>
- Make decision based on the value of that field
- Probably works fine in a lot of scenarios
- But:
- Does not scale when decisions vary with context
- “Free for users of type researcher and visitors to the reading room, but not outside of it”
- Needs maintenance over time
- Missing: why was this decision made?
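A minimal sketch of the field-based approach, assuming a hypothetical `Rights` enum that mirrors the `<rights>` values above. The first method is the whole decision: a lookup on one stored value. The second shows what happens once the “researcher or reading room” rule is needed: the context-dependent logic ends up in the code of every application that reads the field, and nothing records why the rule exists.

```java
import java.util.Set;

// Hypothetical values mirroring the <rights> metadata field.
enum Rights { FREE, RESTRICTED, CLOSED }

final class FieldBasedAccess {

    // The decision is a pure lookup on one stored value:
    // it cannot see who is asking or from where.
    static boolean mayView(Rights rights) {
        return rights == Rights.FREE;
    }

    // Supporting "free for researchers and reading-room visitors" pushes
    // context-specific rules into each application that reads the field.
    // The role name "researcher" and location "READING_ROOM" are illustrative.
    static boolean mayViewWithContext(Rights rights, Set<String> roles, String location) {
        if (rights == Rights.FREE) {
            return true;
        }
        if (rights == Rights.RESTRICTED) {
            return roles.contains("researcher") || "READING_ROOM".equals(location);
        }
        return false; // CLOSED
    }
}
```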
17. Instead: policies as code
- Policy: a formalized set of rules regarding a collection of objects
- Decided at runtime -> decisions can change over time
- Follows the organization's general lines of reasoning: legal obligations, contracts with publishers, management decisions
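To make “policies as code, decided at runtime” concrete, here is a small sketch in which a policy is a named set of rules for one collection, evaluated per request. All type names, the collection name and the “open N years after publication” rule are hypothetical illustrations, not the KB's implementation. Because the current date is part of the request context, the same object becomes available over time without any metadata being edited.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Set;

// Hypothetical request context and item metadata; names are illustrative only.
record RequestContext(Set<String> roles, String location, LocalDate today) {}
record ItemMetadata(String id, String collection, LocalDate publicationDate) {}

// A rule answers per request; null means "this rule does not apply".
interface Rule {
    Boolean decide(ItemMetadata item, RequestContext ctx);
}

// A policy: a formalized set of rules for one collection of objects.
record Policy(String collection, List<Rule> rules) {

    boolean permits(ItemMetadata item, RequestContext ctx) {
        if (!collection.equals(item.collection())) {
            return false; // policy is not about this object
        }
        for (Rule rule : rules) {
            Boolean verdict = rule.decide(item, ctx);
            if (verdict != null) {
                return verdict;
            }
        }
        return false; // no rule applied: deny by default
    }

    // Hypothetical rule: open a fixed number of years after publication.
    // "today" comes from the request, so the outcome changes over time.
    static Rule openYearsAfterPublication(int years) {
        return (item, ctx) -> item.publicationDate() != null
                && !item.publicationDate().plusYears(years).isAfter(ctx.today())
                ? Boolean.TRUE : null;
    }
}
```

For example, a 1950 newspaper under a 70-year rule is denied for a request made in 2019 and permitted for the same request made in 2021.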
19. Still simple policy
Role-based access (from API-key, username/password auth…)
if (context.roles.contains("DS_METADATA_DTS"))
return Decision.permit();
Access based on publication date
static GregorianCalendar metadataFreeDate=new GregorianCalendar(1940,Calendar.JANUARY,1);
if (attributes.getMetadata().getPublicationDate()?.before(metadataFreeDate.getTime())) {
return Decision.permit();
}
Fallback
return Decision.denied();
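The Groovy fragments above depend on the engine's own `context`, `attributes` and `Decision` objects, which the slides do not show. As a self-contained sketch of the same three steps (role check, publication-date cut-off, deny by default), the version below uses hypothetical `Caller`, `Publication` and `Result` types and swaps `GregorianCalendar` for `java.time.LocalDate`; the role name and the 1940-01-01 cut-off come from the slide.

```java
import java.time.LocalDate;
import java.util.Set;

// Hypothetical stand-ins for the policy engine's types; not the KB's actual API.
enum Result { PERMIT, DENIED }
record Caller(Set<String> roles) {}
record Publication(LocalDate publicationDate) {}

final class MetadataPolicy {

    // Same cut-off as the Groovy snippet, expressed with java.time.
    static final LocalDate METADATA_FREE_DATE = LocalDate.of(1940, 1, 1);

    static Result decide(Caller caller, Publication pub) {
        // Role-based access, e.g. a role attached to an API key.
        if (caller.roles().contains("DS_METADATA_DTS")) {
            return Result.PERMIT;
        }
        // Access based on publication date; a missing date never permits,
        // just as Groovy's null-safe ?. makes the condition false.
        LocalDate published = pub.publicationDate();
        if (published != null && published.isBefore(METADATA_FREE_DATE)) {
            return Result.PERMIT;
        }
        // Fallback: deny whatever was not explicitly permitted.
        return Result.DENIED;
    }
}
```

Ordering matters here: the cheap role check runs first, and the explicit fallback makes “deny” the default rather than an accident of missing data.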
20. Example: Books
Check for location
if (context.location.equals("READING_ROOM")) {
...
}
Require the frontend to take measures against downloads
if (attributes.listContainsValue("boeken-leeszaal-kopieerbeveiliging", "ppn",
attributes.getMetadata().getPpn()) ) {
return Decision.permit(new Obligation("DoNotDownload"),usageRights);
}
Check for death dates of all contributors
if (DateChecks.allAuthorsDeadLongerThan(attributes.getMetadata(),authorDeathDateLimit)) {
return Decision.permit(usageRights);
}
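A hedged, self-contained sketch of the two book rules that are shown in full: a copy-protection list that yields a permit with a `DoNotDownload` obligation, and a check that every contributor died long enough ago. Several things are assumptions, not taken from the slides: the `BookDecision` and `Book` types, representing the list as a set of PPNs, applying the list only in the reading room (as its name “boeken-leeszaal-kopieerbeveiliging” suggests), and a 70-year value for `authorDeathDateLimit`, simplified to count from the exact death date. The elided body of the reading-room branch is not reconstructed.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Set;

// Hypothetical decision type: permit/deny plus obligations the frontend must enforce.
record BookDecision(boolean permitted, List<String> obligations) {
    static BookDecision permit(String... obligations) {
        return new BookDecision(true, List.of(obligations));
    }
    static BookDecision denied() {
        return new BookDecision(false, List.of());
    }
}

// Hypothetical book metadata: PPN plus the death date of every contributor (null = unknown).
record Book(String ppn, List<LocalDate> contributorDeathDates) {}

final class BookPolicy {

    // Assumption: 70 years, simplified to count from the exact death date.
    static final int AUTHOR_DEATH_LIMIT_YEARS = 70;

    static BookDecision decide(Book book, String location, Set<String> copyProtectedPpns, LocalDate today) {
        // Assumption: the copy-protection list applies to reading-room access.
        if ("READING_ROOM".equals(location) && copyProtectedPpns.contains(book.ppn())) {
            return BookDecision.permit("DoNotDownload");
        }
        if (allContributorsDeadLongerThan(book, AUTHOR_DEATH_LIMIT_YEARS, today)) {
            return BookDecision.permit();
        }
        return BookDecision.denied();
    }

    // Every contributor needs a known death date far enough in the past.
    static boolean allContributorsDeadLongerThan(Book book, int years, LocalDate today) {
        List<LocalDate> deaths = book.contributorDeathDates();
        if (deaths.isEmpty()) {
            return false; // no contributor information: do not assume the work is free
        }
        for (LocalDate death : deaths) {
            if (death == null || death.plusYears(years).isAfter(today)) {
                return false;
            }
        }
        return true;
    }
}
```

Treating an unknown death date as “not free” keeps the policy conservative: missing metadata leads to a denial rather than an accidental permit.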
21. Decisions
Input: Identifier, Metadata, Location, Authorization
End result of a policy decision:
- PERMIT
- DENIED
- NOT APPLICABLE
additional attributes:
- obligations: things the endpoint has to enforce
- advice: things the endpoint might use to improve UX
Ex: PERMIT (obligation: “DoNotDownload”, advice: “OnlyInReadingRoom”)
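One way to model such a decision, as a sketch with hypothetical `Outcome` and `Decision` types: the outcome plus two lists of strings. The helper encodes the distinction from the slide. Obligations are things the endpoint has to enforce, so an endpoint that cannot enforce one of them should not serve the object; advice is optional and can be ignored.

```java
import java.util.List;
import java.util.Set;

// The three outcomes listed on the slide.
enum Outcome { PERMIT, DENIED, NOT_APPLICABLE }

// Hypothetical decision shape: an outcome plus obligations (must be enforced)
// and advice (optional hints for a better user experience).
record Decision(Outcome outcome, List<String> obligations, List<String> advice) {

    // An endpoint may only serve the object if it can enforce every obligation;
    // advice it does not understand can safely be ignored.
    boolean allowsServingBy(Set<String> enforceableObligations) {
        return outcome == Outcome.PERMIT && enforceableObligations.containsAll(obligations);
    }
}
```

With the example above, a viewer that can block downloads may serve the object, while an endpoint without such a mechanism (for instance a bulk-download API) has to refuse, even though the outcome is PERMIT.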
22. [Diagram: Enforce, Decide, Administer, with Metadata and Context as inputs. Diagram by David Brossard under a CC-BY 3.0 license]
24. Architecture: XACML (sort of)
- Attribute Based Access Control (ABAC)
- Follows XACML reference architecture
- … but not the language (cumbersome, slow and restricted)
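The XACML reference architecture splits access control into an enforcement point (PEP), a decision point (PDP), an information point (PIP) that supplies attributes such as metadata, and an administration point (PAP) that manages policies. The sketch below wires these roles together in plain Java to show the data flow; the interfaces and the “first applicable policy wins” combining rule are illustrative choices, not the KB's implementation.

```java
import java.util.List;
import java.util.Map;

enum Verdict { PERMIT, DENIED, NOT_APPLICABLE }

// Policy Information Point: supplies attributes the request does not carry (here: metadata).
interface PolicyInformationPoint {
    Map<String, String> metadataFor(String identifier);
}

// One policy, as managed through the Policy Administration Point.
interface PolicyRule {
    Verdict evaluate(Map<String, String> metadata, Map<String, String> context);
}

// Policy Decision Point: combines identifier, metadata and context into one verdict.
final class PolicyDecisionPoint {
    private final PolicyInformationPoint pip;
    private final List<PolicyRule> policies;

    PolicyDecisionPoint(PolicyInformationPoint pip, List<PolicyRule> policies) {
        this.pip = pip;
        this.policies = policies;
    }

    Verdict decide(String identifier, Map<String, String> context) {
        Map<String, String> metadata = pip.metadataFor(identifier);
        // Assumed combining rule ("first applicable"): the first policy that applies wins.
        for (PolicyRule policy : policies) {
            Verdict verdict = policy.evaluate(metadata, context);
            if (verdict != Verdict.NOT_APPLICABLE) {
                return verdict;
            }
        }
        return Verdict.NOT_APPLICABLE;
    }
}

// Policy Enforcement Point: the website or API that acts on the verdict.
final class PolicyEnforcementPoint {
    private final PolicyDecisionPoint pdp;

    PolicyEnforcementPoint(PolicyDecisionPoint pdp) {
        this.pdp = pdp;
    }

    boolean mayServe(String identifier, Map<String, String> context) {
        // Anything other than an explicit PERMIT is treated as "no access".
        return pdp.decide(identifier, context) == Verdict.PERMIT;
    }
}
```

The applications only ever talk to a PEP-side check; where metadata comes from and how policies are combined stays inside the decision service.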
25. Technology
- Write the policies in an embedded scripting language (Groovy)
- Fast (compared to implementations of the XACML language)
- Can be adopted and managed outside the core development team
- Still reuses the existing development toolchain
- Automated testing!
- Deployed as a central REST service
- Serves multiple applications
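To illustrate “one central service, many applications”, here is a minimal HTTP decision endpoint using only the JDK (`com.sun.net.httpserver` and `java.net.http`, Java 11+). The `/decision` path, the query parameters, the plain-text response and the toy reading-room policy are all hypothetical; the point is that every website or API asks the same endpoint, so a policy change is visible to all of them at once.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class DecisionService {

    // Starts a hypothetical endpoint: GET /decision?...&location=... -> "PERMIT" or "DENIED".
    static HttpServer start() throws IOException {
        // Port 0 lets the OS pick a free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/decision", exchange -> {
            String query = exchange.getRequestURI().getQuery();
            // Toy policy standing in for the real (Groovy) policies.
            boolean inReadingRoom = query != null && query.contains("location=READING_ROOM");
            byte[] body = (inReadingRoom ? "PERMIT" : "DENIED").getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Every application (website, OAI-PMH, URN resolver, ...) would ask the same service.
    static String ask(HttpServer server, String query) throws IOException, InterruptedException {
        URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/decision?" + query);
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

A production service would return a structured decision (outcome, obligations, advice) rather than a single word, but the single-endpoint shape is what centralizes compliance and reporting.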
28. Limitations
- Search filtering by access rights is hard to combine with dynamic decisions
- Which objects am I allowed to use?
- Export of access information to other systems (e.g. WorldCat)
Possible mitigations
- Compromises on dynamic decisions (short term)
- Move from slow ETL to event-based architectures (longer term)
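The search limitation can be illustrated with two filtering strategies, sketched with hypothetical names. Post-filtering asks the dynamic policy about each hit after the search engine has already paged the results, so pages shrink and hit counts become wrong. The pre-filtering variant stands for one possible short-term compromise: a precomputed access flag stored in the index, which filters correctly before paging but goes stale when a dynamic decision changes.

```java
import java.util.List;
import java.util.function.Predicate;

final class SearchFiltering {

    // Post-filtering: ask the dynamic policy about each hit on an already-paged result.
    // Every hit costs one decision, the page can shrink below the requested size,
    // and the total hit count reported by the search engine is no longer correct.
    static List<String> postFilter(List<String> pageOfHits, Predicate<String> isPermitted) {
        return pageOfHits.stream().filter(isPermitted).toList();
    }

    // Hypothetical compromise: index a precomputed access flag so the search engine
    // filters before paging. The flag goes stale when a dynamic decision changes.
    record IndexedDoc(String id, boolean openAtIndexTime) {}

    static List<String> preFilter(List<IndexedDoc> index, int pageSize) {
        return index.stream()
                .filter(IndexedDoc::openAtIndexTime)
                .map(IndexedDoc::id)
                .limit(pageSize)
                .toList();
    }
}
```

An event-based pipeline, as mentioned above, would narrow the staleness window by updating the indexed flag whenever a relevant input changes, instead of waiting for the next batch ETL run.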
29. Current status & results
- Rolled out stepwise in production since mid-2016
- New objects are becoming available
- Copyright claims are easier to handle
- Clearer insight into the current status of the collection
- Better insight into needs for partnership contracts
- Impetus for better metadata storage/access infrastructure
175M requests per month (+/- 6 million a day)
60+ million pages under the control of access management
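As a quick consistency check of these figures (assuming a 30-day month and spreading the traffic evenly, which gives an average; peaks lie above it):

$$
\frac{175 \times 10^{6}\ \text{requests/month}}{30\ \text{days/month}} \approx 5.83 \times 10^{6}\ \text{requests/day},
\qquad
\frac{5.83 \times 10^{6}\ \text{requests/day}}{86\,400\ \text{s/day}} \approx 67.5\ \text{requests/s}
$$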
32. About
- Managing digital collections with multiple licenses and access policies
- Technical choices that fit our organisational needs
Not about
- DRM and copy protection
- Usage of closed proprietary systems
33. Motivation
- As a public service organisation, we want access to be as broad as possible
- Limits on what is possible:
- Licenses
- Contractual obligations
- Governmental and organisational policies
- Copyright status
- A simple yes or no is not always enough; we need:
- a clear guideline for the user: what can I do with it and how do I get access?
- automation of management: we want to be able to scale and still be compliant
34. Crossing the domains: communication
- Define your terms: Collection, policy, decision … and make sure to communicate them clearly
- Make sure contracts and managerial decisions can be translated to the technical reality
- Offer protection and guarantee options for future contracts
- Make compliance easier through monitoring + reporting
- Use examples and flow diagrams
36. Our problems
- Multiple applications give access to collections
- ideally centralised decision making and reporting
- Decisions depend on context: user, location, time
- Flexibility to allow for individual interventions
- Clearer insight needed into why things are hidden away