The presentation was given as part of a SCAPE Training event on ‘Effective Evidence-Based Preservation Planning’ in Aarhus, Denmark, 13-14 November 2013.
Catherine Jones, Science and Technology Facilities Council, presented the concept of control policies and what is needed to produce machine understandable control policies.
Control policy formulation
1. Control Policy formulation
The why and how
Catherine Jones
Science and Technology Facilities Council
SCAPE Training
Statsbiblioteket, Aarhus, 13-14 November 2013
2. Format of this session
• 11:15 – 11:40 Presentation on creating control policies
• 11:40 – 12:25 Practical Exercise (small groups)
• 12:25 – 12:45 Discussion about the practical exercise and the topic of policy in general
3. What is digital preservation policy about?
• The organisation’s aims and objectives about the long term care of digital objects:
  • Preservation strategies and acceptable actions
  • Decisions about the digital objects (formats, significant properties etc.)
  • Who the material is being preserved for
  • Resourcing
  • Responsibilities
5. SCAPE Policy Levels - recap
• Guidance: high level; general objectives; applies to all parts of the organisation and collections; written in natural language to be read by a human being
• Preservation Procedure: more detailed level; general approaches; written in natural language to be read by a human being
• Control: specific, measurable objectives; applies to specific collections or formats; in two forms: natural language and machine readable form (RDF)
6. Why two forms of control policies?
• Natural language policy is needed for humans and may (should) already exist – in procedures/collection management policy/implicit understanding etc.
• Need machine understandable form to use automated tools
7. What is special about SCAPE machine understandable control policies?
• Related to a specific set of circumstances – the collection of digital objects; the people who will use them and a purpose. Known as a preservation case
• Need to be specific so that they can be measured or assessed.
  • File format must be TIFF
  • There must be 3 copies of each object
• Not all control policies may be machine actionable
  • There must be 3 members of staff who have qualification X
8. SCAPE Control Policy model
Links a particular content set (collection) with a particular user community (specific requirements) and with specific measurable objectives which can be tested automatically
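As a rough illustration of this linkage (a sketch only, not the SCAPE ontology itself), a control policy can be thought of as a record tying a content set and a user community to a list of measurable objectives. All class names, field names and example values below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Objective:
    """One measurable objective, e.g. 'file format must be TIFF'."""
    measure: str           # what is measured (hypothetical label)
    operator: str          # comparison to apply, here only "=="
    value: object          # required value
    modality: str = "MUST"


@dataclass
class PreservationCase:
    """Links a content set and a user community to measurable objectives."""
    name: str
    content_set: str
    user_community: str
    objectives: list = field(default_factory=list)


# Illustrative values only; they do not come from a real SCAPE policy.
case = PreservationCase(
    name="Image masters",
    content_set="Digitised images",
    user_community="General public",
    objectives=[
        Objective("file_format", "==", "TIFF"),
        Objective("number_of_copies", "==", 3),
    ],
)
```

Keeping each objective as structured data rather than free text is what makes it possible for a tool to test it automatically.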
16. What do you need to create machine understandable control policies?
• Some written policy – either at the Preservation Procedure level, or at the more detailed control level.
• An understanding of the goals of preservation
• Knowledge of the collection, who uses it and manages it, and any procedures in place.
• Some appreciation of what topics you are likely to need Planning & Watch activities for
17. Creating Control policy statements
Stage 1: Whole policy activities
Stage 2: Policy statements within the whole policy
Stage 3: Review and rationalise
18. Creating Control policy statements
Stage 1: Whole policy activities
These are activities considering the policy as a whole
1. Identify the content set the policy addresses
• What type of material is being preserved in this case?
2. Identify the user communities/roles required by the policy
• Who will be using the material or interacting with the material?
3. Map policy statements to high level concepts.
• In general what type of activities are the statements referring to?
19. Creating Control policy statements
Stage 2: Policy statements within the whole policy
For each statement or section in the policy undertake:
1. Clarification of implicit meaning
• Are there hidden meanings/context that need to be stated explicitly?
2. Identification of control policy preservation case
• What issue is the statement addressing?
3. Identification of objectives
• What are the measurable statements which embody the policy statement?
4. Generate control statements
• Use of a tool or knowledge of RDF to create machine understandable statements
20. Creating Control policy statements
Stage 3: Review & Rationalise
For preservation cases and associated objectives review:
1. Are there any objectives which are in every preservation case?
• These are candidates for organisation related objectives
2. Do some of the preservation cases overlap, or are some the same?
• You need to consider whether fewer but broader preservation cases or multiple specific ones are more appropriate. This depends on what you intend to use them for, and what overheads there are in maintaining the optimal number
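The two review questions can be sketched with simple set operations. The example below is illustrative only: the case names and objective labels are invented, and using Jaccard similarity to detect overlap is an assumption rather than something the SCAPE approach prescribes.

```python
from itertools import combinations

# Hypothetical preservation cases mapped to sets of objective labels.
cases = {
    "case_a": {"format documented", "fixity checks possible", "format is TIFF"},
    "case_b": {"format documented", "fixity checks possible", "format is NeXus"},
    "case_c": {"format documented", "fixity checks possible", "format is NeXus"},
}


def common_objectives(cases):
    """Objectives present in every case: candidates for organisation level."""
    return set.intersection(*cases.values()) if cases else set()


def overlapping_cases(cases, threshold=1.0):
    """Pairs of cases whose Jaccard similarity is at least the threshold."""
    pairs = []
    for (a, objs_a), (b, objs_b) in combinations(sorted(cases.items()), 2):
        union = objs_a | objs_b
        score = len(objs_a & objs_b) / len(union) if union else 1.0
        if score >= threshold:
            pairs.append((a, b))
    return pairs


shared = common_objectives(cases)
identical = overlapping_cases(cases)            # threshold 1.0: same objectives
similar = overlapping_cases(cases, threshold=0.5)
```

Lowering the threshold surfaces cases that are merely similar, which is where the judgement about fewer, broader cases versus several specific ones comes in.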
21. Worked Example
“3.1.1 All raw data will be curated in well-defined formats for which means of reading the data will be made available by the Facility”
Express some of the implicit information and rewrite to:
• “All data curated will be in well-defined formats”
• “Approved well-defined formats will be able to be read”
• “The reader will be supplied by at least the ISIS Facility”
Also need to express what “curated” means
Goals/Objectives:
1. File format must be of an approved format for the community
2. The file format should have documentation
3. Any instrument specific schema should be documented
4. There should be at least one piece of software which can read the files
5. This file reader should be available from the organisation holding the data
6. This file reader should be able to be used by the designated user community
7. The file format should be able to be validated
8. Fixity checks should be able to be undertaken
Using the contentset 2011 LET Calibration and a user community of ISIS data managers
i. File format MUST be NeXus
ii. The file format MUST have documentation
iii. Any instrument specific schema MUST be documented
iv. NeXus file reader software available > 1
v. NeXus file reader MUST be located at STFC
vi. The file format MUST be validated
vii. Fixity checks MUST be able to be undertaken
Using the contentset 2011 LET Calibration and a user community of the domain specific researchers
i. The file reader MUST be available to the designated user community
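To show why control statements i to vii for the ISIS data managers are phrased as measurable questions, the sketch below checks them against a set of facts about the content set. The fact names and values are hypothetical; in practice they would come from characterisation tools or repository records, and the statements would be held in RDF rather than as Python functions.

```python
# Hypothetical facts about the content set; all keys and values are invented.
facts = {
    "file_format": "NeXus",
    "format_documentation_available": True,
    "instrument_schema_documented": True,
    "reader_software_count": 2,
    "reader_location": "STFC",
    "format_validated": True,
    "fixity_checks_possible": True,
}

# Control statements as (label, check) pairs; each check answers yes or no.
statements = [
    ("i. File format MUST be NeXus",
     lambda f: f["file_format"] == "NeXus"),
    ("ii. The file format MUST have documentation",
     lambda f: f["format_documentation_available"]),
    ("iii. Any instrument specific schema MUST be documented",
     lambda f: f["instrument_schema_documented"]),
    ("iv. NeXus file reader software available > 1",
     lambda f: f["reader_software_count"] > 1),
    ("v. NeXus file reader MUST be located at STFC",
     lambda f: f["reader_location"] == "STFC"),
    ("vi. The file format MUST be validated",
     lambda f: f["format_validated"]),
    ("vii. Fixity checks MUST be able to be undertaken",
     lambda f: f["fixity_checks_possible"]),
]


def evaluate(statements, facts):
    """Return a mapping of statement label to True (met) or False (not met)."""
    return {label: bool(check(facts)) for label, check in statements}


results = evaluate(statements, facts)
failed = [label for label, ok in results.items() if not ok]
```

Changing a fact, for example setting `file_format` to `"TIFF"`, makes statement i fail while the others still pass, which is the kind of answer an automated planning or watch tool needs.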
22. Conclusion
• Having explicit policy in natural language is important
• Expressing policy in machine testable ways is more complex but can bring benefit through use of tools
• Natural language policy defines statements of acceptable states; machine understandable control level asks measurable questions
• Implicit information understood by a human audience needs to be expressed explicitly for computers
• Written policy is at a fairly abstract level and practicalities may be addressed in an implementation plan/job procedure document or one-off project plan
23. Next – a practical exercise
• You should have:
  • The example scenario
  • Sheets with possible attributes and measures
  • Control Policy worksheets
• In pairs or small groups try converting the scenario into control policy statements