The document describes the PeriCAT framework, which provides a mechanism for capturing user scenarios and suggesting the best information encapsulation technique. It allows integrating various encapsulation techniques from different domains. The framework includes a Java application that can encapsulate and decapsulate digital objects and associated metadata. It provides tutorials and guides for using the tool and its API, as well as information for developers on integrating new techniques and criteria into the framework.
1. GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3
Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics
[Digital Preservation]
“This project has received funding from the European Union’s Seventh
Framework Programme for research, technological development and
demonstration under grant agreement no. 601138”.
The PeriCAT Framework
Anna-Grit Eggers (University of Goettingen)
4. • PeriCAT - The PERICLES Content Aggregation Tool - is a
framework for Information Encapsulation techniques.
• It integrates a set of information encapsulation techniques from
various domains, which can be used from within the framework.
• It provides a mechanism to capture the scenario of the user, and to
suggest the best fitting information encapsulation technique for
a given scenario.
About
https://github.com/pericles-project/PeriCAT
5. • PeriCAT is a Java application which runs on Linux, OS X and
Windows systems.
• It requires a Java 7 installation.
• The tool itself needs no installation.
• Just download the .jar file and execute it with your operating system
specific method (Windows: double-click, Linux/OS X: java -jar
PeriCAT.jar).
Installation
7. • The PERICLES tools PET and PeriCAT can be used together in a Sheer
Curation scenario.
Sheer Curation
• Sheer Curation means starting curation activities in the environment in
which digital objects are created and altered.
• The PERICLES Extraction Tool PET can be used to extract information
significant for long term preservation and object re-use from this creation
context.
• The extracted information can be encapsulated together with the related
digital object, to ensure that it will be available after the object leaves the
creation environment.
8. Example: the PERICLES sheer curation scenario.
The PERICLES Extraction Tool (PET) monitors the environment and reacts to changes in this
environment. It extracts this information, and temporarily stores it locally. The extracted information is then
appraised by the person creating and altering digital objects. The appraisal allows for intervention and for
filtering out problematic information, such as private or confidential information, that should not be encapsulated.
10. • Start the tool. You will see three main tabs.
• Open the information encapsulation tab.
• Decide which files are the carrier and which
are the payload. In most cases the
carrier consists of the “DO” (digital object) files, and the
payload encompasses all metadata.
Add the files to the tool.
User guide – get started
11. User guide – create scenario
1. Switch to the scenario tab. Using the
scenario tab is optional; you can skip it if
you already know which algorithm you want to use.
2. Create a new profile. Enter a profile name
and add a meaningful description.
3. Complete the user scenario questionnaire.
4. Exclude criteria which are not relevant for
your scenario.
5. Adjust the sliders to weigh the criteria.
6. Save your scenario.
7. Investigate the high score, especially the IE
techniques with the lowest distance to your
scenario.
8. Choose one of the suggested techniques,
and click the “use this technique” button.
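The ranking in steps 5 to 7 can be pictured as a weighted distance between the user's scenario and what each technique offers. The sketch below only illustrates that idea: the class name, the criterion names, and the distance formula are assumptions made for this example, not PeriCAT's actual decision mechanism.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: names and formula are assumptions, not PeriCAT code. */
class ScenarioDistanceSketch {

    /**
     * userWeights: criterion name -> slider weight in 0..100.
     * techniqueOffers: criterion name -> whether the technique fulfils it.
     * A criterion adds its weight to the distance only when it is not fulfilled.
     */
    static int distance(Map<String, Integer> userWeights,
                        Map<String, Boolean> techniqueOffers) {
        int sum = 0;
        for (Map.Entry<String, Integer> entry : userWeights.entrySet()) {
            Boolean fulfilled = techniqueOffers.get(entry.getKey());
            if (fulfilled == null || !fulfilled) {
                sum += entry.getValue();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical criterion names; excluded criteria are simply left out.
        Map<String, Integer> user = new LinkedHashMap<>();
        user.put("carrierRestorable", 90);
        user.put("payloadInvisible", 20);

        Map<String, Boolean> packaging = new LinkedHashMap<>();
        packaging.put("carrierRestorable", true);
        packaging.put("payloadInvisible", false);

        Map<String, Boolean> embedding = new LinkedHashMap<>();
        embedding.put("carrierRestorable", false);
        embedding.put("payloadInvisible", true);

        System.out.println("packaging: " + distance(user, packaging));
        System.out.println("embedding: " + distance(user, embedding));
    }
}
```

In this toy model a lower number means a better fit, which mirrors the "lowest distance" wording of step 7.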
12. User guide – select algorithm
9. Your chosen IE technique is preselected. If it is displayed in red, it can’t be used for the dataset.
Consider choosing another algorithm.
10. If the IE technique has configuration options, a configuration interface will be shown. Configure the
algorithm, if necessary.
11. Be sure that the encapsulation
option “Use selected algorithm”
is selected.
12. Press the encapsulate button.
The results will be shown in an
output panel at the bottom of the
encapsulation tab.
13. User guide - decapsulation
13. For decapsulation, switch to the decapsulation tab.
14. Your encapsulated files are shown here as
options for decapsulation. (If the tool was closed,
you have to add the file manually with the add button.)
Select them.
15. Select the algorithm that you used
for encapsulation. This saves the considerable
time the tool would otherwise need
to guess the algorithm.
16. Press the decapsulation button. The
original files should be shown in the
output panel of the decapsulation tab.
17. Select one of the output files
to get further information.
18. Close the tool. Done!
15. • PeriCAT can be integrated into other tools and used via its API, which is
accessible through the PeriCAT class.
• Alternatively, PeriCAT’s API can be called via command line parameters.
The PeriCAT API
• The API provides:
• access to the IE algorithms with PeriCAT.ALGORITHM_NAME;
• a method to encapsulate a carrier and a list of payload files with a specific
algorithm;
• a method to encapsulate a carrier and a list of payload files with the best fitting
algorithm for a scenario - the scenario is passed as a configuration file, which can
be created by PeriCAT beforehand;
• a method to decapsulate encapsulated files with a specific algorithm;
• a method to decapsulate encapsulated files without knowing which algorithm was
used - in this case the algorithm is guessed, which takes additional
computation time.
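Why guessing costs time can be shown with a toy model in which decapsulation simply tries each known technique until one recognises the input. This is a sketch under stated assumptions: all names below are hypothetical, the "techniques" work on strings rather than files, and nothing here reproduces the signatures of the PeriCAT class or the way PeriCAT actually guesses.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of "decapsulate without knowing the algorithm". */
class GuessingDecapsulator {

    /** Tries every candidate in turn; the cost grows with the number of techniques. */
    static String guessTechnique(String encapsulated,
                                 List<? extends TechniqueSketch> candidates) {
        for (TechniqueSketch technique : candidates) {
            if (technique.restore(encapsulated) != null) {
                return technique.name();
            }
        }
        return null; // no candidate recognised the input
    }

    public static void main(String[] args) {
        PrefixTechnique zipLike = new PrefixTechnique("zip-like", "Z:");
        PrefixTechnique tarLike = new PrefixTechnique("tar-like", "T:");
        List<PrefixTechnique> candidates = Arrays.asList(zipLike, tarLike);

        String encapsulated = tarLike.encapsulate("image", "metadata");
        System.out.println(guessTechnique(encapsulated, candidates));
    }
}

/** Hypothetical stand-in for an IE technique; not PeriCAT's AbstractAlgorithm. */
interface TechniqueSketch {
    String name();

    /** Returns the restored content, or null if this technique did not produce the input. */
    String restore(String encapsulated);
}

/** Toy technique that marks its output with a fixed prefix. */
class PrefixTechnique implements TechniqueSketch {
    private final String name;
    private final String marker;

    PrefixTechnique(String name, String marker) {
        this.name = name;
        this.marker = marker;
    }

    public String name() {
        return name;
    }

    String encapsulate(String carrier, String payload) {
        return marker + carrier + "|" + payload;
    }

    public String restore(String encapsulated) {
        if (!encapsulated.startsWith(marker)) {
            return null;
        }
        return encapsulated.substring(marker.length());
    }
}
```

Passing the algorithm explicitly skips the loop entirely, which is the saving the specific-algorithm method offers.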
17. • PeriCAT is developed using Eclipse.
• We used the m2e Eclipse plugin for Maven integration
and EGit for Git integration.
• To set up the project in Eclipse you will need an installation of Java 7 or higher.
Setting up an Eclipse project
18. • The source code is organised into the following packages, with corresponding test sources in src/test/java:
• main: contains the main class, which also handles command line start parameters, as well as
classes for configuration, logging, the API for integration into external tools (PeriCAT class),
and the PET-Adapter.
• model: contains basic data structures for data sets and the user scenario and classes to
construct the bytestream of a payload sequence, including restoration metadata.
• view: contains all the graphical user interface classes. The GUI class refers to three GUITabs,
which contain different GUIPanels.
• controller: contains the main controller of the application, the “PeriCATController”, together with
its builder, which is called once at tool start. The other controlling classes (except those of the decision
mechanism) take care of more specific controlling tasks, e.g. loading and saving,
encapsulation, decapsulation etc.
• decisionMechanism: contains all controlling classes for the decision mechanism.
• algorithm: contains a wrapper class for each integrated IE technique.
The package structure
19. • How to integrate new IE techniques into the tool?
• Navigate to the algorithm package
• Create a new class for your algorithm which extends the
AbstractAlgorithm class.
• Let Eclipse add all unimplemented methods automatically.
The other algorithm classes can be used as implementation
examples of these methods.
Integrating new techniques
20. Encapsulate:
• This method gets a list of carrier and payload files and usually returns one file, which
is the encapsulated file. The method can return more than one file if necessary, but
this is not common.
• Some algorithms, mostly packaging techniques, treat carrier and payload files
equally.
• The distinction matters for embedding algorithms, which need to handle carrier and payload
differently.
• The algorithm can use a model.PayloadSegment to store the payload together with
restoration metadata in a well defined way.
Algorithm classes
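Storing a payload together with its restoration metadata "in a well defined way" could look like the following sketch. The class name and the byte layout (file name, length, content) are assumptions made for illustration; the real model.PayloadSegment may record different metadata in a different format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical segment: payload bytes plus the file name needed to restore them. */
class PayloadSegmentSketch {
    final String originalFileName; // restoration metadata
    final byte[] payload;

    PayloadSegmentSketch(String originalFileName, byte[] payload) {
        this.originalFileName = originalFileName;
        this.payload = payload;
    }

    /** Assumed layout: UTF-encoded file name, int payload length, payload bytes. */
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(originalFileName);
            out.writeInt(payload.length);
            out.write(payload);
        }
        return buffer.toByteArray();
    }

    /** Reads a segment back using the same layout as toBytes(). */
    static PayloadSegmentSketch fromBytes(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String name = in.readUTF();
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            return new PayloadSegmentSketch(name, data);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] metadata = "creator=alice".getBytes(StandardCharsets.UTF_8);
        PayloadSegmentSketch original = new PayloadSegmentSketch("metadata.txt", metadata);
        PayloadSegmentSketch restored = fromBytes(original.toBytes());
        System.out.println(restored.originalFileName + " intact: "
                + Arrays.equals(metadata, restored.payload));
    }
}
```

An embedding algorithm could hide the resulting byte array in the carrier, while a packaging algorithm could store it alongside the carrier; in both cases the file name travels with the payload.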
21. Restore:
• This method gets one encapsulated file and returns the restored original files, as far
as practicable.
• Some algorithms can restore only the carrier file or only the payload file.
• See the model.RestoredFile class. It contains the information from the restoration
metadata that is provided to the user.
• There is an option for adding a note for the user, e.g. to explain why one of the files
can’t be restored.
• Much of the information that has to be stored in the RestoredFile can be obtained from the
model.PayloadSegment’s restoration metadata, if the algorithm has used this class.
Algorithm classes (2)
22. getName:
• This method returns a name string for the algorithm for the graphical user interface.
getDescription:
• This method returns a description string for the algorithm for the graphical user
interface. Don’t skimp on explanations!
defineScenario:
• This method defines the ideal use scenario for the algorithm. It is part of the
decision mechanism, which suggests an algorithm based on a user scenario.
• Create a Scenario in the method with a name string like “my algorithm scenario”.
• Check the model.Criterion class for the list of currently 11 criteria, which have
to be defined for an algorithm scenario.
• Use scenario.setCriterionValue(CRITERION, YES/NO); for each of the listed
criteria to characterise the algorithm.
• Return the scenario.
Algorithm classes (3)
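The shape of a defineScenario implementation might resemble the sketch below. ScenarioSketch is a minimal stand-in written for this example, and the three criterion names are invented; the real criteria are the ones listed in model.Criterion, and the real setCriterionValue may take different argument types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical algorithm wrapper showing only the defineScenario step. */
class MyAlgorithmSketch {

    ScenarioSketch defineScenario() {
        ScenarioSketch scenario = new ScenarioSketch("my algorithm scenario");
        // One call per criterion; true/false stands in for YES/NO.
        scenario.setCriterionValue(ScenarioSketch.CARRIER_RESTORABLE, true);
        scenario.setCriterionValue(ScenarioSketch.PAYLOAD_RESTORABLE, true);
        scenario.setCriterionValue(ScenarioSketch.ENCRYPTION, false);
        return scenario;
    }

    public static void main(String[] args) {
        ScenarioSketch scenario = new MyAlgorithmSketch().defineScenario();
        System.out.println(scenario.getName() + ": carrier restorable = "
                + scenario.getCriterionValue(ScenarioSketch.CARRIER_RESTORABLE));
    }
}

/** Minimal stand-in for model.Scenario; criterion names are invented. */
class ScenarioSketch {
    static final String CARRIER_RESTORABLE = "carrier restorable";
    static final String PAYLOAD_RESTORABLE = "payload restorable";
    static final String ENCRYPTION = "encryption";

    private final String name;
    private final Map<String, Boolean> values = new LinkedHashMap<>();

    ScenarioSketch(String name) {
        this.name = name;
    }

    void setCriterionValue(String criterion, boolean fulfilled) {
        values.put(criterion, fulfilled);
    }

    /** Criteria that were never set count as not fulfilled in this sketch. */
    boolean getCriterionValue(String criterion) {
        Boolean value = values.get(criterion);
        return value != null && value;
    }

    String getName() {
        return name;
    }
}
```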
23. configureCarrierFileFilter/configurePayloadFileFilter/configureDecapsulationFileFilter:
• We use the Apache Commons SuffixFileFilter to define which file types are allowed as
carrier files, payload files, or as input for the decapsulation.
• The file filters are used automatically before encapsulation or decapsulation.
fulfilledTechnicalCriteria:
• Like the file filters, this method is called automatically before encapsulation.
• Any technical constraints on the carrier and payload files can be
defined here, e.g. whether the payload size fulfils the constraints for using the algorithm.
• Algorithms for which the user’s data set fails the technical criteria or the file type
check are displayed in red in the graphical user interface.
• In parallel to your algorithm class, create a unit test class in src/test/java/algorithm.
This is highly recommended.
Algorithm classes (4)
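The two checks described on this slide can be sketched with the standard library alone. The suffix filter below is a stand-in for the Apache Commons SuffixFileFilter that PeriCAT uses, written without the dependency so the example stays self-contained; it compares suffixes case-insensitively, which is a choice made for this sketch. The capacity rule in fulfilledTechnicalCriteria is likewise an invented example of a technical constraint.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Illustrative file type and technical checks; not PeriCAT code. */
class FilterSketch {

    /** Accepts files whose name ends with one of the given lower-case suffixes. */
    static FileFilter suffixFilter(final String... suffixes) {
        return new FileFilter() {
            @Override
            public boolean accept(File file) {
                String name = file.getName().toLowerCase(Locale.ROOT);
                for (String suffix : suffixes) {
                    if (name.endsWith(suffix)) {
                        return true;
                    }
                }
                return false;
            }
        };
    }

    /** Hypothetical constraint: the total payload size must fit the carrier's capacity. */
    static boolean fulfilledTechnicalCriteria(long capacityBytes, List<Long> payloadSizes) {
        long total = 0;
        for (long size : payloadSizes) {
            total += size;
        }
        return total <= capacityBytes;
    }

    public static void main(String[] args) {
        FileFilter carrierFilter = suffixFilter(".png", ".jpg");
        System.out.println("photo.PNG accepted: " + carrierFilter.accept(new File("photo.PNG")));
        System.out.println("payload fits: "
                + fulfilledTechnicalCriteria(1024, Arrays.asList(400L, 700L)));
    }
}
```

A technique that fails either check for the current data set is the kind of entry the GUI would show in red.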
24. • The Maven pom.xml file can be used to add external sources to the project, if these
sources use Maven.
• If you want to add external sources which do not use Maven, create a directory in
the ExternalTools directory and copy the sources there.
• Once the algorithm is finished, open the main.Configuration class.
• Add your algorithm to the list of available algorithms in the getAlgorithms() method.
• This is also the place where you can comment out techniques that you want to
exclude from your PeriCAT instance.
Algorithm classes (5)
25. Integrating new decision criteria
• As the number of integrated algorithms increases, it becomes more important to
distinguish between algorithm characteristics in order to refine the decision mechanism.
• Introducing a new decision criterion is straightforward:
• Open the model.Criterion class: add a static string at the top of this class
which identifies your criterion, similar to the existing ones.
• Open the model.Scenario class: create a new Criterion in the Scenario constructor
using the ID which you defined in the Criterion class, similar to the existing criteria.
The description is used as a tool tip in the graphical user interface.
• Add your criterion at the bottom of the constructor with addCriterion(criterion);
• Use your new criterion in the ideal scenarios of the algorithms. This has to be done
manually for all algorithms. If a criterion is not set for a scenario, its default
weighting value is 50 (middle).
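A compact picture of registering a new criterion is given below. Both classes are stand-ins written for this example, and the criterion ID, the description text, and the setWeight helper are invented; only the default weight of 50 and the addCriterion(criterion) call come from the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for model.Scenario: registers criteria in its constructor. */
class ScenarioWithCriteriaSketch {
    private final Map<String, CriterionSketch> criteria = new LinkedHashMap<>();

    ScenarioWithCriteriaSketch() {
        // Create the criterion using the ID defined in the criterion class.
        CriterionSketch criterion = new CriterionSketch(
                CriterionSketch.MY_NEW_CRITERION,
                "Shown as a tool tip in the GUI (example text).");
        // Register it at the bottom of the constructor.
        addCriterion(criterion);
    }

    private void addCriterion(CriterionSketch criterion) {
        criteria.put(criterion.id, criterion);
    }

    /** Hypothetical helper that models moving a slider. */
    void setWeight(String id, int weight) {
        criteria.get(id).weight = weight;
    }

    int weightOf(String id) {
        return criteria.get(id).weight;
    }

    public static void main(String[] args) {
        ScenarioWithCriteriaSketch scenario = new ScenarioWithCriteriaSketch();
        System.out.println("default weight: "
                + scenario.weightOf(CriterionSketch.MY_NEW_CRITERION));
    }
}

/** Stand-in for model.Criterion. */
class CriterionSketch {
    // A static string identifying the new criterion (invented name).
    static final String MY_NEW_CRITERION = "my new criterion";
    static final int DEFAULT_WEIGHT = 50; // middle of the slider

    final String id;
    final String description;
    int weight = DEFAULT_WEIGHT;

    CriterionSketch(String id, String description) {
        this.id = id;
        this.description = description;
    }
}
```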