Archonnex is a new software architecture developed by ICPSR for digital asset management systems. It is built on a modern technology stack to meet the current and emerging needs of social science research.
1. Welcome to
Technical Data Infrastructure Frameworks
Archonnex @ ICPSR
Data Science Management For All
Harsha Ummerpillai, Architect / Software Lead
Tom Murphy, Director of Computing and Network Services
2. About ICPSR
Mission
ICPSR advances and expands social and behavioral research, acting as a
global leader in data stewardship and providing rich data resources and
responsive educational opportunities.
About
An international consortium of more than 700 academic institutions and
research organizations, ICPSR provides leadership and training in data
access, curation, and methods of analysis for the social science research
community.
ICPSR maintains a data archive of more than 500,000 files of research in
the social sciences. It hosts 16 specialized collections of data in
education, aging, criminal justice, substance abuse, terrorism and other
areas of social research.
3. Introduction
Archonnex is a Digital Asset Management System (DAMS)
architecture defined to transition ICPSR to a newer
technology stack meeting core and emerging business needs
of the organization. It aims to build a digital technology
platform that leverages ICPSR expertise and open source
technologies that are proven and well supported by Open
Source communities.
4. Guiding Principles
Comprehensive Digital Asset Management Platform.
Open Archival Information System (OAIS) reference model compliant.
Multi-tenancy. ICPSR needs to support multiple archives and agencies.
Secure. Privacy and Security are primary concerns for social research data.
Service Oriented and Modular.
Scalable. Able to handle large datasets and peak activity spikes.
Open Source technologies with good community engagement.
Enable standards based metadata harvesting and data exports.
Cohesive technology choices.
Flexible UI components that can be re-used and enable faster development.
5. Message based Integration
Apache ActiveMQ is the messaging server.
Apache Camel provides a simplified implementation of most common
Enterprise Integration patterns.
Figure: High level view of Camel's architecture (from Camel in Action).
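As an illustration of the kind of Enterprise Integration Pattern that Camel implements, the following is a minimal sketch of a content-based router. It is plain Python rather than Camel, the in-memory queues stand in for ActiveMQ destinations, and the queue names and message headers are hypothetical; the slides do not list the real ones.

```python
import queue

# Hypothetical destinations; in-memory queues stand in for a message broker.
QUEUES = {
    "virus-scan": queue.Queue(),
    "image-processing": queue.Queue(),
    "unrouted": queue.Queue(),
}


def route(message):
    """Content-based router: pick a destination from the message headers."""
    headers = message.get("headers", {})
    if headers.get("scanned") is not True:
        # Anything not yet scanned goes to the scanner first.
        destination = "virus-scan"
    elif headers.get("mime_type", "").startswith("image/"):
        destination = "image-processing"
    else:
        destination = "unrouted"
    QUEUES[destination].put(message)
    return destination
```

The router inspects only headers, so components that produce and consume messages do not need to know about each other.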
9. Multi-tenancy
All service components support multiple tenants.
Supports tenant specific configuration & preferences.
Web aspects of service components are embeddable
within respective tenant Web apps
Workspace Manager and Search Manager are two
examples of UI Plugins that are embeddable.
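One common way to provide tenant-specific configuration is to layer per-tenant overrides on top of shared defaults. The sketch below assumes that approach; the tenant IDs and setting names are hypothetical and are not taken from Archonnex.

```python
from collections import ChainMap

# Hypothetical shared defaults and per-tenant overrides.
DEFAULTS = {"theme": "default", "max_upload_mb": 1024}
TENANT_OVERRIDES = {
    "archive-a": {"theme": "blue"},
    "archive-b": {"max_upload_mb": 2048},
}


def tenant_config(tenant_id):
    """Return a view where tenant overrides shadow the shared defaults."""
    return ChainMap(TENANT_OVERRIDES.get(tenant_id, {}), DEFAULTS)
```

For example, `tenant_config("archive-a")["theme"]` resolves to the override, while `tenant_config("archive-a")["max_upload_mb"]` falls through to the default.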
10. Single Sign On & ID Management
Central Authentication
and Identification System.
Supports ORCID and social IDs (Google,
Facebook, and LinkedIn).
Authorization management will support
role-based access controls.
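Role-based access control can be sketched as a mapping from roles to permissions. The role and permission names below are hypothetical; the slides do not describe the actual authorization model.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "depositor": {"deposit:create", "deposit:read"},
    "curator": {"deposit:read", "deposit:publish"},
}


def is_allowed(roles, permission):
    """A user is allowed if any of their roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Unknown roles grant nothing, so access is denied by default.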
11. Deposit Manager
Supports Submission Information Package (SIP)
ingest & storage, and coordinates virus scanning,
statistical file validation, variable extraction,
image processing &
metadata extraction.
Easy to use Workspace & file management
Ability to publish at granular level.
Embeddable UI Plugin supporting
tenant specific configuration
Supported ingest protocols: HTTP, SFTP and email.
Integrates with BPM/Workflow Engine
12. Preservation Manager
Implements transcoding of data specific to MIME types
Generates Archival Information Package (AIP) &
Dissemination Information Package (DIP).
Replicates AIPs to storage systems for long-term preservation.
Performs Fixity checks periodically.
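A fixity check recomputes a file's checksum and compares it with the value recorded at ingest. The sketch below assumes SHA-256 and a simple manifest that maps relative paths to hex digests; the slides do not say which algorithm or manifest format Archonnex uses.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest, root):
    """Return the relative paths that are missing or whose digest changed.

    manifest: dict of relative path -> expected SHA-256 hex digest
    (an assumed format, for illustration).
    """
    failures = []
    for relative_path, expected in manifest.items():
        target = Path(root) / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative_path)
    return failures
```

Running `check_fixity` on a schedule against each stored copy would flag both silent corruption and missing files.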
13. Search Manager
Full featured text search using
Apache Solr.
Embeddable UI Plugin supporting
tenant specific configuration
Coverage includes, but is not limited to, keywords, metadata,
text and geospatial fields.
Exploring GeoBlacklight for search and dissemination
of geographical data.
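A tenant-scoped Solr search can be expressed with the standard `q`, `fq`, `rows` and `wt` request parameters. The sketch below only builds the request URL; the host, the core name `studies` and the field name `tenant_id` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint and core name.
SOLR_SELECT = "http://localhost:8983/solr/studies/select"


def build_search_url(text, tenant_id, rows=10):
    """Build a Solr select URL that filters results to a single tenant."""
    params = [
        ("q", text),                       # main query
        ("fq", f"tenant_id:{tenant_id}"),  # filter query (assumed field name)
        ("rows", str(rows)),               # page size
        ("wt", "json"),                    # response format
    ]
    return f"{SOLR_SELECT}?{urlencode(params)}"
```

User-supplied text would also need escaping of Solr query syntax characters before being placed in `q`; that step is omitted here.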
14. Anti-Virus Scanner
Anti-virus scanning as a service.
Supports ClamAV and Sophos.
Capable of expanding to
multiple nodes allowing horizontal scalability
to support scanning large data sets.
15. SPSS Processor
Performs additional processing of IBM SPSS files.
Analyzes and reports potential missing
variables and inconsistencies.
Extracts variables and stores them
for online analysis tools.
16. Open API
RESTful services to
enable metadata harvesting and exports
using industry wide standards and formats.
Example: RDF, JSON-LD, DDI XML…
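As a rough idea of what a JSON-LD export might look like, the sketch below serializes a study record using the schema.org `Dataset` type. The choice of schema.org and the input field names (`id`, `title`, `keywords`) are assumptions; the slides name the format but not a vocabulary.

```python
import json


def to_json_ld(study):
    """Serialize a study record as a schema.org Dataset in JSON-LD."""
    document = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "identifier": study["id"],        # assumed input field
        "name": study["title"],           # assumed input field
        "keywords": study.get("keywords", []),
    }
    return json.dumps(document, indent=2)
```

A REST endpoint could return this string with the `application/ld+json` media type so that harvesters can consume it directly.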
17. Workflow Engine
Central workflow management providing unified
action list for users.
Ability to model business process flows and
integrate them with technical components.
Activiti is the chosen technology.
18. Reports & Analytics
Captures all system & user activities within the components
enabling effective provenance data collection.
Central consolidated storage for all the logs.
Ability to discover, visualize and report on data collected.
Elasticsearch & Kibana
Google Analytics (Client & Server side)
19. Content Specific Processors
Add on modules that can derive and extract
custom attributes. These modules can be
invoked using messages and added to the processing
pipelines.
For example, the Image Processor module
produces thumbnails of image files
for easier display in the GUI.
Geospatial data is published to
GeoServer.
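Add-on processors of this kind are often wired in through a registry, so that a new module can join the pipeline without changes to the dispatcher. The sketch below assumes processors are selected by MIME type prefix; the message fields and the thumbnail path convention are hypothetical.

```python
# Registry of processors keyed by MIME type prefix (an assumed selection rule).
PROCESSORS = {}


def processor(mime_prefix):
    """Decorator that registers a function as the processor for a MIME prefix."""
    def register(func):
        PROCESSORS[mime_prefix] = func
        return func
    return register


@processor("image/")
def image_processor(message):
    # Placeholder: a real module would render a thumbnail here.
    return {"thumbnail": message["path"] + ".thumb.png"}


def handle(message):
    """Run every registered processor whose prefix matches the message."""
    derived = {}
    for prefix, func in PROCESSORS.items():
        if message["mime_type"].startswith(prefix):
            derived.update(func(message))
    return derived
```

Messages with no matching processor simply yield no derived attributes.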
20. Geo Tagger
Add on module that can derive and process
geographical information from inputs
like street address, IP address, shapes on a map
or markers on a map.
Will generate geo tag information for display
and support search capabilities.
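For the case of markers on a map, a geo tag could be as simple as a bounding box plus a centre point. The sketch below assumes markers arrive as (latitude, longitude) pairs in decimal degrees and uses a plain average for the centre, which is only reasonable for small areas that do not cross the antimeridian. The output structure is hypothetical.

```python
def geo_tag(markers):
    """Derive a bounding box and centre point from (lat, lon) markers."""
    if not markers:
        raise ValueError("at least one marker is required")
    lats = [lat for lat, _ in markers]
    lons = [lon for _, lon in markers]
    return {
        "bbox": {
            "south": min(lats),
            "west": min(lons),
            "north": max(lats),
            "east": max(lons),
        },
        "center": {
            "lat": sum(lats) / len(lats),
            "lon": sum(lons) / len(lons),
        },
    }
```

The bounding box could be indexed for spatial search, and the centre point used to place a pin on a results map.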