Data can be moved from the point of creation or initial use into preservation in a number of ways. The key aim is to ensure that information about the digital objects, which almost certainly only the creators have, is transferred along with them. This includes the Representation Information (RepInfo): not just the formats (structure) but also the semantics and other RepInfo, such as the software needed to use the digital objects. In OAIS terminology, Archival Information Packages (AIPs) must be created.
One reason to create AIPs is that if the repository can no longer continue preserving this information, for example because it closes down, the AIPs can be handed on to the next repository in the chain of preservation. This ensures that everything needed to preserve the digitally encoded information is passed on and nothing is forgotten.
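The idea of an AIP as a self-describing unit that can be handed on can be pictured as a small data structure. The sketch below is illustrative only and assumes Python 3.9 or later: the class names, the field names and the `missing_for_handover` check are hypothetical, not part of OAIS or of any tool named on this page. It simply groups a data object with its RepInfo (structure, semantics, other) and some provenance, and reports what a successor repository would still be missing.

```python
from dataclasses import dataclass, field


@dataclass
class RepresentationInfo:
    """Hypothetical container for the RepInfo described above."""
    structure: str = ""   # e.g. the file format
    semantics: str = ""   # meaning of the fields or values
    other: list[str] = field(default_factory=list)  # e.g. software needed


@dataclass
class ArchivalInformationPackage:
    """Illustrative AIP: a data object plus what is needed to use it."""
    identifier: str
    data_object: str      # path or reference to the bits
    rep_info: RepresentationInfo
    provenance: list[str] = field(default_factory=list)

    def missing_for_handover(self) -> list[str]:
        """List the parts a successor repository would still need."""
        gaps = []
        if not self.rep_info.structure:
            gaps.append("structure")
        if not self.rep_info.semantics:
            gaps.append("semantics")
        if not self.provenance:
            gaps.append("provenance")
        return gaps


# Only the format is known here, so semantics and provenance are flagged.
aip = ArchivalInformationPackage(
    identifier="example-0001",
    data_object="observations.csv",
    rep_info=RepresentationInfo(structure="CSV, UTF-8"),
)
print(aip.missing_for_handover())  # ['semantics', 'provenance']
```

A check of this kind is most useful at ingest, while the creators can still supply what is missing.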
An important decision for the repository is how to define its Designated Community: the types of users for whom the repository guarantees usability.
Asset base

| Issue | WP/Project/Tools/Services | Asset | Evidence |
| --- | --- | --- | --- |
| Planning hand-over from creators | PAIS and PAIMAS standards | Templates for defining the hand-over process | |
| | APARSEN deliverable [Download not found] | The report explains the importance of three mutually dependent goals: annotation, reputation and data quality. Annotations need not be available at the time of ingestion, but that is probably when most annotations are added, which makes annotation a major ingest issue. The report gives an overview of issues to consider when developing a research data repository and when annotating research data. | |
| Definition of Designated Community | APARSEN WP25 | CASPAR deliverables and published papers | |
| | SCIDIP-ES GIS service | Tool to define Designated Community and associated user feedback | |
| Creation of AIPs | SCIDIP-ES Packaging, plus Preservation Strategy Toolkit and RepInfo Toolkit | Tools for creating AIPs. Support from evidence collected by CASPAR. SCIDIP-ES user feedback | |
| | Preservica (ingest workflow and standalone packaging application) and KoLibRI, RODA, Rosetta, etc. | XIP metadata schema based SIPs, AIPs (& DIPs) | Demonstrable ingest workflows available in preservation systems. (See [Download not found] for screenshots of ingest workflows from partner test environments.) |
| Association of Persistent Identifiers (PI) for people and digital objects on ingest | APARSEN WP22 | PI Interoperability Framework | |
| | SCIDIP-ES HAPPI toolkit | PI creation | |
| Automation of the extraction and harmonization of the embedded (or implicit) metadata from the various file formats | PreScan tool, developed in the context of the CASPAR project | The PreScan tool itself. Experience in building tools that extract the embedded metadata from digital files and produce harmonized warehouses of metadata. | It has been tested. The results are reported in Yannis Marketakis, Makis Tzanakis, Yannis Tzitzikas: PreScan: towards automating the preservation of digital objects. MEDES 2009: 404-411 |
| Capability to ingest a great variety of data | PANGAEA Data Publisher for Earth and Environmental Science | PANGAEA has long-standing experience in ingesting data from a wide variety of customers. | |
Gaps

There is no common, shared technical implementation for packaging SIPs (or AIPs) for ingest into any repository. Each ingest process uses its own bespoke implementation to create SIPs, so there is minimal, if any, integration with other repository systems. Moving AIPs from one repository system into another typically requires bespoke software engineering to export, transform and import them. There is no simple interchange of AIPs between systems.
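To make the packaging gap concrete, the sketch below shows roughly what a minimal, system-neutral package could contain: the files themselves plus a manifest of checksums that any receiving system can verify. It is an illustration under stated assumptions, not the format used by any system named above; the layout (`data/` plus `manifest.json`) and the function names `build_package` and `verify_package` are hypothetical.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def build_package(source_dir: str, package_path: str) -> dict:
    """Zip every file under source_dir together with a checksum manifest.

    package_path should lie outside source_dir so the package
    does not try to include itself.
    """
    source = Path(source_dir)
    manifest = {}
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(p for p in source.rglob("*") if p.is_file()):
            rel = path.relative_to(source).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
            zf.write(path, arcname=f"data/{rel}")
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return manifest


def verify_package(package_path: str) -> bool:
    """Recompute each checksum in the package and compare with the manifest."""
    with zipfile.ZipFile(package_path) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        return all(
            hashlib.sha256(zf.read(f"data/{rel}")).hexdigest() == digest
            for rel, digest in manifest.items()
        )
```

A receiving repository that understood only this layout could still check that nothing was lost or altered in transfer. Carrying the RepInfo and other metadata across systems is the harder part, and it is not addressed here.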
Automated extraction and harmonization of the embedded (or implicit) metadata from the various file formats. Although this approach can automatically produce large warehouses of harmonized metadata at no cost, it is not widely known or used.
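The general idea of extracting embedded metadata and harmonizing it can be sketched in a few lines. The example below is not PreScan's implementation. It is a minimal illustration, assuming only two image formats and a made-up shared vocabulary (`width_px`, `height_px`); the function names and native field labels are hypothetical. Each extractor reads fields from the file header, and a per-format mapping renames them to the shared names.

```python
import struct


def extract_png(data: bytes) -> dict:
    """Read width and height from the IHDR chunk of a PNG file."""
    width, height = struct.unpack(">II", data[16:24])
    return {"IHDR.Width": width, "IHDR.Height": height}


def extract_gif(data: bytes) -> dict:
    """Read the logical screen size from a GIF header."""
    width, height = struct.unpack("<HH", data[6:10])
    return {"LogicalScreenWidth": width, "LogicalScreenHeight": height}


# (magic prefix, format name, extractor, native key -> harmonized key)
FORMATS = [
    (b"\x89PNG\r\n\x1a\n", "PNG", extract_png,
     {"IHDR.Width": "width_px", "IHDR.Height": "height_px"}),
    (b"GIF8", "GIF", extract_gif,
     {"LogicalScreenWidth": "width_px", "LogicalScreenHeight": "height_px"}),
]


def harmonized_metadata(data: bytes) -> dict:
    """Detect the format by its magic bytes and map native fields to shared names."""
    for magic, name, extractor, mapping in FORMATS:
        if data.startswith(magic):
            native = extractor(data)
            record = {"format": name}
            record.update({mapping[k]: v for k, v in native.items()})
            return record
    return {"format": "unknown"}


# Minimal synthetic headers, just enough for the extractors above.
png = b"\x89PNG\r\n\x1a\n" + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
gif = b"GIF89a" + struct.pack("<HH", 320, 200)
print(harmonized_metadata(png))  # {'format': 'PNG', 'width_px': 640, 'height_px': 480}
print(harmonized_metadata(gif))  # {'format': 'GIF', 'width_px': 320, 'height_px': 200}
```

Records in a shared vocabulary like this can be loaded into a single metadata store regardless of the original file format, which is what makes a harmonized warehouse possible.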