Preservation-Ingest

Data can be moved from the point of creation or initial use into preservation in a number of ways. The key aim is to ensure that information about the digital objects, which almost certainly only the creators have, is transferred with them. An example is the Representation Information (RepInfo): not just the formats (structure) but also the semantics and other RepInfo such as the software needed to use the digital objects. In OAIS terminology, Archival Information Packages (AIPs) must be created.

One reason to create AIPs is that if the repository can no longer continue its preservation activities for this information, for example because it closes down, the AIPs can be handed on to the next in the chain of preservation. This ensures that all the information needed to preserve the digitally encoded information is handed on and nothing is forgotten.
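As a rough illustration of the point above, the sketch below models an AIP as content plus Representation Information and checks that nothing essential is missing before hand-over. The class name, field names, and the three required RepInfo categories (structure, semantics, software) are assumptions drawn from the examples in the text; they are not the OAIS information model or any repository's API.

```python
from dataclasses import dataclass, field

# Hypothetical RepInfo categories, taken from the examples in the text above.
REQUIRED_REPINFO = ("structure", "semantics", "software")


@dataclass
class ArchivalInformationPackage:
    """Illustrative AIP: a digital object plus the information needed to use it."""
    object_id: str
    content: bytes
    rep_info: dict = field(default_factory=dict)  # category -> description

    def missing_rep_info(self):
        """Return the required RepInfo categories that are absent or empty."""
        return [c for c in REQUIRED_REPINFO if not self.rep_info.get(c)]

    def ready_for_handover(self):
        """True only if every required RepInfo category is filled in."""
        return not self.missing_rep_info()


# Example: the creator supplied format and semantics but not the software needed.
aip = ArchivalInformationPackage(
    object_id="example-001",
    content=b"12.5,13.1,12.9\n",
    rep_info={
        "structure": "CSV, one row of values",
        "semantics": "temperature in degrees Celsius",
    },
)
print(aip.missing_rep_info())
```

A check of this kind is most useful at ingest, while the creators are still available to supply whatever is missing.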

An important decision to be taken by the repository is the definition of the Designated Community – those types of users for whom the repository guarantees usability.

 

Asset base

Issue | WP/Project/Tools/Services | Asset | Evidence

Planning hand-over from creators

PAIS and PAIMAS standards

Templates for defining the hand-over process

APARSEN deliverable D26.1 Report and Strategy on Annotation, Reputation and Data Quality (1054)

The report explains the importance of the mutually dependent goals of annotation, reputation and data quality. Annotations do not have to be available at the time of ingest, but ingest is probably when most annotations are added, which makes annotation a major ingest issue.

The report gives an overview of issues which should be considered when developing a research data repository as well as when annotating research data.

Definition of Designated Community

APARSEN WP25

CASPAR deliverables and published papers

SCIDIP-ES GIS service

Tool to define Designated Community and associated user feedback

Creation of AIPs

SCIDIP-ES Packaging, plus Preservation Strategy Toolkit and RepInfo Toolkit.

Tools for creating AIPs.

Support from evidence collected by CASPAR.

SCIDIP-ES user feedback

Preservica (Ingest workflow and standalone packaging application) and KoLibRI, RODA, Rosetta, etc.

SIPs, AIPs (and DIPs) based on the XIP metadata schema.

Demonstrable ingest workflows available in preservation systems (see D14.1 Report on testing environments (896) for screenshots of ingest workflows from partner test environments).

Association of Persistent Identifiers (PI) for people and digital objects on ingest

APARSEN WP22

PI Interoperability Framework

SCIDIP-ES HAPPI toolkit

PI creation

Automation of the extraction and harmonization of the embedded (or implicit) metadata from the various file formats

The PreScan tool, developed in the context of the CASPAR project

The PreScan tool itself.

Experience in building tools that extract the embedded metadata from digital files and produce harmonized warehouses of metadata.

The tool has been tested; the results are reported in: Yannis Marketakis, Makis Tzanakis, Yannis Tzitzikas. PreScan: towards automating the preservation of digital objects. MEDES 2009: 404-411.

Capability to ingest a great variety of data

PANGAEA Data Publisher for Earth and Environmental Science

PANGAEA has long-standing experience in ingesting data from a wide variety of customers.

 

Gaps

There is no common, shared technical implementation for packaging SIPs (or AIPs) for ingest into any repository. Each ingest process uses its own bespoke implementation to create SIPs, resulting in minimal, if any, integration with other repository systems. Exporting, transforming and importing AIPs from one repository system into another typically requires bespoke software engineering, so there is no simple interchange of AIPs between systems.
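To make the interchange gap concrete, the sketch below shows one way a repository-neutral package manifest could look: a plain JSON listing of files with SHA-256 checksums that a receiving system can verify without knowing the sender's software. The manifest layout and the function names `build_manifest` and `verify_manifest` are hypothetical; this is not an existing standard or the format used by any of the systems named above.

```python
import hashlib
import json


def build_manifest(package_id, files):
    """Build a plain JSON manifest for a package.

    `files` maps a relative path to the file's bytes. The manifest layout is a
    hypothetical example, not a published interchange standard.
    """
    entries = [
        {"path": path, "size": len(data), "sha256": hashlib.sha256(data).hexdigest()}
        for path, data in sorted(files.items())
    ]
    return json.dumps({"package_id": package_id, "files": entries}, indent=2)


def verify_manifest(manifest_json, files):
    """Return the paths that are missing or whose bytes do not match the manifest."""
    manifest = json.loads(manifest_json)
    return [
        entry["path"]
        for entry in manifest["files"]
        if entry["path"] not in files
        or hashlib.sha256(files[entry["path"]]).hexdigest() != entry["sha256"]
    ]


# Example: the sending repository writes the manifest, the receiver checks it.
sent = {"data/values.csv": b"12.5,13.1\n", "repinfo/semantics.txt": b"degrees Celsius\n"}
manifest = build_manifest("example-001", sent)
received = dict(sent, **{"data/values.csv": b"corrupted"})
print(verify_manifest(manifest, received))
```

A manifest like this covers only fixity and inventory; the Representation Information itself would still need an agreed description for the hand-over to be complete.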

Automation of the extraction and harmonization of the embedded (or implicit) metadata from the various file formats. Although this approach can automatically produce large warehouses of harmonized metadata at no cost, it is not widely known or used.
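The harmonization step described above can be sketched as a mapping from format-specific metadata keys onto one shared vocabulary. This is a minimal illustration, not how PreScan is implemented: the `KEY_MAP` table, the target field names, and the sample values are all assumptions, and the embedded metadata is assumed to have already been extracted into dictionaries by a format-specific reader.

```python
# Hypothetical mapping from format-specific metadata keys to a common vocabulary.
KEY_MAP = {
    "pdf": {"Author": "creator", "Title": "title", "CreationDate": "created"},
    "jpeg": {"Artist": "creator", "ImageDescription": "title", "DateTime": "created"},
}


def harmonize(file_format, embedded):
    """Translate embedded metadata into the common vocabulary, dropping unmapped keys."""
    mapping = KEY_MAP.get(file_format, {})
    return {mapping[key]: value for key, value in embedded.items() if key in mapping}


# Example: two files of different formats end up with comparable records.
records = [
    harmonize("pdf", {"Author": "A. Example", "Title": "Report", "Producer": "x"}),
    harmonize("jpeg", {"Artist": "B. Example", "DateTime": "2009:01:01 10:00:00"}),
]
for record in records:
    print(record)
```

The sketch renames keys only; a real warehouse would also need to normalize values, for example bringing the different date notations into one form.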

 
