WP32: Cost/benefit data collection and modelling

Objectives

Create models for the cost of preservation of digital objects.

Description of work and role of partners

Task 3210 Cost parameters

The data holders in the consortium encompass a wide variety of repositories and cost drivers. Many of the published cost models, such as LIFE [37] and [38], are based on a library model with a limited number of parameters collected. The same is true of the work by Beagrie et al [39]. Fontaine et al [40] collected a very large number of cost parameters. This limited the ability to produce an explicit model of cost dependencies: many more parameters were collected for each repository than there were repositories providing data, so there were too few degrees of freedom. Nevertheless, they used a technique of finding similarities for any newly proposed repository. Even so, little attention was paid to anything other than bit preservation and access; the requirements for preserving the usability of the information were not part of the costings.
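
As a purely illustrative sketch of what a similarity-based estimate could look like, the Python fragment below averages the costs of the k most similar known repositories. It is not the method of Fontaine et al [40], whose details are not given here; the function names, the three parameters (volume in TB, number of formats, staff FTE) and all figures are hypothetical assumptions made up for the example.

```python
import math


def normalise(rows):
    """Scale each parameter column to [0, 1] so that no single unit dominates."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def estimate_cost(known_params, known_costs, new_params, k=2):
    """Estimate a cost as the mean cost of the k most similar known repositories.

    Similarity is plain Euclidean distance on normalised parameters; this
    choice is an assumption for illustration only.
    """
    if k < 1 or k > len(known_params):
        raise ValueError("k must be between 1 and the number of known repositories")
    scaled = normalise(known_params + [new_params])
    target = scaled[-1]
    distances = [
        (math.dist(row, target), cost)
        for row, cost in zip(scaled[:-1], known_costs)
    ]
    nearest = sorted(distances)[:k]
    return sum(cost for _, cost in nearest) / len(nearest)


# Hypothetical, made-up data: [volume_tb, number_of_formats, staff_fte]
known_params = [[10, 5, 1.0], [12, 6, 1.5], [500, 40, 12.0]]
known_costs = [100.0, 120.0, 2000.0]  # arbitrary cost units
new_params = [11, 5, 1.2]

print(estimate_cost(known_params, known_costs, new_params, k=2))
```

An approach of this kind does not need an explicit model of how cost depends on each parameter, which is why it remains usable when there are more parameters than repositories.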

There are many related points of view. For example, CERN could provide an interesting insight into a discipline where the costs of producing data are enormous, running into billions of euros, but where few, if any, examples of data preservation and re-use exist. This scenario could stress-test any economic model under extreme conditions. Other data, such as observations of the Earth at particular times, cannot be reproduced because the Earth constantly changes.

Task 3220 Cost data collection

Based on the list of cost parameters from Task 3210, we will collect cost information, with appropriate anonymisation, from the consortium members and others. If we have enough data, we may be able to model certain aspects of the costs; failing that, we will use techniques such as those of Fontaine et al [40] to make cost predictions and compare those to a controlled set of repository data.
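
One possible way to compare predictions against a controlled set is a simple error measure such as the mean absolute percentage error. The sketch below is an assumption about how such a comparison might be scored, not a method fixed by this work package; the function name and the figures are hypothetical.

```python
def mean_absolute_percentage_error(predicted, actual):
    """Return the mean absolute percentage error between two cost lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual costs must be non-zero")
    total = sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)


# Hypothetical, made-up figures in arbitrary cost units
predicted_costs = [110.0, 90.0]
controlled_costs = [100.0, 100.0]

error_percent = mean_absolute_percentage_error(predicted_costs, controlled_costs)
print(error_percent)
```

A percentage-based measure is used here because repository costs may differ by orders of magnitude, so absolute errors would be dominated by the largest repositories.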