WP25: Interoperability and intelligibility

Objectives

Research and development of techniques to support interoperability of data between organisations and disciplines.

Background

According to the IEEE definition, interoperability refers to “the ability of two or more systems or components to exchange information and to use the information that has been exchanged”. Various aspects or layers of interoperability have been identified, mainly:

Syntactic interoperability. If two or more systems are capable of communicating and exchanging data, they exhibit syntactic interoperability, which is a prerequisite for any further level of interoperability. Specified data formats, communication protocols and the like are fundamental. For instance, the XML and SQL standards provide syntactic interoperability. The same holds for lower-level data formats, for example ensuring that both communicating systems store alphabetical characters in ASCII.

Semantic interoperability. Beyond the ability of two or more computer systems to exchange information, semantic interoperability is the ability to automatically interpret the information exchanged meaningfully and accurately in order to produce useful results as defined by the end users of both systems. To achieve semantic interoperability, both sides must refer to a common information exchange reference model. The content of the information exchange requests should be unambiguously defined: what is sent is the same as what is understood.
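The distinction between the two layers can be illustrated with a minimal sketch. The message fields, the `REFERENCE_MODEL` dictionary and the `to_celsius` helper are all hypothetical and chosen only for illustration. The receiver can parse the message, which is syntactic interoperability, but it can interpret the temperature value correctly only because both sides agree on a shared reference model that states the unit.

```python
import json

# Hypothetical shared reference model: both parties agree on field names and units.
REFERENCE_MODEL = {"temperature": {"unit": "kelvin"}}


def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature to degrees Celsius based on its declared unit."""
    if unit == "kelvin":
        return value - 273.15
    if unit == "celsius":
        return value
    raise ValueError(f"unknown unit: {unit}")


# Sender side: a well-formed JSON message (hypothetical fields).
message = json.dumps({"sensor": "S1", "temperature": 300.0})

# Receiver side: parsing succeeds, so the exchange is syntactically interoperable.
record = json.loads(message)

# Only the shared reference model tells the receiver that 300.0 is in kelvin.
unit = REFERENCE_MODEL["temperature"]["unit"]
celsius = to_celsius(record["temperature"], unit)
```

Without the reference model, the receiver could still parse the message but might silently treat the value as degrees Celsius, which is the kind of failure semantic interoperability is meant to prevent.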

Focus of this WP

Digital preservation has been termed “interoperability with the future”. Regarding syntactic interoperability, special attention will be dedicated to the metadata and standard protocols used in the sector, with specific reference to the analysis of significant properties according to the OAIS model. Case studies will be developed in specific domains (for example the Italian universities’ networks, where interoperability services based on a syntactic framework will be planned, also with reference to preservation issues). Regarding semantic interoperability, we will address techniques and issues related to the use of ontologies to identify and qualify information sources, including (a) character set or representation, (b) language interoperability, and the issues described within Task 2530.

Furthermore, we will also investigate collaborative methods for tackling such issues. We should mention, however, that the crux of the interoperability problem is that digital objects and services have various dependencies (syntactic, semantic, etc.). We cannot achieve interoperability when the involved parties are not aware of the dependencies of the exchanged artefacts. One general approach to this problem is standardization. From the dependency point of view, standardization essentially reduces the dependencies or makes them widely known (it does not eliminate them). Apart from developing standards, it is worth investigating more flexible methods for tackling the interoperability problem. One open question is whether we could tackle the problem without having to rely on several, possibly conflicting, standards. It is worth investigating whether we could establish a protocol for aiding interoperability on the basis of explicit dependency management.

To facilitate practical interoperability we also need to share ideas and reach common views on the virtualisation of different types of data, particularly those outlined in the Warwick workshop.

Description of work and role of partners

Task 2510 Research and development of common services and models to support interoperability.

In this task we will gather the conceptual models, services and formats that are used by the partners and try to develop common conceptual models, services and formats that tackle the identified discrepancies. To facilitate practical interoperability, this work will cover the virtualisation of data, management, storage, etc. It includes conceptual models for exchanging provenance metadata (e.g. CRM Digital and OPM). We will establish collaborations with relevant standardization bodies and stakeholder communities on new standards.

Task 2520 Intelligibility Modelling and Reasoning

There is a need for services that help archivists in checking whether the archived digital artefacts remain intelligible and functional, in identifying hazards and obsolescence risks, and in investigating the consequences of probable losses. To tackle these requirements, [48] [T, DEXA’07] showed how such services can be reduced to dependency management services, while [47] [TF, ECDL’07] extended that model with disjunctive dependencies. The central notions of these works are module, dependency and profile. In a nutshell, a module can be a software/hardware component or even a knowledge base expressed either formally or informally, explicitly or tacitly, that we want to preserve. A module may require the availability of other modules in order to function, be understood or managed (e.g. OAIS RepInfo). A profile is the set of modules that are assumed to be known (available or intelligible) by a user (or community of users), so it is an explicit representation of the concept of OAIS KB. Based on this model, a number of services have been defined for checking whether a module is intelligible to a community, or for computing the intelligibility gap of a module. GapMgr is a system which has been developed based on this model, and has been applied in the context of the EU project CASPAR.
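The notions of module, dependency and profile can be made concrete with a small sketch. It is not the GapMgr implementation or the exact formalisation of the cited papers. The module names, the `DEPENDS_ON` graph and the function names are hypothetical, and the sketch assumes purely conjunctive dependencies and that a module listed in the profile is understood together with whatever it requires.

```python
from typing import Dict, Set

# Hypothetical dependency graph: module -> modules it directly requires.
DEPENDS_ON: Dict[str, Set[str]] = {
    "report.fits": {"FITS standard", "FITS viewer"},
    "FITS viewer": {"Java VM"},
    "FITS standard": {"English"},
    "Java VM": set(),
    "English": set(),
}


def intelligibility_gap(module: str, profile: Set[str],
                        deps: Dict[str, Set[str]]) -> Set[str]:
    """Return the required modules that are missing from the profile.

    Assumption: a module in the profile is not expanded further, since it is
    taken to be known together with its own dependencies.
    """
    gap: Set[str] = set()
    stack = [module]
    while stack:
        current = stack.pop()
        for dep in deps.get(current, set()):
            if dep in profile or dep in gap:
                continue  # already known, or already recorded as missing
            gap.add(dep)
            stack.append(dep)
    return gap


def is_intelligible(module: str, profile: Set[str],
                    deps: Dict[str, Set[str]]) -> bool:
    """A module is intelligible to a profile when its gap is empty."""
    return not intelligibility_gap(module, profile, deps)
```

For a community whose profile is `{"English", "Java VM"}`, the gap of `report.fits` under these assumptions consists of the two modules that still have to be supplied, `FITS standard` and `FITS viewer`.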

In the context of this NoE we will attempt to extend the framework of task-based dependencies. One promising direction is to found the extended framework on Horn rules. The proposed framework and methodology, apart from simplifying the disjunctive dependencies of [47] [TF, ECDL’07], is expected to be more expressive and flexible, as it will allow the various properties of dependencies (e.g. transitivity, symmetry) to be expressed straightforwardly. Subsequently we plan to elaborate on the inference services required for task-performability, risk-detection and computing intelligibility gaps. In addition we will evaluate various implementation approaches, e.g. implementations over ORDBMS (Datalog queries through recursive SQL) and the Semantic Web (ontologies and SWRL rules). It is worth noting that, due to disjunction, there is no unique way to fill an intelligibility gap. To tackle this problem we will elaborate on abductive reasoning for computing intelligibility gaps.
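The following sketch shows why disjunction leads to several alternative gaps. It is an illustration under stated assumptions, not the framework to be developed in this task. Each alternative way of satisfying a module is written as a separate Horn rule with the same head. The rule set, the module names and the function names are hypothetical, and the sketch assumes that only modules without rules of their own may be added to the profile. Abduction is done by brute-force enumeration, which is feasible only for small examples.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Horn rules as head -> list of alternative bodies (each body is a conjunction).
Rules = Dict[str, List[Set[str]]]

# Hypothetical rule set: the schema document can be read in either of two ways.
RULES: Rules = {
    "data.xml": [{"XML parser", "schema doc"}],
    "schema doc": [{"PDF reader"}, {"HTML browser"}],
}


def derivable(known: Set[str], rules: Rules) -> Set[str]:
    """Forward chaining: close the known modules under the Horn rules."""
    derived = set(known)
    changed = True
    while changed:
        changed = False
        for head, bodies in rules.items():
            if head not in derived and any(body <= derived for body in bodies):
                derived.add(head)
                changed = True
    return derived


def minimal_gaps(target: str, profile: Set[str],
                 rules: Rules) -> List[FrozenSet[str]]:
    """Enumerate the minimal sets of basic modules that make the target derivable."""
    mentioned = set(rules) | {m for bodies in rules.values()
                              for body in bodies for m in body}
    # Assumption: only modules without rules of their own can be abduced.
    abducibles = sorted(mentioned - set(rules) - profile)
    gaps: List[FrozenSet[str]] = []
    for size in range(len(abducibles) + 1):
        for combo in combinations(abducibles, size):
            candidate = frozenset(combo)
            if any(gap <= candidate for gap in gaps):
                continue  # a smaller gap already covers this candidate
            if target in derivable(profile | candidate, rules):
                gaps.append(candidate)
    return gaps
```

With the profile `{"XML parser"}`, the target `data.xml` has two alternative minimal gaps in this sketch, one containing only `PDF reader` and one containing only `HTML browser`. A real service would need a criterion, such as cost or availability, for choosing between them.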

Task 2530 Semantic Interoperability and Scientific Data

Activities related to semantic interoperability, ontologies and knowledge bases have been growing in relevance within Earth Observation (EO) and other disciplines. Within the EO domain there is a clear need to cope with requirements ranging from knowledge capture (e.g. for the description of Ground Segment components), to support for semantic access to EO resources (e.g. for the identification of relevant EO products), to the identification of preservation attributes. Different information organisation techniques are employed (such as thesauri, ontologies and topic maps), and various thesauri and dictionaries have been developed by a number of institutions: the General Multilingual Environmental Thesaurus (GEMET) by the EEA, Wiktionary by the Wikimedia Foundation, Eurovoc by the EC Publications Office, and the Semantic Web for Earth and Environmental Terminology (SWEET) by NASA are some of the highly relevant European and international initiatives. To support semantic access to EO resources relevant to a particular application domain, we can identify suitable tools and information organisation techniques, but barriers of various kinds often prevent the reuse of existing thesauri and dictionaries, a problem that is exacerbated when preservation issues need to be taken into account. Within this task we will address these limitations and barriers by establishing a networking capability with the objective of overcoming them, taking into account a set of common high-level objectives and requirements to be agreed upon. These objectives should permit application experts to easily identify within the archive the EO missions, sensors or products useful for their activity, using familiar semantic terms pertaining to their application domain, and then to identify the relevant preservation attributes. The baseline objectives to be given as input to the task will be agreed upon with experts via workshops and networking events.
As seed elements for the discussion we will use the following objectives: to permit easy semantic identification, from non-EO domains, of relevant EO resources and of their preservation attributes; to keep the ontology and architecture as simple as possible; and to support multiple application domains while limiting dependencies on evolution and changes, given the overarching goal of long-term data preservation.