What is CLOCKSS?
The Mersenne Centre uses the services of CLOCKSS to ensure the long-term preservation of its articles.
CLOCKSS stands for “Controlled LOCKSS (Lots of Copies Keep Stuff Safe)”. It is a permanent archiving solution based on data replication.
As of November 2021, CLOCKSS hosts 46 million articles, 25,000 journal titles and 260,000 books, complete with metadata. As a rule, CLOCKSS archives only the data to which the publisher has given it access. Twelve mirror repositories hosted by major academic institutions keep the stored data consistent and up to date: if the content of one repository does not match that of the mirror sites, it receives corrections from the others. When a trigger event occurs, CLOCKSS makes the documents available to all, in an open access model.
Example: CLOCKSS triggered its document availability system after two titles disappeared from the SAGE platform.
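The correction mechanism between mirror repositories can be illustrated with a minimal sketch. It is a simplified majority vote on content hashes, not the actual LOCKSS polling protocol; the function name `repair` and the node names are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of a stored copy."""
    return hashlib.sha256(content).hexdigest()


def repair(replicas: dict) -> tuple:
    """Replace every copy that disagrees with the majority.

    `replicas` maps a (hypothetical) repository name to the bytes it stores.
    Returns the repaired mapping and the sorted names of repaired repositories.
    """
    votes = Counter(digest(copy) for copy in replicas.values())
    winner, count = votes.most_common(1)[0]
    if count <= len(replicas) // 2:
        raise ValueError("no majority: cannot tell which copy is correct")
    # Pick any copy whose digest matches the majority digest.
    good_copy = next(c for c in replicas.values() if digest(c) == winner)
    repaired = sorted(n for n, c in replicas.items() if digest(c) != winner)
    return {name: good_copy for name in replicas}, repaired


# Three hypothetical repositories, one of which holds a damaged copy.
replicas = {
    "node-a": b"article v1",
    "node-b": b"article v1",
    "node-c": b"artjcle v1",
}
fixed, repaired_nodes = repair(replicas)
print(repaired_nodes)
```

The more independent copies take part in the vote, the less likely it is that damaged copies form a majority, which is the idea behind "Lots of Copies Keep Stuff Safe".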
Objectives of sustainable archiving
According to CINES, long-term digital archiving has three main objectives:
- “keep the document,
- make it accessible,
- preserve its intelligibility.”
These objectives are designed for the very long term, i.e. over 30 years.
A conventional backup meets the first two objectives: keeping the document and making it accessible. The third, preserving the intelligibility of documents, is central to long-term archiving. CINES identifies four major risks that threaten a file:
- “material obsolescence,
- software obsolescence,
- the obsolescence of the file format,
- the loss of the meaning of the content.”
Material obsolescence, that is, the obsolescence of hardware and storage media, is simple to understand. First, the medium itself can deteriorate: an old CD, for example, degrades over time. The medium can also become unreadable when the drive it requires is no longer available: who still has a floppy disk drive on their computer today?
To guard against software obsolescence and file format obsolescence, standardised formats are preferred, so as not to depend on a proprietary solution. There is no guarantee that the software used to create a document will still exist in 10 years' time. CINES publishes a list of archivable formats on its archiving platform, which gives a good overview of sustainable formats. If the format used in the archive nevertheless disappears, a format conversion should be considered, making sure that the integrity of the data is maintained.
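As a toy illustration of a conversion that preserves data integrity, the sketch below migrates text from a legacy encoding (Latin-1, chosen arbitrarily) to UTF-8 and checks that the characters are unchanged even though the bytes differ. The function names are hypothetical, and real format migrations (for example between document formats) need far more elaborate validation.

```python
import hashlib


def migrate_text(data: bytes, old_encoding: str = "latin-1") -> bytes:
    """Re-encode legacy text bytes as UTF-8."""
    return data.decode(old_encoding).encode("utf-8")


def content_digest(data: bytes, encoding: str) -> str:
    """Hash the decoded characters, so the digest does not depend on the byte encoding."""
    text = data.decode(encoding)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


legacy = "Archivage pérenne".encode("latin-1")
migrated = migrate_text(legacy)

# The stored bytes change with the format...
bytes_changed = legacy != migrated
# ...but the content itself must be identical before and after conversion.
content_preserved = content_digest(legacy, "latin-1") == content_digest(migrated, "utf-8")
print(bytes_changed, content_preserved)
```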
Guarding against the loss of meaning of documents is specific to long-term archiving. Unlike a backup, which retains only the document, archiving also retains the metadata associated with it. The first, general level of metadata, expressed in Dublin Core, describes the document: title, creator, subject, date, format, language, copyright, etc. Technical metadata must be added as well, since it is needed to ensure the durability of the document.
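A descriptive record of this kind can be sketched with Python's standard library. The element names come from the Dublin Core element set (where copyright information is carried by the `rights` element); the wrapping `record` element, the function name and all field values are made up for illustration and do not reflect how the Mersenne Centre stores its metadata.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> ET.Element:
    """Build a simple XML record with one Dublin Core element per field."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Illustrative values only.
record = dublin_core_record({
    "title": "An example article",
    "creator": "Jane Doe",
    "subject": "Digital preservation",
    "date": "2021-11-01",
    "format": "application/pdf",
    "language": "en",
    "rights": "CC BY 4.0",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```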
The Mersenne Centre has chosen the JATS (Journal Article Tag Suite) metadata format.
For more information on preservation metadata: https://www.cines.fr/archivage/un-concept-des-problematiques/les-metadonnees-de-perennisation/