Preservation Events Controlled Vocabulary



Questions for Consideration

  1. What are your thoughts on the terms introduced?
    1. Are they consistent with your organisation’s view of digital preservation? If not, what is wrong?
    2. What is missing?
    3. What should not be included?
  2. What are your thoughts on the notion of “core” preservation events?
  3. If you utilise a different controlled list, can this be mapped consistently to it?
  4. The current controlled vocabulary uses the term “Validation”. The draft contains the term “Format Validation”, which is more specific. What are the ramifications for you if the new term is kept in the final version?
  5. Is there anything else that should be supplied with the final version to make it more useable?


Timescale for Comment

This review process is now open and will close on 29th February 2016.

For more information, contact Peter McKinney (Peter.McKinney[@]



# Term Core Event? Definition/description Link Comment
1 accession   The process of adding an object to the inventory of a repository. This provides a clear delineation point for the assumption of responsibility for the digital content’s preservation.    Peter McKinney [Institution]: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. 
2 adding emulation information    Adding to an object's metadata, information necessary in order to emulate that object (such as the application used to create the object). This is a special case of "Information package modification".   

Libor Coufal [National Library of Australia]: The definition needs clarification as it is not the object which is emulated but rather the object is rendered in an emulated environment. Also, I am not sure that the emulation information would be added to the object's metadata, e.g. if you wanted to use emulation to render objects in the MS Works format, would you be adding (the same) emulation information to each object in the format? Probably not... I am also not sure what the use case for this event metadata is, possibly other than knowing how old/up-to-date the information is...?


Evelyn McLellan [Artefactual Systems]: I agree this term is problematic. In particular, I think it's too specific & wonder whether "metadata modification" would serve as an umbrella term for adding emulation information (see term #32, below), if a user feels some kind of event is needed at all.

3 appraisal   The process of assessing whether a package of digital material should be included in the repository. It is recorded when an organisation accepts or rejects curatorial responsibility for a package, usually due to verification failure or a failure to meet expected standards. [eventOutcome being accepted or rejected]  

Bertram Lyons [AVPreserve]: Should event outcome include more than two options here? Some appraisals result in partial accessions - e.g., "We'll take some but not all."

Evelyn McLellan [Artefactual]: Agree. Maybe add partially accepted as a possible eventOutcome?

Lina Bountouri [EU Publications Office]: Agree.


Libor Coufal [National Library of Australia]: Sometimes appraisals may happen after ingest (for practical reasons, or because of an archive's or institution's retention policy, which may stipulate that records need to be retained for a certain time, e.g. 30 years, after which they may (but may not) be disposed of). Could the definition be updated to reflect this? E.g. "should be included or retained in the repository"...?

Lina Bountouri [EU Publications Office]: Agree.
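
The suggestion in this thread, widening the appraisal outcome beyond accepted/rejected, could be sketched as follows. This is only an illustration: the `AppraisalOutcome` enum and the `appraisal_outcome` helper are hypothetical names, and "partially accepted" is the value proposed in the comments above, not a value defined by the draft.

```python
from enum import Enum


class AppraisalOutcome(Enum):
    """Hypothetical eventOutcome values for an "appraisal" event."""

    ACCEPTED = "accepted"
    REJECTED = "rejected"
    # Proposed in the comments above; not part of the draft definition.
    PARTIALLY_ACCEPTED = "partially accepted"


def appraisal_outcome(offered: int, taken: int) -> AppraisalOutcome:
    """Derive an outcome from how many of the offered items were taken."""
    if offered <= 0 or not 0 <= taken <= offered:
        raise ValueError("offered must be positive and 0 <= taken <= offered")
    if taken == 0:
        return AppraisalOutcome.REJECTED
    if taken == offered:
        return AppraisalOutcome.ACCEPTED
    return AppraisalOutcome.PARTIALLY_ACCEPTED
```

With this sketch, `appraisal_outcome(10, 4)` would yield the proposed "partially accepted" value, while the two existing outcomes keep their current meaning.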

4 capture core The process whereby a repository actively obtains an object. Includes the notion of "receipt".

Libor Coufal [National Library of Australia]: I am confused by the second sentence - what does it mean in this context? Isn't the notion of "receipt" passive, i.e. the opposite of active capture?


Evelyn McLellan [Artefactual Systems]: I agree, I think that sentence should be removed from the definition.

5 compression   The process of coding data to save storage space or transmission time.

Bertram Lyons [AVPreserve]: I know this is an old vocab item, but should compression and decompression distinguish clearly in the definition that this is about lossless approaches (e.g., zip, tar, etc.)? One could imagine an organization applying compression or decompression in a lossy way as well (even if one would not want that -- e.g., WAV -> MP3). 


Evelyn McLellan [Artefactual Systems]: It may be better to capture information about the lossiness of the compression in the eventDetailInformation or eventOutcomeInformation instead.
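
One way to picture the suggestion above, recording lossiness in the event detail rather than in the event type, is the sketch below. It assumes a plain Python dict standing in for an event record; the keys borrow PREMIS semantic unit names, but the `algorithm=...; lossy=...` text convention and the `compression_event` helper are hypothetical.

```python
import zlib


def compression_event(original: bytes) -> dict:
    """Compress with zlib and describe the result as an event-like record."""
    compressed = zlib.compress(original)
    # A lossless codec must reproduce the input exactly on decompression.
    lossless = zlib.decompress(compressed) == original
    return {
        "eventType": "compression",
        # Hypothetical free-text convention for the event detail.
        "eventDetail": "algorithm=zlib/DEFLATE; lossy=" + ("no" if lossless else "yes"),
        "eventOutcome": "success" if lossless else "failure",
    }


event = compression_event(b"preservation " * 100)
```

The event type stays the single term "compression"; whether the method was lossy travels with the individual event record.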

6 creation   The act of creating a new object.  
7 data carrier migration   A transformation of an object resulting from the creation of a copy on a more contemporary data carrier.   Bertrand Caron (BnF): I am not very keen on the word "transformation", which suggests that the Packaging information, Content information or PDI were modified, which doesn't seem the case (I am just trying to figure out which OAIS migration.type it corresponds to: duplication? transformation?) Couldn't it just say "The act of copying an object on a more contemporary data carrier"?
8 deaccession   The formal process of removing an object from the inventory of a repository. This may be by transfer to another repository, return to the depositor or by permanent deletion.  
9 decompression   The process of reversing the effects of compression.  
10 decryption core The process of converting encrypted data to plain text.

Evelyn McLellan [Artefactual]: Can we also add encryption as an Event? It may be desirable to describe this as a pre-ingest Event performed outside the repository, or some institutions may want to encrypt AIPs for storage or transfer to another repository.

Libor Coufal [National Library of Australia]: I second Evelyn in that encryption should also be added.

11 deletion core The process of permanently destroying an object in a repository. Libor Coufal [National Library of Australia]: Could "destroying" be replaced, e.g. with "removing" (which btw seems to be commonly used in this sense in other parts of this document and elsewhere, such as the referenced LOC definition). Also, does "soft" and "hard" deletion need to be distinguished?
12 deselection   A file or representation which is described and defined in the packaging information but is NOT ingested on purpose. This might happen if e.g. a file is migrated prior to ingest and only the migrated copy is kept. To provide a complete audit trail the original file has to be defined and have its own PREMIS record.
13 digital signature validation   The process of determining that a decrypted digital signature matches an expected value.  
14 dissemination   The process of transforming one or more archival information packages (AIP) into a dissemination information package (DIP) for use outside of the preservation repository    
15 file extension change   Assignment of a new filetype extension to a file object; typically done only if the existing extension was found to be incorrect.    
16 file system analysis   The process of analysing one or more filesystems from raw or forensically packaged images  
17 file system extraction   The process of extracting one or more filesystems from raw or forensically packaged images    
18 filename change   Removal of prohibited characters from file and directory names or other changes to conform with best practice file-naming conventions.

Bernadette Houghton [Deakin]: Many organisations also require filenames to conform to a specific pattern.

Bertrand Caron (Bibliothèque nationale de France): I'd prefer a more generic definition for "filename change": at the BnF, our choice for renaming is an entire replacement of the original filename by a sequential series of numbers, recording the original filename and original extension in a premis:originalName. I would suggest something like "A modification of a filename, either a removal of prohibited characters, or a partial or entire replacement of the original filename".

Lina Bountouri [EU Publications Office]: Agree with Bertrand, different rules may be applied for file naming and we shouldn't specify in the definition specific file naming policies. 


19 fixity check core The process of verifying that an object has not been changed in a given period.  
20 forensic feature analysis   The process of forensically analysing raw bitstreams    
21 format identification core Identification of the object's file format and version (note: this event is different from 'validation' which compares the object to known format specifications)    
22 format validation   The process of comparing an object with a standard and noting compliance or exceptions.  
23 identifier assignment core Assignment of an identifier – a special case of information package modification    
24 imaging   The process of extracting a disk image from physical media    
25 Information Package merging    Recorded when Information Packages (SIP, AIP, DIP) are merged together.    Bernadette Houghton [Deakin]: not sure what this means. When the packages are merged into one file, or into one package?
26 Information Package splitting    Recorded when Information Packages (SIP, AIP, DIP) are split  apart.  

Bernadette Houghton [Deakin]: not sure what this means. When the packages are split from one file, or from one package?


Evelyn McLellan [Artefactual Systems]: The use case could be for example when a large AIP is broken up into smaller chunks for placement in certain types of storage systems. For example, placing a large AIP into LOCKSS storage can require the AIP to be arbitrarily split into smaller chunks because LOCKSS networks often have filesize limits built into them. Information Package merging (above) could be defined as the process of reversing the effect of Information Package splitting.

27 ingest end core Completion of the total ingest process.     
28 ingest start core An event will be generated when the ingest process is started and the ingest process will be completed when an approval/acceptance event is recorded.
29 ingestion core The process of adding objects to a preservation repository. More detail can be gained by utilising "Ingest Start" and "Ingest End" rather than this one event.   

Angela Di Iorio [Sapienza University Rome]: Are the start and the end of the Ingestion "EVENT" not already captured by the EventDateTime semantic unit?

Maybe a better structural refinement of EventDateTime would cover this need. In the PROV-O mapping, I realized that there is an important difference between PREMIS and PROV-O: PREMIS requires only structural conformance to a datetime format and does not provide a mechanism for distinguishing the starting and ending time of an Event. In PROV-O the distinction is made between activities (startedAtTime, endedAtTime) and instantaneous events, which in PROV-O are classified as generation, usage or invalidation. Because PREMIS is domain-specific, associating this distinction with the Event type vocabulary would refine the information. Maybe, similarly to the Rights statement, semantic units for startDateTime and endDateTime should be provided for Events that need time to be completed, and a semantic unit (i.e. atTime) for capturing the time of instantaneous events.
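
The distinction drawn in this comment, between events with a duration and instantaneous events, might be modelled roughly as below. This is a sketch under stated assumptions: `TimedEvent` and its field names are hypothetical, loosely echoing the startDateTime/endDateTime/atTime units proposed above, and none of them are existing PREMIS semantic units.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TimedEvent:
    """Hypothetical event record with PROV-O-style timing fields."""

    event_type: str
    at_time: Optional[datetime] = None     # instantaneous events
    started_at: Optional[datetime] = None  # events that take time to complete
    ended_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        instantaneous = self.at_time is not None
        has_interval = self.started_at is not None or self.ended_at is not None
        # Exactly one of the two timing styles must be used.
        if instantaneous == has_interval:
            raise ValueError("give either at_time or an interval, not both or neither")
        if has_interval:
            if self.started_at is None or self.ended_at is None:
                raise ValueError("an interval needs both started_at and ended_at")
            if self.ended_at < self.started_at:
                raise ValueError("ended_at must not precede started_at")

    @property
    def duration_seconds(self) -> float:
        """Zero for instantaneous events, elapsed seconds otherwise."""
        if self.at_time is not None:
            return 0.0
        return (self.ended_at - self.started_at).total_seconds()


ingest = TimedEvent(
    "ingestion",
    started_at=datetime(2016, 2, 1, 9, 0, tzinfo=timezone.utc),
    ended_at=datetime(2016, 2, 1, 9, 30, tzinfo=timezone.utc),
)
```

Under such a model a single "ingestion" event could carry both timestamps, which is the alternative to separate "ingest start" and "ingest end" events that the comment raises.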

30 message digest calculation core The process by which a message digest ("hash") is created.  
31 metadata extraction (propertyExtraction) core Extraction of technical (or non-technical) metadata such as resolution, colour depth, etc. from a file using tools such as JHOVE.
32 metadata modification   Changes to the metadata about an object. Recorded when a package or file has been modified, added or deleted    
33 migration core A transformation of an object creating a representation in a more contemporary format.

Libor Coufal [National Library of Australia]: The use of the term "migration" (as it is used here) in the community is inconsistent with OAIS (which uses "transformation") but that's for a completely different discussion... Anyway, I am not sure that "more contemporary" is a good characteristic of a target format nor that it is the best way to define migration. I would say that the definition of normalization would probably be better suited here (although not perfect), or just "in a different format" as it may not always be the case that the format is more contemporary.


Bernadette Houghton [Deakin]: Agree with Libor that "more contemporary format" is not the best description. Maybe "useable" or "preservation-friendly"?


Lina Bountouri [EU Publications Office]: I believe that we must either be more neutral in the adjective we are using, such as saying "a new format" or use the "preservation-friendly".  

34 modification   Changes to the metadata about an object and/or the act of changing a file or bitstream after receipt of the object, but before the object is ingested into the repository.   Lina Bountouri [EU Publications Office]: In the Publications Office, we might ingest METS packages that will update (modify) the metadata and/or the object but there is no definition of which of the two or if this action is for both of them.  
35 normalization core A transformation of an object creating a representation in a supported preservation format. Libor Coufal [National Library of Australia]: I don't think that the definition captures the distinction with migration - isn't the purpose of migration also to create a representation in a supported preservation format? Normalization (as I understand it) means that all objects in the repository are transformed into a limited number of selected formats - often/usually at or before ingest. E.g. all word-processing documents and "plain" PDFs are migrated into PDF/A.
36 object modification   The act of changing a file or bitstream after receipt of the object, but before the object is ingested into the repository.  

Bertram Lyons [AVPreserve]: What terms are used if a repository "changes a file or bitstream" after the object is ingested into the repository? I am unclear about the intention behind setting the "before the object is ingested" qualifier in this definition.

Evelyn McLellan [Artefactual systems]: Agree. Maybe just limit the definition to "The act of changing a file or bitstream".

37 object validation core  Structure and compliance validation of the Object (e.g. an AIP)   Evelyn McLellan [Artefactual systems]: This is a little confusing. Maybe the definition should be "Information Package validation"? 
38 quality review   Recorded when quality review is performed and noted as passed or failed.

Libor Coufal [National Library of Australia]: The description does not define what quality review is...


Evelyn McLellan [Artefactual Systems]: Agree, "quality review" is vague and is used in both the term and its definition. Would it make better sense to use "Information Package validation" instead, and define it as "The process of verifying whether an Information Package conforms to pre-determined specifications" or something similar?

39 quarantine   Segregate objects for a designated period of time (e.g. before running a virus check)
40 recovery   The act of regaining one or more files after a disaster. Usually occurs as part of a disaster recovery process.     
41 redaction   The process of modifying the content of a digital object to remove or mask information considered to be sensitive in nature (that is, the information cannot be viewed by non-authorized users of the repository).

The process of eliminating potentially private and sensitive data from a disk image or copy thereof.
42 replication    The process of creating a copy of an object that is, bit-wise, identical to the original.  
43 SIP creation  core Creation of the SIP   Evelyn McLellan [Artefactual Systems]: This is quite vague. What is the act that constitutes the creation of a SIP? Putting digital objects and metadata into a container of some kind? Moving them to a certain location, or maybe adding some kind of content or packaging information?
44 storage migration   A change to an object’s storage location    
45 transmission    The process of transmitting to a repository metadata and/or digital object(s). Transmission usually comes before ingestion.    
46 unpacking   Extracting objects from packages (e.g. .zip, .tar)  

Bertram Lyons [AVPreserve]: How is this semantically different from the current "decompression" term - which seems to assume the same action as this "unpacking" term? 

Evelyn McLellan [Artefactual systems]: Unpacking does not necessarily involve decompression. Some types of packages are not compressed, so the action is simply to extract the contents from the package.

Bertram Lyons [AVPreserve]: Right, but in some cases .zip and .tar are actually compressing and decompressing (not just unpacking). I'm wondering if we need clearer events in general, such as lossless compression/decompression; lossy compression/decompression; packaging/unpackaging (no compression is assumed here).

Evelyn McLellan [Artefactual systems]: Whether compression is lossy or lossless could go into eventDetail or eventOutcomeDetailNote or similar.

Evelyn McLellan [Artefactual systems]: If we add unpacking as an Event, can we also add packing (or unpackaging and packaging)? I can see a packing/packaging Event being used when an AIP is packaged into a bag, for example.
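
The difference discussed in this thread between packaging and compression can be seen with Python's standard `tarfile` module, which writes either a plain tar (packaging only) or a gzip-compressed tar (packaging plus compression). The sketch below is illustrative; the `build_archive` and `unpack` helpers and the `object.bin` member name are hypothetical.

```python
import io
import tarfile


def build_archive(payload: bytes, mode: str) -> bytes:
    """Pack one in-memory file into a tar archive ("w" = plain, "w:gz" = gzip)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=mode) as archive:
        info = tarfile.TarInfo(name="object.bin")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def unpack(archive_bytes: bytes) -> bytes:
    """Extract the single member; "r:*" auto-detects any compression."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as archive:
        member = archive.extractfile("object.bin")
        return member.read()


payload = b"A" * 100_000
plain = build_archive(payload, "w")       # packaging only, no compression
gzipped = build_archive(payload, "w:gz")  # packaging plus compression
```

Here the plain archive is slightly larger than the payload (tar adds headers and padding), the gzipped one is far smaller, and both unpack to identical bytes. Extracting from the plain archive is "unpacking" with no decompression involved, whereas the gzipped one involves both actions.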


47 unquarantine   Release of a file from quarantine    
48 virus check core The process of scanning a file for malicious programs.  
49 wellformedness check   Checking if a file is wellformed. Often validation checks already include wellformedness checks. In any case the result of those combined checks is recorded in two separate events.    
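
As an illustration of how terms 19 ("fixity check") and 30 ("message digest calculation") relate, the sketch below computes a digest and later compares a fresh digest against the recorded one. It is a minimal example rather than a prescribed implementation: the helper names, the dict layout, and the "pass"/"fail" outcome values are assumptions.

```python
import hashlib


def message_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex message digest ("hash") of the given bytes."""
    return hashlib.new(algorithm, data).hexdigest()


def fixity_check(data: bytes, recorded_digest: str, algorithm: str = "sha256") -> dict:
    """Compare a fresh digest with the recorded one and describe the event."""
    unchanged = message_digest(data, algorithm) == recorded_digest.lower()
    return {
        "eventType": "fixity check",
        "eventDetail": f"algorithm={algorithm}",
        # Hypothetical outcome values; the draft does not prescribe them.
        "eventOutcome": "pass" if unchanged else "fail",
    }


original = b"example object"
stored = message_digest(original)                # message digest calculation
intact = fixity_check(original, stored)          # object unchanged
altered = fixity_check(original + b"!", stored)  # object changed
```

The first event produces the value that later fixity checks depend on, so both kinds of event would normally be recorded against the same object.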



Core Events Discussion

The Committee has introduced the notion of core preservation events. These are events that may be deemed central to any organisation wanting to preserve digital objects in the long term. The concept has not been applied to a Library of Congress preservation vocabulary before and will hopefully generate community discussion; for example, if an event is core, does the organisation have to record it for every object, or should it mean that an organisation is capable of recording such an event if it is applicable?



Peter McKinney [Institution]:  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. 


Bertram Lyons [AVPreserve]: My first comment about core events is that the idea of PREMIS is to scale across many digital repository types (not just archive or library repositories) - any core designations would have to be generic enough to meet any repository use cases in the context of ISO 14721 and ISO 16363. I think a normative approach is less valuable than a recommended approach. Ideally, any organization documenting preservation events would want to be capable of documenting ANY event that was necessary to take on the objects under the care of that repository. My guess is that if the concept makes it into this controlled vocabulary then it is core by default -- something someone should document if needed.