Sustaining Consistent Video Presentation

This technical paper addresses approaches to identifying and mitigating risks associated with sustaining the consistent presentation of digital video files. Originating from two multi-partnered research projects – Pericles and Presto4U – the paper was commissioned by Tate Research and is intended for those who are actively engaged with the preservation of digital video.

Presenting digital video consistently depends on the design, coordination and quality of all aspects of both the video file and the video player. Specific factors, such as which features of a codec are supported by the decoder and how one colour space is converted to another, affect how videos are presented. Media players are of course developed over time – new features are added and bugs are resolved – but while such changes may improve the quality of a player they also create scenarios where a digital media file may play differently in a new version of a player than in an older one. As a result, the ever-evolving state of media playback technology creates technical challenges for audio-visual conservators who are tasked with ensuring that digital video is presented consistently and as originally intended.

Approaches

In 2014, during the initial discussions that contributed to the development of this paper, conservators and conservation technicians from the Presto4U Video Art Community of Practice Group discussed use cases and difficulties in sustaining particular renderings of digital video over long periods of time. The strategies that were initially considered to help ensure sustained and consistent presentation allied closely with the traditional preservation treatments of emulation, normalisation and migration, but it was also noted that guidance was needed to determine if these treatments were necessary, and which strategy represented the best approach.

In general, implementations of audio-visual files that adhere properly to their associated standards and use the minimum required level of complexity are easier to control technically and tend to play back more consistently. However, many of the tools available to media creators generate files with unnecessary complexities. In addition, the familiar tools used to create media files are often ill equipped to allow for analysis or assessment of the file formats that they create. As a result, an alternate set of tools is necessary for conservators to evaluate digital media and to identify the reasons for possible playback discrepancies.

Emulation

Sustaining video presentations through emulation requires maintaining a player and all of its dependencies. For instance, if a creator determines that a video is intended for presentation through QuickTime Pro 7, this may mean preserving QuickTime Pro 7 as an application along with its underlying set of components, as well as an operating system that can support QuickTime Pro 7’s underlying 32-bit QTKit framework. Often video players rely on system codec libraries or audio-visual frameworks working behind the scenes, so it can be challenging to determine what exactly is necessary to document emulation sufficiently.

On a Mac it is possible to review which files are being used by an active application with the lsof (list open files) command. For instance, while playing a QuickTime file in QuickTime Pro 7,1 open the Terminal application and run the following command:

lsof | grep QuickTime

The substantial list of files produced will include some which may or may not be related to the media being played but are opened by the QuickTime application. Particularly notable will be the files within QuickTime’s component libraries, such as those within /Library/QuickTime or /System/Library/QuickTime. The component files within these directories are used to decode or demux (demultiplex) specific audio-visual formats. Some applications will install components within these folders during installation, so these additional features will also affect any application using QuickTime. Two computers may both use the same version of QuickTime and the same operating system, but if their component libraries differ the players may work inconsistently.
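
To review which decoding components are installed on a particular Mac (and therefore which might need to be preserved alongside the player), the contents of these directories can simply be listed:

ls /Library/QuickTime /System/Library/QuickTime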

While playing a video in VLC and running the following command:

lsof | grep VLC

the report will show that most of the files used by VLC are from within the VLC application itself, with a much lower reliance on system files. Because VLC is much more self-contained than QuickTime, it is much easier to generate an identical VLC playback environment on a different computer. Additionally, the VLC website provides access to an archive of its installation files going back to the early years of its development. In this way, although the playback of a particular file may change from one version of VLC to another (due to bug fixes, new features, etc.), it is feasible to acquire older versions and emulate a playback scenario from the past. The portability of VLC and the availability of its past versions make it much more suited to emulation strategies than QuickTime Pro 7.
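
As a rough sketch of how self-contained a player is, the lsof output can be filtered to show only the open files that sit outside the application’s own bundle (this assumes VLC is installed in the usual /Applications location):

lsof -c VLC | grep -v '/Applications/VLC.app'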

Normalisation

When following a strategy of normalisation, content could be reformatted to a single format (or possibly to one of a short list of formats) that the conservator is better able to control technically. The disadvantage of normalisation is that as a collection becomes more technically diverse it is more difficult to find normalisation formats that sustain its significant characteristics. Normalising effectively also requires the process to be assessed to identify any possible loss between the original and the resulting normalised copy. If the significant characteristics of the original are manipulated in order to produce a normalised output then playback may be affected. For instance, if an NTSC DV file with 4:1:1 chroma subsampling is normalised by converting it to a 4:2:0 h264 file, the colour resolution will be reduced and diagonal lines will appear more block-like than in the original.

For audio-visual normalisation, lossless codecs may be used to prevent additional compression artefacts from affecting the output, but the lossless codecs selected should support the relevant significant characteristics. Codecs such as jpeg2000 and ffv1 offer a great amount of flexibility, so that the most common pixel formats are supported. However, lossless codecs find much lower levels of player support than lossy ones, and lossless normalised copies may use substantially higher data rates than the original media. For instance, a high-definition h264 file may play back properly, but once transcoded to lossless jpeg2000 most modern computers would have difficulty playing it: the images could be accurate but playback may stutter or lag. A similar problem exists for uncompressed video, where for large modern frame sizes it may be a challenge for a disk to read data fast enough to provide real-time playback.
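
As a minimal sketch of such a lossless normalisation (the filenames are hypothetical and the choice of ffv1 in a Matroska container is only one possible target), FFmpeg can transcode the video stream losslessly while copying the audio unchanged:

ffmpeg -i source.mov -c:v ffv1 -level 3 -c:a copy normalised.mkv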

Normalising can offer a more predictable playback for conservators who gain technical familiarity with the specifications used as normalisation targets. However, the process of normalising is itself dependent on the design of the demuxer, decoder, encoder and muxer utilised in the normalisation process. If the decoder uses the wrong colour primaries or misinterprets the aspect ratio then this misunderstanding may become part of the resulting normalised files. The output of normalisation should be assessed to ensure that the result looks like the source, with the hope that the normalised copy finds more consistent and sustainable playback.

Working within budgetary and technical limitations, conservators may not have complete control over presentation technologies. A work may be intended for an interlaced display but may have to be presented on a progressive monitor because a suitable interlaced monitor is not available. Similarly, a work with a variable frame rate may have to be presented through a display mechanism that only supports a short list of frame rates. For works that do not comply with the technical constraints of the available player, a new access derivative must be made to adhere to those constraints. In this way the design of a normalisation strategy may include the production of standardised exhibition formats linked to the technology used by an institution or particular artists, rather than formats intended purely for archival purposes; it might also be necessary to support a range of different players. Creating such derivatives may be a necessary part of facilitating access or display, but care should be taken to ensure that the significant characteristics are not needlessly manipulated during this process, and are only altered to fit the restrictions of a particular access or display environment.

Migration

In a migration strategy the media may be maintained as it is. The conservator would track and define its technical qualities and determine how to play the file back properly with modern players. With this strategy there is no specific need to sustain a particular player (emulation) or to change the media to achieve consistency (normalisation); instead, a consistent presentation is achieved through the selection or manipulation of the player. Sustaining consistent media presentation through migration requires a more in-depth understanding of the significant characteristics of the work and how they are interpreted (or misinterpreted) by a player that is known to present the video as intended.

For instance, if a video file contains metadata that indicates that a BT.709 colour matrix should be used, but the officially preferred ‘look’ is from a player that presents the colour incorrectly through a BT.601 colour matrix, the discrepancy and target must be identified and documented so that a future presentation could utilise a BT.601 matrix. Another example is where a video file utilises a container that states one aspect ratio whereas the stream states another and the creator of the media is accustomed to a player that prioritises the stream’s aspect ratio for use. This conflict must be well understood by the conservator so that the presentation intended by the media creator may be recreated. In a strategy of migration it is less important to maintain the specific player but more important to maintain the knowledge of how to achieve a particular type of presentation with the video file.
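
As a hedged sketch of how such a documented ‘look’ might be reproduced with a current player, FFmpeg’s ffplay can be forced to convert the video to RGB using a BT.601 matrix regardless of what the file’s metadata declares (the filename is hypothetical):

ffplay -vf scale=in_color_matrix=bt601,format=rgb24 video.mov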

Significant characteristics

When considering the options of normalisation and migration (and to some extent emulation) the identification and documentation of the significant properties of a video file are crucial to maintaining the intended playback and evaluating whether it reproduces successfully. To the greatest extent feasible the significant characteristics of audio-visual media should be sustained throughout conservation and presentation activities and contexts, including within the digitisation of analogue material or the reformatting of existing digital material.

Aspect ratio

The display aspect ratio refers to the ratio of the width of the presented frame to its height. This is not usually determined by the frame size alone. For instance, a video with a 720 x 480 frame size (which contains an encoded image that uses 480 rows of 720 pixels each) may be presented at a wide 16:9 ratio or a narrower 4:3 aspect ratio. When a 720 x 480 frame is presented at 4:3 it may occupy a 640 x 480 area of a computer monitor. Presenting a 720 x 480 frame at exactly 16:9 is not technically feasible in whole pixels because 480 is not divisible by 9, but the image would roughly occupy 853 x 480 pixels on a monitor.

The pixel aspect ratio expresses the ratio of the presentation width of the pixel to the presentation height of the pixel, so a 720 x 480 image may have a pixel aspect ratio of 8:9 (meaning that the pixel is intended for presentation as a thin rectangle rather than a square) and thus have a display aspect ratio of 4:3. The equation goes like this:

( width / height ) * ( pixel-aspect-ratio ) = display-aspect-ratio
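
For example, applying this to the 720 x 480 frame with an 8:9 pixel aspect ratio described above gives the expected 4:3 display aspect ratio:

( 720 / 480 ) * ( 8 / 9 ) = ( 3 / 2 ) * ( 8 / 9 ) = 4 / 3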

Audio-visual files may contain inconsistent aspect ratio information. Some streams store aspect ratio information and some containers do as well, so it is possible for this information to be contradictory. For instance, a DV stream may store an aspect ratio of either 4:3 or 16:9 (only these two are possible). However, this stream may be stored within a container that declares that the DV stream should be presented at 3:2 or 2.35:1, or that the stream should be presented at a ninety-degree counter-clockwise rotation. Usually in the case of such discrepancies the container’s declared information will take precedence, but this precedence may vary between players and from one codec or container to another.

Some containers, such as AVI, do not have a standardised way to document aspect ratio. The same is true for many codecs, such as nearly any form of uncompressed video. As a result some combinations of container and stream, such as uncompressed video in AVI, may not declare any particular aspect ratio at all. In most cases such video will present with a square pixel aspect ratio, so the display aspect ratio will simply be the frame width divided by the height, but in many cases this is wrong. For instance, a videotape digitised to uncompressed video in an AVI container with a frame size of 720 x 576 may have originally been presented at a 4:3 display aspect ratio, but the resulting AVI file will likely present at 720:576, which is equal to 5:4. This will make the image appear slightly narrower than intended, but this can be compensated for by adjusting the player.
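
If FFmpeg’s ffplay is used for presentation, the intended display aspect ratio can be imposed at playback time with the setdar filter, for example (with a hypothetical filename):

ffplay -vf setdar=4/3 digitised_tape.avi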

Newer versions of the specification for the QuickTime container use an ‘aper’ atom to store various aspect ratio instructions for different intentions.2 Here there may be different aspect ratios for ‘Clean Mode’, ‘Production Mode’ and ‘Classic Mode’. These aspect ratios are in addition to aspect ratios already defined within the track header and possibly the aspect ratios declared with the stream. In these cases the aspect ratio of the QuickTime track header should be considered authoritative, although not all players may agree. To inspect QuickTime container architectures use a QuickTime atom parser like Dumpster or Atom Inspector and look for the trackWidth and trackHeight values within the ‘tkhd’ atom (Track Header), or see the ‘tapt’ atom that contains various alternative track aspect ratios.

Frame size

The frame size refers to the width and height of the pixels encoded in the stream, such as 720 x 480 or 1440 x 720. Certain frame sizes are predominant, although nearly any frame size may exist. Sometimes the presence of chroma subsampling will limit which frame sizes are possible, for instance 4:2:0 frames must use even-numbered widths and heights, and 4:2:2 must use a width that is a multiple of 2, and 4:1:1 must use a width that is a multiple of 4.

Care should be taken when the pixel width or height of the display monitor is substantially different from the frame size of the video, since a large amount of video scaling will be required. For instance, if a work of computer video art with a small frame size is intended to be shown on a high-definition monitor, there are several methods for scaling the video from one size to another.3

If a 192 x 108 image is intended to be shown on a 1920 x 1080 monitor then the width and height must each increase by a factor of 10. The total number of pixels in the original frame is 20,736 but the presentation must use 2,073,600; thus 99% of the pixels in the resulting presentation must be artificially created. Typically when video is scaled from a small size to a larger size the newly introduced pixels will be set to values that average the luminance and colour of their neighbours. In many cases this approach will result in the addition of new colours that never existed in the original image. In the case of computer or pixel video art the effect may appear muddy and artificial.

FFmpeg contains several methods for scaling pixel art to a larger size, such as the ‘neighbor’ scaling algorithm (-vf scale=flags=neighbor) or the hqx filter (http://ffmpeg.org/ffmpeg-filters.html#hqx, accessed 27 February 2015). With these methods an image may be increased to a larger frame size but retain the pixel art look of the original smaller image.
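
For instance, a small-frame work could be scaled to a high-definition frame size while preserving its hard pixel edges with a command along these lines (the filenames are hypothetical):

ffmpeg -i pixel_art.mov -vf "scale=1920:1080:flags=neighbor" pixel_art_hd.mov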

Colourspace types (conversion matrices)

Although most computer displays present video pixel data in components of red, green and blue (RGB), most video is stored as YUV.4 Converting from RGB to YUV and from YUV to RGB requires an equation to transform one colourspace into the other. The rationale behind using YUV is that all of the information about luminosity is moved to one channel (Y) and the colour data is moved to the other two (U and V). Since the eye is less sensitive to colour than to luminosity, colour may be sampled at a lower rate without much effect on the viewer.

A presentation challenge for working with YUV video is that there are several different equations available to convert Y, U and V to R, G and B. If YUV data is interpreted using a BT.601 equation it will have different colours than an interpretation using the BT.709 version of the equation. Generally the same equation should be used to convert YUV back to RGB as was used to originally create the YUV from RGB, although occasionally a video creator may consider that the unintentional use of the wrong colour matrix provides the intended look.
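
As an illustration of why the interpretations differ, each standard builds the luma (Y) channel from a different weighting of R, G and B (the U and V channels are scaled colour differences derived from the same weightings):

BT.601:  Y = 0.299 R + 0.587 G + 0.114 B
BT.709:  Y = 0.2126 R + 0.7152 G + 0.0722 B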

In general human eyes are not very sensitive to minor colour differences, so without a side-by-side or attentive comparison the differences between the various colour matrices may be difficult to discern. However, the difference will be easiest to identify within the areas containing the most saturated colour.

Some codecs such as h264, ProRes, or VP8 will contain metadata to declare which matrix should be used for decoding to RGB. However, many streams do not provide this information and often neither do containers.

The QuickTime container stores colour matrix information within the ‘nclc’ section of the ‘colr’ atom. These parameters are used by ColorSync to generate one of the following video colour spaces:

  • HD (Rec. BT.709)
  • SD (SMPTE-C / BT.601)
  • SD (PAL)

In the absence of ‘nclc’ data, QuickTime applications default to SMPTE-C/BT.601 to convert YUV to RGB, which would give incorrect colours if the intention were to use EBU PAL or Rec. 709.

The FFmpeg scale filter supports overriding the colour matrix with the in_color_matrix and out_color_matrix options to change how YUV is interpreted to RGB. This feature can be used during normalisation transcoding to explicitly set a colour matrix when a source file does not define one properly.5
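
A hedged sketch of such a normalisation transcode, which tells the scale filter to treat the source as BT.601 and to produce BT.709 output (the filenames and choice of encoder are hypothetical), might look like:

ffmpeg -i source.mov -vf scale=in_color_matrix=bt601:out_color_matrix=bt709 -c:v libx264 -c:a copy output.mov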

Chroma subsampling

For YUV video the colour data is often sampled at a lower resolution than the brightness information. Thus a 720 x 480 video may encode brightness information at 720 x 480 while the colour is subsampled at 360 x 480 (4:2:2), 360 x 240 (4:2:0) or 180 x 480 (4:1:1). Converting video from one lower-resolution chroma subsampling pattern to another (such as 4:1:1 to 4:2:0) will significantly reduce the quality and accuracy of the colour data.6
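
The chroma subsampling pattern of an existing file can be checked by inspecting its pixel format (for example yuv411p, yuv420p or yuv422p) with ffprobe:

ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt -of default=noprint_wrappers=1 file.mov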

Interlacement

Interlaced video is recorded as two field-based images per frame. On a progressive monitor these two fields are woven together to form one image from two, and interlaced content may appear to have a slightly jagged, combed horizontal look. If the ordering of the fields is altered so that the images appear in the wrong order the effect can be substantial.

One method to test that the ordering of fields in an interlaced video is correct is to play the fields as their own images without joining the two fields into a whole frame. This can be done with the ‘separatefields’ filter in FFmpeg, such as:

ffplay {interlaced_video.mov} -vf separatefields

Material that is improperly interlaced would play back in an especially choppy way with this presentation. FFmpeg has additional filters such as ‘setfield’ and ‘fieldorder’, which may be used to correct improperly interlaced video or video with incorrect interlacement metadata. Additionally, the interlacement patterns of video frames may be detected through FFmpeg’s idet filter. The following command:

ffmpeg -i {interlaced_video.mov} -vf idet -f null -

will provide a summary stating how many frames appear to be interlaced and how many appear to be progressive. Occasionally, FFmpeg’s idet filter may conclude that a video file ‘looks’ interlaced even though the metadata of the stream and container either does not declare this or states an incorrect value.
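
As a hedged sketch, a file whose fields have been stored in the wrong order could be rewritten with the fieldorder filter mentioned above (the filenames are hypothetical, the choice of top-field-first depends on the source material, and in practice a suitable lossless video codec should also be specified since the video is re-encoded):

ffmpeg -i wrong_fields.mov -vf fieldorder=tff -c:a copy corrected_fields.mov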

Within QuickTime the ‘fiel’ atom stores information regarding the arrangement of fields and frames. With this atom the container declares if a video track is intended for progressive or interlaced presentation.

For QuickTime files that do not contain a ‘fiel’ atom the player may obtain interlacement information from the codec (if such information exists) or presume the file is progressive. For interlaced video within a QuickTime container it is recommended to use the ‘fiel’ atom to clarify whether the video is intended for progressive or for a type of interlaced arrangement.

The following is an example of fiel showing progressive video (data gathered via ‘mediainfo --inform="Details;1" file.mov’):

00D0430D  Field/Frame Information (10 bytes)
00D0430D    Header (8 bytes)
00D0430D      Size:   10 (0x0000000A)
00D04311      Name:   fiel
00D04315    fields:   1 (0x01)
00D04316    detail:   0 (0x00)

The following is an example of fiel showing interlaced video:

00000ED3  Field/Frame Information (10 bytes)
00000ED3    Header (8 bytes)
00000ED3      Size:   10 (0x0000000A)
00000ED7      Name:   fiel
00000EDB    fields:   2 (0x02)
00000EDC    detail:   14 (0x0E)

The ‘fields’ value will contain one of the following values:

- 1 or 0x01 = progressive
- 2 or 0x02 = interlaced

If ‘fields’ is set to 1, then ‘detail’ will also be set to 0; otherwise ‘detail’ will be used to indicate one of the following:

- 0 or 0x00 = progressive
- 1 or 0x01 = field with lower address contains topmost line
- 6 or 0x06 = field with higher address contains topmost line
- 9 or 0x09 = field containing line with lowest address is temporally earlier
- 14 or 0x0E = field containing line with lowest address is temporally later

If the ‘fiel’ atom is present it may be edited via Atom Inspector as a sub-atom of the ‘stsd’ Sample Descriptions atom.7

YUV sample range

For 8-bit video a luma or chroma sample may have a value from 0 to 255 (0x00–0xFF in hexadecimal), while 10-bit video uses a range of 0–1023 (for simplicity this description will provide examples based on an 8-bit expression). Video broadcast standards constrain video sample ranges to 16–235 for luma and 16–240 for chroma. Thus for video in broadcast range a value of 16 is black and 235 is white, whereas for video in full range a value of 16 is a dark grey, 235 is a light grey, 255 is white and 0 is black.
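
As a rough illustration (assuming neutral chroma, where U and V both equal 128), a broadcast-range luma value is expanded to a full-range grey level as follows, so that 16 maps to 0 and 235 maps to 255:

( Y - 16 ) * 255 / 219 = full-range grey level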

Some codecs such as h264 can indicate if they are intended for broadcast range or full-range playback. However, most codecs and containers do not have a mechanism to express this specifically.

Full-range video can be identified in MediaInfo by the presence of the ‘Color Range’ value:

Color range                             : Full

or in MediaInfo’s trace report, see the video_full_range_flag:

00308AB1  video_signal_type_present_flag (9 bytes)
00308AB0    video_signal_type_present_flag:  Yes
00308AAD    video_format:                    5 (0x05) - (3 bits) -
00308AB4    video_full_range_flag:           1 (0x01) - (1 bits) - Full

In FFmpeg a full-range video is noted by the use of a ‘yuvj’ prefix for the pixel format:

Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc, bt709), 1280x720 [SAR 1:1 DAR 16:9], 25444 kb/s, 50 fps, 50 tbr, 50k tbn, 100k tbc (default)

The ‘j’ of yuvj stands for jpeg, which typically uses a full range for sample values without broadcast sample range constraints.

Video using full range is often produced by DSLR cameras. Such video is often difficult to present and process correctly and consistently, as many video applications are designed specifically for broadcast range. If a full-range video is interpreted as if it were in broadcast range, detail will be lost in the brightest and darkest areas of the image as those values are clipped, and the contrast will be exaggerated.
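
Where a broadcast-range copy is needed for tools that cannot handle full range, FFmpeg’s scale filter can rescale the sample values during a normalisation transcode; a hedged sketch (with hypothetical filenames and encoder choice) follows. Note that this remaps the sample values themselves and is not losslessly reversible:

ffmpeg -i fullrange.mov -vf scale=in_range=full:out_range=tv -c:v libx264 -c:a copy broadcastrange.mov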

Full range v. broadcast range

The following command will generate a test file, which uses uncompressed 8-bit UYVY encoding and contains 256 frames:

ffmpeg -f lavfi -i color=r=256,geq=lum_expr=N:cb_expr=128:cr_expr=128 -pix_fmt uyvy422 -vtag 2vuy -c:v rawvideo -r 256 -t 1 test256y.mov

The Y value of each frame will be equal to the frame number, so frame 0 will show black and frame 255 will be white.

Playing the sample test file and using a digital colour meter can show what RGB values are created by the decoding. Here one can see that if it is decoded as broadcast range then frames 0–16 all create the same RGB value (0,0,0) and again frames 235–255 all create the same RGB value (255,255,255). Although frames 0–16 and frames 235–255 contain unique YUV values the RGB data they create is indiscernible on an RGB monitor.

Codec specific considerations

h264

The following command will create a very simple one-second h264 video with 256 frames (one for each possible value of an 8-bit luma expression):

ffmpeg -f lavfi -i color=r=256,geq=lum_expr=N:cb_expr=128:cr_expr=128 -pix_fmt yuv420p -c:v libx264 -r 256 -t 1 test256y_broadcast.mp4

Because of the extreme simplicity of the visual image, the resulting h264 stream will be lossless. Each frame will contain only identical pixels where the value of the luma channel is equal to the frame number and chroma channels are set to 128 (mid-point of 8-bit range). Thus frame 42 will contain only samples where Y equals 42, so when displayed on a monitor R, G and B should be 30, 30 and 30 (a dark grey).

When the file is opened in a libavcodec-based player such as VLC, or in QuickTime X, each pixel of each frame will decode identically, as intended. This can be verified with a digital colour meter. QuickTime Pro 7, however, does not decode h264 properly but presents pixels with values that may deviate from the original. The overall effect of watching h264 in QuickTime Pro 7 is that a faint layer of noise lies over the image; this noise is added in all decoding and so affects both playback and transcoding via QuickTime Pro 7. Because of this, using QuickTime Pro 7 to transcode this h264 sample to an uncompressed format would be lossy and the resulting uncompressed file would contain the noise introduced by the QuickTime Pro 7 h264 decoder.

Additionally, h264 supports YUV encodings in both broadcast range and full range. If a full-range h264 file is played through a player that does not properly interpret full-range YUV samples then detail in the whites and blacks will be clipped and the contrast of the image will be distorted.

DV

Within QuickTime Pro 7 there is an option (under ‘Show Movie Properties/Visual Settings’) called ‘High Quality’. When ‘High Quality’ is disabled (which is the default) QuickTime will only decode part of the DV data; presumably this was intended to allow older, less powerful computers to play DV even if it meant doing so improperly. When ‘High Quality’ is unchecked, DV files will play back as a blurry and inaccurate image. In QuickTime Pro 7’s preferences there is an option to ‘Use high-quality video setting when available’; checking this will ensure that the video is played correctly.

NTSC DV uses a 4:1:1 chroma subsampling pattern that reduces colour resolution horizontally but not vertically. Nearly all modern video for the internet uses a 4:2:0 pattern. Both 4:1:1 and 4:2:0 sample colour data at the same overall rate but in incompatible patterns. As a result, when NTSC DV is converted from 4:1:1 to 4:2:0 (such as in transcoding for a web presentation) there will be substantial loss of colour detail: the result will contain a softer image and diagonal lines will appear jagged.

MediaInfo

MediaInfo assesses digital audio-visual media and reports on various technical characteristics.8 MediaInfo will demultiplex (demux) the container format and interpret the contents. In many cases MediaInfo will also analyse portions of the streams contained within the format to gather additional information. The resulting information is then selected and associated with a technical vocabulary managed by MediaInfo. MediaInfo may then deduce additional pieces of information; for instance, by identifying a particular codec fourcc, MediaInfo may deduce and report on a codec name and other associated attributes.

By default MediaInfo will show a fairly concise report. This can be obtained via the MediaInfo command line program with this command (replace file.mov with the filepath of a file to analyse):

mediainfo file.mov

A more detailed report may be obtained with:

mediainfo -f file.mov

The -f here stands for the ‘full’ option. In the full report many metadata values will appear identical, although they are formatted to serve different use cases.

By default MediaInfo uses human-readable labels for metadata terms. For archival or programmatic use of MediaInfo there is a raw language option, which uses internally unique metadata labels. This option may be obtained like this:

mediainfo -f --language=raw file.mov

As an example of these outputs, here is the duration information of the video track with ‘mediainfo file.mov’:

Duration                                 : 1mn 21s
Source duration                          : 1mn 21s

with ‘mediainfo -f file.mov’:

Duration                                 : 81938
Duration                                 : 1mn 21s
Duration                                 : 1mn 21s 938ms
Duration                                 : 1mn 21s
Duration                                 : 00:01:21.938
Duration                                 : 00:01:21;28
Duration                                 : 00:01:21.938 (00:01:21;28)
Source duration                          : 81949
Source duration                          : 1mn 21s
Source duration                          : 1mn 21s 949ms
Source duration                          : 1mn 21s
Source duration                          : 00:01:21.949

and with ‘mediainfo -f --language=raw file.mov’:

Duration                                 : 81938
Duration/String                          : 1mn 21s
Duration/String1                         : 1mn 21s 938ms
Duration/String2                         : 1mn 21s
Duration/String3                         : 00:01:21.938
Duration/String4                         : 00:01:21;28
Duration/String5                         : 00:01:21.938 (00:01:21;28)
Source_Duration                          : 81949
Source_Duration/String                   : 1mn 21s
Source_Duration/String1                  : 1mn 21s 949ms
Source_Duration/String2                  : 1mn 21s
Source_Duration/String3                  : 00:01:21.949

Note that each duration string type expresses the duration in a different manner. ‘Duration’ expresses the time in milliseconds, whereas ‘Duration/String3’ uses HH:MM:SS.mmm (hour, minute, second, millisecond) and ‘Duration/String4’ uses HH:MM:SS;FF (hour, minute, second, frame).

Identification and playback maintenance risks

In the example above the video track provides two sets of durations called ‘Duration’ and ‘Source Duration’. MediaInfo will often start a metadata term with the prefix ‘Source_’ or ‘Original_’ to express a conflict between the container and stream. Here the video stream contains 00:01:21.949 of video but the container presents a duration of 00:01:21.938. In this case, the QuickTime container of the file uses an edit list to only show a portion of the video. Video players that properly interpret QuickTime edit lists will show 00:01:21.938 of video whereas players that do not will play 00:01:21.949.

In MediaInfo’s metadata labelling language, the presence of a particular tag paired with a tag of the same name prefixed by ‘Source_’ or ‘Original_’ documents an expected or unexpected difference between the metadata of the container and the metadata of the stream. In some cases players may vary as to which of the two (container or stream) is used to gather essential technical metadata for playback. When the significant characteristics of a file vary internally, care should be taken to determine which expression of each characteristic is authoritative.
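
Such container/stream conflicts can be surfaced programmatically with MediaInfo’s template output by requesting both fields at once; the following is a small sketch using the raw field names shown above:

mediainfo --Inform="Video;Container duration: %Duration% ms, stream duration: %Source_Duration% ms" file.mov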

Conclusions

Conservation laboratories are filled with tools and long-refined expertise designed for dealing with a variety of artworks and materials. However, for digital video works the tools are only recently being identified and the expertise is at a very early stage of development. It is not difficult to imagine that conservators of the near future may be as adept at identifying and resolving presentation and maintenance issues with digital video as conservators currently are with other types of physical material.

Unfortunately the technological diversity and complexity of codecs, containers and implementations within digital video collections makes it difficult to provide a simple set of guidelines that address presentation inconsistencies. However, familiarisation with open tools such as FFmpeg and MediaInfo can reveal many details about digital video that are often unseen and provide more technical control over such content.

A more active relationship between conservators and the developers of their utilities can also benefit both communities. As part of the research for this article the author submitted several tickets and requests to MediaInfo and VLC to make them more suitable for addressing conservation concerns and issues of consistent presentation. Often a small amount of development (whether performed voluntarily or through sponsorship) can have a great impact on conservation workflow. Additionally, as digital video conservation is still in an early stage of development it is crucial for digital video conservators to work as a community, sharing experiences, seeking advice and guiding the development of necessary expertise.
