Culture-mining and the Search for Meaning
Prototype for an online audio/video (re)Search Tool

Research and Development 2005–7: Julia Kristeva © Tate Photography. Olafur Eliasson The Weather Project © Olafur Eliasson, Photo © 2003 Tate, London. Grayson Perry, Photo © John Napier, courtesy Victoria Miro Gallery, London. Yinka Shonibare, Still from Un Ballo in Maschera (A Masked Ball) © 2004, Commissioned for the Moderna Museet, Stockholm. Produced by Moderna Museet and Sveriges Television. Courtesy Stephen Friedman Gallery, London
Become a beta-tester
We invite participants to use and provide feedback on a prototype interface, which proposes a social tagging system for populating the archive with metadata.
Further information on how to participate
Introduction
Artists, museums and the heritage sector are creating ever-increasing amounts of audio-visual content. One of the biggest issues facing the museum and heritage sector over the next five years will be how to manage that content, provide interfaces to it and distribute it to the public and the research sector. Without effective methods for searching and sorting items, the growing mass of cultural data risks becoming increasingly inaccessible.
Tate is working in collaboration with Goldsmiths College, University of London Department of Computing, to produce an open source application for searching and retrieving audio/video content online. A prototype is currently in development, using sample files from Tate's Online Events Archive.
The application aims to demonstrate the potential for an intuitive search engine that can efficiently retrieve and deliver fragments of audio/video content. The user would be able to search content in a very detailed way, to save and organise search results, and effectively to create a personalised service based around the contents of the archive. Ultimately the project aims to develop a user-centred tool that will allow audiences and academics to quickly search, retrieve and play results drawn from large volumes of long-play content.
The ambition of this research and development project is to identify and expand evolving technical standards associated with multimedia content, in order to build an application that will demonstrate 'intelligent' behaviours that assist the user. This will be coupled with the development of semi-automated management features that streamline the archive's maintenance. These unique technical architectures will integrate with the development of an ontological infrastructure that can effectively capture and describe the broad range of subjects contained within the archive.
The research engages with four main areas:
- users and user requirements
- intelligent and intuitive search engines
- automated description, indexing and management features
- technical standards for online video delivery and playback
Because of data sparseness and the current limitations of automated tagging, the research has led to a proposal for a social tagging system. See the beta test link above to offer your feedback on this feature of the application.
In partnership with the Department of Computing, Goldsmiths College, University of London, and supported by the Arts and Humanities Research Council ICT fund for cultural resource enhancement.
Primary Goals
- To create an online, intelligent and flexible archive retrieval system and interface for Tate's digital video assets (in the first instance focusing on a select portion of Tate's existing Online Events Archive).
- To develop systems that will aid in the management and maintenance of such archives, chiefly by automating the most laborious tasks.
The Online Events Archive currently holds around 500-600 hours of video and is regularly updated with the ongoing documentation of artists' talks, cultural theory lectures and symposia. The large volumes of content and ongoing production of long-play programmes are unique attributes of this type of archive. In the year 2004-5, the Tate Online Events Archive and associated content received over 21,000 unique visitors, collectively watching up to 280,000 archives. However, the use of the archive as a research resource is limited by its poor indexing and by the fact that the recordings are only available as long-play files, whose descriptions give no indication of where within a streamed clip to find a particular subject.
The prototype system will be able to extract from long recordings only those sections (fragments) that are relevant to a user's current interest. Furthermore, the system will be able to personally assist users, directing and guiding the search/research activity with 'intelligent' suggestions. Ongoing development aims to include playback features that provide the ability to move forwards and backwards, so that users are able to glean the context of a retrieved video fragment. In this way users will be able to search large volumes of long-play content and quickly find and collate the archives most relevant to their query.
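As a rough illustration of fragment retrieval, the sketch below assumes each long-play recording has already been divided into timestamped segments with some associated text. The `Segment` schema, the `find_fragments` function, the `context` parameter and the sample data are all hypothetical and are not taken from the prototype; they only show how a query could return time spans, widened slightly for context, rather than whole files.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped stretch of a long-play recording (hypothetical schema)."""
    recording: str
    start: float  # seconds from the start of the recording
    end: float
    text: str     # transcript or description of what is said


def find_fragments(segments, query, context=0.0):
    """Return (recording, start, end) spans whose text contains every query word.

    `context` widens each span by that many seconds on both sides so the
    viewer can see what led up to, and followed, the matching passage.
    The widened end is not clamped to the recording length in this sketch.
    """
    terms = set(re.findall(r"\w+", query.lower()))
    hits = []
    for seg in segments:
        words = set(re.findall(r"\w+", seg.text.lower()))
        if terms and terms <= words:
            hits.append((seg.recording,
                         max(0.0, seg.start - context),
                         seg.end + context))
    return hits


# Illustrative data only; not real archive content.
SEGMENTS = [
    Segment("talk-a", 0.0, 90.0, "Welcome and introductions"),
    Segment("talk-a", 90.0, 240.0,
            "The artist discusses light and weather in the installation"),
    Segment("talk-b", 600.0, 780.0, "Ceramics, identity and the role of craft"),
]

print(find_fragments(SEGMENTS, "weather light", context=30.0))
```

With this sample data the query matches only the second segment of `talk-a`, and the returned span starts 30 seconds earlier and ends 30 seconds later than the segment itself.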
Infrastructures and sustainable workflow methodologies that assist with the hosting, management, interfacing and distribution of these cultural heritage documents are a relevant concern for the museum sector. The maintenance of the archive is problematic: maintaining by hand an index of the archive as it is currently being developed is not feasible. The usefulness of the archive is therefore undermined by the difficulty of navigation at one end and the difficulty of management at the other.
The primary goals of this prototype project are to tackle both of these difficulties, making a section of the Tate Online Events Archive into a powerful resource for researchers. This prototype will serve as a paradigm for later development in which the rest of Tate's digital video assets can be turned into a searchable, well-indexed online media library.
The ambitious long-term goal of this research is to evolve systems that will semi-automatically fragment and tag multimedia material both at the time of production and at the time of delivery. Automated features will be developed through the use of speech-to-text software, providing transcripts from which indexing can be automatically drawn and applied. Systems of social tagging will also be employed, so that the application is able to learn from users and improve its understanding and indexing of the content it contains. The system will be extendable and reusable for other collections.
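One way to picture how social tagging might feed the index is sketched below. It is an assumption for illustration, not the project's design: the `TagStore` class, its `min_users` threshold and the sample identifiers are hypothetical. A tag is accepted for a fragment only once enough distinct users have proposed it, which is one simple guard against noise when data is sparse.

```python
from collections import defaultdict


class TagStore:
    """Collects user-proposed tags per fragment (hypothetical design)."""

    def __init__(self, min_users=2):
        # Number of distinct users who must agree before a tag is accepted.
        self.min_users = min_users
        # fragment id -> normalised tag -> set of users who proposed it
        self._votes = defaultdict(lambda: defaultdict(set))

    def add_tag(self, fragment_id, tag, user):
        """Record that `user` proposed `tag`; repeats by one user count once."""
        self._votes[fragment_id][tag.strip().lower()].add(user)

    def accepted_tags(self, fragment_id):
        """Return, in alphabetical order, tags backed by enough distinct users."""
        tags = self._votes.get(fragment_id, {})
        return sorted(t for t, users in tags.items()
                      if len(users) >= self.min_users)


# Illustrative data only.
store = TagStore(min_users=2)
store.add_tag("frag-1", "Abjection", "user-a")
store.add_tag("frag-1", "abjection ", "user-b")
store.add_tag("frag-1", "psychoanalysis", "user-a")
store.add_tag("frag-1", "psychoanalysis", "user-a")  # same user, counted once

print(store.accepted_tags("frag-1"))
```

Tags drawn automatically from transcripts could be seeded into the same store under a reserved user name, so that automatic and social tags pass through one acceptance rule; that, too, is only a suggestion.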
Ambition and Outputs
- An intelligent system that allows easy access to fragments from Tate's digital video recordings;
- A system that can semi-automatically fragment and tag audio-rich streaming media;
- An ontology (a specialist vocabulary and logic) for representing, in a computer-understandable format, the content of recorded talks and symposia mainly focused on modern and contemporary art;
- A search engine that can reason with ontological descriptions in order to find relevant fragments;
- Intelligent software components that can summarise and make suggestions in order to help researchers; these components are aware of the preferences defined by individual users and are collectively denoted an 'intelligent personal assistant';
- A user interface for the system, deployable in web browsers;
- A methodology for tagging the material, applicable to any audio-rich streaming media.
Context
Contextual factors that have defined the project include:
- The nature and characteristics of the content
- The existing and potential audiences for this archival resource
- Comparable interfaces
- Current technical developments in computer science relevant to the project's aims
Content
The Tate Online Events Archive is a growing resource focussed on modern and contemporary art and culture, historically consisting of a selection of Tate's Adult Education Programmes that have been webcast live, and are available afterwards online as long-play files.
The archive content consists of talks and presentations recorded in a video format. Ultimately the material defines itself on the basis of speech, text and ideas. Visual elements are represented (slides, moving image, laptop presentations) and edited live along with a three-camera mix including titles and name credits. Talks of one to two hours are available as long-play files. Currently, symposia of between four and twenty hours in duration are manually cut up into individual speakers' presentations of fifteen to forty-five minutes each.
The content covers a wide range of overlapping topics including, for example, fine art history (modern and contemporary), cultural theory, visual culture, social science and design, as well as fine art, new media, performance, film and curatorial practice.
The volume and diversity of content present particular issues that need to be addressed in the proposed system's management and maintenance features.
Audience
Audience trends and user profiles were developed based on server statistics, an online audience survey, website and mailing list feedback, as well as in-depth surveys on the use of the archive conducted with small groups of curators and post-graduate students. The surveys determined how the archive would be used and what language would be used for searches. They also estimated the time that different user groups would spend using the tool and suggested additional features that users wanted from the interface.
Four main user groups emerged: academics/researchers, artists/practitioners, peers from the sector and general audiences. Research suggested that these four user groups ultimately fell into two main categories: general users with limited prior knowledge of the subjects contained in the archive, and researchers or domain experts with a high level of prior knowledge of those subjects.
Interface
Initial investigations involved benchmarking a range of cultural interfaces that make use of search functionality and/or video distribution over the internet. By examining and analysing these comparable interfaces it was concluded that desirable features of a user interface may include:
An interface for non-registered general users:
- A scalable browser-like navigation system
- A customizable 'welcome' / 'what's new' / 'what's popular' / 'accessibility options' interface
- A simple browse interface
- A search interface with high level categorizations listed and a keyword field
- A search results listing
- A playback interface
- A help / FAQ interface
An expanded interface for registered researchers and domain experts:
- A personalization interface that includes bandwidth, geographical, general interest and keyword details associated with individual user requirements
- A 'shopping basket' or personal folder interface, where search results can be saved, organized and returned to by individual users
- A search interface with high level categorizations listed and a keyword field; as well as a relevance field which enables a user to expand results based on keyword groupings and broader category hierarchies
- A personal assistant, adjunct to the search results interface, which, if selected, provides users with further detail on related or associated events and subjects
- A printable summary of results selected and saved by registered users, with clear copyright information, in order to assist researchers in referencing material effectively
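The relevance field described in the list above, which expands results through keyword groupings and broader category hierarchies, could work along the following lines. This is a minimal sketch under stated assumptions: the `BROADER` mapping, the `expand_term` and `search` functions, the `levels` parameter and the sample index are all hypothetical and are not drawn from the project's ontology.

```python
# Hypothetical hierarchy: each category maps to its broader parent.
BROADER = {
    "video art": "new media",
    "new media": "fine art practice",
    "performance": "fine art practice",
}


def expand_term(term, levels):
    """Return the term followed by up to `levels` progressively broader categories."""
    terms = [term]
    current = term
    for _ in range(levels):
        parent = BROADER.get(current)
        if parent is None:  # reached the top of the hierarchy
            break
        terms.append(parent)
        current = parent
    return terms


def search(index, term, levels=0):
    """Return ids of items tagged with the term or any of its expansions."""
    wanted = set(expand_term(term, levels))
    return sorted(item for item, cats in index.items() if wanted & cats)


# Illustrative index only: fragment id -> categories assigned to it.
INDEX = {
    "frag-1": {"video art"},
    "frag-2": {"new media"},
    "frag-3": {"performance", "fine art practice"},
}

print(search(INDEX, "video art"))            # exact category only
print(search(INDEX, "video art", levels=1))  # one level broader
print(search(INDEX, "video art", levels=2))  # two levels broader
```

Raising `levels` plays the part of the relevance field: each step admits fragments filed under a broader category, trading precision for coverage.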
Technical Development
Technical factors that have defined the software prototype development include existing intelligent search technology for searching and retrieving information, automated speech-to-text transcription, current research and development in the area of indexing visual and audio material, and the development of players and playback behaviours for video content.
