Understanding the technical infrastructure needed to support Art Maps has been an interesting journey. While the core concept of relating art to place has not changed, our initial thoughts around the ‘easy’ problem of crowdsourcing location rapidly led to discussion about: 1) what we mean by location for an artwork; and 2) the more general problem of crowdsourcing people’s very different interpretations of the artworks.
First, to give a little context, consider the following map. Here, the Tate collection has been ‘mapped’ courtesy of a Google Fusion Table. You can view the interactive version of the map on my Horizon blog.
Unsurprisingly for a UK collection, the UK and Europe feature heavily, but we’ve found a range of artworks from across the world – and yes, that is an artwork from McMurdo Sound in Antarctica.
At the start of the project, Tate had already tagged many artworks with a location recorded as longitude and latitude. These had been obtained by looking up, on the GeoNames web service, the place names that curators had associated with each artwork. But if all you have is ‘India’ you get the co-ordinates N 20° 0’ 0” E 77° 0’ 0” – a very specific, yet totally inaccurate, location for most of the works in India!
With this in mind, our first goal was to refine these locations, and we set out to help the public ‘crowdsource’ the information.
It was only minutes into the first meeting before the discussion moved from the comparatively simple ‘do we mean where the artist sat’ or ‘the location of the subject’, to ‘where has it been displayed’ or ‘where the artist lived at the time’ and so on. Technically we could imagine providing for all of these as options. Through simple processing it would be possible to understand the relative frequency of competing suggested locations (and dates) and eventually provide enough evidence to update the curated content.
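As a rough illustration of the ‘simple processing’ mentioned above, here is a minimal Python sketch that tallies competing suggestions for a single artwork, grouped by the kind of location being suggested. The suggestion data, the category names (`subject`, `artist_position`, `displayed`) and the function name are all made up for the example; they are not taken from the Art Maps system.

```python
from collections import Counter

# Hypothetical crowdsourced suggestions for one artwork, as (kind, place) pairs.
# Both the kinds and the places are invented for illustration.
suggestions = [
    ("subject", "Agra"),
    ("subject", "Agra"),
    ("subject", "Delhi"),
    ("artist_position", "Agra"),
    ("displayed", "London"),
]


def relative_frequencies(pairs):
    """Return {kind: [(place, share), ...]}, most frequent place first."""
    by_kind = {}
    for kind, place in pairs:
        by_kind.setdefault(kind, Counter())[place] += 1

    result = {}
    for kind, counts in by_kind.items():
        total = sum(counts.values())
        result[kind] = [(place, n / total) for place, n in counts.most_common()]
    return result


frequencies = relative_frequencies(suggestions)
for kind, ranked in frequencies.items():
    print(kind, ranked)
```

A curator could then decide how large a share, and how many contributions, count as ‘enough evidence’ before the curated record is updated; that threshold is a policy choice rather than something the code can settle.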
But then we came to the more challenging issues of location: ‘where do you think of when you see this Kandinsky’…
(For those who answer Russia, you know your art! In which case next time I’ll pick something even more abstract!)
This inevitably led the project to consider the more general problem of crowdsourcing personal interpretations from the public, moving us from information that is comparatively straightforward to process to a much more complex problem.
There are many aspects of the information you might want to know about an artwork that can be defined a priori and considered as a list of facts, e.g. date, title, artist, location, etc - albeit that some of the data, like location in this case, may in fact not be that accurate! In database terminology we would call this set of facts the schema; the semantic web community within information sciences would view this more generally as an ontology.
It is interesting to note a definition of ontology widely used in this context, which builds on [Gruber 1993]: “a formal, explicit specification of a shared conceptualisation”. Herein lies the problem - what if you and I do not have such a shared conceptualisation?
Compare this with the Arts and Humanities view that divergence of conceptualisation is to be encouraged, embraced and discussed. Hence, the core challenge is a fundamental one – if your interpretation involves an expression of your own conceptualisation, how can I have built a technology that knows about it in advance? Ouch!
Maybe in the future, artificial intelligence will be able to process this information on our behalf. However, while noting the many great advances in the field (and it did start back with Turing…) we are not there yet, so some pragmatism is needed.
Our design is in principle quite simple and uses what we have referred to as ‘semi-structured’ blogs. We start with curated content about which we seek information, with participants using a blog to make their contributions. After navigating around the content, a contributor who wishes to add to the discussion is taken to their blog, where a post has been partly constructed from a template supplied by the content curator, containing links to the original content (indeed any metadata desired can be contained in the template). The template also contains fields for contributors to add structured information (like location) according to the defined schema. They are then free to comment further and provide images, audio, and text, as they would in a normal blog post, to support the construction of their interpretation.
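To make the idea of a partly constructed post a little more concrete, here is a small Python sketch of what a curator-supplied template and a contributor’s draft might look like. The field names (`Artwork-ID`, `Location-Type` and so on), the plain-text layout and the `example.org` link are assumptions chosen for illustration; the actual Art Maps templates may look quite different.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical curator-supplied template: structured header fields,
# a blank line, then the contributor's free commentary.
POST_TEMPLATE = Template(
    "About: $artwork_url\n"
    "Artwork-ID: $artwork_id\n"
    "Location-Type: $location_type\n"
    "Latitude: $latitude\n"
    "Longitude: $longitude\n"
    "\n"
    "$commentary\n"
)


@dataclass
class Contribution:
    # Pre-filled by the curator's site when the draft post is created.
    artwork_id: str
    artwork_url: str
    # Filled in by the contributor; left empty if they have no view.
    location_type: str = ""
    latitude: str = ""
    longitude: str = ""
    commentary: str = ""

    def render(self) -> str:
        """Fill the template with this contribution's fields."""
        return POST_TEMPLATE.substitute(vars(self))


# The contributor starts from a draft that already identifies the artwork...
draft = Contribution(
    artwork_id="A0001",
    artwork_url="https://example.org/artworks/A0001",
)
# ...adds the structured fields, then writes whatever they like.
draft.location_type = "subject"
draft.latitude, draft.longitude = "20.0", "77.0"
draft.commentary = "This reminds me of a journey I once made."

post = draft.render()
print(post)
```

The point of the split is that the header can follow the curator’s schema exactly, while the commentary below it stays as open as any ordinary blog post.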
Once posted, a blog-ping is sent to the content curator’s website, which can then process the blog content. Importantly, not only do we know what is being talked about, but the structured content is also amenable to automated processing. Only after this do we invoke processing of the unstructured content. Such processing can be based on well-established statistical techniques, such as machine learning – now an everyday technology used in internet search, grammar checkers and computer language translation. Whatever that later processing yields, the structured information within the blog post at least allows us to know, unambiguously, the artefact under discussion.
Part of the technical mission has been to build a reusable platform; for example, since each blog post is ‘self-describing’, a contributor only needs one blog to comment on many different collections using different templates, with each content provider being able to identify postings about their own content from their templates. Our next steps are to enable any content creator capable of creating a spreadsheet to populate their own service with their own content and engage their audience in new and interesting interpretations.
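Here is a hedged sketch of the first, structured stage of that processing, in Python. It assumes the curator’s site has already retrieved the text of the post after receiving the ping (that step is omitted), and that the post uses a simple ‘Key: value’ header followed by a blank line and free text. That layout, the field names and both function names are assumptions made for this example only.

```python
def parse_post(text):
    """Split a post into (structured fields, free-text body).

    Assumes 'Key: value' header lines, a blank line, then the body.
    """
    header, _, body = text.partition("\n\n")
    fields = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore header lines that have no 'Key: value' shape
            fields[key.strip().lower()] = value.strip()
    return fields, body.strip()


def handle_ping(text, known_artworks):
    """Return a record for posts about our own content, else None."""
    fields, body = parse_post(text)
    artwork_id = fields.get("artwork-id")
    if artwork_id not in known_artworks:
        return None  # the post is about someone else's collection

    record = {"artwork_id": artwork_id, "commentary": body}
    try:
        record["location"] = (
            float(fields["latitude"]),
            float(fields["longitude"]),
        )
    except (KeyError, ValueError):
        record["location"] = None  # no usable location was supplied
    return record


sample = (
    "Artwork-ID: A0001\n"
    "Latitude: 20.0\n"
    "Longitude: 77.0\n"
    "\n"
    "This reminds me of a journey I once made."
)
record = handle_ping(sample, known_artworks={"A0001"})
print(record)
```

Only the `commentary` text would then be handed on to the slower, statistical processing; the artwork identifier and location are already usable as they stand.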
[Gruber 1993] T. R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2):199-220, 1993.