I get asked regularly whether I am digitizing everything in my archives’ holdings. There is an expectation that everything is digitized and accessible without having to physically enter an archives. But digitization is just the beginning. Sure, we could scan everything and throw it online, but without metadata, social tags, and enhanced description, these digitized records would sink into digital obscurity. If our goal is to truly make digitized content accessible and usable, then we need to find new ways of describing the records and transcribing their content. But as always, it comes down to resources. How are archives supposed to accomplish the noble goal of providing meaningful and discoverable digital collections when they are so often staffed by only a handful of people? More and more archives are turning to their patrons, supporters, and the general public to outsource parts of this descriptive work. They have jumped on the crowdsourcing bandwagon by enlisting ‘citizen historians’ who will hopefully have the time and the inclination to transcribe massive amounts of digitized archival content.
Why use one person when you could use thousands?
Crowdsourcing archival transcription has become popular with some of the largest archival institutions in North America. Library and Archives Canada has recently tested the waters with a small crowdsourced transcription project related to the Coltman Report from 1818. On the grander side of things, the Smithsonian Transcription Center boasts an impressive 6,552 volunteers who have transcribed more than 180,000 pages of historical material. Their purpose is to make their “collections more accessible and useful to curators, researchers, and anyone with a curious spirit.” I love this notion of a “curious spirit.” It is what seems to make these types of projects successful. I also think there is a desire to contribute and to make your mark on history.
In some cases, crowdsourcing transcription may even represent an act of collective and personal commemoration. I think one of the most interesting examples of crowdsourcing transcription is Operation War Diary, launched in 2014. The project is supported by Zooniverse, a citizen science portal self-described as “the largest online platform for collaborative volunteer research.” The goal of the project is to make the information contained in 1.5 million pages of War Diaries from the First World War accessible to researchers, academics, and family historians. The platform is user-friendly and can be learned in a few minutes with the online tutorial. I would love to see Library and Archives Canada do something similar with the War Diaries of the Canadian Expeditionary Force, which were digitized some years ago.
How is the data being used?
I have been following this project for some time, and it is exciting to see that academics are starting to analyze the collected data and have been able to make some initial observations about the daily life of soldiers during the First World War. Richard S. Grayson (Goldsmiths, University of London) has written a paper about Operation War Diary and crowdsourcing methods in the hope of challenging the popular perception that the war was fought almost entirely in the trenches. He explains that the reason historians “have not systematically analysed the day-to-day lives of soldiers is that to obtain a meaningful figure for a range of units one would have to carry out a very laborious analysis over a wide range of unit diaries with the purpose of examining daily activity.” A crowdsourcing project of this kind can offer exactly the type of data that is required to perform this analysis.
Using the data compiled by Operation War Diary’s host of citizen historians over the past two years, Grayson was able to begin to identify general patterns. He posits that even with the small proportion of data gathered thus far, this type of work “can show differences over time for different types of unit, and it can show differences between such units.” There is obviously much more tagging and analysis to be done before historians are able to draw any significant conclusions, but the fact remains that in a world where funding for humanities and social sciences is low at best, the use of technology to engage masses of volunteers may well provide the resources necessary to accumulate the large data sets that are so essential for meaningful historical analysis.
Is crowdsourcing useful on a smaller scale?
Operation War Diary is a project with the financial backing of several large-scale academic and archival institutions. Not many archives have the resources required to embark on such an ambitious project, but I was excited to see that closer to home, some smaller archives are choosing to explore this technique. At this year’s annual conference of the Association of Canadian Archivists, Bridget Whittle from McMaster University reported on the Postcard Project – an effort to gather basic descriptive metadata for over 7000 images. In her presentation at the conference, Bridget cautioned that crowdsourcing should not be seen as a replacement for archival work. Her impression is that it takes at least as much time, if not more, to facilitate a crowdsourcing project. However, she explained that one significant benefit of enlisting others to do the descriptive work was that the participants often went above and beyond the basic information that was asked of them and provided additional information such as links to blogs, Wikipedia pages, and additional related images.
I asked Bridget to offer some additional perspective on the project and provide some advice for others considering crowdsourcing as an approach for archival description.
Can you describe the problem you were trying to solve with this project?
Bridget: We had 4000 postcards that were rarely used because no one could be bothered to look through fifteen boxes to find one card. The cards had been scanned, but there was no descriptive information for them, and no time for staff to do it. Without this information the scans were even less accessible than the physical cards.
What was the most rewarding aspect of the project?
Bridget: The most rewarding aspect was the pleasure people took from hunting up extra information for the cards and just how pleased people were to be asked. (I’m hoping to say that the second most rewarding aspect is having them online and widely used, but we’re not there yet.)
What was the most challenging aspect of the project?
Bridget: The most challenging part of the project was probably the time commitment between advertising and responding to questions. Despite including an FAQ, I found I was responding to the same questions over and over again.
Can you offer any advice to archivists looking to undertake similar projects?
Bridget: Choose projects carefully and take the time to plan them out. Crowdsourcing isn’t a time-saving measure; your time just goes into different areas. Good planning and testing will minimize the extra time staff have to spend on the project.
If you can, do a test run with a number of people. It will help work out some of the kinks. If you have ten people test it and one person has a question, it probably means you will have that question about fifty times. That doesn’t mean you should change it necessarily, but it will highlight the areas you will want to look at before unleashing it on the world.
Will archivists continue to exploit crowdsourcing as a descriptive tool?
We are not yet in a place where computers can provide consistent interpretation of scanned, handwritten text, so we still need to rely on humans to transcribe content from archival records. In the case of the McMaster Postcard Project, Bridget felt that aside from the obvious descriptive benefits, crowdsourcing presented an excellent opportunity for outreach that served to increase goodwill within the community. Her conclusion was that crowdsourcing is not a replacement for the descriptive work of an archivist, but it can be a very useful tool if applied within set parameters.
As I see it, crowdsourcing provides a way to include people in the archival process and it has the potential to have significant positive effects, if deployed in the right way. I am interested to see whether this will be a sustainable model for our profession or whether the will to volunteer will fade in proportion with the number of crowdsourcing projects that are out there. For now, it seems that the desire to contribute is real and archives just have to find ways of engaging the right people who will be committed to the project.
I will end with two crowdsourcing projects I think are really exciting:
New York Public Library’s What’s on the Menu
Tate Modern’s AnnoTate