Accio Archives: Unleashing BitCurator

I am stepping aside for this one and giving the Things I’m Fonds Of floor over to a guest today.

This post has been provided by John Moran.

Originally from Ireland, John now lives on the unceded territories of the Coast Salish Peoples. He has recently graduated from the MAS program at UBC and is currently working as an archivist at UBC Archives. When he isn’t working, John is reading on the beach, hunting down scenic hikes, and day-dreaming about being able to play a musical instrument really well.

Take it away, John….

For over one hundred and twenty hours in the last ten months, I have been fortunate to have trained under one of the best digital archiving wizards around at the City of Vancouver Archives (CVA). Ok, it’s not quite the Ministry of Magic, and I am far too old and tall to be compared to Harry Potter (although I have met Daniel Radcliffe — that’s for another blog post), but I have been battling something more insurmountable than the Dark Lord: a dragon’s nest of born-digital records. My wand of choice? BitCurator.

A question that I heard over and over while studying was: what do we do with born-digital files on media such as floppy disks, flash drives, DVDs/CDs, hard drives, cell phones, and laptops? The answer was most often an awkward silence; no one had quite worked out the magic spell yet. One story that keeps me up at night came from a concerned colleague who, while volunteering in an archives that will remain nameless, was handed a box full of floppy disks and told to throw them in the garbage. So in the garbage they went. They were probably blank... right?

So what exactly do we do with born-digital media? BitCurator is an answer. With the magic of disk imaging, we can copy what is on born-digital media and use the analytical tools in BitCurator to identify file types, file systems, duplicate records, and some real dark magic stuff like file system metadata (important for provenance). There is even a blindingly powerful spell that looks for personal information on documents that may need to be redacted or removed. Disk imaging on an ongoing basis gives archivists the opportunity to save materials from floppy disks and CD-ROMs that may not last for more than fifty years. It also allows us to conduct appraisal in order to satisfy our burning curiosity to know what is on the media and to allow for the disposition of items that are not worth keeping, thereby saving valuable space.

I gained experience using BitCurator as part of a volunteer position at the City of Vancouver Archives (CVA) where BitCurator has been used to image hundreds of DVDs and CDs from the 2010 Olympics in Vancouver. Little was known about what was on these optical disks so the initial objective of imaging was appraisal. The goal was to generate an inventory. The reality was hour after hour of repetition to gather the images and metadata before you get to the really interesting stuff. So what did the set-up look like? And how do you image an optical disk?

“You need to unplug man”: Using a non-networked computer

Best practice is to have a dedicated, non-networked computer set up, as is the case at CVA. This is highly recommended by BitCurator. The environment used by BitCurator is an Ubuntu-derived Linux operating system, so having a dedicated computer means a faster machine. Depending on the format and size of the images being obtained, this set-up also gives you the best chance of avoiding an overwhelming number of technical issues. Understandably, this may not always be feasible for smaller institutions. If you already have a computer available, you can begin right now by downloading BitCurator (make sure to download the version of BitCurator that is for a dedicated machine).

There is, however, an alternative to having a dedicated computer. BitCurator can be set up as a virtual machine (VM). Put simply, using a VM means running BitCurator’s Linux operating system in a window on your Windows or Mac computer, and being able to switch between the two to operate BitCurator whenever it is needed. While this is very convenient, it does have some drawbacks. Sarah Lake of Concordia University, writing for the BitCurator Project, highlighted lagging and bugs as the most prevalent problems with the VM set-up. That said, a colleague of mine recently used BitCurator on a virtual machine to image a shared drive, and his experience was very positive for that particular project.

Imaging CDs and DVDs – a workaround

Guymager is the tool for imaging in BitCurator (see step-by-step instructions for this and other tools here). However, there is a drawback with Guymager. When imaging data on CDs and DVDs, it is broadly recommended to use ISO images. ISO images are exact copies of CDs and DVDs, and they are also the format in which programs such as BitCurator itself are distributed for download. Looking through Google searches on ISO images, it is common to see many people backing up their own private Harry Potter collections using this format. Of course, this is also so that their DVDs and CDs can be played on modern laptops and iPads that don’t have disk readers anymore! For more in-depth information on the ISO image, check out this link.

The forensic file formats offered when imaging with Guymager include E01 (EnCase Image File Format) and AFF (Advanced Forensic Format). At the City of Vancouver Archives, it was decided that CDs and DVDs should be imaged as ISO images. Instead of using Guymager, the Linux command window was used (see red arrow below) and a single superuser command was written to tell BitCurator to turn the CDs and DVDs into ISO images, which is proper dark magic stuff. This superuser command not only produced the desired format but also greatly sped up the process, because once the command line was written, only the accession number had to be edited for each new CD or DVD being imaged.

The magic spell for creating ISO images in the command window:

sudo dd if=/dev/sr0 of=/media/bcadmin/bcWorking/[FILE NAME].iso

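‘sudo’ in Linux means ‘superuser do’: it runs the command that follows with administrator privileges, which may be needed to read directly from the optical drive.

‘dd’ means copy and convert: it copies data block by block from an input to an output.

‘if’ is the input file. Here it is /dev/sr0, which is typically the first optical drive on a Linux system.

‘of’ is the output file: where you would like the image to end up on your computer and the file name you would like it to have.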
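Since only the accession number changes from disk to disk, the spell above could be wrapped in a small shell function. What follows is a minimal sketch, not the script used at CVA: the function name image_disc and its optional device and output-folder arguments are hypothetical, and the defaults simply reuse the paths from the command above. It also refuses to overwrite an image that already exists, which is an added safety check rather than something described in the workflow.

```shell
#!/bin/sh
# Sketch only: image_disc is a hypothetical helper, not part of BitCurator.
# Usage: image_disc ACCESSION [DEVICE] [OUT_DIR]

image_disc() {
    accession="${1:-}"
    device="${2:-/dev/sr0}"                    # optical drive from the command above
    out_dir="${3:-/media/bcadmin/bcWorking}"   # working folder from the command above

    if [ -z "$accession" ]; then
        echo "usage: image_disc ACCESSION [DEVICE] [OUT_DIR]" >&2
        return 1
    fi

    target="$out_dir/$accession.iso"

    # Never write over an image that has already been captured.
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi

    # Same copy as the one-line command: read the disk, write the ISO file.
    dd if="$device" of="$target"
}
```

Saved in a script and run with sudo where the drive requires it, each new disk would then need only something like image_disc followed by its accession number.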

For each optical disk, a photograph was taken to record any information written on it. The image was made read-only, a file directory of its contents was created, and, finally, an MD5 hash was created of the image. One concerning issue was the number of DVDs and CDs that could not be imaged: the media were not sufficiently robust, even though some of the disks were less than 10 years old. It just goes to show that we can’t afford to wait on this stuff!
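The three steps applied to each image (read-only, file directory, MD5 hash) can also be done from the command window. The post does not record which tools were used at CVA, so this is only a sketch under assumptions: record_image is a hypothetical function name, the .files.txt and .md5 file names are made up for illustration, and the directory listing relies on the isoinfo utility, which may not be installed on every system (the step is skipped if it is missing).

```shell
#!/bin/sh
# Sketch only: record_image is a hypothetical helper, not part of BitCurator.
# Usage: record_image /path/to/image.iso

record_image() {
    image="${1:-}"

    if [ ! -f "$image" ]; then
        echo "no such image: $image" >&2
        return 1
    fi

    # Remove write permission for everyone so the image cannot be
    # changed by accident.
    chmod a-w "$image"

    # Save a listing of the files inside the ISO, if isoinfo is available.
    if command -v isoinfo >/dev/null 2>&1; then
        isoinfo -l -i "$image" > "$image.files.txt" 2>/dev/null ||
            echo "could not list the contents of $image" >&2
    fi

    # Record the MD5 hash next to the image.
    md5sum "$image" > "$image.md5"
}
```

The saved hash can be checked at any later date with md5sum -c followed by the .md5 file name, which reports whether the image still matches.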

The image below shows a workflow used by digital archivist Walker Sampson (more on this available here).


Once BitCurator is set up and a workflow has been established for your project, volunteers and students can then be let loose to work with the system and, in turn, see how many podcasts they can listen to while they are processing the records. BitCurator provides an opportunity to get a handle on the backlog of digital media. This is digital preservation that can be done off the side of your busy desk, assuming you have the power to capture a volunteer. It is also a great way of alleviating the burning curiosity about what is on the digital media in your collections. Throwing them in the garbage or leaving them for a future you is no longer the answer. You can give it a go today.

And it’s not rocket science. It’s just a bit of magic.

Best to just install BitCurator then... For installation information, check out this link.

BitCurator Quick Start Guide

Step by step guides to all tools in BitCurator.

Good luck!