You can read Part I of Season 2 here.
And we’re back! Kelsey Kim, C-SPAN Project Archivist here, ready to introduce you to the next phase of C-SPAN Records digitization.
A summary of the ground we’ve already covered:
- Digitization is not just a lone person in a dark room scanning documents all day, every day until retirement—in fact, it’s a complicated, multi-step process, only part of which is capturing digital images of archival documents.
- A LOT of prep work goes into project planning, including estimating how much material there is to digitize, how long it will take, and what information should be collected on each item.
- The right equipment makes a HUGE difference in a project. For SCRC, upgrading from a flatbed scanner to a digital camera on a copystand cut more than 12 hours from the time it takes to digitize half of one box of viewer mail. That's right: 12 HOURS. And SCRC is lucky because we were able to procure not just one, but TWO copystands and cameras. *Slow clap*
So, we now have a project plan, and our next step is the first part of the actual digitization process—PREPARATION OF DOCUMENTS FOR DIGITIZATION (we just call it “Prep”). This consists of two parts: physical prep and preliminary metadata.
What is physical prep?
Before a document is ready to be photographed, it’s a good idea to make sure it’s easy to handle. Remember, you’re trying to digitize one single item at a time, so if two papers are stapled together, you have to remove that staple. Rather than doing that while you’re in the photo-taking groove, we do it ahead of time. Of course, we’re archivists, so there’s a specific way of removing staples, and every time a staple is removed, it has to be replaced by a paperclip (specifically a non-rusting one). Standards, you know.
About a box’s worth of staples
What else do we do? We’ll unfold things, smooth out creases, pull letters out of envelopes, put archival paper around newspaper clippings (newspaper is acidic and will damage other paper), look out for water damage or mold (aka the Archivist’s Bane), and generally make sure that a document is stable enough to be gently picked up, moved to a new surface, photographed, and put back without damage. This leads me to an important side note: if you’re storing documents for long-term use, please do not use tape or rubber bands. They WILL NOT HOLD UP.
What is preliminary metadata?
The other half of our prep is collecting preliminary metadata. We already talked briefly about metadata, but let's take another quick dive into it.
Fancy, professional people often refer to metadata as "data about data." Of course, that's like describing the taste of an orange as "orangey." Another way to think about metadata is that it's just data that describes some type of item. This data can then be used to locate that particular item (like a book in a library catalog or a song in a music library), track changes to that item (for example, edits made to a Word document), or prove when, how, and by whom that item was created (like when you're trying to prove to your boss that you ARE doing work, not just falling into a YouTube spiral). Of course, metadata can do a lot more, like establish correlations between items, document who has the rights to publish material, and show decay of digital files (yes, that happens).
Typically, people who work with metadata sort it into different categories. Some of these are below:
- Descriptive metadata describes the item (obviously). Metadata in this category usually includes a title, keywords, author(s), etc. Its purpose is basically to help with search and discovery of an item.
- Administrative metadata relates to the management of an item, like the file type, creator, date of creation, and a bunch of other technical information. This type of information isn’t found IN the document, it just provides context for the work you’re doing with an item. Some of this, like the file type, may be grouped separately as Technical metadata (depending on who you talk to).
- Structural metadata is about the relationships between parts of an item, like multiple articles in a journal or magazine, or that the table of contents and a book chapter are still part of the same book, etc. It’s a bit more conceptual than other types, but that’s basically what you need to know.
- Preservation metadata is used to track the quality of a file over time. For example, did you know that digital files can deteriorate? Well, they can; it's called bit rot. People in the digital storage game track that deterioration by periodically comparing copies of the same item, and they guard against it by migrating files to new formats for stability.
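If you're curious what "comparing copies" can look like in practice, one common approach is a fixity check: record a checksum for each file when it's created, then recompute it later and see whether it still matches. Below is a minimal Python sketch, assuming SHA-256 checksums; the function names and file name are hypothetical, and this is an illustration of the general idea rather than SCRC's actual workflow.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 checksum, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_checksum: str) -> bool:
    """Return True if the file still matches the checksum recorded earlier."""
    return sha256_of(path) == recorded_checksum


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        scan = Path(tmp) / "letter_0001.tif"  # hypothetical file name
        scan.write_bytes(b"pretend this is image data")
        recorded = sha256_of(scan)             # stored when the file is created
        print(verify_fixity(scan, recorded))   # True: file unchanged
        scan.write_bytes(b"pretend this is imag3 data")  # simulate corruption
        print(verify_fixity(scan, recorded))   # False: file has changed
```

Even a tiny change to a file produces a completely different checksum, so a `False` here is the cue to restore that file from a known-good copy.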
Bored yet? It’s okay, I kind of am and I do this stuff all day. But rest assured, the last thing you need to know about metadata is that there are different standards of what’s called “core” metadata—sort of the bare minimum metadata—particularly for descriptive metadata. That’s what most archivists use for guidance. They’re not pulling it out of thin air (most of the time).
So now, for C-SPAN. We’re working on digitizing three different series within the collection: Press Releases, Viewer Mail, and Photographs. Currently, our big push is prepping the Viewer Mail records, since this is the largest of the three series. We want information on each LETTER (it sounds crazy, because it definitely is), so it takes A LOT of time, and there are A LOT of staples. But that level of metadata means that if someone is looking for a letter containing “An Ode to Brian Lamb*,” they will be able to find it.
Here’s a sample of what we’re collecting on each piece of viewer mail:
- Title (Usually letter, date, ZIP)
- Creator (The name of the sender, but this will not be published for privacy reasons)
- Description (Brief summary of the content of the letter)
- City
- State
- ZIP code
- Date
- Pages (Some letters include clippings or photographs; these are also counted as "pages")
- Keywords (Usually three per item)
Sounds easy, but when each box has an average of 400 letters, it’s definitely a struggle!
Are you starting to see why digitization takes so long?
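For anyone wondering how a list like that becomes a searchable record, here's a minimal Python sketch of one viewer-mail entry. The class name, field names, and the sample letter are all hypothetical (this is not the schema of our actual database); it just mirrors the fields above, builds a title in the "letter, date, ZIP" pattern, and leaves out the sender's name when producing the public version.

```python
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class ViewerMailRecord:
    """Preliminary metadata for one letter (hypothetical field names)."""
    creator: str        # sender's name: kept internally, never published
    description: str    # brief summary of the letter's content
    city: str
    state: str
    zip_code: str
    sent: date
    pages: int          # clippings and photographs count as pages too
    keywords: list[str] = field(default_factory=list)  # usually three

    @property
    def title(self) -> str:
        # Follows the usual "letter, date, ZIP" pattern
        return f"Letter, {self.sent.isoformat()}, {self.zip_code}"

    def public_view(self) -> dict:
        """Return the record as a dict, minus the creator for privacy."""
        record = asdict(self)
        record.pop("creator")
        record["sent"] = self.sent.isoformat()
        record["title"] = self.title
        return record


if __name__ == "__main__":
    letter = ViewerMailRecord(  # made-up sample letter
        creator="A. Viewer",
        description="A poem written in praise of Brian Lamb",
        city="Fairfax",
        state="VA",
        zip_code="22030",
        sent=date(1995, 3, 14),
        pages=2,
        keywords=["Brian Lamb", "poetry", "fan mail"],
    )
    print(letter.public_view()["title"])  # Letter, 1995-03-14, 22030
```

Keeping the sender's name in the record but dropping it from the public view lets staff still answer reference questions without publishing anyone's personal information.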
We’ll also collect metadata on the Press Releases and Photographs series, but because it’s different material, the data will be a little different! For example, most photos don’t have ZIP codes, but it does matter if they’re in color or black and white, so we’ll track that instead!
It's all a big task, but it's essential to making these documents discoverable. Otherwise, you'd be clicking through tens of thousands of photos until you find what you need, and who has the time for that?
*Brian Lamb is the founder of C-SPAN.
Follow Special Collections Research Center on Social Media at our Facebook, Instagram, and Twitter accounts. To search the collections held at Special Collections Research Center, go to our website and browse the finding aids by subject or title. You may also e-mail us at speccoll@gmu.edu or call 703-993-2220 if you would like to schedule an appointment, request materials, or if you have questions. Appointments are not necessary to request and view collections.