The C-SPAN Chronicles: Season 2, Part III

I’m not much of a cocktail party person (I prefer Netflix in my PJs), but occasionally, I manage to drag myself out of the house in the evening for some light socializing.  If you’ve ever been to any sort of party, you know that the one question you can always anticipate is “what do you do?” And when I give my answer—“I specialize in archival digitization”—one of the most common follow-up questions is “what does that mean?”

It’s a good question! Digitization involves so many different steps and tasks done in a particular order that it can be difficult to conceptualize.  Usually, it helps to walk someone through a loose workflow, so they can understand what it means to digitize, from step one to step done. Today, that someone is you!

Let’s dive in!

Here is a very general workflow for our C-SPAN digitization project. 

  1. Project planning
  2. Physical preparation
  3. Preliminary metadata collection
  4. Quality checking
  5. Imaging & additional metadata collection
  6. Quality checking (again)
  7. Post-processing
  8. Quality checking (again again)
  9. Data cleanup and preparation
  10. Posting/sharing/storing

Keep in mind that each of these steps is done for every single box we hope to digitize.  That’s ten steps, and we haven’t even gotten into the weeds yet!  Let’s break it down a little further (emphasis on “a little”) to give an overview of what this looks like in practice.

Steps 1-3: Project planning, physical preparation, and preliminary metadata collection

I’ve already talked your ears off about these steps, so rather than go back over them, suffice it to say that all of this is done before even approaching a camera, and it entails going through every box and, in some cases, every document in that box.  For more information on these steps, see the previous blog posts on the subject.

Step 4: Quality checking (round 1).

               Ah, yes, you’ll notice this shows up more than once in our ten-step list.  This first time it pops up, what we’re usually checking is the preliminary metadata.  That’s when we go back over a portion of the material that’s been completed and we check every single field of metadata again to ensure that it’s correct.  At the beginning of a project, or when someone new joins the team, we check 100% of the work that’s done.  As the process and metadata gatherer improve and fewer errors are made, we can scale back to checking around 10% of the material.  Of course, in a box with 700 items, 10% is still 70 documents, and each document has 15-20 fields of metadata, so it’s no quick task!  And yet, this step is central to making sure that the data we get is useful and accurately matches the documents we’re hoping to digitize.
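For the curious, here is a rough sketch of what "checking around 10%" could look like if a computer picked the items for us. This is only an illustration, not the tool we actually use: the function name `pick_qc_sample` and the item IDs are made up for the example, and I'm assuming the sample is drawn at random.

```python
import math
import random

def pick_qc_sample(item_ids, percent, seed=None):
    """Return a random subset of item_ids to re-check (percent is 1-100)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Round up so a small box never ends up with zero items checked.
    sample_size = math.ceil(len(item_ids) * percent / 100)
    rng = random.Random(seed)  # a fixed seed makes the pick repeatable
    return sorted(rng.sample(item_ids, sample_size))

# Hypothetical IDs for a 700-item box
box = [f"item_{n:04d}" for n in range(1, 701)]

sample = pick_qc_sample(box, percent=10, seed=1)
print(len(sample))                           # 70
print(len(sample) * 15, len(sample) * 20)    # 1050 1400
```

In other words, a 10% check of one 700-item box still means re-reading somewhere between 1,050 and 1,400 individual metadata fields.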

Step 5: Imaging & additional metadata collection

               This is the part most people think of when they think of “digitization”—the part where a person sits at a scanner and spends the day putting documents on and taking them off.  First qualifier to that, I said “imaging,” not “scanning” on purpose! As I’ve mentioned previously, a great deal of digitization these days is done with a camera, not a scanner.  That way, you can just click right along.  Here in SCRC, one camera enables us to take about 1,000 pictures a day of standard-size archived documents.  Second qualifier, digitization is actually a very active process.  Yes, you go in order and take picture after picture of various papers, but while you do that, you are checking against existing metadata and adding metadata from imaging (for instance, how many images exist for one item; one paper with writing on the front and back will actually be two images).  Throw in the fact that, for many collections, the papers come in a variety of sizes, all of which need to be fit to frame, and you can see that this actually requires paying very close attention to what you’re doing!

One of our magical digitizing machines (the camera is in the middle). Photographer not included.

Step 6: Quality checking (round 2)

               Again? Didn’t we just do this in step 4? Yes, but this time, we’re checking to make sure that we imaged all the pages included in a box or folder and didn’t miss a page or duplicate a page (or catch our fingers in the frame).  Like with the metadata quality checks, we check 100% of the images at first, then taper down as we get better.
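Part of this check can be sketched in a few lines of code. The example below assumes two things that are my own inventions for illustration: that the metadata records how many images each item should have, and that image files are named "<item ID>_<sequence>.tif". It is a sketch of the idea, not our actual process.

```python
from collections import Counter

def check_image_counts(expected_counts, image_names):
    """Compare how many images each item should have with how many were found.

    expected_counts: dict of item ID -> image count recorded in the metadata
    image_names: file names in the hypothetical form "<item ID>_<sequence>.tif"
    Returns a dict of item ID -> (expected, found) for every mismatch.
    """
    # Everything before the last underscore is treated as the item ID.
    found = Counter(name.rsplit("_", 1)[0] for name in image_names)
    problems = {}
    for item_id, expected in expected_counts.items():
        actual = found.get(item_id, 0)
        if actual != expected:
            problems[item_id] = (expected, actual)
    # Images whose item ID does not appear in the metadata at all
    for item_id in found.keys() - expected_counts.keys():
        problems[item_id] = (0, found[item_id])
    return problems

expected = {"letter_001": 2, "letter_002": 1, "letter_003": 2}
images = [
    "letter_001_01.tif", "letter_001_02.tif",
    "letter_002_01.tif", "letter_002_02.tif",   # one image too many
    "letter_003_01.tif",                        # back of the page is missing
]
problems = check_image_counts(expected, images)
print(problems)
```

A count like this can flag a possible missed or duplicated page, but it can't spot a finger in the frame. That part still takes human eyes.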

Step 7: Post-processing

               A lot of smaller tasks fall under this step and it can be very project-dependent.  For C-SPAN, post-processing includes creating derivatives, cropping and rotating, and redacting.  When we take images, we usually take them in TIFF format, because they are high-quality and contain a lot of data!  The problem is that they’re huge and it is hard to actually use them online or send them to people.  Instead, we use the TIFFs to make smaller files with less data (in our case, JPEGs), and then store the TIFFs in case we ever need more copies of the image.  The JPEGs are also the copies that we crop and rotate so they look nice for all of you later.  For the C-SPAN viewer mail, we have an additional step in post-processing: redaction.  Obviously, when people wrote in to C-SPAN, they usually included their names and addresses.  Since we don’t want to share that information without their permission, we just wipe it off one of the derivative files.  It sounds easy, but it has to be done one-by-one for all roughly 400 letters in a box! 

Step 8: Quality checking (round 3)

               Yep, once again we have to check.  Mostly, this time, we are checking the redactions to confirm all personal information has been removed.  The better we get, the fewer we need to check!

Step 9: Data cleanup and preparation

               After ALL the metadata for ALL the images in ALL the boxes is done, we have to go through and prepare that metadata for publishing with the images.  That type of cleanup includes standardizing formats and terms used in each field so it’s all uniform.  Then, we turn that metadata spreadsheet into another type of file that we can load onto our hosting platform and pair with the accompanying images of documents.  Luckily, for this step, a lot can be done by computers!
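Here is a small sketch of the kind of work a computer can do at this stage. Everything specific in it is an assumption for the sake of the example: the column names, the date format, the list of preferred terms, and the choice of JSON as the output (I haven't said which file type our hosting platform takes, and this isn't meant to imply it).

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical controlled vocabulary: variant spellings -> one preferred term
PREFERRED_TERMS = {"ltr": "letter", "post card": "postcard"}

def clean_row(row):
    """Standardize one row of metadata so every record uses the same formats."""
    # Tidy stray spaces and make the column names consistent.
    cleaned = {key.strip().lower(): value.strip() for key, value in row.items()}
    # Turn dates typed as MM/DD/YYYY into YYYY-MM-DD.
    parsed = datetime.strptime(cleaned["date"], "%m/%d/%Y")
    cleaned["date"] = parsed.date().isoformat()
    # Replace variant terms with the preferred one; keep unknown terms as-is.
    term = cleaned["type"].lower()
    cleaned["type"] = PREFERRED_TERMS.get(term, term)
    return cleaned

def spreadsheet_to_json(csv_text):
    """Clean every row of a CSV export and return it as JSON text."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return json.dumps([clean_row(row) for row in rows], indent=2)

sample_csv = (
    "Title,Date,Type\n"
    "Viewer letter , 3/14/1995,Ltr\n"
    "Viewer card,11/02/1996,Post Card\n"
)
print(spreadsheet_to_json(sample_csv))
```

The two sample rows start out with different spacing, date styles, and terms, and come out uniform: dates as 1995-03-14 and 1996-11-02, types as "letter" and "postcard".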

Step 10: Posting/sharing/storing

Finally!!!! This is the culmination of all the work—it’s online!  Whether the information is being used to build a public website (in the case of C-SPAN), being shared with researchers, or being integrated into an online collections database, this final step is the satisfying conclusion to all of the work in steps 1-9.  We do it all for YOU, so YOU can get a view inside of the archives without stepping foot there. 

Worth it. 😊

Follow Special Collections Research Center on social media at our Facebook, Instagram, and Twitter accounts. To search the collections held at Special Collections Research Center, go to our website and browse the finding aids by subject or title. You may also e-mail us at speccoll@gmu.edu or call 703-993-2220 if you would like to schedule an appointment, request materials, or ask questions. Appointments are not necessary to request and view collections.