Progress continues on the optical media project here in the Preservation Department. All discs in the University Lecture Series collection have been ripped and are now in the process of being permanently stored. If permissions allow, certain lectures will also be uploaded to the Special Collections’ YouTube channel.
Technically speaking, most of the time so far has been spent navigating the Ripstation, a combination hardware/software system designed to rip large numbers of optical discs. The collection includes more than 1000 discs, primarily formatted as CD-DA, DVD-Video, or data discs. Each of these formats requires a slightly different approach to preservation, mainly in the choice of software used.
The majority of the discs recorded before 2010 were formatted as Compact Disc-Digital Audio (CD-DA). Because this is also the format most commercial CDs use, Ripstation’s proprietary software (also called Ripstation) was best suited, as it is optimized for this sort of collection. Each disc was ripped into two formats, one for preservation and one for access. For preservation, the BWF format at 96 kHz/24-bit was selected for its lossless, uncompressed quality and its ability to embed the desired metadata within the wrapper’s header, thus greatly reducing the chance of intellectual separation between content and metadata. For access, the .MP3 format was selected because it is widely accepted and supported as an accessible audio format. In addition, of the available output formats, .MP3 can be most easily transcoded into an .MP4 file to upload to Special Collections’ YouTube access channel, with little risk of losing any data.
This transcoding for the access copies is handled by Adobe Media Encoder, which also uploads the files directly into the Lecture Series playlist. To match the access copies from the magnetic media portion of the Lecture Series collection, which have already been uploaded, the desired output is an .MP4 file with the audio wrapped together with a logo (YouTube only accepts video files). After upload, we apply closed captions to all files for accessibility.
For the DVD-Video carriers in the collection, the desired output (perhaps somewhat obviously) differs from the CD-DA carriers. After some experimentation with variants on a data validation workflow, our conclusion was that the optimal output for Special Collections’ purposes was an .ISO disc image, which can be mounted easily as an access copy for researchers.
As the project progressed, some discs we encountered were neither CD-DA nor DVD, but simply data discs onto which .MP3 or other media files had been “dragged and dropped.” These were ripped with the DataGrabber software, and their original file format was maintained.
What metadata Ripstation uses and where it draws it from vary by the software used, which itself varies by the format of the optical disc being ripped. Ripstation’s primary software is the program of the same name, which is intended for CD-DA-formatted discs, typically commercial ones. For automatic metadata population, an internet connection is required, so Ripstation can scour private and open-source databases for the artist, album, track titles, and other relevant metadata per disc. Acquiring metadata this way would not be helpful to the project because of the singular and noncommercial nature of the content. Due to this constraint, as well as networking limitations, this particular Ripstation was left offline.
So from where could the software draw its metadata? Ripstation has accounted for this possibility in the design of the User Data feature. Typically, ripped files are named from an assigned structure of metadata that looks something like %D_%A_%Y. Each letter corresponds to an established metadata category, so files named according to this structure will look like “[AlbumName]_[AlbumArtistName]_[AlbumDate]”. This system also allows for user-input metadata, in the form of a .TXT file in the program folder. The User Data system, which allows up to 10 user-defined metadata categories (%0 – %9) and can be used with all Ripstation software, is what we used for this project.
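To make the naming structure concrete, here is a minimal Python sketch of how a template such as %D_%A_%Y, or one built from the user-defined %0 – %9 categories, expands into a file name. It is only an illustration, not Ripstation’s own code: the function name `expand_template`, the `FIELD_NAMES` table, and the sample values are all hypothetical, and how Ripstation actually reads the User Data .TXT file is not covered here.

```python
import re

# Hypothetical lookup for the three letter codes mentioned in this post.
# Ripstation's full set of codes is not reproduced here.
FIELD_NAMES = {"D": "album", "A": "album_artist", "Y": "album_date"}


def expand_template(template, metadata, user_data=()):
    """Expand %-codes in a naming template.

    Letter codes are looked up in `metadata`; digit codes (%0 to %9) are
    taken by position from `user_data`, standing in for user-supplied values.
    """
    def substitute(match):
        code = match.group(1)
        if code.isdigit():
            return str(user_data[int(code)])
        return str(metadata[FIELD_NAMES[code]])

    return re.sub(r"%([A-Za-z0-9])", substitute, template)


# Commercial-style template, with made-up values.
commercial = expand_template(
    "%D_%A_%Y",
    {"album": "AlbumName", "album_artist": "AlbumArtistName", "album_date": "AlbumDate"},
)

# User Data style template, with made-up values for %0 and %1.
custom = expand_template("%0_%1", {}, ["AV0001", "12"])
```

With the sample values above, `commercial` follows the “[AlbumName]_[AlbumArtistName]_[AlbumDate]” pattern, while `custom` is built entirely from the user-defined categories.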
Each disc file was named with its AV number and container number, as recorded in the masterlist. For later discs with no container number available, that value was replaced with the date of recording. Batches were named with the reference number of the collection, the container number of the first disc, the container number of the last disc, and (if CD-DA or DVD) the disc type.
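The naming rules above can be sketched in a few lines of Python. This is an illustration under assumptions rather than the scripts used in the project: the function names, the underscore separator, and the sample identifiers are all hypothetical, since the exact punctuation of the real file names is not given here.

```python
def disc_name(av_number, container_number=None, recording_date=None, sep="_"):
    """Name a disc file from its AV number plus its container number.

    Falls back to the recording date when no container number is available.
    The separator is an assumption made for this sketch.
    """
    second_part = container_number if container_number is not None else recording_date
    if second_part is None:
        raise ValueError("need either a container number or a recording date")
    return f"{av_number}{sep}{second_part}"


def batch_name(collection_ref, first_container, last_container, disc_type=None, sep="_"):
    """Name a batch from the collection reference and its container range.

    The disc type is appended only for CD-DA or DVD batches.
    """
    parts = [collection_ref, first_container, last_container]
    if disc_type in ("CD-DA", "DVD"):
        parts.append(disc_type)
    return sep.join(str(part) for part in parts)


# Made-up identifiers, for illustration only.
with_container = disc_name("AV0001", container_number="12")
with_date = disc_name("AV0900", recording_date="2012-03-05")
audio_batch = batch_name("REF001", 1, 25, "CD-DA")
data_batch = batch_name("REF001", 26, 40)
```

Keeping the fallback to the recording date inside one function means every disc gets a two-part name, whether or not a container number exists in the masterlist.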
Now that all of the more than 1000 discs have been ripped, the next phase is twofold: (1) documenting the project, of which this blog post is a part, and (2) encoding the lectures without permissions restrictions and uploading them to Special Collections’ YouTube channel. This process has already begun, with over 5000 minutes of audio made publicly available so far.