Libraries, unbeknownst to many, are pretty savvy places when it comes to mass storage. While our computational needs are often fairly typical (largely we run websites, however large and complex), we also maintain myriad systems that house terabyte upon terabyte of data. In particular, digitized works and the born-digital files we now receive through the archives can consume terabytes at a rapid clip.
What is better known is that libraries are chronically underfunded, so devising ways to create storage without going broke is something of a holy grail quest for us. It’s easy to put out an RFP for a storage array and receive myriad bids at staggering prices. Yes, if you need high-availability storage to support a business operation, then you do need that kind of hardware, but for digital collections it’s overkill. While a library such as ours may only need 20-30TB to meet today’s needs (actually we need more, but I’m referring to the actual size of our digital collections, not the redundant copies and other material that reside in our arrays), it’s wise to assume that within five years this figure will increase three- to five-fold, and continue growing faster than linearly beyond that.
All that said, I should bracket out digital collections from what follows, since we use arrays of enterprise disks to serve those files and store their backups, and will be using the soon-to-be-launched Ontario Library Research Cloud (formerly known as ODLRC) for our preservation copies. Even excluding that large pile of files, new storage needs seem to emerge weekly. These needs are typically not long-term in the way our commitment to our digital archives is, so we need this storage to be both agile (quickly deployable and reclaimable) and inexpensive.
That’s where Backblaze and its pioneering work developing its own consumer-disk storage array comes in. Way back in 2009, Backblaze took the unusual step of releasing the design specs for its storage pod, essentially inviting others to copy it and build their own. In the intervening years, they have released updated specs for versions 2.0, 3.0, and 4.0 of the pod. Along the way, Protocase, the sheet-metal fabricator that was producing the metal chassis for the pods, expanded and began selling not only the chassis to anyone who wanted one, but also a fully wired model that lacked only the 45 drives. This has apparently gone well for them, so well that they’ve now established a division known as 45 Drives that sells these pods.
Back in 2012 we bought a v2.0 pod from Protocase that houses 3TB drives, so 135TB raw capacity, and just recently acquired a v4.0 pod from 45 Drives that has 4TB drives for 180TB raw. The total cost for both pods was in the vicinity of $33,000 for 315TB raw, which works out to roughly $105/TB, or about $0.10/GB. Compared to even the least expensive enterprise storage, these are staggeringly economical figures, as Backblaze pointed out long ago when it launched the first pod specs.
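For anyone wanting to sanity-check those figures, here is a minimal sketch of the arithmetic. It assumes 45 drive bays per pod (per the Backblaze design) and the ballpark $33,000 combined price mentioned above; the variable names are mine, not anything from Backblaze or 45 Drives.

```python
# Rough check of the cost-per-TB figures cited in the post.
DRIVES_PER_POD = 45              # bays in a Backblaze-style pod

v2_raw_tb = DRIVES_PER_POD * 3   # v2.0 pod with 3TB drives -> 135TB
v4_raw_tb = DRIVES_PER_POD * 4   # v4.0 pod with 4TB drives -> 180TB
total_tb = v2_raw_tb + v4_raw_tb # 315TB raw across both pods

total_cost = 33_000              # approximate combined cost, USD
per_tb = total_cost / total_tb   # ~$104.76/TB
per_gb = per_tb / 1_000          # ~$0.10/GB (decimal gigabytes)

print(f"{total_tb}TB raw, ${per_tb:.2f}/TB, ${per_gb:.2f}/GB")
```

Note these are raw-capacity numbers; usable capacity after RAID or erasure coding would push the per-TB cost somewhat higher, though still far below enterprise-array pricing.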
We don’t intend to use these for archival storage–although with enough of them properly configured we probably could–but we have identified such a wide range of storage needs within our own organization (McMaster University Library) and in the Sherman Centre researcher pool that we simply must have flexible, low-cost storage that we can provision in large chunks and quickly.
The two pods pictured (v2.0 is red, v4.0 yellow) are now racked up and sitting behind an Eaton UPS in the basement of one of our libraries (not in our main data centre to aid with disaster recovery routines). As we move forward with building the software layer and services supported by these boxes, we will post more and share what we’ve done for others who might be considering such a configuration in their own space.
One last note: a 4U, all-steel server with 45 drives in it is the heaviest thing you’ll find in a data centre except for the lead-acid battery modules in a UPS. Heaving it around while racking it up is quite the workout. Bonus!