I’ve been thinking about the growing number of large files, mostly music and movies, that I store on my hard drive and about how to protect them long-term. When the only way to acquire multimedia like this was to buy discs (whether a DVD or audio CD), the solution was simple and elegant: rip the disc to your computer, compress it and keep the original disc on your shelf as both a visual memento and an off-computer backup whose physical media would last 25+ years. But most of us no longer get our files on physical discs, which means using hard drives in various configurations to store backups. This post will explore best practices for data integrity while keeping the backup setup as simple as possible.
Best Practices for Hard Drive Longevity
The better you take care of your hard drives, the longer they will last. The following tips work not only for your backup drives but should also be implemented for your desktop drives.
- Disable power saving on your hard drives.
- Use a real-time SMART monitor to keep a continuous check on your hard drives’ health.
- Power cycling is often cited as one of the leading causes of disk drive failure. To help your drives last longer, just let them run and avoid the thermal cycling that comes with switching them on and off.
- When you buy a new drive, hammer on it 24/7 for a few days in order to make sure that it’s not prone to errors. If you get a hard drive that shows errors, send it back and if possible request a new drive (not remanufactured) to be its replacement. Some manufacturers (Hitachi) are notorious for using non-quality controlled hard drives as warranty replacements.
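If you’d rather script the SMART check than eyeball it, here is a minimal Python sketch. It assumes the attribute table layout printed by `smartctl -A` from smartmontools. The watch list and the sample output are my own illustrative picks, not real readings from any drive.

```python
# Attributes commonly treated as early-warning signs. This watch list is an
# assumption chosen for illustration, not an official or complete list.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def failing_attributes(smartctl_output: str) -> dict:
    """Return watched attributes whose raw value is greater than zero."""
    flagged = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have ten columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


# Made-up sample in the shape of `smartctl -A` output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1234
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(failing_attributes(SAMPLE))
```

On a real machine you would feed the function the captured text of `smartctl -A` for each drive. Any non-empty result during the first few days of burn-in is a good reason to send the drive back.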
Note that once you go RAID, you won’t be able to spin down individual disks (you can only spin down an entire array). The best practice for long-term backups is to choose enterprise-quality hard drives instead of consumer drives for archival storage. Drives meant for the consumer market can spend so long on internal error recovery that the RAID setup times out waiting for them. This is a bigger problem with hardware RAID controllers – software RAID can be configured to handle error recovery more gracefully, but we’ll get into setting up RAID for data backup later.
Best Manufacturers of Hard Drives
It’s easy to become biased against certain hard drive manufacturers all because they released a batch of bad hard drives. Forming biases is human nature, but you’ll have to be even more astute to sift the gold from the dross when it comes to hard drives.
- Research specific models, not just brands. For example, Western Digital is a really good and reliable brand for hard drives. But Western Digital Greens aren’t proving very reliable right now for RAID setups. This is partly because of their power-saving features and partly because their variable RPM design doesn’t work well in an array: a fixed RPM is needed to sync the rotational speed within the RAID array. WD Reds are the model line with the best reviews. In particular, a WD Red 3TB drive fits all of its data on only 3 platters and comes with a 3-year warranty.
- As a general rule, stay away from the most dense pressings of hard drives, i.e. the highest capacity that a manufacturer offers. The problem with the cutting edge is that you can bleed, so allow other people to be the beta testers and pick up a manufacturer’s highest capacity only after it’s gone through a few revisions.
- Putting that all together, here’s a good Amazon search to find what we’re looking for in terms of reliable hard drives:
WD Enterprise Hard Drive
What About LTO5 / LTO6 Tape for Storage?
I’ve thought about using tape for long-term archival data storage, and I wanted to mention it because it’s a legitimate option with some drawbacks. You can probably find companies that are moving to the newer LTO6 tape standard and getting rid of their used LTO5 gear. Here are some drawbacks of using tape for long-term data storage:
- The 3TB quoted capacity for tapes is for highly compressible data. For already compressed data, such as compressed movies (xvid / h.264) or music (mp3 / flac) files, there’s basically nothing left to compress and you will only get about 50% of the advertised capacity.
- Blank tape is cheap. The actual tape drive is expensive though, almost $2k.
- You have to keep track of all of the tapes and regularly perform read integrity checks to make sure your data is sound. The only way to speed up the process is to buy multiple tape drives which is cost prohibitive.
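To put a number on the capacity point above, here is a small back-of-the-envelope sketch in Python. The 3TB advertised figure and the 50% usable fraction come straight from the list above. The 10 TB library size is a made-up example.

```python
import math


def tapes_needed(data_tb: float, advertised_tb: float = 3.0,
                 usable_fraction: float = 0.5) -> int:
    """Number of tapes required for data that is already compressed.

    advertised_tb is the quoted (compressed) capacity per tape and
    usable_fraction is the share of it you actually get for files such as
    h.264 video or mp3 audio.
    """
    usable_tb = advertised_tb * usable_fraction
    return math.ceil(data_tb / usable_tb)


if __name__ == "__main__":
    # Hypothetical 10 TB collection of movies and music.
    print(tapes_needed(10.0))
```

So a collection that looks like it fits on four tapes on paper actually needs seven, and every one of those tapes has to be tracked and periodically read back.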
RAID – Do It in Software
Once you have your hard drives, it’s time to put those drives to work and back up your data. This is where a RAID setup comes in, providing redundancy and error correction for your backup’s data integrity. I can’t give specific recommendations as each setup will be unique to the amount and kind of data being stored, but here are a few general tips regarding setting up RAID.
- You don’t need hardware support to set up a proper RAID array. This also implies staying away from motherboard RAID, which uses proprietary low-level software to implement its “hardware RAID”. Once you settle on implementing RAID in software (which is what I recommend), then any motherboard will do.
- RAID types: JBOD means “just a bunch of disks” and is one of the worst ways to set up your array for long-term storage, as there is no error correction or redundancy in this setup.
- Hard drive spanning can be better implemented using OS-level software. This could be a special-purpose NAS distro (FreeNAS / NAS4Free, both built on FreeBSD) or within Windows (Windows Home Server).
- RAID 6 is what has been recommended to me as the best of the current RAID setups for backup purposes. RAID 6 beats RAID 5 for data recovery purposes: with two drives’ worth of parity, a RAID 6 array can survive a second drive failure while a failed drive is being rebuilt, which a RAID 5 array cannot. There are some drawbacks with using RAID 6: you lose some throughput and you give up extra capacity to parity data, but those are acceptable costs for the benefits it provides. RAID 6 requires at least 4 drives.
- ECC RAM is recommended for reliability purposes when it comes to selecting RAM for your software RAID setup. ECC RAM is itself fairly inexpensive. The added cost comes when you buy a compatible motherboard. The recommended amount of RAM you’ll need for your RAID setup is 1GB per TB of storage, though that guideline is only for optimal performance for high-IOPS workloads. You should be fine using less RAM for a RAID backup setup.
- Use ZFS as the file system format: its checksum features along with its self-healing abilities make it more robust than ext4. ZFS is an available option on FreeBSD-based NAS distros like NAS4Free and FreeNAS.
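To see what the RAID 6 parity overhead and the RAM guideline mean for a concrete build, here is a rough Python sketch. It assumes equal-sized drives and the standard RAID 6 layout of two drives’ worth of parity. The guideline above doesn’t say whether “TB of storage” means raw or usable space, so the helper simply applies 1GB per TB to whatever figure you pass in. The function names are mine, purely for illustration.

```python
def raid6_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of a RAID 6 array built from equal-sized drives."""
    if drive_count < 4:
        raise ValueError("RAID 6 requires at least 4 drives")
    # Two drives' worth of space is consumed by parity data.
    return (drive_count - 2) * drive_tb


def suggested_ram_gb(storage_tb: float, gb_per_tb: float = 1.0) -> float:
    """RAM suggested by the 1GB-per-TB rule of thumb (an upper bound for backups)."""
    return storage_tb * gb_per_tb


if __name__ == "__main__":
    drives, size_tb = 4, 3.0  # e.g. four 3TB drives
    usable = raid6_usable_tb(drives, size_tb)
    raw = drives * size_tb
    print(f"raw {raw} TB, usable {usable} TB, parity overhead {1 - usable / raw:.0%}")
    print(f"RAM guideline: {suggested_ram_gb(raw)} GB")
```

With the minimum of four drives, half the raw space goes to parity. Adding drives shrinks that share: six 3TB drives give 12 TB usable out of 18 TB raw.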
What About Buying a NAS?
Buying a hardware solution is actually a pretty good choice for those who aren’t DIY-inclined (at least not when it comes to setting up hard drive storage arrays). Based on my preliminary research, here is how the two best-known NAS options stack up:
Drobo, from what I’ve read, is not recommended due to reliability problems.
The Synology + series has great reliability reports and users rave about the functionality. Don’t take my word for it: you can read the reviews here. In particular, the Synology 415+ at the time of this writing is a solid performer and pairs well with WD Red 3TB drives.
The drawback of running a proprietary NAS RAID system is that if the unit itself fails, you have to buy a model from the same manufacturer to get your data back. If you use a free NAS distro like the previously mentioned NAS4Free (http://www.nas4free.org/), then you can be up and running on any old computer that’s powerful enough (with 1 GB of ECC RAM for every 1 TB of storage in your NAS4Free RAID setup). FreeNAS is the other free software NAS distribution that’s already been mentioned. Another option for anyone who finds setting up their own NAS distro daunting is to use something like FlexRAID and its transparent RAID system. It’s a good Windows / Linux offering for those looking for a commercial software RAID product with support for less than $100.
A Final Note for Torrenters
I realize that some people reading this article might be torrenters, which means they have gobs and gobs of data that they’ve downloaded and are looking for the best way to store a backup copy in case their hard drive(s) crash. These are some special notes for this population, based on my own experience as well as observing friends who torrent.
- Separate your real data – important digital documents / family photos – from your torrent data. Make sure that you first secure your important data – a RAID setup like the one covered above / an off-site service like CrashPlan / Apple’s Time Machine for Mac users – before worrying about backing up your torrent data.
- Consider dedicating a spare PC with a few multi-terabyte drives to function as your seedbox. You can use any minimal Linux distro, or one of the NAS distros like NAS4Free mentioned earlier. Headless is nice as you spare the expense of having to provide a keyboard and monitor for the machine. To lower your energy costs, consider purchasing a Mac Mini or an Intel NUC (“Next Unit of Computing”).
- Consider printing out a directory listing of your music folders / movies. If you ever need to recreate these files after a hard drive failure, you’d just need to reload the associated .torrent files (you know, the files you downloaded to get the files in the first place), so make sure you save them in the right place. A completely low-tech solution is just to use the printout of your directory structure and manually search for the files on your tracker of choice should you need to retrieve them. Note that the drawback of using a tracker as your ‘backup’ is that you’ll incur a ratio hit when you re-download your files … and of course your tracker might go bye-bye. Trackers in general aren’t known for their longevity, but even if your favorite tracker goes down there will likely be others to take its place.
- Don’t use a NAS for torrenting. I mean, you can (the Synology even has the Transmission client baked in), but it’s not recommended. A NAS running RAID offers parity and redundancy, and that comes at a cost: it writes in stripes and multiplies the wear across all drives. Compare that with a single drive in a desktop PC, which incurs minimal reads and writes that are buffered through a cache layer. A NAS also presents a security hole, as your network router and the NAS software are the only wall between it and the Internet.
- Take a look at your downloading habits – are you engaging in what amounts to digital hoarding? If you continually fill up your hard drives with stuff you accumulate online, you might need to cut back on the acquiring and start actually consuming what you download.
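For the directory listing suggested a few bullets up, you don’t need anything fancy. Here is a minimal Python sketch that walks a folder and writes one relative path per line to a text file you can print or stash off-machine. The function names and the paths in the usage line are placeholders of my own.

```python
import os
import sys


def directory_listing(root: str) -> list:
    """Return the sorted relative paths of every file under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            entries.append(os.path.relpath(full_path, root))
    return sorted(entries)


def write_listing(root: str, out_path: str) -> int:
    """Write the listing to out_path, one path per line; return the file count."""
    lines = directory_listing(root)
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)


if __name__ == "__main__":
    # Usage (placeholder paths): python listing.py /path/to/music listing.txt
    if len(sys.argv) == 3:
        count = write_listing(sys.argv[1], sys.argv[2])
        print(f"{count} files listed")
```

Keep the output file somewhere other than the drive it describes, alongside your saved .torrent files, or the listing disappears in the same crash it was meant to help you recover from.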