Friday, October 30, 2009

More On Backup

I have just inventoried my processed images - my keepers if you will - 380 gig of images that I cannot afford to lose (after all they produced two books).

At 15 cents per gig per month, my monthly charge (assuming no access) would be $57 for the storage plus $2 for Jungle Disk, with extra fees if I upload or access any of my stored files. That is roughly $700 per year before those fees, and somewhere under $1000 per year including them. I can think of a lot of things I can do with $1000 and probably so can you. Certainly one more 1 TB external drive will be around $200, but it doesn't back up automatically, nor does it disconnect itself from power and any other connections that could be damaged by a lightning bolt or major power surge.
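For anyone who wants to plug in their own numbers, here is a minimal sketch of that arithmetic. It assumes only the two figures above (15 cents per gig per month and a $2 monthly Jungle Disk fee) and leaves out upload and access fees, since those depend on usage. The function name and parameters are made up for illustration.

```python
def storage_cost(gigabytes, rate_per_gb_month=0.15, service_fee_month=2.00):
    """Return (monthly, yearly) cost in dollars, excluding transfer fees."""
    monthly = gigabytes * rate_per_gb_month + service_fee_month
    yearly = monthly * 12
    return monthly, yearly


monthly, yearly = storage_cost(380)
print(f"${monthly:.2f} per month, ${yearly:.2f} per year before transfer fees")
```

Whatever the transfer fees turn out to be in practice, they sit on top of that yearly figure.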

I do take my image files to the office for off site storage, but do it far less often than I really should. Perhaps I should just cough up.

Now, the vast majority of those images don't get edited month to month - probably fewer than 5% get edited in a year - after all they have taken 6 years of digital work to collect. Put that way, the sensible thing to do would be to back up everything I now have to a hard disk, unplug it and take it off site (as well as keeping local backups with my Drobo). I could then use something like S3 to back up all the new edited images (and any changed images from before) and once a year, put them onto a new hard drive, delete everything at S3 and start over with new edited images yet again.

In 10 years I will have accumulated 10 partially filled off site hard drives, but will only be paying Amazon to store one year's worth of images. This way I get automated off site storage (almost no effort on my part) for a modest sum of money, and once a year I have to back up my images to a hard drive. Given that I can back up all my processed images to one drive, it probably even makes sense to back up everything to each yearly drive so there is redundancy even in the off site hard drives.
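The only fiddly part of this scheme is picking out what has been added or edited since the last yearly drive was made. A rough sketch of one way to do it, assuming file modification times can be trusted and that the date of the last yearly archive is known; the function name is hypothetical and this is not how Jungle Disk or S3 decide what to upload.

```python
from pathlib import Path


def changed_since(root, cutoff_timestamp):
    """List files under root modified after cutoff_timestamp.

    cutoff_timestamp is seconds since the epoch, e.g. the moment the
    last yearly drive was written (an assumption of this sketch).
    """
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_mtime > cutoff_timestamp
    )
```

Everything this returns is what would go to online storage during the year; everything else already lives on an off site drive.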

What are you thinking about reliable painless off site backup?


Omar said...

What you're talking about now is archiving in a sense. I've been thinking about that too as a way to cut the S3 charges.

I hate the idea of having my offsite backup in an electronic device (hard drive), but there doesn't seem to be any low cost alternative.

Let's assume in 10 years you have a catastrophe requiring you to recover your files from the off-site backups. You get the current year or two from S3, but need to use your drives for the rest. What hard drive interface will be available at that time? Will USB or SATA (the two current methods in the Windows world) be on your computer at that time?

With a bit of scrambling, I'm sure it will be possible to find a solution, but what about in 15 years? Not only do you have to store off-site copies of your files, you also have to make sure you have off-site access to hardware that will run those drives, or read the DVDs, or whatever.

This is where my head starts to hurt, and I go back and look at some photography web sites.

George Purvis said...

Hi George,

Have you considered that
1. As Omar said, drive interfaces and recording technologies evolve about every five years. Ten years from now it may be difficult to find a way to read the images on the drive.
2. Drives are designed to rotate. Will a drive that has been turned off for ten years even start up again? Will the heads have frozen in place? Will the drives spin up? Will the magnetic record still be there? Would your car start up after sitting in the garage for ten years?
Don't you think that an archiving plan needs to include testing and migration of the data to new media at reasonable intervals (3-5 years)?
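The testing half of that can be fairly mechanical. As a sketch under assumptions: record a checksum for every file when the drive is written, then recompute and compare each time the drive is pulled out for a check. The function names and the choice of SHA-256 are illustrative, not part of any particular backup product.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large image files need not fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def damaged_or_missing(root, manifest):
    """Return the names in manifest whose files are gone or no longer match."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

An empty result from `damaged_or_missing` means the drive still reads back what was written; anything else is a cue to restore that file from another copy and migrate sooner rather than later.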

In practice, archiving and backups are hindered both in time and expense by the sheer quantity of data that is changed. The amount of data that changes is heavily dependent upon workflow.

A non-destructive workflow that stores edits in sidecar files results in much less modified data than a workflow that modifies the image files. In a non-destructive workflow where all image edits are retained in sidecar files, the original image never changes. Sidecar files are usually much smaller than the image files (e.g. in Lightroom). While small, sidecar files represent a substantial investment of time and talent by the photographer and should be archived shortly after being changed.

The archiving problem can be divided into two:
1. archiving the large quantity of immutable data in the images once and for all in several places, and
2. quickly backing up small amounts of easily identifiable sidecar data that is changing

Provided that there is an immutable link between the images and the sidecar files (e.g. the image filename is never changed), the images need never be archived again, even if they are copied to other directories or other computers and used for editing or printing.
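To make the two-part split concrete, here is a small sketch that sorts a folder into images (archive once) and sidecars (back up often). It assumes sidecars can be recognized by extension and uses `.xmp` as the example; adjust the set to whatever your software actually writes. The names are hypothetical.

```python
from pathlib import Path

# Assumption: sidecar files are identified purely by extension.
SIDECAR_EXTENSIONS = {".xmp"}


def split_images_and_sidecars(root):
    """Return (images, sidecars) as two sorted lists of file paths."""
    images, sidecars = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() in SIDECAR_EXTENSIONS:
            sidecars.append(path)
        else:
            images.append(path)
    return images, sidecars
```

The first list goes to the long-term archive once; the second is small enough to push off site every time it changes.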


Omar said...

One more thought. Jungle Disk may be coming out with a feature I used in a previous (now defunct) solution. Rather than storing the backup on S3, we may be able to store it on any internet accessible computer. Say, your business or home, or family member's computer. Essentially, it's an internet based sync solution.

If this feature gets added to Jungle Disk, you can cut a lot of the cost out of the storage.

Martin said...

Have you considered a simple web hosting package instead of S3? Unlimited disk and data transfer at networksolutions for example is available for $27.18/month. If you "only" have your 380GB, you might even get away with $12.41/month and 5TB monthly transfer.

Take software like ChronoSync on the Mac and you can automate the process using FTP. After an initial upload you only need to sync the changes.

Paul Bailey said...

You know what? I back up on archival gold discs, an internal 1 TB drive, an external disconnected drive and then make an old-fashioned print.
George, I have a sneaking suspicion that the print may be the most accessible form in future.