What if a Hard Drive is Only a Cache?
When computers like the MacBook Air started coming equipped with solid state drives (SSDs), I thought the technology was cool, but it seemed like a drag to give up 50% or more of the storage capacity we’ve grown accustomed to in our hard drives. My laptop has 250GB of capacity, and as of this writing I’m using around 138GB of it. I started looking for ways to economize on space. I went digging around for junk I could throw away, and I found that this was a near-impossible task for a few reasons:
- There is plenty of stuff on here that might be junk, but I don’t know whether or not my computer needs it to run.
- There is plenty of stuff that I know my computer does not need to run, but what if I need it someday?
- Even if I did have a good method, sorting the useful from the useless would be a tedious, full-time chore.
That’s when it hit me: storage is cheap and networks are near-ubiquitous. Google’s Gmail service lets you keep all your messages in perpetuity, so why not extend this model to every file or document you’ve ever used? If your computer worked like this, your hard drive would store only the system files and documents you need right now and push disused items into “suspended animation” in the cloud.
Computers work like this already on a small scale. If your CPU can’t find the data it wants in its level 2 cache, it goes and grabs it from main memory. Through virtual memory, main memory can in turn create the illusion that it is, for all intents and purposes, infinite by moving infrequently accessed data out to disk. You take a performance penalty when you have to hit the disk, but it’s worth it for the breathing room it gives your applications.
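The fall-through behavior described above can be sketched in a few lines of Python. This is a simplified model, assuming each level is a key-value store with a fixed number of entries and naive first-in, first-out eviction; `Level` and `read` are hypothetical names for illustration. Real CPU caches and virtual memory work on fixed-size lines and pages, managed largely by hardware and the OS, not on dictionaries.

```python
from typing import Dict, List, Optional


class Level:
    """One tier of a hypothetical storage hierarchy (e.g. L2 cache, RAM, disk)."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity
        self.data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)

    def put(self, key: str, value: bytes) -> None:
        # Naive FIFO eviction: when full, drop the first-inserted entry
        # (dicts preserve insertion order).
        if key not in self.data and len(self.data) >= self.capacity:
            del self.data[next(iter(self.data))]
        self.data[key] = value


def read(levels: List[Level], key: str) -> bytes:
    """Check the fastest level first; on a miss, fall through to slower
    levels and copy the value back into every faster level."""
    for depth, level in enumerate(levels):
        value = level.get(key)
        if value is not None:
            for faster in levels[:depth]:
                faster.put(key, value)
            return value
    raise KeyError(key)


l2, ram, disk = Level("L2", 2), Level("RAM", 4), Level("disk", 1000)
disk.put("page-7", b"...")
read([l2, ram, disk], "page-7")  # misses in L2 and RAM, hits on disk
print("page-7" in l2.data)       # True: promoted into the fastest level
```

The slow path is paid once; after that, the value sits in the fastest level until something newer pushes it out.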
Now imagine that your high-performance hard drive is nothing more than a “level 5” cache. All the system files and documents you are actively using will be immediately available, and if you want something you haven’t touched in 12 months, your computer will transparently pull down a compressed copy from the cloud. You will notice a lag, especially if the file is large, but in exchange you receive these benefits:
- Your storage will be infinite, for all intents and purposes. Amazon S3’s pay-per-gigabyte pricing suggests this could be affordable.
- You will never have to throw anything away unless you really, really want to. In the beginning, Gmail didn’t have a delete feature, and that freaked people out, so Google added one.
- You will never have to do housekeeping on your files. The key here is a first-rate index, so you can find anything whether it lives on your drive or in the cloud.
- You will only upgrade your hard drive if you want better performance, not because you ran out of space.
- You will have access to all your files from any device connected to the network.
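The “level 5” arrangement, together with the index from the list above, can be sketched in Python. This is a minimal model under several assumptions: a plain dictionary stands in for the cloud store (a real system might use an object store such as Amazon S3), “untouched in 12 months” is measured from the last read or write, and the index searches only file names. `CloudBackedDrive`, `archive_stale`, `find`, and the one-year threshold are hypothetical; compression uses the standard `zlib` module.

```python
import time
import zlib
from typing import Dict, List, Optional, Tuple

ONE_YEAR = 365 * 24 * 60 * 60  # "12 months", approximated in seconds


class CloudBackedDrive:
    """Hypothetical drive that keeps recently used files locally and moves
    compressed copies of stale files to a remote store."""

    def __init__(self, cloud: Dict[str, bytes]):
        self.local: Dict[str, bytes] = {}  # fast tier: the hard drive
        self.cloud = cloud                 # slow tier: stand-in for a cloud bucket
        # Index of every file ever stored -> last access time. Entries are
        # never removed, so archived files remain searchable.
        self.catalog: Dict[str, float] = {}

    def write(self, path: str, data: bytes, now: Optional[float] = None) -> None:
        self.local[path] = data
        self.catalog[path] = time.time() if now is None else now

    def read(self, path: str, now: Optional[float] = None) -> bytes:
        if path not in self.local:
            # Local cache miss: transparently pull and decompress the archived copy.
            self.local[path] = zlib.decompress(self.cloud[path])
        self.catalog[path] = time.time() if now is None else now
        return self.local[path]

    def archive_stale(self, now: Optional[float] = None,
                      max_age: float = ONE_YEAR) -> List[str]:
        """Move files untouched for longer than max_age to the cloud, compressed."""
        now = time.time() if now is None else now
        archived = []
        for path in list(self.local):
            if now - self.catalog[path] > max_age:
                self.cloud[path] = zlib.compress(self.local.pop(path))
                archived.append(path)
        return archived

    def find(self, term: str) -> List[Tuple[str, str]]:
        """Search file names across both tiers; report where each file lives."""
        return [
            (path, "local" if path in self.local else "cloud")
            for path in sorted(self.catalog)
            if term.lower() in path.lower()
        ]


drive = CloudBackedDrive(cloud={})
drive.write("taxes-2008.pdf", b"...", now=0)
drive.write("draft.txt", b"...", now=ONE_YEAR)
drive.archive_stale(now=ONE_YEAR + 1)  # ["taxes-2008.pdf"]
drive.find("taxes")                    # [("taxes-2008.pdf", "cloud")]
drive.read("taxes-2008.pdf")           # pulled back down transparently
```

Because the catalog never forgets a file, search keeps working after the bytes themselves have left the drive, which is what makes “never throw anything away” practical.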
Caching techniques are some of the purest of pure computer science, and way over my head, but they can be very sophisticated, including pre-emptive fetching (prefetching) of data before anyone asks for it. With good enough prefetching, the “cloud cache” could appear to read your mind.
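One simple form of prefetching learns which file usually follows which, then fetches the likely next file from the cloud before you ask for it. The sketch below is a first-order successor model, a deliberately simple technique swapped in for illustration rather than a claim about how any real system does it; `SuccessorPrefetcher` and `top_n` are hypothetical, and the actual download is left to the caller.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


class SuccessorPrefetcher:
    """Hypothetical prefetcher: remembers which file tends to be opened right
    after each file and suggests the most common successors."""

    def __init__(self, top_n: int = 2):
        self.top_n = top_n
        # path -> Counter of paths opened immediately afterwards
        self.successors: Dict[str, Counter] = defaultdict(Counter)
        self.previous: Optional[str] = None

    def record_open(self, path: str) -> List[str]:
        """Record an open and return the paths worth fetching ahead of time."""
        if self.previous is not None:
            self.successors[self.previous][path] += 1
        self.previous = path
        return [p for p, _ in self.successors[path].most_common(self.top_n)]


prefetcher = SuccessorPrefetcher()
prefetcher.record_open("report.doc")
prefetcher.record_open("figures.xls")
prefetcher.record_open("report.doc")  # ["figures.xls"]: fetch it now
```

A real prefetcher would weigh recency, file size, and available bandwidth, but even this counting approach captures the “documents I always open together” pattern.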
If you want more evidence that this would work, consider this:
- Suppose you have a huge video library and can’t imagine storing it in the cloud. Now go to http://hulu.com. Surprise: your video library is in the cloud. A download is nothing more than a local cache miss.
- Your browser already uses a small fraction of your hard drive for caching. We could just turn it up.
- For an example of an always-full hard drive in practice, look at Apple’s Time Machine. Once your backup drive is full, it quietly removes the oldest backups (in this case to oblivion, but you get the idea).
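The always-full pattern is easy to model: keep adding snapshots, and when the next one would not fit, drop the oldest ones first. This is a simplified sketch; Time Machine’s actual retention policy also thins hourly and daily backups, which this ignores, and `RollingBackup` is a hypothetical name. In the “cloud cache” version, the evicted items would be uploaded rather than deleted.

```python
from collections import deque
from typing import Deque, List, Tuple


class RollingBackup:
    """Hypothetical fixed-capacity backup store: when a new snapshot does not
    fit, the oldest snapshots are removed until it does."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.snapshots: Deque[Tuple[str, int]] = deque()  # (name, size), oldest first

    def add(self, name: str, size: int) -> List[str]:
        """Store a snapshot and return the names of any snapshots evicted."""
        if size > self.capacity:
            raise ValueError("snapshot is larger than the whole drive")
        evicted = []
        while self.used + size > self.capacity:
            old_name, old_size = self.snapshots.popleft()
            self.used -= old_size
            evicted.append(old_name)
        self.snapshots.append((name, size))
        self.used += size
        return evicted


backup = RollingBackup(capacity_bytes=100)
backup.add("monday", 40)
backup.add("tuesday", 40)
backup.add("wednesday", 40)  # ["monday"]: the oldest snapshot makes room
```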
If only I were pursuing a PhD, I could work on this. *chuckle* Someone else is going to have to do it.
Comments
I have to admit, I’ve been living the dream these last few years
Code in SVN, Email in IMAP, Calendars in a network calendar, Contacts via .Mac & Plaxo… and there’s surprisingly little else of value on my system these days. Word docs? What value in those?