ow my head, or, distributed systems

My brain, as the Far Side cartoon says, is full.

(Sorry about the lack of posts for a month. School happened. It's winding up now.)

In lieu of actual content, here's the reading list for 6.824, MIT's distributed systems course, which is what's been occupying a large number of my waking hours for the last four months. We built a distributed filesystem for it. Whee, Paxos.

It's frustrating when academic papers don't list their date of publication. I usually end up going through the references and taking the latest publication date there as a rough estimate of when the paper came out, for mental calibration purposes, but that's annoying.

It's kind of depressing how many of these projects are nominally open source but have been saying "we're going to post the source Real Soon Now" ever since they were published. It's really depressing how few of these projects have obviously become something "real people" (by which I mean developers) actually use. I know other academic papers build on these ideas, and closed-source commercial software probably borrows them a lot without our ever finding out, but still, the projects themselves seem like such… dead ends. Without doing a literature search it's hard to get from the paper about project X to all the things which have used it or built on it, except on the relatively rare occasion where the research project has gone on to commercial or open-source success, e.g. Tor. It took me a decent bit of digging to figure out that anybody still cared about CoralCDN, for instance: the -announce mailing list last saw substantive traffic in 2006. I kept feeling déjà vu (I'm pretty sure I'd run into the Chord routing model before, for instance), but I couldn't figure out whether I'd read or been told about it in 6.033, or whether some project like OpenFT had picked it up (I paid attention to OpenFT when I was in high school, but it's another project that may or may not be dead).

My biggest random take-away from this course is that I need to read up on AFS and Zephyr and whatever other distributed systems I use a lot and figure out how the concepts we learned about get applied in those systems.

Ramble ramble. Anyway, I find distributed systems an interesting, if sometimes maddening, area of study, which continues to be relevant with mumble cloud computing mumble peer-to-peer mumble stuff. I want to play with them more.

4 thoughts on “ow my head, or, distributed systems”

  1. invisible applications

    Some projects do become things real people use, but you might need to look at the scale of Amazon or Google for somebody who has a real need for distributed systems. And unless they publish a paper in turn, like Paxos Made Live, the rest of the world may never know. I am worried that the competitive advantages of not publishing are too great and the field will become too much like medical drug research, where the cutting edge is locked away in closed institutions and therefore progresses more slowly. (N.B. I know nothing about drug research; it’s just my impression.)

    Also, I think that most of the research (at least that I’ve read) assumes a network like that available to academic researchers, widely distributed and dynamic with only a few nodes in any one place. Amazons or Googles, on the other hand, have their own datacenters, which changes a lot of things. (Perhaps there’s an unfilled niche for open-source systems that can be deployed on EC2?)

    I do remember this kind of frustration when I was doing my M.Eng. I recall getting the data from Vivaldi (from PDOS) handed down to me from James Cowling like some kind of monastic relic. Part of this is inherent to academia (your research might not be used, but it might inspire some other research that is used), and it may just be that distributed systems actually try to have some real-world utility.


  2. This may be selection bias, but I’ve noticed that research in computer graphics processing does tend to get rapidly subsumed into commercial applications. For example, recently there was a demo for “content aware fill” in Adobe CS5, and lo and behold, an open source implementation of a similar algorithm could be found in Paul Harrison’s thesis (I’ll note that Adobe likely put in the engineering effort to make the feature really polished, and maybe they’re using a slightly different algorithm). Even just sitting in on some lectures of MIT’s computational photography class, I saw really spiffy research doing impressive things that, if a company sat down and packaged it up for consumer use, would blow most existing software out of the water. But research is always a few steps ahead of industry: that’s what makes it research. 🙂

  3. Chris Lesniewski-Laas

    It takes an order of magnitude more effort to make something real than to make a research prototype. That investment is usually better made by the commercial or open-source world than by academics. I haven’t found the paper trail (of what influenced what) that hard to follow, in general. Google helps a lot. But perhaps it’s partially down to my familiarity with the people involved.

    Incidentally, Mike Freedman gave a follow-up talk about his experiences with deploying and maintaining Coral at NSDI 2010 a month ago. It was the talk right before mine.

  4. Yeah, I found the NSDI ’10 Coral followup paper in my searching. It was fascinating to read when I should have been reading the other papers for the class instead. More like that, please. 🙂
