Verbs for Devices


Much as I did last year, I’m writing a proposal for research in a particular field that would be cool for someone (perhaps me) to pursue; this year’s comes as I finish up an independent study in human-computer interaction– with, interestingly, one of the same professors from that course last year, who was cool enough that I was inspired not just to take this course, but to have him advise my thesis as a whole. I haven’t been blogging much as I finish said thesis, along with the myriad other things that come at the end of five years at Hopkins, but I thought this proposal might interest a few of you. Enjoy.

“He glances up and grabs a pigeon, crops the shot, and squirts it at his weblog to show he’s arrived.” —Accelerando, Charles Stross

In a nutshell, my research proposal is to make the above quote possible. More specifically, the greatest gap in human-computer interaction today is the absence of integrated personal networks that share not just a common networking protocol, but a common set of verbs. My research proposal would be to create a lightweight standard for verb-based interaction between discrete software and hardware agents.
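
To make that concrete, here’s one purely hypothetical shape such a verb-carrying message might take, sketched in Python; none of the field names come from an existing standard– they’re simply meant to illustrate the idea.

```python
import json
import time
import uuid

# A hypothetical envelope for a verb-based message between agents.
# Every field name here is made up for illustration, not taken from a standard.
message = {
    "id": str(uuid.uuid4()),        # unique message identifier
    "verb": "blog this",            # the requested action: "crop this", "display this", ...
    "actor": "camera-01",           # the agent that produced the message
    "object": {
        "type": "photo",
        # where the payload actually lives on the personal network
        "href": "pnet://camera-01/photos/1138.jpg",
    },
    "published": int(time.time()),  # unix timestamp
}

print(json.dumps(message, indent=2))
```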

It is probably most useful to examine the discrete steps hidden in the overly concise sentence above, to see where current technology is lacking. We will assume, for the time being, that the devices share at their base a common network– something along the lines of the MIT Media Lab ‘MIThril’ project’s Body Network; battery and wireless technology have advanced sufficiently since that 2003 research that we can give each device its own power source and use a wireless standard, such as Bluetooth or XBee, for communication (XBee provides a better model from which to work, as it does not depend on a master-slave setup, but either could work; a rough sketch of device discovery on such a network follows the list below). So then, there are several discrete steps involved:

  1. Human directs camera to acquire a photo.
  2. Camera sends photo to core computer.
  3. Human manipulates photo.
  4. Human sends photo to weblog.
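
Before digging into those steps, here is the promised sketch of the network assumption: a rough, purely illustrative bit of Python in which devices announce themselves and the verbs they understand, using UDP multicast on an ordinary LAN as a stand-in for the Bluetooth/XBee layer. The group address, port, and message shape are all invented for the example.

```python
import json
import socket
import struct

# Invented parameters: UDP multicast on an ordinary LAN as a stand-in for the
# Bluetooth/XBee personal-network layer described above.
GROUP, PORT = "239.0.0.42", 5007

def announce(name, verbs):
    """Broadcast which verbs this device understands."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    payload = json.dumps({"device": name, "verbs": verbs}).encode()
    sock.sendto(payload, (GROUP, PORT))

def listen():
    """Print every announcement heard on the personal network (runs forever)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, addr = sock.recvfrom(4096)
        print(addr, json.loads(data))

# e.g. announce("camera-01", ["crop this"])
#      announce("blog-agent", ["blog this"])
```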

All of this can be done using current technology, of course– but it’s hundreds, if not thousands, of button presses and mouse clicks: first, I have to take the camera out. Then I have to click the shutter button. Then I have to plug the camera into my computer. Then I have to tell the computer to import the photo. Then I have to open Photoshop to crop the image. Then I have to upload the photo to my website. Then I have to create a blog post incorporating the photo. Forty-five minutes later, this is no longer cool, or even vaguely interesting.

Using smartphones as universal devices has allowed us to cut some of these steps out; I can now take a photo, manipulate and tag it, and upload it to my Flickr stream all in one iPhone program. I have to make some sacrifices to do so, however: first among them, I have to use the iPhone’s tiny, mediocre camera. It’s true that the iPhone camera will likely be upgraded soon, but even then, it will only be equal to a decent camera from eight years ago– not, say, a modern DSLR. If I use a DSLR, I am then at the mercy of all the steps outlined above; even if I use a (really neat) Eye-Fi card, I still have to do the uploading and blogging steps manually, because nothing speaks all of those languages at once– the photo upload, the blogging, the editing, and the camera.

Allow me, then, to propose an alternative. I take a photo with my camera, and it just dumps it on my personal network. My photo editor– whether that’s on my phone, my laptop, my wearable computer, or my neural net– sees a photo on the (metaphorical) wires and grabs it; when it’s done, it dumps the photo back on the network, with the instruction that something blog it. Then, a device that understands blogging can handle the upload and the blog post; if it pushes the photo to Flickr, perhaps it also sends an AtomPub squirt out for the blog.
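
Here’s a toy, in-process sketch of that flow, using nothing beyond the Python standard library; the “network” is just a table of verb handlers, and every device name and verb in it is made up for illustration.

```python
from collections import defaultdict

# A toy, in-process stand-in for the personal network: agents register the
# verbs they understand, and anything can drop a message on the "wires".
handlers = defaultdict(list)

def register(verb, handler):
    handlers[verb].append(handler)

def publish(verb, obj):
    for handler in handlers[verb]:
        handler(obj)

# The photo editor understands "edit this": it crops, then asks for a blog post.
def editor(photo):
    photo["cropped"] = True
    publish("blog this", photo)

# The blogging agent understands "blog this": a real one would push to Flickr
# and send the AtomPub squirt here.
def blogger(photo):
    print("posting to weblog:", photo)

register("edit this", editor)
register("blog this", blogger)

# The camera doesn't know who edits or who blogs; it just dumps the photo on
# the network with a verb attached.
publish("edit this", {"source": "camera-01", "file": "pigeon.jpg"})
```

The point of the sketch is that the camera never mentions the editor or the blogging agent; swap either of them out for something better, and nothing upstream has to change.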

So, aside from a bit more convenience, what’s the power here? The power is that we don’t need direct integration. Currently, if you want everything actually to work, you buy Apple. Why is that? Because I know that my iPhone speaks the same language as my MacBook, and that if I want them to send a photo over to my TV, I can use an Apple TV box and it’ll “just work.” If I then want my photos backed up, my Time Capsule will (probably) handle it. And that’s great: if I want my entire life handled for me, all I need is, say, ten to fifteen thousand dollars, and I can have a wonderful iUniverse. What Apple has done here is not teach one device to do everything: my MacBook doesn’t know how to talk to a TV, and my TV doesn’t need to know how to manage rotating backups from across my network. Instead, Apple has figured out the lesson UNIX people learned decades ago: make a bunch of small tools, each of which does one thing really well, and then chain them together. Apple could be better about this, but currently, it’s the best there is.

The thing is, while I’m a happy Mac person, I don’t want to be locked into the iUniverse. I want a MyUniverse, with MyTools and MyStuff, because while Apple’s stuff does indeed work, it’s not always the best in its class; XBMC has more features than the Apple TV, and a Drobo is a much better storage machine than a Time Capsule. To return to our original example: if Apple ever makes a camera, it might be very nice, and it will certainly be shiny and white (or brushed aluminum), but it may not be the best in its class, or it may simply not be the one I want. It would be nice if everything– Apple or not– could speak this common language.

I would suggest, as a step toward that common language, taking a page from the Activity Streams (activitystrea.ms) book and using verbs as the common thread; that way, to swap in a device, all I have to do is teach it to respond to the same verb– for instance, “blog this” or “play this” or “display this”– and then everything else can interact with it in just the same way. We have this in some systems already, of course– the web has some really good API standardization, like the RESTful style, which is indeed verb-based– but not at the local, personal scale, and that’s where we need to go.
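
To illustrate the swapping point, here’s one more deliberately small sketch. Neither class below touches a real Apple TV or XBMC API; both are placeholders. As long as both boxes answer to the same verb, the sender never has to change.

```python
# Neither class below touches a real API; they are placeholders that only
# demonstrate two different devices answering to the same verb.

class AppleTV:
    def handle(self, verb, obj):
        if verb == "play this":
            print("Apple TV playing", obj)

class XBMCBox:
    def handle(self, verb, obj):
        if verb == "play this":
            print("XBMC playing", obj)

def send(device, verb, obj):
    # The sender only knows the verb, never the device's own interface.
    device.handle(verb, obj)

# Swapping the living-room box means changing one line here, and nothing else.
send(AppleTV(), "play this", "pigeon.jpg")
send(XBMCBox(), "play this", "pigeon.jpg")
```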

Thus, if given a year to do a major research project in HCI, that’s what I would suggest: that we figure out a common language for all these discrete things to speak, and then spend some time teaching the devices actually to speak it. That way, when we move from wearable computers to implantable computers, I don’t need to teach my camera how to interact with its new semi-robotic overlord, or indeed teach its new semi-robotic overlord what a camera is: they can both speak a common language of actions, and all will be well. This will improve human-computer interaction by letting the human stop messing with it, which is what most humans actually want from their computer interaction: the power of infinite flexibility, but also the ability to have things just work, without comment or complaint or five thousand mouse clicks.