May 2008 Archives

Like a Cylinder


Dear readers,

At long last, I've done it! I've left a school in a normal amount of time!

(For reference, I spent only five years in elementary school, then I spent three years in one high school before transferring to an IB school, which is a mandatory two-year program; spending five years in high school is somewhat difficult to explain to the neighbors, regardless of its merit.)

But this time, I've actually finished Hopkins, in eight semesters! W00t! And I got to wear shiny white and maroon honor cords as well (UPE), which is always nice. (Some people were wearing 8+ cords, which seems like a lot to me; presumably they were highly honorable, or at least colorful.)

Of course, I'm not leaving Hopkins; that would be too easy. I finished my Bachelor of Science degree, but I've now ascended from Concurrent BS/MSE status to full-time Graduate status; while I was a grad student before, this makes the whole situation much easier to explain. (Also I can take far fewer classes that I don't like, which is a nice side benefit of grad school.) So I'm still planning to finish my MSE in one additional year; all that's necessary is six classes, and a thesis. (It doesn't sound so bad when reading it on the page, so for correct context, please read "six classes," then turn on this music, then read "a thesis." Got it now?)

So my parents and brother came out from Montana / down from Yale on the 15th, and only left yesterday, for a full two weeks of adventures. And packing. (A lot of packing; I'm leaving my apartment for another even closer to campus (technically, the same distance from campus-- across the street-- but closer to the School of Engineering, where my classes are), and Shannon's leaving her apartment for one underneath my new apartment. So that was interesting, but much easier with all the help.)

And we went up to Yale to hear Quinlan play (in a pretty chapel) before he leaves to go on tour in Italy with the YSO, then to New York City to see Spamalot on Broadway, which was fantastic. Now they've taken my cat (Lysistrata) back to Montana with them for the summer, and I'm staying in Virginia for the last days until I fly to San Francisco, CA Saturday morning! Yay!

Other fun things:

  • I got a new iPhone today-- after only 60-odd days. :-( Thank goodness for AppleCare-- and also for the Apple approach to replacements, which was awesome. I described my problems (short version: "it don't work," with crashes, lockups, batteries dying, and other nastiness), they checked it over for about 45 seconds at the Genius Bar, then said "yep, seems bad," and handed me a new phone! Compare this to my blood pressure ticking up to stroke levels to get Dell to acknowledge anything bad exists in the universe-- I think my Apple switch will be permanent, as I'm in love. (Although I'll make sure to renew AppleCare.)

  • We had a new Upsilon Pi Epsilon initiation, of these guys, and I got elected President-- a completely powerless position (it's an honor society, not a legion of doom), but it's fun nonetheless. (In related news, should anyone wish to join a legion of doom with me at its helm, please contact me.)

  • I did a fun Information Retrieval final project, which consisted of graphing XFN relationships. I could only run the simulation out to one level of contacts (because then Kevin Rose's friends overwhelmed any static graph), but it was fun, and my professor seemed to enjoy it.

  • I finally broke down and bought a Wii, which is incredibly fun to play with.

So then, off to California in just over 24 hours (as I leave in the middle of the night); I'm excited. Time to run for now!

Evil


Well, the last month has been exciting, in more than one way. I've been meaning to post a long rant about the evils of my roommate (who recently lost his mind), but heck, the bullet points speak for themselves, and I'm not in the mood for a rant, so here goes:

  • He was thrown out of Hopkins last May (2007), after signing a lease with me for the year.
  • He eventually decided to come back to Baltimore, to work for some deadbeat games company.
  • He started dating a (clinically, as diagnosed) schizoid woman, who makes up stories and tells them to people. (Hopkins threw her out as well.)
  • Around December, said woman makes up a particularly devastating set of lies about me, and then informs him he can't be friends with me anymore (this after four years of rooming together.) She also tells all of my friends said lies, in conjunction with her roommate.
  • Things proceed in ever-worse terms until April, when he suddenly moves most of his stuff out in the middle of the night, without warning. He also steals some of my things.
  • In May, he and his girlfriend proceed to institute a reign of terror, vandalizing my apartment, throwing things through my windows, and, after classes were over, destroying a large portion of my clothing. They then threatened the life of my cat, necessitating an emergency moveout of both of us. (Don't worry; she's safe, and now in Montana with my parents.)
  • He then proceeded to go to a Dean at Hopkins, lie about his standing, and rant for several hours. (Luckily, said Dean thinks he's insane; it is often useful to have a working relationship with the administration, and since I'm the leader of two student groups, I've tried to cultivate one.)
  • Finally, in a move straight out of Jack Thompson's playbook, he wrote a letter to my parents, alleging all sorts of horrible things. Fortunately, as my parents were in my apartment as he and his girlfriend proceeded to vandalize it, this didn't have a lot of impact.

So here it is, dear readers: be careful who you date. They can really screw you up. My parents have always said this, of course, and I know they appreciate my finally agreeing with them. (Though I haven't had any problems of that sort, it's a useful object lesson.)

I've found even listing the above so downbeat that I'm going to write a second blog post with all the much more fun things that've happened in the interim; go read that for how I actually feel.

New Tech


So I've been busy the last few days, doing a variety of work-type things, but also trying to occasionally explore fun new tech, or changing my old tech, as the case may be.

For instance, while creating my last post, I did what I often do, composing in a plain-text editor and then pasting into Movable Type; the only difference was that this time, MT completely freaked out at me, and therefore refused to let me use its nice little "make link," "make bullet point," etc. buttons. I'm not sure whether it was because Firefox was freaking out, having eaten most of the memory on my system, or whether MT was angry at how long my post was (nearly five printed pages, single-spaced; not long for a paper, but huge for a blog post), or precisely what happened, but after twenty minutes of futzing with it, I realized I didn't have the time to cajole it into working, and spent a solid hour making all the links linked and the bullets bulleted, whipping out my manual (X)HTML skills of yore. Which, while I don't mind doing it, was quite a bit slower than making MT do it for me.

Afterward, I thought to myself. "Self," I thought, "isn't there supposed to be some good way to do lightweight markup in text for exactly this reason?" "Yes," I thought, "there is, but I don't know it." Well, it's time, then, to learn another new language, and noting that Movable Type supports not one, but two such languages for that very purpose (Textile and Markdown), I set about the documentation for each, and eventually settled on the latter. This post was therefore written in Markdown, and I have to say, it's nice; it's not a revolutionary change or anything, but I'm happy with it. (Textile seemed a bit too lightweight, frankly; more "silly snippets of bloggery" than anything I'd write anything in, and now that I can submit whole papers as blog posts, who knows what I'll need from it next?)

Another wonderful experience, with the same post; as I was preparing to upload the text I'd written and format it with MT, I went to log in to Movable Type, as usual, on my OLPC in class. As I did so, I encountered the following error:

Access to this site has been denied.

"Uh, why?"

Access to the category "Sex" is disallowed by your site policy.

"Buh?"

That's right, folks. You know, I've been accused of many things, but "running a porn site" hasn't been one of them previously. Obviously, this is yet another abusive content filter, running amok-- but it's fairly awful, sitting at one of the world's great universities. This, then, provoked an angry letter to IT@JH-- who (naturally) still haven't had the guts to respond. Congratulations, guys, you've made life better at Hopkins again. (These are the same people who block SSH access to Hopkins computers on the wireless network; understand that anyone on the Internet can SSH to these computers, except people on the wireless network on our very own campus. Gotta love it.) In any case, this did make for an amusing story to tell my Digital Preservation class later that day.

In other news, I've been looking into ways to do GTD without carrying around both my HipsterPDA and my iPhone all the time, and while I was initially suspicious of it, Remember the Milk has really won me over. It provides all the functionality I need from the tasks portion of my HipsterPDA, and it has a great iPhone interface to replace printing cards from my GTDTiddlyWiki. Part of the total solution, as well, is to move my agenda back out of the wiki and into a calendar-- this time (as I'm now on Mac, and have the nice iPhone integration), iCal. All of it flows together nicely, and now I have fewer things to take with me out the door. The only thing remaining: put my nicely shrunk-down PDFs that I carry with me in there (a Penalty Guide reference card for Magic, schedules for MARC and the JHMI Shuttle, etc.) somewhere the iPhone can get to them quickly and directly.

Graduation is only 16 days away; my parents show up in 9 days, which should be a good time; I'm moving again, so they get to help with that, but otherwise much partying is in order. Then up to Yale to hear Quinlan play, then to New York City for a bit, then back to Baltimore for a few days before they leave, then I leave just after them for San Francisco! My housing for the next twelve months is taken care of (ugh, signing two leases (and putting down two deposits) at once is no fun, but it's all handled from now, through my internship, and my last year at Hopkins), Valleywag just published some great photos of the Six Apart offices, and the whole thing seems quite exciting!

In the meantime, though, I need to go write a paper, do a project, study for three finals, and figure out how to pack my apartment up yet again.

Number of outstanding tasks: 11

This is my final project for 600.409, Digital Preservation, here at Hopkins; it's been an incredibly fun and rewarding seminar with four professors (for our eight students): Randal Burns, Sayeed Choudhury, Tim DiLauro, and John Griffin (winner of the minimalism award for his business website). They've been great to learn from for the entire semester.

Our final project requirements are fairly open, basically requiring real thought about the preservation of digital artifacts, whatever (after a semester of discussion) we've construed that to mean. In clearly the best project requirements sheet ever, after a long discussion of the requirements and format, the professors state:

"Students are not required to follow the above format, and may instead propose any project leading to any deliverable that represents an equivalent level of effort."

Later, in the evaluation section, they state:

"Projects will be evaluated on the process by which the results are obtained, not on the strength of the results themselves. Unusual hypotheses are encouraged and negative results are acceptable. It's the journey, not the destination."

Clearly, a great assignment. One last metanote, then I'll start with what I actually wanted to explore: after I asked how to deliver my paper (one of the suggested formats was a web page of some sort), the response said that possible acceptable formats would include:

  • "Delivering a PDF file by email.
  • Setting up a web page visible only to your friendly instructors.
  • Setting up a web page visible to the whole world.
  • Posting your writeup on your blog.
  • Starting a flame war in a newsgroup summarizing your major points.
  • Digging your results.
  • Etc."

Wow. :-)

So then, on to the actual work.

For my project, I wanted to explore digital archives (on which we've spent quite a lot of time), from the angle of "How should scholars get access to a digital archive?" This investigation started from my experience obtaining a reader card at the Library of Congress, so I will begin my exploration with that, very real, scholarly repository.

To access the Library of Congress, one needs to meet their researcher requirement:

"The Library is open to all researchers above high school age (18 years or older) possessing a valid photo identification (e.g. driver's license, passport) with a current address." Library of Congress FAQ

So this is not a highly restrictive archive-- one would hope that one would have a scholarly purpose, but this is not (at this time) strictly required. In this case, one then proceeds to fill out a form asking for a bit of personal information (a current address is about as stringent as it gets, although they do ask survey questions relating to what you intend to study). Then they take your photo, and hand you a card; while the website says that it expires in two years, I was told that it does not actually ever expire; as long as one keeps the card, it can be used in perpetuity (leading to some researchers, now of a rather advanced age, still having their decades-old card with their original photo of a bright-eyed graduate student). With this card, then, one can access most of the resources of the Library. (A few reading rooms have much more stringent requirements-- in most cases, to protect exceedingly rare and valuable books, but they aren't particularly relevant to the digital analogy-- data is not usually fragile in the same way, and so wouldn't be protected for those reasons.)

So wonderful; I have a card identifying me as a reader (with the traditional terrible government-issue photograph), and I can make my librarian mother proud that her son is now a "real" scholar. If I lose it, I can go back to the registration area, and they will reconfirm my physical address, then give me a new one. They will never update my photo or any of the other information. When I go to use a reading room, I then present the card; while it has a barcode on it that could be used to confirm it's real, the attendant simply compares my picture to the card, and lets me in-- even forty years after I first obtained the card.

This idea boils down to a Shibboleth (in the Biblical sense; the academic sense we'll get to later):

"Gilead then cut Ephraim off from the fords of the Jordan, and whenever Ephraimite fugitives said, 'Let me cross,' the men of Gilead would ask, 'Are you an Ephraimite?' If he said, 'No,' they then said, 'Very well, say Shibboleth.' If anyone said, 'Sibboleth', because he could not pronounce it, then they would seize him and kill him by the fords of the Jordan. Forty-two thousand Ephraimites fell on this occasion." Judges 12:5-6, New Jerusalem Bible

"Speak 'friend' and enter." [Lord of the Rings]

The latter, obviously, for the more geeky / less religious in the audience; they are equivalent stories, in that I'm allowed access because I say (in the correct way for the resource I'm trying to access) that I would like to have access, and that I am a scholar who might benefit from it; I am then taken at my word.

Let's suppose, though, that the Library really did check whether I was affiliated with a legitimate research institution-- they could call Hopkins, and Hopkins would likely tell them (maybe not with FERPA, but probably) that I was a graduate student in good standing. Then they would let me in. Perhaps they would do that on an ongoing basis, but given that their cards are good forever, probably not.

So then, we have three use situations, and four added advantages, to the Reader Card system as it currently stands. The three use situations are:

  • First-Time Access - Registering as a reader and obtaining the card
  • Ongoing Access - Using the Reading Rooms
  • Lost Token - Going back and getting a new card

The four additional advantages of the Reader Card at the physical Library are:

  • To get some sense of the total number of people who use the library
    • In addition, of course, to the log books at the doors, but having a "total number of users" is different and useful to compare to "total visitors."
  • To collect basic statistical information on the Readers.
    • Where I'm from / what university I represent might be something nice to know.
  • To help to discourage passers-by from using the facilities just to use them.
    • Now, this is somewhat controversial (aren't libraries supposed to be *giving* information?), but it's not without merit to think the Library might have an interest in dissuading tourists from going in and poking around just for its own sake; as a research library (rather than a community library), it has a specific goal, and the millions of visitors to Washington, D.C. aren't necessarily interested in that work. Tourists have a glass cage they can stand in to see the Main Reading Room, if that's all they'd like to do. (As a teenager, I was highly indignant to be denied access on general principle, but I digress.) The simple act of making people go to a basement and register does perform basic throttling.
  • To help the Library provide more targeted services on an as-needed basis.
    • For instance, my user account, represented by the card, can be granted access to restricted areas of the Library.

So then, now let us turn our attention to digital identity, as we create a digital scholarly repository. We must preserve the three use situations, and we'd like to preserve the additional advantages (and create more) if possible, through this transition. Let us then consider several alternatives.

  • Unfettered access.
    • Anyone could have access, no need for registration. The data's there, why not use it?
    • This makes some sense for archives whose contents are not restricted; for instance, Project Gutenberg is a digital library for whom "checking out a book" doesn't require an account. It works much less well if the contents are sensitive, if they're restricted in some way (due to copyright, for example), or if having limitless access would hurt the archive in some way (for instance, an infinite number of users hammering on a video archive would likely cause access issues for scholars).
    • We get to skip all the overhead of having to maintain user information, and we still get to use normal techniques to get total users / total visits information (much as we get on any web page). We lose all the other advantages, however.
  • "Free Signup"
    • This is the approach that, for instance, the New York Times (http://nytimes.com/) uses. Give them your email address, create a password, and they'll give you access to everything.
    • There are standard methods for dealing with the three situations of normal use, and depending on the questions that are asked during signup, one can retrieve the statistical information, and possibly targeted services-- or at least, targeted ads; the latter may be helpful in maintaining a digital repository through offsetting operating costs.
    • If one requires a scholarly email address for signup, one can even get the scholarly affiliation requirement; this is how Facebook operated until recently, and there were some advantages to that arrangement.
    • The disadvantages, though, are that the repository is now tasked with maintaining all of the information, and dealing with lost/forgot password issues. The Library does this in real life, but as it turns out, there are additional options-- so why settle for only sufficient?
    • In addition, if there are any restrictions on the archive, it's fairly trivial to get around them; a robot can create hundreds or thousands of accounts in a day. So if you wanted to limit how much users could consume in a day-- too bad.
  • "Somebody Else's Problem"
    • Instead of creating our own authentication system (based on emails, or Reader cards, or what have you), let's instead use someone else's. Indeed, this is what the Library does in part; in order to confirm your name and address before it issues a Reader card, they check your driver's license or passport. They are, therefore, relying on someone else to assert your identity; then, because that other entity has given you an identity, they'll give you one, too.
    • We can outsource the third standard use case, and have only a lightweight first case; we only need to collect the information we deem important, and we don't have to mess around with passwords or usernames. If we're giving free account creation, we don't have to store *any* user information, but if we're relying on assertions from the third party, then we might wish to; again, whatever we think is important is what we go with.
    • We get the four listed advantages for free; we know total users and total visits, we collect the statistical and targeting information we want (including targeting for advertising, as in #2, should we be so inclined), and needing an account elsewhere to create one at the archive is precisely as much of an impediment as we choose it to be; more on this momentarily. There are additional benefits possible as well.
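
To make the third alternative a little more concrete, here is a minimal sketch in Python of what the archive's side of the bargain might look like. Everything in it is an assumption for illustration: the names `ReaderRecord`, `ArchiveAccounts`, and `verify_with_provider` are hypothetical, and the verifier is just a stand-in for whatever real exchange happens with the outside identity provider. The point is only that the archive keeps an identifier and whatever survey answers it cares about, and never a password.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ReaderRecord:
    """All the archive stores: an external identifier plus optional survey answers."""
    identity_url: str
    survey: Dict[str, str] = field(default_factory=dict)
    visits: int = 0


class ArchiveAccounts:
    """Hypothetical account store that holds no passwords at all."""

    def __init__(self, verify_with_provider: Callable[[str, str], bool]):
        # verify_with_provider(identity_url, assertion) is an assumed stand-in
        # for a real protocol exchange with the external identity provider.
        self._verify = verify_with_provider
        self._readers: Dict[str, ReaderRecord] = {}

    def login(self, identity_url: str, assertion: str,
              survey: Optional[Dict[str, str]] = None) -> Optional[ReaderRecord]:
        """Handle both first-time and ongoing access with one code path."""
        if not self._verify(identity_url, assertion):
            return None  # the outside party would not vouch for this visitor
        record = self._readers.get(identity_url)
        if record is None:
            # First-time access: create the lightweight record on the spot.
            record = ReaderRecord(identity_url, dict(survey or {}))
            self._readers[identity_url] = record
        # Ongoing access: just count the visit.
        record.visits += 1
        return record

    # There is deliberately no "lost token" method: recovering a lost
    # credential is the identity provider's job, not the archive's.

    def total_users(self) -> int:
        return len(self._readers)

    def total_visits(self) -> int:
        return sum(r.visits for r in self._readers.values())
```

With a toy verifier such as `lambda url, assertion: assertion == "ok"`, two successful logins from the same identifier would show up as one user and two visits, which is the "total users" versus "total visitors" distinction from the Reader Card list.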

There are several digital equivalents to this external identity idea (including one expressly designed for academia, called, for reasons that should be clear from above, Shibboleth), but the one that has found real and broad-based success in the "real world" is OpenID. OpenID is a federated open standard; anyone can run their own OpenID server. In addition, OpenID has URLs as its username construct, which allows them to be globally unique by definition (so one doesn't have to worry about two people attempting to register the same name on your site), and allows delegated identity, so that users can use personal websites as OpenIDs without having to run their own servers. For instance, my OpenID is http://ussjoin.com. I do not run my own OpenID server; instead, I currently delegate that task to MyOpenID.com. Should I grow dissatisfied with them, I can delegate it elsewhere; these are all issues that I can handle on my own, and they will not impact on the consumer site-- in this case, the digital archive. This also helps to solve the long-term authentication problem; with delegation, users can point old identities to new, so that a chain of identity can exist-- something that does not in any real fashion happen with photo identification. (OpenID currently has delegation limits, but they can be worked with.)
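
For the curious, delegation in OpenID 1.1 works by putting two `<link>` tags in the page at your own URL: `openid.server` names the provider's endpoint, and `openid.delegate` names the identity you hold there. The sketch below, using only Python's standard library, shows roughly how a consumer site might read them. The function name `find_delegation` and the example URLs are made up for illustration (they are not the real values for my site or any provider), and a real consumer would fetch the page over HTTP and look only inside `<head>`, which this simplification skips.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Tuple


class _LinkCollector(HTMLParser):
    """Collect the href of each <link> tag, keyed by its rel value(s)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        rel = attr_map.get("rel") or ""
        if href is None:
            return
        # rel may hold several space-separated values; keep the first href seen.
        for rel_value in rel.split():
            self.links.setdefault(rel_value.lower(), href)


def find_delegation(html: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (server, delegate) from OpenID 1.1 link tags, or None if absent.

    delegate is None when the page names a server but delegates nothing.
    """
    collector = _LinkCollector()
    collector.feed(html)
    server = collector.links.get("openid.server")
    if server is None:
        return None
    return server, collector.links.get("openid.delegate")


# Hypothetical page contents; the URLs are placeholders, not real endpoints.
EXAMPLE_PAGE = """
<html><head>
  <link rel="openid.server" href="https://provider.example/server">
  <link rel="openid.delegate" href="https://reader.provider.example/">
</head><body>My home page</body></html>
"""
```

Switching providers then means editing those two tags and nothing else; the consumer site keeps seeing the same personal URL.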

So this seems useful on its face, but let's look at the additional advantages of outsourced identity:

  • We can grant trust based on someone else trusting you. This means that we could only give accounts to those logging in with academically-granted OpenIDs. Alternately (and much more powerfully), we can allow someone to create an account with one OpenID, and log in once with an academic ID to prove this link. This identity claiming is essentially how ClaimID (http://claimid.com/about) operates, and it's a powerful tool-- I can get the power of my academic affiliation, but I don't need to use it to log in. This is how we can tailor the barrier to entry to the level the archive wants.
  • This outsourced trust helps defeat robots, a major flaw in scheme #2; while it's true that robot overlords could register elsewhere first, then at the archive, the additional steps required add overhead to them. This has actually worked at Ma.gnolia, a social bookmarking site; they have posted an explanation at their blog.
  • It helps integrate the archive into the rest of the world; rather than having YATIHTCAIMEEW (Yet Another Thing I Have To Carry Around In My Ever-Expanding Wallet), the same virtual "card" gets me in everywhere. This is another cited reason at the Ma.gnolia article. Sourceforge has added OpenID support (just today, actually) for this reason.
  • Detaching the archive from a login system, through not creating one, allows the archive to more easily change in the future. If, in ten years, everyone has an identity chip implanted in their foreheads, OpenID might not be particularly useful-- but the layer of indirection created (and required) through externalizing authentication will allow the shim layer (what tells the archive that a user is logged in) to be simply and quickly recreated for the new ForeheadChip system; the shim still answers the same simple question (isTheUserLoggedIn()), whether they logged in with an OpenID, a ForeheadChip, or a telepathic newt. As digital archives struggle with porting data and metadata, it would be nice not to have to deal with reworking their authentication systems as well, each time technology is upgraded.
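
The shim idea in that last bullet can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the class names, the `session` dictionary, and its keys (`verified_openid`, `chip_serial`) are all hypothetical, and a real shim would check a verified assertion rather than the mere presence of a key. What it does show is that the archive's own code asks one question and never learns how the answer was obtained.

```python
from abc import ABC, abstractmethod
from typing import Dict


class LoginShim(ABC):
    """The only thing the archive knows about authentication."""

    @abstractmethod
    def is_user_logged_in(self, session: Dict[str, str]) -> bool:
        """Answer the one question the archive ever asks."""


class OpenIDShim(LoginShim):
    def is_user_logged_in(self, session: Dict[str, str]) -> bool:
        # Assumes an earlier step stored the verified identifier in the session.
        return bool(session.get("verified_openid"))


class ForeheadChipShim(LoginShim):
    def is_user_logged_in(self, session: Dict[str, str]) -> bool:
        # Purely imaginary future system; only the shim changes, not the archive.
        return bool(session.get("chip_serial"))


def serve_document(shim: LoginShim, session: Dict[str, str], document_id: str) -> str:
    """Archive code: depends on the shim interface, never on OpenID itself."""
    if not shim.is_user_logged_in(session):
        return "access denied"
    return f"contents of {document_id}"
```

Moving from OpenID to ForeheadChip (or newt) would then mean passing a different shim to `serve_document`, with the archive code itself left alone.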

It seems, then, that externalizing the problem both comes closest to what the Library is doing right now and provides benefits that the Library can't get today (in addition to the ones it can); our quest for a digital transition has found its goal. OpenID today seems to provide the things we want from authentication and to take away a lot of issues that we don't want to deal with if we can avoid them, and externalization in general allows us to migrate with the shifting winds of technology, with much less pain than data migration.

So where do we go from here? If I had a year to continue to explore these issues (in unrelated news, I'm looking for a thesis and/or project advisor for my MSECS :-) ), these are some of the directions I might go in:

  • Creating a general, usable system for incorporating live-verifiable authentication and authorization assertions from trusted identity brokers into a minimal-information identity system (Short title: Ident-i-Eeze In Real Life, or: Don't Give the Barkeep Your Home Address).
  • Extensions to OpenID to allow its use in high-importance transactions.

I hope that both my professors and anyone lucky enough to stumble in off the virtual street (and persistent enough to have read this far) have enjoyed reading this exploration. One of the advantages of this format is that though the article is now concluded, the conversation can continue at length in the comments; fittingly, my blog accepts OpenID comments, so feel free to log in with any OpenID-providing organization and tell the world your thoughts!