Long-Term Authentication for Digital Archives
Thursday, May 01 2008

This is my final project for 600.409, Digital Preservation, here at Hopkins; it's been an incredibly fun and rewarding seminar with four professors (for our eight students): Randal Burns, Sayeed Choudhury, Tim DiLauro, and John Griffin (winner of the minimalism award for his business website). They've been great to learn from for the entire semester.

Our final project requirements are fairly open, basically requiring real thought about the preservation of digital artifacts, whatever (after a semester of discussion) we've construed that to mean. In clearly the best project requirements sheet ever, after a long discussion of the requirements and format, the professors state

"Students are not required to follow the above format, and may instead propose any project leading to any deliverable that represents an equivalent level of effort."

Later, in the evaluation section, they state

"Projects will be evaluated on the process by which the results are obtained, not on the strength of the results themselves. Unusual hypotheses are encouraged and negative results are acceptable. It's the journey, not the destination."

Clearly, a great assignment. One last metanote, then I'll start with what I actually wanted to explore: after I asked how to deliver my paper (one of the suggested formats was a web page of some sort), the response said that possible acceptable formats would include:

Wow. :-)

So then, on to the actual work.

For my project, I wanted to explore digital archives (on which we've spent quite a lot of time), from the angle of "How should scholars get access to a digital archive?" This investigation started from my experience obtaining a reader card at the Library of Congress, so I will begin my exploration with that very real scholarly repository.

To access the Library of Congress, one needs to meet their researcher requirement:

"The Library is open to all researchers above high school age (18 years or older) possessing a valid photo identification (e.g. driver's license, passport) with a current address." Library of Congress FAQ

So this is not a highly restrictive archive-- one would hope that one would have a scholarly purpose, but this is not (at this time) strictly required. One then proceeds to fill out a form asking for a bit of personal information (a current address is about as stringent as it gets, although they do ask survey questions relating to what you intend to study). Then they take your photo and hand you a card. While the website says that the card expires in two years, I was told that it never actually expires; as long as one keeps the card, it can be used in perpetuity (leading to some researchers, now of a rather advanced age, still carrying their decades-old card with its original photo of a bright-eyed graduate student). With this card, then, one can access most of the resources of the Library. (A few reading rooms have much more stringent requirements, in most cases to protect exceedingly rare and valuable books, but they aren't particularly relevant to the digital analogy-- data is not usually fragile in the same way, and so wouldn't be protected for those reasons.)

So, wonderful: I have a card identifying me as a reader (with the traditional terrible government-issue photograph), and I can make my librarian mother proud that her son is now a "real" scholar. If I lose it, I can go back to the registration area, and they will reconfirm my physical address, then give me a new one. They will never update my photo or any of the other information. When I go to use a reading room, I present the card; while it has a barcode on it that could be used to confirm it's real, the attendant simply compares my face to the picture on the card, and lets me in-- even forty years after I first obtained the card.

This idea boils down to a Shibboleth (in the Biblical sense; the academic sense we'll get to later):

"Gilead then cut Ephraim off from the fords of the Jordan, and whenever Ephraimite fugitives said, 'Let me cross,' the men of Gilead would ask, 'Are you an Ephraimite?' If he said, 'No,' they then said, 'Very well, say Shibboleth.' If anyone said, 'Sibboleth', because he could not pronounce it, then they would seize him and kill him by the fords of the Jordan. Forty-two thousand Ephraimites fell on this occasion." Judges 12:5-6, New Jerusalem Bible

"Speak 'friend' and enter." [Lord of the Rings]

The latter, obviously, for the more geeky / less religious in the audience; they are equivalent stories, in that I'm allowed access because I say (in the correct way for the resource I'm trying to access) that I would like to have access, and that I am a scholar who might benefit from it; I am then taken at my word.

Let's suppose, though, that the Library really did check whether I was affiliated with a legitimate research institution-- they could call Hopkins, and Hopkins would likely tell them (FERPA might get in the way, but probably) that I was a graduate student in good standing. Then they would let me in. Perhaps they would do that on an ongoing basis, but given that their cards are good forever, probably not.

So then, we have three use situations, and four added advantages, to the Reader Card system as it currently stands. The three use situations are:

The four additional advantages of the Reader Card at the physical Library are:

So then, let us turn our attention to digital identity, as we create a digital scholarly repository. We must preserve the three use situations, and we'd like to preserve the additional advantages (and create more) if possible, through this transition. Let us then consider several alternatives.

There are several digital equivalents to this external identity idea (including one expressly designed for academia, called, for reasons that should be clear from above, Shibboleth), but the one that has found real and broad-based success in the "real world" is OpenID. OpenID is a federated open standard; anyone can run their own OpenID server. In addition, OpenID uses URLs as its username construct, which makes usernames globally unique by definition (so one doesn't have to worry about two people attempting to register the same name on your site), and it allows delegated identity, so that users can use personal websites as OpenIDs without having to run their own servers. For instance, my OpenID is http://ussjoin.com. I do not run my own OpenID server; instead, I currently delegate that task to MyOpenID.com. Should I grow dissatisfied with them, I can delegate it elsewhere; these are all issues that I can handle on my own, and they will not affect the consumer site-- in this case, the digital archive. This also helps to solve the long-term authentication problem; with delegation, users can point old identities to new, so that a chain of identity can exist-- something that does not in any real fashion happen with photo identification. (OpenID currently has delegation limits, but they can be worked with.)
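To make delegation a little more concrete, here is a minimal sketch of the discovery step a consumer site (the archive, in this case) might perform on a claimed URL. It assumes HTML-based discovery, where the page behind the URL carries link elements whose rel values are openid.server and openid.delegate (OpenID 1.1) or openid2.provider and openid2.local_id (OpenID 2.0). The sample page, the provider.example addresses, and the names OpenIDLinkParser and find_delegation are all hypothetical; this is not the markup of any real site, and a real consumer would use a full OpenID library rather than this.

```python
from html.parser import HTMLParser

# rel values used by OpenID 1.1 and 2.0 HTML-based discovery.
DELEGATION_RELS = {
    "openid.server", "openid.delegate",
    "openid2.provider", "openid2.local_id",
}


class OpenIDLinkParser(HTMLParser):
    """Collects OpenID-related <link> elements from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # A rel attribute may hold several space-separated values.
        for rel in (attr.get("rel") or "").lower().split():
            if rel in DELEGATION_RELS and href:
                # Keep the first occurrence of each rel value.
                self.links.setdefault(rel, href)


def find_delegation(html):
    """Return a dict mapping OpenID rel values to their href targets.

    Simplification: this does not check that the links sit inside
    <head>, and it does not fetch anything over the network.
    """
    parser = OpenIDLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical page served at a user's personal URL.
SAMPLE_PAGE = """
<html><head>
  <title>A personal site</title>
  <link rel="openid.server" href="https://provider.example/server">
  <link rel="openid.delegate" href="https://someone.provider.example/">
</head><body>Hello.</body></html>
"""

if __name__ == "__main__":
    for rel, href in sorted(find_delegation(SAMPLE_PAGE).items()):
        print(rel, "->", href)
```

The point of the sketch is that the archive only ever stores the personal URL. Switching providers means editing two link elements on a page the user controls, and nothing on the archive's side has to change.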

So this seems useful on its face, but let's look at the additional advantages of outsourced identity:

It seems, then, that externalizing the problem is the closest fit to what the Library is doing right now, and also provides benefits that the Library can't get today (in addition to the ones it can); our quest for a digital transition has found its goal. OpenID today seems to provide the things we want from authentication, and it takes away a lot of issues that we would rather not deal with if we can avoid them. Externalization in general also allows us to migrate with the shifting winds of technology, with much less pain than data migration.

So where do we go from here? If I had a year to continue to explore these issues (in unrelated news, I'm looking for a thesis and/or project advisor for my MSECS :-) ), these are some of the directions I might go in:

I hope that both my professors and anyone lucky enough to stumble in off the virtual street (and persistent enough to have read this far) have enjoyed reading this exploration. One of the advantages of this format is that though the article is now concluded, the conversation can continue at length in the comments; fittingly, my blog accepts OpenID comments, so feel free to log in with any OpenID-providing organization and tell the world your thoughts!

