Is it really you?

A somewhat less amorphous proposal…

I’ve been thinking a lot about how we can define who someone really is on the Web. With bibliographic material, we have the Library of Congress Name Authority File, which, though Orwellian-sounding, does a fairly good job of helping us differentiate the John Smith who romanced Pocahontas from the John Smith who wrote the definitive biography of Benny Hill. This business becomes a little more complicated, however, with archival material on the Web. Archival collections are filled with obscure people whose roles in history, while not individually significant enough to make it into a high school textbook, or the Name Authority File for that matter, are important because of their associations with significant movements, historical events, or other like-minded and sometimes more famous people.

Mining archival material for these associations can be complicated. How do we know that the John Smith who has letters in the Big Famous Guy Papers is the same John Smith who is recorded, in notes that are part of the Important Student Revolutionary Movement collection, as having attended some significant meeting? As anyone can imagine, this problem becomes even more significant when archiving contemporary collections of individuals who are represented on Twitter, Facebook, blogs, and in chat rooms…

Tools like FOAF, EAC, and FRAD are emerging for disambiguating individuals and defining their identities across their distributed representations on the Web and in archival collections held in widespread repositories. How might these tools work in systems that publish archival material on the Web, and what impact might these different approaches have on research in the humanities?


  1. Susan Kline

    I like this idea a lot. This also makes me wonder if there is a connection here between having an online presence as a scholar and having to push to get your online presence/contributions to count towards tenure and promotion?

  2. David Dwiggins

    There’s also the case with “born digital” materials where some people may not wish these connections to be made. Back in the late 1990s, my brother made a web page on the GeoCities service where he listed a bunch of his friends and made (somewhat innocuous) comments about each. One of his friends, now a professor at a prestigious East Coast university, recently contacted him because this reference was coming up in internet searches, and he was wondering if his last name could be removed.

    The problem is that, in the meantime, GeoCities had been shut down, and the pages that now exist are archived copies saved by folks like ArchiveTeam.org and Archive.org. Because these pages are now archival versions of dead sites, there is no real mechanism for changing or “restricting” them. Of course, from a purely academic standpoint, we might prefer this, since it provides a more complete archival record. But archivists have always had to consider the privacy concerns surrounding records. And does considering privacy become even more important now that something someone wrote in college can be retrieved by a potential employer with two seconds of “googling”? Is there potential for backlash here, particularly as an increasing percentage of the population has essentially lived their entire lives online?

    So I like the idea of discussing how to facilitate linkages between disparate information sources using names, etc. But I wonder if we should also consider the ethical aspects of the increasing availability and “linkability” of sources, particularly those that might refer to living people.

