Nov 10

How Much Data Modeling Is Enough?

I’ve begun investing some time in data modeling for the Semantic Web using RDF Schema and OWL (the Web Ontology Language), especially as a way of representing archival resources online. I buy into the promise of Linked Data, but many of the things I am hoping to represent are complex. Arguably, data modeling can become as complex as you think it needs to be, but it’s easy to get stuck in a black hole of doing too much, as a humorous blog post from the University of Southampton describes. Just the same, however, incautious modeling, or even undermodeling, can lead to undesired consequences (see, for example, Simon Spero’s poster from DC2008, “LCSH is to Thesaurus as Doorbell is to Mammal” [abstract, blog post with poster diagram]).

I attended this year’s Dublin Core conference in Pittsburgh. In the Linked Data working sessions coordinated by Karen Coyle and Corey Harper at the conference, I kept reiterating the need for a way to create models iteratively. I’m not sure what this looks like, however, and I’d be eager to talk about it. I’d also like to help myself and others determine when borrowing from existing ontologies and vocabularies makes sense, and when we should go off on our own.

3 comments

people.umass.edu/arubinst

November 10, 2010 at 9:24 am (UTC 0)

I completely agree that, due to the power of OWL and RDFS, it is essential work for each modeling effort to find that sweet spot between complexity and simplicity. Perhaps we’re looking for a test framework and a set of best practices.

I would definitely love to talk about this more…
Christopher Gutteridge

November 11, 2010 at 11:19 am (UTC 0)

One thing I’m coming to hate is a schema that nearly does what I want but was made overly specific.

We’ve made that mistake ourselves, making a class for members of our school, for example.

I’m currently working on a scheme for usefully describing places related to an organisation. Specifically, that an event is in a room within a building, that the building is at this lat/long, and that the nearest public carparks are x, y and z with the following open hours…

Rather than start from scratch, we’re probably going to mint very few new predicates or classes. Mostly we’ll just use GoodRelations, foaf and the like, with guidelines on how to use them so that they are useful for a consumer.

Ideally I’ll produce a validator so people can check it’s discoverable, parseable and saying what they meant to say.
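
As a rough illustration of the kind of places description and validator this comment describes (not the commenter's actual design), the sketch below stores triples as plain Python tuples and checks two things a consumer might rely on: that an event resolves to a lat/long through its room and building, and that each car park states its hours. Every `ex:` name, the coordinates, and the opening times are made up for the example; only `geo:lat` and `geo:long` are borrowed from the W3C Basic Geo vocabulary. A real version would presumably map the hypothetical terms onto GoodRelations, foaf and similar vocabularies.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is assumed.
# All "ex:" names and all values are hypothetical placeholders for illustration.
TRIPLES = [
    ("ex:event1", "ex:within", "ex:room101"),
    ("ex:room101", "ex:within", "ex:building5"),
    ("ex:building5", "geo:lat", "50.0"),
    ("ex:building5", "geo:long", "-1.0"),
    ("ex:building5", "ex:nearestCarPark", "ex:carparkX"),
    ("ex:carparkX", "ex:opens", "08:00"),
    ("ex:carparkX", "ex:closes", "18:00"),
]


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def containment_chain(triples, thing):
    """List the thing and everything it sits within, innermost first."""
    chain = []
    while thing not in chain:  # stops if the data contains a cycle
        chain.append(thing)
        parents = objects(triples, thing, "ex:within")
        if not parents:
            break
        thing = parents[0]
    return chain


def locate(triples, thing):
    """Return (lat, long) from the nearest enclosing place that has both."""
    for place in containment_chain(triples, thing):
        lat = objects(triples, place, "geo:lat")
        lon = objects(triples, place, "geo:long")
        if lat and lon:
            return float(lat[0]), float(lon[0])
    return None


def check(triples, event):
    """Return a list of human-readable problems; empty means the checks passed."""
    problems = []
    if locate(triples, event) is None:
        problems.append(f"{event}: no lat/long reachable via ex:within")
    for place in containment_chain(triples, event):
        for carpark in objects(triples, place, "ex:nearestCarPark"):
            for predicate in ("ex:opens", "ex:closes"):
                if not objects(triples, carpark, predicate):
                    problems.append(f"{carpark}: missing {predicate}")
    return problems


if __name__ == "__main__":
    print(locate(TRIPLES, "ex:event1"))
    print(check(TRIPLES, "ex:event1"))
```

The checks here only cover "saying what they meant to say" in a very narrow sense; discoverability and parseability would need to be tested against the published documents themselves.
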
Peter Van Garderen

November 12, 2010 at 2:46 pm (UTC 0)

Wish I could make this session, Mark. Great topic.

If I were attending, I’d like to ask whether the archival community needs its own equivalent of the FRBR entity model. Almost all of the archival standards come with an *implicit* assumption of an underlying entity model that has never been formalized. Would this be an extension of FRBR or a brand-new ‘Archival Resource Entity Model’?

However, this may also be just another dead end on the path to find the “One True Model to Rule Them All” (yfrog.com/nejs8ubj). Perhaps using RDF triples, facets, key:value pairs, and/or NoSQL technologies eliminates the need to have our data models represented by entity models altogether?

At any rate, I like your idea of establishing a framework or some common rules, even just agreement on syntax, for how we document and then iterate/version our archival resource data models. The standards bodies/processes (bogged down by their bureaucracy, volunteerism and international scope) are currently too slow to react to the technologies that are defining new requirements and capabilities for getting archival resources online (or even just to catch up to years-old common XML practices). That said, I don’t want to dismiss the strength and legitimacy of community standards; we just need a better way to sync them with (and have them informed by) any number of constantly evolving implementation models. Hopefully this THATCamp session can contribute to the discussion.

Comments have been disabled.

All original text, images, and code on THATCamp New England 2010 are freely available for you to use, copy, adapt and distribute under a Creative Commons Attribution 3.0 Unported License as long as you mention THATCamp and (if possible) link to THATCamp.org and the Center for History and New Media. The name "THATCamp" and the THATCamp logo are trademarks of the Center for History and New Media at George Mason University. The THATCamp New England 2010 theme is based on the Graphene theme by Syahir Hakim.
