ALEG

Data Model - Intro/Issues

Why produce a data model?

The data model should help us assess whether we are keeping the right information about the right entities so that the system will meet the requirements of users.

But what are the requirements of users? For traditional IT systems, this is usually answered by referring to the business needs the system must meet. Even for library systems, this question is relatively straightforward to answer in terms of meeting the data processing requirements of the day-to-day operations of a library (catalogue, acquisitions, circulation).

But ALEG is an unusual system ...

But for ALEG, this task isn't quite so simple. ALEG is designed to be a tool to aid inquiry and research, to further understanding of Australia's literature. Some requirements and usages are easy to understand: provide a simple way to find out basic information about a work or an author. However, the potential value of a research tool depends not only on the breadth of the resource on which it operates but also on how flexible it is: how well the researcher can use it to answer questions the designers of the system did not anticipate.

A key part of ALEG's potential is the way it can help to reveal and elucidate the relationships between the 'entities' making up Australian literature: the authors, works, publishers, movements, genres, cultural and political forces.

So, the research value of ALEG is not so much in the 'raw' data of who wrote what and when (vital as that is), but in how that 'raw' data can be viewed as coherent clumps - the relationships that are unveiled when the core data is analysed.

So the data model is even more important than usual

Much of information technology is "cut and paste". Problems form recurring patterns (often referred to as design patterns), and the job of system designers and implementors is to analyse the problem, match it against well-known patterns and then use the appropriate technology to implement the solution (not a trivial task!).

"Different" problems (such as ALEG) are harder to match at the "global" level (it isn't an inventory management system, a human resources system or even a library catalogue system). And because we all like to think in metaphors, unless we really understand what we are trying to do with ALEG, we'll end up using the wrong metaphor and building the wrong system.

So the purpose of these data model documents is to make explicit our understanding

This data model should make it clear what data we are storing, what relationships we are representing, and why.

OK - What is ALEG?

I assert that ALEG is more like a police investigation system. These systems accept large amounts of data about suspects, crimes, relationships and rumours, and allow investigators to trawl through it and discover relationships.

Many of these systems have been built, but from my own limited experience of them, they've had mixed success. Early attempts were hampered by hardware limitations - it takes a lot of horsepower to represent complex relationships, especially when you aren't using an appropriate data model!

One approach - Topic Maps

Recently the ISO published a new standard which attempts to define an approach to representing complex classifications and relationships on an underlying data set. This standard is called Topic Maps, and although the ISO defining document isn't especially edifying, there are several resources offering a more approachable introduction to Topic Maps:

In a nutshell, Topic Maps provide a framework for defining topics of interest separate from the material being linked to the topics. A Topic Map allows the definition of:

Topic Associations could allow very powerful automated processing where the right semantics are defined and understood. For example, an application that understood that the "is part of" association type was transitive would know that if Topic X (e.g. Sunnybank) "is part of" Topic Y (e.g. Brisbane) and Topic Y "is part of" Topic Z (e.g. Queensland), then Topic X "is part of" Topic Z.

Topics can be involved in multiple associations. For example, Sunnybank can be associated with the "urban area" topic and/or the "suburb" topic. Brisbane can be associated with the "city" topic, as could the Sydney and Melbourne topics.
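
The two paragraphs above can be made concrete with a minimal sketch. It is not how the Topic Maps standard or any Topic Map engine represents associations; it simply stores associations as (topic, association type, topic) triples in a list. The names ASSOCIATIONS, TRANSITIVE_TYPES, related and instances_of, and the "is a" association type, are all assumptions made for this illustration.

```python
# Hypothetical in-memory store of (topic, association type, topic) triples.
# The "is a" association type is an assumption used to link a topic to the
# more general topics it belongs to (suburb, city, ...).
ASSOCIATIONS = [
    ("Sunnybank", "is part of", "Brisbane"),
    ("Brisbane", "is part of", "Queensland"),
    ("Sunnybank", "is a", "suburb"),
    ("Sunnybank", "is a", "urban area"),
    ("Brisbane", "is a", "city"),
    ("Sydney", "is a", "city"),
    ("Melbourne", "is a", "city"),
]

# Assumption: the application is told which association types are transitive.
TRANSITIVE_TYPES = {"is part of"}


def direct(topic, assoc_type):
    """Topics linked from `topic` by a single association of `assoc_type`."""
    return {t for s, a, t in ASSOCIATIONS if s == topic and a == assoc_type}


def related(topic, assoc_type):
    """Topics linked from `topic`, following chains when the type is transitive."""
    found = direct(topic, assoc_type)
    if assoc_type not in TRANSITIVE_TYPES:
        return found
    frontier = set(found)
    while frontier:
        # Only topics not seen before are explored, so cycles terminate.
        new_topics = set()
        for t in frontier:
            new_topics |= direct(t, assoc_type) - found
        found |= new_topics
        frontier = new_topics
    return found


def instances_of(general_topic):
    """Topics associated with `general_topic` through the "is a" type."""
    return {s for s, a, t in ASSOCIATIONS if a == "is a" and t == general_topic}
```

Here related("Sunnybank", "is part of") yields both Brisbane and Queensland even though only the Sunnybank/Brisbane and Brisbane/Queensland associations were stored, and instances_of("city") gathers Brisbane, Sydney and Melbourne. The inference is only as good as the declared semantics: remove "is part of" from TRANSITIVE_TYPES and the Queensland result disappears.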

What is so special about Topic Maps?

Topic Maps are interesting for several reasons:

This all sounds good to me:

Related work

What next

Is this all jumping the gun? Isn't discussion of the way we're going to represent the ALEG data preempting a thorough analysis of what we are going to store?

Yes and no...

The language you use determines the approach you take when thinking about a problem. If you think in terms of punched cards or hierarchical databases or XML data structures, you'll find that your approach is couched in these terms.

Coming up with a strong language to represent the problem is often half the battle. I assert that the ALEG data modelling problem consists of two sub-problems:

  1. a fairly standard data processing problem to deal with 500,000-odd records of three or four basic record types (work, creator, publication/edition/instantiation, maybe holdings or other references to accessible copies of works), some of which contain a large amount of text.

  2. a relationship recording/inferring/querying problem which allows the representation of a rich network of relationships between these 500,000-odd records.
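
As a rough illustration of how the two sub-problems might be kept apart, the sketch below holds the basic records in one structure and the relationships between them in another. It is not a proposed ALEG schema: the class names (Record, Relationship, Store), the field names and the "created" relation are all hypothetical, and a real system would use a database rather than in-memory lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Sub-problem 1: a plain record of one of the basic record types."""
    record_id: str
    record_type: str  # e.g. "work", "creator", "publication"
    label: str


@dataclass(frozen=True)
class Relationship:
    """Sub-problem 2: a typed link between two records."""
    source_id: str
    relation: str
    target_id: str


class Store:
    """Hypothetical store keeping records and relationships separately."""

    def __init__(self):
        self.records = {}
        self.relationships = []

    def add_record(self, record):
        self.records[record.record_id] = record

    def relate(self, source_id, relation, target_id):
        # Refuse links to records that have not been stored.
        for record_id in (source_id, target_id):
            if record_id not in self.records:
                raise KeyError(record_id)
        self.relationships.append(Relationship(source_id, relation, target_id))

    def neighbours(self, record_id):
        """(relation, record) pairs touching `record_id`, in either direction."""
        result = []
        for rel in self.relationships:
            if rel.source_id == record_id:
                result.append((rel.relation, self.records[rel.target_id]))
            elif rel.target_id == record_id:
                result.append((rel.relation, self.records[rel.source_id]))
        return result


# Illustrative data only.
store = Store()
store.add_record(Record("c1", "creator", "Miles Franklin"))
store.add_record(Record("w1", "work", "My Brilliant Career"))
store.relate("c1", "created", "w1")
```

Because relationships live outside the records, new kinds of relationship can be recorded without changing the record types, which is the flexibility the second sub-problem calls for.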

In the data modelling documents which follow, I'm taking a Topic Map bias. That is, I'm assuming that Topic Maps are a good way to represent the complex mesh of relationships which will make ALEG a valuable research tool.

Data modelling documents:


Kent Fitch
k.fitch@adfa.edu.au
Initial Draft: 22 May 2000
Revised: 26 May 2000
Revised: 7 June 2000 (added ref to Technical Issues on Topic Maps)