ALEG
Data Model - Inventory
Introduction
This document attempts to identify the primary entities
of the ALEG data model and their attributes.
The functional requirements of ALEG were broadly
identified in the
Stage 1
report. The role of the ALEG data model is to describe
a set of data structures which will allow these requirements
to be met, ideally with a design which is consistent with
current thinking on the best way to represent literary and
related resources.
There are many possible starting points when constructing
a model for ALEG. The existing library
catalogue world is heavily influenced by the MARC model
of description. Museum and archive communities have
developed various models of their own, including the
CIDOC
Conceptual Reference Model and
ISAAR (CPF)
document (International Standard Archival Authority Record
for Corporate Bodies, Persons and Families). The work done by
INDECS focuses on how best to model agents
(a grouping term for people and organisations) and the material which
they create. The
Harmony Project
is an attempt to identify some common mechanisms for representing
common entities and attributes across many spheres of interest.
After reviewing these and working through ALEG's specific
requirements, we decided to base our model on the
IFLA FRBR (warning - 144
pages of PDF!). IFLA is the
International Federation of
Library Associations and Institutions. FRBR is the
Functional Requirements for Bibliographic Records.
Aside from our own interpretation of FRBR represented by this and other
documents, you are
strongly encouraged to peruse the following resources:
The detail of the FRBR model is not repeated here. However,
we describe below our specific use of, and extensions to, the FRBR model.
The following image taken from the above referenced
D-Lib
article summarises the FRBR model as extended by Indecs:
The main points to be emphasised from this model are:
- Information Resources are represented by 4 entities:
Work, Expression, Manifestation and Item
- Instances of each of these Information Resources can be linked
to each other. For example, a particular novel (work) may have
been influenced by a particular poem (another work), a short
story (work) may be expressed in English (an expression)
and translated into French (another expression).
- Instances of each of these Information Resources can be linked
to subjects.
- Information Resources are "transformed" by the "actions" of
"agents". The word "transformed" is very general;
"transforming actions" include conceiving, writing and
publishing works, translation, editing and illustrating. FRBR
does not describe "actions"; they are an addition to the
model introduced by Indecs and the Harmony ABC proposal.
If you haven't yet done so, please read the above commentaries
on the FRBR model before proceeding!!
The Core ALEG data model
The Core ALEG data model extends the FRBR model in the following
ways:
- As shown in the above diagram an entity is inserted between
Information Resources and
Agents. The above diagram uses the term "Action" to describe this
entity. The
Harmony
ABC project uses the term "Event". As described in both
the Indecs and Harmony documentation, the benefit of adding this
entity is that it provides a "place" to store information about the
event which led to an information resource being produced. ALEG
sometimes wants to record more than just "who did what" - sometimes
we want to know "how, when, where, and why". Maybe one day we'll find
it useful to link one event to other events. By representing
an "event" as a first class object rather than as a
series of attributes which are tightly bound to some other entity,
this becomes possible.
An analogy: for the first few years of their life,
children could be represented as mere "attributes" of
their parents, as the relationships they can take part
in and the 'data' which describes them is pretty minimal
(birth date and place, weight, hair-colour, mother, father
etc). But as children grow, the richness of their
relationships and 'data' rapidly approaches that of their
parents. Building a model of a family where children are
'second class objects' would be a big mistake.
Another analogy: imagine you were building a system
which could be used to generate a TV guide. One approach
would be to represent the names of programs (eg, "Bellbird",
"Four Corners") as a simple attribute - a string of text.
This would work fine initially, but if later you wanted to
add information about each program (producer,
actors, budget, abstract, links to related programs)
you'd have to redesign your approach. If however, you'd
represented the program as an "entity" in its own right
pointed to by the TV guide (rather than just a string of text
included within), you'd then be able to
add extra attributes to the program with minimal disruption
to your TV guide system.
- Relationships exist between agents. For example
one author may have been influenced by another.
- ALEG greatly expands the subjects representation
in the above diagram to that of a Topic Map. The
Topic Map contains a range of Topics, grouped by Topic
Type and linked together by Associations. For more information
on Topic Maps, refer to the ALEG
Data Model - Intro/Issues document.
- Agents, not just information resources, can be associated
with Topics. For example, a Topic may be created
to represent the Jindyworobaks and writers may be
linked to that topic via a "member of" relationship, possibly
further described with a date range and other notes.
- ALEG represents information about Awards given to Information
Resources and Agents.
- ALEG will need to represent holdings information, so
that when an ALEG user finds a work of interest the system can help
them locate a physical instance of that work. Exactly how
this will happen is currently an open issue.
- An important component of ALEG is biographical material.
- ALEG will be used by some partners to store records
describing archival items, not just manuscripts but correspondence,
exhibition and promotion material, galleys, etc. In the long term
this material will probably be housed in a separate (but linked)
national system, but in the short and medium term, ALEG must
accommodate this material to facilitate the migration from
some partners' current systems to ALEG.
- ALEG will probably need to record some information about
its users, and maybe about how they use the system.
- ALEG will need to record changes made to its data
by the contributing indexers. It will also need to support
basic workflow: the progression of changes from 'entered
in the system' to 'available to the public'.
Basic Data Model
In this model:
- Ovals represent entities and lines represent relationships between
entities
- Work, Expression and Manifestation are grouped together
for simplicity to avoid repeating all the relationships between the
other entities and Work, Expression and Manifestation, and to
indicate that there are very close relationships between these
three entities
- We don't use the FRBR Item entity (although
we will have to represent holdings, somehow...)
- A work/expression/manifestation can be associated with one or more agents
who are responsible for its creation. Agents are associated with
works in different ways: for example, as author, editor, illustrator, translator
or publisher. Sometimes,
the associations will have a time scope. For example, the role of editor of
Westerly will change over time.
- A work/expression/manifestation can be associated with other works/expressions/manifestations. Common relationships
include:
- partOf: a work is part of another work
which may be a collection, sequence or a serial
- reviewOf: a work is a review of another work
- criticismOf: a work is a criticism of another work
- influencedBy: a work was influenced by another work
- Agents can be related to other agents. Common relationships
include:
- pseudonymOf: an agent is associated with a name
used by another agent
- associatedWith: an agent is associated with another
agent in a familial, partnership or friendship relationship
- influencedBy: an agent is influenced by another agent
- Works/expressions/manifestations and agents can be nominated for and given awards
- Works/expressions/manifestations and agents can be associated with topics
- Topics can be related to other topics, forming a cross-linked
"Topic Map", which is a superset of the typical thesaurus structure
(broader, narrower, related and preferred terms)
- Works/expressions/manifestations can be associated with Holdings.
ALEG
almost certainly won't record many holdings itself, but providing
information about holdings, possibly retrieved from other databases,
is an important function of ALEG. Users of the system may or may not be interested
in the holdings of a particular manifestation: the system will support
viewing holdings information for either a particular manifestation or
any manifestation of the work (even if that manifestation is not
recorded in ALEG).
How? Kinetica might be a good start if we can reliably (and
cheaply!) match on title/author. For an extended discussion and recommendations, refer
to the
Holdings section on the
Issues document.
- Works/expressions/manifestations and agents can be associated
with simple archival item entities
Ancillary entities are not shown:
- ALEG users
- Usage logs
- Workflow/management information
- Thesaurus/Authority file structures
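The time-scoped associations described above (for example, the changing editorship of Westerly) could be sketched along the following lines. This is only an illustration in Python, not part of the ALEG design: the class name `Association`, its fields and the helper `agents_in_role` are hypothetical, and the editor names and dates are placeholders rather than real data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Association:
    """Hypothetical link between an agent and a work/expression/manifestation."""
    agent: str
    role: str                      # eg "author", "editor", "translator"
    target: str                    # title of the work/expression/manifestation
    start: Optional[date] = None   # None means "no known start"
    end: Optional[date] = None     # None means "no known end"

    def holds_on(self, day: date) -> bool:
        """True if the association is in force on the given day."""
        after_start = self.start is None or self.start <= day
        before_end = self.end is None or day <= self.end
        return after_start and before_end


def agents_in_role(links: List[Association], target: str, role: str, day: date) -> List[str]:
    """Names of agents holding `role` for `target` on `day`."""
    return [a.agent for a in links
            if a.target == target and a.role == role and a.holds_on(day)]


# Placeholder data: the names and dates are invented for illustration.
links = [
    Association("Editor A", "editor", "Westerly", date(1960, 1, 1), date(1969, 12, 31)),
    Association("Editor B", "editor", "Westerly", date(1970, 1, 1), None),
]
```

Asking `agents_in_role(links, "Westerly", "editor", some_date)` then returns whichever placeholder editor was in the role on that date. An association with no dates at all behaves like the ordinary, unscoped case.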
Some examples of how familiar entities are modelled may help.
In this example, "Voss", the novel by Patrick White is represented
as a single work, two expressions (one expressed by Patrick White,
the other expressed by Fernando Feroka) and two manifestations
(one published by Longhams, the translation published by Garcia).
The Expression (as we shall see below) records the language, allowing
the two expressions of "Voss" (the work) to be distinguished.
The above image represents the creation, translation and publication
"Events" using a shorthand - a simple arrow. However, the system
represents these events as entities, recording attributes of
these events (time, place, input(s) and output(s)). For example,
the translation event would be properly represented as follows:
Although explicit, this representation clutters up the diagrams,
so we won't show events as entities explicitly, although they
are really there in the model!
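For readers who prefer code to diagrams, the translation event for "Voss" might be sketched roughly as follows. The class name `Event`, its field names and the `shorthand` helper are our own assumptions for illustration; only the attributes (time, place, inputs and outputs) and the example names come from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical first-class event linking agents to information resources."""
    event_type: str                       # eg "creation", "translation", "publication"
    agents: List[str]                     # who carried out the event
    inputs: List[str] = field(default_factory=list)    # resources the event started from
    outputs: List[str] = field(default_factory=list)   # resources the event produced
    time: Optional[str] = None            # left empty when unknown
    place: Optional[str] = None           # left empty when unknown


translation = Event(
    event_type="translation",
    agents=["Fernando Feroka"],
    inputs=["Voss (expression by Patrick White)"],
    outputs=["Voss (expression by Fernando Feroka)"],
)


def shorthand(event: Event) -> str:
    """Collapse an event to the simple-arrow shorthand used in the diagrams."""
    return f"{', '.join(event.inputs)} --{event.event_type}--> {', '.join(event.outputs)}"
```

Because the event is an object in its own right, a time or place can be filled in later without touching the expressions it connects.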
The following diagram shows how we'd represent "Voss", the opera,
and a review of this opera. For clarity, the translation
has been dropped, events are not shown as entities,
"The Bulletin" (the Serial) is not shown, nor
is the publisher of "The Bulletin":
As another example, consider "Sunday Lunch", a short story
by Antigone Kefala which appeared in vol 2/3 of "Aspect" and
Dale Spender's "Penguin Anthology of Australian Women Writing":
Here, "Sunday Lunch" appears in two manifestations, each of
which is "partOf" another manifestation. Typically,
many other manifestations would be linked with serial issue
and anthology manifestations.
Note also that "Aspect" (the Serial) appears as a Work, linked
to (what would be) many "Aspect" (Serial Issue) Works.
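The "Sunday Lunch" example can be sketched in a few lines of Python. The `Manifestation` class here is hypothetical and heavily simplified (a title and a `part_of` list only); it is meant to show the shape of the "partOf" relationship, not how ALEG will store it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    """Hypothetical, heavily simplified manifestation record."""
    title: str
    part_of: List["Manifestation"] = field(default_factory=list)


aspect_issue = Manifestation('"Aspect" vol 2/3')
anthology = Manifestation('"Penguin Anthology of Australian Women Writing"')

# One work, two manifestations, each "partOf" a different container.
sunday_lunch_in_aspect = Manifestation('"Sunday Lunch"', part_of=[aspect_issue])
sunday_lunch_in_anthology = Manifestation('"Sunday Lunch"', part_of=[anthology])


def containers(m: Manifestation) -> List[str]:
    """Titles of the manifestations in which `m` appears."""
    return [parent.title for parent in m.part_of]
```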
Core Entities
Work
Attributes of work:
- title
- variantTitleList
- firstLineOfPoetry
- only for formList: Poetry
- identifierList
- A work may have a series of identifiers
which record how external systems identify this work.
- identifierType
- identifier
- workType
-
one of:
- Collected Works (usually all the works of a single author)
- Selected Works (including "omnibus")
- Anthology (including "back-to-back" novels by different
authors)
- Series (describes all series, whether they be novel trilogies, poem
"sequences" or publishers' series, eg, UQP Black Writers or maybe the "Larry
Kent" series)
Series qualifier "(typically used for works which are novels)" and
Sequence "(typically used for works which are not novels)" (expunged 27 Jul)
- Collection (was "Manuscript Collection" but at Hobart meeting
it was thought that a general "collection" work type was all that
was needed. "Website" was another proposed workType but also at
Hobart it was decided that a website was just a collection at the
"Work" level, and that attributes related to its
expression and physical manifestation were best described in
the expression and manifestation entities)
- Periodical (isPreferredFor Serial)
- Periodical Issue (isPreferredFor Serial Issue)
- Single Work (includes typical ALEG works such as novels, poems
and includes non-ALEG works, such as Van Gogh's "Starry Night")
- Extract (only if the extract is from a work which has not yet
appeared; if the work has already appeared then the extract is
just an expression of the existing work, not a work in its own
right)
- formList
-
one or more of:
- Drama
- Novel
- Short Story
- Poetry
- Picture Book
- Autobiography
- Essay
- Thesis
- Correspondence
- Criticism
Article (expunged 26 jul 2000)
- Review
- Obituary
- Biography
- Bibliography
- Manuscript Collection
- Column (was going to be deprecated, but reinstated 26 jul 2000)
- Interview (old, use will be deprecated)
- genreList
-
one or more of:
- Crime
- Fantasy
- Historical
- Humour
- Romance
- Satire
- Sci-Fi
- Travel
- War Literature
Speculative (expunged by popular demand 26 jul 2000)
Ficto-criticism (expunged by popular demand 26 jul 2000)
- Young Adult Literature
- Children's Literature
- earliestKnownDate
- When the work was first made available (?)
MLA: when we know
- abstract
- usefulFor
- The audience which will find this work useful.
One or more of:
MLA: intendedAudience and usefulForAudience
- Having thought again about using 'Intended Audience', we've decided we'd
like to use a similar field for 'Useful for Audience' (ie pre tertiary,
tertiary), but that we'd prefer to keep Young Adult and Children's as
sub-genres. This is because they are *studied* as genres, and because we want
to retain that flexibility of being able to have people find John Marsden's
works whether they search from the Young Adult or the novel approach.
- ranking
- The system will allow
three rankings to be stored which may be used to order
search results to provide "better" information at the top
of the search result list.
- subjectiveRanking
- specified by the indexer as "high", "neutral", "low"
(default is "neutral")
- usefulForRanking
- another subjective ranking, specified by the
indexer as a useful resource for school,
research or general users. KF: does this
obviate the need for the "usefulfor"
attribute above?
- citationRanking
- calculated by the system based on the number of
works referencing this work - the more, the higher the
citationRanking
MLA/TW: Or ranking by user's particular area
of expertise? Ranking according to audience? eg, School, Tertiary, Research
- periodicalIssueInformation
- for a work representing an issue of a periodical
- year
- volume
- number
- month/Season (issue description)
- notes
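As a rough illustration of how some of these attributes and their rules fit together, here is a minimal Python sketch. The `Work` class, its snake_case field names and the `problems` method are assumptions made for the example; the workType vocabulary and the rule that firstLineOfPoetry applies only to Poetry are taken from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Vocabulary copied from the workType list above.
WORK_TYPES = {
    "Collected Works", "Selected Works", "Anthology", "Series", "Collection",
    "Periodical", "Periodical Issue", "Single Work", "Extract",
}


@dataclass
class Work:
    """Hypothetical, partial work record."""
    title: str
    work_type: str
    form_list: List[str] = field(default_factory=list)
    genre_list: List[str] = field(default_factory=list)
    first_line_of_poetry: Optional[str] = None

    def problems(self) -> List[str]:
        """Return a list of rule violations (empty when the record is acceptable)."""
        found = []
        if self.work_type not in WORK_TYPES:
            found.append(f"unknown workType: {self.work_type}")
        if self.first_line_of_poetry and "Poetry" not in self.form_list:
            found.append("firstLineOfPoetry is only for formList: Poetry")
        return found


voss = Work("Voss", "Single Work", form_list=["Novel"])
```

Returning a list of problems, rather than raising on the first one, is just one possible choice; it would let an indexer see everything that needs fixing in one pass.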
Relationships of work:
- settings of the work:
- subjects:
- works/expressions/manifestations
- agents
- objects
- concepts
- agents responsible
- (Actually, event detailing type of event, agent, their role,
optionally time and place.
Typical roles would include creation, adaptation, revision,
interpretation, free translation.)
- expressions of the work
- related and influencing works
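The citationRanking described earlier is calculated from the relationships between works. One simple way it might be computed is sketched below, assuming the relationships are available as (referencing work, referenced work) pairs. The function name, the pair representation and the identifiers are all hypothetical, and a raw count is only one of several possible rankings.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def citation_ranking(references: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count, for each work, how many distinct works reference it.

    `references` holds (referencing_work, referenced_work) pairs.
    """
    distinct_pairs = set(references)          # ignore duplicate links
    counts = Counter(target for _source, target in distinct_pairs)
    return dict(counts)


# Placeholder identifiers, invented for illustration.
references = [
    ("review-1", "work-A"),
    ("criticism-1", "work-A"),
    ("review-1", "work-A"),    # duplicate link, counted once
    ("review-2", "work-B"),
]
ranking = citation_ranking(references)
```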
Expression
Attributes of expression:
- title
- variantTitleList
- identifierList
- An expression may have a series of identifiers
which record how external systems identify this expression.
- identifierType
- identifier
- formOfExpression
-
FRBR describes the vocabulary for this attribute as:
- alpha-numeric notation
- spoken word
- dance
- mime
- performance
- image
- photographic image
- musical notation
- musical sound
- sculpture
The Dublin Core
Type
element values are listed as:
- collection
- dataset
- event
- image
- interactive resource
- model
- party
- physical object
- place
- service
- software
- sound
- text
At the Hobart meeting it was thought some qualification could be handy,
eg "moving image" is a useful subset of the DC "image" type value.
However, the recently released
DC Qualifiers
recommendation has not expanded the Type
vocabulary... This issue is discussed further in the
Issues document, where it is recommended to
adopt the FRBR vocabulary.
- date
- languageList
- English as the default; list taken from language authority
file
- manuscriptNotes
- may include a reference to manuscript
location
Relationships of expression:
- agents responsible
- (Actually, event detailing type of event, agent, their role,
optionally time and place.
Typical roles would include expression, revision, abridgement,
translation, illustration.)
- parentWork
- manifestations of the expression
- related and influencing works, expressions and manifestations
Manifestation
Attributes of manifestation:
- title
- variantTitleList
- statementOfResponsibility
-
Describes the agents responsible for the manifestation as indicated
by the manifestation. As discussed in the FRBR documentation, the
information recorded in the statement of responsibility may differ
from the known creator(s) of the work and expression. The system
should record the statement of responsibility exactly as presented
in the manifestation (eg "written by Harry Feroka, as told to Bill ('Barnacle')
Bunbry"). As well as recording this text, the system must link
the names mentioned in the statement of responsibility to the
agent names with the appropriate roles ('author', 'translator', 'editor'
etc).
- manuscriptFlag
- Indicates that this
manifestation is a manuscript (and therefore
we will probably need to record holding information
either detailed as part of ALEG or as a pointer
to RAAM/son-of-RAAM)
- identifierList
- A manifestation may have a series of identifiers.
Commonly identified identifiers are ISBN and ISSN. However,
it may be handy to store the Kinetica immutable number
assigned to the manifestation to facilitate linking
to information on Kinetica. Similarly, other systems
may have numbering systems for manifestations which ALEG
would find advantageous to record.
Note: Where a manifestation appears
as "part Of" another manifestation, the first manifestation
entity does not record the ISBN/ISSN of the manifestation
of which it is a part.
- identifierType (eg "ISBN", "Kinetica immutable number")
- identifier
- Edition/issue designation
- printerList
- reprintList
- limitedEditionInformation
- printRun
- notes (eg, 'signed and numbered')
- physicalDescription
- format: value selected from the recommended list of
Dublin
Core Format element values, which is the
MIME
type and subtype list.
The recently released
Dublin
Core qualifiers recommendation adds two qualifiers for format:
- medium, described as "the material or physical carrier of the resource"
- extent, described as "the size or duration of the resource"
"Format" can be used to describe how a resource has been electronically
encoded, "medium" can describe the physical carrier, and extent
can describe the size, but if the resource is not electronic,
the DC format is not clear on how to more fully describe the
resource.
For example, how to describe that this manifestation is Braille?
There is not a MIME type for text/braille... For a "talking book",
the corresponding expression's formOfExpression could be "sound"
and the manifestation medium could be "audio cassette", but is
that all that is required? For a large print book, there is
nothing "standard" in the DC:Format universe that seems to
allow appropriate description (in a standard interoperable way).
This matter is discussed further in the Issues
document where it is recommended that the DC:Format vocabulary
be used, but augmented with local formats such as 'Braille', 'Talking Book',
'Large Print', 'Handwritten/Manuscript' and 'Website'.
For a work which is manifested as a web site, it
would have a work type of "collection", an expression
formOfExpression of "collection" (see Works
which are Web Sites in the Issues document) and
a manifestation format of "Website".
FRBR's manifestation attribute list includes:
- form of carrier (examples:
sound cassette, video disk, microfilm cartridge,
transparency)
- extent of the carrier (number of sheets, discs, reels)
- physical medium - the type of material of which the carrier
is produced (examples: paper, wood, metal)
- typeface
- type size
- accessAddressList
- URL/PURL for electronically
available manifestation
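The recommendation for the physicalDescription format (the DC:Format vocabulary augmented with local formats) could be checked along the following lines. This is a loose sketch: the function name is hypothetical, and the pattern only tests for the general "type/subtype" shape of a MIME type; it does not consult the registered list of MIME types.

```python
import re

# Local additions recommended in the text above.
LOCAL_FORMATS = {"Braille", "Talking Book", "Large Print",
                 "Handwritten/Manuscript", "Website"}

# Very loose check for the lower-case "type/subtype" shape of a MIME type.
MIME_SHAPE = re.compile(r"^[a-z0-9][a-z0-9!#$&^_.+-]*/[a-z0-9][a-z0-9!#$&^_.+-]*$")


def is_acceptable_format(value: str) -> bool:
    """True for a local format or for anything shaped like a MIME type."""
    return value in LOCAL_FORMATS or bool(MIME_SHAPE.match(value))
```

Note that a shape check like this would happily accept "text/braille", even though no such MIME type exists; a real implementation would need an authority list.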
Relationships of manifestation:
- agents responsible
- Actually, event detailing type of event, agent, their role,
optionally time and place.
A typical event would be a publication event, describing the
date and place of publication and the publisher (as an agent).
The names mentioned in the statementOfResponsibility are
also linked to the manifestation.
- parentExpression
- partOfOtherManifestationsList
- Where a manifestation of an expression is "part of"
another manifestation of another expression (eg, published
in a periodical issue or an anthology), this relationship
links the manifestation to the manifestation in which it
appears (as "part of")
- related and influencing works, expressions and manifestations
Agent
Attributes of agent:
- creatorType
- human, organisation
- name
-
Actually, name is not a simple attribute, because a single
agent can have several variant names and be associated with
multiple pseudonyms. For a fuller explanation of how
agents and their names and pseudo-identities are modelled,
refer to the Names,
alternate names, pseudonyms document.
- For humans, name is broken down into:
- family name
- other names
- title/honorifics (eg, Rev, Sir, Honourable, Ms, ...)
- display name (if it needs to be displayed in some particular way)
For organisations, see AACR2 indexing rules
- identifierList
- An agent may have a series of identifiers such as authority
numbers
which record how external systems identify this agent.
- identifierType
- identifier
- gender
- male, female, {not recorded}
- birthDetails
-
Birth is treated as an event, and hence is really
modelled as an event relationship between an agent
and a date and place. Conceptually, we store:
- deathDetails
-
Death is treated as an event, and hence is really
modelled as an event relationship between an agent
and a date and place. Conceptually, we store:
- manuscriptNotes
- may include a reference to manuscript location, archival material
- biographicalNotesList
- biographicalNotes
- unstructured blobs of text for storing
arbitrary biographical information. There will
probably be some structure here - at least attribution
of the material.
-
- referencesToExternallyHeldInformation
- notes about material not stored
- nationalityList
- ethnicGroupList
- aboriginal
May include:
- expatriateAustralian (AUSTLIT: aust-e)
- visitorToAustralia (AUSTLIT: aust-v)
Relationships of agent:
- occupationList
- see Vocation
Topics
- significantEvents
- Actually, birth and death are just special significant events.
Other significant event types include visited, migrated,
resided. Refer to the description of the
Event entity.
- relationships with other agents
-
For 'human' agents, relationships include:
'relatedTo', ...
For 'organisation' agents, relationships include:
'partOf', ....
For all agents, relationships include:
'memberOf' (some organisation/movement), 'associatedWith',
'influencedBy', ...
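Treating birth and death as just two kinds of significant event might look something like the sketch below. The class and field names (`SignificantEvent`, `Agent`, `first_event`) are assumptions for illustration, and the agent shown is a placeholder with invented details.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SignificantEvent:
    """Hypothetical event in an agent's life."""
    event_type: str                 # eg "birth", "death", "visited", "migrated", "resided"
    date: Optional[str] = None      # empty when unknown
    place: Optional[str] = None     # empty when unknown


@dataclass
class Agent:
    """Hypothetical, partial agent record."""
    display_name: str
    creator_type: str = "human"     # "human" or "organisation"
    events: List[SignificantEvent] = field(default_factory=list)

    def first_event(self, event_type: str) -> Optional[SignificantEvent]:
        """birthDetails and deathDetails become lookups over the event list."""
        return next((e for e in self.events if e.event_type == event_type), None)


# Placeholder agent; the name, dates and places are invented for illustration.
agent = Agent("Example Author", events=[
    SignificantEvent("birth", "1900", "Sydney"),
    SignificantEvent("migrated", "1925", "London"),
])
```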
Holding
Holdings will only be stored on ALEG if they cannot
be automatically obtained by searching external systems
(especially Kinetica). It is not the intention of
ALEG to maintain holdings information if it can possibly
be avoided!
Attributes of holding:
- holdingInstitutionCode
- localIdentifier
- localLocationCode
- holdingCount
- holdingNote
Archive Item
Archive Items are only held by ALEG to ease the transition
of some partners onto ALEG, and will hopefully eventually
reside in a national archive description format and
database. For details of the proposed interim archival
item facility, refer to the
Archive
Items proposal.
Award
Creators can be awarded either for a specific work or for
their body of work.
Attributes of award:
- awardName
- notes, link to award web site, ...
- year
- categoryList
- nominationList
- workReference or authorReference
- winnerList
- prizeName (eg, 'winner', 'runner up')
- workReference or authorReference
- prizeDetails (eg, '$1000 plus publication in The Age')
- notes
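A possible shape for award records is sketched below. Since a creator can be awarded either for a specific work or for a body of work, each entry refers to a work or to an author, but not both. The class names and the check in `__post_init__` are our own assumptions, and the award shown is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AwardEntry:
    """A nomination or a win; refers to a work or to an author, not both."""
    work_reference: Optional[str] = None
    author_reference: Optional[str] = None
    prize_name: Optional[str] = None      # eg "winner", "runner up"; empty for nominations

    def __post_init__(self) -> None:
        if (self.work_reference is None) == (self.author_reference is None):
            raise ValueError("give exactly one of workReference or authorReference")


@dataclass
class Award:
    """Hypothetical, partial award record for one year."""
    award_name: str
    year: int
    nomination_list: List[AwardEntry] = field(default_factory=list)
    winner_list: List[AwardEntry] = field(default_factory=list)


# Placeholder award; the name, year and references are invented for illustration.
award = Award("Example Prize", 2000,
              nomination_list=[AwardEntry(work_reference="work-A"),
                               AwardEntry(author_reference="agent-B")],
              winner_list=[AwardEntry(work_reference="work-A", prize_name="winner")])
```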
Attributes common to all core record types
- publicNotesList
- externalReferences
- referenceTitle
- referenceText
- electronicLinkInformationList
- referenceNote
- referenceSortOrder
- systemStatusAndWorkFlow
- a set of attributes which are not publicly viewable but
which are used for communication between the ALEG
maintainers and by the system for supporting work-flow
- generalRecordStatus
- pendingCompletion, pendingApproval, available, temporarilyNotAvailable,
deleted, deleted(Merged)
- indexingStatusList
- toBeIndexed, fromOtherSource, indexedFromReview, inProcess,
topicIndexingPending, verifyPagination, forEditorialCommitteeMeeting,
readerInterpretationRequired
- recordHistoryList
- date
- recordStatus
- amendedBy
- amendedHow (what was changed and how)
- notes
- informationSourceList
This allows indexers to record the source(s) for
information recorded in ALEG. For example, you may want
to record M&M and Ferguson as sources for a birth date,
and the ABD as another source for an incorrect value for
that same date. Each 'attribute' of a record could
potentially be associated with multiple sources.
- attribute
- informationSource
- informationDate
- verifiedBy
- notes
- contributingInstitutionList
- list of ALEG partners responsible for the content
of this record
- ALEGStaffNotes
- These attributes are aimed at facilitating an exchange
of questions/replies/information between ALEG staff.
- from
- to
- date
- comments
- status
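The informationSourceList idea (several sources per attribute, possibly disagreeing) can be sketched as follows. The `value` field and the `sources_for` helper are hypothetical additions made so the example can show agreement and disagreement; the dates are placeholders, and only the source names come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InformationSource:
    """Hypothetical source entry attached to one attribute of a record."""
    attribute: str                       # which attribute of the record this concerns
    information_source: str              # eg "M&M", "Ferguson", "ABD"
    value: str                           # hypothetical field: the value the source gives
    verified_by: Optional[str] = None
    notes: Optional[str] = None


def sources_for(sources: List[InformationSource], attribute: str, value: str) -> List[str]:
    """Names of the sources which support `value` for `attribute`."""
    return [s.information_source for s in sources
            if s.attribute == attribute and s.value == value]


# The dates are placeholders; only the source names come from the text above.
sources = [
    InformationSource("birthDate", "M&M", "1900-01-01"),
    InformationSource("birthDate", "Ferguson", "1900-01-01"),
    InformationSource("birthDate", "ABD", "1901-01-01", notes="incorrect value"),
]
```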
Infrastructure Record Types
RegisteredCustomers
Organisations/people who are registered as customers.
Not all users are RegisteredCustomers, as some material
is available freely to anyone
- code
- a short code/identifier of the customer (eg "ANU")
- name
- the customer's name
- typeOfCustomer
- a categorisation to support as yet unknown business analysis functions
one of: schoolPrimary,schoolSecondary,schoolCollege,
schoolOther, tertiaryUniversity, tertiaryOther, educationalOther,
libraryState, libraryPublic, libraryOther, person,
commercialPublisher, commercialInformation, commercialOther,
overseas
- consortiaFlag
- ipAddressList
- a list of IP address ranges which the customer may use to access
ALEG without requiring userid/password
- useridPasswordList
- a list of userid/password pairs which the customer may use to access
ALEG from any IP address. If required, a customer may be assigned
many userids and passwords (eg, separate userids and passwords for
staff and students). What isn't supported:
- restricting userids to IP addresses
- blocking IP addresses
- prioritising access by userid
- concurrencyLimit
- an integer specifying how many individual users from a customer
can be concurrently 'active' in the system. May be set to '-1' to
indicate that there is no concurrency limit (ie, the customer may
have any number of concurrent sessions). The definition of
'active' relies on the 'non activity timeout' system setting. It is
not proposed that the non activity timeout is an attribute of a
customer.
- accessRightsList
- Some customers may have access to more information
than other customers. If so, this field will list in some codified
manner what access rights this customer has.
- billingInformation
- a set of as yet undefined attributes to support the billing function,
maybe including:
- status (trial, paid, overdue, cancelled, ...)
- date subscribed to-from
- subscription amount ($)
- contact details
- ...
- notes
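The access rules for a registered customer (IP ranges, userid/password pairs and the concurrency limit) might be sketched as below, using Python's standard `ipaddress` module. The class and method names are hypothetical, CIDR notation for the IP ranges is an assumption, and the address range and credentials are placeholders. A real system would store password hashes rather than plain text.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RegisteredCustomer:
    """Hypothetical, partial customer record."""
    code: str
    ip_address_list: List[str] = field(default_factory=list)    # CIDR ranges (assumed notation)
    userid_password_list: Dict[str, str] = field(default_factory=dict)
    concurrency_limit: int = -1                                  # -1 means no limit

    def ip_allowed(self, address: str) -> bool:
        """True if `address` falls in one of the customer's IP ranges."""
        ip = ipaddress.ip_address(address)
        return any(ip in ipaddress.ip_network(r) for r in self.ip_address_list)

    def password_ok(self, userid: str, password: str) -> bool:
        """Userid/password access works from any IP address."""
        # Plain-text comparison is for illustration only.
        return self.userid_password_list.get(userid) == password

    def may_start_session(self, active_sessions: int) -> bool:
        """Apply the concurrencyLimit to a count of currently active sessions."""
        return self.concurrency_limit == -1 or active_sessions < self.concurrency_limit


# Placeholder customer; the address range and credentials are invented.
anu = RegisteredCustomer("ANU", ip_address_list=["192.0.2.0/24"],
                         userid_password_list={"staff": "s3cret"},
                         concurrency_limit=2)
```

How `active_sessions` is counted depends on the 'non activity timeout' system setting, which is outside this sketch.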
Notes
The system needs to represent uniformly and efficiently these common
conditions:
- unknown date/place/details - leave empty. The system will generate
the appropriate indicator to the user that this data is unknown.
- special publication cases, to be recorded against
a manifestation:
- published by the author
- no publisher (not an unknown publisher)
- no date of publication
- no title
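The distinction between "unknown" (left empty) and the special publication cases (a definite "none") could be represented as in the sketch below. The `Special` enumeration, the `display` function and the bracketed wording shown to the user are all assumptions for illustration.

```python
from enum import Enum
from typing import Optional, Union


class Special(Enum):
    """Hypothetical markers for the special publication cases listed above."""
    PUBLISHED_BY_AUTHOR = "published by the author"
    NO_PUBLISHER = "no publisher"
    NO_DATE = "no date of publication"
    NO_TITLE = "no title"


def display(value: Optional[Union[str, Special]]) -> str:
    """Render a stored value for the user."""
    if value is None or value == "":
        return "[unknown]"            # empty means unknown; the wording is an assumption
    if isinstance(value, Special):
        return f"[{value.value}]"
    return value
```

Keeping "no publisher" as an explicit marker, rather than an empty field, is what lets the system tell it apart from an unknown publisher.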
Issues
- sequences
- These will be handled by the creation of a work
representing the sequence as a whole, and individual
works representing the parts of the sequence. The parts
will be linked to the sequence with a 'part of sequence'
relationship which for ordered sequences will record a
part number.
- letters in response to reviews
- As the review will be a work in its own right, other works
about that work will simply refer to the work and the type
of relationship between them.
- notes
- AUSTLIT has a large set of 'standard' note templates (page 71,
AUSTLIT Indexers' Guide, Feb 2000).
AUSTLIT will retain these but hopefully support easier entry with
a mechanism to select the appropriate template and 'fill in the
blanks'
- extract
- an extract can be either a workType (and hence work/expression/manifestation)
or just another expression of a work, created by an "extract" event, depending
on whether or not the extract is the first and only known manifestation of the
work
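The proposed handling of sequences (a work for the sequence as a whole, works for the parts, and a 'part of sequence' relationship carrying a part number) might be sketched as follows. The class and function names are hypothetical and the titles are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartOfSequence:
    """Hypothetical 'part of sequence' relationship between two works."""
    part: str                           # the individual work
    sequence: str                       # the work representing the sequence as a whole
    part_number: Optional[int] = None   # recorded only for ordered sequences


def parts_in_order(links: List[PartOfSequence], sequence: str) -> List[str]:
    """Parts of `sequence`, numbered parts first in part-number order."""
    own = [link for link in links if link.sequence == sequence]
    own.sort(key=lambda link: (link.part_number is None, link.part_number or 0))
    return [link.part for link in own]


# Placeholder titles, invented for illustration.
sequence_links = [
    PartOfSequence("Poem III", "Example Sequence", 3),
    PartOfSequence("Poem I", "Example Sequence", 1),
    PartOfSequence("Poem II", "Example Sequence", 2),
]
```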
The AUSTLIT Indexers (Tessa Wooldridge, Jenny Huntley, Lesley Banson & Jane Rankine)
produced
an extensive discussion
document on this inventory (Word document) on 6 June 2000.
Kent Fitch, on behalf of Marie-Louise Ayers, Annette McGuiness
and Kerry Kilner
k.fitch@adfa.edu.au
Initial Draft: 22 May 2000
Revised: 26 May 2000
Revised: 9 June 2000
Revised: 27 June 2000
At this stage, a decision was made to
move from the
Work/Instantiation model to
Work/Expression/Manifestation. The old
Work/Instantiation version of this
document has been
archived here.
Revised: 4 July 2000
Revised: 27 July 2000