ALEG
System Design

Version 1.0

Index

  1. Introduction
  2. Data Model
    1. Background
    2. The ALEG data model
  3. Data Maintenance
    1. Maintenance User Interface
    2. Management and Coordination
  4. General Public User Interface
    1. General Principles
    2. Searching and browsing
    3. Locating holdings
    4. Accessing archive items and manuscript collections
    5. Access restrictions
    6. Multiple Views
    7. Branding and source identification
    8. Logging
  5. Business Issues
  6. Interoperability
    1. ALEG as a Z39.50 target
    2. Extracting information from ALEG
    3. Publishing the ALEG Topic Map


If you make people think they're thinking, they'll love you.
If you really make them think, they'll hate you.
D. R. P. Marquis (1878-1937)

  1. Introduction

    This document describes the system which the ALEG development team wants to have in production by January 2001.

  2. Data Model

    1. Background

      ALEG is not a library catalogue system. Although the "base" entities described by ALEG can be cast in terms used by traditional library systems such as "title" and "author", ALEG's reason for existence is not to duplicate the National Library of Australia's Kinetica facility. Rather, it is to make available a rich resource for people interested in Australian Literature by providing:

      • biographical information on creators
      • extensive subject description of works, including relationships between works, creators and general topics
      • information about criticisms and reviews of work, including subjective rankings
      • contextual (guided) access to full-content material where possible

      There have been several recent data modelling and architecture exercises which have greatly influenced the ALEG data model:

      • The IFLA FRBR
      • The INDECS project
      • The Harmony ABC strawman proposal
      • The ISO Topic Map standard

      The IFLA FRBR

      A recent (1998) development in the data modelling of library systems was completed by the International Federation of Library Associations and Institutions (IFLA). Known as the FRBR ("Functional Requirements for Bibliographic Records"), it "teases apart" the concepts of Work, Expression, Manifestation and Item and in so doing helps model and understand the relationships between titles. For example, it can clearly identify two manifestations as embodiments of the same expression, or two expressions as realisations of the same work, although one may be a language translation of another.
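
      As a rough illustration of how the levels nest, the following Python sketch models Work, Expression and Manifestation (Item is omitted for brevity). The class names, field names and sample data are assumptions made for this example and are not taken from the FRBR report or from any ALEG schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    title: str


@dataclass(frozen=True)
class Expression:
    work: Work               # the work this expression realises
    language: str


@dataclass(frozen=True)
class Manifestation:
    expression: Expression   # the expression this manifestation embodies
    publisher: str
    year: int


def same_expression(a: Manifestation, b: Manifestation) -> bool:
    """True when both manifestations embody the same expression."""
    return a.expression == b.expression


def same_work(a: Expression, b: Expression) -> bool:
    """True when both expressions realise the same work."""
    return a.work == b.work


# Hypothetical sample data, for illustration only.
work = Work("An Example Novel")
english = Expression(work, "English")
french = Expression(work, "French")        # a translation of the same work
hardback = Manifestation(english, "Publisher A", 1957)
paperback = Manifestation(english, "Publisher B", 1960)
```

      Here the two manifestations share an expression, while the English and French expressions differ from each other but share a work.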

      The INDECS project

      The INDECS project (INteroperability of Data in E-Commerce Systems) was established to develop a metadata framework for representing intellectual property and the transactions involving it. Their Schema and Model documents have a strong bent towards enabling e-commerce rights management related transactions, but necessarily require a precise modelling of the intellectual works and the agents who contribute to them. They also use the basic Work, Expression, Manifestation and Item representations of FRBR, but introduce the concept of the "Event" that describes how these products came about - who did what, the context, the inputs to the process.

      The Harmony ABC strawman proposal

      Regardless of problem domain, many of the entities and relationships which different systems attempt to model are pretty much the same. Rather than force each implementation to develop its own representations, and hence waste effort and complicate interoperability, the Harmony ABC proposal attempts to define a common framework which diverse systems will be able to use. The ABC proposal acknowledges that FRBR and INDECS were major sources of inspiration for its work.

      The ISO Topic Map standard

      Thesauri are recognised as useful tools which add structure to a subject list. The ISO Topic Map standard (ISO 13250) defines data structures which can be used as a standard thesaurus (with broader, narrower, see also and preferred types of relationships between topics), or as a standard "index" to a collection of works. But Topic Maps have some other characteristics which make them extremely powerful, including:

      • Topics can be assigned Topic Types (eg, "illustrator", "mountain", "country", "era"). Topic Types are themselves Topics.
      • Topics can be arbitrarily linked together by "Associations". Associations each have an "Association Type", which is itself a Topic.
      • Topics are linked to resources by "Occurrences". The resource being linked to plays a "Role" in the Occurrence. Roles are themselves Topics.

      A set of Topics, their associations and occurrences form a "Topic Map". The Topic Map is quite separate from the underlying resources which it describes. Hence, multiple Topic Maps can be assembled and maintained quite independently of each other, and of the underlying resources.
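
      The structures described above can be sketched in Python as follows. This is only an illustration of the ideas: the class and field names are invented here and are not the element names defined by ISO 13250, and the sample topics and the example.org locator are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    name: str
    topic_type: Optional[Topic] = None   # a Topic Type is itself a Topic


@dataclass
class Association:
    association_type: Topic              # an Association Type is itself a Topic
    members: list[Topic]


@dataclass
class Occurrence:
    topic: Topic
    role: Topic                          # a Role is itself a Topic
    resource: str                        # locator of a resource held outside the map


@dataclass
class TopicMap:
    topics: list[Topic] = field(default_factory=list)
    associations: list[Association] = field(default_factory=list)
    occurrences: list[Occurrence] = field(default_factory=list)

    def resources_for(self, topic: Topic) -> list[str]:
        """Resources linked to a topic through its occurrences."""
        return [o.resource for o in self.occurrences if o.topic is topic]


# Hypothetical sample data, for illustration only.
country = Topic("country")
mountain = Topic("mountain")
australia = Topic("Australia", topic_type=country)
kosciuszko = Topic("Mount Kosciuszko", topic_type=mountain)
located_in = Topic("located in")
description = Topic("description")

topic_map = TopicMap(
    topics=[country, mountain, australia, kosciuszko, located_in, description],
    associations=[Association(located_in, [kosciuszko, australia])],
    occurrences=[Occurrence(kosciuszko, description, "http://example.org/kosciuszko.html")],
)
```

      Because the map only holds locators, the resources themselves are untouched by anything done to the map.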

    2. The ALEG data model

      After reviewing the goals of the ALEG system and the data and relationships it needs to represent, seeking the opinions of experienced and respected figures in the Australian library community and reviewing the information discussing library data modelling, we have decided to base the ALEG data model on FRBR and INDECS models, adding:

      • an entity to represent awards
      • an entity to represent holdings information where a manifestation of a work has been sighted
      • Topic Maps to represent subjects, including relationship types (for example, "creator", narrower term "illustrator")
      • Encoded Archival Description (EAD) to represent archive items (initially populated by item descriptions from the Lu Rees collection)

      and removing:

      • the FRBR Item entity
      • the transaction aspects of INDECS

      For a full description of the development and contents of the ALEG data model, refer to the data model documentation.

  3. Data Maintenance

    1. Maintenance User Interface

      The ALEG Data Maintenance User Interface will be used by ALEG partners to maintain the ALEG data base. The interface will be web browser based, hence special software will not need to be installed and upgraded.

      There are many web browsers each with varying capabilities, adherence to standards, popularity and cost. It has been decided to base the ALEG Data Maintenance User Interface on Microsoft's Internet Explorer version 5 (IE5) because IE5:

      1. implements an advanced programming interface which allows the development of sophisticated applications
      2. is a stable product and widely installed, having been released in March 1999
      3. is available for current and recent popular operating systems (Windows 95/98/NT/2000, Macintosh) and is automatically bundled with most operating system installations
      4. has overwhelming market share (estimated at over 85% by Stat Market as at 18 June 2000)
      5. is free

      [IE version 5.5 has just been released (12 July). Claimed new user functions are pretty minimal - print preview, better performance. However, support for programming has been significantly improved, and IE5.5 is probably a significantly better platform for undertaking user interface development, although many of the new features are apparently not aligned with the W3C standards in-progress.]

      The guiding principles of the user interface are simplicity, speed and ease of use. The user interface must allow the user to do as much as possible with as few keystrokes/mouse clicks as possible. It must present the most likely action as the default and do as much as possible to prevent common mistakes and maintain the integrity of the system. Wherever possible, the system should acquire data by having the user select from possible values rather than type a complete value into a text box. The interface must remember the context in which the user is working and provide appropriate "templates", default values and an easy-to-access list of recently assigned values to reduce searching.

      A simple mockup of a web page to allow editing of a work/expression/manifestation is available here, but it will only be readable if you are using the Microsoft Internet Explorer 5 (IE5) browser. (If you are not using IE5, screen snapshots which can be viewed with any browser are available here.)

      Some of the required specific attributes of the editing facility are:

      1. It must be fast. As ALEG partners will be connecting over a Wide Area Network, it must minimise round-trips between the browser and web server and cache data in the client.

      2. Select rather than enter. The system should present users with valid selections from lists or as checkboxes/radio buttons wherever possible rather than requiring entry of text. For example, the relationship of an agent with a work should be chosen by the user from a list which shows only the possible relationships. The system must also recognise that "publisher" is a valid relationship for a manifestation, but not for a work or expression.

      3. Search/browse and select. For many selections, the "list" of possibilities is too large to practically show (too slow to load to the client, too cumbersome to navigate). In these cases, such as an agent name list, the system should allow the user to type as many or few characters as they wish to restrict the search. For example, a user may enter "white" to retrieve the 103 "white" names into a select list. Each entry in the select list must show enough information to allow the user to differentiate them (eg, full name, dates and locations where available) and allow the user to find out more about a particular item in a popup window (eg, for a name, show related names, brief work list, link to a biography if available).

      4. Local error detection. Ideally, errors should be prevented by the user interface, but otherwise editing should be performed in the client so that errors can be detected as soon as practicable and without a round-trip delay to the server. Some types of errors cannot be detected by the system (a typo in a new title), but wherever possible, the system should detect errors as soon as possible and with the least delay to the user.

      5. Context sensitive operation. The system must present a "template" which adapts to recognize the type of data being entered. For example, if the user indicates that the type of work being entered is a periodical issue, then the system should react accordingly:

        • Prompt for the periodical work of which this new work is a specific issue.

        • Prompt for the issue-specific details (number, date).

        • Make it easy for the user to look up details entered against the periodical work. For example, editors will typically be entered against the periodical and "inherited" by each issue in the tenure date range of the editors. When deciding whether to add extra editors against a particular issue, the user must be able to quickly and easily check which editors are already in effect, and decide to update editor details against either or both of the periodical work and the new issue in hand.

        • Because the user will often want to now record manifestations of existing or new works as being "partOf" this periodical issue manifestation, the system should remember this periodical issue as a likely "container" when the user wants to record new manifestations of works.

      6. Sensible defaults. Where the context of the application permits, the system should choose sensible defaults. For example, when creating a relationship link (or creation event!) between a work and an agent, the role of the agent should default to "creator". The language of an expression should default to "English".

      7. Notes at various levels. The system must allow the user to record notes at various levels, such as at the work level, each expression and manifestation level, and each agent, topic (thesaurus entry), relationship and event level. The notes can be of 3 specific types:

        1. a general public note, designed to be shown as part of the detailed record view - for example, "Although nominally the new-talent editor, he is known to have spent most of the 1960's completely sozzled and was about as welcome at an editorial meeting as a pokie at a parson's picnic". But beware of making actionable statements.

        2. a public source note, describing where this information was sourced

        3. a system-internal note, not designed for 'public consumption', but possibly clarifying some fact or making some observation

        Each of the 3 types can occur multiple times and is 'stamped' with the userid and date time of creation.

      8. Update History. The system must make it easy for the user to see the update history of the information in front of them - who updated what and when.

      9. Mass updates. The system must support the common mass updates which must be applied to multiple records. In some ways this may be relatively easier in this system as information is stored in as few places as possible and referenced (pointed to) rather than being copied. Hence, if the volume/issue information for a periodical issue needs to be changed, it just needs to be changed in one place, as all the 'partOf' manifestation records of the manifestations that periodical issue contains point to the periodical issue, rather than duplicating information about it.

        An anticipated common mass update will result from ongoing thesaurus/topic revision, where one term replaces another.

        [Any other common cases which need to be dealt with initially?]
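
      The two mass-update cases mentioned in item 9 (a periodical issue that is referenced rather than copied, and one topic term replacing another) might look like this in outline. The sketch is in Python purely for illustration; the class names, fields and sample records are assumptions and do not describe the actual ALEG schema.

```python
from dataclasses import dataclass


@dataclass
class PeriodicalIssue:
    title: str
    volume: str


@dataclass
class Manifestation:
    title: str
    part_of: PeriodicalIssue   # a reference to the issue, not a copy of its details
    topics: list


def describe(m: Manifestation) -> str:
    """Build a display line from the manifestation and the issue it is part of."""
    return f"{m.title}, in {m.part_of.title} {m.part_of.volume}"


def replace_topic(records: list, old: str, new: str) -> int:
    """Replace one topic term with another on every record; return how many changed."""
    changed = 0
    for record in records:
        if old in record.topics:
            record.topics = [new if t == old else t for t in record.topics]
            changed += 1
    return changed


# Hypothetical sample data, for illustration only.
issue = PeriodicalIssue("Example Quarterly", "vol. 12 no. 3")
poem = Manifestation("A Poem", issue, ["verse", "bush"])
story = Manifestation("A Story", issue, ["short story"])

issue.volume = "vol. 12 no. 4"   # corrected in one place, seen by both manifestations
```

      After the single correction, describe() reports the new volume for both manifestations, because each one points at the same issue record.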

      An update scenario - adding a new manifestation

      1. Click a "+" against one of the existing manifestations to create another one by cloning the selected manifestation.

      2. A new manifestation appears on the screen. (The mockup page lets you see how this might happen - what follows next has not been implemented in the mockup.)

      3. Probably amend the title, edition and ISBN/ISSN field if they are not applicable for this manifestation

      4. Edit the publication event. This will open a new popup window looking something like this:

        Event for manifestation: One Tree Hill

        Event Type:
        Agent: Uncertain

        ...to the left are the most recent publishers assigned by this indexer...

        Search Agents:

         

        ...if an agent is selected from the
        right hand box, it gets 'promoted'
        & selected to the above 'recent
        publishers assigned' list...

        Only known publishers
        All agents
        Search for:

        Place: Uncertain

        ...to the left are the most recent publishing places assigned by this indexer...

        Search Places:

         

        ...if a place is selected from the
        right hand box, it gets 'promoted'
        & selected to the above 'recent
        publishing places assigned' list...

        Only known publishing places
        All places
        Search for:

        Date: (year) Not shown Circa Uncertain
        Source:
        Public Scope Notes:
        System Notes:
        Workflow Alert: Email to:

        Because you're telling the system that you want to define an event with a manifestation, the system will default the type of event to 'publication' (but you can change it if you want it to be something else, such as 'printer').

        Because the event type is 'publication', the system will present you with a way to select publishers as being the agent in this event.

        Now, the system will know all the agents that have ever been assigned as publishers, and it will know the last 10 or 20 publishers which you personally have assigned.

        So, it would probably be a good idea if it:

        • let you select very easily from the last 5 or 10 or 20 (?) publishers, maybe by showing them in a list
        • let you search very easily from all the publishers in the system, maybe by entering the first 2 or 3 characters of their name, to which the system would respond by showing all the publishers starting with those characters and allowing you to select one (this would be then added to your list of recently assigned publishers, making it even faster to select it next time...)
        • if you need to assign an agent which isn't normally a publisher, it should let you search all agents
        • if you need to create a new agent (new publisher), it should let you do that easily and without losing the context of the manifestation you are working on - that is, it should do it in popup-windows and not force you to move off or close the manifestation window
        • it should allow you to specify special common cases in some simple way, such as indicating something was published by the author, or publisher is unknown.
        • let you assign a publication place and year. Again, places will be selectable from the spatial topics, favouring those that have ever been used as publishing places, and remembering those which you have assigned recently.
        • let you assign source, public and system notes and a workflow note to be emailed to a co-worker
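
        The first two behaviours in the list above (a short list of recently assigned publishers, plus a search on the first few characters of a name) could be sketched along these lines. This is a Python illustration only, not the browser code the interface would actually use; the AgentPicker class, its methods and the publisher names are all invented for the example.

```python
class AgentPicker:
    """Prefix search over known names plus a short most-recently-assigned list."""

    def __init__(self, known_names, recent_limit=10):
        self.known_names = sorted(known_names, key=str.lower)
        self.recent_limit = recent_limit
        self.recent = []                       # most recently assigned first

    def search(self, prefix):
        """Return known names starting with the typed characters (case-insensitive)."""
        p = prefix.lower()
        return [n for n in self.known_names if n.lower().startswith(p)]

    def assign(self, name):
        """Record an assignment, promoting the name to the top of the recent list."""
        if name in self.recent:
            self.recent.remove(name)
        self.recent.insert(0, name)
        del self.recent[self.recent_limit:]    # keep only the newest entries
        return name


# Hypothetical publisher names, for illustration only.
picker = AgentPicker(
    ["Wattle Press", "Bloom Press", "Acacia House", "Bloomfield Books"],
    recent_limit=2,
)
candidates = picker.search("bloom")
picker.assign(candidates[0])
```

        Assigning a name found by searching promotes it into the recent list, so it is quicker to select next time.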

        The above screen-shot is quite large and busy. How could it be simplified?

        • Don't show the most recently assigned publishers. Because manifestations will be processed in a "random" order (with regard to publishers), remembering the last 10 or 20 publishers doesn't help much. Instead, the indexer must search for the publisher each time by entering the start of the publisher's name.

        • Move the publisher search to a popup. For example, the "base" event screen might start like this:

          Event for manifestation: One Tree Hill

          Event Type:
          Agent:
          ....

          The user wants to select a publisher starting with "Bloom" so they enter the letters "Bloom" in the agent field like this:

          Event for manifestation: One Tree Hill

          Event Type:
          Agent: Bloom
          ....

          and then press the search button which pops up a window looking like this:

          Publication event agent search: One Tree Hill

          Search for: Only known publishers
          All agents
          Search Results:
           

          This might look better, but is it "better" in practice for an experienced user? This is hard to say - some people don't mind popup windows, some people find them distracting. Some people are most interested in minimizing keystrokes, others prefer step by step approaches even if it means more typing....

          The same comments apply to the place of publication - defer selection to a popup window?

          Alternatively, as the publisher has a very strong correlation with place of publication, the selection of publisher could populate a selection list of likely places (and allow for new places to be defined for the first time).

        • If the Public and System notes and workflow notes are only infrequently used, then they could be 'hidden' and only shown when actually used, or when a 'show notes' button was pressed (which would possibly show them in a popup, ready to be manipulated).

        Many of these issues will arise when the prototype system is actually tested. The goal will be to refine the prototype to produce a design which will allow the people using it to be as productive as possible when they are experienced users of the system. That is, whilst it is important for the system to be easily learnt, it would be a mistake to orientate the system solely towards inexperienced users at the expense of the productivity of experienced users.

    2. Management and Coordination

      Although the database is centralised, the operation of ALEG is distributed across Australia. The database will be maintained by different groups and the system must assist communication between those groups and make appropriate adaptations to the differences in the ways those groups will want to work.

      The system will support distributed operation in these ways:

      1. Different update/authorisation levels.

        The system will define Roles which can be selectively assigned to the staff of ALEG partners to allow them to update information. Suggested roles are:

        • add/edit work/expression/manifestation/holding/agent/award/archiveItem
        • delete work/expression/manifestation/holding/agent/award/archiveItem
        • add/edit topic map topics
        • delete topic map topic
        • supervisory functions:
          • maintain roles
          • approve changes for production?

        My personal experience in this area is that trust is rarely, if ever, abused. That is, with a skilled and dedicated team of people, they'll never do the 'wrong thing' anyway, and time spent putting lots of programming and administrative effort into defining and maintaining permissions is better spent putting in place a system which makes recovery from honest mistakes as painless as possible...

      2. Audit trail.

        The system will provide an audit trail showing who did what and when. Ideally, it will allow some limited 'undo' when someone mistakenly deletes a topic, or merges two terms.

      3. Workflow support.

        The system will allow records to be flagged as incomplete or needing intellectual input from a nominated person or persons before they can be completed. The system will allow maintainers to see what work is waiting for whose input. It will also support the maintenance of internal notes being attached to an entity or relationship to show the history of discussions on an issue.

      4. Identification of entities as part of a collection.

        Entities can be identified as belonging to a specialist collection. It is proposed that the topic map architecture be used to assign one or more collection topics to an entity, making possible the identification and extraction of those entities as part of a collection.

      5. Non-public entities.

        The system will support marking entities and relationships as being in a non-public state, where the information they contain can only be seen by ALEG partner staff.
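
      The audit trail and limited 'undo' described in item 2 above might be sketched as follows. This is a Python illustration under stated assumptions: the AuditedStore class, its methods and the record keys are hypothetical, stored values are assumed never to be None, and only the single most recent change can be undone.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    user: str
    action: str        # "add", "edit" or "delete"
    key: str
    before: object     # value before the change (None if the record was new)
    after: object      # value after the change (None if the record was deleted)
    when: datetime


class AuditedStore:
    """A dictionary-backed store that logs every change and can undo the last one."""

    def __init__(self):
        self.records = {}
        self.trail = []

    def put(self, user, key, value):
        action = "edit" if key in self.records else "add"
        self._log(user, action, key, self.records.get(key), value)
        self.records[key] = value

    def delete(self, user, key):
        self._log(user, "delete", key, self.records[key], None)
        del self.records[key]

    def undo_last(self, user):
        """Reverse the most recent change, logging the reversal as a new entry."""
        last = self.trail[-1]
        if last.before is None:
            self.delete(user, last.key)            # undo an add
        else:
            self.put(user, last.key, last.before)  # undo an edit or a delete

    def _log(self, user, action, key, before, after):
        now = datetime.now(timezone.utc)
        self.trail.append(AuditEntry(user, action, key, before, after, now))


# Hypothetical users and records, for illustration only.
store = AuditedStore()
store.put("alice", "topic:bush", "Bush poetry")
store.delete("bob", "topic:bush")    # a mistaken deletion
store.undo_last("alice")             # the topic is restored
```

      Logging the reversal as an ordinary entry keeps the trail complete: it still shows who deleted the topic and who restored it.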

  4. General Public User Interface

    1. General Principles

      1. No training required

        Doubtless, a few obligatory help pages will be constructed and a tip-of-the-day may lurk unobtrusively at the bottom of the home page, but no one will be expected to refer to these resources in order to use the public ALEG system. (The maintenance system will require documentation and training however, as that user interface will be much more complex and powerful.)

      2. Searching and browsing

        The system will offer the user two intertwined access modes, searching and browsing. More on this below.

      3. Clean layout, fast loading

        Sorry, no, changed our mind - messy, impenetrable and sluggish as treacle

      4. Support for accessibility guidelines

        The ALEG user interface will support the W3C's Web Content Accessibility Guidelines 1.0.

      5. Navigation

        Page headers and footers will show the user the context of the current page and allow them to move rapidly up the content hierarchy, back to the previous page and to the home page, heeding the design advice of Jakob Nielsen on navigation and on providing an unsurprising user interface.

      6. Browser aware

        Users with more recent browsers will be able to receive an enhanced browsing experience.

        Different browsers have different capabilities. Although Internet Explorer 5 arguably has the greatest capabilities and greatest speed, is free and available for most operating systems, some users choose not to install it, or can't for whatever reason.

        System designers like to be able to offer the "best" user interface, and not just one based on circa-1995 lowest-common-denominator web browser technology. With Internet Explorer now bundled with Windows, and having massive market share, it seems reasonable to target the system to IE4 and IE5 users. But Netscape is more widely used in academia than in the general population, and in any case, a system can't ignore 15%-20% of its potential users.

        The approach taken with the CSIRO web site was to generate the content in XML and to then translate it into different versions of HTML/DHTML depending on the client capabilities. This approach has some drawbacks:

        • the site can look slightly different to different users
        • web caches are nullified because the content cannot be reasonably cached as used by users with different browsers
        • some things are just fundamentally impossible in some browsers and so it is more than a matter of styling to provide the same information

        but nevertheless is better than the alternatives of not optimising presentation or ignoring some users.

        The layers programming model used by Netscape Navigator 4.x is defunct - it will not be supported by future versions of the browser. The new version of Navigator, the open-source Mozilla browser, has been long-delayed but is now in alpha test. Although there were initial hopes that IE and Navigator would converge to implement a common standard, those aspirations are fading, forcing projects like ALEG to choose between a lowest-common-denominator approach or supporting multiple browsers for a long, long time.

        Hence, given market share and programming realities, the ALEG user interface will have as a primary target users of IE4/5 but produce a suitable (if not optimised) version for other browsers and will support any browser capable of rendering nested tables (Netscape 2 and above).

      7. User configurable

        ALEG will offer a default user interface, rendered using slightly different techniques dependent on browser capability as described above. However, some users may wish to configure the user interface, for example:

        • to use or not use frames
        • to use a version which does not require any client-side javascript capability

      8. Search on every page

        A search box and button will be prominently visible on every page. Sometimes a search may be performed on the whole site, or within some clearly identified context (this technique is used by the popular directories, eg this page offers a search on the whole Google Directory, or just within the romance section; this page is Yahoo!'s less explicit equivalent).

      9. ALEG branding

        The web site is the only contact most users will have with the ALEG project. The ALEG user interface will consistently and simply promote an ALEG identity through a simple logo (yet to be designed) and style (colour scheme, layout).
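
      The browser-aware approach of principle 6 (one XML source, translated into different HTML depending on the client) might be sketched as below. This is a Python illustration only: the function names, the XML element names, the two variant names and the markup produced are assumptions, and the two User-Agent strings are merely samples of the form sent by IE5 and Netscape 4.

```python
import re
import xml.etree.ElementTree as ET
from html import escape


def choose_variant(user_agent: str) -> str:
    """Pick a presentation variant from the browser's User-Agent header."""
    match = re.search(r"MSIE (\d+)", user_agent)
    if match and int(match.group(1)) >= 4:
        return "dhtml"     # enhanced pages for IE4/IE5
    return "basic"         # table-based HTML for every other browser


def render(xml_text: str, user_agent: str) -> str:
    """Translate the same XML content into markup suited to the client."""
    items = [escape(e.text or "") for e in ET.fromstring(xml_text).findall("item")]
    if choose_variant(user_agent) == "dhtml":
        body = "".join(f'<div class="item">{i}</div>' for i in items)
        return f'<div class="results">{body}</div>'
    body = "".join(f"<tr><td>{i}</td></tr>" for i in items)
    return f"<table>{body}</table>"


# Sample inputs, for illustration only.
CONTENT = "<results><item>Voss</item><item>The Tree of Man</item></results>"
IE5 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"
NETSCAPE4 = "Mozilla/4.7 [en] (Win98; I)"
```

      Both clients receive the same two titles; only the surrounding markup differs, which is also why a shared web cache cannot serve one rendering to both.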

    2. Searching and browsing

      Users can discover the contents of the ALEG data base by searching and browsing.

      Searching refers to the entry of some text by the user describing what they are looking for. Some search systems only allow the user to specify text. The system then returns the results which it evaluates as best matching what the user is looking for, ranked in the order which the system evaluates as most likely to meet the user's expectations. Popular examples of this apparently simple search strategy are Google and the main search at the top of the Amazon.com site.

      "Apparently simple", because the system has to work out what the user means (which is anything but simple).

      For example, enter "fence" into the Amazon.com search engine. Now try "Harry Potter". The Amazon search engine groups likely results into categories: books, music, DVD's, videos, electronics, software etc.

      Google relies on a recursive-citation-ranking algorithm (paper in PostScript format) to order the pages matching the search criteria, but has recently augmented the search results with a category result where the search term contains a known topic (eg, search Google for "Harry Potter").

      Another approach is to get the user to "help" the search engine by telling it more about the context of the search phrase they're entering, and possibly supplying filters to reduce the returned results.

      For example, Amazon offers a detailed and "power search" if you look hard enough. For many sites, such as the British Library OPAC, getting the user to supply lots of information up front in return for providing an accurate search is the modus operandi.

      Directory browsing takes a different approach, classifying all material (often in many ways) and presenting the user with a hierarchical classification directory. Popular examples include Yahoo! and the Open Directory Project (also hosted as the Google Directory).

      What approach should ALEG take? The decision doesn't have to be made now, and can be changed during or after implementation. Here are some discussion notes:

      • If you don't provide reasonable query analysis and result ranking, you're going to need to let the user specify scope and filters. Building systems which provide good query analysis and result ranking takes time, so making the user do it themselves could be a viable cop-out. (Them's fightin words...)

      • Even if you provide excellent general ranking, there will always be occasions where a "hand-crafted" search by the user could have returned a more precise match.

      • ALEG will have topics galore, but will we build enough hierarchy into the topic maps/thesauri to support an appropriate fan-out of topics at each level (not too many, not too few)?

      • Searching within a scope narrowed down by a directory browse works very well (for the patient at least).

      • Step-wise refinement and query set manipulation was popular with search engines of 5 and more years ago. Is it still a useful technique, or is it just as easy to do the search again with another search term appended?

      Some approaches...

      1. Simple search

        Search term:

      2. Simple search specifying very broad scope

        Search term:  Scope:

      3. Getting context from the user more explicitly

        Author
        Title
        Subject

      4. Getting lots of context from the user

        Author
        First name, last name   Start of name   Exact name
        Title
        Any words in title   Start of words   Exact title
        Subject
        Any subjects   Start of subject words   Exact subject words
        Publisher
        Work Type:
        Work Form:
        Work Genre:
        Suitable for:
        Publication date:   Year:
        Sort results by:
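
      The match options in approach 4 could be evaluated roughly as follows. This Python sketch assumes one reading of the three options ("any" = any query word appears in the field, "start" = the field begins with the query words, "exact" = the field equals the query words); the function names and record layout are invented for the example.

```python
def matches(value: str, query: str, mode: str) -> bool:
    """Compare a field value with a query under one of three match modes."""
    value_words = value.lower().split()
    query_words = query.lower().split()
    if mode == "any":      # any query word appears somewhere in the field
        return any(w in value_words for w in query_words)
    if mode == "start":    # the field begins with the query words
        return value_words[:len(query_words)] == query_words
    if mode == "exact":    # the field is exactly the query words
        return value_words == query_words
    raise ValueError(f"unknown match mode: {mode}")


def search(records: list, criteria: dict) -> list:
    """Return the records satisfying every field -> (query, mode) criterion."""
    return [
        r for r in records
        if all(matches(r.get(f, ""), q, m) for f, (q, m) in criteria.items())
    ]


# A tiny sample catalogue, for illustration only.
records = [
    {"author": "Patrick White", "title": "The Tree of Man"},
    {"author": "Patrick White", "title": "Voss"},
    {"author": "David Malouf", "title": "An Imaginary Life"},
]
by_title_word = search(records, {"title": ("tree", "any")})
by_author_start = search(records, {"author": ("patrick", "start")})
```

      Criteria on several fields are combined with "and", so each extra field the user fills in can only narrow the result.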

      What comes back?

      Search results for: Patrick White

      1. White, Patrick (1912-1990) 1200 results
      2. White, Patrick ([1984]-) 3 results
      3. White, Patrick A. T. 2 results

      Search results for: Patrick White

      1. White, Patrick (1912-1990) 1200 results
      Biographical Details
      Works By 140 results
      Reviews of works 1134 results
      As Subject 256 results
      2. White, Patrick ([1984]-) 3 results
      3. White, Patrick A. T. 2 results

      Search results for: Patrick White

      1. White, Patrick (1912-1990) 1200 results
      Biographical Details
      Works By 140 results
          Short Story 27 results
          Drama 22 results
          Novel 13 results
          Verse 8 results
          Criticism 1 results
      Reviews of works 1134 results
      As Subject 256 results
      2. White, Patrick ([1984]-) 3 results
      3. White, Patrick A. T. 2 results

      Search results for: Patrick White

      1. White, Patrick (1912-1990) 1200 results
      Biographical Details
      Works By 140 results
          Short Story 27 results
          Drama 22 results
          Novel 13 results
                Voss
                      Criticism of 56 results
                      As subject 23 results
                      Related Work 2 results
                The Tree of Man
                The Vivisector
                (other novels would be listed here too)
          Verse 8 results
          Criticism 1 results
      Reviews of works 1134 results
      As Subject 256 results
      2. White, Patrick ([1984]-) 3 results
      3. White, Patrick A. T. 2 results

      The hyperlinked items in the above displays take the user to a page containing the described information (Patrick White's biographical data, the page describing "Voss" as known to ALEG, etc). The crudely drawn + and - buttons just open and close parts of the information hierarchy shown to the user. (Doing this is relatively straightforward with IE5, but so difficult as to be impractical under Netscape. Hence, users with older browsers would not see this dynamic display of information, but would get either a long list or hyperlinks which, when clicked, would result in another page being sent from the server. The directory approach of Yahoo! would suggest that work forms may be sub-directories, reviews of those works sub-sub-directories, etc.)

      What happens when the user clicks on the "Reviews of Works" button, with 1134 results? Well, 1134 is too big a list to show, so these would have to be grouped somehow, probably based on the works being reviewed.
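      A minimal sketch of such grouping follows. It assumes each review is available as a (reviewed work, review title) pair; that record shape, the function name and the sample review titles are hypothetical and not part of the ALEG data model.

```python
from collections import defaultdict


def group_reviews_by_work(reviews):
    """Group review titles under the work they review, largest group first."""
    groups = defaultdict(list)
    for work, review_title in reviews:
        groups[work].append(review_title)
    # Largest groups first; ties are broken alphabetically by work title.
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical sample data: (reviewed work, review title).
reviews = [
    ("Voss", "Review A"),
    ("The Tree of Man", "Review B"),
    ("Voss", "Review C"),
]

for work, titles in group_reviews_by_work(reviews):
    print(f"{work} {len(titles)} results")
```

      The user would then see one line per reviewed work with a result count, in the same style as the displays above, and could open a work to see its reviews.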

      That seems straightforward, but may not be a general solution. For example, "Patrick White as subject" describes 256 works. How should they be grouped?

      The user should also be able to search within their current scope. So, if the user has somehow positioned to "Voss", they should be able to search everything "under" that position for some other term, say, "David Malouf", and presumably find the relationship between "Voss the novel" and "Voss the opera".

      It is clear how to do this with the directory paradigm. For example, imagine you were on the Open Directory Project's Nevil Shute page, and you entered "Alice Springs" in the search box and made sure the option "Search only in Shute, Nevil" was displayed (give it a go!) - you'd probably be pretty happy with the results. But how to do it with a + and - tree directory (like Windows Explorer) isn't quite so obvious. The Windows paradigm would have users right-mouse clicking the item and choosing "Find" from the popup menu, which would open a dialog box and put the search results in a new window.
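      Whatever the interface paradigm, searching within the current scope amounts to restricting the search to everything "under" the user's position in the hierarchy. The sketch below illustrates this under the assumption that the hierarchy is held as a simple tree; the Node class, the function name and the sample labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One position in the browse hierarchy (hypothetical structure)."""
    label: str
    children: list = field(default_factory=list)


def search_within(scope, term, path=()):
    """Yield the path to every node below `scope` whose label contains `term`."""
    here = path + (scope.label,)
    for child in scope.children:
        if term.lower() in child.label.lower():
            yield here + (child.label,)
        # Descend so that matches at any depth under the scope are found.
        yield from search_within(child, term, here)


# Hypothetical sample hierarchy positioned at "Voss".
voss = Node("Voss", [
    Node("Criticism of"),
    Node("Related Work", [Node("Voss (opera), libretto by David Malouf")]),
])

for hit in search_within(voss, "David Malouf"):
    print(" > ".join(hit))
```

      Returning the full path to each hit lets the interface show where the match sits relative to the user's starting position.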

      Hmmm.. maybe ALEG should aim to populate the Open Directory Project Australian Literature page!

    3. Locating holdings

      For some manifestations of works, ALEG will record information about where each has been sighted. This information will be shown (decoded) to the user. However, a goal of ALEG is to offer detailed and relevance-ranked holdings information to the user, highlighting "local" and "nearby" holdings at the top of the holdings list.

      It was never imagined that ALEG would store this information, as it is already available in NLA's Kinetica. However, how best to acquire this information and show it to the user is still undecided. The issues include:

      1. Access to Kinetica holdings information costs money. ALEG would probably have to pass these charges on to the user, hence forcing subscription access, at least for users incurring costs.

      2. ALEG could issue Z39.50 queries to leading libraries (University and State, perhaps others) rather than issue the request to Kinetica. This is obviously more expensive in total resource terms than issuing a single request to Kinetica (but will not incur financial charges), and could be problematic due to diverse Z39.50 target implementations and Z39.50 target availability.

      3. Users will sometimes be interested in the availability of a work or expression, rather than a specific manifestation. This will mean that ALEG normally wouldn't be able to issue a query on a unique identifier (such as an ISBN (!!)), instead trying for a match on author and title. Hence, the results may be far from ideal.

      4. A useful service would be to rank the holdings so that those 'nearest' to the user appear at the top of the list. This is impossible unless we know quite a bit about the user's location. Maybe we can get this information from the user's profile (registered user) or maybe we can deduce it from the IP address (a hit and miss affair, especially with multiple campuses, ISPs and wireless networks).
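      As an illustration of the ranking described in item 4, the sketch below orders holdings into "local", "nearby" and "elsewhere" tiers. It assumes the user's city and state are already known (for example, from a registered user's profile); the field names, the tiering rule and the sample libraries are hypothetical.

```python
def rank_holdings(holdings, user_city, user_state):
    """Order holdings so 'local' ones come first, then 'nearby', then the rest."""
    def tier(holding):
        if holding["city"] == user_city:
            return 0  # local: same city as the user
        if holding["state"] == user_state:
            return 1  # nearby: same state as the user
        return 2      # elsewhere

    # sorted() is stable, so holdings within a tier keep their original order.
    return sorted(holdings, key=tier)


# Hypothetical sample holdings.
holdings = [
    {"library": "Library A", "city": "Perth", "state": "WA"},
    {"library": "Library B", "city": "Sydney", "state": "NSW"},
    {"library": "Library C", "city": "Wollongong", "state": "NSW"},
]

for holding in rank_holdings(holdings, user_city="Wollongong", user_state="NSW"):
    print(holding["library"], holding["city"], holding["state"])
```

      A coarse tiering like this degrades gracefully: if only the state can be deduced, the "local" tier is simply empty.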

      For these reasons, we are not hopeful that holdings information will be one of the great features and attractions of ALEG by January 2001, although we will work on approaches to these problems over the next few months.

    4. Accessing archive items and manuscript collections

      As discussed in the data model (needs updating), archive items will be held in Encoded Archival Description (EAD) format as part of the ALEG database, pending a more permanent home. Initially, these resources will not be searchable but will be accessible only from the agent associated with the archive item collection, and occasionally, from a work associated with some items in the archive item collection. Where archival information is available it will appear as a hyperlink from the agent or work page, and the user will receive a simple formatted list of archival item records, grouped by record type. This will be achieved by formatting the EAD with a simple XSL stylesheet into HTML.

      Information about manuscript collections is statically recorded in ALEG. That is, ALEG does not dynamically search RAAM and present what it finds. Rather, the indexer decides whether to link a manuscript collection to an agent (or possibly a work) and this information is presented to the user, typically as a note and an optional hyperlink to a specific resource in RAAM or elsewhere.

      A task for a future enhancement, maybe in concert with the redevelopment of RAAM, could be to have a dynamic link, or maybe an alert when new material is added to RAAM.

    5. Access restrictions

      Whether ALEG is completely or partially a walled garden has yet to be decided, and probably won't be fully decided by the time implementation must start (if ever!).

      However, to allow for possible access restriction requirements, the system:

      1. must support registration of users. The data model inventory describes a simple data structure to represent "customers".

      2. must support identification of users ("logging on") at least to some parts of the system.

      3. must support the logging of what users do where that action may give rise to some charges (searching on Kinetica??).

      4. must support restricting access to some information to registered users, or some group of anonymous users based on connection IP address (for example, users from ALEG partners may have complete anonymous access to the system). The information to be restricted has yet to be decided...

      5. need not implement an interface with an external accounting system for the automatic generation of invoices and reconciliation of payments, as this will probably be manual or semi-automated and is outside the scope of this system for the time being.
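      Requirement 4 could be met by a check along the following lines. This is a sketch only: the partner network ranges are reserved documentation addresses standing in for real partner networks, and the function and variable names are hypothetical.

```python
import ipaddress

# Hypothetical partner networks (reserved documentation ranges, not real partners).
PARTNER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def may_access_restricted(client_ip, logged_in_user=None):
    """Allow registered (logged-on) users, or anonymous users on a partner network."""
    if logged_in_user is not None:
        return True
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in PARTNER_NETWORKS)


print(may_access_restricted("192.0.2.17"))             # anonymous, partner network
print(may_access_restricted("203.0.113.5"))            # anonymous, other network
print(may_access_restricted("203.0.113.5", "jsmith"))  # registered user
```

      Keeping the decision in one function means that the information to be restricted can be settled later without changing how callers ask the question.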

    6. Multiple Views

      As mentioned above, ALEG will enable entities to be grouped into collections. Collections can be extracted by their owners for their own formatting and publication, or can be exposed with their own identity on the ALEG site.

      For example, assuming that a South Australian Women Writing collection is created, then it may be decided to:

      • Put an icon/link on the ALEG home page in the "specialist collection information" part of that page

      • Link to a page containing an appropriate banner, acknowledgements, background etc and an alphabetical hyperlinked list of South Australian Women Writers

      • Link each writer to a page containing bio info and hyperlinked list of works

      • Link each work to a page showing work details

      • Enable a simple author/title search over the collection.

      • Provide an option to link back to ALEG for more complex, complete searching.

    7. Branding and source identification

      1. Branding. Each agent and work, expression, manifestation entity can be identified with one or more partner institutions as being the contributors to that entity. (Agent biographies can also be separately so identified.) As part of that identification, the contributing institution can tick a checkbox which says: "this institution should be explicitly acknowledged as a contributor to this entity when it is displayed in detail". When required, the ALEG interface will then display a subtle hyperlinked acknowledgement (text and/or icon) which the user can follow to an institution-specific page on ALEG which would typically contain a blurb about that institution's contribution to ALEG and further links. This page would be of the institution's own design.

      2. Source. ALEG will have a provision for a public 'source note' to be recorded on each entity and relationship. Where a public source note is available it will be shown in the detail view of an entity/relationship. The initials of the ALEG maintainer creating the source note will be shown and hyperlinked to a maintainer-specified page on ALEG showing a happy snap, resume, favorite cocktail recipe, whatever. What a fine way to personalise ALEG and give a human voice to all that information!

    8. Logging

      The system will generate two types of logs:

      1. Because the ALEG interface will be running under a web server, a standard web log (in common log format) will be produced. Freely available and flexible log analysis tools such as Analog can read such a log to produce basic access statistics by system function based on IP address/network, and, if user authentication is required, by userid.

      2. "Interesting" parts of the system will log search terms and search options for later analysis to help understand how people are using the system and identify areas requiring improvement (such as new thesaurus entries for "see also" terms).
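      To illustrate the kind of statistic the web log in item 1 supports, the sketch below counts requests per userid, falling back to the client address when no user was authenticated. It is not a substitute for a tool such as Analog; the sample log lines, the userid and the request paths in them are invented for the example.

```python
import re
from collections import Counter

# Common log format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_requests(lines):
    """Count requests by userid, or by host when the userid field is '-'."""
    counts = Counter()
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in common log format
        user = match.group("user")
        counts[user if user != "-" else match.group("host")] += 1
    return counts


# Hypothetical sample log lines.
sample_log = [
    '192.0.2.17 - - [28/Jul/2000:10:15:00 +1000] "GET /search?term=voss HTTP/1.0" 200 2326',
    '198.51.100.4 - jsmith [28/Jul/2000:10:16:30 +1000] "GET /agent/1 HTTP/1.0" 200 5120',
    '198.51.100.4 - jsmith [28/Jul/2000:10:17:02 +1000] "GET /work/2 HTTP/1.0" 200 4096',
    'not a log line',
]

for key, total in count_requests(sample_log).most_common():
    print(key, total)
```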

  5. Business Issues

    1. General Principles

      The primary rationale for establishing the Australian Literature Electronic Gateway is:

      • To enhance research and learning in Australian literature

      In addition, the project has a number of subsidiary rationales:

      • To integrate and maximise the utility of existing infrastructure and content
      • To provide a publishing vehicle for the results of long term research
      • To capitalise on the development of new technologies which support these aims
      • To fulfil and deliver on obligations to the ARC and partners
      • To deliver a public good in the public interest

      The delivery of a public good in the public interest itself involves:

      • Developing long term sustainability of the public good
      • Commitment to a co-operative, collaborative, non-profit ethos
      • Ensuring that financial decisions, including achieving optimal cost recovery, support but do not drive the Gateway's strategic goals and primary rationales

    2. Intellectual Property

      The Gateway will encompass three distinct sets of Intellectual Property in support of its primary and subsidiary rationales:

      • The IP residing in the system design
      • The IP residing in records and other information provided by partners
      • The IP residing in records and other information sourced from third parties

      The Gateway system will support identification of the IP residing in partner and third party records and other information (see Branding and Source Identification), and an IP register will be maintained by the project managers. All IP will be protected and formalised using the Blake Dawson Waldron Draft Gateway Agreement as a basis for formalisation of ownership, relationships and licensed uses of IP.

    3. Public Access

      The Gateway system will support partial or complete access restrictions (see Access Restrictions) to all Intellectual Property (such as records), and will support flexibility in setting such restrictions. The project's aim will be to maximise the amount of quality information available free of charge - and which will substantially populate generalist directories such as the Open Directory Project - whilst retaining sufficient 'value added' attractions behind the 'walled garden' to achieve optimal cost recovery. See the Business Model for fuller discussion of business issues in relation to the Service, Partner and Data Management modes.

  6. Interoperability

    1. ALEG as a Z39.50 target

      ALEG will host a Z39.50 target implementing the Bath profile, initially just for Functional Area A (Basic Bibliographic Search and Retrieval) at Conformance Level 0, as described in section 5.A.0 of the Bath Profile, with support for the XML DTD and Simple Unstructured Text Record Syntax (SUTRS) and MARC21.

    2. Extracting information from ALEG

      The ALEG system will license partners to extract records from the database in XML format. The criteria for extraction (ie, how to specify what is to be extracted) have yet to be decided, but typical examples could be:

      1. All agents associated with a specific attribute or Topic (or set of attribute Topics) in a specific role (or sets of Roles) and the works associated with those agents in a specific relationship (or set of relationships).

        (For example, all agents with a gender of "Female" and either a birthplace of "South Australia" or some period of residency in "South Australia" and all the works they have "created" or "edited".)

      2. All works and/or agents associated with a nominated collection.

      3. All archival items associated with a nominated agent and/or work.

        (This would allow the production of the Lu Rees author files.)
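      The example given under criterion 1 could be expressed roughly as follows. Since neither the extract criteria nor the XML format has been decided, everything here is an assumption made for illustration: the dictionary record shape, the element and attribute names, and the sample agents and works.

```python
import xml.etree.ElementTree as ET


def extract_sa_women(agents, works):
    """Select female agents born or resident in South Australia, with the
    works they created or edited, and return them as an XML element tree."""
    root = ET.Element("extract")
    for agent in agents:
        in_sa = (agent["birthplace"] == "South Australia"
                 or "South Australia" in agent["residencies"])
        if agent["gender"] != "Female" or not in_sa:
            continue
        agent_el = ET.SubElement(root, "agent", name=agent["name"])
        for work in works:
            if work["agent"] == agent["name"] and work["role"] in {"created", "edited"}:
                ET.SubElement(agent_el, "work", title=work["title"], role=work["role"])
    return root


# Hypothetical sample records.
agents = [
    {"name": "Agent A", "gender": "Female", "birthplace": "Victoria",
     "residencies": ["South Australia"]},
    {"name": "Agent B", "gender": "Male", "birthplace": "South Australia",
     "residencies": []},
]
works = [
    {"title": "Work 1", "agent": "Agent A", "role": "created"},
    {"title": "Work 2", "agent": "Agent A", "role": "illustrated"},
    {"title": "Work 3", "agent": "Agent B", "role": "created"},
]

print(ET.tostring(extract_sa_women(agents, works), encoding="unicode"))
```

      In practice the selection would presumably be expressed declaratively (attribute Topics, roles and relationships) rather than hard-coded as it is here.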

    3. Publishing the ALEG Topic Map

      Initially, Topic Maps will be used as a useful abstraction, design principle and implementation technique rather than as a public way to access the ALEG database. However, the development and implementation of the system will be undertaken cognisant of the great potential of making Topic Maps publicly available, exposing the base resources to be described to external Topic Maps and merging Topic Maps from different sources.


ALEG Development Team
c/- k.fitch@adfa.edu.au
28 July 2000
Renamed to "System design" 10 August 2000