ALEG
Weekly Report - Week 22, 29 September 2000
What I've done
- Two frustrating days spent getting the Oracle 8i Intermedia free text
searching working under Solaris. Attempting to create text indices
using Intermedia failed with an error message indicating that the
Oracle "extproc" (external procedure, to which text indexing is
delegated) could not be launched. This problem was commonly reported
on internet mailing/discussion lists, so I worked through the various
solutions offered, but none fixed the problem. Eventually the problem
was tracked down to the service resolver that locates the extproc: it
appends a domain suffix to the service name, and the suffixed name was
not defined in the service configuration file created by the Oracle
Installer. It turned out to be a very simple problem caused by the
Installer, but I had been looking for complicated problems caused by
running 2 versions of Oracle on the same machine...
- Started to tackle the problem of matching AUSTLIT sources with AUSTLIT
titles. AUSTLIT stores sources and titles quite separately, but in ALEG
we want to match the source to a work. Unfortunately, this is difficult
because the representation of the source title rarely exactly matches
the representation of the AUSTLIT work title. Spelling variations,
title variations (use of an alternative title), author variations (use
of alternative names) and date variations conspire to make automated
matching either very difficult or risky.
There are 9592 AUSTLIT sources. Of these,
5580 have no author specified (mostly periodicals not recorded
as works in AUSTLIT). Of the 4012 with authors, 3557 have now been
matched with AUSTLIT derived expressions, leaving 455 to be matched.
Some of these are non-AUSTLIT works, some are just gross mismatches
which will need manual assignment.
I'm currently working on the 5580 without authors, trying to
recognize the genuine periodicals and create ALEG works for them.
Some are not periodicals (instead, they are monographs or collected
works without authors specified), and many of the 5580 are spelling
variations which I'm trying to unify.
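A rough sketch of the normalise-then-compare idea behind the source/title
matching described above. This is an illustration only, not the procedure
actually run against the AUSTLIT data: the function names, the 0.85
threshold and the use of Python's difflib are all assumptions. Anything
scoring below the threshold is left unmatched, which corresponds to the
cases that would need manual assignment.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(title):
    """Lower-case, strip accents and punctuation, drop a leading article."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    words = text.split()
    if words and words[0] in {"the", "a", "an"}:
        words = words[1:]
    return " ".join(words)


def best_match(source_title, work_titles, threshold=0.85):
    """Return (work_title, score) for the closest work title.

    Returns None when no candidate reaches the threshold, so that
    doubtful cases are left for manual assignment rather than guessed.
    The threshold value is an arbitrary assumption.
    """
    key = normalize(source_title)
    best, best_score = None, 0.0
    for candidate in work_titles:
        score = SequenceMatcher(None, key, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= threshold:
        return best, best_score
    return None
```

Calling best_match("The Buletin", ["The Bulletin", "The Age"]) would pair
the misspelt source with "The Bulletin", while a title with no close
candidate returns None. Matching on title alone is the risky part noted
above; a fuller version would presumably also compare author and date.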
What I haven't done but need to do soon!
- Document how ALEG will handle some tricky cases: the "Poets of the
Month" works from the mid 1970s and "Down the Lake with Half a Chook".
These are amongst the most "difficult" cases Tessa and Kathy can
come up with, so if we think the proposed data model can handle these,
we'll be happy!
Next week
- Finish matching the sources (all that can be done automatically)
and create all the part manifestations by processing the AUSTLIT sources
file to create/match compound works and expressions. Deal with the
periodical sources such as The Bulletin, The Age etc.
Fran will also be producing a file of the reviews, so we'll have to see
how best to process them.
Summary
- More frustration with the Oracle installation and lots of data
matching problems with sources, but I'm whittling away at them, and
getting to know the data much better than I thought I would!