ALEG
Weekly Report - Week 17, 24 August 2001
What I've done
- Thesaurus - the saga continued this week, but with steady progress.
As the mapping exercise has been recognised as a very large effort,
I improved the tools to assist the mapping process.
We had a fairly complete mapping of place-names-as-subject to
the new thesaurus subjects, but we have many more place names
recording birth/death places and places of publication.
Many of these are not in the thesaurus as delivered, or are enrichment
terms (unassigned to the hierarchy), which is not what we want
for this use of place (we want to be able to derive broader terms
to identify agents born in a country, etc). So, I spent a lot of
time identifying these topics and matching as many as possible
automatically, and providing another program to help manually
map them. This is a risky process, because the same name is used
for many localities and the data contains typos and vague entries,
so the whole mapping (about 3800 terms) needs to be manually inspected.
- Added "affiliation" on works, initially as a means of
identifying the "Responses to Asia" subset
- Fiddled with the ethnicity and nationality lists. Duplicates
were removed and topics with 2 values (eg "Hungarian/Jewish") were
split into 2 attributes ("Hungarian" and "Jewish"). Some
more work needs to be done on the Aboriginal ethnicities.
- Investigated some problems with the "Responses to Asia"
data load - a small amount of manual work is required to
finalise this.
- William identified a problem with the AMLC data load with
approximately 50 records which had a bogus statement of responsibility,
due to a program error. I fixed what remained of this problem manually.
- The Awards module was developed late in the week:
- Database changes and maintenance changes to support entry and storage of
a new "Award" topic (added to the thesaurus hierarchy) and "Award Detail" topic.
The "Award" topic represents a particular award (eg, the Miles Franklin),
whereas the "Award Detail" represents the awarding of an award to a work or
agent, and includes attributes for year, placing (eg, "winner", "short-list"),
section, sub-section and note.
- The old awards data for works was used to populate a new thesaurus hierarchy
for awards. I manually corrected typos and made minor adjustments to the
award names and formats in an attempt to cause as few "Award" topics to be
created as possible (I got it down to about 500 separate awards), and then
naively classified them as "Australian" and "International", which are
currently the only arms of the Awards hierarchy.
- Then the old text awards were split into award name (the "Award" part),
and optional fields (section, sub-section, placing and year), and loaded
as new topics and linked to the works. The old text awards are still on
the system and viewable but not updatable - I'll leave them there until
we feel happy nothing has been lost.
- There are 811 Awards-as-subjects. These were all mapped to "pending"
in the new thesaurus, but I tried to match them to award name and possibly
year with a program. About 555 were matched, leaving about 256 unmapped
due to spelling variations in the award-in-subject and the "Award", or
the award-in-subject just not being present as an "Award" in the thesaurus.
These need to be examined manually.
- Some Work and Agent awards have not yet been processed - I'm not
certain why, but they are in a different format to the 'majority' of
awards processed this week. There are about 500 work and 100 agent
awards of this kind, and I'll sort them out next week.
- The Agent and Work stylesheets have been updated to render the new
Award Details. The Agent details page now shows awards given to
works produced by the agent, sorted by award name (eg,
http://www.austlit.edu.au/run?ex=ShowAgent&agentId=AO4).
- The Apache web server stopped serving static pages after a system limit on
the number of concurrent files it can have open was reached. This did not
affect the AUSTLIT 'dynamic' site, just the doco site which consists of
files which are read by the Apache web server and delivered to the requester.
This seems to be a known problem with an interaction between Apache and
Tomcat, and will require stopping and starting the Apache web server
every few weeks until a fix is found and installed.
- Only a few more reports and one-off updates to assist the cleanup effort.
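The award-as-subject matching described in the Awards module notes above
could be approached along the following lines. This is only a minimal
sketch, not the program actually used: the function names, the year
pattern and the 0.85 similarity cutoff are all assumptions, and fuzzy
matching (Python's difflib) is swapped in here as one possible way to
catch spelling variations. Anything it matches would still need the same
manual inspection as the rest of the mapping.

```python
import difflib
import re

# Assumed year pattern: a four-digit year from 1800 to 2099.
YEAR = re.compile(r"\b(1[89]\d\d|20\d\d)\b")


def normalise(name):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def match_award_subjects(subjects, awards, cutoff=0.85):
    """Match award-as-subject strings to known "Award" topic names.

    Returns (matched, unmatched): matched maps each subject string to
    (award name, year or None); unmatched lists the subjects that still
    need to be examined manually.
    """
    by_norm = {normalise(a): a for a in awards}
    matched, unmatched = {}, []
    for subject in subjects:
        found = YEAR.search(subject)
        year = int(found.group(1)) if found else None
        # Compare on the name alone, with any year removed.
        name = normalise(YEAR.sub(" ", subject))
        close = difflib.get_close_matches(name, list(by_norm), n=1, cutoff=cutoff)
        if close:
            matched[subject] = (by_norm[close[0]], year)
        else:
            unmatched.append(subject)
    return matched, unmatched


# Illustrative data only, not taken from the AUSTLIT database.
awards = ["Miles Franklin Award", "Booker Prize"]
subjects = ["Miles Franklin Award 1995", "Miles Frankin Award", "Some Unknown Medal"]
matched, unmatched = match_award_subjects(subjects, awards)
print(matched)
print(unmatched)
```

A lower cutoff would match more of the stragglers but would also raise
the risk of linking a subject to the wrong award, so the cutoff is a
trade-off against the amount of manual checking.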
Next Week
- More thesaurus conversion/mapping and user interface. Fix up the straggling awards, map all the
Australian-publishers-as-subject to agent and "Australian Publishers" concept.
- To Do list. Work through the (mostly) minor issues on the to-do
list provided by Annette/Kerry/Marie-Louise.
Next few weeks
- First known dates.
- Simple, guided, advanced search screen design.
- William noticed that some of the place-of-publication data for
loaded records is wrong where the name of the place of publication (town or
city) occurs in more than one state (eg Richmond, Glebe). I'll investigate
when we map the spatial thesaurus.
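
As a starting point for that investigation, the place names that occur
in more than one state could be listed with something like the sketch
below. It assumes the spatial data can be exported as (place name, state)
pairs; the function name and the sample rows are hypothetical and are
not taken from the loaded records.

```python
from collections import defaultdict


def ambiguous_places(gazetteer):
    """Return {place name: sorted list of states} for every name that
    appears in more than one state.

    gazetteer is any iterable of (place_name, state) pairs. Names are
    compared case-insensitively with surrounding whitespace ignored.
    """
    states_by_name = defaultdict(set)
    for name, state in gazetteer:
        states_by_name[name.strip().lower()].add(state)
    return {
        name: sorted(states)
        for name, states in states_by_name.items()
        if len(states) > 1
    }


# Illustrative rows only.
rows = [
    ("Richmond", "NSW"),
    ("Richmond", "VIC"),
    ("Glebe", "NSW"),
    ("Glebe", "TAS"),
    ("Toowoomba", "QLD"),
]
print(ambiguous_places(rows))
```

Records whose place of publication falls in this list are the ones
whose state assignment would need checking.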