ALEG
Weekly Report - Week 41, 9 February 2001
What I've done
- Set up ALEG data and programs on the loan machine. Because it
has 1.8GB of memory (i.e. lots!!), the size of the Oracle buffers has been
greatly increased over their size on the libadfa machine (which we
are sharing with the ADFA library). The new machine will have
1GB of memory, but this should be enough!
- I installed the latest version of the Apache project's Xerces
XML parser, version 1.3.0.
Its improvements over the old version we were using (1.1.3) have
had a fairly major impact on response time wherever building the
XML and formatting it are critical components. For example,
formatting a summary showing all Peter Porter's nearly 1000 works
took 8 - 9 seconds using v 1.1.3 (excluding database retrieval
times). With 1.3.0 it takes 5 - 6 seconds. Formatting
"Collected Poems [1961-1999] [by] Peter Porter" (several
editions, with over 700 components) took 10 - 14 seconds
with v 1.1.3, and just 5 - 6 seconds with v 1.3.0 (again excluding
database search/retrieval times). A main reason seems to
be that v 1.3.0 is much more frugal with its memory usage,
causing much less CPU time to be devoted to garbage collection
of discarded objects in memory. I've sent a "thank you" to the Xerces
team for this excellent and free XML parser.
There are more recent versions of the Tomcat servlet engine and
the Xalan stylesheet processor available, which I'll investigate
soon.
- I noticed that response time for the Peter Porter selected works
seemed much worse on the loan machine than on the old libadfa with the
same code base and machine settings. It turned out that I was
connecting directly from my browser to libadfa, but was connecting
through the ADFA web cache to get to the loan machine. The
output for this request was quite large, about 335 KB, and the
time taken by the ADFA web cache to receive and buffer it and
pass it on to my browser doubled response time. Any request
which generates large bursts of output would be similarly
affected, such as editing a work with many reviews or parts.
It will be worth remembering this and encouraging ALEG maintainers
to bypass their local caches when connecting to ALEG.
- During dozens of emails between team members on work attributes
and templates, I continued implementing the changes. Briefly:
  - some reassignment of attributes between worktype/form/genre
    clumps. Some records have been updated, some are still to be done.
  - many new attributes, some trivial, some significant (especially
    reprints, locally-recorded holdings (including institution identification),
    contained-as-part-of/contains-as-part-of used for series/sequences)
  - many new templates, and revised EditHome and "Add new work" forms
    which group these (hopefully logically!)
  - popup standard text selection
There is a lot more to do, and I must admit I lost track of all the
emails mid-week, so I have to re-read and work out exactly what
needs to be done.
- Data conversion! Ah, the hubris after the initial LAW load and
the depression of the debacle caused by matching on surname plus
initial if and only if the surname plus initial was unique in
Austlit! Unfortunately, there are many cases such as LAW name
"Feroka, Harry" (b. 1856, gender male, with a work published in
1886) matching ALEG name "Feroka, Holly" (no dob, no gender, 3 short
stories published in the 1980s).
Meanwhile, the first manual pass of the complete LAW file was finished
and loaded. About 300 hard cases remain!
A report was produced for a first pass at correcting publisher names
and the changes applied. A second pass is underway.
- Judith Pearce and James Bullen from NLA supplied a file listing
Australian libraries with their NUC codes, state and URL, which has
been loaded. In the long term we'll somehow get automatic updates
or link directly to the NLA's database.
- With Fran, a little more work setting up the system side of
the loan machine, especially security and automatic boot-up
sequences.
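
The name-matching problem in the data conversion item above could be
reduced by rejecting implausible candidates before testing for
uniqueness. The following is only a sketch under stated assumptions,
not the ALEG conversion code: the Person record, its field names and
the 90-year limit are all hypothetical, and the rule it applies
(unique among plausible candidates) is a variation on the
"unique surname plus initial" rule, not the rule actually used.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical limit: birth and publication years that span more than
# this are assumed to belong to different people.
MAX_SPAN_YEARS = 90


@dataclass
class Person:
    """Hypothetical name record; not the real LAW or ALEG schema."""
    surname: str
    given: str
    birth_year: Optional[int] = None
    gender: Optional[str] = None
    first_pub_year: Optional[int] = None
    last_pub_year: Optional[int] = None


def _clean(name: str) -> str:
    return name.replace(".", "").strip().lower()


def match_key(p: Person) -> tuple:
    """Surname plus first initial, as in the original matching rule."""
    return (_clean(p.surname), _clean(p.given)[:1])


def is_plausible(a: Person, b: Person) -> bool:
    """Reject pairs whose known details contradict each other."""
    ga, gb = _clean(a.given), _clean(b.given)
    # Two full given names (not bare initials) must agree.
    if len(ga) > 1 and len(gb) > 1 and ga != gb:
        return False
    if a.gender and b.gender and a.gender != b.gender:
        return False
    if a.birth_year and b.birth_year and a.birth_year != b.birth_year:
        return False
    # All known years for both records must fit one plausible lifetime.
    years = [y for p in (a, b)
             for y in (p.birth_year, p.first_pub_year, p.last_pub_year)
             if y is not None]
    if years and max(years) - min(years) > MAX_SPAN_YEARS:
        return False
    return True


def find_match(law: Person, austlit: List[Person]) -> Optional[Person]:
    """Return the single plausible candidate, or None for manual review."""
    candidates = [p for p in austlit
                  if match_key(p) == match_key(law) and is_plausible(law, p)]
    return candidates[0] if len(candidates) == 1 else None


harry = Person("Feroka", "Harry", 1856, "male", 1886, 1886)
holly = Person("Feroka", "Holly", first_pub_year=1980, last_pub_year=1989)
print(find_match(harry, [holly]))  # no automatic match
```

Anything that returns None here would go to the manual pass rather
than being merged automatically.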
What I haven't done but need to do soon!
- Review recent mail on work/expression/manifestation edit schema and apply final
changes. Some changes to the Agent schema are also expected.
- Add general edits based on XSLT Schema (the Schematron approach).
- Add topic selection to the work/expression/manifestation UI
(Marie-Louise, Kerry and Annette are producing a document
next week containing suggested changes for me to work through).
- Think about user access control
- The maintenance suite does way too much unnecessary updating
with works containing other works - investigate and fix.
- Revisit output formatting (especially to show all the newly
added attributes and relationships) and search screens.
Next week
- Maintain the "first known date" field for works automatically based
on manifestation data of the work or containing works.
- Start coding the program to merge in the BAL data.
- Finalise changes (at least "finalise for the training")
to the edit suite based on detailed feedback.
- Cut the ALEG software over to the new machine mid-week.
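
One possible reading of the "first known date" rule above, as a rough
sketch only: take the earliest manifestation year of the work itself
or of any work that contains it. The Work class and its fields are
hypothetical, not the ALEG schema, and the sketch makes the
simplifying assumption that the earliest manifestation of a containing
work already included the contained work.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Work:
    """Hypothetical work record; not the real ALEG schema."""
    title: str
    manifestation_years: List[int] = field(default_factory=list)
    contained_in: List["Work"] = field(default_factory=list)


def first_known_year(work: Work,
                     _seen: Optional[Set[int]] = None) -> Optional[int]:
    """Earliest year from the work's own manifestations or its containers'.

    Returns None when no manifestation date is known anywhere.
    The _seen set guards against cycles in the containment links.
    """
    if _seen is None:
        _seen = set()
    if id(work) in _seen:
        return None
    _seen.add(id(work))

    candidates = list(work.manifestation_years)
    for container in work.contained_in:
        year = first_known_year(container, _seen)
        if year is not None:
            candidates.append(year)
    return min(candidates) if candidates else None


# Hypothetical data: a poem first recorded in an anthology.
anthology = Work("An anthology", manifestation_years=[1983, 1999])
poem = Work("A poem", manifestation_years=[1990], contained_in=[anthology])
print(first_known_year(poem))
```

The tricky part is keeping the stored field current: any change to a
manifestation date or a containment link would have to trigger a
recalculation for every contained work.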
Summary
- The data conversion problems never seem to go away! I suppose it
is just difficult trying to match things in the absence of unambiguous
identifiers. The edit suite is steadily improving, and it has to
be pretty well finalised for the training preparation next week.
The automatic maintenance of the first-known date for works is
tricky, but should be OK.
Links of the week
- A Love Song For Napster, by Jaron Lanier
"First, let's suppose that the Supreme Court declares Napster-like software
illegal this year. Then jump ahead with me to the year 2015 for a look back at how things went..."