ALEG
Weekly Report - Week 26, 26 October 2001
What I've done
- Another week spent performing formatting changes and tarting the
system up ready for the public beta period. With the increased
testing, quite a few actual errors (and not just formatting issues) were detected and fixed.
Some of the major non-formatting changes are:
-
Oracle's interMedia text engine knobs were twiddled to treat apostrophes as
punctuation (rather than part of a word), so searches for "Drover's Dog",
"Drovers' Dog" and "Drovers Dog" will all return the same results.
A parameter to map diacritics to their "base" characters was also
enabled, although this is problematic for diacritics without
base characters (oh well, probably better than nothing).
- First known dates for expressions are now calculated and used by the
stylesheets.
- All the stylesheet work has been focussed on IE5, but later in the
week I spent some time improving things for Netscape 6 users.
We have to decide how much effort we want to spend supporting the now
ancient Netscape 4 browser. Netscape loyalists often have
a strong distaste for Microsoft's Internet Explorer, but at least
if they download
Netscape 6.1, AustLit should look OK (mostly)!
-
A major change was made to subject searches containing booleans, which hopefully
has made them more useful and intuitive. Previously, a search for
"apple not orange" found works which had at least one topic which itself
contained the word "apple" but not the word "orange". Now,
a search for "apple not orange" finds works which have at least one topic which
contains the word "apple" and no topics which contain the word "orange".
- Alternative titles stored at the expression and manifestation level
are now being searched again. At some previous point, we changed the
search algorithm so that only titles at the work level
were searched.
- The sort order for works is now always date, then title (was just title).
- Lists of work summaries now have checkboxes and individual works or
the whole list can be selected for detailed display.
- Dropped execution priority of long-running queries so that new short queries are given
preference when the machine is very busy.
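The change to boolean subject searches described above can be illustrated with a small sketch. This is not the AustLit implementation (which runs against Oracle); it is a minimal Python model that assumes a work is simply a list of topic strings and that words are matched whole and case-insensitively. The function names `matches_old` and `matches_new` are hypothetical.

```python
from typing import Iterable, Set


def words(topic: str) -> Set[str]:
    """Split a topic into lower-cased words (assumed matching rule)."""
    return set(topic.lower().split())


def matches_old(topics: Iterable[str], include: str, exclude: str) -> bool:
    """Old behaviour: some single topic contains `include` but not `exclude`."""
    return any(
        include in words(t) and exclude not in words(t) for t in topics
    )


def matches_new(topics: Iterable[str], include: str, exclude: str) -> bool:
    """New behaviour: some topic contains `include`, and no topic contains `exclude`."""
    topic_list = list(topics)
    has_include = any(include in words(t) for t in topic_list)
    has_exclude = any(exclude in words(t) for t in topic_list)
    return has_include and not has_exclude


# Illustrative work with two topics (made-up data, not from the database).
work_topics = ["apple growing", "orange orchards"]

print(matches_old(work_topics, "apple", "orange"))  # True
print(matches_new(work_topics, "apple", "orange"))  # False
```

Under the old rule the work is returned because its first topic mentions "apple" without "orange"; under the new rule it is excluded because another of its topics mentions "orange".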
- The system has been stable this week, so maybe the undocumented
parameter on the Java Virtual Machine (JVM) has done the trick. There
was a press report that Sun is about to release a new JVM specifically to boost
performance on our type of UltraSPARC III processor, but nothing else
has surfaced yet.
The system was quite busy on Thursday and Friday with the increased
traffic. I've been running the Solaris "sar" utility to monitor
CPU, I/O and memory usage, and memory and I/O are fine, but the
CPU is sometimes very busy and facing a queue of work. Some of
this is undoubtedly due to the haste with which first-known-publication
details were added to the work summary displays (including reviewing
and critical works), as the volume of XML generated more than doubled.
This should be addressed when we have a moment to tune XML production.
The free Analog log analysis tool by
Stephen Turner was installed and usage statistics are
now being generated.
(WARNING: you must read "What the results mean"
before attempting to interpret anything in these usage statistics!)
- With the system being launched, many people have said kind words
about the project. I'd like to add a special thanks to everyone
on the team. I've worked on IT projects large and small, applications
and infrastructure, in the public and private sector for over 20 years,
and it has largely been an unedifying experience observing waste, mismanagement
and a general lack of accountability, enthusiasm, clear goals and results,
more often than not led by hardware and software vendors with their often
ludicrous "value propositions" rather than the needs of the systems' users.
But the people working on AustLit have been without
exception focussed, enthusiastic, skilful and above all, a team
dedicated to the best possible outcome using the available resources.
Given the diversity of people and their physical distribution,
this has been an amazing achievement. As with any large project, lots of
things went wrong - the data conversion and merging for example was a disaster,
and such a setback could have scuppered the project. But rather than
panic and finger-pointing, the problem was addressed and solved.
Much of the credit must go to the incredible vision and
leadership of Marie-Louise and the dedication, innovation, planning and operational
abilities of Kerry and Annette, but the hard work and great ideas of every member of the
team from every institution were
instrumental in bringing the project together.
I'd like to think that we'd have 'noticed' the FRBR model a few days after
Judith Pearce subtly brought it to our attention, but just in
case her intervention wasn't unnecessary, a special "thankyou" to
her for this, and for marshalling the support of the NLA.
Next Week
- Email summary and detail results.
- Implement Advanced and keyword anywhere searching.
- Revise EAD production for Lu Rees Finding Aids (following changes devised by
Megan and Marlene).
Next few weeks
- William noticed that some of the place-of-publication data for
loaded records is wrong where the name of the place of publication (town or
city) occurs in more than one state (eg Richmond, Glebe). I'll investigate
when we map the spatial thesaurus. Dan, Chris and Terry have also noticed
similar incidents, so I think a very careful look at all place assignments
to places occurring in multiple states/countries is warranted. (And Joan
noticed Aberdeen NSW -v- Aberdeen Scotland!)
- Multiple creation events for a work as a mechanism for allowing date ranges
to be associated with agents responsible for works, eg editors of a periodical.
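As a starting point for the review of place assignments mentioned above, the place names that need a careful look are those recorded against more than one state or country. A minimal sketch of that check follows, assuming the assignments can be exported as (place name, state or country) pairs. The function name `ambiguous_places` and the sample rows are hypothetical; the sample only reuses the place names mentioned in this report and is not real AustLit data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ambiguous_places(
    assignments: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Return place names that occur in more than one state or country."""
    regions_by_name = defaultdict(set)
    for name, region in assignments:
        # Normalise the name so "Richmond" and " richmond" are grouped together.
        regions_by_name[name.strip().lower()].add(region)
    return {
        name: sorted(regions)
        for name, regions in regions_by_name.items()
        if len(regions) > 1
    }


# Illustrative rows only (assumed export format).
sample = [
    ("Richmond", "NSW"),
    ("Richmond", "Vic"),
    ("Glebe", "NSW"),
    ("Glebe", "Tas"),
    ("Aberdeen", "NSW"),
    ("Aberdeen", "Scotland"),
    ("Toowoomba", "Qld"),
]

for name, regions in sorted(ambiguous_places(sample).items()):
    print(name, regions)
```

Every record assigned to one of the listed names would then be a candidate for manual checking against the spatial thesaurus.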
Links of the week
- In the Beginning was the Command Line - Neal Stephenson
A great weekend read if you are curious about computer operating system anthropology
in the wake of the hoopla surrounding the Windows XP launch.
- The Death Knell of Noisy Searches -
IT-director.com.
Topic maps are able to declare a set of labels for topics and then to point
to places where those topics are discussed and addressed. Essentially they provide, under
the umbrella of a designated standard topic area, a way to provide information seekers with
information across a variety of documents. Whereas HTML is a metatag bound to the very document
it is describing, topic maps exist independently of the document, allowing users and applications
to understand the relationships between those documents.
- ALICE victorious in AI challenge -
Rupert Goodwins, ZDNet (UK)
ALICE got the highest score at this year's contest,
held at the Science Museum in London on Saturday, although the silver
and gold medals remain unawarded. The silver medal -- and $25,000 -- will go
to any program able to convince half the judges that it is human ...
To date, the behaviour of the humans involved has been considerably
more entertaining than that of the robots.
Here's Alice and here's
anti-Alice