ALEG
Weekly Report - Week 34, 22 December 2000
What I've done
- Reviews were linked to works and stylesheets changed to render
reviews and works reviewed.
- Most of the updates to works made since September have been applied.
Changes to publication details and sources have only been 'identified',
not applied, because they can't be applied automatically without undoing
the changes made manually to ALEG since September.
So, there will be a manual task to identify which source/publication
changes should be applied (maybe 1-2 days' work?).
- The "easy-to-apply" fixes to sources identified by Ben from the
first load in September have been applied. The rest (400+) are too
hard to apply automatically because of the variety of processing
and content parsing required, and again will need manual processing.
Hmm... lots of material with which to test the new user interface!
The system currently has 361,360 source references. Of these, 355,238
are "resolved", that is, pointing to a work recorded on Austlit, and
6,392 remain "unresolved". Most of these 6,392 are probably bogus (maybe around
4,000), caused by an earlier problem when reviews were mistakenly included in a load
and then almost all deleted (but not quite - I think a few ended up being "linked" to
sources as well, which I must fix). Of the remaining 2,200 (?) unresolved
source links, there are about 650 separate sources (source titles, not accounting
for separate source works), the top 4 of which account for approx 300 references
just on their own ("Earth Wings: Outrider 91 Almanach", "Feel: An Anthology", "Despatches"
and "Contemporary Poets of the English Language").
- Some more email discussions regarding formatting, sorting and diacritic
support.
- Started implementing the work/expression/manifestation maintenance interface.
What I haven't done but need to do soon!
- Document how ALEG will handle some tricky cases - the "Poets of the
Month" works from the mid-1970s and "Down the Lake with Half a Chook".
These are amongst the most "difficult" cases Tessa and Kathy can
come up with, so if we think the proposed data model can handle these,
we'll be happy!
Next week
- Continue implementation of the work/expression/manifestation maintenance interface.
Summary
- I'd forgotten what fun it was trying to match the Austlit sources. Although
most of them "match" or can be forced to "match", there is a worrying array of
almost-but-not-quite-the-same names for publications, inconsistent use of "The" in
periodical titles (sometimes absent, sometimes handled as a non-sorting prefix, sometimes not)
and a general vagueness that maybe just represents the fuzziness and complexity of the real world
but disturbs the drill-sergeant within.
- Thanks to everyone for a very challenging and enjoyable 34 weeks. Working with
Annette, Kerry and Tessa has been a great pleasure, and Marie-Louise's leadership is
inspirational. The whole team has a deep dedication and expertise which I've
rarely come across; it is a great privilege to be working on this project.
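
To illustrate the title-matching problem described in the summary above, here
is a minimal sketch of one way to build a comparison key for publication
titles. It is not how ALEG actually matches sources: the function name
match_key, the list of leading articles, and the example title variants are
all assumptions made up for illustration.

```python
import re
import unicodedata

# Assumed list of leading articles treated as non-sorting prefixes.
LEADING_ARTICLES = ("the", "a", "an")


def match_key(title: str) -> str:
    """Return a simplified key for comparing publication titles (hypothetical helper)."""
    # Decompose accented characters and drop the combining marks,
    # so that diacritics do not prevent a match.
    decomposed = unicodedata.normalize("NFKD", title)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lower-case, treat every run of punctuation or whitespace as one separator.
    words = re.sub(r"[^a-z0-9]+", " ", plain.lower()).split()
    # Drop a single leading article, but never reduce the title to nothing.
    if len(words) > 1 and words[0] in LEADING_ARTICLES:
        words = words[1:]
    return " ".join(words)


# Made-up variants of one periodical title, for illustration only.
variants = ["The Bulletin", "Bulletin", "  the   bulletin."]
keys = {match_key(v) for v in variants}
```

With this key, the three made-up variants collapse to a single value, while an
article in the middle of a title (as in "Feel: An Anthology") is left alone.
Names that are almost-but-not-quite the same in other ways would still need
manual review.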
Links of the week - The Semantic Web, Ontology, Topic Maps, RDF:
- There is a fascinating discussion on the XML-DEV mailing list
this week which has ranged over Tim Berners-Lee's "Semantic Web" vision,
the definition and applicability of ontology to the implementation of
a semantic web, and the possible roles of RDF and Topic Maps for the
identification and classification of resources.
An archive of XML-DEV for December 2000 is located at
http://lists.xml.org/archives/xml-dev/200012/maillist.html.
Some of the relevant threads are:
'"RDF + Topic Maps" = The Future ', 'A Light Rant On Ontological Commitment', 'Ontologies',
'Success factors for the Web and Semantic Web' and
'local, global (was various ontology, RDF, topic maps)'.
This is probably old hat to the Library profession, but
it is new and interesting to most of your brethren in the other
Information Management and Data Processing professions (such as myself).
One reference I found particularly illuminating was
"Conceptual Modeling for Distributed Ontology Environments"
by Deborah L. McGuinness from
the Stanford University Knowledge Systems Laboratory.