ALEG
Weekly Report - Week 17, 25 August 2000
What I've done
- Defined a prototype Oracle database structure
- Implemented some basic infrastructure:
- Defined the basics of a Java package structure which will
hold the source code and group the Java classes
- Defined some basic functional primitives to access and update
the database
- Defined some basic utility classes for common functions
- Used the above to build a class for dealing with the
general thesaurus hierarchy (broader/narrower and contains/part-of
relationships)
- Used the above to read a dump of 20% of the AUSTLIT data
and extract place names from publisher and source fields,
convert them into a hierarchy (e.g. Newcastle, NSW, Australia)
and load them into the database.
- Test load of Agents and their names from a new AUSTLIT
extract produced by Fran. Parsing AUSTLIT names turned out
to be much harder than I'd thought because some of the names
are very complicated. For example, we have to be able to parse
and interpret a single name string which contains embedded
within it a pseudonym, references to the name(s) of the real
people using that pseudonym and birth-death dates, where each
of the names may contain several encoded variant name parts
(including "nee" names). Anyway, it is done: the test names
have been loaded and the name/agent relationships and
agent/pseudoagent relationships have been built.
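To illustrate the place-name step above, here is a minimal sketch of how a string such as "Newcastle, NSW, Australia" could be turned into a broader/narrower hierarchy. It is not the actual ALEG code: the class and method names (PlaceHierarchy, Node, broadestFirst, add) are hypothetical, it assumes place parts are comma-separated and ordered narrowest first, it keeps everything in memory rather than in Oracle, and it is written in present-day Java for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; these are not the real ALEG classes. */
final class PlaceHierarchy {

    /** One place in the tree, linked to its broader and narrower places. */
    static final class Node {
        final String name;
        final Node broader; // null only for the unnamed root
        final Map<String, Node> narrower = new LinkedHashMap<>();

        Node(String name, Node broader) {
            this.name = name;
            this.broader = broader;
        }
    }

    private final Node root = new Node("", null);

    /** Splits "Newcastle, NSW, Australia" into [Australia, NSW, Newcastle]. */
    static List<String> broadestFirst(String place) {
        List<String> parts = new ArrayList<>();
        for (String part : place.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        Collections.reverse(parts);
        return parts;
    }

    /** Adds a place, reusing existing nodes, and returns its narrowest node. */
    Node add(String place) {
        Node current = root;
        for (String name : broadestFirst(place)) {
            Node parent = current;
            current = parent.narrower.computeIfAbsent(name, n -> new Node(n, parent));
        }
        return current;
    }

    /** Number of distinct places stored (the unnamed root is not counted). */
    int size() {
        return count(root) - 1;
    }

    private static int count(Node node) {
        int total = 1;
        for (Node child : node.narrower.values()) {
            total += count(child);
        }
        return total;
    }

    public static void main(String[] args) {
        PlaceHierarchy places = new PlaceHierarchy();
        places.add("Newcastle, NSW, Australia");
        places.add("Sydney, NSW, Australia");
        System.out.println(places.size()); // prints 4: Australia, NSW, Newcastle, Sydney
    }
}
```

Because nodes are looked up under their broader place, two places that share "NSW, Australia" reuse the same parent nodes, and a name that occurs under a different parent (for example a Newcastle in another country) would get its own node.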
What I haven't done but need to do soon!
- Document how ALEG will handle some tricky cases - The "Poets of the
Month" works from the mid-1970s and "Down the Lake with Half a Chook".
These are amongst the most "difficult" cases Tessa and Kathy can
come up with, so if we think the proposed data model can handle these,
we'll be happy!
Next week
- Parse and load the rest of the test AUSTLIT data
- If all goes well (think it will), ask Fran to dump the
entire database, and load that!
- Load at least some parts of the BAL/LAW data
- We received the Oracle 8.1.6 release for NT from UNSW
this week - think about loading that if I get sick of
typing and debugging code...
Summary
- Finally, some coding and data loading! The complexity of
the names surprised me, but balancing that, I've implemented
a lot of useful infrastructure code.