[Side note: the system makes heavy use of the Xalan XSL stylesheet processor. XSL stylesheets are instructions marked up in XML. The Xalan processor reads the XML and computes an internal form of the stylesheet - sort of like compiling a program. This process takes a while, so we keep a "cache" of "compiled" stylesheets, and reuse them rather than reading them as XML from disk every time. But I've noticed that if we keep them forever, the memory used by the JVM climbs without bound (bad!), so we periodically flush the compiled stylesheets, which frees the memory but requires us to re-read and re-compile them.]

As well, there was an unresolved problem a few months ago where the stylesheet processor would occasionally crash while reading a stylesheet - this was not reproducible, and seemed to just disappear one day... OK, so maybe something flaky is going on during stylesheet compilation. Specifically, maybe there is a synchronization problem deep in Xalan or Xerces (the XML parser used by Xalan). This is all mere speculation, but in an attempt to rule it out, I've implemented our own synchronization around the Xalan compilation code to ensure that only one stylesheet can be compiled at a time. This change is now running, so we'll see if it "works around" these hangs. Of course, the whole JVM hanging (and not responding to QUIT signals) should never happen, so this isn't ever going to fix the root cause, but it may avoid the problem.
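For readers who want to picture the arrangement, here is a minimal sketch of a compiled-stylesheet cache with serialized compilation and a flush. It is not the actual code from our system: the class name `StylesheetCache` and its methods are made up for illustration, and it goes through the standard `javax.xml.transform` (JAXP) interfaces rather than calling Xalan classes directly.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical cache of compiled stylesheets; all names are illustrative. */
public class StylesheetCache {
    // Compiled stylesheets, keyed by the absolute path of the XSL file.
    private final Map<String, Templates> cache = new HashMap<>();

    // One lock guards both the map and the compile step, so at most one
    // stylesheet is ever being compiled at a time.
    private final Object compileLock = new Object();

    private final TransformerFactory factory = TransformerFactory.newInstance();

    /** Returns the compiled stylesheet, compiling it on first use. */
    public Templates get(File xslFile) throws TransformerConfigurationException {
        String key = xslFile.getAbsolutePath();
        synchronized (compileLock) {
            Templates compiled = cache.get(key);
            if (compiled == null) {
                // The slow part: parse the XML and build the internal form.
                compiled = factory.newTemplates(new StreamSource(xslFile));
                cache.put(key, compiled);
            }
            return compiled;
        }
    }

    /** Drops every compiled stylesheet so the memory can be reclaimed. */
    public void flush() {
        synchronized (compileLock) {
            cache.clear();
        }
    }

    /** Number of stylesheets currently cached. */
    public int size() {
        synchronized (compileLock) {
            return cache.size();
        }
    }
}
```

A caller would do something like `cache.get(file).newTransformer()` for each request and call `flush()` from a timer. Holding a single lock across the lookup and the compile is the bluntest possible choice: it also makes cache hits wait while another stylesheet compiles, which is a fair trade only if the goal is to rule out concurrent compilation as the culprit.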
An article about Brewster Kahle's "WayBack Machine", part of The Internet Archive. The implications of this work are amazing:

The project has spurred a kind of enthusiasm that hasn't been seen in a while in the downhearted tech world. Lawrence Lessig, a Stanford University law professor who seeks to explain the interplay of technology and society, was uncharacteristically ebullient. "My brand is pessimism," he said. "This is not something to be pessimistic about. Brewster is my hero." Mr. Lessig not only admires the project for the knowledge it will preserve. He said he also thinks it can shift the balance in the debate over copyrights and access to intellectual property like books, music and movies. Holders of copyrights will eventually drag Mr. Kahle into court, Mr. Lessig predicted. So far the battle over copyright has been fought chiefly by copyright owners and their lawyers on the one side, and college professors and computer technicians on the other. Mr. Lessig says that will change if people use the Wayback Machine. "We finally have a clear and tangible example of what's at stake," he said. "Brewster is defining the public domain." Users will see "how easy and important this technology would be in keeping us sane and honest about where we've been and where we're going."

So, it may be "WayBack", but hopefully it propels us "WayAway" from 1984.

Update, 5 Nov 2001: Dumpster diving on the web - Katharine Mieszkowski, salon.com
.... For instance, a wire service, such as the Associated Press, might balk when it discovers that thousands of its stories, published on other sites, can be freely visited in the Internet Archive Wayback Machine. The testy members of the National Writers Union may also view the archive as an unauthorized and uncompensated republishing of their work. There's also the tricky question of what happens if a settlement in a lawsuit requires that libelous material be removed from a Web site, yet the original lives on in the archive?