Upconversion into XML from a plain text source, followed by hand editing for production.
XHTML (in your web browser): 128KB
PDF (in a PDF or e-book reader): 1.1MB
The design of the PDF emulates, more or less, the design of the 1904 edition used as source copy, a scan of which (also in PDF) you can see here (6.15MB).
Plain text source, as acquired on the Internet
Stylesheet pipeline for upconversion into a rudimentary TEI format
A .zip file containing an XProc pipeline file and six XSLT 2.0 stylesheets, all quite simple (although using XSLT 2.0 logic for text processing and grouping). Note that some of the logic is already tuned to this text.
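For readers who want a feel for what this kind of upconversion involves, here is a minimal sketch in Python. It is not the XProc/XSLT pipeline in the .zip file. It only shows the same general idea of grouping lines of plain text into paragraphs and wrapping them in rudimentary TEI elements. The function name `upconvert` and the heading heuristic (a short block in capitals on a single line starts a new division) are assumptions made for illustration; a real text would need rules tuned to it, as the stylesheets here are.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def tag(name):
    """Return an element name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{name}"


def upconvert(plain_text):
    """Group blank-line-separated blocks into TEI <div>, <head> and <p>."""
    body = ET.Element(tag("body"))
    div = None
    # One or more blank lines separate the blocks.
    for block in re.split(r"\n\s*\n", plain_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        # Rejoin hard-wrapped lines into a single run of text.
        text = " ".join(lines)
        # Hypothetical heuristic: a short one-line block in capitals
        # is treated as a heading that opens a new division.
        is_head = len(lines) == 1 and text.isupper() and len(text) < 60
        if is_head or div is None:
            div = ET.SubElement(body, tag("div"))
        child = ET.SubElement(div, tag("head" if is_head else "p"))
        child.text = text
    return body


if __name__ == "__main__":
    ET.register_namespace("", TEI_NS)
    sample = "CHAPTER I\n\nFirst line\nof a paragraph.\n\nSecond paragraph."
    print(ET.tostring(upconvert(sample), encoding="unicode"))
```

A sketch like this gets only as far as the "crude" stage: footnotes, quotations, emphasis and the like still need either text-specific rules or hand work.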
“Crude” results of this transformation (XML source)
This is about as far as you can get, practically speaking, before hand work is as efficient as automation.
In the browser, this XML will be shown rendered by a simple “tagged view” XSLT.
CSS stylesheet for editing the TEI in oXygen
This was developed incrementally as I worked, and is not suited for general-purpose TEI encoding (although it makes a start).
Text after hand coding (XML source), with introduction added
See the TEI header for details on the encoding. Shown with the same XSLT as the “crude” XML.
Rendition stylesheet for converting into XHTML (XSLT 1.0)
Note again that neither this stylesheet nor the XSL-FO stylesheet was developed as a general-purpose TEI transformation; both are tuned for this text.
Additionally, this XHTML version leans (like the PDF version) towards rendering the encoded text in (something like) its original appearance, rather than refactoring the design to take advantage of electronic display. For example, no navigation such as a table of contents is offered (the only links connect footnote references to footnotes and vice versa). Adding such navigation would require only modest enhancements to this XSLT, or alternatively a subsequent transformation over these XHTML results. An EPUB version could be produced similarly.
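As an illustration of the second option, a subsequent transformation over the XHTML results, the following Python sketch builds a simple table of contents. It is not part of this project's code. It assumes the XHTML is well-formed XML in the XHTML namespace, has a body element, uses no named entities, and marks its headings with h1, h2 or h3; none of these assumptions has been checked against the actual output of the stylesheet. The function name `add_toc` and the generated `toc-N` identifiers are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
HEADINGS = ("h1", "h2", "h3")  # assumed heading elements


def q(name):
    """Return an element name qualified with the XHTML namespace."""
    return f"{{{XHTML_NS}}}{name}"


def add_toc(xhtml):
    """Insert a linked list of headings at the top of the body."""
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml)
    body = root.find(q("body"))
    heading_tags = {q(h) for h in HEADINGS}
    toc = ET.Element(q("ul"), {"class": "toc"})
    count = 0
    for el in body.iter():
        if el.tag in heading_tags:
            count += 1
            # Keep an existing id; otherwise generate one to link to.
            anchor = el.get("id") or f"toc-{count}"
            el.set("id", anchor)
            li = ET.SubElement(toc, q("li"))
            link = ET.SubElement(li, q("a"), {"href": f"#{anchor}"})
            # Flatten inline markup inside the heading to plain text.
            link.text = "".join(el.itertext()).strip()
    if count:
        body.insert(0, toc)
    return ET.tostring(root, encoding="unicode")
```

The same logic could equally be written as a small XSLT step appended to the pipeline; Python is used here only to keep the example self-contained.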
Rendition stylesheet for converting into PDF via XSL-FO (XSLT 2.0)
The image was generated from a driver file using XSLT; in this .zip are source file (XML), stylesheet (XSLT 1.0) and SVG result, with the PNG version used on this page.
This file was created by hand; link directly to the SVG source.
Works such as this, which have entered the public domain, are not now hard to acquire and study on the Internet — once you know what to look for. (The serendipity by which a text like this one may be discovered at all is a different matter.) This particular book has been available in several formats for some years (it appears to have been first scanned and released by the University of Toronto in 2006), even before several copies in various libraries were accessioned by Google Books.
However, the quality of such offerings is limited by the scale at which these projects operate. A sequence of images of a book's pages, as technologists know, is not the same as (nor so good as) an actual encoded version of the text itself (to say nothing of the significantly larger bandwidth required to communicate it). As for the latter, digital library projects are now, for the most part, giving us only such quality (of copy text and encoding) as we can get with machines programmed to recognize text in scanned images. While we can hope and expect that the intelligence programmed into the machines will continue to become more sensitive and sophisticated, it is also likely that hand work will always be required to get hand finish, if a text like this is ever touched again at all.
This project is intended to demonstrate what can be done, at minimal cost and with a modest investment in time, to bring the production values of an electronic (re-) publication up to the level of the original. To be sure, the expertise and experience required to avoid false starts and blind alleys are not negligible. Yet while there will continue to be a need for specialists, the best way to learn is by doing it. It does not have to be a black art.
In the hope that this effort can help to make it less so, all the code developed in the course of this project is made available here.
Additionally, this text was chosen specifically because it is not so complex that it demands much special handling (the TEI tag set is capable of much more); its structures are entirely generic for discursive prose. For this reason it may make a good study text and example for students of text encoding.
It hardly needs to be added that an electronic text of reasonable quality such as this can also provide a solid foundation for further work.
RenderX XEP XSL formatter
wapiez (at) mulberrytech (dot) com