Towards Hermeneutic Markup

An architectural outline

Wendell Piez

Digital Humanities 2010

King's College, London

July 9, 2010


Read the abstract as presented in the conference proceedings (use an XML/XSLT-capable browser).

Students of this topic will recognize that I barely skim the surface of the problem here. The overlap problem, although famously difficult and worthy itself of a more extended treatment, must also be considered within the context of broader problems with the currently dominant architecture of document processing, which is designed to support the goals of publishing (especially publishing at scale in multiple formats), not of scholarly interpretation. In brief, what is called for is a data model and architecture supporting the following:

Each of these three points could be elaborated at length. The first two in particular are the subjects of ongoing research. The third is the focus of this presentation. Truly expressive applications of markup to scholarly text processing will come, it seems to me, from shifting attention from markup as such to document modeling as a research project in its own right, one that underlies and enables markup and the applications based on it.

Presentation slides

The presentation slides are in PDF format. This is a very high-level view of the problem, and is not intended to be self-explanatory.


A demonstration shows hermeneutic markup not in anything like its full potential, but only as a hint of what will be possible in a markup regimen that does not impose a single unitary hierarchy over a text. The markup here is extremely simple and straightforward, even trivial. The only thing at all remarkable about it (and the fact that this is remarkable is itself somewhat remarkable) is that it identifies phenomena in the texts that overlap, and that therefore cannot be directly represented together, at least at the same time, in XML.

The scholarly intent of this demonstration (such as it is) is to depict the way different examples of the sonnet form (mainly in English, but also with German, French and Spanish cases) have different rhythmic profiles in the interplay between their metrical (verse) structures and rhetorical or grammatical (sentence/phrasing) structures. The thesis is that any particular sonnet, and any moment within a sonnet, is more or less quiet or turbulent, turbulence occurring when the speech rhythms proper to the phrasing of the sonnet interfere with the regular flow of the meter. While these differences within and between sonnets are subtly apparent in reading (subject, of course, to the different interpretations provided to them by different enunciations), they can also be more dramatically represented by a graphical rendition in which the correspondence or interference between the two hierarchies is specifically drawn.
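As a rough illustration of what "interference" between the two hierarchies means in data terms, the sketch below flags phrases that cross a line boundary, that is, phrases that intersect a line without either one containing the other. It is not the code behind the demonstration: the range representation (a label plus character offsets), the function names, and the sample offsets are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation: (label, start offset, end offset), end exclusive.
Range = Tuple[str, int, int]


def crosses(a: Range, b: Range) -> bool:
    """True when a and b intersect but neither contains the other."""
    _, a0, a1 = a
    _, b0, b1 = b
    return (a0 < b0 < a1 < b1) or (b0 < a0 < b1 < a1)


def crossing_phrases(lines: List[Range], phrases: List[Range]) -> List[Range]:
    """Return the phrases that run across at least one line boundary."""
    return [p for p in phrases if any(crosses(p, ln) for ln in lines)]


# Invented offsets for two ten-character verse lines and three phrases.
lines = [("line", 0, 10), ("line", 10, 20)]
phrases = [("phr", 0, 6), ("phr", 6, 14), ("phr", 14, 20)]

# Only the middle phrase straddles the break between the two lines.
print(crossing_phrases(lines, phrases))
```

A passage in which few phrases cross line boundaries would count as "quiet" under this crude measure, and one in which many do as "turbulent"; the graphical renditions described here convey far more than such a count can.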

This markup, trivial though it is, is manifestly interpretive in at least two respects:

To create these representations, a library of sonnets is marked up in XML, with the XML tree structure representing the verse form, namely lines within couplets or quatrains. A second hierarchy, indicating the grammar or phrasing of the poem (the elements are s for sentence and phr for phrase), is marked up using a milestone convention (the LMNL CLIX notation) in which XML elements, rather than simple start- or end-tags, indicate the beginnings and ends of structures. This enables pipeline processing to create the following alternative formats and renditions:

Of the stylesheets that perform these conversions, the only ones that are not entirely generic are the two that display the sonnet structures, the arcs and map views. These have been tuned to display documents with ranges of the types given in the sonnets, namely octave, sestet, quatrain, couplet, line, s, and phr. All other stylesheets work equally well on any document in which the CLIX notation is used to represent structures overlapping the main hierarchy, a format that is easily generated from many common workarounds used to represent overlap in XML.
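The milestone convention can be made concrete with a small sketch. The Python below is not the author's XSLT 2.0 pipeline; it only shows, under stated assumptions, how ranges from both hierarchies might be recovered from a single XML document. The attribute names sID and eID for the start and end markers, the function name extract_ranges, and the sample text are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample: the verse hierarchy (quatrain, line) is the XML tree;
# sentence (s) and phrase (phr) boundaries are marked by empty milestone
# elements. The sID/eID attribute names are an assumption of this sketch.
SAMPLE = (
    '<quatrain>'
    '<line><s sID="s1"/><phr sID="p1"/>One two,<phr eID="p1"/>'
    '<phr sID="p2"/> three </line>'
    '<line>four five.<phr eID="p2"/><s eID="s1"/></line>'
    '</quatrain>'
)


def extract_ranges(root):
    """Return (text, ranges); each range is (name, start, end) over the text."""
    parts = []          # text fragments in document order
    offset = 0          # current character offset into the text
    open_starts = {}    # (name, id) -> offset of a start milestone
    ranges = []

    def emit(s):
        nonlocal offset
        if s:
            parts.append(s)
            offset += len(s)

    def walk(el):
        if "sID" in el.attrib:            # start milestone: remember offset
            open_starts[(el.tag, el.attrib["sID"])] = offset
        elif "eID" in el.attrib:          # end milestone: close the range
            start = open_starts.pop((el.tag, el.attrib["eID"]))
            ranges.append((el.tag, start, offset))
        else:                             # ordinary element of the main tree
            start = offset
            emit(el.text)
            for child in el:
                walk(child)
                emit(child.tail)
            ranges.append((el.tag, start, offset))

    walk(root)
    return "".join(parts), ranges


text, ranges = extract_ranges(ET.fromstring(SAMPLE))
for name, start, end in ranges:
    print(name, start, end, repr(text[start:end]))
```

In this sample the second phrase begins inside the first line and ends inside the second, which is exactly the kind of structure that ordinary nested XML elements cannot express alongside the line elements. Once every structure is reduced to a named range over the text, the two hierarchies can be compared or rendered without privileging either.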

More information about LMNL, the Layered Markup and Annotation Language, is available at the LMNL wiki.

Readers who wish to see or adapt the XSLT 2.0 code that performs these conversions are invited to contact the author at wapiez (at)