What LMNL looks like: structured annotations
Brief overview of LMNL object model
Design principle: less is more (especially for now)
Developments and current status
A LMNL processing architecture
But ... we have talked about “layers”
An example limen: a document “view”
An example limen: relating discontinuous ranges
Conceptual introduction to LMNL
Short demonstration
LMNL via XML (ECLIX and CLIX)
Tennison and Piez cook up LMNL as skunkworks “pure research” project
Other occasional contributors, lenders and borrowers:
Matt Palmer, Paul Caton, Bert van Elsacker, Alex Czmiel, Gavin Thomas Nicol
(Apologies to anyone inadvertently left off this list)
And why not TexMECS, BUVH/JITTs, or other contemporaneous efforts?
“Um, clarify something for me ... are the requirements for TexMECS the same as or different from those of other efforts to deal with overlap such as LMNL, and if they're different, how are they different, and if they're the same, then why aren't you guys collaborating?”
(Paraphrasing Jonathan Robie to Claus Huitfeldt, Extreme 2006)
Better answer:
Early on, more numerous independent efforts have advantages compared to fewer coordinated efforts
Friendly competition between unaffiliated researchers is a feature not a bug
(Allows for meetings like this one!)
Of an unknown continent on Planet Markup ...
If this region is interesting, it's reasonable to expect we're not the only ones to visit ...
Data object model supporting
Overlapping structures
Including arbitrary overlap (“self-overlap”)
Structured annotations
(richer than XML attributes)
A data object model or API, not an abstract (mathematical) model
Analogous to XML DOM, not DAG
An intelligible markup syntax
(Allowing all the advantages we see in XML's plain-text syntax)
A general solution to the “overlap problem”
Open-ended with respect to processing requirements
Not just documents conforming to predefined models, but documents in the midst of the design and modeling process
Supporting flexible, iterative document model and application design
LMNL should be discrete and easily distinguished from what it is not
Look and feel different enough to avoid confusion
This applies to both its representations (syntax) and its terminology
Since LMNL is defined as the model, not the syntax, more than one syntax is possible
LMNL processing is possible without LMNL syntax
LMNL should have a clean relation to XML
LMNL syntax (“sawtooth” syntax) is designed to work with LMNL:
[excerpt [source}The Housekeeper{] [author}Robert Frost{]} [s}[l [n}144{n]}He manages to keep the upper hand{l] [l [n}145{n]}On his own farm.{s] [s}He's boss.{s] [s}But as to hens:{l] [l [n}146{n]}We fence our flowers in and the hens range.{l]{s] {excerpt]
[r=r1}over[r=r2}lapping{r=r1]
ranges{r=r2]
to disambiguate between self-overlap and
enclosure (much like MECS)
Anonymous ranges (and annotations) are allowed:
A range can be [}marked{] without a name
And so are empty ranges, which have no width and may therefore “slide” with respect to neighbor ranges
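To make the tagging above concrete, here is a minimal Python sketch that reads only the simplest sawtooth tags (start and end tags, with optional IDs, and anonymous ranges) into a text layer plus a list of ranges. It is an illustration under stated assumptions, not a LMNL parser: annotations inside tags and empty ranges are not handled, and the names parse_simple_sawtooth and Range are hypothetical.

```python
import re
from typing import NamedTuple


class Range(NamedTuple):
    name: str   # "" for an anonymous range
    start: int  # offset of the first character covered
    end: int    # offset just past the last character covered


# Start tag: [name} or [name=id}; end tag: {name] or {name=id].
# Annotations inside tags are deliberately NOT handled in this sketch.
TAG = re.compile(r"\[(\w*)(?:=(\w+))?\}|\{(\w*)(?:=(\w+))?\]")


def parse_simple_sawtooth(source):
    """Return (text, ranges) for a string using only simple sawtooth tags."""
    pieces = []      # plain-text chunks between tags
    offset = 0       # current offset in the text layer
    open_tags = {}   # (name, id) -> stack of start offsets
    ranges = []
    pos = 0
    for m in TAG.finditer(source):
        chunk = source[pos:m.start()]
        pieces.append(chunk)
        offset += len(chunk)
        pos = m.end()
        if m.group(0).startswith("["):
            key = (m.group(1), m.group(2))
            open_tags.setdefault(key, []).append(offset)
        else:
            key = (m.group(3), m.group(4))
            starts = open_tags.get(key)
            if not starts:
                raise ValueError(f"end tag without start: {m.group(0)}")
            # Without IDs, same-named tags are matched innermost-first,
            # i.e. read as enclosure; IDs are what allow self-overlap.
            ranges.append(Range(key[0], starts.pop(), offset))
    pieces.append(source[pos:])
    if any(open_tags.values()):
        raise ValueError("unclosed range")
    return "".join(pieces), sorted(ranges, key=lambda r: (r.start, r.end))


text, ranges = parse_simple_sawtooth("[r=r1}over[r=r2}lapping{r=r1] ranges{r=r2]")
for r in ranges:
    print(r.name, r.start, r.end, repr(text[r.start:r.end]))
```

With the IDs in place the two r ranges come out overlapping; strip the IDs and the same tags would be read as one r enclosing another.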
[excerpt}
[s}[l [n}144{n]}He manages to keep the upper hand{l]
[l [n}145{n]}On his own farm.{s] [s}He's boss.{s] [s}But as to hens:{l]
[l [n}146{n]}We fence our flowers in and the hens range.{l]{s]
{excerpt
[source
[title}The Housekeeper{title]
[loc}lines 144-146{loc]
[source
[title}North of Boston{title]
[date}1915{date]]]
[author
[name}[given}Robert{given] [family}Frost{family]{name]
[date}1874-1963{date]] ]
Annotations
A LMNL document is based on a text layer
(A sequence of zero or more atoms, which generally correspond to Unicode characters, but which can also be represented otherwise in any notation)
Each range has
Each annotation has
A text layer, which may have ranges of its own
(Annotations are isomorphic to documents)
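The object model can be sketched in a few lines of Python. This is an illustration only: the class and field names (Layer, Range, Annotation, covered_text) are hypothetical, atoms are simplified to characters of a string, and the assumption that a range carries a name, start and end offsets, and a list of annotations is ours, not a statement of the LMNL specification.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """A text layer plus the ranges over it (a document, in this sketch)."""
    text: str = ""
    ranges: list["Range"] = field(default_factory=list)


@dataclass
class Annotation(Layer):
    """An annotation is a named layer: its own text, its own ranges."""
    name: str = ""
    annotations: list["Annotation"] = field(default_factory=list)


@dataclass
class Range:
    """A named span over a layer, given by offsets (an assumption here)."""
    name: str
    start: int
    end: int
    annotations: list[Annotation] = field(default_factory=list)

    def covered_text(self, layer):
        return layer.text[self.start:self.end]


doc = Layer(text="On his own farm. He's boss.")
# The author annotation has a text layer with ranges of its own.
author = Annotation(name="author", text="Robert Frost",
                    ranges=[Range("given", 0, 6), Range("family", 7, 12)])
doc.ranges.append(Range("excerpt", 0, len(doc.text), annotations=[author]))
doc.ranges.append(Range("s", 17, 27))

print(doc.ranges[1].covered_text(doc))        # text covered by the s range
print(author.ranges[0].covered_text(author))  # text covered by given
```

Because Annotation extends Layer, anything that can walk a document's ranges can walk an annotation's ranges in the same way.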
Remove one central assumption of XML
... and see where it takes us
Optimization is left for later
Various desirable and tempting features
(Some, such as respecting tag ordering or virtual elements, could be supported through extensions or at higher levels)
Noteworthy:
Codification of pathways to and from XML
With LMNL processing on that basis (XSLT 2.0)
What we have not done:
Generalized an abstract data model (such as GODDAG)
We expect this should (will) emerge in development as the XML tree (infoset) emerged from XML
Meanwhile we benefit from the insights of other researchers
A problem: LMNL ranges don't have much “thingness”
Resulting problems relate to referring to non-character objects such as images
... and marking them up
[a [href}mypage.html{]}[img [src}myicon.jpg{]]{a]
or
[a [href}mypage.html{]][img [src}myicon.jpg{]]
... since range starts and ends are indicated only by character offsets, these are the same in the data model
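A small Python sketch of why the two taggings collapse. It assumes a simplified model in which a range is just (name, start offset, end offset); the function name ranges_from_events and the event encoding are hypothetical, and annotations are left out.

```python
def ranges_from_events(events):
    """events: text strings or ("start" | "end", name) tuples, in tag order.

    Returns the set of (name, start, end) triples a purely offset-based
    model would record.
    """
    offset = 0
    open_starts = {}   # name -> stack of start offsets
    ranges = set()
    for ev in events:
        if isinstance(ev, str):
            offset += len(ev)          # text advances the offset
        else:
            kind, name = ev
            if kind == "start":
                open_starts.setdefault(name, []).append(offset)
            else:
                ranges.add((name, open_starts[name].pop(), offset))
    return ranges


# a wrapped around img, with no characters in between ...
enclosing = [("start", "a"), ("start", "img"), ("end", "img"), ("end", "a")]
# ... versus an empty a followed by an empty img
adjacent = [("start", "a"), ("end", "a"), ("start", "img"), ("end", "img")]

print(ranges_from_events(enclosing))
print(ranges_from_events(enclosing) == ranges_from_events(adjacent))
```

Both event sequences yield two zero-width ranges at offset 0, so nothing in the resulting data records that a was meant to contain img.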
One solution: support tag ordering
... But then tags (not just ranges) must be “things”
This seems a high price to pay
especially since usually tag order actually shouldn't matter in the data model —
Will we then try to model when tag order matters and when it doesn't?
(Cf. “spurious overlap”, Huitfeldt & Sperberg-McQueen)
Present solution: introduce atoms
Atoms can also be arbitrary objects (with names and annotations), represented directly in syntax
[a [href}mypage.html{]}{{img [src}myicon.jpg{]}}{]
Matt Palmer has been experimenting with parsing LMNL syntax in Python and Java
Development on this front continues
Work inspired by Steve DeRose (2004) and Syd Bauman (2005)
CLIX: “Canonical LMNL in XML”
Initially a tagging convention in XML
“Trojan milestones” (suggested at OSIS by Troy Griffiths)
Refined and dubbed HORSE by Bauman
But adopted (and differently refined) as CLIX by us
CLIX is LMNL represented in flattened XML with milestones
All text and tagging is directly contained in a CLIX document element
Since the XML is otherwise flat, structured annotations can be represented in XML element structures
ECLIX (extended CLIX) is a convention for allowing fully-structured XML to represent LMNL
Just use LMNL-namespaced attributes to indicate milestone elements
Given formal specifications, any XML can convert to ECLIX using fairly simple XSLT
... And an off-the-shelf stylesheet can flatten ECLIX into CLIX
Hence any XML with a consistent convention for representing overlap can map into LMNL
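The flattening idea can be sketched in Python as follows. This is not the CLIX specification: the sID/eID attribute names follow the general Trojan-milestone pattern and are an assumption here, as are the root element name, the generated IDs, and the function name to_milestones. Annotations and empty ranges are not handled.

```python
from xml.sax.saxutils import escape, quoteattr


def to_milestones(text, ranges, root="doc"):
    """Serialize (name, start, end) ranges as paired empty milestone elements.

    All text and all milestones end up as direct children of one root
    element, so overlapping ranges never break XML well-formedness.
    """
    events = []  # (offset, order, tiebreak, markup)
    for i, (name, start, end) in enumerate(ranges):
        rid = quoteattr(f"r{i}")  # hypothetical generated ID
        # At the same offset, end milestones (0) sort before starts (1).
        events.append((start, 1, i, f"<{name} sID={rid}/>"))
        events.append((end, 0, -i, f"<{name} eID={rid}/>"))
    events.sort()
    out, pos = [f"<{root}>"], 0
    for offset, *_, markup in events:
        out.append(escape(text[pos:offset]))
        out.append(markup)
        pos = offset
    out.append(escape(text[pos:]))
    out.append(f"</{root}>")
    return "".join(out)


xml = to_milestones("overlapping ranges", [("r", 0, 11), ("r", 4, 18)])
print(xml)
```

Any XML parser can read the result, and matching each start milestone to its end by ID recovers the original ranges.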
References:
Implemented so far:
All:
XML induction
pick parameter lists ranges to be selected into a containment hierarchy
(For example: s and phr)
drop parameter lists ranges to be excluded
Sonnets
We have listed a formal terminology of range relations
http://www.lmnl.org/wiki/index.php/Range_relationships
Hierarchies can be inferred from range relations, as in the demo
(Implicit hierarchies can even be validated)
But LMNL as a set of ranges over text (flat LMNL) does not represent them directly as such
(Although annotations also arrange in hierarchies, they do not arrange text in the same layer)
Because applications and processing languages will need a systematic way of registering higher-level relationships between ranges ...
But limina may be owned not just by the document or an annotation, but by another limen
In this case, its content (ranges and atomic content or text) can be defined by selection of ranges in its owner
Because limina can be derived from limina, LMNL sneaks in dominance relations “through the back door” (subliminally?), and leaves an application to identify hierarchies (whether sacred or profane)
It may prove useful to declare them externally
[excerpt [source}The Housekeeper{] [author}Robert Frost{]} [s}[l [n}144{n]}He manages to keep the upper hand{l] [l [n}145{n]}On his own farm.{s] [s}He's boss.{s] [s}But as to hens:{l] [l [n}146{n]}We fence our flowers in and the hens range.{l]{s] {excerpt]
Define a limen whose owner is the document. Select the excerpt and l ranges. This limen maps to a clean hierarchy.
The same can be done with any set of ranges that do not overlap (starts or ends). Enclosure implies dominance in the resulting tree.
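A minimal Python sketch of that inference, assuming ranges are given as (name, start, end) triples over the text. The function build_tree and the dictionary-based tree are hypothetical and stand in for whatever a limen would actually hold; the text is the Frost excerpt, shortened and with line breaks replaced by spaces.

```python
def build_tree(text, ranges):
    """Infer a containment tree from ranges that do not overlap.

    Enclosure becomes dominance. Raises ValueError when two of the
    selected ranges overlap, since no single tree can hold both.
    """
    root = {"name": None, "start": 0, "end": len(text), "children": []}
    stack = [root]
    # Outer ranges first: sort by start, then by longest extent.
    for name, start, end in sorted(ranges, key=lambda r: (r[1], -r[2])):
        while len(stack) > 1 and start >= stack[-1]["end"]:
            stack.pop()                      # that range has closed
        parent = stack[-1]
        if end > parent["end"]:
            raise ValueError(f"{name} overlaps {parent['name']}")
        node = {"name": name, "start": start, "end": end, "children": []}
        parent["children"].append(node)
        stack.append(node)
    return root


text = "He manages to keep the upper hand On his own farm. He's boss."


def span(name, s):
    i = text.index(s)
    return (name, i, i + len(s))


excerpt = ("excerpt", 0, len(text))
l1 = span("l", "He manages to keep the upper hand")
l2 = span("l", "On his own farm. He's boss.")
s1 = span("s", "He manages to keep the upper hand On his own farm.")
s2 = span("s", "He's boss.")

tree = build_tree(text, [excerpt, l1, l2])   # excerpt and l: a clean tree
try:
    build_tree(text, [excerpt, l1, l2, s1, s2])
except ValueError as err:
    print("no single hierarchy:", err)       # s and l overlap
```

Selecting excerpt and s instead of excerpt and l gives a second, equally clean tree over the same text.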
song and stanza limina
[p}The Hatter shook his head mournfully.
[q [sp}Hatter{]}Not I!{q] he replied. [q [cont}Hatter{]}We quarrelled
last March--just before HE went mad, you know--{q] (pointing with his
tea spoon at the March Hare,) [q [cont}Hatter{]}-- it was at the great
concert given by the Queen of Hearts, and I had to sing{p] [song}
[lg [n}1{]}
[l}Twinkle, twinkle, little bat!{l]
[l}How I wonder what you're at!{l]{lg]
[p}You know the song, perhaps?{p]{q]
[p}[q [sp}Alice{]}I've heard something like it,{q] said Alice.{p]
[p}[q [sp}Hatter{]}It goes on, you know,{q] the Hatter continued,
[q [cont}Hatter{]}in this way: --{p]
[lg [n}1{]}
[l}Up above the world you fly,{l]
[l}Like a tea-tray in the sky.{l]
[l}Twinkle, twinkle --{l]{lg]{song]{q]
song limina could select all the song ranges from the document (one song range per limen).
stanza limina could select the lg ranges with the same n annotation within the song limina, leaving other ranges (and cosmetic whitespace) behind.
We could then retrieve /%song/%stanza (limina) for stanzas and /%song/%stanza/enclosed::l (ranges) for lines appearing within stanzas.
quote limina
[p}The Hatter shook his head mournfully.
[q [sp}Hatter{]}Not I!{q] he replied. [q [cont}Hatter{]}We quarrelled
last March--just before HE went mad, you know--{q] (pointing with his
tea spoon at the March Hare,) [q [cont}Hatter{]}-- it was at the great
concert given by the Queen of Hearts, and I had to sing{p] [song}
[lg [n}1{]}
[l}Twinkle, twinkle, little bat!{l]
[l}How I wonder what you're at!{l]{lg]
[p}You know the song, perhaps?{p]{q]
[p}[q [sp}Alice{]}I've heard something like it,{q] said Alice.{p]
[p}[q [sp}Hatter{]}It goes on, you know,{q] the Hatter continued,
[q [cont}Hatter{]}in this way: --{p]
[lg [n}1{]}
[l}Up above the world you fly,{l]
[l}Like a tea-tray in the sky.{l]
[l}Twinkle, twinkle --{l]{lg]{song]{q]
quote limina could select each q with an sp annotation along with any following q with cont annotations equaling the sp on the first, up to the next q with that sp (and ignoring other ranges over the same text).
We could then retrieve /%quote (limina) for quotes and /%quote/enclosed::q/@sp (annotations) for their speakers.
Note: these semantics are only implicit in flat LMNL, and will require some sort of apparatus (syntax or declarations) to express.
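The selection rule for quote limina can be sketched procedurally in Python. This stands in for the missing apparatus and is not a proposal for it: the function group_quotes, the (text, annotations) encoding of q ranges, and the abbreviated texts are all hypothetical.

```python
def group_quotes(q_ranges):
    """Group q ranges into quotes.

    q_ranges: (covered_text, annotations) pairs in document order, where
    annotations is a dict holding either "sp" or "cont". A q with sp opens
    a new quote; a q with cont joins the open quote of that speaker.
    """
    quotes = []
    current = {}  # speaker -> that speaker's most recently opened quote
    for text, ann in q_ranges:
        if "sp" in ann:
            quote = {"speaker": ann["sp"], "parts": [text]}
            quotes.append(quote)
            current[ann["sp"]] = quote
        elif "cont" in ann:
            if ann["cont"] not in current:
                raise ValueError(f"cont without earlier sp: {ann['cont']}")
            current[ann["cont"]]["parts"].append(text)
    return quotes


# The q ranges of the passage above, in order (texts abbreviated).
qs = [
    ("Not I!", {"sp": "Hatter"}),
    ("We quarrelled last March ...", {"cont": "Hatter"}),
    ("-- it was at the great concert ...", {"cont": "Hatter"}),
    ("I've heard something like it,", {"sp": "Alice"}),
    ("It goes on, you know,", {"sp": "Hatter"}),
    ("in this way: -- ...", {"cont": "Hatter"}),
]

for quote in group_quotes(qs):
    print(quote["speaker"], len(quote["parts"]))
```

Each resulting group corresponds to one quote limen; the speaker is read off the sp annotation of its first q.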
It is important enough to get right
Is related to querying and transformation
Usefulness:
Some fairly nice things are possible
Possibly surprisingly, even on flat LMNL (no limina, no explicit hierarchy)
Implementation of LMNL concepts is mostly not difficult even in XML
SVG illustrations are really easy
XSLT 2.0 grouping methods are a “crowbar” for dealing with overlap in XML
(As demonstrated in generalized XML induction code)
One sticky area is mapping structured annotations (and their annotations) into XML
But: these demonstrations are on toy data sets