Chapter 18. Sub-Readings

Table of Contents

Apertium Format
CG Format
Grammar Syntax
Rule Option SUB:N
Contextual Option /N

Sub-readings introduce a bit of hierarchy into readings, letting a reading have a hidden reading attached to it, which in turn may have another hidden reading, and so on. See the test/T_SubReading_Apertium and test/T_SubReading_CG tests for usage examples.

Apertium Format

The Apertium stream format supports sub-readings via the + delimiter for readings. E.g.

      ^word/aux3<tag>+aux2<tag>+aux1<tag>+main<tag>$
    

is a cohort with 1 reading that has a three-level-deep chain of sub-readings. Which end of the chain is the primary reading, and which end is the deepest sub-reading, depends on the grammar's SUBREADINGS setting:

      SUBREADINGS = RTL ; # Default, right-to-left
      SUBREADINGS = LTR ; # Alternate, left-to-right
    

In default RTL mode, the above reading has the primary reading "main" with sub-reading "aux1" with sub-reading "aux2" and finally sub-reading "aux3".

In LTR mode, the above reading has the primary reading "aux3" with sub-reading "aux2" with sub-reading "aux1" and finally sub-reading "main".
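The effect of the two modes can be sketched in a few lines of Python. This is only an illustration, not the CG-3 implementation: the function name parse_apertium_reading is hypothetical, and the naive split on + ignores escaping and any other detail of the real Apertium stream parser.

```python
def parse_apertium_reading(reading: str, subreadings: str = "RTL") -> list:
    """Return the parts of one reading, primary first, deepest sub-reading last.

    Hypothetical helper: splits naively on '+' and does not handle escapes.
    """
    parts = reading.split("+")
    if subreadings == "RTL":
        # Right-to-left: the rightmost part is the primary reading.
        parts.reverse()
    elif subreadings != "LTR":
        raise ValueError("SUBREADINGS must be RTL or LTR")
    return parts


reading = "aux3<tag>+aux2<tag>+aux1<tag>+main<tag>"
rtl = parse_apertium_reading(reading, "RTL")
ltr = parse_apertium_reading(reading, "LTR")
print(rtl)  # primary "main" first in RTL mode
print(ltr)  # primary "aux3" first in LTR mode
```

In both cases the returned list is ordered from the primary reading to the deepest sub-reading, which is the order the grammar features described later in this chapter count in.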

CG Format

The CG stream format supports sub-readings via indentation level. E.g.

      "<word>"
        "main" tag
          "aux1" tag
            "aux2" tag
              "aux3" tag
    

is a cohort with 1 reading that has a three-level-deep chain of sub-readings. Unlike the Apertium format, the order is strictly defined by indentation and cannot be changed. The above reading has the primary reading "main" with sub-reading "aux1" with sub-reading "aux2" and finally sub-reading "aux3".

The indentation level is detected on a per-cohort basis. All whitespace characters count the same for the purpose of determining indentation, so 1 tab is the same as 1 space, which is the same as 1 no-break space, and so on. Because detection is per-cohort, it does not matter if previous cohorts have a different indentation style, so it is safe to mix cohorts from multiple sources.
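As a rough illustration of the whitespace rule, the following Python sketch counts every leading whitespace character as one unit and derives a nesting depth for each reading line of a single cohort. The names indent_width and reading_depths are hypothetical, and the stack-based way of turning widths into depths is an assumption; the text above only states that indentation defines the nesting and that all whitespace counts equally.

```python
def indent_width(line: str) -> int:
    """Count leading whitespace characters; tab, space and no-break space all count as 1."""
    width = 0
    for ch in line:
        if not ch.isspace():
            break
        width += 1
    return width


def reading_depths(cohort_lines):
    """Map each reading line of ONE cohort to a depth (0 = primary reading).

    Assumption: the wordform line has no indentation, and a reading indented
    deeper than the line before it is a sub-reading of that line.
    """
    depths = []
    stack = []  # indentation widths of the currently open chain
    for line in cohort_lines:
        width = indent_width(line)
        if width == 0:
            continue  # skip the "<word>" wordform line
        while stack and stack[-1] >= width:
            stack.pop()
        stack.append(width)
        depths.append((line.strip(), len(stack) - 1))
    return depths


# One cohort indented with tabs, another with a space and a no-break space.
tabs = ['"<word>"', '\t"main" tag', '\t\t"aux1" tag', '\t\t\t"aux2" tag']
mixed = ['"<word>"', ' "main" tag', ' \u00a0"aux1" tag']
depths_tabs = reading_depths(tabs)
depths_mixed = reading_depths(mixed)
```

Because the stack is rebuilt for every call, each cohort is interpreted on its own, which mirrors the per-cohort detection described above.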

Grammar Syntax

Working with sub-readings involves 2 new grammar features: Rule Option SUB:N and Contextual Option /N.

Rule Option SUB:N

Rule option SUB:N tells a rule which sub-reading it should operate on and which it should test as target. The N is an integer in the range -2^31 to 2^31. SUB:0 is the primary reading and is the same as not specifying SUB at all. Positive numbers refer to sub-readings starting from the primary and going deeper, while negative numbers start from the deepest sub-reading and go towards the primary. Thus, SUB:-1 always refers to the deepest sub-reading.

Given the above CG input and the rules

          ADD SUB:-1 (mark) (*) ;
          ADD SUB:1 (twain) (*) ;
        

the output will be

          "<word>"
            "main" tag
              "aux1" tag twain
                "aux2" tag
                  "aux3" tag mark
        

Note that SUB:N also determines which reading is looked at as target, so it will work for all rule types.
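The numbering used by SUB:N can be modelled as indexing into a list ordered from the primary reading to the deepest sub-reading. The Python sketch below replays the two ADD rules from the example under that model. The helper name resolve_sub is hypothetical, and returning None for an N outside the chain is an assumption, since the text does not say what happens in that case.

```python
from typing import List, Optional


def resolve_sub(chain: List[str], n: int) -> Optional[str]:
    """Return the reading selected by SUB:n, or None if n is outside the chain.

    chain is ordered primary first, so 0 is the primary reading, 1 the first
    sub-reading, and -1 the deepest sub-reading.
    """
    if -len(chain) <= n < len(chain):
        return chain[n]
    return None  # assumption: out-of-range N selects nothing


chain = ["main", "aux1", "aux2", "aux3"]
tags = {name: ["tag"] for name in chain}

# ADD SUB:-1 (mark) (*) ;  and  ADD SUB:1 (twain) (*) ;
for n, new_tag in [(-1, "mark"), (1, "twain")]:
    target = resolve_sub(chain, n)
    if target is not None:
        tags[target].append(new_tag)

for name in chain:
    print(name, tags[name])
```

Under this model "mark" lands on "aux3" and "twain" on "aux1", which agrees with the output shown above.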

Contextual Option /N

Contextual option /N tests the Nth sub-reading of the currently active reading, where N follows the same rules as for SUB:N above. The /N must come last in the context position.

If N is * then the test will search the primary reading and all of its sub-readings.

Given the above CG input and the rules

          ADD (mark) (*) (0/-1 ("aux3")) ; # matches 3rd sub-reading "aux3"
          ADD (twain) (*) (0/1 ("aux1")) ; # matches 1st sub-reading "aux1"
          ADD (writes) (*) (0/1 ("main")) ; # won't match as 1st sub-reading doesn't have tag "main"
        

the output will be

          "<word>"
            "main" tag mark twain
              "aux1" tag
                "aux2" tag
                  "aux3" tag
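The three rules above can be replayed with a small Python model of the /N part of a contextual test. This is a sketch under stated assumptions: context_matches is a hypothetical name, readings are modelled as plain sets of tags, only position 0 (the same cohort) is considered, and an out-of-range N is assumed to simply fail the test.

```python
def context_matches(chain, n, wanted):
    """Check whether the reading picked by /n carries the tag `wanted`.

    chain: list of tag sets ordered primary first.
    n: an integer as for SUB:N, or "*" to search the whole chain.
    """
    if n == "*":
        return any(wanted in tags for tags in chain)
    if -len(chain) <= n < len(chain):
        return wanted in chain[n]
    return False  # assumption: out-of-range N never matches


chain = [
    {'"main"', "tag"},
    {'"aux1"', "tag"},
    {'"aux2"', "tag"},
    {'"aux3"', "tag"},
]

# (tag to add, N, tag required by the context), one entry per rule above.
rules = [
    ("mark", -1, '"aux3"'),
    ("twain", 1, '"aux1"'),
    ("writes", 1, '"main"'),
]

# None of the rules uses SUB:N, so every added tag goes to the primary reading.
added = [tag for tag, n, wanted in rules if context_matches(chain, n, wanted)]
print(added)
```

The first two contexts succeed and the third fails, so only "mark" and "twain" are added to the primary reading, as in the output above. With "*" in place of N, a test for "aux2" would also succeed, because the whole chain is searched.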