Chapter 10. Contextual Tests

Table of Contents

Position Element Order
1, 2, 3, etc
@
C
NOT
NEGATE
Scanning
BARRIER
CBARRIER
Spanning Window Boundaries
Span Both
Span Left
Span Right
X Marks the Spot
Set Mark
Jump to Mark
Attach To / Affect Instead
Merge With
Jump to Cohort
Test Deleted/Delayed Readings
Look at Deleted Readings
Look at Delayed Readings
Look at Ignored Readings
Scanning Past Point of Origin
--no-pass-origin, -o
No Pass Origin
Pass Origin
Nearest Neighbor
Active/Inactive Readings
Bag of Tags
Optional Frequencies
Dependencies
Relations

Position Element Order

CG-3 is not very strict with how a contextual position looks. Elements such as ** and C can be anywhere before or after the offset number, and even the - for negative offsets does not have to be near the number.

Examples of valid positions:

        *-1
        1**-
        W*2C
        0**>
        cC
        CsW
      

1, 2, 3, etc

A number indicates the topological offset from the rule target or linked cohort that the test should start at. For scanning tests, the sign of the number also influences which direction to scan. Positive numbers both start right and scan towards the right, and negative both start left and scan towards the left.

      # Test whether the cohort to the immediate right matches the set N
      (1 N)

      # Test whether the cohort two to the left, or any other cohorts leftwards of that, matches the set V
      (-2* V)
    

If the offset exceeds the window boundaries, the match proceeds as-if it was against a null cohort. For normal tests this means they automatically fail, and NOT and NEGATE tests will succeed.

@

'@' is the topological absolute position in the window. It is often used to start a test at the beginning of the sentence without having to scan for (-1* >>>) to find it. Positive numbers are offsets from the start of the window, and negative numbers are from the end of the window. Thus '@1' is the first cohort and '@-1' is the last cohort.

Special cases combined with window spanning: '@1<' is the first word of the previous window, '@-1<' is the last word of the previous window, '@1>' is the first word of the next window, and '@-1>' is the last word of the next window.

      # Test whether the first cohort in the window matches the set N
      (@1 N)

      # Test whether the last cohort in the window is a full stop
      (@-1 ("."))
    

C

'C' causes the test to be performed in a careful manner, this is it requires that all readings match the set. Normally a test such as (1 N) tests whether any reading matches the set N, but (1C N) requires that all readings match N.

NOT

NOT will invert the result of the immediately following test.

      # The following cohort, if it exists, must not match the set N
      (NOT 1 N)
    

NEGATE

NEGATE is similar to, yet not the same as, NOT. Where NOT will invert the result of only the immediately following test, NEGATE will invert the result of the entire LINK'ed chain that follows. NEGATE is thus usually used at the beginning of a test. VISLCG emulated the NEGATE functionality for tests that started with NOT.

Scanning

'*' and '**' start at the given offset and then continues looking in that direction until they find a match or reach the window boundary. '*' stops at the first cohort that matches the set, regardless of whether that cohort satisfies any linked conditions. '**' searches until it finds a cohort that matches both the set and any linked conditions.

      # Find an adjective to the left and check whether that adjective has a noun to the immediate right. Stop at the first adjective.
      (-1* ADJ LINK N)

      # Find an adjective to the left and check whether that adjective has a noun to the immediate right. If it does not have a noun to the immediate right, scan further left and test next adjective position. Repeat.
      (**-1 ADJ LINK N)
    

BARRIER

Halts a scan if it sees a cohort matching the set. Especially important to prevent '**' scans from looking too far.

      # Find an N to the right, but don't look beyond any V
      (*1 N BARRIER V)
    

CBARRIER

Like BARRIER but performs the test in Careful mode, meaning it only blocks if all readings in the cohort matches. This makes it less strict than BARRIER.

      (**1 SetG CBARRIER (Verb))
    

Spanning Window Boundaries

These options allows a test to find matching cohorts in any window currently in the buffer. The buffer size can be adjusted with the --num-windows cmdline flag and defaults to 2. Meaning, 2 windows on either side of the current one is preserved, so a total of 5 would be in the buffer at any time.

Span Both

Allowing a test to span beyond boundaries in either direction is denoted by 'W'. One could also allow all tests to behave in this manner with the --always-span cmdline flag.

        (-1*W (someset))
      

Span Left

'<' allows a test to span beyond boundaries in left direction only. Use 'W' instead, unless it is a complex test or other good reason to require '<'.

        (-3**< (someset))
      

Span Right

'>' allows a test to span beyond boundaries in right direction only. Use 'W' instead, unless it is a complex test or other good reason to require '>'.

        (2*> (someset))
      

X Marks the Spot

By default, linked tests continue from the immediately preceding test. These options affect behavior.

      # Look right for (third), then from there look right for (seventh),
      # then jump back to (target) and from there look right for (fifth)
      SELECT (target) (1* (third) LINK 1* (seventh) LINK 1*x (fifth)) ;

      # Look right for (fourth), then from there look right for (seventh) and set that as the mark,
      # then from there look left for (fifth), then jump back to (seventh) and from there look left for (sixth)
      SELECT (target) (1* (fourth) LINK 1*X (seventh) LINK -1* (fifth) LINK -1*x (sixth)) ;
    

Set Mark

'X' sets the mark to the currently active cohort of the test's target. If no test sets X then the mark defaults to the cohort from the rule's target. See also magic set _MARK_.

Jump to Mark

'x' jumps back to the previously set mark (or the rule's target if no mark is set), then proceeds from there.

Attach To / Affect Instead

'A' sets the cohort to be attached or related against to the currently active cohort of the test's target. See also magic set _ATTACHTO_.

As of version 0.9.9.11032, 'A' can be used for almost all rules to change the cohort to be affected, instead of the target of the rule.

Merge With

'w' sets the cohort to be merged with to the currently active cohort of the test's target. Currently only used with MergeCohorts. It can also be used in contextual tests for With to specify which cohort should be accessible in the subrule context.

Jump to Cohort

'j' jumps to a particular cohort: 'jM' for _MARK_ (identical to 'x'), 'jA' for _ATTACHTO_, 'jT' for _TARGET_, and 'jC1'-'jC9' for _C1_-_C9_.

Test Deleted/Delayed Readings

By default, removed reading are not visible to tests. These options allow tests to look at deleted and delayed readings.

Look at Deleted Readings

'D' allows the current test (and any barrier) to look at readings that have previously been deleted by SELECT/REMOVE/IFF. Delayed readings are not part of 'D', but can be combined as 'Dd' to look at both.

Look at Delayed Readings

'd' allows the current test (and any barrier) to look at readings that have previously been deleted by SELECT/REMOVE/IFF DELAYED. Deleted readings are not part of 'd', but can be combined as 'Dd' to look at both.

Look at Ignored Readings

'I' allows the current test (and any barrier) to look at readings that have previously been ignored by REMOVE IGNORED. Can be combined with 'D' and 'd'.

Scanning Past Point of Origin

By default, linked scanning tests are allowed to scan past the point of origin. These options affect behavior.

--no-pass-origin, -o

The --no-pass-origin (or -o in short form) changes the default mode to not allow passing the origin, and defines the origin as the target of the currently active rule. This is equivalent to adding 'O' to the first test of each contextual test of all rules.

No Pass Origin

'O' sets the point of origin to the parent of the contextual test, and disallows itself and all linked tests to pass this point of origin. The reason it sets it to the parent is that otherwise there is no way to mark the rule's target as the desired origin.

        # Will not pass the (origin) cohort when looking for (right):
        SELECT (origin) IF (-1*O (left) LINK 1* (right)) ;

        # Will not pass the (origin) cohort when looking for (left):
        SELECT (something) IF (-1* (origin) LINK 1*O (right) LINK -1* (left)) ;
      

Pass Origin

'o' allows the contextual test and all its linked tests to pass the point of origin, even in --no-pass-origin mode. Used to counter 'O'.

        # Will pass the (origin) cohort when looking for (right), but not when looking for (middle):
        SELECT (origin) IF (-1*O (left) LINK 1* (middle) LINK 1*o (right)) ;
      

Nearest Neighbor

Usually a '*' or '**' test scans in only one direction, denoted by the value of the offset; if offset is positive it will scan rightwards, and if negative leftwards. In CG-3 the magic offset '0' will scan in both directions; first one to the left, then one to the right, then two to the left, then two to the right, etc. This makes it easy to find the nearest neighbor that matches. In earlier versions of CG this could be approximated with two seperate rules, and you had to scan entirely in one direction, then come back and do the other direction.

Caveat: (NOT 0* V) will probably not work as you expect; it will be true if either direction doesn't find set V. What you want instead is (NEGATE 0* V) or split into (NOT -1* V) (NOT 1* V).

      (0* (someset))
      (0**W (otherset))
    

Active/Inactive Readings

Position modifier 'T' makes the test only consider the currently active reading.

Position modifier 't' makes the test only consider readings other than the currently active one. This can be combined with 'C' to mean all other readings.

These currently only make sense for contexts that look at the target cohort, such as position 0.

Bag of Tags

Position modifier 'B' causes the test to look in a bag of tags, which is like a bag of words but with all tags. 'B' alone looks for tags in the current window, but can be combined with the window spanning modifiers to also test the windows before and/or after. Linking from a 'B' test behaves as if you linked from offset 0.

The bag is greedy and lazy. It will hold all tags added to the window at any time, but will not forget tags as they are removed from the window through various means.

Optional Frequencies

Position modifer 'f' creates two branches based on the current test. In the first, the test remains exactly as is written. In the second, all numeric tags are removed and modifier 'C' is added. This is equivalent to making an inline template with OR'ed tests.

E.g., test (-1*f (N <W>50>)) is equivalent to (-1* (N <W>50>)) OR (-1*C (N)). This is all done at compile time. The numeric tag removal will dig down through the whole target set and create new sets along the way as needed.

Dependencies

CG-3 also introduces the p, c, cc, and s positions. See the section about those in the Dependencies chapter.

Relations

CG-3 also introduces the r:rel and r:* positions. See the section about those in the Relations chapter.