Chapter 14. Making use of Probabilistic / Statistical Input

If your input contains confidence values or similar numeric annotations, you can make use of them in the grammar. See numeric tags for the specific feature.

For example, given the input sentence "Bear left at zoo.", a statistical tagger may assign confidence and frequency values to the readings:

      "<Bear>"
        "bear" N NOM SG <Noun:784> <Conf:80> @SUBJ
        "bear" V INF <Verb:140> <Conf:20> @IMV @#ICL-AUX<
      "<left>"
        "leave" PED @IMV @#ICL-N<
        "leave" V PAST VFIN @FMV
      "<at>"
        "at" PRP @ADVL
      "<zoo>"
        "zoo" N NOM SG @P<
      "<$.>"
    

which you could then query with rules such as:

      # Remove any reading with Confidence below 5%
      REMOVE (<Conf<5>) ;
      # Select N NOM SG if Confidence is above 60%
      SELECT (N NOM SG <Conf>60>) ;
      # Remove the Verb reading if the frequency is under 150
      # and Noun's frequency is above 700
      REMOVE (<Verb<150>) (0 (<Noun>700>)) ;
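
To make the comparisons in these rules concrete, the following Python sketch mimics how a test such as <Conf>60> could be checked against the numeric tags of a single reading. It is not CG-3 code: the names numeric_values and matches are hypothetical, only the < and > comparisons used above are handled, and it assumes that a reading lacking the named tag never matches.

```python
import re

# Hypothetical helpers for illustration only; this is not how CG-3 is implemented.
# A numeric tag in the input looks like <Noun:784>.
NUMERIC_TAG = re.compile(r"^<([A-Za-z0-9]+):(-?\d+(?:\.\d+)?)>$")
# A numeric test in a rule looks like <Conf<5> or <Conf>60>.
NUMERIC_TEST = re.compile(r"^<([A-Za-z0-9]+)([<>])(-?\d+(?:\.\d+)?)>$")


def numeric_values(reading):
    """Return {identifier: value} for every numeric tag in a reading line."""
    values = {}
    for tag in reading.split():
        m = NUMERIC_TAG.match(tag)
        if m:
            values[m.group(1)] = float(m.group(2))
    return values


def matches(reading, test):
    """Check one numeric test against one reading."""
    m = NUMERIC_TEST.match(test)
    if m is None:
        raise ValueError(f"not a numeric test: {test!r}")
    name, op, bound = m.group(1), m.group(2), float(m.group(3))
    values = numeric_values(reading)
    if name not in values:
        # Assumption: a reading without the tag does not match the test.
        return False
    return values[name] < bound if op == "<" else values[name] > bound


noun = '"bear" N NOM SG <Noun:784> <Conf:80> @SUBJ'
verb = '"bear" V INF <Verb:140> <Conf:20> @IMV @#ICL-AUX<'

print(matches(noun, "<Conf<5>"))    # False: 80 is not below 5
print(matches(noun, "<Conf>60>"))   # True: 80 is above 60
print(matches(verb, "<Verb<150>"))  # True: 140 is below 150
print(matches(noun, "<Noun>700>"))  # True: 784 is above 700
```

Read against the example cohort, this suggests that the first rule removes nothing, the second selects the noun reading of "Bear", and the third would remove the verb reading because the same cohort also holds a reading with <Noun:784>.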
    

These are only examples of what numeric tags can be used for. Nothing requires confidence values to be percentages, and they need not add up to 100. The only requirements for a numeric tag are an alphanumeric identifier and a double-precision floating point value in the range -281474976710656.0 to +281474976710655.0.
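
If you generate numeric tags from another tool, it may help to check them before they reach the grammar. The Python sketch below is one possible pre-check, not part of CG-3; the function name is_valid_numeric_tag is hypothetical, and it takes "alphanumeric" to mean whatever Python's str.isalnum accepts, which is an assumption. The two bounds are the ones stated above and happen to equal -2^48 and 2^48 - 1.

```python
# Bounds as stated in the text; both are exactly representable as doubles.
NUMERIC_MIN = -281474976710656.0
NUMERIC_MAX = 281474976710655.0


def is_valid_numeric_tag(identifier, value):
    """Hypothetical pre-check for an (identifier, value) pair.

    Returns True when the identifier is non-empty and alphanumeric and the
    value, converted to float, lies inside the documented range.
    """
    if not identifier.isalnum():
        return False
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False
    # NaN fails both comparisons, so it is rejected as well.
    return NUMERIC_MIN <= number <= NUMERIC_MAX


print(is_valid_numeric_tag("Conf", 80))      # True
print(is_valid_numeric_tag("Conf", 0.8))     # True: values need not be percentages
print(is_valid_numeric_tag("Noun", 2**48))   # False: one above the upper bound
print(is_valid_numeric_tag("my tag", 5))     # False: identifier is not alphanumeric
```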