The VISL Cafeteria of Categories

Acknowledging the need for a common set of grammatical categories for the annotation of its multilingual teaching treebanks, the VISL group of researchers at the Institute of Language and Communication (University of Southern Denmark) has held a large number of terminological workshops over several years, and agreed upon the following set of annotation principles and grammatical labels, known as the Cafeteria Categories. Throughout the system, each VISL language and each VISL annotator has striven to make use of existing Cafeteria core categories wherever possible, even at the price of slight remaining differences in category definitions, adding subcategory extensions where necessary, rather than coining new labels from scratch.
  • Each node in a syntactic tree is annotated with both a function and a form label.
  • Optimally, only branching nodes are used, i.e. the form of the daughter in a non-branching node is raised and expressed as the mother's function.
  • Function labels have upper case key letters; form labels have lower case key letters. A complete node label in constituent grammar notation fuses form and function with a colon, e.g. S:np (subject noun phrase).
  • Subcategories are attached to function labels in lower case, and to form labels with a hyphen. The distinction between adjunct and argument can be optionally marked with a 'b' (bound) or 'f' (free) in front of the upper case function label.
  • In constituent grammar notation, if crossing branches are unwanted, discontinuous constituents (crossing branch nodes) are marked with hyphens pointing towards the constituent's other part(s), e.g. P:vp- fA -P:vp.
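
The labelling conventions above can be illustrated with a small parser. This is only a sketch, not part of any VISL tool: the names `NodeLabel` and `parse_label` and the regular expression are assumptions made for illustration, and it only handles complete function:form labels (a bare function label such as fA is not covered).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern, derived only from the conventions listed above:
# [-] [b|f] FUNCTION [function subcategory] : form [-form subcategory] [-]
_LABEL = re.compile(
    r"^(?P<lead>-)?"                   # hyphen pointing to an earlier part
    r"(?P<binding>[bf])?"              # optional bound/free marker
    r"(?P<function>[A-Z]+)"            # upper case function key letters
    r"(?P<function_sub>[a-z0-9]*)"     # lower case function subcategory
    r":"
    r"(?P<form>[a-z]+)"                # lower case form key letters
    r"(?:-(?P<form_sub>[a-z0-9]+))?"   # hyphenated form subcategory
    r"(?P<trail>-)?$"                  # hyphen pointing to a later part
)


@dataclass(frozen=True)
class NodeLabel:
    function: str
    function_sub: str
    binding: Optional[str]
    form: str
    form_sub: Optional[str]
    continues_left: bool
    continues_right: bool


def parse_label(label: str) -> NodeLabel:
    """Split a complete node label such as 'S:np' into its parts."""
    m = _LABEL.match(label)
    if m is None:
        raise ValueError(f"not a complete function:form label: {label!r}")
    return NodeLabel(
        function=m.group("function"),
        function_sub=m.group("function_sub"),
        binding=m.group("binding"),
        form=m.group("form"),
        form_sub=m.group("form_sub"),
        continues_left=m.group("lead") is not None,
        continues_right=m.group("trail") is not None,
    )
```

For instance, `parse_label("P:vp-")` yields function P and form vp with `continues_right` set, while `parse_label("P:v-fin")` reads fin as a form subcategory rather than as a discontinuity marker.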

The core categories for clause level function are the following:

  • S Subject, subcategories e.g.: Ss Situative subject, Sf Formal subject
  • P Predicator or Verbal constituent (function of "small vp")
  • O Object, subcategories e.g.:
    • Od direct object, Oacc accusative object
    • Oi indirect object, Odat dative object
    • Op prepositional object
    • Ogen genitive object
  • C Predicative or complement, subcategories: Cs Subject complement, Co Object complement, fC free (subject) complement
  • A Adverbial, subcategories e.g.: fA Free adverbial, As Subject-bound adverbial, Ao Object-bound adverbial
  • SUB Subordinator

Form categories are divided into complex forms and word class forms. Complex forms are clauses (cl), groups (g) and paratagmata or compound units (par). Core categories are:

  • fcl Finite clause, icl Non-finite clause, acl Averbal (verb-elliptic) clause
  • np Noun phrase, adjp Adjective phrase, advp Adverb phrase, pp Prepositional phrase, vp Verb phrase
  • par Paratagma (Co-ordinated unit)

At the group level, the minimal annotation is dependency based, with one H (head) and one or more D (dependent) constituents. Dependents can optionally be subclassified as to valency:

  • Darg Argument dependent
  • Dmod Modifier dependent

Dependent function in groups is defining for group form, and can thus be subdivided accordingly:

  • DN Adnominal dependent (in np's, possibly specified as DNarg or DNmod), with subclasses such as:
    • DNapp Apposition
    • DNc Predicative adnominal dependent
  • DA Adverbial dependent (in adjp's and advp's, can be DAarg or DAmod), subclass example:
    • DAcom Argument of comparator
  • DP Argument or modifier of preposition
  • DC Modifier of conjunction
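
As a rough illustration of how group form determines the dependent label, the mapping can be written as a lookup. The function name `dependent_label` and the table are hypothetical. Only the group forms named explicitly above (np, adjp, advp) are mapped; DP and DC are left out because the list does not name their group form, so any other form falls back to the generic D.

```python
from typing import Optional

# Assumed lookup: only the group forms the list above names explicitly.
DEPENDENT_KEY = {"np": "DN", "adjp": "DA", "advp": "DA"}


def dependent_label(group_form: str, valency: Optional[str] = None) -> str:
    """Build a dependent label such as 'DNarg' from group form and valency.

    valency is 'arg', 'mod' or None (unspecified).
    """
    if valency not in (None, "arg", "mod"):
        raise ValueError(f"unknown valency: {valency!r}")
    key = DEPENDENT_KEY.get(group_form, "D")  # generic D as the minimal label
    return key + (valency or "")
```

With this sketch, `dependent_label("np", "arg")` gives DNarg and `dependent_label("advp")` gives DA.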

The paratagma (co-ordinated unit) consists of conjuncts (CJT) and co-ordinators (CO). "Outward" function is assigned to the paratagma node as a whole, rather than to its individual conjunct constituents.

The vp ("little vp") has special constituents, rather than head & dependent, since a syntactic/dependency view and a semantic "main verb" view can't agree on what the head is:

  • Vm Main verb
  • Vaux Auxiliary
  • Vpart Verb integrated particle

Finally, word class form operates with the following cafeteria:

  • n noun
  • prop proper noun
  • v verb, with subcategories such as v-fin finite verb, v-inf infinitive, v-pcp1 present participle, v-pcp2 past participle.
  • adj adjective
  • adv adverb
  • pron pronoun, with numerous subcategories, e.g. pron-pers personal pronoun
  • prp preposition
  • art article
  • num numeral
  • conj conjunction, divided into conj-c (coordinating conjunction) and conj-s (subordinating conjunction)
  • intj interjection

The syntactic top-node receives the default function of UTT (utterance), but may be subdivided into STA statement, QUE question, COM command, EXC exclamation, PER performative.

For undefined or unclear functions, (upper case) X is used; undefined or unclear forms are marked with (lower case) x. These labels are also used to handle coordination of parts of constituents (e.g. shared subject, coordinated object-adverbial pairs), where the paratagma receives X-function, while its daughter conjuncts receive x-form, delegating specific function and form to the conjuncts' daughters.
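
The X/x treatment of partial coordination can be sketched as a small tree. The `Node` class, the check `partial_coordination_ok` and the example phrase ("apples yesterday and pears today", as in "Mary bought apples yesterday and pears today") are invented for illustration and are not taken from a VISL treebank. Requiring at least two conjuncts is likewise an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    function: str
    form: str
    word: str = ""
    children: List["Node"] = field(default_factory=list)

    def label(self) -> str:
        return f"{self.function}:{self.form}"


def partial_coordination_ok(par: Node) -> bool:
    """Check the X:par pattern: CJT:x conjuncts with specific daughters."""
    if par.label() != "X:par":
        return False
    conjuncts = [c for c in par.children if c.function == "CJT"]
    others = [c for c in par.children if c.function != "CJT"]
    return (
        len(conjuncts) >= 2                           # assumption of this sketch
        and all(c.form == "x" for c in conjuncts)     # conjuncts get x-form
        and all(o.function == "CO" for o in others)   # rest are co-ordinators
        and all(c.children for c in conjuncts)
        # specific function and form are delegated to the conjuncts' daughters
        and all(d.function != "X" and d.form != "x"
                for c in conjuncts for d in c.children)
    )


# Coordinated object-adverbial pairs (invented example).
pairs = Node("X", "par", children=[
    Node("CJT", "x", children=[Node("Od", "n", "apples"),
                               Node("fA", "adv", "yesterday")]),
    Node("CO", "conj-c", "and"),
    Node("CJT", "x", children=[Node("Od", "n", "pears"),
                               Node("fA", "adv", "today")]),
])
```

Here the paratagma itself carries no clause function; the direct object and free adverbial functions appear only on the daughters of each conjunct.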

Experimental function categories are:

  • TOP Topic
  • FOC Focus
  • VOC Vocative function
  • fAsta Sentence apposition (comment on whole sentence)