Case Roles in the VISL-project

an introduction by Søren Harder

What are Case Roles

The term 'case roles' covers a layer in linguistic analysis that has been known by many other names as well: deep case, theta-role, thematic role or relation, deep grammatical function, transitivity role and valency role, among others.

The idea is to extend syntactic analysis beyond surface case (nominative, accusative etc.) and surface function (subject, object etc.) into the semantic domain in order to capture the roles of participants in situations, independent of the linguistic descriptions of these situations. Take a look at these three sentences:

  • She gave him a gift
  • He received a gift from her
  • The gift was given to him by her

The sentences have three different subjects even though they describe exactly the same situation, containing three participants that each play the same role in all three sentences: 'she' is an agent, 'he' is a beneficiary and 'a gift' is a patient, to use the names of three common case roles.

The purpose of case roles is thus to describe the individual contribution of the actual participants in 'real world' situations, instead of the grammatical function of syntactic constituents in linguistic representations of these situations.
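As a small illustration of this point, the three example sentences can be written down as data. The sketch below is hand-annotated and hypothetical: the construction labels, the surface-function names and the participant labels (giver, receiver, gift) are assumptions made for the example, not the output of any tagger. The receiver's role is called 'beneficiary' here, following the case set listed later in this text.

```python
# Hand-made annotations for illustration only (not tagger output).
# For each construction: surface function -> participant in the situation.
SURFACE = {
    "active 'give'": {
        "subject": "giver",
        "indirect object": "receiver",
        "direct object": "gift",
    },
    "active 'receive'": {
        "subject": "receiver",
        "direct object": "gift",
        "from-phrase": "giver",
    },
    "passive 'give'": {
        "subject": "gift",
        "to-phrase": "receiver",
        "by-phrase": "giver",
    },
}

# Case role of each participant; identical across the three constructions.
ROLE = {"giver": "agent", "receiver": "beneficiary", "gift": "patient"}


def functions_by_role(construction):
    """Return {case role: surface function} for one construction."""
    return {ROLE[p]: f for f, p in SURFACE[construction].items()}


if __name__ == "__main__":
    for name in SURFACE:
        print(name, "->", functions_by_role(name))
```

The same three roles appear in every construction; only the surface function that carries each role changes (the agent, for instance, is a subject, a from-phrase or a by-phrase).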

Theories of Case Roles

The answer to how many and which case roles to work with varies widely from theory to theory, depending very much on what the theory tries to explain. Some approaches have separate case roles for each verb: HPSG (Pollard & Sag 1987), for example, has an ADMIRER and an ADMIREE for 'admire' and a GIVER, a GIVEN and a RECEIVER for 'give'. Others have a set of case roles for each verb type. Dixon (1991) works with about 50 verb types (plus subtypes), e.g. DECIDING, LIKING, DARING, each of which has from one to about five case roles that are distinct to that verb type, and Systemic Functional Grammar (Halliday 1985) works with 14 'participants' (case roles) divided over 5 'process types' (verb types).

Some theorists work with a minimal case system. Jakobsen (1996, p. 36) argues for three highly abstract roles (for valency-governed arguments) on the grounds that the valency of a verb never exceeds three and that the only purpose of case roles (she argues) is to show the opposition between them.

'Classic' case grammar (Fillmore 1968, Fillmore 1987) has a set of 5-10 cases, but usually makes no claim that these are enough to cover all phenomena. When complete coverage is aimed for, the number rises. Bonnie Dorr's machine translation project at the University of Maryland (Habash and Dorr 2001) has 31 case roles, and the UNL (UNL 2001) semantic specification language has a set of 41 'binary relations'. Some of these are not case roles as such, but conjunctions and clausal subordinators.

I do not believe in an autonomous level of case analysis consisting of a 'natural' set of case roles. The role set that any role system presents is always an abstraction over, and a grouping of, the 'particular roles': one role for each argument for each use of each verb, as in HPSG. The different sets of case roles described above are at different degrees of abstraction, and very often (but not always!) one theory simply makes an extra distinction, giving two cases where another has only one.
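The idea that one role set is a finer grouping of the same particular roles can be made concrete in a few lines. This is only a sketch: the five particular roles are the HPSG-style ones mentioned above, but both groupings (`COARSE` and `FINE`) are invented for the illustration and do not reproduce any of the cited theories.

```python
# Two hypothetical groupings of the same verb-specific ('particular') roles.
# COARSE has one agent-like case; FINE makes an extra distinction and
# splits it into 'agent' and 'experiencer'.
COARSE = {
    "ADMIRER": "agent",
    "ADMIREE": "patient",
    "GIVER": "agent",
    "GIVEN": "patient",
    "RECEIVER": "beneficiary",
}
FINE = {
    "ADMIRER": "experiencer",
    "ADMIREE": "patient",
    "GIVER": "agent",
    "GIVEN": "patient",
    "RECEIVER": "beneficiary",
}


def refines(fine, coarse):
    """True if every case in `fine` falls inside a single case of `coarse`."""
    seen = {}
    for particular, fine_role in fine.items():
        coarse_role = coarse[particular]
        # Remember the first coarse case seen for this fine case and
        # fail as soon as a second, different one turns up.
        if seen.setdefault(fine_role, coarse_role) != coarse_role:
            return False
    return True


if __name__ == "__main__":
    print(refines(FINE, COARSE))
    print(refines(COARSE, FINE))
```

Here `FINE` refines `COARSE` but not the other way round. The 'but not always' case corresponds to two role sets where neither refines the other.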

Case roles and Constraint Grammar

My project is to add a case role analysis to the syntactic and morphological Constraint Grammar analysis of unlimited text, using the Constraint Grammar formalism. Constraint Grammar is a 'shallow formalism' that does not presuppose complex syntactic or semantic structure, and it is not very well suited for representing structural or semantic dependencies. This also means that Constraint Grammar does not define a 'proper level of analysis'; the analysis can be as shallow or as deep as the grammarian who writes the grammar pleases. Case relations, on the other hand, are truly semantic relations, possibly bordering on the pragmatic, and so they in principle demand a 'complete' analysis, or a semantic interpretation.

As this case tagger has to work for unlimited text, and to avoid the work involved in constructing a lexicon for all Danish verbs, it does not depend on a list of the verbs' case frames, but works on general principles, such as the syntactic valency of the verbs and the syntactic function and semantic tags of the arguments (see list of semantic tags).
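To give a rough idea of what 'general principles' without a verb lexicon could look like, here is a minimal sketch in Python. It is not Constraint Grammar syntax and not the actual rule set of the tagger: the `Argument` class, the function names, the semantic tags ("human", "place", "time", "thing") and the rules themselves are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Argument:
    text: str
    function: str      # syntactic function: "subject", "object", ...
    semantic_tag: str  # hypothetical semantic tag: "human", "place", ...


def guess_role(arg: Argument, valency: int) -> Optional[str]:
    """Guess a case role from general cues only; None means 'undecided'."""
    if arg.function == "adverbial":
        if arg.semantic_tag == "place":
            return "TOP-PL"
        if arg.semantic_tag == "time":
            return "TEMP-PL"
        return None
    if arg.function == "subject" and arg.semantic_tag == "human" and valency >= 2:
        return "agent"
    if arg.function == "indirect object" and valency == 3:
        return "beneficiary"
    if arg.function == "object" and valency >= 2:
        return "patient"
    return None


if __name__ == "__main__":
    # 'She gave him a gift', with a trivalent verb.
    arguments = [
        Argument("she", "subject", "human"),
        Argument("him", "indirect object", "human"),
        Argument("a gift", "object", "thing"),
    ]
    for a in arguments:
        print(a.text, "->", guess_role(a, valency=3))
```

Rules this coarse will obviously overgenerate, for example by calling the human subject of 'admire' an agent, which is one reason why further cues and further roles are needed.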

My principles for ascribing case roles

Cases are assigned in demodalized predications, that is, in sentences stripped of their modal verbs (may, will, can etc.), modal adverbials (possibly, fortunately etc.) and embedding clauses ('X believed that ..', 'X wished for ..' etc.).
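A very naive version of demodalization can be sketched as a token filter. The word lists below contain only the examples named above, the function name `demodalize` is made up for this illustration, and embedding clauses are not handled at all. A real implementation would work on tagged text, not on word forms, since 'can' and 'will' are also nouns.

```python
# Only the example words named in the text; real lists would be longer.
MODALS = {"may", "will", "can"}
SENTENCE_ADVERBIALS = {"possibly", "fortunately"}


def demodalize(tokens):
    """Drop modal verbs and modal adverbials from a list of word tokens."""
    dropped = MODALS | SENTENCE_ADVERBIALS
    return [t for t in tokens if t.lower() not in dropped]


if __name__ == "__main__":
    print(demodalize("She may possibly give him a gift".split()))
```

What is left is the bare predication ('She give him a gift'), which is the unit that cases are assigned in.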

I aim for maximal concreteness and minimal metaphoricity. That is, I want to ascribe cases according to the situation itself, independently of how it is linguistically construed or described. Ideally, agents are conscious causal initiators, locations are actual physical locations, and patients are physical objects acted upon. But this principle runs into problems on three levels:

  1. Ontological: Most sentences in normal text/speech do not describe physical goings-on, where people act on objects, but relations between concretes or abstracts, or cognitive/social processes or states. What are the participants of these "abstract goings-on" and what set of case roles can describe them?
  2. Linguistic: These abstract goings-on are linguistically and cognitively modelled on the concrete, as has been shown in cognitive linguistics (e.g. Lakoff 1987), and thus the dividing line between the abstract and the concrete is extremely blurry. In 'He gave the pope his hand', which is a physical and concrete going-on, it is doubtful whether any gift is involved.
  3. Algorithmic: As hard as it may be for the linguist to ascribe the correct case roles, it is infinitely harder to specify a surface-oriented algorithm that does it, especially because exactly the same constructions are used in concrete and 'metaphorical' language.

Therefore, while I aim for concreteness, I still accept a certain degree of metaphoricity, in order to get a relatively simple system that yields a description that is, on the whole, intuitively consistent, and helpful both in the next phase of this project, which is to build a Danish-English machine translation system, and in other applications where it may be used.

The case set

This is work in progress, and the set of cases and principles of case assignment will change as I write the CG for case.

I have started with the classic set of roles (agent, patient, instrument, beneficiary) and a set of local ('topological') and temporal roles: Source, Goal and (static) Place. These 'adverbial' roles, TOP-PL, TOP-GL, TOP-SRC, TEMP-PL, TEMP-GL and TEMP-SRC, are concrete, denoting topological and temporal points and paths. I also have two roles for complements, QUAL-RES and QUAL-STATE, and the role of comitative for co-agents and the like. In addition to these I have a set of roles that were introduced for clausal constituents but are also applicable to nominal constituents: mode, finality, condition, concession, cause and effect.
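For reference, the starting set just described can be written down as a small table. The grouping labels and the lowercase spellings of the roles without a tag name are my own choices for this sketch, and roles added later (such as experiencer) are left out.

```python
# The starting case set, grouped as in the paragraph above.
# Group labels are illustrative; only the TOP-/TEMP-/QUAL- tags are
# spelled as in the text.
CASE_SET = {
    "classic": ["agent", "patient", "instrument", "beneficiary"],
    "topological": ["TOP-PL", "TOP-GL", "TOP-SRC"],
    "temporal": ["TEMP-PL", "TEMP-GL", "TEMP-SRC"],
    "complement": ["QUAL-RES", "QUAL-STATE"],
    "comitative": ["comitative"],
    "clausal": ["mode", "finality", "condition", "concession", "cause", "effect"],
}


def all_roles():
    """Return every role in the starting set as one flat list."""
    return [role for roles in CASE_SET.values() for role in roles]


def group_of(role):
    """Return the group a role belongs to; raise KeyError if unknown."""
    for group, roles in CASE_SET.items():
        if role in roles:
            return group
    raise KeyError(role)


if __name__ == "__main__":
    print(len(all_roles()), "roles")
    print(group_of("TOP-GL"))
```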

To this set I have added roles when I felt that a group of particular uses was sufficiently big and sufficiently distant from the prototypical use of existing roles. Experiencers, for example, are very common and rather far from the prototypical agent. The same holds for resultatives in relation to patients.

Some case roles may later be deleted; they have been postulated anyway, because it is easier to combine cases later than to differentiate them.

The set of cases I use and the principles for assigning each, with examples, can be seen here. Also see this (unfinished) description of the early phases of my implementation of case grammar in Constraint Grammar: HTML or PostScript (the source .tex document is optimised for PostScript, so that version gives the best rendition).

Bibliography:

Dixon (1991):
A New Approach to English Grammar on Semantic Principles, Clarendon
Fillmore (1968):
"The Case for Case" in Bach and Harms: Universals in Linguistic Theory
Fillmore (1987):
Fillmore's Case Grammar: A Reader, Julius Groos Verlag. This contains reprints of all of Fillmore's earlier work on case, including Fillmore (1968)
Habash and Dorr (2001):
"Large Scale Language Independent Generation Using Thematic Hierarchies", in Proc. of MT Summit VIII, Spain
Halliday (1985):
An Introduction to Functional Grammar, 2nd ed., Arnold
Jakobsen (1992):
"Semantisk roller i tysk - og mine anskuelser om roller" in Lone Schack Rasmussen (ed.) Semantiske Roller, Odense Working Papers in Language and Communication 10
Lakoff (1987):
Women, Fire and Dangerous Things, Chicago Univ. Press
Pollard & Sag (1987):
Information-based Syntax and Semantics: Vol. 1, CSLI
UNL (2001):
The Universal Networking Language (UNL) Specifications v3.0, http://www.unl.ias.unu.edu/unlsys/unl/UNL Specifications.htm