References and credits
VISL's hand-tagged teaching corpus is an ongoing joint effort of the Danish VISL team, supervised first by Bjarne le Fevre Jakobsen (1997-2000) and later by Eckhard Bick (2000-?). Eckhard Bick is also responsible for the automatic annotation and revision of running text corpora, in particular the Corpus 90/2000 and Arboretum treebank projects, with text material provided by DSL.
The automatic ("free-text") Danish NLP system is based on multi-level Constraint Grammar disambiguation and is being developed by Eckhard Bick on the basis of a similar project for Portuguese. The morphological analyzer (Danmorf) was developed from the source-language part of his 1986 Danish-Esperanto machine translation system.
The system's lexicon has been enhanced and corrected with the help of large lexical databases kindly provided by DSL and Mikro Værkstedet, and with additional valency and semantic information compiled by Anders Hougaard and Lone Hegelund.
The PSG grammar used for generating syntactic tree structures was written by Eckhard Bick and uses Martin Carlsen's cg2tree compiler. The Danish Constraint Grammar currently consists of some 7,000 contextual rules for morphological and syntactic disambiguation, and provides full parses of running Danish text. Add-on Constraint Grammars handle, among other things, case roles (Søren Harder) and named entity recognition (Eckhard Bick).
For an introduction to Constraint Grammar theory, see Fred Karlsson et al., "Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text", Berlin 1995. The present version of the system runs with both the CG-2 rule compiler, developed and licensed by Pasi Tapanainen, and the VISL-CG compiler, written by Martin Carlsen.