Corpus Search

Our corpus server (overview) currently has corpora available for the following languages.

The Danish corpora, most of the Portuguese and Esperanto corpora, and the Europarl corpora for all languages can be accessed without a password. Access to the other corpora is currently password-protected and limited to people and projects affiliated with the Institute of Language and Communication at SDU - Odense University.

The VISL project leader, Eckhard Bick, has developed search engines for these corpora which recognize regular expressions and supply search results in the form of concordances, with search hits highlighted in boldface. For those who may be unfamiliar with regular expressions or VISL's grammatical annotation system, the Corpus Search pages offer brief on-site user manuals, while in-depth definitions and examples of grammatical categories and tags are provided in the info-folders in the relevant language section at the main VISL site. Further information on regular expressions can be found in the publication A Gentle Introduction to Regular Expressions (pdf-format) by VISL project members John Dienhart and Henrik Kasch.
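For readers new to the idea, the following is a minimal sketch of what a regular-expression concordance search does in principle. It is not the VISL search engine: the function name `concordance`, the `width` parameter, the sample sentence, and the use of `**` to stand in for boldface are all assumptions made for illustration.

```python
import re

def concordance(text, pattern, width=30):
    """Return one keyword-in-context line per regex match in `text`.

    Hypothetical helper for illustration only; each hit is wrapped in
    ** ** to imitate boldface highlighting.
    """
    lines = []
    for match in re.finditer(pattern, text):
        # Up to `width` characters of context on either side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the hits line up in a column.
        lines.append(f"{left:>{width}}**{match.group(0)}**{right}")
    return lines

# Made-up sample text, not taken from any VISL corpus.
sample = "The cat sat on the mat. A dog chased the cat across the yard."

for line in concordance(sample, r"\bcat\b", width=15):
    print(line)
```

Each output line shows one hit with its left and right context, aligned on the hit, which is the layout usually meant by a concordance. A pattern such as `r"\b\w+ed\b"` would instead list every word ending in "ed".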

On the corpus overview page, rectangular flag links indicate (old) interfaces based on the use of regular expressions (reg.ex.), while round flag buttons indicate (new) menu-based CQP interfaces, which have been developed with "non-computational" users in mind. Tree flags indicate treebank corpora, which allow structured constituent searches.

Information about a wide range of additional corpora and on-line search engines can be found by visiting the corpus index developed by Jens Ahlmann Hansen.