PANORAMA (PArallel NORdic Annotated Multilingual corporA) is a cross-Nordic project aimed at providing high-quality parallel text resources for all Nordic languages: Danish, Norwegian, Swedish, Finnish, Icelandic, Faroese, Sami and Greenlandic, with major EU languages like English or German as optional reference languages. As part of the initiative, the necessary annotation tools (taggers/parsers), as well as corpus formatting and searching tools, will be enhanced or - if necessary - developed from scratch (depending on the language). Both in its methods and its goals, PANORAMA has a strong focus on Human Language Technology (HLT) and strives to further the development of independent Nordic HLT resources, such as multilingual dictionaries and internet search tools, machine translation etc. All corpus data will be made freely accessible/searchable through a specially designed web interface.
PANORAMA was launched in 2007 as a 4-year cooperation between the universities of Odense (University of Southern Denmark), Oslo and Tromsø. All participating researchers are experts in corpus linguistics and contribute expertise from earlier HLT projects. The project has been supported by the Nordic Council of Ministers with a 1-year networking grant under the NordplusSprog framework.
Administration and central programming are done by VISL staff at the ISK, University of Southern Denmark. The following institutions participate directly in the project (responsible researcher in parentheses):
PANORAMA organized two workshops in 2007 and plans further regular workshops to coordinate activities. The workshops are in principle open to students and interested researchers from other institutions, and PANORAMA may actively invite HLT specialists for specific languages.
PANORAMA-related tools and resources
PANORAMA annotation standards
For the alignment of bilingual corpora, an XML-style annotation is used, with corpus information in a header section and alignment links at 3 levels: 1. sentence (link), 2. word (wlink) and 3. chunk (clink). Chunks reflect syntactic structural units, exploiting constituent or dependency information. In the XML scheme, word and chunk lines are optional.
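As an illustration of how the three link levels might be read by a program, here is a minimal Python sketch. Only the element names link, wlink and clink and the attributes id and xtargets come from the annotation standard as described here. Everything else is an assumption made for the example: the root element name, the nesting of wlink and clink inside link, the use of xtargets on word and chunk links, and the reading of xtargets as space-separated source IDs, a semicolon, then space-separated target IDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file. The <alignment> root, the nesting of
# wlink/clink under link, and their xtargets values are assumptions.
SAMPLE = """
<alignment>
  <link id="s1" xtargets="1;1">
    <wlink xtargets="1;2"/>
    <wlink xtargets="2 3;1"/>
    <clink xtargets="1 2 3;1 2"/>
  </link>
  <link id="s2" xtargets="2 3;2"/>
</alignment>
"""


def split_targets(xtargets):
    """Split 'a b;c' into (['a', 'b'], ['c']).

    Assumes source IDs come before the semicolon and target IDs after it.
    """
    source, _, target = xtargets.partition(";")
    return source.split(), target.split()


def read_links(xml_text):
    """Return one dict per sentence link, with optional word and chunk links."""
    root = ET.fromstring(xml_text)
    links = []
    for link in root.iter("link"):
        links.append({
            "id": link.get("id"),
            "sentence": split_targets(link.get("xtargets", "")),
            # Word and chunk lines are optional, so these lists may be empty.
            "words": [split_targets(w.get("xtargets", ""))
                      for w in link.findall("wlink")],
            "chunks": [split_targets(c.get("xtargets", ""))
                       for c in link.findall("clink")],
        })
    return links


if __name__ == "__main__":
    for entry in read_links(SAMPLE):
        print(entry["id"], entry["sentence"], entry["words"], entry["chunks"])
```

Because word and chunk links are optional, the reader returns empty lists for a sentence link that has neither, so downstream code can treat all three levels uniformly.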
<link id="..." xtargets="1;1">