GrammarSoft ApS

Visual Interactive Syntax Learning (VISL)
 
Name types: person names, organisation names, place names, event names, art work titles, brand names & others

Named Entity Recognition

Named Entity Recognition (NER) is an essential part of human language technology, useful for a variety of applications such as data mining, summarization, question-answering systems and the anonymization of medical journals. NER can be divided into two sub-tasks: (a) chunking, i.e. recognizing which words or multi-word strings constitute names, and (b) semantic classification, i.e. assigning each name a name type.
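The two sub-tasks can be illustrated with a minimal sketch. It is not the VISL implementation: the gazetteer entries, the capitalisation pattern and the function names are all assumptions made for illustration, and a real system would need far richer chunking than a single regular expression.

```python
import re

# Hypothetical toy gazetteer; the real VISL lexica are not shown here.
GAZETTEER = {
    "Eckhard Bick": "person",
    "Reykjavik": "place",
    "Nordic Council of Ministers": "organisation",
}

# (a) Chunking: a run of capitalised words, optionally linked by "of"/"the".
NAME_PATTERN = re.compile(r"[A-Z][\w-]*(?:\s+(?:of\s+|the\s+)*[A-Z][\w-]*)*")


def chunk_names(text):
    """Return candidate name strings found in the text."""
    return [match.group(0) for match in NAME_PATTERN.finditer(text)]


def classify_name(name):
    """(b) Semantic classification by plain gazetteer lookup."""
    return GAZETTEER.get(name, "unknown")


def recognise(text):
    """Run chunking first, then classify every chunk."""
    return [(name, classify_name(name)) for name in chunk_names(text)]


sentence = ("Eckhard Bick presented the system in Reykjavik "
            "for the Nordic Council of Ministers.")
for name, name_type in recognise(sentence):
    print(name, "->", name_type)
```

Keeping the two steps in separate functions mirrors the way errors are reported separately for chunking and for classification.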

The VISL approach to NER, developed by Eckhard Bick for Danish and Portuguese, is a distributed hybrid method. On the one hand it uses traditional techniques like pattern matching, gazetteering and lexicography; on the other hand it takes a grammatical approach, in which context-sensitive Constraint Grammar (CG) rules classify names based on, for instance, syntactic function, verbal selection restrictions, noun-phrase feature inheritance, coordination and apposition structure. The system recognizes about 20 name types, which fall into 6 major categories: (1) people, (2) organisations, (3) places, (4) events, (5) art work titles and (6) others, like brands or vehicles. These classes can be defined as feature bundles (cf. table below), and can thus also be disambiguated simply by discarding or selecting atomic semantic features, like +LOC or +HUM. Currently, both NER parsers achieve around 93% correct readings, with 2% chunking errors and 5% subtype classification errors.
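Disambiguation over feature bundles can be sketched as follows. This is an illustration under stated assumptions, not the actual CG rule set: the four bundles are a simplified subset, the HUM values are assumed to correspond to the first column of the table, and `select` and `discard` are hypothetical helper names. Both helpers refuse to remove the last remaining reading, as is usual in Constraint Grammar.

```python
# Simplified feature bundles for four name types (assumed values).
BUNDLES = {
    "hum":  {"HUM": True,  "LOC": False},   # person
    "top":  {"HUM": False, "LOC": True},    # place
    "inst": {"HUM": True,  "LOC": True},    # institution
    "org":  {"HUM": True,  "LOC": False},   # organisation
}


def select(readings, feature):
    """Keep only readings carrying +feature; keep all if none carries it."""
    kept = [r for r in readings if BUNDLES[r][feature]]
    return kept or readings


def discard(readings, feature):
    """Drop readings carrying +feature; keep all if every one carries it."""
    kept = [r for r in readings if not BUNDLES[r][feature]]
    return kept or readings


# A name that is still four-ways ambiguous after lexical lookup.
readings = ["hum", "top", "inst", "org"]
readings = select(readings, "LOC")    # context demands a place-like name
readings = discard(readings, "HUM")   # context excludes human agency
print(readings)
```

Working on atomic features rather than whole classes means one contextual cue can narrow several name types at once.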

For Danish, VISL's NER-system has participated in the Nordic Nomen Nescio research network, funded by the Nordic Council of Ministers. The following is a short list of relevant publications:

  • Bick, Eckhard (2003-1), "Named Entity Recognition for Danish", in: Nordisk Sprogteknologi (Årbog 2002), pp. 331-349, Museum Tusculanum, Copenhagen University
  • Bick, Eckhard (2003-4), "Multi-Level NER in a CG Framework", in: Proceedings of NoDaLiDa2003, 30-31 May 2003, Reykjavik, forthcoming
  • Bick, Eckhard (2003-5), "Multi-Level NER for Portuguese in a CG Framework", in: Proceedings of PROPOR2003, Faro, Springer



Name type             COGN       +LOC     <cc>     made         +TIME +LIFE    +MOVE
<hum>                 + (1)      -        -        -            -     +        +
<top>                 -          +        -        -            -     -        -
<inst><civ>           +          +        -        built        -     -        -
<org><media> <party>  + (group)  -        -        constituted  -     metaph.  metaph.
<tit><media>          +          -        metaph.  authored     -     -        -
<genre>               +          -        -        taught       -     -        -
<brand><mat>          -          -        +        produced     -     -        -
<V> (<v>)             -          -        +        produced     -     -        +
<A> (<a>)             metaph.    -        +        -            -     +        +
<B> (<b>)             -          (-)      +        -            -     +        -
<astro>               -          +        -        -?           -     -        +
<occ>                 -          metaph.  -        (held)       +     -        -

Column tests:
  COGN   <vq> COGN: siger, tilbyder ("says, offers")
  +LOC   (place): være dér ved/i X ("be there at/in X")
  <cc>   (concrete movable object): bring X
  made   made, built, invented (HUM-cause)
  +TIME  X vare, begynde, slutte ("X last, begin, end"); siden X ("since X")
  +LIFE
  +MOVE
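The Danish test frames in the table header can be read as context cues that suggest features for a name X. The sketch below is a loose illustration, not the VISL rule set: it only looks at the immediately adjacent word, the cue sets contain just the forms quoted in the table, and `context_features` is a hypothetical function name.

```python
# Cue words copied from the table's test frames (assumed to be sufficient
# for a toy example; real rules would use syntactic function, not adjacency).
SPEECH_VERBS = {"siger", "tilbyder"}    # "says", "offers": X as subject
PLACE_PREPOSITIONS = {"i", "ved"}       # "in", "at": X as complement
TIME_PREPOSITIONS = {"siden"}           # "since": X as complement


def context_features(tokens, i):
    """Return the features suggested for the name at position i."""
    prev_word = tokens[i - 1].lower() if i > 0 else None
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else None

    features = set()
    if next_word in SPEECH_VERBS:
        features.add("COGN")
    if prev_word in PLACE_PREPOSITIONS:
        features.add("LOC")
    if prev_word in TIME_PREPOSITIONS:
        features.add("TIME")
    return features


print(context_features(["Bick", "siger", "ja"], 0))
print(context_features(["Han", "bor", "i", "Faro"], 3))
print(context_features(["siden", "NoDaLiDa2003"], 1))
```

The returned features could then be used to select or discard name-type readings according to the plus and minus values in the table.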

 




Copyright 1996-2024