Finite state and Constraint Grammar based analysers, proofing tools and other resources
This page documents the work on the Norwegian Bokmål language model. It was originally built from a wordform list (at a time before lemma lists were available), and thus contains many misclassified lemmas. We use it because it is flexible, partly for e-dictionaries and partly for generating frequency lists.
The analyser cannot be used for normative purposes.
Giellatekno’s main focus is on the Saami languages and other circumpolar minority languages. As part of our work we need to build bilingual resources. This is where Norwegian comes in. For analysing Norwegian we use either the Oslo-Bergen tagger or our own resources.
Our Norwegian Bokmål language model was made as an auxiliary tool for analysing Norwegian at a time when the Oslo-Bergen FST was not freely available. It is based upon a huge wordform list, most of which has been manually converted to lemma/stem-based lexc format. The source files are documented below.
Our disambiguator is based upon the Oslo-Bergen tagger disambiguator, with some adjustments. The Oslo-Bergen tagger is available for Bokmål and Nynorsk. It has an official webpage, where it is available under the GPL. The Giellatekno-adapted version of it is documented below.
Below is an autogenerated list of documentation pages built from structured comments in the source code. All pages are also concatenated and can be read as one long text here.
src/
cg3/
fst/
morphology/
affixes/
stems/
phonetics/
transcriptions/
tools/
grammarcheckers/
tokenisers/