Norwegian Bokmål documentation
{
"type": "Feature",
"properties": {
"name": "Norwegian Bokmål",
"radius": 200,
"marker-color": "#ff4444",
"marker-size": "large"
},
"geometry": {
"type": "Point",
"coordinates": [8.886, 61.112]
}
}
Center location data taken from Glottolog. Area extent is local data. Both can be adjusted if wrong - file a pull request!
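As a reading aid for the feature above, here is a minimal Python sketch that parses it and extracts the centre point and extent. It is not part of the GiellaLT tooling, and the helper name `describe_feature` is hypothetical. The one fact it relies on is that GeoJSON (RFC 7946) stores positions as [longitude, latitude], so 8.886 is the longitude and 61.112 the latitude. The `radius` property is not a standard GeoJSON member, and this page does not state its unit, so the sketch returns the value unchanged.

```python
import json

# The GeoJSON feature from this page, reproduced verbatim.
FEATURE_JSON = """
{
  "type": "Feature",
  "properties": {
    "name": "Norwegian Bokmål",
    "radius": 200,
    "marker-color": "#ff4444",
    "marker-size": "large"
  },
  "geometry": {
    "type": "Point",
    "coordinates": [8.886, 61.112]
  }
}
"""


def describe_feature(feature):
    """Return (name, latitude, longitude, radius) for a GeoJSON Point feature.

    Hypothetical helper for illustration only.
    """
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("expected a Point geometry")
    # GeoJSON positions are ordered [longitude, latitude].
    longitude, latitude = geometry["coordinates"]
    props = feature["properties"]
    # The unit of "radius" is not stated on this page; it is passed through as-is.
    return props["name"], latitude, longitude, props["radius"]


feature = json.loads(FEATURE_JSON)
name, lat, lon, radius = describe_feature(feature)
print(f"{name}: lat={lat}, lon={lon}, radius={radius}")
```

Unpacking the coordinates into named variables makes the longitude-first ordering explicit, which is the most common source of error when editing such a feature by hand.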
This page documents the work on the Norwegian Bokmål language model. The model was originally built from a wordform list (at a time before lemma lists were available), and it therefore contains many misclassified lemmas. We use it here because it is flexible, partly for e-dictionaries and partly for generating frequency lists.
The analyser cannot be used for normative purposes.
Giellatekno’s main focus is on the Saami languages and other circumpolar minority languages. As part of our work we need to build bilingual resources, and this is where Norwegian comes in. For analysing Norwegian we use either the Oslo-Bergen tagger or our own resources.
The language model
Morphology and morphophonology
Our Norwegian Bokmål language model was made as an auxiliary tool for analysing Norwegian at a time when the Oslo-Bergen FST was not freely available. It is based on a huge wordform list, most of which has been manually converted to a lemma/stem-based lexc format. The source files are documented below.
The disambiguator
Our disambiguator is based on the Oslo-Bergen tagger disambiguator, with some adjustments. The Oslo-Bergen tagger is available for Bokmål and Nynorsk. It has an official webpage, where it is available under the GPL. The Giellatekno-adapted version is documented below.
Projects
In-source documentation
Below is an autogenerated list of documentation pages built from structured comments in the source code. All pages are also concatenated and can be read as one long text here.
src/cg3/fst/morphology/affixes/
- compounding.lexc (src)
- phonology.twolc (src)
- root.lexc (src)
stems/
phonetics/transcriptions/
tools/grammarcheckers/tokenisers/