Norwegian Bokmål documentation

This page documents the work on the Norwegian Bokmål language model. The model was originally built from a wordform list (at a time when lemma lists were not yet available), and it therefore contains many misclassified lemmas. We use it nonetheless because it is flexible, partly for e-dictionaries and partly for generating frequency lists.

The analyser cannot be used for normative purposes.

Giellatekno’s main focus is on the Saami languages and other circumpolar minority languages. As part of that work we need to build bilingual resources, and this is where Norwegian comes in. To analyse Norwegian we use either the Oslo-Bergen tagger or our own resources.

The language model

Morphology and morphophonology

Our Norwegian Bokmål language model was made as an auxiliary device for analysing Norwegian at a time when the Oslo-Bergen FST was not freely available. It is based on a huge wordform list, most of which has been manually converted to lemma/stem-based lexc format. The source files are documented below.
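
For readers unfamiliar with the format, the following is a minimal sketch of what a lemma/stem-based lexc entry can look like. It is purely illustrative: the lexicon names (NounStems, MascNoun), the tag set, and the example noun are assumptions made for this sketch and are not taken from the actual source files.

```lexc
! Hypothetical sketch of a lemma/stem-based lexc lexicon.
! Lexicon names and tags are illustrative assumptions,
! not the ones used in the real nob source files.

Multichar_Symbols
+N +Sg +Pl +Indef +Def

LEXICON Root
NounStems ;

LEXICON NounStems
! One entry per lemma: the stem, then a continuation lexicon
! that supplies the inflectional endings.
bil MascNoun ;

LEXICON MascNoun
! analysis-side tags : surface-side suffix
+N+Sg+Indef:   # ;    ! bil
+N+Sg+Def:en   # ;    ! bilen
+N+Pl+Indef:er # ;    ! biler
+N+Pl+Def:ene  # ;    ! bilene
```

In a full wordform list, each of the four forms above would be a separate entry. In the lemma/stem-based layout the lemma is listed once and the inflection is shared with every other noun that points to the same continuation lexicon.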

The disambiguator

Our disambiguator is based on the Oslo-Bergen tagger disambiguator, with some adjustments. The Oslo-Bergen tagger is available for Bokmål and Nynorsk. It has an official webpage, where it is available under the GPL. The Giellatekno-adapted version is documented below.

Projects

The Norwegian Bokmål grammar checker project

In-source documentation

Below is an autogenerated list of documentation pages built from structured comments in the source code. All pages are also concatenated and can be read as one long text here.

src/
- cg3/
  - disambiguator.cg3 (src)
  - functions.cg3 (src)
  - nob-functions.cg3 (src)
- fst/
  - morphology/
    - affixes/
      - abbreviations.lexc (src)
      - adjectives.lexc (src)
      - nouns.lexc (src)
      - numerals.lexc (src)
      - propernouns.lexc (src)
      - symbols.lexc (src)
      - verbs.lexc (src)
    - compounding.lexc (src)
    - phonology.twolc (src)
    - root.lexc (src)
    - stems/
      - adjectives.lexc (src)
      - adverbs.lexc (src)
      - conjunctions.lexc (src)
      - interjections.lexc (src)
      - nob-abbreviations.lexc (src)
      - nob-propernouns.lexc (src)
      - nouns.lexc (src)
      - numerals.lexc (src)
      - nynorsk-stems.lexc (src)
      - prepositions.lexc (src)
      - pronouns.lexc (src)
      - subjunctions.lexc (src)
      - verbs.lexc (src)
  - phonetics/
    - txt2ipa.xfscript (src)
  - transcriptions/
    - transcriptor-abbrevs2text.lexc (src)
    - transcriptor-numbers-digit2text.lexc (src)
tools/
- grammarcheckers/
  - grammarchecker.cg3 (src)
  - grc-disambiguator.cg3 (src)
- tokenisers/

Norwegian Bokmål NLP Grammar