Finnish NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-fin

Number transcriptions

Transcribing numbers to words in Finnish is not completely trivial. One reason is that numbers in Finnish are written as single compounds regardless of length: 123456 is satakaksikymmentäkolmetuhattaneljäsataaviisikymmentäkuusi. Another complication is that inflection can be left unmarked in running text: a digit expression is assumed to agree in case with the phrase it is in. For example, 27 is kaksikymmentäseitsemän and 27:lle is kahdellekymmenelleseitsemälle, but in the phrase “tarjosin 27 osanottajalle” the numeral 27 takes the allative case without any marking, and this unmarked form is the preferred one in good writing.

Flag diacritics

Flag diacritics are used in number transcription to control case agreement: in Finnish numeral compounds all words agree in case, except in the nominative singular, where the multipliers that are powers of ten are in the partitive singular (for example kymmentä, sataa, tuhatta).
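As a rough illustration of the agreement that the flags enforce, the following Python sketch (not the lexc implementation) attaches a hypothetical CASE value to each word form and keeps only the readings in which all values agree, much as unification flags of the form @U.FEATURE.VALUE@ do in a finite-state lexicon. The flag name CASE, the tables and the function names are assumptions made for this example, and only the digits 2 and 7 are covered.

```python
from itertools import product

# Each entry is (surface form, value of a hypothetical CASE flag).
# In the nominative reading the multiplier "ten" is in the partitive
# singular (kymmentä); in the allative reading every word is allative.
DIGIT = {
    "2": [("kaksi", "NOM"), ("kahdelle", "ALL")],
    "7": [("seitsemän", "NOM"), ("seitsemälle", "ALL")],
}
TEN = [("kymmentä", "NOM"), ("kymmenelle", "ALL")]


def unify(flags):
    """A path is accepted only if every word carries the same CASE value."""
    return len(set(flags)) == 1


def read_tens(tens_digit, ones_digit):
    """Return the accepted readings of a two-digit number, keyed by case."""
    readings = {}
    for path in product(DIGIT[tens_digit], TEN, DIGIT[ones_digit]):
        surfaces, flags = zip(*path)
        if unify(flags):
            readings[flags[0]] = "".join(surfaces)
    return readings
```

Of the eight possible combinations for 27, only two survive: read_tens("2", "7") keeps kaksikymmentäseitsemän for the nominative and kahdellekymmenelleseitsemälle for the allative, and mixed forms are rejected.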

Morphotactics of digit strings

The morphotactics of numbers and their transcriptions require that we know the length of the whole digit string before we know how to start reading it; zeroes are not read out, but they do affect the reading. The numerals are systematic and perfectly compositional: the implementation of 100 000–999 999 is almost exactly the same as that of 100 000 000–999 000 000 and everything afterwards, with only the multiplier word changing: tuhat~tuhatta, miljoona~miljoonaa, miljardia, biljoonaa, biljardia and so forth. This follows the long-scale British (French) system, where the American billion is a milliard, etc. The numbers are built from blocks of roughly single-word length in decreasing order of magnitude, with the exception of zig-zagging in the numbers 11–19, where the second digit is read before the first. The rest of this documentation describes the morphotactic implementation lexicon by lexicon, in descending order of magnitude, with examples.
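The compositional structure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lexc implementation: it covers nominative readings up to the milliards, the function and table names are invented for the example, and it reads a bare 1 before a multiplier as tuhat, miljoona or miljardi, which may differ from what the transcriptor outputs.

```python
ONES = ["", "yksi", "kaksi", "kolme", "neljä", "viisi",
        "kuusi", "seitsemän", "kahdeksan", "yhdeksän"]

# (form after a bare 1, partitive singular used after any other count)
MULTIPLIERS = [("", ""), ("tuhat", "tuhatta"),
               ("miljoona", "miljoonaa"), ("miljardi", "miljardia")]


def under_thousand(n):
    """Read 1..999 as one compound; zero digits contribute nothing."""
    hundreds, rest = divmod(n, 100)
    tens, ones = divmod(rest, 10)
    parts = []
    if hundreds == 1:
        parts.append("sata")
    elif hundreds > 1:
        parts.append(ONES[hundreds] + "sataa")
    if tens == 1 and ones == 0:
        parts.append("kymmenen")
    elif tens == 1:
        # Zig-zag: in 11-19 the second digit is read before the first.
        parts.append(ONES[ones] + "toista")
    else:
        if tens > 1:
            parts.append(ONES[tens] + "kymmentä")
        parts.append(ONES[ones])
    return "".join(parts)


def transcribe(digits):
    """Transcribe a digit string to a nominative Finnish compound."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a string of ASCII digits")
    n = int(digits)
    if n == 0:
        return "nolla"
    if n >= 10 ** 12:
        raise ValueError("this sketch stops at the milliards")
    groups = []  # three-digit groups, least significant first
    while n:
        n, group = divmod(n, 1000)
        groups.append(group)
    words = []
    for index in reversed(range(len(groups))):
        group = groups[index]
        if group == 0:
            continue  # an all-zero group is not read out
        bare, partitive = MULTIPLIERS[index]
        if group == 1 and index > 0:
            words.append(bare)
        else:
            words.append(under_thousand(group) + partitive)
    return "".join(words)
```

With these tables, transcribe("123456") returns satakaksikymmentäkolmetuhattaneljäsataaviisikymmentäkuusi, and transcribe("27") returns kaksikymmentäseitsemän.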

Lexicon HUNDREDSMRD contains the digits 2–9 that need to be followed by exactly 11 digits: 200 000 000 000–999 999 999 999; this implements Nsataa…miljardia…

Lexicon CUODIMRD contains the digits 2–9 that need to be followed by exactly … digits; this implements Nsataa…miljardia…

Lexicon HUNDREDMRD is for numbers in the range 100 000 000 000–199 000 000 000; this implements sata…miljardia…

Lexicon TEENSMRD is for numbers in the range 11 000 000 000–19 000 000 000; this implements …Ntoista…miljardia…

Lexicon TEENMRD is for numbers in the range 11 000 000 000–19 000 000 000; this implements …Ntoista…miljardia…

Lexicon TENSMRD is for numbers in the range 20 000 000 000–90 000 000 000; this implements …Nkymmentä…miljardia…

Lexicon TENMRD is for numbers in the range 10 000 000 000–10 999 999 999; this implements …kymmenenmiljardia…

Lexicon LÅGEVMRD is for numbers in the range 20 000 000 000–90 000 000 000; this implements …Nkymmentä…miljardia…

Lexicon ONESMRD is for numbers in the range 1 000 000 000–9 000 000 000; this implements …Nmiljardia…

Lexicon MILJARD is for numbers in the range 1 000 000 000–9 000 000 000; this implements …Nmiljardia

Lexicon OVERMILLIONS is for the millions part of numbers greater than 1 milliard

Lexicon HUNDREDSM contains the digits 2–9 that need to be followed by exactly 8 digits: 200 000 000–999 999 999; this implements Nsataa…miljoonaa…

Lexicon CUODIM contains the digits 2–9 that need to be followed by exactly … digits; this implements Nsataa…miljoonaa…

Lexicon HUNDREDM is for numbers in the range 100 000 000–199 000 000; this implements sata…miljoonaa…

Lexicon TEENSM is for numbers in the range 11 000 000–19 000 000; this implements …Ntoista…miljoonaa…

Lexicon TEENM is for numbers in the range 11 000 000–19 000 000; this implements …Ntoista…miljoonaa…

Lexicon TENSM is for numbers in the range 20 000 000–90 000 000; this implements …Nkymmentä…miljoonaa…

Lexicon TENM is for numbers in the range 10 000 000–10 999 999; this implements …kymmenenmiljoonaa…

Lexicon LÅGEVM is for numbers in the range 20 000 000–90 000 000; this implements …Nkymmentä…miljoonaa…

Lexicon ONESM is for numbers in the range 1 000 000–9 000 000; this implements …Nmiljoonaa…

Lexicon MILJON is for numbers in the range 1 000 000–9 000 000; this implements …Nmiljoonaa

Lexicon UNDERMILLION is for numbers with 100 000–900 000 after milliards

Lexicon OVERTHOUSANDS is for the thousands part of numbers greater than 1 million

Lexicon HUNDREDST contains the digits 2–9 that need to be followed by exactly 5 digits: 200 000–999 999; this implements Nsataa…tuhatta…

Lexicon CUODIT contains the digits 2–9 that need to be followed by exactly … digits; this implements Nsataa…tuhatta…

Lexicon HUNDREDT is for numbers in the range 100 000–199 000; this implements sata…tuhatta…

Lexicon TEENST is for numbers in the range 11 000–19 000; this implements …Ntoista…tuhatta…

Lexicon TEENT is for numbers in the range 11 000–19 000; this implements …Ntoista…tuhatta…

Lexicon TENST is for numbers in the range 20 000–90 000; this implements …Nkymmentä…tuhatta…

Lexicon TENT is for numbers in the range 10 000–10 999; this implements …kymmenentuhatta…

Lexicon LÅGEVT is for numbers in the range 20 000–90 000; this implements …Nkymmentä…tuhatta…

Lexicon ONEST is for numbers in the range 1 000–9 000; this implements …Ntuhatta…

Lexicon THOUSANDS is for numbers in the range 1 000–9 000; this implements …Ntuhatta

Lexicon THOUSAND is for the ones-tens-hundreds of numbers greater than thousand

Lexicon UNDERTHOUSAND is for numbers with 100–900 after thousands

Lexicon HUNDREDS contains the digits 2–9 that need to be followed by exactly 2 digits: 200–999; this implements Nsataa…

Lexicon CUODI contains the digits 2–9 that need to be followed by exactly … digits; this implements Nsataa

Lexicon HUNDRED is for numbers in the range 100–199; this implements sata…
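The "followed by exactly N digits" conditions given for HUNDREDSMRD, HUNDREDSM, HUNDREDST and HUNDREDS can be written out as patterns over the digit string. The Python sketch below only restates those four conditions; the table and function names are assumptions for the example, and the actual routing between lexicons in the lexc source may be organised differently.

```python
import re

# (lexicon name, number of digits that must follow the leading 2-9),
# taken from the descriptions of the four lexicons above.
HUNDREDS_LEXICONS = [
    ("HUNDREDSMRD", 11),
    ("HUNDREDSM", 8),
    ("HUNDREDST", 5),
    ("HUNDREDS", 2),
]


def hundreds_lexicon(digits):
    """Name the lexicon whose length condition the digit string meets."""
    for name, following in HUNDREDS_LEXICONS:
        if re.fullmatch(r"[2-9][0-9]{%d}" % following, digits):
            return name
    return None  # leading 1, or a length handled by another lexicon
```

For instance, hundreds_lexicon("250000") selects HUNDREDST, while a string such as "123" matches none of the four because its leading digit is 1.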

Lexicon TEENS is for the numbers 11–19; this implements …Ntoista

Lexicon TEEN is for the numbers 11–19; this implements …Ntoista

Lexicon TENS is for numbers in the range 20–90; this implements …Nkymmentä…

Lexicon LÅGEV is for numbers in the range 20–90; this implements …Nkymmentä

Lexicon JUSTTEN is for the number 10; this implements …kymmenen

Lexicon ONES is for the numbers 1–9; this implements yksi, kaksi, kolme, …, yhdeksän

Lexicon ZERO is for the number 0; this implements nolla

Lexicon LOPPU implements optional case inflection marked with a colon, as in 27:lle.
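To make the colon notation concrete, here is a small Python sketch that separates the digit string from a colon-marked suffix. It is not derived from the LOPPU lexicon: the suffix table is a hypothetical and incomplete sample, and the function name is invented for the example.

```python
# Hypothetical, incomplete sample of suffixes; the real lexicon may
# accept a different set.
SUFFIX_CASE = {
    "n": "genitive",
    "lle": "allative",
    "ssa": "inessive",
    "ssä": "inessive",
    "sta": "elative",
    "stä": "elative",
}


def split_inflected(token):
    """Split e.g. '27:lle' into its digits and the case its suffix marks."""
    digits, colon, suffix = token.partition(":")
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected digits before the optional colon")
    if not colon:
        return digits, None  # no marking: case comes from the phrase
    return digits, SUFFIX_CASE.get(suffix, "unknown")
```

A token without a colon comes back with None as its case, which corresponds to the unmarked usage where the numeral takes its case from the surrounding phrase.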


This (part of) documentation was generated from src/fst/transcriptions/transcriptor-numbers-digit2text.lexc