Finite state and Constraint Grammar based analysers, proofing tools and other resources
Transcribing numbers to words in Finnish is not completely trivial. One reason is that numbers in Finnish are written as single compounds regardless of length: 123456 is satakaksikymmentäkolmetuhattaneljäsataaviisikymmentäkuusi. Another complication is that inflection can be left unmarked in running text: a digit expression is assumed to agree in case with the phrase it is in. For example, 27 is kaksikymmentäseitsemän and 27:lle is kahdellekymmenelleseitsemälle, but in the phrase “tarjosin 27 osanottajalle” the bare 27 takes the allative case without any marking, and this is the preferred form in good writing.
Flag diacritics are used in number transcription to control case agreement: in Finnish numeral compounds all words agree in case, except in the nominative singular, where the powers of ten used as multipliers are in the singular partitive (e.g. kymmentä, sataa, tuhatta).
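The agreement check that unification flags provide can be sketched in Python. This is only an illustration of the usual semantics of U-type flag diacritics (a feature that is unset gets set, and a feature that is already set must match). The function name `path_is_consistent` and the way the flags are interleaved with the words are assumptions made for this sketch, not taken from the lexc source.

```python
import re

# Matches unification flags such as @U.CASE.SGALL@.
FLAG = re.compile(r"^@U\.([A-Za-z0-9]+)\.([A-Za-z0-9]+)@$")


def path_is_consistent(symbols):
    """Return True if all @U.FEATURE.VALUE@ flags on the path unify.

    A U flag sets the feature if it is still unset; if the feature is
    already set, the path survives only when the values are equal.
    """
    features = {}
    for symbol in symbols:
        match = FLAG.match(symbol)
        if match is None:
            continue  # ordinary symbol, carries no constraint
        feature, value = match.groups()
        # setdefault stores the value on first sight, then returns the stored one
        if features.setdefault(feature, value) != value:
            return False
    return True


# Hypothetical paths: each numeral component is preceded by its case flag.
allative_path = [
    "@U.CASE.SGALL@", "kahdelle",
    "@U.CASE.SGALL@", "kymmenelle",
    "@U.CASE.SGALL@", "seitsemälle",
]
mixed_path = [
    "@U.CASE.SGNOM@", "kaksi",
    "@U.CASE.SGNOM@", "kymmentä",
    "@U.CASE.SGALL@", "seitsemälle",
]
```

With these inputs the all-allative path is accepted, while the path mixing nominative and allative components is rejected, which is the behaviour needed to keep every part of a numeral compound in the same case.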
@U.CASE.SGNOM@ for singular nominative agreement
@U.CASE.SGALL@ for singular allative agreement
The morphotactics of numbers and their transcriptions require that we know the length of the whole digit string before we know where to start reading; zeroes are not read out, but they do affect the readout. The numerals are systematic and perfectly compositional: the implementation of 100 000–999 999 is almost exactly the same as that of 100 000 000–999 000 000 and everything afterwards, with only the word changing: tuhat~tuhatta, miljoona~miljoonaa, miljardia, biljoonaa, biljardia and so forth. This follows the long scale British (French) system, where the American billion is a milliard, etc. The numbers are built from blocks of roughly single-word length in decreasing order of magnitude, with the exception of zig-zagging over the numbers 11–19, where the second digit is read before the first. The rest of this documentation describes the morphotactic implementation by lexicon structure in descending order of magnitude, with examples.
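The block structure described above can be illustrated with a small Python sketch for the nominative case only. It is not the finite-state implementation: the names `ONES`, `SCALES`, `under_thousand` and `digits_to_words` are made up for this example, a bare "1" block is assumed to be read as tuhat or miljoona without yksi, and the sketch stops at biljoona.

```python
ONES = ["", "yksi", "kaksi", "kolme", "neljä", "viisi",
        "kuusi", "seitsemän", "kahdeksan", "yhdeksän"]

# (singular nominative, singular partitive) per power of 1000, long scale.
SCALES = [
    ("", ""),
    ("tuhat", "tuhatta"),
    ("miljoona", "miljoonaa"),
    ("miljardi", "miljardia"),
    ("biljoona", "biljoonaa"),
]


def under_thousand(n):
    """Read a block 1..999; zero digits contribute nothing."""
    hundreds, rest = divmod(n, 100)
    parts = []
    if hundreds == 1:
        parts.append("sata")
    elif hundreds > 1:
        parts.append(ONES[hundreds] + "sataa")
    if rest == 10:
        parts.append("kymmenen")
    elif 11 <= rest <= 19:
        # zig-zag: the second digit is read before the first
        parts.append(ONES[rest - 10] + "toista")
    else:
        tens, ones = divmod(rest, 10)
        if tens > 1:
            parts.append(ONES[tens] + "kymmentä")
        parts.append(ONES[ones])  # empty string when ones == 0
    return "".join(parts)


def digits_to_words(digits):
    """Transcribe a digit string as one Finnish compound (nominative)."""
    n = int(digits)
    if n == 0:
        return "nolla"
    blocks = []
    while n:
        n, block = divmod(n, 1000)
        blocks.append(block)  # least significant block first
    if len(blocks) > len(SCALES):
        raise ValueError("number too large for this sketch")
    parts = []
    for power in reversed(range(len(blocks))):
        block = blocks[power]
        if block == 0:
            continue  # an all-zero block is not read out
        nominative, partitive = SCALES[power]
        if power == 0:
            parts.append(under_thousand(block))
        elif block == 1:
            parts.append(nominative)
        else:
            parts.append(under_thousand(block) + partitive)
    return "".join(parts)
```

For instance, `digits_to_words("123456")` returns satakaksikymmentäkolmetuhattaneljäsataaviisikymmentäkuusi. The sketch splits the number arithmetically from the right, whereas a lexicon-based transducer reads from the left and therefore has to encode the remaining digit count in the choice of lexicon.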
Examples:
yksi
kaksikymmentäyksi
kolmesataakaksikymmentäyksi
neljätuhattakolmesataakaksikymmentäyksi
viisikymmentäneljätuhattakolmesataakaksikymmentäyksi
kuusisataaviisikymmentäneljätuhattakolmesataakaksikymmentäyksi
seitsemänmiljoonaakuusisataaviisikymmentäneljätuhattakolmesataakaksikymmentäyksi
Lexicon HUNDREDSMRD contains numbers 2-9 that need to be followed by exactly
11 digits: 200 000 000 000–999 999 999 999
this is to implement Nsataa…miljardia…
Lexicon CUODIMRD contains numbers 2-9 that need to be followed by exactly 11 digits
this is to implement Nsataa…miljardia…
Example: kaksisataamiljardia
Lexicon HUNDREDMRD is for numbers in range: 100 000 000 000–199 000 000 000
this is to implement sata…miljardia…
Example: satamiljardia
Lexicon TEENSMRD is for numbers with 11 000 000 000–19 000 000 000
this is to implement …Ntoista…miljardia…
Lexicon TEENMRD is for numbers with 11 000 000 000–19 000 000 000
this is to implement …Ntoista…miljardia…
Example: kaksitoistamiljardia
Lexicon TENSMRD is for numbers with 20 000 000 000–90 000 000 000
this is to implement …Nkymmentä…miljardia…
Lexicon TENMRD is for numbers with 10 000 000 000–10 999 999 999
this is to implement …kymmenenmiljardia…
Example: kymmenenmiljardia
Lexicon LÅGEVMRD is for numbers with 20 000 000 000–90 000 000 000
this is to implement …Nkymmentä…miljardia…
Example: kaksikymmentämiljardia
Lexicon ONESMRD is for numbers with 1 000 000 000–9 000 000 000
this is to implement …Nmiljardia…
Lexicon MILJARD is for numbers with 1 000 000 000–9 000 000 000
this is to implement …Nmiljardia…
Example: kaksimiljardia
Lexicon OVERMILLIONS is for the millions part of numbers greater than 1 milliard
Lexicon HUNDREDSM contains numbers 2-9 that need to be followed by exactly
8 digits: 200 000 000–999 999 999
this is to implement Nsataa…miljoonaa…
Lexicon CUODIM contains numbers 2-9 that need to be followed by exactly 8 digits
this is to implement Nsataa…miljoonaa…
Example: kaksisataamiljoonaa
Lexicon HUNDREDM is for numbers in range: 100 000 000–199 000 000
this is to implement sata…miljoonaa…
Lexicon TEENSM is for numbers with 11 000 000–19 000 000
this is to implement …Ntoista…miljoonaa…
Lexicon TEENM is for numbers with 11 000 000–19 000 000
this is to implement …Ntoista…miljoonaa…
Example: kaksitoistamiljoonaa
Lexicon TENSM is for numbers with 20 000 000–90 000 000
this is to implement …Nkymmentä…miljoonaa…
Lexicon TENM is for numbers with 10 000 000–10 999 999
this is to implement …kymmenenmiljoonaa…
Example: kymmenenmiljoonaa
Lexicon LÅGEVM is for numbers with 20 000 000–90 000 000
this is to implement …Nkymmentä…miljoonaa…
Example: kaksikymmentämiljoonaa
Lexicon ONESM is for numbers with 1 000 000–9 000 000
this is to implement …Nmiljoonaa…
Lexicon MILJON is for numbers with 1 000 000–9 000 000
this is to implement …Nmiljoonaa…
Example: kaksimiljoonaa
Lexicon UNDERMILLION is for numbers with 100 000–900 000 after milliards
Lexicon OVERTHOUSANDS is for the thousands part of numbers greater than 1 million
Lexicon HUNDREDST contains numbers 2-9 that need to be followed by exactly
5 digits: 200 000–999 999
this is to implement Nsataa…tuhatta…
Lexicon CUODIT contains numbers 2-9 that need to be followed by exactly 5 digits
this is to implement Nsataa…tuhatta…
Example: kaksisataatuhatta
Lexicon HUNDREDT is for numbers in range: 100 000–199 000
this is to implement sata…tuhatta…
Lexicon TEENST is for numbers with 11 000–19 000
this is to implement …Ntoista…tuhatta…
Lexicon TEENT is for numbers with 11 000–19 000
this is to implement …Ntoista…tuhatta…
Example: kaksitoistatuhatta
Lexicon TENST is for numbers with 20 000–90 000
this is to implement …Nkymmentä…tuhatta…
Lexicon TENT is for numbers with 10 000–10 999
this is to implement …kymmenentuhatta…
Example: kymmenentuhatta
Lexicon LÅGEVT is for numbers with 20 000–90 000
this is to implement …Nkymmentä…tuhatta…
Example: kaksikymmentätuhatta
Lexicon ONEST is for numbers with 1 000–9 000
this is to implement …Ntuhatta…
Lexicon THOUSANDS is for numbers with 1 000–9 000
this is to implement …Ntuhatta…
Examples: kaksituhatta, kolmetuhattaneljäsataaviisikymmentäkuusi
Lexicon THOUSAND is for the ones, tens and hundreds part of numbers greater than a thousand
Lexicon UNDERTHOUSAND is for numbers with 100–900 after thousands
Lexicon HUNDREDS contains numbers 2-9 that need to be followed by exactly
2 digits: 200–999
this is to implement Nsataa…
Lexicon CUODI contains numbers 2-9 that need to be followed by exactly 2 digits
this is to implement Nsataa…
Examples: kaksisataa, kolmesataaneljäkymmentäviisi
Lexicon HUNDRED is for numbers in range: 100–199
Lexicon TEENS is for numbers with 11–19
this is to implement …Ntoista
Lexicon TEEN is for numbers with 11–19
this is to implement …Ntoista
Examples: yksitoista, kaksitoista, kolmetoista
Lexicon TENS is for numbers with 20–90
this is to implement …Nkymmentä…
Lexicon LÅGEV is for numbers with 20–90
this is to implement …Nkymmentä…
Examples: kaksikymmentä, kolmekymmentäneljä
Lexicon JUSTTEN is for the number 10
this is to implement …kymmenen
Example: kymmenen
Lexicon ONES is for numbers with 1–9
this is to implement yksi, kaksi, kolme…, yhdeksän
Examples: yksi, kaksi, kolme
Lexicon ZERO is for the number 0
this is to implement nolla
Example: nolla
Lexicon LOPPU is to implement potential case inflection with a colon.
Example: yhdelle
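The colon convention can be illustrated by splitting a token such as 27:lle into its digit string and its case suffix before transcription. The following Python sketch is only an assumption about how such input might be pre-processed; the suffix table `CASE_SUFFIXES` is a small hypothetical subset and the function name `split_case_suffix` is invented for this example.

```python
import re

# Hypothetical subset of singular case endings that may follow the colon.
CASE_SUFFIXES = {
    "n": "genitive",
    "lle": "allative",
    "lla": "adessive",
    "lta": "ablative",
    "ssa": "inessive",
    "sta": "elative",
}

# Digits, optionally followed by a colon and a lowercase suffix.
DIGITS_WITH_SUFFIX = re.compile(r"^([0-9]+)(?::([a-zäö]+))?$")


def split_case_suffix(token):
    """Split '27:lle' into ('27', 'allative'); bare digits give (digits, None)."""
    match = DIGITS_WITH_SUFFIX.match(token)
    if match is None:
        raise ValueError(f"not a digit expression: {token!r}")
    digits, suffix = match.groups()
    if suffix is None:
        return digits, None  # case is unmarked and must come from context
    return digits, CASE_SUFFIXES.get(suffix, "unknown")
```

Here `split_case_suffix("27:lle")` gives `("27", "allative")`, while a bare `"27"` gives `("27", None)`, leaving the case to be decided by the surrounding phrase.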
Note: accepting or rejecting case-inflected digit strings without an explicit suffix can be changed here.
This (part of) documentation was generated from src/fst/transcriptions/transcriptor-numbers-digit2text.lexc