Transcribing numbers into words in Finnish is not entirely trivial. One reason is that Finnish numerals are written as compounds regardless of length: 123456 is satakaksikymmentäkolmetuhattaneljäsataaviisikymmentäkuusi. Another complication is that case inflection can be left unmarked in running text. A digit expression is then assumed to agree in case with the phrase it occurs in. For example, 27 is kaksikymmentäseitsemän and 27:lle is kahdellekymmenelleseitsemälle, but in the phrase “tarjosin 27 osanottajalle” (‘I offered [it] to 27 participants’) 27 takes the allative case without any marking. This unmarked form is the preferred grammatical form in good writing.
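The difference between explicitly marked and unmarked case can be illustrated by splitting a digit expression into its digits and an optional colon suffix. This is a minimal sketch that assumes an explicit suffix always has the form `:ending` (as in 27:lle). `split_case_suffix` is a hypothetical helper and is not part of the transcriptor.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: digits, optionally followed by ":" and a case ending.
DIGITS_WITH_SUFFIX = re.compile(r"([0-9]+)(?::(\w+))?")

def split_case_suffix(token: str) -> Tuple[str, Optional[str]]:
    """Return (digits, case_ending); case_ending is None when the case is unmarked."""
    match = DIGITS_WITH_SUFFIX.fullmatch(token)
    if match is None:
        raise ValueError(f"not a digit expression: {token!r}")
    return match.group(1), match.group(2)

# Explicitly marked: the ending itself selects the case.
print(split_case_suffix("27:lle"))  # ('27', 'lle')
# Unmarked: the case has to come from the surrounding phrase,
# as in "tarjosin 27 osanottajalle" (allative).
print(split_case_suffix("27"))      # ('27', None)
```

When the second element is `None`, the reading is ambiguous in isolation. This is why the transcriptor has to allow unmarked digit strings to agree with their context.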
Flag diacritics are used in number transcription to control case agreement. In Finnish numeral compounds all parts agree in case, except in the nominative singular, where the power-of-ten words after a multiplier greater than one (kymmentä, sataa, tuhatta, miljoonaa…) are in the partitive singular.
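Flags such as @U.CASE.SGNOM@ and @U.CASE.SGALL@ are unification flags. The first one on a path sets the feature, and every later one must carry the same value. The sketch below models only that semantics in Python. FST tools such as foma or HFST apply it natively during lookup. `passes_u_flags` and the example paths are hypothetical, and placing one flag before every numeral part is an assumption made purely for illustration.

```python
import re
from typing import Dict, List

# A unification flag diacritic such as "@U.CASE.SGALL@".
U_FLAG = re.compile(r"@U\.([^.@]+)\.([^.@]+)@")

def passes_u_flags(path: List[str]) -> bool:
    """Simplified model of @U.feature.value@ flags along one FST path.

    The first @U@ flag seen for a feature sets its value. Every later
    @U@ flag for the same feature must carry the same value, otherwise
    the path is rejected. Negative values and the other flag types
    (@P@, @R@, @D@, @C@, ...) are deliberately not modelled.
    """
    features: Dict[str, str] = {}
    for symbol in path:
        match = U_FLAG.fullmatch(symbol)
        if match is None:
            continue  # an ordinary symbol, not a flag
        feature, value = match.groups()
        # setdefault stores the value on first sight and returns the stored one
        if features.setdefault(feature, value) != value:
            return False
    return True

# Hypothetical path for 27:lle: every part carries the same case flag.
allative_27 = ["@U.CASE.SGALL@", "kahdelle", "@U.CASE.SGALL@", "kymmenelle",
               "@U.CASE.SGALL@", "seitsemälle"]
# A mixed path (allative "kahdelle" + partitive "kymmentä") is blocked.
mixed = ["@U.CASE.SGALL@", "kahdelle", "@U.CASE.SGNOM@", "kymmentä"]

print(passes_u_flags(allative_27))  # True
print(passes_u_flags(mixed))        # False
```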
@U.CASE.SGNOM@ for singular nominative agreement
@U.CASE.SGALL@ for singular allative agreement
The key morphotactic constraint for numbers and their transcriptions is that the length of the whole digit string must be known before we know what to start reading. Zeroes are not read out, but they still affect the reading. The numerals are systematic and fully compositional: the implementation of 100 000–999 999 is almost exactly the same as that of 100 000 000–999 999 999 and of every higher range, with only the multiplier word changed: tuhat~tuhatta, miljoona~miljoonaa, miljardi~miljardia, biljoona~biljoonaa, biljardi~biljardia and so forth. This follows the long scale of the traditional British (French) system, where the American billion is a milliard, etc. The numbers are built from blocks of roughly one word each, in decreasing order of magnitude. The exception is the numbers 11–19, which zig-zag: the second digit is read before the first (12 is kaksitoista). The rest of this documentation describes the morphotactic implementation lexicon by lexicon, in descending order of magnitude, with examples.
yksi
kaksikymmentäyksi
kolmesataakaksikymmentäyksi
neljätuhattakolmesataakaksikymmentäyksi
viisikymmentäneljätuhattakolmesataakaksikymmentäyksi
kuusisataaviisikymmentäneljätuhattakolmesataakaksikymmentäyksi
seitsemänmiljoonaakuusisataaviisikymmentäneljätuhattakolmesataakaksikymmentäyksi
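The block structure shown in these examples can be sketched outside the FST as a plain Python function. This is a minimal sketch under stated assumptions, not the transcriptor's implementation. `finnish_cardinal` and `below_thousand` are hypothetical names. The function covers only the nominative and the range 0–999 999 999 999, which is the HUNDREDSMRD range. It also assumes that a multiplier of exactly one is not read out (tuhat, miljoona, miljardi).

```python
ONES = ["", "yksi", "kaksi", "kolme", "neljä", "viisi",
        "kuusi", "seitsemän", "kahdeksan", "yhdeksän"]

# (value, singular nominative, singular partitive) per scale word, largest first.
SCALES = [
    (10**9, "miljardi", "miljardia"),
    (10**6, "miljoona", "miljoonaa"),
    (10**3, "tuhat", "tuhatta"),
]

def below_thousand(n: int) -> str:
    """Read 1..999 as one compound; 0 yields the empty string."""
    hundreds, rest = divmod(n, 100)
    parts = []
    if hundreds == 1:
        parts.append("sata")
    elif hundreds > 1:
        parts.append(ONES[hundreds] + "sataa")  # partitive after a multiplier
    tens, ones = divmod(rest, 10)
    if rest == 10:
        parts.append("kymmenen")
    elif tens == 1:
        parts.append(ONES[ones] + "toista")  # 11-19: ones digit is read first
    else:
        if tens > 1:
            parts.append(ONES[tens] + "kymmentä")
        parts.append(ONES[ones])  # empty string when ones == 0
    return "".join(parts)

def finnish_cardinal(n: int) -> str:
    """Nominative reading of 0 <= n < 10**12, written as one compound."""
    if not 0 <= n < 10**12:
        raise ValueError("this sketch handles 0..999 999 999 999 only")
    if n == 0:
        return "nolla"
    words = []
    for value, nominative, partitive in SCALES:
        count, n = divmod(n, value)
        if count == 1:
            words.append(nominative)  # assumption: 1000 -> "tuhat", not "yksituhat"
        elif count > 1:
            words.append(below_thousand(count) + partitive)
        # count == 0: a block of zeroes is not read out at all
    words.append(below_thousand(n))
    return "".join(words)

print(finnish_cardinal(7654321))
```

Reading block by block with `divmod` stands in for the lexc requirement of knowing the full length first. The magnitude of each block is fixed by its position in the whole number.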
Lexicon HUNDREDSMRD
contains numbers 2-9 that need to be followed by exactly
11 digits: 200 000 000 000–999 999 999 999
this is to implement Nsataa…miljardia…
Lexicon CUODIMRD
contains numbers 2-9 that need to be followed by exactly
this is to implement Nsataa…miljardia…
kaksisataamiljardia
Lexicon HUNDREDMRD
is for numbers in range: 100 000 000 000–199 000 000 000
this is to implement sata…miljardia…
satamiljardia
Lexicon TEENSMRD
is for numbers with 11 000 000 000–19 000 000 000
this is to implement …Ntoista…miljardia…
Lexicon TEENMRD
is for numbers with 11 000 000 000–19 000 000 000
this is to implement …Ntoista…miljardia…
kaksitoistamiljardia
Lexicon TENSMRD
is for numbers with 20 000 000 000–90 000 000 000
this is to implement …Nkymmentä…miljardia…
Lexicon TENMRD
is for numbers with 10 000 000 000–10 999 999 999
this is to implement …kymmenenmiljardia…
kymmenenmiljardia
Lexicon LÅGEVMRD
is for numbers with 20 000 000 000–90 000 000 000
this is to implement …Nkymmentä…miljardia…
kaksikymmentämiljardia
Lexicon ONESMRD
is for numbers with 1 000 000 000–9 000 000 000
this is to implement …Nmiljardia…
Lexicon MILJARD
is for numbers with 1 000 000 000–9 000 000 000
this is to implement …Nmiljardia…
kaksimiljardia
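Which of the milliard lexicons above starts the reading depends on the length of the whole digit string and on its first one or two digits. The hypothetical helper below maps a 10–12 digit string to a lexicon name, following the ranges as listed above. In the lexc source the choice is made by the lexicon structure itself. This function only mirrors the described ranges and is not a reproduction of that logic.

```python
def milliard_entry_lexicon(digits: str) -> str:
    """Name (as used in this documentation) of the lexicon whose range
    contains a 10-12 digit string. Hypothetical helper for illustration."""
    if not (digits.isascii() and digits.isdigit()) or digits[0] == "0":
        raise ValueError("expected a digit string without leading zeros")
    if len(digits) == 12:
        # 100 000 000 000-199 ...: sata...; 200 000 000 000-999 ...: Nsataa...
        return "HUNDREDMRD" if digits[0] == "1" else "HUNDREDSMRD"
    if len(digits) == 11:
        if digits[0] == "1":
            # 10 ...: kymmenen...; 11-19 ...: Ntoista...
            return "TENMRD" if digits[1] == "0" else "TEENSMRD"
        return "TENSMRD"  # 20-99 ...: Nkymmentä...
    if len(digits) == 10:
        return "ONESMRD"  # 1-9 ...: Nmiljardia...
    raise ValueError("not in the milliard range")

print(milliard_entry_lexicon("12000000000"))  # TEENSMRD
```

The same leading digit "1" leads to three different lexicons depending on length and on the second digit. This is why the whole string must be known before reading starts.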
Lexicon OVERMILLIONS
is for the millions part of numbers greater than 1 milliard
Lexicon HUNDREDSM
contains numbers 2-9 that need to be followed by exactly
8 digits: 200 000 000–999 999 999
this is to implement Nsataa…miljoonaa…
Lexicon CUODIM
contains numbers 2-9 that need to be followed by exactly
this is to implement Nsataa…miljoonaa…
kaksisataamiljoonaa
Lexicon HUNDREDM
is for numbers in range: 100 000 000–199 000 000
this is to implement sata…miljoonaa…
Lexicon TEENSM
is for numbers with 11 000 000–19 000 000
this is to implement …Ntoista…miljoonaa…
Lexicon TEENM
is for numbers with 11 000 000–19 000 000
this is to implement …Ntoista…miljoonaa…
kaksitoistamiljoonaa
Lexicon TENSM
is for numbers with 20 000 000–90 000 000
this is to implement …Nkymmentä…miljoonaa…
Lexicon TENM
is for numbers with 10 000 000–10 999 999
this is to implement …kymmenenmiljoonaa…
kymmenenmiljoonaa
Lexicon LÅGEVM
is for numbers with 20 000 000–90 000 000
this is to implement …Nkymmentä…miljoonaa…
kaksikymmentämiljoonaa
Lexicon ONESM
is for numbers with 1 000 000–9 000 000
this is to implement …Nmiljoonaa…
Lexicon MILJON
is for numbers with 1 000 000–9 000 000
this is to implement …Nmiljoonaa…
kaksimiljoonaa
Lexicon UNDERMILLION
is for numbers with 100 000–900 000 after milliards
Lexicon OVERTHOUSANDS
is for the thousands part of numbers greater than 1 million
Lexicon HUNDREDST
contains numbers 2-9 that need to be followed by exactly
5 digits: 200 000–999 999
this is to implement Nsataa…tuhatta…
Lexicon CUODIT
contains numbers 2-9 that need to be followed by exactly
this is to implement Nsataa…tuhatta…
kaksisataatuhatta
Lexicon HUNDREDT
is for numbers in range: 100 000–199 000
this is to implement sata…tuhatta…
Lexicon TEENST
is for numbers with 11 000–19 000
this is to implement …Ntoista…tuhatta…
Lexicon TEENT
is for numbers with 11 000–19 000
this is to implement …Ntoista…tuhatta…
kaksitoistatuhatta
Lexicon TENST
is for numbers with 20 000–90 000
this is to implement …Nkymmentä…tuhatta…
Lexicon TENT
is for numbers with 10 000–10 999
this is to implement …kymmenentuhatta…
kymmenentuhatta
Lexicon LÅGEVT
is for numbers with 20 000–90 000
this is to implement …Nkymmentä…tuhatta…
kaksikymmentätuhatta
Lexicon ONEST
is for numbers with 1 000–9 000
this is to implement …Ntuhatta…
Lexicon THOUSANDS
is for numbers with 1 000–9 000
this is to implement …Ntuhatta…
kaksituhatta
kolmetuhattaneljäsataaviisikymmentäkuusi
Lexicon THOUSAND
is for the ones, tens and hundreds part of numbers greater than a thousand
Lexicon UNDERTHOUSAND
is for numbers with 100–900 after thousands
Lexicon HUNDREDS
contains numbers 2-9 that need to be followed by exactly
2 digits: 200–999
this is to implement Nsataa…
Lexicon CUODI
contains numbers 2-9 that need to be followed by exactly
this is to implement Nsataa…
kaksisataa
kolmesataaneljäkymmentäviisi
Lexicon HUNDRED
is for numbers in range: 100–199
Lexicon TEENS
is for numbers with 11–19
this is to implement …Ntoista
Lexicon TEEN
is for numbers with 11–19
this is to implement …Ntoista
yksitoista
kaksitoista
kolmetoista
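The zig-zag of the teens can be shown in isolation: in a two-digit string 11–19 the second digit is read first, followed by the invariant word toista. This is a minimal sketch; `teen` is a hypothetical helper name.

```python
ONES = {1: "yksi", 2: "kaksi", 3: "kolme", 4: "neljä", 5: "viisi",
        6: "kuusi", 7: "seitsemän", 8: "kahdeksan", 9: "yhdeksän"}

def teen(digits: str) -> str:
    """Read "11".."19": the second digit comes first, then "toista"."""
    if len(digits) != 2 or digits[0] != "1" or digits[1] not in "123456789":
        raise ValueError("expected a string in 11..19")
    return ONES[int(digits[1])] + "toista"

print(teen("12"))  # kaksitoista
```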
Lexicon TENS
is for numbers with 20–90
this is to implement …Nkymmentä…
Lexicon LÅGEV
is for numbers with 20–90
this is to implement …Nkymmentä…
kaksikymmentä
kolmekymmentäneljä
Lexicon JUSTTEN
is for number 10
this is to implement …kymmenen
kymmenen
Lexicon ONES
is for numbers with 1–9
this is to implement yksi, kaksi, kolme…, yhdeksän
yksi
kaksi
kolme
Lexicon ZERO
is for number 0
nolla
Lexicon LOPPU
is to implement potential case inflection with a colon.
yhdelle
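An inflected reading such as 1:lle → yhdelle shows the agreement rule from the start of this document. In the allative every part of the compound inflects, so the nominative-only partitives (kymmentä, sataa) do not appear. The only invariant part is toista. The sketch below covers only the allative singular of 1–999. `allative` and `ALLATIVE_ONES` are hypothetical names, and the function is not taken from the transcriptor.

```python
# Allative singular of the basic numerals 1-9 (index 0 unused).
ALLATIVE_ONES = ["", "yhdelle", "kahdelle", "kolmelle", "neljälle", "viidelle",
                 "kuudelle", "seitsemälle", "kahdeksalle", "yhdeksälle"]

def allative(n: int) -> str:
    """Allative reading of 1..999 in which every part agrees in case."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch covers 1..999 only")
    hundreds, rest = divmod(n, 100)
    tens, ones = divmod(rest, 10)
    parts = []
    if hundreds:
        # 100 -> sadalle, 300 -> kolmellesadalle
        parts.append(("" if hundreds == 1 else ALLATIVE_ONES[hundreds]) + "sadalle")
    if rest == 10:
        parts.append("kymmenelle")
    elif tens == 1:
        parts.append(ALLATIVE_ONES[ones] + "toista")  # "toista" does not inflect
    else:
        if tens > 1:
            parts.append(ALLATIVE_ONES[tens] + "kymmenelle")
        parts.append(ALLATIVE_ONES[ones])  # empty string when ones == 0
    return "".join(parts)

print(allative(1))   # yhdelle
print(allative(27))  # kahdellekymmenelleseitsemälle
```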
Note: whether case-inflected digit strings without an explicit suffix are accepted or rejected can be changed here.
This (part of) documentation was generated from src/fst/transcriptions/transcriptor-numbers-digit2text.lexc