Finite state and Constraint Grammar based analysers, proofing tools and other resources
Classifying Romani languages for language technology purposes is complicated. There are more Romani languages than ISO codes, so some ISO codes cover several varieties, and a single language may have several normative bodies with diverging norms.
The table below gives language names, ISO codes and Glottolog codes for the Romani languages in the Nordic countries, and indicates whether each language is standardised by the authorities in Finland, Sweden or Norway. Status in other countries is not included. The GiellaLT status column indicates whether the language has been worked on in GiellaLT: Alpha means working language models with some linguistic content, Experiment means a working setup with no linguistic content yet, and no means that no work has been done.
Glottolog name | Alternate names | Name in official documents | GiellaLT status | ISO code*) | GiellaLT code**) | Glottolog code | Standard in country |
---|---|---|---|---|---|---|---|
Kalo Finnish Romani | Kaale | Suomen romani | Alpha | rmf | rmf | kalo1256 | Finland |
Tavringer Romani | | Resanderomska | Experiment | rmu | rmu | tavr1235 | Sweden |
Romani arli | arlikane | Arli | Experiment | rmn | rmn ***) | arli1238 | Sweden |
Romani kalderaš | kelderašicko | Kalderash | Experiment | rmy | rmy-x-kalderas | kald1238 | Sweden |
Romani lovara | lovari, lovaricko | Romanés | no | rmy | rmy-NO | lova1240 | Norway |
Romani lovara | lovari, lovaricko | Lovari | no | rmy | rmy-x-lovara | lova1240 | Sweden |
Polish Romani | | Polsk romska | no | rml | rml | poli1261 | Sweden |
Traveller Norwegian | romani rakkripa | Romani | Alpha | rmg | rmg | trav1236 | Norway |
*) Three of the ISO codes have a wider coverage than the table suggests: rmn (Balkan Romani), rml (Baltic Romani) and rmy (Vlax Romani) are also used in a wider European context, for more varieties than the ones listed here.
**) BCP47 codes used to name repositories in the GiellaLT infrastructure.
***) In the GiellaLT context, `rmn` should really be named `rmn-SE`, as we presently work only with data and representatives from Sweden.
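The footnotes rely on how BCP 47 tags are built: a primary language subtag (here the ISO 639-3 code), optionally followed by a two-letter region subtag such as `NO` or `SE`, or by a private-use part introduced by `-x-`. The Python sketch below is only an illustration and is not part of the GiellaLT tooling; `Tag`, `parse_tag`, `group_by_iso` and `GIELLALT_CODES` are hypothetical names, and the parser deliberately handles only the simple tag shapes that occur in the table (no script, variant, extension or numeric region subtags). It shows how the single ISO code `rmy` ends up covering three distinct GiellaLT codes.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Tag(NamedTuple):
    language: str           # primary language subtag (here an ISO 639-3 code)
    region: Optional[str]   # two-letter region subtag such as "NO" or "SE"
    private: Optional[str]  # private-use part following "-x-", e.g. "kalderas"


def parse_tag(tag: str) -> Tag:
    """Split a simple BCP 47 tag of the form lang[-REGION][-x-private].

    Deliberately simplified (an assumption of this sketch): script, variant,
    extension and numeric region subtags are ignored, since none occur in
    the table above.
    """
    # BCP 47 tags are case-insensitive, so normalise before splitting.
    main, sep, private = tag.lower().partition("-x-")
    language, *rest = main.split("-")
    region = None
    for sub in rest:
        if len(sub) == 2 and sub.isalpha():
            region = sub.upper()  # region subtags are conventionally upper case
    return Tag(language, region, private if sep else None)


# GiellaLT codes as listed in the table (hypothetical variable name).
GIELLALT_CODES = [
    "rmf", "rmu", "rmn", "rmy-x-kalderas",
    "rmy-NO", "rmy-x-lovara", "rml", "rmg",
]


def group_by_iso(codes: List[str]) -> Dict[str, List[str]]:
    """Group GiellaLT codes by their shared primary language subtag."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for code in codes:
        groups[parse_tag(code).language].append(code)
    return dict(groups)


if __name__ == "__main__":
    print(parse_tag("rmy-x-kalderas"))
    # Tag(language='rmy', region=None, private='kalderas')
    print(parse_tag("rmn-SE"))
    # Tag(language='rmn', region='SE', private=None)
    print(group_by_iso(GIELLALT_CODES)["rmy"])
    # ['rmy-x-kalderas', 'rmy-NO', 'rmy-x-lovara']
```

Grouping on the primary subtag recovers the ISO code, which is why the region subtag (`rmy-NO`, the suggested `rmn-SE`) or a private-use part (`rmy-x-kalderas`, `rmy-x-lovara`) is needed to keep varieties apart. Production code should use a full BCP 47 parser rather than this reduced one.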
Starting in 2017, the Swedish Language Council initiated a project to revise the orthographies of the Romani languages in Sweden, cf. this overview. At present (spring 2022), all languages marked Sweden in the table above have their own distinct orthographies; one possible outcome of the Swedish project is that several of them will be unified.