Participants: Børre, Linda, Sjur, Tomi, Trond
Questions:
Views:
Work process:
Conclusions:
Double linking, iconic id
Iconic id decided by the following principle:
With the principle of inheritance (the lemma is inherited from the common file):
common         | swe       | fin       |
India_2        | India     | Intia     |
->lg=a         | ->India_2 | ->India_2 |
(->lg=b Intia) | ->India   |           |
sem plc
...
Timbuktu       | Timbuktu      | Timbuktu   |
->lg=a id      | ->Timbuktu    | ->Timbuktu |
->lg=b id      |               |            |
sem plc        | <ab>Tmb.</ab> |
...
          | sme:   | ... | nor:   | fin | swe | eng
Tana      | Deatnu | ... | Tana   |     |     |
->lg=a id | ->Tana | ... | ->Tana |     |     |
->lg=b id |        | ... |        |     |     |
sem plc   | ...    |     |        |
... ...   |        |     |
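The double linking in the tables above can be sketched as a consistency check: every link from a common entry to a language entry should be answered by a link back. This is only a minimal sketch. The dictionary layout and the function name `missing_backlinks` are assumptions made for illustration, not the project's file format; the data mirrors the India_2 and Timbuktu rows.

```python
def missing_backlinks(common, lang_entries):
    """Return (common_id, lang, lemma) triples whose language entry does not link back."""
    problems = []
    for common_id, links in common.items():
        for lang, lemma in links.items():
            # Ids that the language-specific entry points back to (empty if absent).
            back = lang_entries.get(lang, {}).get(lemma, [])
            if common_id not in back:
                problems.append((common_id, lang, lemma))
    return problems


# Hypothetical in-memory model of the tables: common id -> {lang: lemma}.
COMMON = {
    "India_2": {"swe": "India", "fin": "Intia"},
    "Timbuktu": {"swe": "Timbuktu", "fin": "Timbuktu"},
}
# Language files: lang -> {lemma: [common ids it links to]}.
LANG = {
    "swe": {"India": ["India_2", "India"], "Timbuktu": ["Timbuktu"]},
    "fin": {"Intia": ["India_2"], "Timbuktu": ["Timbuktu"]},
}

print(missing_backlinks(COMMON, LANG))
```

A language entry may carry more links than the common file requires (swe India also points to India); the check only reports links that are missing.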
What do we store in the “common” file? The iconic id, the semantics plus information about the world (encyclopedic info), and links to the language-specific files.
What is stored in the language-specific files? Linguistic info:
NATO => OTAN
NRL => NBR, Ap => Bb
KRD
KRD KRD
KRD KRD+N+ACR+Sg+Acc
KRD KRD+N+ACR+Sg+Gen
KRD KRD+N+ACR+Sg+Nom
NATO
NATO NATO+N+ACR+Sg+Acc
NATO NATO+N+ACR+Sg+Gen
NATO NATO+N+ACR+Sg+Nom
NATO NATO+N+Prop+Org+Sg+Acc
NATO NATO+N+Prop+Org+Sg+Attr
NATO NATO+N+Prop+Org+Sg+Gen
NATO NATO+N+Prop+Org+Sg+Nom
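The analyser output above gives NATO both acronym (+ACR) and proper-noun (+Prop+Org) readings, while KRD only gets the acronym readings. A small sketch of how such output could be grouped per surface form, assuming each line is a whitespace-separated pair of surface form and analysis; the function name `parse_analyses` is hypothetical.

```python
from collections import defaultdict


def parse_analyses(lines):
    """Group (lemma, tags) readings by surface form; lines without '+' tags are skipped."""
    readings = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or "+" not in parts[1]:
            continue  # bare echo lines such as "KRD KRD" carry no analysis
        surface, analysis = parts
        lemma, *tags = analysis.split("+")
        readings[surface].append((lemma, tags))
    return dict(readings)


# Lines copied from the listing above.
SAMPLE = [
    "KRD KRD",
    "KRD KRD+N+ACR+Sg+Nom",
    "NATO NATO+N+ACR+Sg+Nom",
    "NATO NATO+N+Prop+Org+Sg+Nom",
]

result = parse_analyses(SAMPLE)
for surface, analyses in result.items():
    kinds = sorted({"ACR" if "ACR" in tags else "Prop" for _, tags in analyses})
    print(surface, kinds)
```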
"<NATO>" S:1732, 1732, 1732, 1732, 5423, 5849, 5849, 9980
"NATO" N Prop Org Sg Nom <<< S:1285 @HNOUN
Different aspects of abbreviations and acronyms:
Lexicon conclusion:
Transducer conclusions:
<entry id="India" type="full (default)/abr/acr/alt/err">
  <sem>
    <plc type="xxx" ssrcode=""> <!-- type=5., ssrcode=6. -->
      <geo>
        <country>IN</country>
        <region/> <!-- "fylke" (county) or similar, 11. -->
        <munic/> <!-- 10. -->
        <coord/> <!-- 14. -->
      </geo>
      <regul>
        <gnr/> <!-- 7. -->
        <bnr/> <!-- 7. -->
      </regul>
    </plc>
  </sem>
  <!-- These links are convenience entries, to speed up processing -->
  <langentry lang="sme" ref="India"/>
  <langentry lang="smj" ref="India"/>
  ...
  <langentry lang="fin" ref="Intia"/>
</entry>
<entry id="India_2">
  <sem>
    <fem/>
  </sem>
  <langentry lang="sme" ref="India"/>
  <langentry lang="smj" ref="India"/>
  ...
  <langentry lang="fin" ref="India"/>
</entry>
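As a sketch of how the convenience links in the common file could be used, the following builds a reverse index from (language, lemma) to the common ids that point at it. It assumes the entries are wrapped in a root element, here called `lexicon`, which the memo does not specify; the XML is a trimmed, well-formed version of the two entries above, and `lang_index` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the two common entries; the <lexicon> root is an assumption.
COMMON_XML = """
<lexicon>
  <entry id="India">
    <sem><plc><geo><country>IN</country></geo></plc></sem>
    <langentry lang="sme" ref="India"/>
    <langentry lang="fin" ref="Intia"/>
  </entry>
  <entry id="India_2">
    <sem><fem/></sem>
    <langentry lang="sme" ref="India"/>
    <langentry lang="fin" ref="India"/>
  </entry>
</lexicon>
"""


def lang_index(xml_text):
    """Map (lang, lemma) to the list of common ids that link to that lemma."""
    index = {}
    for entry in ET.fromstring(xml_text).iter("entry"):
        for link in entry.iter("langentry"):
            key = (link.get("lang"), link.get("ref"))
            index.setdefault(key, []).append(entry.get("id"))
    return index


index = lang_index(COMMON_XML)
print(index[("sme", "India")])  # ['India', 'India_2']
print(index[("fin", "Intia")])  # ['India']
```

The sme lemma India is reached from both common ids, which corresponds to the two `<sense>` references in the language-specific entry.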
<entry id="India">
  <!-- Do we need the stem, or can it be inferred/inherited from the id?
       NO, only if different from the id. -->
  <stem/>
  <infl lexc="ACCRA">(example?)</infl>
  <name-parts/>
  <etym/>
  <rel-name ref="xyz"/>
  <senses>
    <sense ref="India_2"/>
    <sense ref="India"/>
  </senses>
</entry>
(numbers refer to Irene’s draft, see below)
<entry id="Intia"> <!-- 1. -->
  <stem lexc="14">(only if different from id/headword)</stem> <!-- 2. and 3. -->
  <name-parts/> <!-- 4. -->
  <variants> <!-- 15. -->
    <variant ref="xyz"/>
  </variants>
  <etym/> <!-- 24. -->
  <rel-name ref="xyz"/> <!-- 18. -->
  <senses>
    <sense ref="India"/>
  </senses>
</entry>
<entry id="India">
  <stem/>
  <infl lexc="14">(example?)</infl>
  <name-parts/>
  <etym/>
  <rel-name ref="xyz"/>
  <senses>
    <sense ref="India_2"/>
  </senses>
</entry>
(numbers refer to Irene’s draft, see the [meeting memo from Nov. 28 | https://giellalt.uit.no/admin/weekly/2005/Meeting_2005-11-28.html#7.+Name+lexicon+infrastructure])
<entry id="Porsanki"> <!-- 1. -->
  <stem lexc="14">(only if different from id/headword)</stem> <!-- 2. and 3. -->
  <name-parts/> <!-- 4. -->
  <variants> <!-- 15. -->
    <variant ref="xyz"/>
  </variants>
  <etym/> <!-- 24. -->
  <rel-name ref="xyz"/> <!-- 18. -->
  <senses>
    <sense ref="Porsanger">
      <legal>
        <status/> <!-- 8. -->
        <decision/> <!-- 9. -->
      </legal>
      <source>
        <informants>
          <informant id="some-id"> <!-- 20. -->
            <explanation date=""/> <!-- 19. -->
            <explanation date=""/>
          </informant>
        </informants>
        <collectors>
          <collector id="" year=""/> <!-- 21. -->
          <collector id="" year=""/>
        </collectors>
        <archive/>
        <other>
          <print/>
        </other>
      </source>
      <comment/> <!-- 28. -->
    </sense>
  </senses>
</entry>
When the stem is identical to the lemma, the stem element is left empty:
<stem lexc="14"/>
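The rule that the stem is only written out when it differs from the id can be sketched as a fallback lookup. This is an illustration only: `effective_stem` is a hypothetical helper, and the second entry (`Example` / `Exampl`) is made-up data used solely to show the non-empty case.

```python
import xml.etree.ElementTree as ET


def effective_stem(entry):
    """Return the explicit stem text if present, otherwise fall back to the entry id."""
    stem = entry.find("stem")
    text = (stem.text or "").strip() if stem is not None else ""
    return text or entry.get("id")


# Stem equals the lemma: the element is empty, so the id is inherited.
same = ET.fromstring('<entry id="Intia"><stem lexc="14"/></entry>')
# Made-up entry whose stem differs from the id.
different = ET.fromstring('<entry id="Example"><stem lexc="14">Exampl</stem></entry>')

print(effective_stem(same))       # Intia
print(effective_stem(different))  # Exampl
```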
These points from Irene’s list are still open:
Print info - does it belong to the common or the language-specific section?
12. map product (kartprodukt)
13. map sheet (kartblad)
Unclassified:
25. cross-reference (pilhenvisning, nuoliviite) to another article
-> How is this different from 18.?
Multimedia - does it belong to the common or the language-specific section?
26. sound file (lydfil)
27. picture(s), illustration(s) (bilde(r), illustrasjon(er))