GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.
This document explains how to work with parallelization and how to move files to the prestable directory.
Process: convert first, then parallelize.
Language-specific commands use nob2sma as the example.
When you are in main/tools/CorpusTools/:
python setup.py install --user --install-scripts=$HOME/bin
In both the langs/nob and langs/sma directories:
./autogen.sh
./configure --prefix=$HOME/.local --without-xfst --with-hfst --enable-tokenisers --enable-reversed-intersect --enable-alignment
make -j
make install
The following works in the freecorpus/ and boundcorpus/ directories:
grep -rl '"sma" location="..*"' --include=*.xsl orig/nob/science/
convert2xml orig/nob/science/
convert2xml orig/sma/science/
parallelize -l2 sma converted/nob/science/
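The convert and parallelize steps above can be wrapped in a small helper, so that the same sequence runs for any genre directory. This is only a sketch: the function names `run` and `convert_and_parallelize` and the `DRY_RUN` variable are hypothetical, and the sketch assumes that `convert2xml` and `parallelize` are on the `PATH` and that it is run from freecorpus/ or boundcorpus/.

```shell
# Hypothetical helper, not part of CorpusTools.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Convert both language sides of one genre directory, then parallelize
# the first language against the second. Stops at the first failing step.
convert_and_parallelize() {
    lang1=$1   # main language, e.g. nob
    lang2=$2   # parallel language, e.g. sma
    genre=$3   # genre directory, e.g. science
    run convert2xml "orig/$lang1/$genre/" &&
    run convert2xml "orig/$lang2/$genre/" &&
    run parallelize -l2 "$lang2" "converted/$lang1/$genre/"
}
```

Calling `DRY_RUN=1 convert_and_parallelize nob sma science` prints the three commands without running them, which is a cheap way to check the paths first.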
Once you have changed the xsl file, this command both converts and parallelizes again (FIILANAMMA stands for the file name):
reparallelize FIILANAMMA.tmxhtml
reparallelize tmxhtml
Go through all directories, e.g. those under nob2sma, in alphabetical order (first freecorpus/tmx/nob2sma/admin/depts/other_files, then the next directory (udir.no), and so on).
You can open all files at once from the command line: open *.tmx
or only some of them, if there are very many files: open a*.tmx
and so on.
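To plan the alphabetical walk, it can help to see which directories contain .tmx files and how many. The following is a minimal sketch under that assumption; the function name `list_tmx_dirs` is hypothetical, and it relies only on standard `find`, `sed`, `sort` and `uniq`.

```shell
# Hypothetical helper: print every directory below a tmx root that
# contains .tmx files, sorted alphabetically, with the number of
# .tmx files in each directory in front of the path.
list_tmx_dirs() {
    root=$1   # e.g. freecorpus/tmx/nob2sma
    find "$root" -type f -name '*.tmx' |
        sed 's|/[^/]*$||' |   # strip the file name, keep the directory
        LC_ALL=C sort |       # byte-order sort, independent of the locale
        uniq -c               # collapse repeats into "count directory"
}
```

For example, `list_tmx_dirs freecorpus/tmx/nob2sma` shows at a glance where `open a*.tmx` is a better choice than `open *.tmx`.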
If the .tmx file does not have parallel content (fiila is a placeholder for the file name):
open fiila.html
or open fiila.pdf
or open fiila.txt
see fiila.html.xsl
to edit the xsl file.
see fiila.html
to edit the html file.
How to change the xsl file if there is an error in the parallel file:
open fiila.html
see fiila.html
see fiila.html.xml
Do not fix the fiila.html.xml file, because it is generated.
Change as little as possible. If the problem can be fixed in the .xsl file, do so. If it makes sense to fix it in the prestable tmx file, do so.
Metatext
In the orig file you can delete metatext that ruins the parallelization or the language recognition.
Note that "clean" metatext (without tags) is easier to delete in the .xsl file.
html tags
If the parallelization goes wrong because of html tags, there are 4 options.
But if it is easier to fix the tmx text, do so.
A tag (+ attribute) can be removed in:
$GTHOME/tools/CorpusTools/corpustools/htmlcontentconverter.py
In principle it is also possible to do this per file, in the .xsl:
<xsl:variable name="skip_elements" select="'.//body/div[1]/h2[1]'"/>
but that does not work yet.
realign --convert fiila.tmx
realign fiila.tmx
Check in the new versions (orig, converted and prestable) and mark the file as OK in the work list.
Check in the new versions (orig, converted and prestable) and write a comment in the work list.
Write a comment in the work list, and delete the tmx file and its html version from the prestable directory, e.g. (the commit message "ii lean parallealla" means "was not parallel"):
svn rm prestable/tmx/nob2sma/facta/fiila.tmx
svn rm prestable/tmx/nob2sma/facta/fiila.tmx.html
svn ci -m "ii lean parallealla" prestable/tmx/nob2sma/facta/fiila.tmx prestable/tmx/nob2sma/facta/fiila.tmx.html
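Since the removal always touches the .tmx file and its .tmx.html companion together, the three commands can be generated from a single path. The sketch below only prints the commands so they can be reviewed before being run; the function name `print_svn_removal` is hypothetical, and it assumes paths and messages without spaces or quote characters that would need extra escaping.

```shell
# Hypothetical helper: print the svn commands that remove a non-parallel
# .tmx file and its .tmx.html companion and commit the removal.
# Nothing is executed; the output is meant to be reviewed first.
print_svn_removal() {
    tmx=$1        # e.g. prestable/tmx/nob2sma/facta/fiila.tmx
    message=$2    # commit message
    printf 'svn rm %s\n' "$tmx"
    printf 'svn rm %s\n' "$tmx.html"
    printf 'svn ci -m "%s" %s %s\n' "$message" "$tmx" "$tmx.html"
}
```

For example, `print_svn_removal prestable/tmx/nob2sma/facta/fiila.tmx "ii lean parallealla"` reproduces the three commands shown above.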
Note in the list that there is a bug and what kind it is, e.g. BUG-punktum, BUG-mellomtittel or BUG-språkgjenkjenning (Norwegian for full stop, subheading and language recognition).
Example of what the list can look like (with the comment at the start of the line, the list is easy to sort):
OK : prestable/tmx/nob2sma/facta/gielemnastedh.no/apen-barnehagedag.html.tmx.html
Deleted : prestable/tmx/nob2sma/bible/osko/index.php_kat_id=102_art_id=88.html.tmx.html
To_be_fixed BUGpunktum (date at the end) : prestable/tmx/nob2sma/admin/sd/samediggi.no/sametinget-inviterer-til-duodjikonferanse-27.-28.-januar-2016.html.tmx.html
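Because the status comes first on every line, standard tools are enough to sort the list and to count the entries per status. This is a sketch that assumes the work list is a plain text file with one entry per line in the form shown above; the function name `summarize_worklist` and the file name `worklist.txt` are hypothetical.

```shell
# Hypothetical helper: count the work list entries per status.
# Assumes one entry per line, with the status as the first word.
summarize_worklist() {
    worklist=$1   # e.g. worklist.txt
    awk '{print $1}' "$worklist" |   # keep only the status word
        LC_ALL=C sort |              # group identical statuses together
        uniq -c                      # print "count status"
}
```

Running `sort worklist.txt` groups the entries by status, and `summarize_worklist worklist.txt` shows how many files are done and how many still need fixing.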