These steps are valid for korp, u_korp and f_korp, and must be repeated for each genre of each language.
Make sure CorpusTools is installed.
Copy the *vrt files obtained after updating the monolingual content into a folder named vrt_<ISO>_<DATE>.
Set cLang, cDomain and date in extract_time_stamp.xsl.
Run the following to extract the time stamps and collect the distinct years:
java -Xmx2048m -cp ~/main/tools/TermWikiExporter/lib/saxon9.jar -Dfile.encoding=UTF8 net.sf.saxon.Transform -it:main extract_time_stamp.xsl
cd timestamp_<ISO>_<DATE>
awk '{print $2}' metacheck_<ISO>_<GENRE>_<DATE>.txt |sort|uniq > all_years_<ISO>_<GENRE>.txt
cd ..
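To make the awk step concrete, here is a minimal sketch of what the pipeline does on a small sample. The file names (sme, news, 20240101) and the sample lines are hypothetical, and the real metacheck file may have a different layout; the only thing taken from the commands above is that the year is read from the second whitespace-separated field.

```shell
# Hypothetical sample data: the real metacheck file layout may differ.
tmpdir=$(mktemp -d)
printf '%s\n' \
    'doc1.xml 2015' \
    'doc2.xml 2013' \
    'doc3.xml 2015' > "$tmpdir/metacheck_sme_news_20240101.txt"

# Same pipeline as above: take field 2, sort it, drop duplicates.
awk '{print $2}' "$tmpdir/metacheck_sme_news_20240101.txt" | sort | uniq \
    > "$tmpdir/all_years_sme_news.txt"

cat "$tmpdir/all_years_sme_news.txt"
# prints:
# 2013
# 2015
```

Each year appears once in the result, however many documents carry it.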
Change lang, date, domain in generate_tables.sh and run the following:
sh generate_tables.sh
Run the following to load the timespan SQL file into the korp_DB database:
cat timespan_<ISO>_<GENRE>_<DATE>.sql | mysql -u korp -p korp_DB
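Since the steps have to be repeated for each genre, it can help to list the final import commands before running them. The following is only a dry-run sketch: it prints the commands and executes nothing. The values of ISO and DATE and the genre names are hypothetical placeholders, and print_import_commands is an illustrative helper, not part of the GiellaLT tools.

```shell
#!/bin/sh
# Dry-run sketch: print the import command for each genre without running it.
# ISO, DATE and GENRES are hypothetical placeholders; adjust them to your corpus.
ISO=sme
DATE=20240101
GENRES="admin news"

print_import_commands() {
    for genre in $GENRES; do
        sql="timespan_${ISO}_${genre}_${DATE}.sql"
        # Mirrors the import command shown above, with the placeholders filled in.
        echo "cat $sql | mysql -u korp -p korp_DB"
    done
}

print_import_commands
```

Review the printed lines, then run each one by hand; mysql will prompt for the password every time because of the -p option.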