GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.
These steps are valid for korp, u_korp and f_korp and must be repeated for each genre of each language.
Make sure you have CorpusTools installed.
Copy the *vrt files obtained after updating the monolingual content (see) into a folder named vrt_<ISO>_<DATE>
Change cLang, cDomain and date in extract_time_stamp.xsl
Run the following:
java -Xmx2048m -cp ~/main/tools/TermWikiExporter/lib/saxon9.jar -Dfile.encoding=UTF8 net.sf.saxon.Transform -it:main extract_time_stamp.xsl
cd timestamp_<ISO>_<DATE>
awk '{print $2}' metacheck_<ISO>_<GENRE>_<DATE>.txt | sort | uniq > all_years_<ISO>_<GENRE>.txt
cd ..
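The year-extraction pipeline above can be tried on a small sample before running it on real data. The sketch below is only an illustration: the helper name unique_years and the sample lines are hypothetical, and it assumes the metacheck file is whitespace-separated with the year in the second field, which is what the awk call implies.

```shell
# Hypothetical helper: print the unique values of the second field, sorted.
# "sort -u" gives the same result as "sort | uniq".
unique_years() {
  awk '{print $2}' "$1" | sort -u
}

# Made-up sample data; the real metacheck layout is an assumption.
sample=$(mktemp)
printf '%s\n' \
  'doc1.xml 2011' \
  'doc2.xml 2009' \
  'doc3.xml 2011' > "$sample"

# Each distinct year is listed once, in sorted order.
unique_years "$sample"
```

If the list contains anything other than years, the metadata in the source files probably needs fixing before the tables are generated.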
Change lang, date, domain in generate_tables.sh and run the following:
sh generate_tables.sh
Run the following:
cat timespan_<ISO>_<GENRE>_<DATE>.sql | mysql -u korp -p korp_DB
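Before loading the SQL into the database, it may help to confirm that the generated file exists and is not empty. The following is a minimal sketch, not part of the GiellaLT tooling: the function names timespan_file and check_sql_file are hypothetical, and the values for language, genre and date are placeholders to be replaced with the ones used in generate_tables.sh.

```shell
# Hypothetical helper: build the SQL file name from the placeholders used above.
# $1 = ISO code, $2 = genre, $3 = date
timespan_file() {
  printf 'timespan_%s_%s_%s.sql\n' "$1" "$2" "$3"
}

# Hypothetical helper: succeed only if the file exists and is non-empty.
check_sql_file() {
  [ -s "$1" ]
}

# Placeholder values, not taken from the source.
iso=sme
genre=news
date=20240101

sql=$(timespan_file "$iso" "$genre" "$date")
if check_sql_file "$sql"; then
  echo "ready to load: $sql"
else
  echo "missing or empty: $sql" >&2
fi
```

The check only inspects the file on disk; it does not contact the database or validate the SQL itself.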