GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document.
Vislcg3 is the newest parser generation from Odense. As its predecessor, vislcg, it is open source. Vislcg3 is licensed under GPL.
Starting on March 5th 2008, we have migrated to vislcg3. Rule files for vislcg are still available in older revisions. For vislcg3 documentation we recommend the online documentation at beta.visl.sdu.dk/cg3.html. Here, we only document installation and basic usage.
/opt/local/bin/port search twig
in a Terminal. If that is
successfull, go on to the next step. If not install
MacPorts, then issue the
following commands in your terminal window: sudo port install cmake
sudo port install boost
sudo port install icu
cd /tmp
svn co http://visl.sdu.dk/svn/visl/tools/vislcg3/trunk vislcg3
cd vislcg3/
./cmake.sh --prefix=/usr/local
make -j3
./test/runall.pl
sudo make install
The vislcg3 site has more info on Mac OS installation (including instructions for HomeBrew users), but note that we for our Apertium work assume your vislcg3 is instlled in /usr/local and not in /opt/local.
The vislcg3 site has installation instructions for Linux and Windows
The CG .rle files can be run as text files, or comiled. They will be
compiled with the make TARGET=$LANG
command d:
... | vislcg3 -g src/sme-dis.rle | ...
Vislcg3 can be run with this command:
... | vislcg3 -g src/sme-dis.rle | ...
The list of flags can be obtained by vislcg3 –help. That command prints something like this (use the newest version rather than this list):
-bash-3.00$ vislcg3 -h
VISL CG-3 Disambiguator version 0.9.2.3279
Usage: vislcg3 [OPTIONS]
Options:
-h or -? or --help Displays this list.
-V or --version Prints version number.
-g or --grammar Specifies the grammar file to use for disambiguation.
-p or --vislcg-compat Tells the grammar compiler to be compatible with older VISLCG syntax.
--grammar-out Writes the compiled grammar back out in textual form to a file.
--grammar-bin Writes the compiled grammar back out in binary form to a file.
--grammar-info Writes the compiled grammar back out in textual form to a file, with lots of statistics and information.
--grammar-only Compiles the grammar only.
--trace Prints debug output alongside with normal output.
--prefix Sets the prefix for mapping. Defaults to @.
--sections Number of sections to run. Defaults to running all sections.
--single-run Only runs each section once.
--no-mappings Disables running any MAP, ADD, or REPLACE rules.
--no-corrections Disables running any SUBSTITUTE or APPEND rules.
--no-before-sections Disables running rules from BEFORE-SECTIONS.
--no-sections Disables running rules from any SECTION.
--no-after-sections Disables running rules from AFTER-SECTIONS.
--num-windows Number of windows to keep in before/ahead buffers. Defaults to 2.
--always-span Forces all scanning tests to always span across window boundaries.
--soft-limit Number of cohorts after which the SOFT-DELIMITERS kick in. Defaults to 300.
--hard-limit Number of cohorts after which the window is delimited forcefully. Defaults to 500.
--no-magic-readings Prevents running rules on magic readings.
--dep-allow-loops Allows the creation of circular dependencies.
-O or --stdout A file to print output to instead of stdout.
-I or --stdin A file to read input from instead of stdin.
-E or --stderr A file to print errors to instead of stderr.
-C or --codepage-all The codepage to use for grammar, input, and output streams. Auto-detects default from environment.
--codepage-grammar Codepage to use for grammar. Overrides --codepage-all.
--codepage-input Codepage to use for input. Overrides --codepage-all.
--codepage-output Codepage to use for output and errors. Overrides --codepage-all.
-L or --locale-all The locale to use for grammar, input, and output streams. Defaults to en_US_POSIX.
--locale-grammar Locale to use for grammar. Overrides --locale-all.
--locale-input Locale to use for input. Overrides --locale-all.
--locale-output Locale to use for output and errors. Overrides --locale-all.