GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document.

View GiellaLT on GitHub

vislcg3

Vislcg3 is the newest parser generation from Odense. As its predecessor, vislcg, it is open source. Vislcg3 is licensed under GPL.

Starting on March 5th 2008, we have migrated to vislcg3. Rule files for vislcg are still available in older revisions. For vislcg3 documentation we recommend the online documentation at beta.visl.sdu.dk/cg3.html. Here, we only document installation and basic usage.

Installation

Mac OS X

          sudo port install cmake
          sudo port install boost
          sudo port install icu
          cd /tmp
          svn co http://visl.sdu.dk/svn/visl/tools/vislcg3/trunk vislcg3
          cd vislcg3/
          ./cmake.sh --prefix=/usr/local
          make -j3
          ./test/runall.pl
          sudo make install

The vislcg3 site has more info on Mac OS installation (including instructions for HomeBrew users), but note that we for our Apertium work assume your vislcg3 is instlled in /usr/local and not in /opt/local.

Other operating systems

The vislcg3 site has installation instructions for Linux and Windows

Compilation and usage of CG files

The CG .rle files can be run as text files, or comiled. They will be compiled with the make TARGET=$LANG command d:

... | vislcg3 -g src/sme-dis.rle | ...

Vislcg3 can be run with this command:

... | vislcg3 -g src/sme-dis.rle | ...

Flags

The list of flags can be obtained by vislcg3 –help. That command prints something like this (use the newest version rather than this list):

-bash-3.00$ vislcg3 -h
VISL CG-3 Disambiguator version 0.9.2.3279
Usage: vislcg3 [OPTIONS]

Options:
-h or -? or --help       Displays this list.
-V or --version          Prints version number.
-g or --grammar          Specifies the grammar file to use for disambiguation.
-p or --vislcg-compat    Tells the grammar compiler to be compatible with older VISLCG syntax.
--grammar-out            Writes the compiled grammar back out in textual form to a file.
--grammar-bin            Writes the compiled grammar back out in binary form to a file.
--grammar-info           Writes the compiled grammar back out in textual form to a file, with lots of statistics and information.
--grammar-only           Compiles the grammar only.
--trace                  Prints debug output alongside with normal output.
--prefix                 Sets the prefix for mapping. Defaults to @.
--sections               Number of sections to run. Defaults to running all sections.
--single-run             Only runs each section once.
--no-mappings            Disables running any MAP, ADD, or REPLACE rules.
--no-corrections         Disables running any SUBSTITUTE or APPEND rules.
--no-before-sections     Disables running rules from BEFORE-SECTIONS.
--no-sections            Disables running rules from any SECTION.
--no-after-sections      Disables running rules from AFTER-SECTIONS.

--num-windows            Number of windows to keep in before/ahead buffers. Defaults to 2.
--always-span            Forces all scanning tests to always span across window boundaries.
--soft-limit             Number of cohorts after which the SOFT-DELIMITERS kick in. Defaults to 300.
--hard-limit             Number of cohorts after which the window is delimited forcefully. Defaults to 500.
--no-magic-readings      Prevents running rules on magic readings.
--dep-allow-loops        Allows the creation of circular dependencies.

-O or --stdout           A file to print output to instead of stdout.
-I or --stdin            A file to read input from instead of stdin.
-E or --stderr           A file to print errors to instead of stderr.

-C or --codepage-all     The codepage to use for grammar, input, and output streams. Auto-detects default from environment.
--codepage-grammar       Codepage to use for grammar. Overrides --codepage-all.
--codepage-input         Codepage to use for input. Overrides --codepage-all.
--codepage-output        Codepage to use for output and errors. Overrides --codepage-all.

-L or --locale-all       The locale to use for grammar, input, and output streams. Defaults to en_US_POSIX.
--locale-grammar         Locale to use for grammar. Overrides --locale-all.
--locale-input           Locale to use for input. Overrides --locale-all.
--locale-output          Locale to use for output and errors. Overrides --locale-all.