To get tags in a natural language instead of the rather cryptic tags we
normally use (such as +N), you need to do the following:
The names of the regexes and the names of the localised fst's are related:
tagsets/foo.regex corresponds to
Typically you would want to use the language code of the natural language the
tags are written in.
Since the natural language of the tags varies with the language of the
analyser, the regex is language specific. For a starting point, have a look at
langs/sme/, or copy its regex as a template:
cp langs/sme/src/tagsets/nob.regex langs/YOURLANG/src/tagsets/LANGCODE.regex
Replace YOURLANG and LANGCODE with the relevant language codes.
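The copy step can also be wrapped in a small shell function. This is only a sketch: copy_tagset is a hypothetical helper name, not part of the build infrastructure, and it assumes the src/tagsets/ layout shown in the cp command above. It creates the target directory if needed and refuses to overwrite an existing regex.

```shell
# Hypothetical helper (not part of the infrastructure): copy a tagset regex
# from an existing language to use as a template for a new one.
#   $1: source regex file, e.g. langs/sme/src/tagsets/nob.regex
#   $2: target language directory, e.g. langs/YOURLANG
#   $3: language code of the tag language, e.g. LANGCODE
copy_tagset() {
    src=$1
    lang_dir=$2
    code=$3
    dest="$lang_dir/src/tagsets/$code.regex"

    # Do not clobber a regex that someone has already started editing.
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi

    # Create the tagsets directory if it is missing, then copy the template.
    mkdir -p "$lang_dir/src/tagsets" && cp "$src" "$dest"
}
```

It would be called as copy_tagset langs/sme/src/tagsets/nob.regex langs/YOURLANG LANGCODE, with the same placeholders as above.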
Then start to edit the file to get what you want. For North Sami (
have tags in Norwegian Bokmål (
For the build system to build the fst's that change the tags, you need to tell
it that there is a source file and some targets to be built. You specify this in
Open this file, and list the source file(s) in the variable
Then list the
xfst targets in the appropriate sections for the
In the North Sami case, these files are named:
You also need to tell the build system that you have a new set of analysers and
generators you want to build. This is done in the file
$GTLANG/src/Makefile.am. For North Sami, we want to build e.g. the file
analyser-nob-desc.hfst. You tell the build system this by adding that name
to the variable GT_ANALYSERS_HFST. The same applies to the other files. Have a
look at North Sami to see how it is done for the rest of the analysers and
generators you may want.
When these three steps are done, you can type make, and you should soon be
greeted with a new set of analysers and generators!
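To confirm that the build produced what you asked for, a small check like the following may help. It is a sketch only: missing_targets is a hypothetical helper name, and it assumes that the built files end up in the directory you pass in (for an in-tree build, presumably $GTLANG/src).

```shell
# Hypothetical helper: print every expected file that is missing.
#   $1: directory to look in
#   remaining arguments: file names that should exist there after the build
missing_targets() {
    dir=$1
    shift
    for f in "$@"; do
        # Print only the names that were not found.
        [ -e "$dir/$f" ] || echo "$f"
    done
}
```

After make has finished, missing_targets "$GTLANG/src" analyser-nob-desc.hfst prints nothing if the analyser was built, and prints the file name otherwise.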