Find unlexicalised compounds in the dictionaries
We have made a Python script that goes through all compound nouns in the
dictionary and checks whether they are found as lexicalised compounds in the
analyser. The script is find-unlexicalized-compounds.py, and it can be found
in giella-core/scripts. The script should be in your PATH. To use the script:
Go to the dict-xxx-yyy directory (here: dict-smn-fin) and collect the
unlexicalised smn compounds as follows:
find-unlexicalized-compounds.py -i src/N_smnfin.xml -l smn -o missing.txt
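The check the script performs can be pictured roughly as follows. This is a minimal sketch, not the code of find-unlexicalized-compounds.py. It assumes that dictionary entries look like `<e><lg><l pos="N">lemma</l></lg></e>`, and that the analyser marks a dynamic (unlexicalised) compound boundary with `#`. To keep the example self-contained, it takes the analyser output as a toy dict instead of calling the real analyser. The function names `dictionary_nouns` and `unlexicalised` and the `lemmaA`/`lemmaB` data are hypothetical.

```python
import xml.etree.ElementTree as ET


def dictionary_nouns(xml_text):
    """Return noun lemmas from a dictionary XML string.

    Assumption: lemmas sit in <l pos="N"> elements, as in other
    GiellaLT dictionary sources.
    """
    root = ET.fromstring(xml_text)
    return [l.text.strip() for l in root.iter("l")
            if l.get("pos") == "N" and l.text and l.text.strip()]


def unlexicalised(lemmas, analyses, boundary="#"):
    """Return lemmas the analyser only knows as dynamic compounds.

    `analyses` maps a word to its list of analyser readings (a stand-in
    for real analyser output). Assumption: `boundary` marks a dynamic
    compound boundary in a reading.
    """
    missing = []
    for lemma in lemmas:
        readings = analyses.get(lemma, [])
        # Unknown words are skipped; a word counts as unlexicalised only
        # if it is analysed and every reading contains a compound boundary.
        if readings and all(boundary in r for r in readings):
            missing.append(lemma)
    return missing


if __name__ == "__main__":
    sample = """<r>
      <e><lg><l pos="N">lemmaA</l></lg></e>
      <e><lg><l pos="N">lemmaB</l></lg></e>
      <e><lg><l pos="V">lemmaC</l></lg></e>
    </r>"""
    toy_analyses = {
        "lemmaA": ["partA+N+Cmp#partB+N+Sg+Nom"],  # dynamic compound only
        "lemmaB": ["lemmaB+N+Sg+Nom"],             # lexicalised
    }
    print(unlexicalised(dictionary_nouns(sample), toy_analyses))  # ['lemmaA']
```

The key design point is the "every reading" test. A word that also has a lexicalised reading is already in the lexicon, so it should not be reported.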
The nouns in missing.txt can then be turned into candidate entries for
nouns.lexc with another script, gtmissing, as follows:
cat missing.txt | gtmissing -l smn > missing.lexc
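A candidate lexc entry is simply a lemma followed by a continuation class and a semicolon. The sketch below shows that step under stated assumptions; it is not how gtmissing itself works. It assumes one word per line in missing.txt. It uses a hypothetical placeholder continuation class, `N_CANDIDATE`, where gtmissing may assign a real one. It escapes a few lexc special characters with `%`. The function names are also hypothetical.

```python
# Characters escaped with "%" so they are read literally in a lexc entry
# (space, comment start "!", entry end ";", pair separator ":", epsilon "0", ...).
LEXC_SPECIAL = '% !;:0<>"'


def escape_lexc(word):
    """Prefix lexc special characters with the escape character %."""
    return "".join("%" + ch if ch in LEXC_SPECIAL else ch for ch in word)


def lexc_candidates(words, cont_class="N_CANDIDATE"):
    """Turn a word list into lexc entries, dropping blanks and duplicates.

    `cont_class` is a hypothetical placeholder; each candidate still needs
    the correct continuation class before it goes into nouns.lexc.
    """
    seen = set()
    entries = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            entries.append(f"{escape_lexc(word)} {cont_class} ;")
    return entries


if __name__ == "__main__":
    toy_words = ["wordA", "multi word", "wordA", ""]
    print("\n".join(lexc_candidates(toy_words)))
    # wordA N_CANDIDATE ;
    # multi% word N_CANDIDATE ;
```

Since the output contains only candidates, missing.lexc is meant to be reviewed by hand before any entries are copied into nouns.lexc.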