This file gives an nverview of some still ad hoc solutions for disambiguation.
Prerequisites:
- vislcg3 installed A text corpus.
How to analyse
Plains Cree differs from the other languuages in not having an adjusted version of the preprocessor yet. While waiting , we do some ad hoc solutions. Here is a pipeline that gives an analysis.
cat misc/CORR_Dog_Biscuits.txt |preprocess|lookup src/analyser-gt-desc.xfst | lookup2cg | vislcg3 -g src/syntax/disambiguation.cg3
The string lookup src/analyser-gt-desc.xfst might be express as the alias ucrk.
Dog Biscuits does not use the ‘ symbol as a letter, so we may use preprocess. With the Mary Wells text (and possibly other texts) using this letter, we use an ad hoc set of commands instead of preprocess, as below. The command below is there to build a
cat misc/7C_Mary_Wells.txt |sed 's/\([.,:;‘’"]\)/ \1 /g;'](tr '[ )' '\n'|\
grep -v '~$'|grep -v '^$'|lookup src/analyser-gt-desc.xfst | lookup2cg | vislcg3 -g src/syntax/disambiguation.cg3
Missing list
In order to make good analyses, we need the words of the text in the analyser, i.e. we need to build a missing list, and add its word to the analyser. Here is a command for making a missing list (for the two texts, respectively).
cat misc/CORR_Dog_Biscuits.txt |preprocess|lookup src/analyser-gt-desc.xfst |grep '?'|sort|uniq -c|sort -nr|less
cat misc/7C_Mary_Wells.txt |sed 's/\([.,:;‘’"]\)/ \1 /g;'](tr '[ )' '\n'|\
grep -v '~$'|grep -v '^$'|lookup src/analyser-gt-desc.xfst |grep '?'|sort|uniq -c|sort -nr|less
cat misc/PCT.txt|ucrk|grep '?'|sort|uniq -c|sort -nr|less
Strategies for disambiguation
Look at common ambiguity patterns in some texts.
- Grammar ambiguity
- Word ambiguity
To create similar statics, use the sum-cg.pl script (write sum-cg.pl –help
in order to get just that. The input to the script should be the analysed text before
disambiguation:
cat misc/CORR_Dog_Biscuits.txt |preprocess|lookup src/analyser-gt-desc.xfst | lookup2cg > xxdogbiscuits.multi
sum-cg.pl --grammar xxdogbiscuits.multi | less
You may of course also take the disambiguated text as input, and use the sum-cg as a script to find where to go next.
vislcg3 rules
Operators:
- DELIMITERS :
- This will work as an on-the-fly sentence (disambiguation window) delimiter.
- LIST KIN = “daughter” “mother” “father” “aunt” “uncle” ;
- LIST WORD = N ADJ V ADV NUM ;
- LIST NP-MOD = ADJ ADV ;
- LIST @Kinship = @Kinship ;
- SET NOT-NP-MOD = WORD - NP-MOD ;
- SELECT INFM IF (1 INF) ;
- Singles out a reading from the cohort that matches the target, and if all contextual tests are satisfied it will delete all other readings except the matched one.
- REMOVE INFM IF (NOT 1 INF) ;
- Singles out a reading from the cohort that matches the target, and if all contextual tests are satisfied it will delete the mached reading.
- SELECT N IF (*-1 ART BARRIER NOT-NP-MOD) ;
- SELECT N IF (*-1 ART CBARRIER V) ;
- REMOVE INF (NEGATE -1 N LINK -1 INF LINK -1 INFM ) ;
- MAP @Kinship TARGET KIN + N ;
- ADD
- Appends tag to matching readings. Will not block for adding further tags
- MAP @3SERR TARGET (-3S)(-1 (S NOM) OR (3S NOM)) ;
- Appends tags to matching readings, and blocks other MAP, ADD
- SUBSTITUTE (N S NOM) (N S Nom) N ;
- Replaces the tags in the first list with the tags in the second list.
Careful mode:
- 1C NOM
- if the only reading left is NOM
Apply function tags
Make a set for the function tag, and make one or more rules:
- LIST @SUBJ = @SUBJ ;
- MAP @SUBJ TARGET NOM IF (1 VFIN) ;
- LIST @SUBJ> = @SUBJ> ;
- MAP @SUBJ> TARGET NOM IF (1 VFIN) ;
- with arrow towards the finite verb
Aliases
Put these in your .profile or .bashrc folder
alias crkdep="sent-proc.sh -l crk -s dep"
alias crkdept="sent-proc.sh -l crk -s dep -t"
alias crkdis="sent-proc.sh -l crk -s dis"
alias crkdist="sent-proc.sh -l crk -s dis -t"