GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started, and our Privacy document.
The possibility to use an
AppleScript to
drive the spell checker in MS Word is now documented as part of the
regular automated testing documentation. The
applescript itself is found in
gt/script/spellcheckWithMSWord.applescript
. Below follows a short
description of how to run the applescript stand-alone, its input
parameters and data, and the output data.
The script takes the following three input parameters:
[-.: ]
are removed (these chars “breaks” the input
strings into separate “words” in MS Word - it might be possible to
work around, but it isn’t worth the effort right now.It is necessary to run the script from a local terminal session, for the script to be able to open MS Word in the graphical UI of MacOS X.
The actual command to run the script should be something like:
osascript script/spellcheckWithMSWord.applescript sme InFile OutFile
In the standard distribution in the Nordic countries, proofing tools for more languages than listed above is installed (and there are no tools for Icelandic). It should be easy to expand the list to cover all tools installed, or even all languages recognised by MS Word. This is left as an exercise for the future.
The output file is a TAB-delimited text file of the following format:
InputWord<TAB>SpellerStatus<TAB>Suggestions
The suggestions are themselves TAB-separated.
The SpellerStatus field can have two values:
Here follows a simple pseudocode of the AppleScript. See the source file for the gory details.
Since the applescript does not rely on any specific spelling engine, but is just querying MS Word for the spelling status of individual words, it is possible to test any language for which there is an installed speller.
Among other things, this makes it possible to compare the spellers of different languages, both from the same vendor and from different vendors, given that the test documents share sufficiently similar spelling error characteristics.
Although the applescript option provides the benefit of being speller engine independent, it has its limitations.
Scripting the speller in MS Word is slow, even more so as MS Word 2004 is not running natively on the latest and greatest Intel Macs.
Because of Word’s built-in string segmentation in its AppleScript implementation, some possible input words are not feasible test data. This is very unfortunate, since it means that the scripted speller behaves differently from the interactive speller for some types of input. This is at least true for the following chars:
- (hyphen)
: (colon)
. (full stop)
Technically it means that Word’s definition of a word in AppleScript is different from what it is when used interactively. Basically, the above characters are considered non-word chars, and will split strings containing them into separate words on each side of the chars. In practice, this means that we can’t meaningfully test words of the type:
CD:s (in/on the CD)
TV-stasjåvnnå (TV station)
cand.phil.
Since including such words in the test data will just destroy the test results, they are removed before the test is run. The test result is because of this not completely covering all cases, but removing such words is better than destroying the test results.
Because we are using AppleScript, it only runs on the Mac. But the underlying AppleScript implementation in Word is based on VisualBasic (VB), and is very close to the VB scripting found in Word for Windows. It should thus be relatively straightforward for someone knowing VB to port this script to run on Windows using VB.