GiellaLT provides rule-based language technology aimed at minority and indigenous languages
Snow Leopard is the latest cat from Apple.
Faster, especially Mail. All sorts of nice touches and improvements, some software incompatibilities, but no big hurdles. At least some, and possibly many, of the command-line tools have been updated to a recent or the latest version, e.g. svn, bash, cvs and perl.
Automatic language detection is used to select the spell-checking language. The quality of this detection is unknown, as are the range of languages detected, exactly how it works, and whether it is possible to add new languages.
After some initial testing, it seems that big languages are detected (English and German in my case), but not smaller ones, like Norwegian or Sámi. Others have had success with mixing English, Spanish, French, Italian and Greek.
It is possible to specify the spelling language manually, but then only one language per document is allowed in most Cocoa applications (explicit multilingual markup must be added specifically for each application; it is not supported by the Cocoa framework).
The OS will recognise aff and dic files placed in Library/Spelling/. Based on reports on the net (“of the type used in OpenOffice”), these seem to be hunspell aff and dic files.
As soon as you place such files in that location, the language will be recognised in the Language and Text > Text system preference:
Picture: Adding North and Lule Saami spell checking in Snow Leopard.
Picture: North and Lule Saami spelling languages turned on.
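The file placement described above can be sketched as a small shell function. This is only an illustration under stated assumptions: `install_speller` is a hypothetical helper (not an Apple or hunspell tool), the default target assumes the per-user `$HOME/Library/Spelling` folder is the one meant, and the file names in the usage note are made up.

```sh
#!/bin/sh
# Sketch: copy a hunspell .aff/.dic pair into a spelling directory.
# install_speller is a hypothetical helper written for this example.
install_speller() {
    aff_file=$1
    dic_file=$2
    # Assumption: the per-user Library folder is the intended location.
    # Pass a third argument to use another directory instead.
    target_dir=${3:-"$HOME/Library/Spelling"}

    # Both halves of the dictionary must exist before anything is copied.
    for f in "$aff_file" "$dic_file"; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            return 1
        fi
    done

    # Create the directory if needed, then copy both files into it.
    mkdir -p "$target_dir" || return 1
    cp "$aff_file" "$dic_file" "$target_dir"/
}
```

With a pair of files named, say, `se.aff` and `se.dic` in the current directory, `install_speller se.aff se.dic` would copy them to the assumed location. The language would then still have to be turned on in the system preferences as described here.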
Turning on Saami spell checking in the system preferences makes the languages available in most Cocoa applications, as shown in the spelling dialog below:
Language list in spelling dialog
And it does actually work:
We finally have system-wide spell checking in Sámi, without having to resort to third-party tools!
It still seems to be a bit fragile. Finder restarted once while I was playing with this, and due to the size of our hunspell dic+aff files, spelling was sometimes slow. But most of the time it seemed to work, and I had no nasty crashes.
In previous versions of Mac OS X, you had to restart Cocoa applications if you changed the speller language. Now the new language takes effect immediately, which means it is actually possible to switch languages in e.g. iChat.
Earlier, when opening the speller dialog using Cmd-:, the dialog would be stuck, and you had to reach for the mouse to close it. Now you just press the keyboard shortcut again, and the dialog box disappears. (You need the dialog box to change speller language.)
Although quite a few more languages are available in the interface language preference list, there are still very few three-letter language codes there. This means that only North Saami is available, and there is no possibility to add other Saami languages, or any other missing language for that matter.
The few three-letter-coded languages found so far are (ISO 639 code in parentheses):
An interesting collection indeed. At least it shows that three-letter ISO 639 codes aren’t foreign to Mac OS X. But it also shows a certain language priority in Cupertino…
Although the system-wide spell checking service is vastly improved, it still has one major drawback: when you change the spelling language for one document, the same language is applied to all documents.
As long as you can use the multilingual spelling with automatic language detection, that is fine, but as soon as you have to resort to manually specifying the language of the document, this system breaks down. Then all my open documents in the same application will be spell-checked as the same language. This is at least how it works in SubEthaEdit.
The latest available version of ICU is tested and known to work fine on Snow Leopard; earlier versions are problematic:
ICU can be upgraded through MacPorts with

    sudo port upgrade ICU

or by using the GUI client Porticus. The problem disappeared after the new icu was installed.
It seems (?) that Xerox would also like to have the latest ICU. Under Snow Leopard, Xerox had a UTF-8 problem:
a automaton: a a b b ? á Segmentation fault
á automaton: á á b b č Segmentation fault