Spellers in the GiellaLT infrastructure

Presentation at the University of Alberta, Edmonton, June 15 2015

By: Sjur Moshagen, UiT The Arctic University of Norway

Presentation Overview

Background

The perfect speller

This tool will never exist, but it is the holy grail we work towards.

One reason it will never exist is the problem of precisely answering the following question:

What is a spelling error?

How to build a speller

Building an fst-based speller in the Giella framework proceeds as follows:

The acceptor

raw-fst
  |
  |  <- filters
  |
speller-fst (normative, without punctuation)
  |
  |  <- compounding and derivation filters, adding weights
  |
fstspeller-fst
  |
  |  <- remove the upper (analysis) side
  |
acceptor
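The pipeline above can be mimicked with a toy model in plain Python. This is only a sketch of the data flow, not the infrastructure's build code: a real build operates on compiled transducers, whereas here an fst is simply a list of (analysis, surface, weight) entries. All entries, the tag strings used for filtering, the function names and the compound penalty are assumptions made for illustration. The second step is also simplified to weighting only, without the compounding and derivation filters.

```python
# Toy stand-in for the raw fst: (analysis side, surface side, weight).
# Every entry and tag below is invented for illustration.
RAW_FST = [
    ("house+N+Sg", "house", 0.0),
    ("house+N+Pl", "houses", 0.0),
    ("house+N+Sg+Err/Orth", "hause", 0.0),        # non-normative form
    (".+PUNCT", ".", 0.0),                         # punctuation
    ("house+N+Cmp#boat+N+Sg", "houseboat", 0.0),   # compound
]


def apply_filters(fst):
    """raw-fst -> speller-fst: keep only normative, non-punctuation entries."""
    return [
        entry for entry in fst
        if "+Err/" not in entry[0] and "+PUNCT" not in entry[0]
    ]


def add_weights(fst, compound_penalty=10.0):
    """speller-fst -> fstspeller-fst: penalise each compound boundary."""
    return [
        (analysis, surface, weight + compound_penalty * analysis.count("+Cmp"))
        for analysis, surface, weight in fst
    ]


def project_surface(fst):
    """fstspeller-fst -> acceptor: drop the analysis side, keep the best weight."""
    acceptor = {}
    for _, surface, weight in fst:
        acceptor[surface] = min(weight, acceptor.get(surface, weight))
    return acceptor


speller_fst = apply_filters(RAW_FST)
fstspeller_fst = add_weights(speller_fst)
ACCEPTOR = project_surface(fstspeller_fst)
```

In this sketch the acceptor knows only surface forms and their weights, so a compound such as `houseboat` is accepted but will rank below simple words when suggestions are ordered by weight.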

The error model

The error model is still somewhat in flux, so the following may not hold exactly as described in the future.

The error model is presently built from several individual parts:

Each part is compiled into an fst, and unioned into one error model file.
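To illustrate what unioning the parts means, here is a minimal sketch in plain Python. It assumes each compiled part can be pictured as a mapping from a (wrong, right) string pair to a weight, and that when two parts contain the same pair the lower weight wins, which is how union behaves for weights interpreted as costs. The two parts and their contents are invented for illustration and are not the actual parts used by the infrastructure.

```python
# Hypothetical error model parts: (wrong, right) -> weight (lower is better).
PART_A = {("a", "á"): 5.0, ("oo", "ou"): 4.0}
PART_B = {("oo", "ou"): 1.0, ("o", "r"): 3.0}


def union(*parts):
    """Merge all parts into one model, keeping the lowest weight per pair."""
    merged = {}
    for part in parts:
        for pair, weight in part.items():
            if pair not in merged or weight < merged[pair]:
                merged[pair] = weight
    return merged


ERROR_MODEL = union(PART_A, PART_B)
```

Keeping the parts separate until this last step means each one can be edited or regenerated on its own.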

Speller Integration

Components

Each component can add restrictions or specific behavior for the speller, and regular maintenance is necessary as individual components are updated or changed. The integration with the host OS or application may also change.

Overall speller components

What Do We Control?

Who controls which component

The Lexicon

Lexicon Sources

Restrictions On The Grammar

We have a similar system for derivations, based on position in a derivation sequence.

Suggestions - The Interface Of The Speller

Getting good and relevant suggestions is an important aspect of the speller. Even if the coverage and recall/precision numbers are good, users will not care about them if they get strange suggestions.

On the other hand, strange suggestions also indicate that the speller is unable to catch all errors.
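How the acceptor and the error model together determine the suggestions can be sketched as follows. This is a simplified illustration in plain Python, not the algorithm of the actual speller engine: it applies a single error model rewrite at one position, keeps only the results the acceptor knows, and ranks them by the sum of the rewrite weight and the word weight. The words, weights and the function name `suggest` are assumptions made for this example.

```python
# Hypothetical acceptor (word -> weight) and error model ((wrong, right) -> weight).
ACCEPTOR = {"house": 0.0, "horse": 0.0, "houseboat": 10.0}
ERROR_MODEL = {("oo", "ou"): 1.0, ("o", "r"): 3.0}


def suggest(word, acceptor, error_model, limit=5):
    """Return up to `limit` (suggestion, total weight) pairs, best first."""
    best = {}
    for (wrong, right), cost in error_model.items():
        start = 0
        while True:
            pos = word.find(wrong, start)
            if pos == -1:
                break
            # Rewrite this one occurrence of `wrong` into `right`.
            candidate = word[:pos] + right + word[pos + len(wrong):]
            if candidate in acceptor:
                total = cost + acceptor[candidate]
                if candidate not in best or total < best[candidate]:
                    best[candidate] = total
            start = pos + 1
    ranked = sorted(best.items(), key=lambda item: (item[1], item[0]))
    return ranked[:limit]


suggestions = suggest("hoose", ACCEPTOR, ERROR_MODEL)
```

In this toy setup a suggestion looks strange whenever an unlikely rewrite is cheap or an unlikely word has a low weight, which is why both sets of weights matter for what the user sees.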

Designing An Error Model

The infrastructure is built to automatise as much as possible, but here are some aspects to keep in mind: