GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.

Hyphenating texts using the API server

For the time being, only North Sámi can be hyphenated as shown below. For languages other than North Sámi, see this document. We hope to add support for more languages soon.

Basic command

curl -s -X POST -H 'Content-Type: application/json' \
     -i 'https://api-giellalt.uit.no/hyphenation/hyphenator-gt-desc' \
     --data '{"text": "mun hálan davvisámegiela"}' |\
     grep '{' |\
     jq '.'

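Comments: -s silences curl's progress output, and -i includes the HTTP response headers in the output. grep '{' then drops those header lines again by keeping only lines that contain a '{', and jq '.' pretty-prints the remaining JSON.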

Output:

{
  "text": "mun hálan davvisámegiela",
  "results": [
    {
      "word": "mun",
      "hyphenations": [
        {
          "value": "mun",
          "weight": 0.0
        },
        {
          "value": "mun",
          "weight": 5000.0
        }
      ]
    },
    {
      "word": "hálan",
      "hyphenations": [
        {
          "value": "há^lan",
          "weight": 0.0
        },
        {
          "value": "há^lan",
          "weight": 5000.0
        }
      ]
    },
    {
      "word": "davvisámegiela",
      "hyphenations": [
        {
          "value": "dav^vi#sá^me#gie^la",
          "weight": 0.0
        },
        {
          "value": "dav^vi^sá^me^gie^la",
          "weight": 5000.0
        }
      ]
    }
  ]
}

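This is the raw output from the API server. Comments on the output: each input word is returned with a list of candidate hyphenations, each consisting of a value and a weight. Judging from the examples on this page, ^ marks a hyphenation point inside a word, # marks a boundary between the parts of a compound, and the candidate with the lowest weight is listed first.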
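To inspect every candidate in such a response at once, the nested structure can be flattened into a table. The following is a minimal sketch, not part of the API: `hyphenation_table` is a hypothetical helper name, and it assumes the response has the `results`, `word`, `hyphenations`, `value` and `weight` fields shown above.

```shell
# Hypothetical helper: read a hyphenation response (JSON) on standard input
# and print one tab-separated row per candidate: word, hyphenation, weight.
hyphenation_table() {
    jq -r '.results[]
           | .word as $w            # remember the word while walking its candidates
           | .hyphenations[]
           | [$w, .value, .weight]
           | @tsv'
}
```

Replacing the final `jq '.'` of the basic command with `hyphenation_table` should then give one row per candidate hyphenation.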

Filtered examples

curl -s -X POST -H 'Content-Type: application/json' \
    -i 'https://api-giellalt.uit.no/hyphenation/hyphenator-gt-desc' \
    --data '{"text": "mun hálan davvisámegiela"}' |\
    grep '{' |\
    jq '.results[].hyphenations | map(select(.value)) | first'

Comment: the jq filter keeps only the first hyphenation listed for each word.

Output:

{
  "value": "mun",
  "weight": 0.0
}
{
  "value": "há^lan",
  "weight": 0.0
}
{
  "value": "dav^vi#sá^me#gie^la",
  "weight": 0.0
}
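The filter above relies on the preferred hyphenation being listed first. If that ordering is not something you want to depend on, the candidate can be picked by weight instead. This is a sketch under the assumption that a lower weight means a better hyphenation, which the examples on this page suggest but do not state; `best_hyphenations` is a hypothetical name.

```shell
# Hypothetical helper: read a hyphenation response (JSON) on standard input
# and print, for each word, the value of the candidate with the lowest weight.
# Assumption: lower weight = preferred hyphenation.
best_hyphenations() {
    jq -r '.results[].hyphenations | min_by(.weight) | .value'
}
```

It can replace the last `jq` step of the commands above, for example `... | grep '{' | best_hyphenations`.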

The same example, but now with a misspelling; notice the change in weight for the last word:

curl -s -X POST -H 'Content-Type: application/json' \
    -i 'https://api-giellalt.uit.no/hyphenation/hyphenator-gt-desc' \
    --data '{"text": "mun hálan davvisámegiellla"}' |\
    grep '{' |\
    jq '.results[].hyphenations | map(select(.value)) | first'

Output:

{
  "value": "mun",
  "weight": 0.0
}
{
  "value": "há^lan",
  "weight": 0.0
}
{
  "value": "dav^vi^sá^me^giell^la",
  "weight": 5000.0
}
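The weight difference seen here can be used to list the words that did not get a zero-weight hyphenation. The sketch below assumes, based only on the two examples above, that a best weight above 0 signals a word such as the misspelling; `fallback_words` is a hypothetical helper name.

```shell
# Hypothetical helper: read a hyphenation response (JSON) on standard input
# and print every word whose lowest-weight candidate has a weight above 0.
# Assumption: such words (like the misspelling above) were not recognised.
fallback_words() {
    jq -r '.results[]
           | select((.hyphenations | min_by(.weight) | .weight) > 0)
           | .word'
}
```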

If you only want the hyphenated input text, without the surrounding JSON, use the following jq filter:

curl -s -X POST -H 'Content-Type: application/json' \
    -i 'https://api-giellalt.uit.no/hyphenation/hyphenator-gt-desc' \
    --data '{"text": "mun hálan davvisámegiela"}' |\
    grep '{' |\
    jq '.results[].hyphenations | map(select(.value).value) | first'

Output:

"mun"
"há^lan"
"dav^vi#sá^me#gie^la"

Add -r/--raw-output to jq if you want to get rid of the quotes:

curl -s -X POST -H 'Content-Type: application/json' \
    -i 'https://api-giellalt.uit.no/hyphenation/hyphenator-gt-desc' \
    --data '{"text": "mun hálan davvisámegiela"}' |\
    grep '{' |\
    jq -r '.results[].hyphenations | map(select(.value).value) | first'

Output:

mun
há^lan
dav^vi#sá^me#gie^la
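The raw output can be post-processed further with standard tools. As an illustration, the sketch below turns both markers into visible hyphens and joins the words back into one line. It assumes that ^ and # both mark positions where a hyphen may be shown, which is an interpretation of the examples rather than something stated here; `show_hyphens` is a hypothetical name.

```shell
# Hypothetical helper: read one hyphenated word per line on standard input,
# replace the ^ and # markers with "-" and join the words with single spaces.
show_hyphens() {
    sed 's/[#^]/-/g' | paste -sd ' ' -
}
```

Appended to the command above (`... | jq -r '...' | show_hyphens`), this should print `mun há-lan dav-vi-sá-me-gie-la` for the example sentence.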

If you have a text file that you would like to have hyphenated, do as follows:

cat textfile.txt |\
    (printf '{"text": "' && cat && printf '"}') |\
    curl -s -X POST -H 'Content-Type: application/json' \
    -i 'https://api-giellalt.uit.no/hyphenation/hyphenator-gt-desc' \
    --data @- |\
    grep '{' |\
    jq '.results[].hyphenations | map(select(.value).value) | first'
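The printf wrapping in the command above inserts the file content into the JSON verbatim, so a file containing double quotes or backslashes would produce an invalid request. One possible alternative is to let jq build the request body, since it escapes such characters itself. This is a sketch, not the documented method; `text_to_payload` is a hypothetical helper name.

```shell
# Hypothetical helper: wrap arbitrary text from standard input in {"text": ...}.
text_to_payload() {
    # -R reads the input as raw text instead of JSON,
    # -s slurps all of it into a single string,
    # -c prints the resulting object on one line.
    jq -Rsc '{text: .}'
}

# Possible usage, replacing the cat/printf steps of the command above:
#   text_to_payload < textfile.txt |
#       curl -s -X POST -H 'Content-Type: application/json' \
#       -i 'https://api-giellalt.uit.no/hyphenation/hyphenator-gt-desc' \
#       --data @- | ...
```

Line breaks in the file end up as escaped `\n` sequences inside the JSON string; how the server treats them is not covered here.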

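Comments: the two printf calls in the pipeline wrap the file content in {"text": "..."}, and --data @- makes curl read the request body from standard input. The content is inserted verbatim, so the file must not contain double quotes or backslashes. Note also that --data strips newlines from data read this way, so the lines of a multi-line file are joined without a separating space.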

Output (assuming the textfile.txt file has the same content as the example sentence used above):

"mun"
"há^lan"
"dav^vi#sá^me#gie^la"