GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document.
For the time being, only North Sámi can be hyphenated as shown below. For languages other than North Sámi, see this document. We hope to add support for more languages soon.
curl -s -X POST -H 'Content-Type: application/json' \
-i 'https://api-giellalt.uit.no/hyphenation/hyphenator-gt-desc' \
--data '{"text": "mun hálan davvisámegiela"}' |\
grep '{' |\
jq '.'
Comments:
- curl to access the REST API, with the -s parameter to silence it.
- --data contains the actual text to be hyphenated. It can be long, but should preferably be restricted to single paragraphs for execution time reasons.
- grep is just to get rid of the curl metadata from the processing (the response headers that -i includes in the output).
- jq . to pretty print the output.
Output:
{
"text": "mun hálan davvisámegiela",
"results": [
{
"word": "mun",
"hyphenations": [
{
"value": "mun",
"weight": 0.0
},
{
"value": "mun",
"weight": 5000.0
}
]
},
{
"word": "hálan",
"hyphenations": [
{
"value": "há^lan",
"weight": 0.0
},
{
"value": "há^lan",
"weight": 5000.0
}
]
},
{
"word": "davvisámegiela",
"hyphenations": [
{
"value": "dav^vi#sá^me#gie^la",
"weight": 0.0
},
{
"value": "dav^vi^sá^me^gie^la",
"weight": 5000.0
}
]
}
]
}
This is the raw output from the API server. Comments on the output:
- #: primary hyphenation point (usually a word boundary)
- ^: secondary hyphenation point
- weight: the lower the better, with 0.0 being the best
- Recognised words get two hyphenations: a preferred one (weight 0.0), and one from the pattern-based fallback (weight 5000.0 or higher). For unrecognised misspellings or unknown words, only the pattern-based fallback is provided.
curl -s -X POST -H 'Content-Type: application/json' \
-i 'https://api-giellalt.uit.no/hyphenation/hyphenator-gt-desc' \
--data '{"text": "mun hálan davvisámegiela"}' |\
grep '{' |\
jq '.results[].hyphenations | map(select(.value)) | first'
Comment:
- jq filtering to only retain the most likely hyphenation pattern, with weights
Output:
{
"value": "mun",
"weight": 0.0
}
{
"value": "há^lan",
"weight": 0.0
}
{
"value": "dav^vi#sá^me#gie^la",
"weight": 0.0
}
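The first filter used above relies on the server listing the best hyphenation first. A minimal sketch that selects by weight explicitly instead, assuming the response structure shown above. The function name best_hyphenations is hypothetical (not part of the API), and the sample response is adapted from the output shown earlier, with the order of the two alternatives swapped for the second word to illustrate the point:

```shell
#!/bin/sh
# Hypothetical helper: print the lowest-weight hyphenation for each word.
# Reads an API response (JSON) on stdin; assumes the structure shown above.
best_hyphenations() {
  jq -r '.results[] | .hyphenations | min_by(.weight) | .value'
}

# Offline sample adapted from the output above (alternatives swapped for
# the second word, so that picking the first entry would give the fallback).
sample='{"text": "mun davvisámegiela", "results": [
  {"word": "mun", "hyphenations": [
    {"value": "mun", "weight": 0.0},
    {"value": "mun", "weight": 5000.0}]},
  {"word": "davvisámegiela", "hyphenations": [
    {"value": "dav^vi^sá^me^gie^la", "weight": 5000.0},
    {"value": "dav^vi#sá^me#gie^la", "weight": 0.0}]}]}'

printf '%s\n' "$sample" | best_hyphenations
# prints:
# mun
# dav^vi#sá^me#gie^la
```

To use it on live data, pipe the output of the curl and grep steps shown above into best_hyphenations in place of the jq step.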
The same example, but now with a misspelling; notice the change in weight for the last word:
curl -s -X POST -H 'Content-Type: application/json' \
-i 'https://api-giellalt.uit.no/hyphenation/hyphenator-gt-desc' \
--data '{"text": "mun hálan davvisámegiellla"}' |\
grep '{' |\
jq '.results[].hyphenations | map(select(.value)) | first'
Output:
{
"value": "mun",
"weight": 0.0
}
{
"value": "há^lan",
"weight": 0.0
}
{
"value": "dav^vi^sá^me^giell^la",
"weight": 5000.0
}
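Since unrecognised words only receive the pattern-based fallback, the weights can also be used to list the words the hyphenator did not recognise. A sketch under that assumption: the 5000 threshold is taken from the description of the weights above, the function name unrecognised_words is hypothetical, and the sample response is constructed from the outputs shown above rather than captured from the server:

```shell
#!/bin/sh
# Hypothetical helper: print every word whose best (lowest) weight is at
# least 5000, i.e. words for which only the fallback hyphenation exists.
unrecognised_words() {
  jq -r '.results[]
         | select((.hyphenations | map(.weight) | min) >= 5000)
         | .word'
}

# Offline sample, constructed from the outputs shown above.
sample='{"results": [
  {"word": "mun", "hyphenations": [
    {"value": "mun", "weight": 0.0},
    {"value": "mun", "weight": 5000.0}]},
  {"word": "davvisámegiellla", "hyphenations": [
    {"value": "dav^vi^sá^me^giell^la", "weight": 5000.0}]}]}'

printf '%s\n' "$sample" | unrecognised_words
# prints:
# davvisámegiellla
```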
If you only want the hyphenated input text, and not the JSON stuff, use the following jq filtering:
curl -s -X POST -H 'Content-Type: application/json' \
-i 'https://api-giellalt.uit.no/hyphenation/hyphenator-gt-desc' \
--data '{"text": "mun hálan davvisámegiela"}' |\
grep '{' |\
jq '.results[].hyphenations | map(select(.value).value) | first'
Output:
"mun"
"há^lan"
"dav^vi#sá^me#gie^la"
Add -r/--raw-output to jq if you want to get rid of the quotes:
curl -s -X POST -H 'Content-Type: application/json' \
-i 'https://api-giellalt.uit.no/hyphenation/hyphenator-gt-desc' \
--data '{"text": "mun hálan davvisámegiela"}' |\
grep '{' |\
jq -r '.results[].hyphenations | map(select(.value).value) | first'
Output:
mun
há^lan
dav^vi#sá^me#gie^la
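If the hyphenated words are to be used in running text, the # and ^ markers can be mapped to other characters. A minimal sketch, assuming both marker types should be treated alike. The function name replace_markers and the choice of replacement characters are ours, not something the API prescribes:

```shell
#!/bin/sh
# Hypothetical helper: replace both marker types (# and ^) with the string
# given as first argument (default "-"). The replacement is pasted into a
# sed expression, so it must not contain "/", "&" or "\".
replace_markers() {
  sed "s/[#^]/${1:--}/g"
}

# Visible hyphens at every hyphenation point.
printf 'dav^vi#sá^me#gie^la\n' | replace_markers
# prints: dav-vi-sá-me-gie-la

# U+00AD SOFT HYPHEN (UTF-8 bytes C2 AD), invisible unless a line breaks there.
shy=$(printf '\302\255')
printf 'dav^vi#sá^me#gie^la\n' | replace_markers "$shy"
```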
If you have a text file that you would like to have hyphenated, do as follows:
cat textfile.txt |\
(printf '{"text": "' && cat && printf '"}') |\
curl -s -X POST -H 'Content-Type: application/json' \
-i 'https://api-giellalt.uit.no/hyphenation/hyphenator-gt-desc' \
--data @- |\
grep '{' |\
jq '.results[].hyphenations | map(select(.value).value) | first'
Comments:
- The printf stuff after the initial cat is there to wrap the file content in a simple JSON structure, as that is what is expected on the other end.
- Add -r/--raw-output to jq if you want to get rid of the quotes (cf. above).
Output (assuming the textfile.txt file has the same content as the example sentence used above):
"mun"
"há^lan"
"dav^vi#sá^me#gie^la"
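The printf wrapping does not escape the file content, so a double quote or backslash in the file would produce invalid JSON. A hedged alternative is to let jq build the request body. The -R (raw input), -s (slurp) and -c (compact output) options are standard jq options; the function name json_wrap is hypothetical, and stripping the final newline is our choice, not a documented requirement of the API:

```shell
#!/bin/sh
# Hypothetical helper: read plain text on stdin and emit {"text": "..."} with
# proper JSON escaping. One trailing newline, if present, is stripped.
json_wrap() {
  jq -Rsc '{text: rtrimstr("\n")}'
}

printf 'mun "hálan" davvisámegiela\n' | json_wrap
# prints: {"text":"mun \"hálan\" davvisámegiela"}
```

In the pipeline above, json_wrap would take the place of the printf line, with the rest of the command unchanged.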