GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.

Gut Usage Examples

How to use gut for various operations on many git repositories at once.

NB! Many of the commands require at least admin access to the affected repos. Basic git operations like clone, pull, commit and push only require read and (for push) write access.

Tip 1: git supports way more commands and operations than gut. To apply a non-supported git command to a set of repos, write a simple shell script for the git command you need, and run it using gut apply -r reporegex --script path/to/script.sh. A number of example scripts can be found in giella-core/devtools/gut-scripts/.

Tip 2: when initialising gut, specify your default organisation, so that you don’t have to write it out for each command.
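As an illustration of Tip 1, here is a minimal sketch of a script that could be passed to gut apply. It is not one of the scripts in giella-core/devtools/gut-scripts/; the function name show_branch is made up for this example, and it assumes that gut runs the script with the repository root as the working directory.

```shell
#!/bin/sh
# Hypothetical script for use with "gut apply --script".
# Assumption: the script is run with the repo root as working directory.

# Print the repo directory name and its current branch, tab-separated.
show_branch() {
    repo=$(basename "$PWD")
    # symbolic-ref also works in a repo that has no commits yet;
    # fall back to a marker if there is no branch (or no repo) here.
    branch=$(git symbolic-ref --short HEAD 2>/dev/null || echo "(no branch)")
    printf '%s\t%s\n' "$repo" "$branch"
}

show_branch
```

Saved as, say, path/to/show-branch.sh, it would be run over a set of repos in the way Tip 1 describes: gut apply -r reporegex --script path/to/show-branch.sh.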

Reponame regexes

The core of gut is to run git commands over a set of repos with reponames matching a regex. gut supports “extended” regexes, so one can easily match complex patterns if needed. In daily use very simple regexes are usually enough. ^ and $ are bound to the beginning and end of the reponame.

Some examples:

If a regex (i.e. -r) is not specified, the command will match all local repos for commands acting locally, and all repos in a GitHub organisation for commands that ask GitHub for matches.

Regexes are case insensitive.
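To get a feeling for which names a regex selects before running a real command, one can try it on a list of sample names. The sketch below assumes that grep -E -i (extended, case-insensitive regexes) behaves closely enough to gut's own matching for simple patterns; the helper preview_match and the sample names are only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: show which sample repo names a regex would select.
# Assumption: grep -E -i approximates gut's regex matching for simple patterns.
preview_match() {
    printf '%s\n' keyboard-sme lang-sme lang-smj lang-fin giella-core |
        grep -E -i -- "$1"
}

preview_match '^lang'           # names starting with "lang"
preview_match '^[kl].*-s[jm]'   # names starting with k or l, containing -sj or -sm
preview_match 'CORE$'           # case-insensitive, anchored at the end of the name
```

The first call selects the three lang- names, the second selects keyboard-sme, lang-sme and lang-smj, and the third selects giella-core.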

Task 1: Initialise gut

To set up gut for the first time, with giellalt as your default organisation (so you don’t have to specify it for every gut operation), do as follows (remember to have your GitHub Personal access token available):

gut init --root /path/to/your/gut/root/dir --token PERSONALACCESSTOKEN \
    --organisation giellalt --use-https

Using https is easiest to set up but less secure.

To use the git/ssh protocol instead, you need to set up an ssh key for GitHub. Follow these instructions.

Non-admin tasks

Task 2: Clone many repos

The very basic task of getting started:

gut clone -o giellalt -r ^lang

This will clone all repos in the giellalt org matching the regular expression ^lang in the repo name. Use option -u to clone using the https protocol instead of ssh/git:

gut clone -u -o giellalt -r ^lang

Task 3: Pull many repos

To pull all repos you have cloned, do this:

gut pull -o giellalt

And if you have defined giellalt as your default GitHub organisation, this can be shortened to:

gut pull

Task 4: See status of many repos

To see the status of all Sámi languages, both keyboard and language model repos, do as follows:

gut status -o giellalt -r '^[kl].*-s[jm]'

The result could be like this:

+--------------------------------------------------------+
| Repo                 branch     ±origin  U  D  M  C  A |
+========================================================+
| keyboard-sjd         main             0  0  0  0  0  0 |
| keyboard-sje         main             0  0  0  0  0  0 |
| keyboard-sju         main             0  0  0  0  0  0 |
| keyboard-sma         main             0  0  0  0  0  0 |
| keyboard-sme         main             0  0  0  0  0  0 |
| keyboard-smj         main             0  0  0  0  0  0 |
| keyboard-smn         main             0  0  0  0  0  0 |
| keyboard-sms         main             0  0  0  0  0  0 |
| lang-sjd             main             0  0  0  0  0  0 |
| lang-sje             main             0  0  0  0  0  0 |
| lang-sjt             main             0  0  0  0  0  0 |
| lang-sju-x-sydlapsk  main             0  0  0  0  0  0 |
| lang-sma             main             0  0  0  0  0  0 |
| lang-sme             main            -9  0  0  0  0  0 |
| lang-smj             main            -1  0  0  0  0  0 |
| lang-smn             main             0  0  0  0  0  0 |
| lang-sms             main             0  0  0  0  0  0 |
| ================                                       |
| Repo Count           Dirty   fetch/push  U  D  M  C  A |
| 17                   0                2  0  0  0  0  0 |
+--------------------------------------------------------+

The table should be read as follows:

Task 5: Commit in many repos

gut commit -o giellalt -r ^lang- -m "Your commit message"

It is ok for the regex to match repos with no changes; gut will just skip them with a message that nothing was changed.

Multiline commit message

gut does accept multiline commit messages. You write them on the command line, starting with the opening quote, entering each line as you go. The important thing is to NOT type the closing quote until the whole message is finished.

You can use this to add a note to skip CI, i.e. for commits that are non-substantial: there is no reason to kick off many tens of parallel builds if the changes are minimal. You do this by having the string [skip ci] on a line by itself:

gut commit -r ^lang- -m "Commit message

[skip ci]
"

NB! The string [skip ci] must be followed by a line break, as in the example, where the closing quote is on a line of its own. Otherwise it won’t trigger the CI skip.
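When the same message is reused, it can be easier to keep it in a shell variable. The following is a minimal sketch; the variable name msg and the message text are only illustrative, and the gut call is left as a comment so that the snippet can be tried without committing anything.

```shell
#!/bin/sh
# The closing quote sits on its own line, so the message ends with a
# line break directly after [skip ci].
msg='Update documentation links

[skip ci]
'

# To commit for real, pass the variable to gut (same flags as in the example):
#   gut commit -r ^lang- -m "$msg"

# Print the message to check it before committing.
printf '%s' "$msg"
```

Note that building the message with command substitution, such as msg=$(printf ...), would strip the trailing line break, which is why a quoted literal is used here.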

Task 6: Push all local changes

gut push -o giellalt

It is ok for the command to match repos with no new commits; they will be skipped in the push.

Admin tasks

Task 7: Add a new language

Description moved to a separate page.

Task 8: Update repos from template

Description moved to a separate page.

Task 9: Manage topics, info

Set topics

gut topic set -o giellalt -r "lang-" -t finite-state-transducers constraint-grammar minority-language nlp proofing-tools language-resources

Add more topics

Add one more topic to a subset of the languages:

gut topic add -o giellalt -r "lang-(s|cr)" -t indigenous-languages

Specify website

gut set info -o giellalt -r "(lang-|giella-)" -w https://giellalt.uit.no

Task 10: Make repo(s) public/private

gut make -o giellalt -r "(lang-|giella-)" private

Task 11: Set description dynamically

Use a script to generate the content, including dynamic parts that vary with the repo name, and use the script as follows:

gut set info -o giellalt -r '^lang-' --des-script giella-core/devtools/gut-scripts/reponame2description.sh

NB! Make sure there is no trailing newline at the end of the output of the script, or it will fail. That is, use printf, not echo.
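A description script along these lines might look as follows. This is a sketch only, not the actual reponame2description.sh: the function name describe_repo and the description text are invented, and it assumes that the repo name is passed to the script as its first argument.

```shell
#!/bin/sh
# Hypothetical description script (not the real reponame2description.sh).
# Assumption: the repository name arrives as the first argument.
describe_repo() {
    repo=$1
    code=${repo#lang-}   # strip the "lang-" prefix, e.g. lang-sme -> sme
    # The printf format has no \n, so nothing is emitted after the text.
    printf 'Language resources for %s' "$code"
}

# Falls back to a sample name when called without arguments.
describe_repo "${1:-lang-sme}"
```

Using echo instead of printf here would append a newline to the output, which is the case the warning above is about.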

Task 12: Create team with users

gut create team -o giellalt -t "Kainun kieli" \
-d "Team for working with the kveen language." -m Trondtr snomos

Task 13: Add users to existing team

gut add users -o giellalt -t giellaltstaff -u ilm024 leneantonsen

Task 14: Add webhook

gut hook create -m json -o giellalt -r 'lang-' \
-s giella-core/devtools/gut-scripts/reponame2webhook.sh \
-e "*"

Based on experience, it is not advisable to send off all events, at least not if the recipient is IRC, Zulip or a similar community tool. The following is a more restricted version that should provide a reasonable balance between staying up-to-date and not being spammed:

gut hook create -m json -o giellalt -r 'lang-smj' \
-u 'https://giella.zulipchat.com/api/v1/external/github?api_key=SECRETKEY&stream=smj' \
-e branch_protection_configuration -e branch_protection_rule \
-e check_run -e code_scanning_alert -e commit_comment -e create \
-e delete -e dependabot_alert -e deploy_key -e discussion \
-e discussion_comment -e fork -e gollum -e issue_comment \
-e issues -e label -e member -e membership -e merge_group \
-e milestone -e organization -e package -e ping -e project \
-e project_card -e project_column -e public -e pull_request \
-e pull_request_review -e pull_request_review_comment \
-e pull_request_review_thread -e push -e release -e repository \
-e repository_advisory -e repository_dispatch -e repository_import \
-e repository_vulnerability_alert -e secret_scanning_alert \
-e secret_scanning_alert_location -e security_advisory \
-e security_and_analysis -e star -e team -e team_add -e watch

This command is most powerful when used together with a script, to set a webhook with dynamic properties (e.g. based on reponame) for a large number of repos at once:

gut hook create -m json -o giellalt -r 'lang-' \
--script giella-core/devtools/gut-scripts/reponame2webhook.sh \
-e branch_protection_configuration -e branch_protection_rule \
-e check_run -e code_scanning_alert -e commit_comment -e create \
-e delete -e dependabot_alert -e deploy_key -e discussion \
-e discussion_comment -e fork -e gollum -e issue_comment \
-e issues -e label -e member -e membership -e merge_group \
-e milestone -e organization -e package -e ping -e project \
-e project_card -e project_column -e public -e pull_request \
-e pull_request_review -e pull_request_review_comment \
-e pull_request_review_thread -e push -e release -e repository \
-e repository_advisory -e repository_dispatch -e repository_import \
-e repository_vulnerability_alert -e secret_scanning_alert \
-e secret_scanning_alert_location -e security_advisory \
-e security_and_analysis -e star -e team -e team_add -e watch

More information about the various webhook events can be found in the GitHub Documentation.
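The long run of -e flags in the commands above can also be generated from a list, which makes it easier to add or remove events. The sketch below uses a shortened, illustrative event list and a made-up helper build_event_args; it prints the resulting command for review instead of running it.

```shell
#!/bin/sh
# Shortened, illustrative event list; extend it with the events you need.
events="issues issue_comment pull_request push release"

# Turn the list into repeated " -e EVENT" arguments.
build_event_args() {
    for event in $events; do
        printf ' -e %s' "$event"
    done
}

# SECRETKEY is the same placeholder as in the example above.
url='https://giella.zulipchat.com/api/v1/external/github?api_key=SECRETKEY&stream=smj'

# Print the command for review rather than running it.
echo "gut hook create -m json -o giellalt -r 'lang-smj' -u '$url'$(build_event_args)"
```

Once the printed command looks right, it can be copied to the command line and run as usual.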

Task 15: Add external repo

Many FST descriptions of languages exist elsewhere; one major source is Apertium. Most of these projects, however, do not build spelling checkers or many other tools from their morphological descriptions. Since we have the infrastructure and the tools in place to make all languages work, it can be useful to take those repos, compile their FSTs within our infrastructure, and from there build spellers, tokenisers and many other tools.

We use git subtree for adding external repos. To do that, add a new language as follows:

  1. create a new language repo as shown above
  2. add the external source using git subtree as follows:
git subtree add --prefix src/fst/morphology/ext-Apertium-nno \
https://github.com/apertium/apertium-nno.git master --squash
  3. modify src/fst/morphology/Makefile.am as needed to make everything build

When you later want to update the code from the external repository, do as follows:

git subtree pull --prefix src/fst/morphology/ext-Apertium-nno \
https://github.com/apertium/apertium-nno.git master --squash

Task 16: Set team access permission

NB! Requires owner permission by the user doing this!

gut set permission -o giellalt -p push -t GiellaLTstaff 

Result:

NB! Repos not previously assigned to the team will be added silently!