GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started, and our Privacy document.

View GiellaLT on GitHub

SVN reorganisation

Cause

We’re not able to restrict access to the svn repository the way we want. We wanted to preserve the present folder structure, but restrict access to certain folders. That has turned out to be very hard.

Solution properties

Any solution will have to isolate the folders to protect in one way or another. That is, we should move the content we want to protect to a protected area outside of the general area.

There are two types of content we need to protect:

Two options as I see it:

Option 1

$SVNROOT/gt/
         xtdoc/
         private/ <== protected
         tts/somefolder <== unprotected
            /someotherfolder <== protected
         ...

svn co https://gtsvn.uit.no/repos gives us all the open ones. To check out private parts we have to do svn co https://gtsvn.uit.no/repos/tts/someotherfolder

Option 2

$SVNROOTPRIVATE/plans
                polderland/
$SVNROOTPUBLIC/gt/
               tts/

svn co https://gtsvn.uit.no/repos/public gives us the free ones.

svn co https://gtsvn.uit.no/repos/private gives us:

plans
polderland
...
$SVNROOT/public/gt/
                xtdoc/
                st/
                ...
         /private/ <== protected

trond/gtsvn/public/ –> the world as we know it without the private /private/ –> the private, taken out of the public

an alisas for checking out both for the privileged class.

The former makes it possible to treat the whole thing as one unit … with or without the forbidden files

The second opinion is, in a way safer. Less risk of malfunction (“oops, you got the whole lot”)

$SVNROOT/scripts/ (here are the tools for the kal-gang and for the rest of us)
         langs/ara -> {dic, mt, fst, tts, ...}
               /bul
               /kal
               /kom
               /... --> lg
         ped/sma/
             sme/
             xxx/
         biggies
         ...
         private/

Pseudocode for a possible setup:

svn co .../repos/scripts/
cd scripts
./setup --folders=gt,tts,private, ...

Files relevant to all (script etc) aside, for all big files aside

two svn ones svn with private a first-generaton mother and one sister ones svn with private a first-generaton mother and many sisters ones svn with private a first-generaton mother and two sisters: the default, and the big

first choole lg, then do all apps (more natural for linguists) first choose app, the do all lgs (better struct for porting?)

(as 54, but without the public folder)

Subinterests as a grid: acc to lg acc to app/

procon analysis pro a: uniform treatment for insiders and outsiders alike con a: vulnerable and difficult, conceptually complex pro b: clean, safe con b: two systems pro one (hassle for the insiders, that is)

$GTPUBLICHOME $GTPRIVATEHOME

Use these to reference the path to each local copy.

1,4G    techdoc
402M    prooftools
354M    words
344M    tts
340M    gt
196M    st
156K    CVSROOT
134M    termdb
102M    mt
 89M    plan
 84M    xtdoc
 65M    ped
 22M    kt
 20M    kvensk
1,7M    sfst
1,3M    tca2


gt+st+kt = 560M + kvensk = 580M


techdoc/
932K    ling
924K    raw
856K    infra
264K    retteprog-plan.pdf
228K    system
136K    architecture.jpg
 48K    dicts
 44K    site-proof-frag.xml
 44K    mt
 32K    site-frag.xml
 24K    ped
 20M    admin
 14M    lang
 12K    index.xml
8,0K    antiword.man
4,0K    tabs-frag.xml
4,0K    howtos.xml
4,0K    header_draft.txt
4,0K    docu-cvs-sys.html
4,0K    corpus_tags_explained.txt
4,0K    corpus_tags.txt
2,7M    tools
1,4G    proof
1,3M    presentations


gt:
280 608k  sme
 23 260k  script
 21 780k  smj
  7 692k  sma
  3 448k  tmp
  2 788k  common
  1 776k  src
  1 448k  smn
    872k  sms
    748k  oarjjelsamien.txt
    552k  smi
    424k  sjd
    388k  moses
    272k  sje
    188k  cwb
    180k  mk-files
    128k  dtd
    108k  Makefile
     52k  Dan-le-danna-infonuorra.correct.doc
     24k  www
     24k  20061130_NSR_bypolitisk_plan_samisk.pdf.xml
     20k  LISENS.txt
     16k  Dan-le-danna-infonuorra.correct.doc.xml
     12k  Markansluska.correct.doc.xml
      4k  userdict
      4k  openkjeldekodekunngjering.txt
      4k  downloadwinlex.sh
      4k  copymaclex.sh


sme:
113 468k  bin
 87 484k  corp
 45 196k  hunspell
 15 040k  src
  9 572k  polderland
  7 708k  int
    628k  tag-stat-temp.txt
    576k  aspell
    468k  art
    316k  testing
     88k  res
     28k  dev

Conclusion: we want the following three separate modules:

Two questions left:

Concerns:

tts ped proof fst mt

lang/ara /bul /fao ped tts fst <– …

ped: <========= kept as ped/ unless stated otherwise 17 040k art <========== to art/ 13 816k images 11 296k userdoc 9 188k oahpa 6 320k sme <========== to lang/sme/ped 5 720k doc 1 280k nob 1 212k src 608k smj 172k sma 60k fin 52k adm 48k dtd

Ped-specific: documentation images

sme:

Migration plan

TODO:

inform all users about the upcoming change

make a copy of the svn repo, work on another machine than gtsvn

follow the instructions in the svn book, section

Filtering Repository History

when everything works ok, then:

inform all users about the upcoming change - again!

inform all users to commit local edits

redo the split with all the latest commits

test that it works

inform users about how to check out and set up their infra

Sitemap