- Date: 10.05.2005
- Time: 10.00 Norw. time
- Place: Wherever we are :-)
- Tools: Phone, iChat, SubEthaEdit
- Opening, agenda review
- Reviewing the task list from a week ago
- Documentation - divvun.no
- Corpus gathering
- Corpus infrastructure
- Term db
- Other issues
- SEE License number
- Sjur going to Joensuu
- Next week + gathering
- Summary, task lists
1. Opening, agenda review, participants
Opened at 10.06. Agenda accepted as is.
Present: Maaren, Sjur, Thomas, Tomi, Trond, Børre
Main secretary: Tomi
2. Reviewing the task list from the last meeting
- Thomas, Maaren, Børre: Translate divvun front page
- Børre has made template pages and Thomas has done his part of the work
- Trond, Tomi, Børre, Sjur: Continue with corpus infrastructure
- Not much has happened since our breakthrough.
- Scripts for word > xml (docbook), xml > gtxml (db to our xml)
- Template for manual additions and script for feeding xml to pp are still
- Will find out who can help us with corpus agreement contracts
in Oslo when Ruth Vadtvedt Fjeld is away.
- The Bokmål part of Leksikografisk institutt at hf in Oslo hired two persons
to collect texts, cf. a presentation at
- Forskingsassistent Cecilie Hauglund samlar inn tekstar, Programmerar
Elisabeth Lien programmerer og handsamar dei vidare.
- Work with Børre on the giellatekno
- Call Kimmo about license text & e-mail from Børre
- Tried last week, no contact, will try again this week, and have a meeting on
Friday if possible
- Also discuss kwicsnt & UTF-8 problem
- More termdb work
- Editing part is completed
- Contact Anne Britt about divvun.no and the Språkstyremøte.
- She suggested that the Sámi Parliament Council sent out a press release
rather than Språkstyret
- Ask Kåre Tjikkom, Harriet Aira, Karin Tuolja, Susanna Kuoljok-Angeus and
Samuel Gælok about Lule Sámi text.
- Contact Leif Åge adding Trond to divvun e-mail alias.
- Prepare the war file for Tomcat deployment next week
- Done. Thor Øivind Johansen will contact Sjur for details about tomcat
- Work with Trond on the giellatekno pages
- Agreed on what to include, work continues this week
- Finish the work on the Termdb
- Done. Modified the skin, converted South Sámi grammar to xml (from Word
- Contact Skolelinux about their webcrawler, also Knut Hofland
- Decide the directory structure for corpus originals
- Proposal: Keep the structure as we get it from the donor
- Script for antiword and UTF-8 fix -processing
- Done. Still some UTF-8 fixes probably coming..
- Template for xsl manual conversion script
- Script for processing the corpus for preprocessor
- Not done yet, should be straightforward kind of task.
- Thomas: continue with verbs and ask Teemu Leskinen for finnish parallel
- Reached verb nr 7000, still 5000 verbs left
- Teemu Leskinen: has sent files, with Finnish parallel names in place. Finnish
parallel names not coupled with the Sámi names
- Worked on proper nouns, corrected the classification of 1747 names…
- Have a look at the closed classes, especially the indefinite pronouns.
- Work more on lexical issues.
3. Documentation - divvun.no
Thor Øivind Johansen will contact Sjur about Forrest integration on the
More content as we progress.
4. Corpus gathering
No new responses, but the relevant persons at HF/UiO identified, and will be
contacted. See above.
5. Corpus infrastructure
Major parts in place. Still missing: gtxml to preprocess, and template for
manual additions and script for processing corpus for preprocessor are still
- Could we use transitivity as a tag for disambiguation? Yes.
- What about removing adverbs that are straightforwardly generated from adjectives,
like +A+Adv? Perhaps by doing it
7. Term db
Progressing towards internal/Sámediggi opening.
8. Other issues
8.1 SEE License number
Received and entered.
8.2 Sjur going to Joensuu
Closing meeting for the Nordic Programme for Language technology. Sjur will talk
about the SALETEK network. Will be away Wednesday and Thursday, perhaps Friday next
week (May 18.-20.).
8.3 Next week + gathering
Next week: No meeting due to holidays and Sjur going to Joensuu
- Programme: what to do
- Programming content:
- Change to 10.4. Open issues within bash (upgrade to 3.00), readline, etc.
- Forrest and i18n
- Make sure video and voice conferencing is working through the Sámediggi firewall
- Linguistic content:
- Julevsáme: Go through the transducer, plan future work (Thomas, Trond, …)
- Hard issues: Oslo>oslolas1, Kárás1johlas1, case agreement within numerals
- Evaluate first half year
- Discuss issues/improvements
- Practical things:
- Where to stay? Villmarkssenteret
- Who shall order rooms? Maren can do it
- Sjur: Tuesday - Saturday
- Trond: Tuesday - Friday
- Tomi: Tuesday - Saturday
- Børre: Tuesday - Saturday
- Schedule. Wednesday, thursday, friday, full days (Trond to leave
somewhat earlier on Friday) and Maren about 11.00.
- Reserve the meeting room
- Maaren has already done it a long time ago! Excellent!
We will put up the programme and other details on a separate page (we = Børre:-)
8.4 cvs-commit mailing to the team members
CVS sends out an e-mail to all members of a list each time something is commited
(to a certain module like gt)
All commits are e-mailed to all team members in both projects. Use e-mail
filtering for uncluttering the inbox.
Børre is setting it up (talk to Roy Dragseth), cvs watch command.
CVS documentation at:
9. Summary, task list
- Contact Univ. of Oslo on the contract issue.
- Finish the divvun alias
- Contact people about Lule Sámí texts
- Contact skolelinux and Knut Hofland about webcrawler.
- Here is Knut’s thoughts on the issue in 1999:
- Finish the setup of divvun.no
- Look at how forrest handles i18n, try to get work done on that.
- Set up programme and other details about the gathering in Guovdageaidnu
25th-27th of May
- Contact Roy Dragseth about cvs-commit mailing
- Finish the setup of [http://giellatekno.uit.no/]
- Template for manual xsl conversion script
- Script for processing the corpus for preprocessor
- Install “Tiger” (10.4), document all issues, together with steps to resolve
- Look at the twolc part of how to handle shortening of the middle part in
three part compounds. Discuss the data / linguistic side with the linguists.
- Look at decapitalisation of proper nouns when compounded or derived to a
- Meet with Kimmo:
- Discuss text licensing, and get a copy of the Helsinki licensing model/text
- Bring up the kwic-snt issue with Kimmo, make it utf-8 compatible (now,
each byte counts as one, instead of each character counting as one (what is
needed is a counting mechanism that counts only bytes starting in 01 or 110,
and ignoring bytes starting in 10.)
- Still some work on the Termdb
- Write presentation for the Joensuu meeting
- Decide upon the public opening of divvun.no with Anne Britt
- Discuss with Thor Øivind Johansen about Forrest/divvun/Tomcat.
Tel: +47 7764 6741
- Ask Leif Åge to check the opening/forwarding of ports number needed for
iChat voice and video conferencing
- continue with proper nouns, work on the Bugzilla bug list
- reserve rooms at Villmarkssenteret
- Translate divvun.no front page to Finnish
- continue with verb valency
- Reply to Teemu Leskinen, thanking for the material but asking that we get a
new version, now with the Finnish names coupled with the Sámi names, if at
- Work with the bug list, the lexicon, disambiguation. Prepare Lule Sámi etc.
for the meeting.
- Have a look at the bug list in Bugzilla (mainly Trond’s bugs, but give
me a hand, will you?)
- Prepare the Guovdageaidnu meeting
- Order tickets to the Guovdageaidnu meeting
10. Next meeting, closing
Closed at 12.00