GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document.
As we continue to move to GitHub, we also need to update our documentation infrastructure. The basic ideas are as follows:
gawk -f $GIELLA_CORE/scripts/jspwiki2md.awk WhatIsThis.jspwiki \
> WhatIsThis.md
Or, as a complete loop over all files, with some cleanup:
find . -name "*.jspwiki" | while read -r i; do \
gawk -f ../../giella-core/scripts/jspwiki2md.awk "$i" \
| awk 'BEGIN{RS="";ORS="\n\n"}1' \
| perl -p -e 'chomp if eof' > "$i.tmp"; \
mv -f "$i.tmp" "$i"; \
done
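The two cleanup filters in the loop are easy to misread, so here is a small sketch of what the awk step does on its own. It assumes only a POSIX awk, and the function name `squeeze_blank_lines` is made up for this illustration. With `RS=""` awk reads the input paragraph by paragraph, treating any run of blank lines as a single separator, and `ORS="\n\n"` writes each paragraph back followed by exactly one blank line. The perl step in the loop then removes the last newline, so the file does not end in an empty line.

```shell
# Hypothetical helper: collapse every run of blank lines into one blank line.
# Reads stdin, writes stdout; the awk program is the one used in the loop.
squeeze_blank_lines() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" } 1'
}

# Example: the three blank lines between "a" and "b" come out as one.
printf 'a\n\n\n\nb\n' | squeeze_blank_lines
```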
The files are still named *.jspwiki. Now commit the changed files, then rename as follows:
find . -name "*.jspwiki" \
| while read -r i; do mv "$i" "${i%.jspwiki}.md"; done
Commit the renames.
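As a hedged alternative for the rename step, the sketch below lets `find` hand the file names straight to a small shell loop instead of piping them through `read`, which copes better with unusual file names. The helper name `rename_ext` and its arguments are assumptions for this example and not part of giella-core. Like the loop above it uses plain `mv`, so the renames still have to be staged and committed afterwards.

```shell
# Hypothetical helper: rename_ext OLD NEW [DIR]
# Renames every *.OLD file below DIR (default: current directory) to *.NEW.
rename_ext() {
    old=$1 new=$2 dir=${3:-.}
    find "$dir" -type f -name "*.$old" -exec sh -c '
        old=$1 new=$2; shift 2
        for f do
            # Strip the old extension and append the new one.
            mv -- "$f" "${f%."$old"}.$new"
        done
    ' sh "$old" "$new" {} +
}

# Usage:
#   rename_ext jspwiki md .
```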
By doing content change and rename in two steps with commits in between, there is a greater chance that document history will be preserved (document history is one of the biggest pain points in git).
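To see whether git actually recognised the renames once they are committed, something like the following can help. It is only a sketch: `list_renames` is a hypothetical helper that filters the output of `git diff --name-status -M`, in which renamed files appear as tab-separated lines starting with `R` and a similarity score. `git log --follow` can then be run on a single renamed file to view its history across the rename.

```shell
# Hypothetical helper: print "old -> new" for each rename found in
# `git diff --name-status -M` output read from stdin.
list_renames() {
    awk -F '\t' '$1 ~ /^R/ { print $2 " -> " $3 }'
}

# Usage, right after committing the renames:
#   git diff --name-status -M HEAD~1 HEAD | list_renames
#   git log --follow --oneline -- WhatIsThis.md
```

A file that shows up as a separate deletion and addition instead of a rename was not matched by git's similarity detection.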
Must be done in two steps:
To have a chance of retaining document history across renames, the content change and the file renaming must be done as two distinct operations, because git does not track renames explicitly. That is, convert the content and commit, then rename the files and commit again.
It is still hard for git to track file history across renames, but done this way there is at least some hope of retaining it.
The basic, single-file commands are:
saxonXSL -s:docu-smj-lex.xml \
-xsl:$GIELLA_CORE/devtools/forrest_xml2plain_html.xsl \
> test.html
pandoc -f html -t gfm test.html -o test.md
Information on pandoc is found at the bottom.
To process many files at a time, wrap the above commands in a for loop or similar:
## Convert xml files to html:
find . -name "*.xml" | while read -r i; do \
echo "$i"; saxonXSL -s:"$i" \
-xsl:"$GIELLA_CORE/devtools/forrest_xml2plain_html.xsl" \
-o:"$i.html"; \
mv -f "$i.html" "$i" ; \
done
## git add + commit using your favorite tool
#
## Rename xml files to html:
find . -name "*.xml" | while read i; do \
mv -f "$i" "${i%.*}.html"; \
done
## git add + commit using your favorite tool
#
## Convert html files to Markdown:
find . -name "*.ht*" | while read i; do \
pandoc -f html -t gfm "$i" -o "$i.tmp"; \
mv -f "$i.tmp" "$i"; \
done
## git add + commit using your favorite tool
#
## Rename .html files to .md:
find . -name "*.html" | while read i; do \
mv -f "$i" "${i%.*}.md"; \
done
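The content-changing steps all follow the same pattern: write the converted text to a temporary file, then move it over the original. A slightly more defensive variant of that pattern is sketched below, assuming the converter can read stdin and write stdout (pandoc can). The helper name `filter_in_place` is hypothetical. It replaces the original only when the converter exits successfully, so a failed conversion cannot overwrite a file with empty output.

```shell
# Hypothetical helper: filter_in_place FILE COMMAND [ARGS...]
# Runs COMMAND with FILE on stdin and replaces FILE with the output,
# but only if COMMAND succeeded.
filter_in_place() {
    file=$1; shift
    tmp="$file.tmp"
    if "$@" < "$file" > "$tmp"; then
        mv -f "$tmp" "$file"
    else
        rm -f "$tmp"
        echo "conversion failed, left unchanged: $file" >&2
        return 1
    fi
}

# Usage (content change only, the file keeps its name):
#   filter_in_place test.html pandoc -f html -t gfm
```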
When all documents are converted, check and update the links. Links within the documentation should point directly to the Markdown files (link to test.md, not to test.html), while external links should be complete URLs.
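Part of the link update can be automated. The sketch below is one possible approach, not an existing giella-core script: the hypothetical filter `html_links_to_md` rewrites the target of inline Markdown links of the form `[text](target.html)` to `target.md`, keeps any `#anchor`, and leaves targets containing a colon (such as `https://` URLs) untouched. It assumes a sed that supports `-E`, and it does not handle reference-style links or raw HTML, so a manual check is still needed.

```shell
# Hypothetical filter: turn internal .html link targets into .md targets.
# Targets with a colon (e.g. https://...) are treated as external and skipped.
html_links_to_md() {
    sed -E 's/\]\(([^):]*)\.html(#[^)]*)?\)/](\1.md\2)/g'
}

# Usage on a single file:
#   html_links_to_md < test.md > test.md.tmp && mv -f test.md.tmp test.md
```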
Beware of html files that should NOT be converted, e.g. speller test result pages. Such pages are rendered as is, from the information in the html source, including CSS and JS. If the automatic processing above has turned such pages into Markdown, the change must be reversed before committing.
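One way to spot candidates for exclusion before running the conversion is to list the html files that carry their own scripts or styling. The heuristic below is an assumption made for this example (the function name `list_styled_html` and the search pattern are not from the GiellaLT tooling), and its output is a list to review by hand, not a definitive answer.

```shell
# Hypothetical helper: list_styled_html [DIR]
# Prints html files that contain <script>, <style> or a stylesheet link,
# which suggests they are meant to be served as is.
list_styled_html() {
    find "${1:-.}" -type f -name '*.html' \
        -exec grep -l -i -E '<(script|style)|rel="?stylesheet' {} +
}

# Usage:
#   list_styled_html .
# A page that was converted by mistake and not yet committed can be
# restored with:
#   git checkout -- path/to/page.html
```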
Install pandoc using MacPorts or Homebrew, or download the installer package:
sudo port install pandoc
brew install pandoc
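Before starting a batch conversion it can be useful to confirm that the tools used on this page are on the PATH. The following is a minimal sketch using the standard `command -v` lookup; the function name `require_tools` is hypothetical, and the tool list in the usage line is simply the set of commands that appear above.

```shell
# Hypothetical helper: require_tools NAME...
# Reports each missing command on stderr and returns non-zero if any is missing.
require_tools() {
    missing=0
    for tool do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   require_tools gawk perl pandoc saxonXSL || exit 1
```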
More information is available on the pandoc home page.