Chapter 6. Command Line Reference

Table of Contents

Order of argument sources
vislcg3
cg-conv
cg-comp
cg-proc
cg-strictify
cg3-autobin.pl

This chapter lists the available binaries and their usage information.

Order of argument sources

If command line arguments come from multiple sources, they are applied in the following order, with later sources overriding earlier ones: CMDARGS, the environment variable CG3_DEFAULT, arguments passed on the command line, CMDARGS-OVERRIDE, and the environment variable CG3_OVERRIDE.
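
The precedence can be pictured as a last-writer-wins merge. The following Python sketch is only a model of the ordering described above, not how vislcg3 parses its arguments; the function name, the dictionary representation, and the example values are assumptions made for illustration.

```python
# Illustrative model only: option sources are applied in the documented
# order, and a later source overrides an earlier one for the same option.
SOURCE_ORDER = [
    "CMDARGS",
    "CG3_DEFAULT",
    "command line",
    "CMDARGS-OVERRIDE",
    "CG3_OVERRIDE",
]


def resolve_options(sources):
    """Merge per-source option dicts in SOURCE_ORDER; the last writer wins."""
    merged = {}
    for name in SOURCE_ORDER:
        merged.update(sources.get(name, {}))
    return merged


# Hypothetical values: the command line beats CG3_DEFAULT,
# and CG3_OVERRIDE beats everything before it.
example = resolve_options({
    "CG3_DEFAULT": {"num-windows": "4", "max-runs": "3"},
    "command line": {"num-windows": "8"},
    "CG3_OVERRIDE": {"max-runs": "1"},
})
print(example)
```

In this model, a value set in CG3_OVERRIDE cannot be changed from the command line, while a value set in CG3_DEFAULT can.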

vislcg3

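vislcg3 is the primary binary. It can apply a grammar's rules to input, compile grammars to textual or binary form, and print diagnostic information about a grammar, among other things.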

Usage: vislcg3 [OPTIONS]

Environment variables:
 CG3_DEFAULT: Sets default cmdline options, which the actual passed options will override.
 CG3_OVERRIDE: Sets forced cmdline options, which will override any passed option.

Options:
 -h, --help                 shows this help
 -?, --?                    shows this help
 -V, --version              prints copyright and version information
     --min-binary-revision  prints the minimum usable binary grammar revision
 -g, --grammar              specifies the grammar file to use for disambiguation
     --grammar-out          writes the compiled grammar in textual form to a file
     --grammar-bin          writes the compiled grammar in binary form to a file
     --grammar-only         only compiles the grammar; implies --verbose
     --ordered              (will in future allow full ordered matching)
 -u, --unsafe               allows the removal of all readings in a cohort, even the last one
 -s, --sections             number or ranges of sections to run; defaults to all sections
     --rules                number or ranges of rules to run; defaults to all rules
     --rule                 a name or number of a single rule to run
     --nrules               a regex for which rule names to parse/run; defaults to all rules
     --nrules-v             a regex for which rule names not to parse/run
 -d, --debug                enables debug output (very noisy)
 -v, --verbose              increases verbosity
     --quiet                squelches warnings (same as -v 0)
 -2, --vislcg-compat        enables compatibility mode for older CG-2 and vislcg grammars
 -I, --stdin                file to read input from instead of stdin
 -O, --stdout               file to print output to instead of stdout
 -E, --stderr               file to print errors to instead of stderr
     --no-mappings          disables all MAP, ADD, and REPLACE rules
     --no-corrections       disables all SUBSTITUTE and APPEND rules
     --no-before-sections   disables all rules in BEFORE-SECTIONS parts
     --no-sections          disables all rules in SECTION parts
     --no-after-sections    disables all rules in AFTER-SECTIONS parts
 -t, --trace                prints debug output alongside normal output; optionally stops execution
     --trace-name-only      if a rule is named, omit the line number; implies --trace
     --trace-no-removed     does not print removed readings; implies --trace
     --trace-encl           traces which enclosure pass is currently happening; implies --trace
     --deleted              read deleted readings as such, instead of as text
     --dry-run              make no actual changes to the input
     --single-run           runs each section only once; same as --max-runs 1
     --max-runs             runs each section max N times; defaults to unlimited (0)
     --profile              gathers profiling statistics and code coverage into a SQLite database
 -p, --prefix               sets the mapping prefix; defaults to @
     --unicode-tags         outputs Unicode code points for things like ->
     --unique-tags          outputs unique tags only once per reading
     --num-windows          number of windows to keep in before/ahead buffers; defaults to 2
     --always-span          forces scanning tests to always span across window boundaries
     --soft-limit           number of cohorts after which the SOFT-DELIMITERS kick in; defaults to 300
     --hard-limit           number of cohorts after which the window is forcefully cut; defaults to 500
 -T, --text-delimit         additional delimit based on non-CG text, ensuring it isn't attached to a cohort; defaults to /(^|\n)</s/r
 -D, --dep-delimit          delimit windows based on dependency instead of DELIMITERS; defaults to 10
     --dep-absolute         outputs absolute cohort numbers rather than relative ones
     --dep-original         outputs the original input dependency tag even if it is no longer valid
     --dep-allow-loops      allows the creation of circular dependencies
     --dep-no-crossing      prevents the creation of dependencies that would result in crossing branches
     --no-magic-readings    prevents running rules on magic readings
 -o, --no-pass-origin       prevents scanning tests from passing the point of origin
     --split-mappings       keep mapped readings separate in output
 -e, --show-end-tags        allows the <<< tags to appear in output
     --show-unused-sets     prints a list of unused sets and their line numbers; implies --grammar-only
     --show-tags            prints a list of unique used tags; implies --grammar-only
     --show-tag-hashes      prints a list of tags and their hashes as they are parsed during the run
     --show-set-hashes      prints a list of sets and their hashes; implies --grammar-only
     --dump-ast             prints the grammar parse tree; implies --grammar-only
 -B, --no-break             inhibits any extra whitespace in output
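
As a rough illustration of how the options combine, the Python sketch below builds two argument lists using only flags from the listing above: one for applying a grammar to an input file, and one for compiling a grammar to binary form. The helper names and the file names (grammar.cg3, grammar.bin, input.txt) are hypothetical, and the sketch only constructs the commands; it does not run them.

```python
import shlex


def vislcg3_apply_argv(grammar, input_file=None, output_file=None,
                       sections=None, trace=False):
    """Build an argument list for applying a grammar with vislcg3."""
    argv = ["vislcg3", "--grammar", grammar]
    if sections is not None:
        argv += ["--sections", sections]   # -s: which sections to run
    if trace:
        argv.append("--trace")             # -t: debug output alongside normal output
    if input_file is not None:
        argv += ["--stdin", input_file]    # -I: read input from a file
    if output_file is not None:
        argv += ["--stdout", output_file]  # -O: write output to a file
    return argv


def vislcg3_compile_argv(grammar, binary_out):
    """Build an argument list that only compiles a grammar to binary form."""
    return ["vislcg3", "--grammar", grammar,
            "--grammar-only", "--grammar-bin", binary_out]


# Hypothetical file names, chosen for illustration only.
apply_argv = vislcg3_apply_argv("grammar.cg3", input_file="input.txt",
                                sections="2", trace=True)
compile_argv = vislcg3_compile_argv("grammar.cg3", "grammar.bin")
print(shlex.join(apply_argv))
print(shlex.join(compile_argv))
```

Either list could be passed to subprocess.run on a system where vislcg3 is installed.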

cg-conv

cg-conv converts between stream formats. It can read CG, Niceline CG, Apertium, HFST/XFST, and plain text, and it can write CG, Niceline CG, Apertium, HFST/XFST, Matxin, or plain text (see the options below). By default it tries to auto-detect the input format and converts it to CG. It is currently only meant for use in a pipe.

Usage: cg-conv [OPTIONS]

Environment variables:
 CG3_CONV_DEFAULT: Sets default cmdline options, which the actual passed options will override.
 CG3_CONV_OVERRIDE: Sets forced cmdline options, which will override any passed option.

Options:
 -h, --help          shows this help
 -?, --?             shows this help
 -p, --prefix        sets the mapping prefix; defaults to @
 -u, --in-auto       auto-detect input format (default)
 -c, --in-cg         sets input format to CG
 -n, --in-niceline   sets input format to Niceline CG
 -a, --in-apertium   sets input format to Apertium
 -f, --in-fst        sets input format to HFST/XFST
 -x, --in-plain      sets input format to plain text
     --add-tags      adds minimal analysis to readings (implies -x)
 -C, --out-cg        sets output format to CG (default)
 -A, --out-apertium  sets output format to Apertium
 -F, --out-fst       sets output format to HFST/XFST
 -M, --out-matxin    sets output format to Matxin
 -N, --out-niceline  sets output format to Niceline CG
 -X, --out-plain     sets output format to plain text
 -W, --wfactor       FST weight factor (defaults to 1.0)
     --wtag          FST weight tag prefix (defaults to W)
 -S, --sub-delim     FST sub-reading delimiters (defaults to #)
 -r, --rtl           sets sub-reading direction to RTL (default)
 -l, --ltr           sets sub-reading direction to LTR
 -o, --ordered       tag order matters mode
 -D, --parse-dep     parse dependency (defaults to treating as normal tags)
     --unicode-tags  outputs Unicode code points for things like ->
     --deleted       read deleted readings as such, instead of as text
 -B, --no-break      inhibits any extra whitespace in output
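
Since cg-conv is meant for use in a pipe, a wrapper script would typically feed it text on standard input and read the converted stream from standard output. The Python sketch below maps format names to the input and output flags listed above and shows one possible way to run the conversion. The short format names used as dictionary keys, the function names, and the error handling are assumptions; only the flags themselves come from the listing.

```python
import subprocess

# Format name -> flag, taken from the option listing above.
IN_FLAGS = {
    "auto": "--in-auto", "cg": "--in-cg", "niceline": "--in-niceline",
    "apertium": "--in-apertium", "fst": "--in-fst", "plain": "--in-plain",
}
OUT_FLAGS = {
    "cg": "--out-cg", "apertium": "--out-apertium", "fst": "--out-fst",
    "matxin": "--out-matxin", "niceline": "--out-niceline",
    "plain": "--out-plain",
}


def cg_conv_argv(src="auto", dst="cg"):
    """Build the cg-conv argument list for a source and target format."""
    return ["cg-conv", IN_FLAGS[src], OUT_FLAGS[dst]]


def convert(text, src="auto", dst="cg"):
    """Pipe text through cg-conv; requires cg-conv to be on the PATH."""
    result = subprocess.run(cg_conv_argv(src, dst), input=text,
                            capture_output=True, text=True, check=True)
    return result.stdout


print(cg_conv_argv("apertium", "niceline"))
```

The defaults of cg_conv_argv mirror the documented defaults: auto-detected input and CG output.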

cg-comp

cg-comp is a lighter tool that only compiles grammars to their binary form. It requires grammars to be in Unicode (UTF-8) encoding. Made for the Apertium toolchain.

USAGE: cg-comp grammar_file output_file

cg-proc

cg-proc is a grammar applicator which can handle the Apertium stream format. It works with binary grammars only, hence the need for cg-comp. It requires the input stream to be in Unicode (UTF-8) encoding. Made for the Apertium toolchain.

USAGE: cg-proc [-t] [-s] [-d] [-g] [-r rule] grammar_file [input_file [output_file]]

Options:
        -d:      morphological disambiguation (default behaviour)
        -s:      specify number of sections to process
        -f:      set the format of the I/O stream to NUM,
                   where `0' is VISL format, `1' is
                   Apertium format and `2' is Matxin (default: 1)
        -r:      run only the named rule
        -t:      print debug output on stderr
        -w:      enforce surface case on lemma/baseform
                   (to work with -w option of lt-proc)
        -n:      do not print out the word form of each cohort
        -g:      do not surround lexical units in ^$
        -1:      only output the first analysis if ambiguity remains
        -z:      flush output on the null character
        -v:      version
        -h:      show this help
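
Because cg-proc only works with binary grammars, a typical workflow is to compile with cg-comp first and then apply the result with cg-proc. The Python sketch below builds both argument lists following the two USAGE lines above. The helper names and file names are hypothetical, and the commands are only constructed, not executed.

```python
def cg_comp_argv(grammar_file, output_file):
    """USAGE: cg-comp grammar_file output_file"""
    return ["cg-comp", grammar_file, output_file]


def cg_proc_argv(grammar_bin, input_file=None, output_file=None,
                 rule=None, trace=False):
    """Build a cg-proc argument list; options come before the grammar file."""
    argv = ["cg-proc"]
    if trace:
        argv.append("-t")          # print debug output on stderr
    if rule is not None:
        argv += ["-r", rule]       # run only the named rule
    argv.append(grammar_bin)
    if input_file is not None:
        argv.append(input_file)
        if output_file is not None:
            argv.append(output_file)
    elif output_file is not None:
        # The usage line nests output_file inside input_file.
        raise ValueError("output_file requires input_file")
    return argv


# Hypothetical file names, chosen for illustration only.
steps = [
    cg_comp_argv("grammar.cg3", "grammar.bin"),
    cg_proc_argv("grammar.bin", "input.txt", "output.txt", trace=True),
]
for step in steps:
    print(" ".join(step))
```

When input_file and output_file are omitted, cg-proc is given only the grammar, matching the optional brackets in the usage line.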

cg-strictify

cg-strictify will parse a grammar and output a candidate STRICT-TAGS line that you can edit and then put into your grammar. Optionally, it can also output the whole grammar and strip superfluous LISTs along the way.

Usage: cg-strictify [OPTIONS] <grammar>

Options:
 -?, --help       outputs this help
 -g, --grammar    the grammar to parse; defaults to first non-option argument
 -o, --output     outputs the whole grammar with STRICT-TAGS
     --strip      removes superfluous LISTs from the output grammar; implies -o
     --secondary  adds secondary tags (<...>) to strict list
     --regex      adds regular expression tags (/../r, <..>r, etc) to strict list
     --icase      adds case-insensitive tags to strict list
     --baseforms  adds baseform tags ("...") to strict list
     --wordforms  adds wordform tags ("<...>") to strict list
     --all        same as --strip --secondary --regex --icase --baseforms --wordforms
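
The --all shorthand can be read as a plain expansion into the six flags it stands for; since --strip implies -o, --all also causes the whole grammar to be output. The Python sketch below models that expansion for illustration. The function name and the de-duplication of repeated flags are assumptions, not a description of how cg-strictify parses its arguments.

```python
# The flags that --all is documented to be equivalent to.
ALL_EQUIVALENT = ["--strip", "--secondary", "--regex",
                  "--icase", "--baseforms", "--wordforms"]


def expand_all(args):
    """Replace --all with its documented equivalent, skipping repeated flags."""
    expanded = []
    for arg in args:
        for item in (ALL_EQUIVALENT if arg == "--all" else [arg]):
            if item.startswith("--") and item in expanded:
                continue  # flag already present; do not repeat it
            expanded.append(item)
    return expanded


# Hypothetical grammar file name.
print(expand_all(["--all", "grammar.cg3"]))
```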

cg3-autobin.pl

cg3-autobin.pl is a thin Perl wrapper for vislcg3. It compiles the grammar to binary form on the first run and re-uses the compiled grammar on subsequent runs for the speed boost. It accepts all command line options that vislcg3 does.
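
How the wrapper decides whether a compiled grammar can be re-used is not described here. One common approach for this kind of cache, shown in the Python sketch below purely as an assumption, is to recompile when the binary is missing or older than the textual grammar. The function name and file names are hypothetical, and this is not a transcription of the Perl script.

```python
import os
import tempfile


def needs_recompile(grammar_path, binary_path):
    """Return True if the binary is missing or older than the grammar.

    Assumed policy for illustration; the real wrapper may decide differently.
    """
    if not os.path.exists(binary_path):
        return True
    return os.path.getmtime(grammar_path) > os.path.getmtime(binary_path)


# Small demonstration with temporary files and explicit timestamps.
with tempfile.TemporaryDirectory() as tmp:
    grammar = os.path.join(tmp, "grammar.cg3")
    binary = os.path.join(tmp, "grammar.cg3.bin")
    open(grammar, "w").close()
    first_run = needs_recompile(grammar, binary)    # no binary yet
    open(binary, "w").close()
    os.utime(grammar, (1000, 1000))
    os.utime(binary, (2000, 2000))
    later_run = needs_recompile(grammar, binary)    # binary is newer
    os.utime(grammar, (3000, 3000))
    after_edit = needs_recompile(grammar, binary)   # grammar was edited

print(first_run, later_run, after_edit)
```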