build

build program

First, tokenize the input (this is not necessary if the file is already tokenized):

% cat test
repel the monkey.

% token <test >test.t

% cat test.t
repel
the
monkey
<PERIOD>
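
The tokenizing step can be imitated in a few lines of Python. This is only a rough sketch of what the example above shows: `tokenize` and `PUNCT_NAMES` are hypothetical names, and the only punctuation mapping included is the one visible in the sample output (`.` becomes `<PERIOD>`). The real `token` program may handle case, other punctuation, and numbers differently.

```python
import re

# Hypothetical stand-in for `token`; its real rules are not documented here.
# Only the mapping visible in the example output is included.
PUNCT_NAMES = {".": "<PERIOD>"}

def tokenize(text):
    """Split text into word tokens and single punctuation marks, in order."""
    tokens = []
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    for piece in re.findall(r"\w+|[^\w\s]", text):
        tokens.append(PUNCT_NAMES.get(piece, piece))
    return tokens

if __name__ == "__main__":
    # One token per line, like the contents of test.t.
    print("\n".join(tokenize("repel the monkey.")))
```

Punctuation marks without an entry in `PUNCT_NAMES` are passed through unchanged in this sketch.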

Then build the matrix (specifying a vocabulary file).  The options are:
-y row file
-v column file
-l number; limits a row to that number of occurrences
-w window size

% build -v /Data/corp/vocab <test.t

vocab is 69718 words.
mapping 0 from file 4, offset = 1695744.
unloadSparse: can’t unload a mapped matrix.
mapping cmat…
mapping 0 from file 5, offset = 1695744.
emptying the spill matrix…6 elements into 0 elements.
crab merge done; wound up at 4018089984 (should be 4018089984).

The matrix winds up in /Data/tmp/matrix.  You can look at it with ‘showmat’:

% showmat /Data/tmp/matrix

         monkey   repel     the
<PERIOD   10.00    8.00    9.00
monkey     0.00    9.00   10.00
the        0.00   10.00    0.00
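
The numbers in this listing are consistent with a distance-weighted count of the words that precede each row word: a word one position back contributes 10, two positions back 9, three positions back 8. The Python sketch below reproduces that pattern. It is an inference from this one example, not a description of how `build` is actually implemented. The function names are hypothetical, and both the weighting formula (`window + 1 - distance`) and the default window of 10 are assumptions.

```python
from collections import defaultdict

def build_matrix(tokens, window=10):
    """Return {row_word: {column_word: weight}} for words preceding each row word.

    Assumed weighting: a word `distance` positions back adds window + 1 - distance.
    """
    matrix = defaultdict(lambda: defaultdict(float))
    for i, row_word in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            matrix[row_word][tokens[j]] += window + 1 - distance
    return matrix

def show(matrix):
    """Print the nonempty rows and columns, sorted, in a showmat-like layout."""
    rows = sorted(matrix)
    cols = sorted({c for row in matrix.values() for c in row})
    print(" " * 8 + "".join(f"{c:>8}" for c in cols))
    for r in rows:
        print(f"{r:<8}" + "".join(f"{matrix[r].get(c, 0.0):8.2f}" for c in cols))

if __name__ == "__main__":
    show(build_matrix(["repel", "the", "monkey", "<PERIOD>"]))
```

For the four tokens in test.t this gives the same values as the listing above. There is no row for "repel" because nothing precedes it, which matches its absence from the showmat output.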