Installation of NoSketch Engine

Download the following packages from here.

  1. As root: apt-get install libpcre3 libpcre++-dev apache2 python python-cheetah python-simplejson libltdl7
  2. As root, install the downloaded package: dpkg -i python-signalfd_<version>.deb
  3. As root, install the remaining downloaded packages: dpkg -i finlib-<version>.deb manatee-open-<version>.deb manatee-open-python-<version>.deb bonito-<version>.deb bonito-www-<version>.deb
  4. Restart the Apache server: service apache2 restart
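
The four steps above can be collected into one shell function. This is only a sketch: the names install_nosketch, run_cmd and DRY_RUN are made up for illustration, and it assumes that all downloaded .deb files sit in a single directory. With DRY_RUN=1 the commands are printed instead of executed, so they can be reviewed before anything is installed.

```shell
# Sketch only: install_nosketch, run_cmd and DRY_RUN are hypothetical names,
# not part of the NoSketch Engine packages.

# Print the command when DRY_RUN=1, otherwise execute it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = directory holding the downloaded .deb files (assumed layout).
# Must be run as root unless DRY_RUN=1.
install_nosketch() {
    pkg_dir=${1:-.}

    # Step 1: dependencies from the distribution repositories
    run_cmd apt-get install libpcre3 libpcre++-dev apache2 python \
        python-cheetah python-simplejson libltdl7 || return 1

    # Step 2: python-signalfd is installed first
    run_cmd dpkg -i "$pkg_dir"/python-signalfd_*.deb || return 1

    # Step 3: the remaining packages. Note that manatee-open-*.deb also
    # matches manatee-open-python and, if it was downloaded into the same
    # directory, the susanne corpus package; bonito-*.deb also matches
    # bonito-www.
    run_cmd dpkg -i "$pkg_dir"/finlib-*.deb "$pkg_dir"/manatee-open-*.deb \
        "$pkg_dir"/bonito-*.deb || return 1

    # Step 4: restart Apache so that it picks up the new files
    run_cmd service apache2 restart
}
```

A dry run such as DRY_RUN=1 install_nosketch /path/to/downloads prints the four commands; dropping DRY_RUN executes them.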

Installation of the example corpus ‘susanne’

  1. Download the example corpus (manatee-open-susanne_<version>.deb) from here
  2. As root, install the example corpus: dpkg -i manatee-open-susanne_<version>.deb
  3. You are done; open <ip or localhost>/bonito in your browser.

Post-installation steps

Depending on which corpora you have and where they are located, you have to edit the $CGIPATH/run.cgi file (when installing from packages, this file is located at /var/www/bonito/run.cgi).
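
To see which run.cgi applies on a given machine, a small helper can check $CGIPATH first and fall back to the package default. This is a minimal sketch: the function name find_run_cgi is made up for illustration, and the optional second argument exists only so that the default path can be overridden.

```shell
# Hypothetical helper: print the path of the run.cgi that should be edited.
# $1 = CGI directory (defaults to $CGIPATH, may be empty)
# $2 = fallback file (defaults to the location used by the packages)
find_run_cgi() {
    cgi_dir=${1:-${CGIPATH:-}}
    fallback=${2:-/var/www/bonito/run.cgi}

    if [ -n "$cgi_dir" ] && [ -f "$cgi_dir/run.cgi" ]; then
        echo "$cgi_dir/run.cgi"
    elif [ -f "$fallback" ]; then
        echo "$fallback"
    else
        echo "run.cgi not found" >&2
        return 1
    fi
}
```

Calling find_run_cgi without arguments prints the file to edit, or fails with a message on stderr if neither location exists.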

Adding new corpora

IMPORTANT! The corpus configuration file name must be the same as the value written in the configuration file under NAME “<corpus name>”, and the same name must be used when recompiling the corpus.

Required: your corpus as a vertical file (.vert) and a corpus configuration file.

  1. Create folders where you will keep your corpora
    1. Folder where data about corpora is stored: mkdir -p /corpora/data/<corpus name>
    2. Folder where corpora configuration files are stored: mkdir -p /corpora/registry
    3. Folder where corpora files are stored: mkdir -p /corpora/vert
  2. Copy your corpus vertical file into the vert folder you created, and your corpus configuration file into the registry folder you created
    1. NOTE! The corpus configuration file name should be the same as your <corpus name> to avoid errors later on
    2. EXAMPLE: “<corpus name>.conf” ← BAD!; should be just “<corpus name>”
  3. In your corpus configuration file, correct the following paths
    1. PATH “/corpora/data/<corpus name>”
    2. VERTICAL “/corpora/vert/<corpus name>.vert”
    3. If it exists, correct: TERMBASE “/corpora/data/<corpus name>/terms-ws”
    4. Correct or delete other variables, such as TERMDEF, DYNLIB, etc. This step is not necessary unless you run into issues.
  4. Recompile the corpus (run as root, because it needs to create a directory for the log file): compilecorp [OPTIONS] CORPNAME [FILENAME]
    1. OPTIONS - you have several options listed here
    2. CORPNAME - corpus name; the same name as specified in the corpus configuration file under NAME “<corpus name>”
    3. FILENAME - path to corpus vert file (/corpora/vert/<corpus name>.vert); can be omitted, but specifying it might be more reliable
    4. Whole command: sudo compilecorp --recompile-corpus <corpus name> <full path to vertical>
  5. Update the “run.cgi” file (by default /var/www/bonito/run.cgi). Assuming the example corpus susanne was not installed, the parameters in run.cgi should look something like this after the changes
    1. OPTIONS - you have several options listed here
    2. Add your corpus to corplist
      1. corplist = [u'<corpus name>', u'<corpus name 2>', u'<corpus name 3>']
    3. Change default corpus
      1. corpname = u'<corpus name>'
    4. Change path to corpus configuration files
      1. os.environ['MANATEE_REGISTRY'] = '/corpora/registry'
  6. If no errors or warnings are reported, the corpus has been added successfully. To add more corpora, create another data folder (shown in step 1.1) and repeat steps 2 to 5.
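
Steps 1 to 4 above can be sketched as two shell functions. This is an illustration under stated assumptions, not an official tool: the function names make_corpus_dirs and check_corpus_config are made up, the root folder is passed as an argument (/corpora in the steps above), and the check only covers the naming rule from the IMPORTANT note and the presence of the vertical file. The compilecorp command is printed, not executed.

```shell
# Sketch only: function names and arguments are hypothetical.

# Step 1: create the folder layout under root $1 for corpus $2.
make_corpus_dirs() {
    root=$1
    name=$2
    mkdir -p "$root/data/$name" "$root/registry" "$root/vert"
}

# Steps 2-4: check that the configuration file is named exactly like the
# corpus, that its NAME value matches, and that the vertical file is in
# place; then print the recompile command from step 4.
check_corpus_config() {
    root=$1
    name=$2
    conf="$root/registry/$name"
    vert="$root/vert/$name.vert"

    if [ ! -f "$conf" ]; then
        echo "missing configuration file: $conf (no .conf suffix allowed)" >&2
        return 1
    fi

    # Take the quoted value of the first NAME line in the configuration file
    conf_name=$(sed -n 's/^NAME[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "$conf" | head -n 1)
    if [ "$conf_name" != "$name" ]; then
        echo "NAME \"$conf_name\" does not match file name \"$name\"" >&2
        return 1
    fi

    if [ ! -f "$vert" ]; then
        echo "missing vertical file: $vert" >&2
        return 1
    fi

    echo "sudo compilecorp --recompile-corpus $name $vert"
}
```

For example, make_corpus_dirs /corpora mycorpus creates the folders; after the vertical and configuration files have been copied in, check_corpus_config /corpora mycorpus prints the command to run, or an error describing what is missing.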