What is a rulebase?

Last Updated: May 13, 2026

A rulebase is an XML document containing the rules that Classification Server (CS) uses to classify documents.

Typically, many rulebase files are combined into a single rulebase pak file (rather like a zip file), which is sent to CS via a publish request.

This is normally done by the publisher component of Semaphore. However, utilities such as ispack and isunpack are available for packing and unpacking pak files manually, which gives access to the individual rulebase files if required.

CS uses all rulebase files (or paks of them) that have been published to a particular CS instance. These may be published in groups called publish sets, which make it simpler to activate or deactivate a particular named set of rulebase files or pak files.

All rules from these published rulebases are processed into a network (or graph) of rules, which CS then uses for each individual classification. Because the entire network of rules is used every time, the decision of which rules go into which rulebase file is very much a matter of taste and convenience.
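The point about file layout can be illustrated with a minimal Python sketch. This is not CS code and does not build a real rule network: it only shows that the same category rules are found whether they live in two rulebase documents or one. The second category (test2), its text rule, and the helper name collect_categories are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical rulebase documents; in practice these would be separate files.
RULEBASE_A = """<rulebase language="english">
  <content>
    <category class="TEST" name="test1">
      <text data="test" />
    </category>
  </content>
</rulebase>"""

RULEBASE_B = """<rulebase language="english">
  <content>
    <category class="TEST" name="test2">
      <text data="example" />
    </category>
  </content>
</rulebase>"""

# The same two rules written into a single hypothetical rulebase document.
COMBINED = """<rulebase language="english">
  <content>
    <category class="TEST" name="test1">
      <text data="test" />
    </category>
    <category class="TEST" name="test2">
      <text data="example" />
    </category>
  </content>
</rulebase>"""


def collect_categories(*documents):
    """Return sorted (class, name) pairs for every category rule found."""
    found = []
    for doc in documents:
        root = ET.fromstring(doc)
        for category in root.iter("category"):
            found.append((category.get("class"), category.get("name")))
    return sorted(found)


split = collect_categories(RULEBASE_A, RULEBASE_B)
combined = collect_categories(COMBINED)
print(split == combined)
```

Under these assumptions both layouts yield the same pool of rules, which is why the split across files can follow whatever grouping is easiest to maintain.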

This video explains how Semaphore automatically generates and manages rulebases from a semantic model, turning concepts and labels into executable classification logic without manual rule authoring.


A sample Rulebase

<?xml version="1.0"?>
<rulebase language="english">
   <content>
      <category class="TEST" name="test1">
         <text data="test" />
      </category>
   </content>
</rulebase>

Full details of each line are given in the descriptions of the individual rules in other sections of this document. As a quick overview:

rulebase - this is just a containing XML element, since every XML document must have a single root element. It is not a rule and is ignored by CS when building the network of rules. However, it is a useful place to write default attributes that you want to apply to all rules in the rulebase. In this case we have used language="english", which means that the language for all rules in the rulebase will be "english" (details on attributes are covered elsewhere in this documentation).

content - the purpose of this element is unclear. CS ignores it (except for inheritance of attributes). However, all existing rulebases seem to have a content element, so it may have had some use in the past.

category - the category rule (or alternatively the category attribute) is the rule that determines CS classification output. If the score for a category rule is greater than the threshold for the request, a META element is added to the output.

text - a rule which will fire if the data “test” is found in the document.
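The idea of default attributes flowing down from the rulebase element can be sketched in Python. This is only an illustration, not how CS is implemented: the helper name resolve_language is hypothetical, and the rule that a nearer element's own language value overrides the inherited one is an assumption.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<rulebase language="english">
   <content>
      <category class="TEST" name="test1">
         <text data="test" />
      </category>
   </content>
</rulebase>"""


def resolve_language(element, inherited=None):
    """Return (tag, language) for the element and all of its descendants.

    Assumption: an element's own language attribute, if present,
    overrides the value inherited from its ancestors.
    """
    language = element.get("language", inherited)
    rows = [(element.tag, language)]
    for child in element:
        rows.extend(resolve_language(child, language))
    return rows


resolved = resolve_language(ET.fromstring(SAMPLE))
for tag, language in resolved:
    print(tag, language)
```

In this sketch the category and text rules both resolve to "english" even though neither declares a language, because the value is picked up from the enclosing rulebase element and passed through content.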

So the aim of the above rulebase is to add


   ...
   <META name="TEST" value="test1" score="1.00" />
   ...

to the response for the classification request if (and only if) the document being classified contains the word “test”.
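As a rough way to picture this behaviour, the following Python sketch is a toy model of the sample rulebase, not the CS algorithm. It assumes a case-insensitive whole-word match for text rules, a fixed score of 1.00 when any text rule fires, and a hypothetical threshold parameter defaulting to 0.5; the function name classify is also made up for the example.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<rulebase language="english">
   <content>
      <category class="TEST" name="test1">
         <text data="test" />
      </category>
   </content>
</rulebase>"""


def classify(rulebase_xml, document, threshold=0.5):
    """Return META lines for categories whose toy score exceeds the threshold."""
    root = ET.fromstring(rulebase_xml)
    metas = []
    for category in root.iter("category"):
        words = [text.get("data") for text in category.iter("text")]
        # Assumption: a text rule fires on a case-insensitive whole-word match.
        fired = any(
            re.search(r"\b" + re.escape(word) + r"\b", document, re.IGNORECASE)
            for word in words
        )
        score = 1.0 if fired else 0.0  # toy scoring, not CS scoring
        if score > threshold:
            metas.append(
                '<META name="%s" value="%s" score="%.2f" />'
                % (category.get("class"), category.get("name"), score)
            )
    return metas


hit = classify(SAMPLE, "This is a test document.")
miss = classify(SAMPLE, "Nothing relevant here.")
print(hit)
print(miss)
```

With these assumptions, the first call produces the META line shown above and the second produces nothing, since that document does not contain the word "test".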

Semaphore Classification Server Rulebase Reference