What is a rule?

Last Updated: July 8, 2026

A rule is a node in the evaluation network (specifically, a directed acyclic graph) and is evaluated for each document sent for classification. Each node has the following characteristics:

Type - the type of calculation that should be done when the node is evaluated
Score - the result of the calculation as a confidence measure (between -1.00 and 1.00)
Evidence - the phrases in the document which were used to generate the score

Each node has zero or more connections (or edges) to other nodes. Once a node has been evaluated, these edges are followed and the connected nodes are evaluated in turn.

When a node is evaluated, its calculation considers only the scores and evidence of its inbound nodes. For example, the type of the node might be a min. In this case, its score is the minimum score of its inbound nodes, and its evidence is the union of the evidence of those inbound nodes which have that minimum score.
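
The min calculation described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the Classification Server implementation: the `Node` class and the `evaluate_min` function are hypothetical names, and raising an error for a min node with no inbound nodes is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an evaluated rule node."""
    score: float                                  # confidence between -1.00 and 1.00
    evidence: set = field(default_factory=set)    # phrases that produced the score


def evaluate_min(inbound):
    """Score is the minimum inbound score; evidence is the union of the
    evidence of every inbound node that has that minimum score."""
    if not inbound:
        # Assumption: the source does not say what an empty min node scores.
        raise ValueError("a min node needs at least one inbound node")
    lowest = min(node.score for node in inbound)
    evidence = set()
    for node in inbound:
        if node.score == lowest:
            evidence |= node.evidence
    return Node(lowest, evidence)


a = Node(0.40, {"test"})
b = Node(0.25, {"test2"})
c = Node(0.25, {"tests"})
result = evaluate_min([a, b, c])
print(result.score, sorted(result.evidence))  # 0.25 ['test2', 'tests']
```

Note that two inbound nodes tie for the minimum here, so the evidence of both is kept, while the evidence of the higher-scoring node is dropped.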

In practice, it is often easier to think of the rule as an XML entity in a rulebase file since that is how rules are written. In this case, rules have the following characteristics:

A name (which gives the approximate type of the rule node)
Attributes (which may vary the type of the rule)
Parents/children which are used to define the majority of edges for the graph

The XML parent/child relationship is used in a natural way to express the direction of evaluation. For example:

  <min>
      <text data="test" foreach="1" weight="10" />
      <text data="test2" foreach="1" weight="10" />
  </min>

forms the graph

      MIN
     /   \
   TEXT  TEXT

After the text rules are evaluated, the min rule is evaluated, and it will select whichever of those text rules has the fewest occurrences of its data in the document. For details of these attributes and calculations, see the documentation for the individual rules and attributes.
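
To make the evaluation order concrete, the sketch below parses the same XML with Python's standard `xml.etree.ElementTree` module and lists the rules children-first. It shows only the order in which nodes would be visited. It does not attempt the text or min calculations themselves, and `evaluation_order` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RULEBASE = """
<min>
    <text data="test" foreach="1" weight="10" />
    <text data="test2" foreach="1" weight="10" />
</min>
"""


def evaluation_order(element):
    """Return rule elements in the order they would be evaluated:
    every child first, then the element itself (post-order)."""
    order = []
    for child in element:
        order.extend(evaluation_order(child))
    order.append(element)
    return order


root = ET.fromstring(RULEBASE)
order = [(e.tag, e.get("data")) for e in evaluation_order(root)]
print(order)  # [('text', 'test'), ('text', 'test2'), ('min', None)]
```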

By using the XML parent/child relationship and expressing much of the information in attributes, we can simplify the writing of the rulebase by inheriting attributes from our parent rule.

    <min stem="1">
      <text data="test" foreach="1" weight="10" />
      <text data="test2" foreach="1" weight="10" />
    </min>

We have saved some typing by specifying the stem attribute on the min rule. The attribute stem="1" does not have any effect on the min rule itself. However, since it is an inherited attribute, it will apply to the two child text rules, where stem="1" changes the type of calculation for the rule (it matches the stem or lemma form of the word).
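
One way to picture inheritance is as a walk down the XML tree that passes inherited attributes from parent to child. The sketch below is a simplified model under stated assumptions: only `stem` is treated as inherited (the real set of inherited attributes is defined by the rulebase reference), a child's own value is assumed to override an inherited one, and `effective_attributes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumption: only "stem" is treated as inherited in this sketch.
INHERITED = {"stem"}

RULEBASE = """
<min stem="1">
    <text data="test" foreach="1" weight="10" />
    <text data="test2" foreach="1" weight="10" />
</min>
"""


def effective_attributes(element, inherited=None):
    """Yield (tag, attributes) for an element and its descendants, with
    inherited attributes merged in. Own attributes win (an assumption)."""
    attrs = {**(inherited or {}), **element.attrib}
    yield element.tag, attrs
    # Only attributes in INHERITED are handed down to the children.
    passed_down = {k: v for k, v in attrs.items() if k in INHERITED}
    for child in element:
        yield from effective_attributes(child, passed_down)


resolved = list(effective_attributes(ET.fromstring(RULEBASE)))
for tag, attrs in resolved:
    print(tag, attrs)
```

Both text rules end up with `stem` set to `"1"` even though the attribute appears only on the min rule in the XML.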

To express evaluation relationships or edges which are not modeled in XML, we use a rule called link. Each node may have a label attached using the label attribute, and a link rule simply creates an edge to the node(s) with a matching label, so that they are evaluated exactly as they would be if they were children. For example:

   <min>
      <link label="our first rule" />
      <link label="our second rule" />
   </min>
   <text data="test" foreach="1" weight="10" label="our first rule" />
   <text data="test2" foreach="1" weight="10" label="our second rule" />

This creates exactly the same graph as the non-labeled version. The only significant difference is that inheritance of attributes will not occur across link rules. That is, in the previous example, we had stem="1" on the min rule which was inherited by its text rule children. This inheritance will not happen if a label is used to create the edge in the graph.
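
The label lookup can be sketched as follows. This is an illustrative model rather than the server's actual resolution logic: the `<rulebase>` wrapper element is hypothetical (added only so the fragment parses as a single XML document), `stem="1"` has been added to the min rule to show that it is not passed across links, and `build_label_index` and `inbound_nodes` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# The <rulebase> wrapper is a hypothetical root element for parsing only.
RULEBASE = """
<rulebase>
    <min stem="1">
        <link label="our first rule" />
        <link label="our second rule" />
    </min>
    <text data="test" foreach="1" weight="10" label="our first rule" />
    <text data="test2" foreach="1" weight="10" label="our second rule" />
</rulebase>
"""


def build_label_index(root):
    """Map each label to the (non-link) rules that carry it."""
    index = {}
    for element in root.iter():
        label = element.get("label")
        if label is not None and element.tag != "link":
            index.setdefault(label, []).append(element)
    return index


def inbound_nodes(element, index):
    """Children are inbound nodes directly; a link child is replaced by
    the rule(s) whose label matches."""
    inbound = []
    for child in element:
        if child.tag == "link":
            inbound.extend(index.get(child.get("label"), []))
        else:
            inbound.append(child)
    return inbound


root = ET.fromstring(RULEBASE)
index = build_label_index(root)
targets = inbound_nodes(root.find("min"), index)
print([t.get("data") for t in targets])  # ['test', 'test2']
print([t.get("stem") for t in targets])  # [None, None]
```

The min rule reaches the same two text rules as in the parent/child version, but because they are not its XML children, nothing in this model gives them the `stem` attribute.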

Occasionally, the fact that the rules are a network/graph rather than a simple tree does make a difference. However, for the majority of cases, thinking of extra edges (non-parent/child edges) in terms of a link to a labeled rule works as a mental model.

When an XML entity isn’t a Rule?

In most cases, there is a one-to-one relationship between an XML entity and a rule node in the graph. However, some XML rules are expanded to multiple nodes. This expansion usually happens automatically, and so may be ignored as an implementation detail.

For example:

    <text data="Multiple words" />

This rule is invalid since a text rule searches for a matching single token in the document. Multiple tokens are matched by some form of the sequence rule which provides control over handling skipped tokens (punctuation) and whether the sequence is valid across sentences.

However, knowing how a piece of text will be tokenized is not trivial, and may even vary depending on which tokenizer the Classification Server (CS) is configured to use. So instead of raising an error here, it is much easier to have CS rewrite the rule automatically during publish. CS will change the above to:

    <sequence>
         <text data="Multiple" />
         <text data="words" />
    </sequence>

This expansion is currently limited to one-to-many relationships: a single XML entity may be replaced by multiple rule nodes if required. CS does not currently optimise the graph, which would involve a many-to-one rewriting of the XML.

Optimisation would make debugging some rules rather tricky (although it has been considered, since it could save significant processing time), so for now rewrites are restricted to expansion only. As such, the expansion may (in almost all cases) be ignored, since the expanded rules should behave "just like" the original rule. In the example given above, the <text> rule is actually invalid, but after rewriting, the resulting 3 nodes act as if they were a valid text rule.
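
The one-to-many rewrite of a multi-word text rule can be approximated as below. This is a rough sketch, not how CS performs the rewrite: whitespace splitting stands in for the configured tokenizer, copying any other attributes onto each generated text rule is a guess, and `expand_text_rule` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def expand_text_rule(element, tokenize=str.split):
    """Rewrite a multi-token <text> rule as a <sequence> of single-token
    <text> rules. Any other element is returned unchanged."""
    tokens = tokenize(element.get("data", ""))
    if element.tag != "text" or len(tokens) <= 1:
        return element
    sequence = ET.Element("sequence")
    for token in tokens:
        attrs = dict(element.attrib)   # assumption: other attributes are copied
        attrs["data"] = token
        ET.SubElement(sequence, "text", attrs)
    return sequence


original = ET.fromstring('<text data="Multiple words" />')
expanded = expand_text_rule(original)
print(expanded.tag, [child.get("data") for child in expanded])
```

One XML entity becomes three nodes here: the sequence rule and its two text rules. A single-token rule such as `<text data="test" />` passes through unchanged.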

Semaphore Classification Server Rulebase Reference
