SHORTEST_WHEN_OVERLAPPING

Last Updated: July 8, 2026

New in Semaphore 5.10.1

This rule scores its weight if any of its children score. Its evidence is the union of its children’s evidence, except where evidence overlaps, in which case the shortest phrase range (or the leftmost, if there is a tie in length) is used.

NB: the score is not calculated from the children’s actual scores, only from whether the children score or not.

Modifying Attributes

Children restrictions

Any rule other than those only allowed under a specific parent (CONDITION, ELSE, THEN and SKIP).

The DATA attribute may be used; it is expanded automatically into the appropriate TEXT child rules.

This rule behaves almost identically to the UNION rule. The difference is that UNION joins overlapping phrase ranges together, whereas SHORTEST_WHEN_OVERLAPPING picks the shortest rather than calculating a new phrase range covering the overlap.

The best way to see the difference between these two rules is to consider the following example.

Example 1

The following rulebase fragment:

<shortest_when_overlapping foreach="1" weight="20" >
    <phrase data="A" />
    <phrase data="A A" />
</shortest_when_overlapping>

Evaluating the following document text:

A A A

This will fire with a score of “0.49” and have 3 evidence phrase ranges attached (each “A”).

However, using the UNION rule:

<union foreach="1" weight="20" >
    <phrase data="A" />
    <phrase data="A A" />
</union>

This will fire on the same text with a score of “0.20” and will have only a single evidence phrase range (“A A A”) attached.

The difference is that <union> creates a new phrase range covering the overlap, whilst <shortest_when_overlapping> picks the shortest (or leftmost, in the event of a tie in length) phrase range in the overlap.

In many cases this difference is unimportant and it does not matter which rule you use. However, where the resulting phrase range is used to group extractions, <union> can result in unwanted groupings being returned.

In particular, <shortest_when_overlapping> is very useful in conjunction with a <near> rule. The <near> rule returns all near groups (within the given count), so these may end up overlapping. Using <shortest_when_overlapping> (or <longest_when_overlapping>) can help pick the appropriate near group in a specific case while still letting the <near> apply in other cases.
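The contrast in Example 1 can be modelled outside the rulebase. The following Python sketch is only an illustration, not Semaphore's implementation: it assumes phrase ranges are inclusive token positions, and the function names and the greedy "shortest first, then leftmost" strategy are assumptions chosen to reproduce the evidence described above. Scores are not modelled.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) token positions, inclusive (assumed representation)


def overlaps(a: Range, b: Range) -> bool:
    """True when two inclusive ranges share at least one token."""
    return a[0] <= b[1] and b[0] <= a[1]


def shortest_when_overlapping(ranges: List[Range]) -> List[Range]:
    """Keep the shortest range in any overlap; ties go to the leftmost."""
    kept: List[Range] = []
    # Visit shortest first, then leftmost, so preferred ranges claim their span first.
    for r in sorted(ranges, key=lambda r: (r[1] - r[0], r[0])):
        if not any(overlaps(r, k) for k in kept):
            kept.append(r)
    return sorted(kept)


def union(ranges: List[Range]) -> List[Range]:
    """Merge overlapping ranges into one range covering the overlap."""
    merged: List[Range] = []
    for r in sorted(ranges):
        if merged and r[0] <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], r[1]))
        else:
            merged.append(r)
    return merged


# Document text "A A A" as tokens 0, 1, 2.
hits_a = [(0, 0), (1, 1), (2, 2)]  # matches of "A"
hits_aa = [(0, 1), (1, 2)]         # matches of "A A"
evidence = hits_a + hits_aa

print(shortest_when_overlapping(evidence))  # three single-token ranges, each "A"
print(union(evidence))                      # one range covering "A A A"
```

Under these assumptions the first call keeps the three single “A” ranges, while the second collapses everything into one range, matching the evidence described for the two rules.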

<shortest_when_overlapping>
   <near count="1" >
       <text data="A" />
       <text data="B" />
   </near>
</shortest_when_overlapping>

Evaluating the following document text:

A near B near A some more text and another A near B

Here the <near> finds both occurrences of “A near B” and also the “B near A”. However, the “B near A” overlaps the first “A near B”, so using <shortest_when_overlapping> drops the “B near A” (since it has the same length as “A near B” but is further to the right in the document).
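The tie-break in this example can be traced with a small Python sketch. It is a model under stated assumptions, not the product's code: the three near groups are written by hand as inclusive token ranges taken from the description above (the <near> matching itself is not modelled), and the helper name is hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) token positions, inclusive (assumed representation)

tokens = "A near B near A some more text and another A near B".split()

# Near groups as described in the text, entered by hand:
# first "A near B", the overlapping "B near A", and the final "A near B".
near_groups: List[Range] = [(0, 2), (2, 4), (10, 12)]


def shortest_when_overlapping(ranges: List[Range]) -> List[Range]:
    """Keep the shortest range in any overlap; ties go to the leftmost."""
    kept: List[Range] = []
    for r in sorted(ranges, key=lambda r: (r[1] - r[0], r[0])):
        # Drop r if it shares a token with a range that was already preferred.
        if not any(r[0] <= k[1] and k[0] <= r[1] for k in kept):
            kept.append(r)
    return sorted(kept)


kept = shortest_when_overlapping(near_groups)
phrases = [" ".join(tokens[start:end + 1]) for start, end in kept]
print(kept)
print(phrases)
```

All three groups have the same length, so the leftmost of the overlapping pair wins and the “B near A” group at tokens 2 to 4 is dropped; the final, non-overlapping “A near B” passes through.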

Example 2

The shortest_when_overlapping rule only picks the shortest phrase range when an overlap occurs. Non-overlapping phrase ranges are passed through untouched. If you are only interested in the overlapping evidence, wrap the children in a SELECT_WHEN_OVERLAPPING rule so that only the overlapping phrase ranges are considered.

<shortest_when_overlapping>
   <select_when_overlapping>
      <near count="1" >
         <text data="A" />
         <text data="B" />
      </near>
   </select_when_overlapping>
</shortest_when_overlapping>

Here just the first “A near B” is found, since it is the shortest (and leftmost) of the overlapping near groups.
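The two-stage behaviour of Example 2 can be sketched in Python as follows. This is a rough model only: it assumes that SELECT_WHEN_OVERLAPPING keeps exactly those phrase ranges that overlap at least one other range, which is an interpretation of the text above rather than documented behaviour, and the near groups are again entered by hand as inclusive token ranges. Both function names are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) token positions, inclusive (assumed representation)


def overlaps(a: Range, b: Range) -> bool:
    """True when two inclusive ranges share at least one token."""
    return a[0] <= b[1] and b[0] <= a[1]


def select_when_overlapping(ranges: List[Range]) -> List[Range]:
    """Assumed behaviour: keep only ranges that overlap some other range."""
    return [
        r for i, r in enumerate(ranges)
        if any(overlaps(r, other) for j, other in enumerate(ranges) if j != i)
    ]


def shortest_when_overlapping(ranges: List[Range]) -> List[Range]:
    """Keep the shortest range in any overlap; ties go to the leftmost."""
    kept: List[Range] = []
    for r in sorted(ranges, key=lambda r: (r[1] - r[0], r[0])):
        if not any(overlaps(r, k) for k in kept):
            kept.append(r)
    return sorted(kept)


# Near groups for "A near B near A some more text and another A near B".
near_groups: List[Range] = [(0, 2), (2, 4), (10, 12)]

overlapping_only = select_when_overlapping(near_groups)  # inner rule
result = shortest_when_overlapping(overlapping_only)     # outer rule
print(overlapping_only)
print(result)
```

In this model the final “A near B” is discarded by the inner step because it overlaps nothing, and the outer step then keeps only the leftmost of the remaining pair.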

Semaphore Classification Server Rulebase Reference