LONGEST_WHEN_OVERLAPPING

Last Updated: May 13, 2026

New in Semaphore 5.10.1

This rule scores its weight if any of its children score. Its evidence is the union of its children's evidence, except where these overlap, in which case the longest range (or the rightmost, in the case of a tie in length) is used.
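The overlap resolution described above can be sketched in Python. This is a minimal illustration, not Semaphore's implementation. It assumes that evidence is represented as half-open token spans `(start, end)`, that spans which merely touch do not overlap, and that overlap is transitive (spans linked through a chain of overlaps form one group, which is what the worked example below implies). The names `overlap_clusters` and `longest_when_overlapping` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a half-open token range [start, end).
Span = Tuple[int, int]


def overlap_clusters(spans: List[Span]) -> List[List[Span]]:
    """Group spans into clusters of (transitively) overlapping ranges.

    Assumption: spans that only touch (end == next start) do not overlap.
    """
    clusters: List[List[Span]] = []
    cluster_end: int = 0
    for start, end in sorted(set(spans)):
        if clusters and start < cluster_end:
            # Overlaps the current cluster: extend it.
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            # No overlap: start a new cluster.
            clusters.append([(start, end)])
            cluster_end = end
    return clusters


def longest_when_overlapping(spans: List[Span]) -> List[Span]:
    """Keep one span per cluster: the longest, ties broken by the rightmost."""
    return [
        max(cluster, key=lambda s: (s[1] - s[0], s[0]))
        for cluster in overlap_clusters(spans)
    ]


# Child evidence for the text "A A A" from PHRASE "A" and PHRASE "A A".
child_spans = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
print(longest_when_overlapping(child_spans))  # [(1, 3)]
```

Non-overlapping evidence passes through unchanged, so only ranges that actually collide are reduced to a single survivor.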

NB: the score is not calculated from the children's actual scores, only from whether or not the children score.
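A sketch of this scoring behaviour, assuming (from Example 1, where `weight="20"` yields a score of 0.20) that the weight is expressed as a percentage. `rule_score` is a hypothetical helper; what it shows is that each child's score is only tested for being non-zero, never summed or averaged.

```python
from typing import Sequence


def rule_score(weight: int, child_scores: Sequence[float]) -> float:
    """Hypothetical sketch: score weight/100 if any child scores, else 0.

    Assumption: weight is a percentage (weight="20" -> 0.20).
    The children's actual score values are ignored beyond "did it score?".
    """
    return weight / 100 if any(score > 0 for score in child_scores) else 0.0


print(rule_score(20, [0.0, 0.95]))  # 0.2
print(rule_score(20, [0.0, 0.05]))  # 0.2, same result despite a weaker child
```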

Modifying Attributes

Children restrictions

Any rule may be used as a child except those that are only allowed under a specific parent (CONDITION, ELSE, THEN and SKIP).

The DATA attribute may be used; it is automatically expanded into the appropriate TEXT child rules.

This rule behaves almost identically to the UNION rule. The difference lies in how overlapping phrase ranges are handled: UNION joins them together into a new phrase range covering the overlap, whereas LONGEST_WHEN_OVERLAPPING picks the longest of them.
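For contrast, the joining behaviour attributed to UNION can be sketched under the same assumptions: half-open token spans, with spans that merely touch treated as non-overlapping. `union_spans` is a hypothetical name, not part of Semaphore. The sketch replaces each group of overlapping spans with one new covering span, whereas LONGEST_WHEN_OVERLAPPING would keep one of the original spans from each group.

```python
from typing import List, Tuple

# Hypothetical representation: a half-open token range [start, end).
Span = Tuple[int, int]


def union_spans(spans: List[Span]) -> List[Span]:
    """Join (transitively) overlapping spans into new covering spans."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the last merged span: widen it to cover both.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Child evidence for the text "A A A" from PHRASE "A" and PHRASE "A A".
child_spans = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
print(union_spans(child_spans))  # [(0, 3)]
```

The covering span `(0, 3)` did not come from any single child, which is exactly the "new phrase range" behaviour described above.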

The best way to see the difference between these two rules is to consider the following example.

Example 1

The following rulebase fragment:

<longest_when_overlapping foreach="1" weight="20" >
    <phrase data="A" />
    <phrase data="A A" />
</longest_when_overlapping>

Evaluating the following document text:

A A A

The rule will fire with a score of 0.20 and have one evidence phrase range attached (the rightmost "A A").

However, using the UNION rule:

<union foreach="1" weight="20" >
    <phrase data="A" />
    <phrase data="A A" />
</union>

The rule will fire on the same text with the same score of 0.20, but its single evidence phrase range is "A A A", covering the whole text.
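Both results follow from the child evidence, which can be listed with a simplified phrase matcher. This sketch assumes whitespace tokenization and exact token matching; Semaphore's actual phrase matching may differ, and `phrase_matches` is a hypothetical helper returning half-open token spans.

```python
from typing import List, Tuple

# Hypothetical representation: a half-open token range [start, end).
Span = Tuple[int, int]


def phrase_matches(text: str, phrase: str) -> List[Span]:
    """Return every token span where `phrase` occurs in `text`.

    Assumption: whitespace tokenization and exact token equality.
    """
    tokens = text.split()
    target = phrase.split()
    n = len(target)
    return [
        (i, i + n)
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == target
    ]


text = "A A A"
for data in ("A", "A A"):
    print(data, phrase_matches(text, data))
# A [(0, 1), (1, 2), (2, 3)]
# A A [(0, 2), (1, 3)]
```

Every one of these five spans overlaps at least one other, so they form a single group. UNION covers the whole group with `(0, 3)`, the full "A A A", while LONGEST_WHEN_OVERLAPPING chooses between the two longest spans, `(0, 2)` and `(1, 3)`, and takes the rightmost, `(1, 3)`.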

The difference, then, is that <union> creates a new phrase range covering the overlap, whilst <longest_when_overlapping> picks the longest (or rightmost, in the event of a tie in length) phrase range in the overlap.

In many cases this difference is unimportant and either rule will do. However, where the resulting phrase range is used to group extractions, using <union> can result in unwanted groupings being returned.

Semaphore Classification Server Rulebase Reference