SELECT

Last Updated: July 8, 2026

Semaphore
Documentation

Selects the child phrase ranges which have some intersection with one another.

Score calculation

Scores the weight of the rule if any phrase ranges are selected.

Evidence calculation

The evidence is all of the children's phrase ranges which have some intersection with a phrase range from another child.

Attribute information

any attribute
TYPE (synonym for SELECTTYPE)
SELECTTYPE
WEIGHT - the score if any intersections are found

Children restrictions

FROM - an optional rule which restricts the selection when an intersection is found

Any rule other than those restricted to a specific parent

tagged Evidence

By default, any tagged Evidence overlapping either of the phrase ranges which have an intersection (or overlap) is bubbled up.

extra

A selecttype of contains will further restrict tagged Evidence to only that which is contained by the containing phrase range. This still applies regardless of which child the tagged Evidence comes from.

The SELECT rule will fire if there is any intersection between the evidence of all of its child rules. This rule is very similar to the INTERSECTION rule except this rule returns the overlapping phrase ranges rather than just the overlapped evidence.

The difference between this rule and INTERSECTION is analogous to the difference between the ANY and UNION rules: SELECT and ANY will only return some subset of their children’s evidence phrase ranges, whilst INTERSECTION and UNION can create new phrase ranges containing the required evidence.

The diagram below shows the evidence as lines (since a range has length) rather than as dots in a Venn diagram.

../rules/img/decorated_venn.png

The select rule will “select” the blue lines (i.e. those that are in the intersection, either partially or fully) and those blue lines will be the resulting set.

The intersect rule will “cut” those blue lines so that only the part which appears in the intersection (the red) will be used.

Both rules will ignore the green lines (phrase ranges which have no intersection, partial or otherwise) and will have an identical result for the 2nd blue line, which is “fully” intersected.

Note the diagram above is a simplification - in order for there to be any intersection there should be a corresponding “line” in both sets (but trying to draw this just over-complicates the diagram).

A possibly surprising fact this diagram doesn’t show is that the count for select is the count of intersecting phrase ranges from the 2 sets rather than the count of intersections.

For the intersection rule the result set is just the intersected part. So for the above diagram there would be 3 phrase ranges. For the select rule there will be 5 phrase ranges. The 2 partially intersected blue lines will have both lines (not shown on diagram) added to the result set giving 4 phrase ranges plus the single phrase range for the identical phrase range (the 2nd blue line) making a total count of 5.
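The counts above can be checked with a small model. The Python sketch below is not Semaphore code: it assumes phrase ranges can be modelled as half-open (start, end) integer pairs, the positions are invented to mirror the diagram (two partial overlaps, one identical pair and one non-intersecting range per child), and the helper names `overlaps`, `intersection_ranges` and `select_ranges` are hypothetical.

```python
from itertools import product

# Assumption: a phrase range is a half-open (start, end) pair of positions.
def overlaps(a, b):
    """True when the two ranges share at least one position."""
    return a[0] < b[1] and b[0] < a[1]

def intersection_ranges(left, right):
    """INTERSECTION-like result: only the overlapped part of each pair."""
    result = set()
    for a, b in product(left, right):
        if overlaps(a, b):
            result.add((max(a[0], b[0]), min(a[1], b[1])))
    return sorted(result)

def select_ranges(left, right):
    """SELECT-like result: the whole of every range that takes part in an overlap."""
    result = set()  # a set, so an identical pair is only counted once
    for a, b in product(left, right):
        if overlaps(a, b):
            result.add(a)
            result.add(b)
    return sorted(result)

# Invented positions: partial overlap, identical pair, partial overlap, no overlap.
left = [(0, 5), (10, 14), (20, 25), (30, 33)]
right = [(3, 8), (10, 14), (18, 22), (40, 42)]

cut = intersection_ranges(left, right)
picked = select_ranges(left, right)
print(len(cut), cut)
print(len(picked), picked)
```

Under these assumptions the intersection model yields 3 ranges and the select model yields 5, matching the counts described for the diagram.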

If your use case relies on this count being correct, consider using the SELECT_TAG rule over an INTERSECTION, which allows control over which side of the intersection will be used as evidence.

In some cases a UNION rule may be used over the SELECT, since this will collapse intersecting phrase ranges into single phrase ranges and so adjust the count. However, for this to work correctly each child's evidence must be distinct (i.e. non-overlapping) and the only intersections must be those between the children. SELECT_TAG over INTERSECTION will work in all cases, so it is probably the better choice.
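The effect of collapsing can be sketched in the same spirit. This Python fragment is an illustrative model only, not the Semaphore implementation: it assumes ranges are half-open (start, end) pairs, that a UNION merges any ranges that overlap, and it uses a hypothetical helper `merge_overlapping` with invented positions.

```python
def merge_overlapping(ranges):
    """UNION-like collapse: merge any ranges that overlap into one range."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start < merged[-1][1]:
            # Overlaps the previous range, so extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Case 1: each child's evidence is distinct, so the only overlaps are
# between the children. Five selected ranges describe three intersections.
selected = [(0, 5), (3, 8), (10, 14), (18, 22), (20, 25)]
collapsed = merge_overlapping(selected)
print(len(collapsed), collapsed)

# Case 2: one child's own ranges overlap each other, (0, 5) and (4, 9).
# There are two intersections, with (1, 2) and with (7, 8), but the
# collapse joins everything into a single range.
selected_overlapping = [(0, 5), (4, 9), (1, 2), (7, 8)]
collapsed_overlapping = merge_overlapping(selected_overlapping)
print(len(collapsed_overlapping), collapsed_overlapping)
```

In the first case the collapsed count (3) equals the number of intersections. In the second it does not (1 instead of 2), which is the situation the caveat above warns about.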

Example

   <select>   
       <sentence>
         <text data="A" />
         <text data="B" />
       </sentence>
       <any not="1" >
           <text data="C" />
       </any>
   </select>

  This is a sentence with A and B in it. This is a sentence without either. This is a sentence with A,B and C in it.

This finds all sentences with “A” and “B” in them that do not contain “C”.

This is equivalent to the simpler (and thus often preferred)

       <sentence>
         <text data="A" />
         <text data="B" />
         <text not="1" data="C" />
       </sentence>

and different to

   <intersection>   
       <sentence>
         <text data="A" />
         <text data="B" />
       </sentence>
       <any not="1" >
           <text data="C" />
       </any>
   </intersection>

which finds all sentences with “A” and “B” in them and then removes any “C” from those sentences (rather than removing the sentences with “C” in them).

In the above example we are selecting against notted evidence, so we do not have to use SELECT_TAG to tell which side of the intersection to use, since there is only 1 positive side.

Changing this to use positive evidence

   <select>   
       <sentence>
         <text data="A" />
         <text data="B" />
       </sentence>
       <any>
           <text data="C" />
       </any>
   </select>

This time this is not identical to the simpler

       <sentence>
         <text data="A" />
         <text data="B" />
         <text data="C" />
       </sentence>

since that will only contain the sentences with “A”, “B” and “C”, whilst the SELECT version will have “extra” phrase ranges for the “C” which was used to select.

In this case we know that SENTENCES cannot overlap, since that is implied in the definition of a sentence - and similarly <text data="C" /> cannot have overlapping evidence either - so we can use a UNION over the SELECT and get the equivalent of the simple SENTENCE rule

  <union>
   <select>   
       <sentence>
         <text data="A" />
         <text data="B" />
       </sentence>
       <any>
           <text data="C" />
       </any>
   </select>
  </union>

Changing the children of the SELECT so that they do potentially have overlapping evidence

  <union>
   <select>   
       <any>
         <phrase data="A" />
         <phrase data="A B" />
         <phrase data="A B C" />
       </any>
       <any>
           <text data="B" />
       </any>
   </select>
  </union>

By using the UNION we have collapsed the overlapping child sequences, which may be problematic if we use some SEQUENCE rule above this, for example

  <sequence>
  <union>
   <select>   
       <any>
         <phrase data="A" />
         <phrase data="A B" />
         <phrase data="A B C" />
       </any>
       <any>
           <text data="B" />
       </any>
   </select>
  </union>
  <text data="C" />
  </sequence>

This (possibly surprisingly) will not fire on

    A document consisting of A B C

Since the UNION has collapsed the input sequences “A B” and “A B C” into a single “A B C” - this means that the outer sequence will not fire since it is now only looking for the sequence “A B C C” which does not occur in the document. Changing this to use SELECT_TAG will work.
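The collapse can be traced on the sample document with a rough model. The Python sketch below is an assumption-laden illustration rather than Semaphore behaviour: it treats the document as whitespace-separated tokens, models phrase ranges as half-open token index pairs, treats a SEQUENCE as requiring the next item to start exactly where the previous range ends, and uses hypothetical helpers (`find_phrase`, `overlaps`, `merge_overlapping`, `followed_by`).

```python
def find_phrase(tokens, phrase):
    """All (start, end) token ranges where the phrase occurs (exact match)."""
    words = phrase.split()
    n = len(words)
    return [(i, i + n) for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]

def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def merge_overlapping(ranges):
    """UNION-like collapse of overlapping ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def followed_by(ranges, next_ranges):
    """SEQUENCE-like step (assumed): next item must start where a range ends."""
    return [(a[0], b[1]) for a in ranges for b in next_ranges if a[1] == b[0]]

tokens = "A document consisting of A B C".split()

# First child: <any> over the three phrases. Second child: <text data="B" />.
phrases = [r for p in ("A", "A B", "A B C") for r in find_phrase(tokens, p)]
text_b = find_phrase(tokens, "B")
text_c = find_phrase(tokens, "C")

# SELECT-like result: every range from either child that overlaps the other child.
from_phrases = [p for p in phrases if any(overlaps(p, b) for b in text_b)]
from_text_b = [b for b in text_b if any(overlaps(p, b) for p in phrases)]

# UNION over SELECT: everything collapses into the single range for "A B C".
collapsed = merge_overlapping(from_phrases + from_text_b)
print(collapsed, followed_by(collapsed, text_c))

# SELECT_TAG-like result: keep only the tagged (phrase) side, uncollapsed.
print(from_phrases, followed_by(from_phrases, text_c))
```

In this model the collapsed range ends at the end of the document, so nothing can follow it, whereas the uncollapsed “A B” range is immediately followed by “C”.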

  <sequence>
   <select_tag data="pick_me">   
     <intersection>
       <any tag="pick_me" >
         <phrase data="A" />
         <phrase data="A B" />
         <phrase data="A B C" />
       </any>
       <any>
           <text data="B" />
       </any>
     </intersection>
   </select_tag>
  <text data="C" />
  </sequence>

Here we have been explicit about which side of the intersection we want to select and so will have both “A B” and “A B C” sequences as evidence for the select_tag (since they both contain an intersection with “B”) and so the outer sequence will now find “A B C” as expected.

Semaphore Classification Server Rulebase Reference
