NEAR

Last Updated: July 8, 2026

The NEAR rule identifies a group of words near each other.

Score calculation

The rule scores its given weight if any of its children’s evidence forms a near group.

Evidence calculation

The evidence is the set of near groups found.

Attribute information

Any attribute
COUNT - sets the count of words to skip within a near group
DATA - the text to be parsed and appended as child TEXT rules
FOREACH - adjusts the score by the count of near groups found
NEARTYPE
PUNCTUATION
TYPE
WEIGHT - the score the rule will have if a near group is found

Children restrictions

Any rule other than those restricted to a specific parent

A near group is similar to a sequence or phrase except that order does not matter within a near group.

A B

is both a sequence of A B and a near group for A B, whilst

B A

is only the near group.

Due to the lack of order, SKIP rules do not apply to near groups. Instead use the COUNT attribute to specify the count of skip equivalents.

By default, a near group ignores punctuation within a sentence, but cannot cross a sentence boundary. Use the PUNCTUATION attribute to alter this behaviour if required.

Unlike the handling of skips within a sequence, a NEAR rule finds both short and long overlapping groups.

Example 1

    <near count="1">
        <text data="word1"/>
        <text data="word2"/>
    </near>

would match each of the following texts:

    This contains word1, word2.

    This contains word1 near word2.

and

    This contains word2 before word1.

The last two match because the number of tokens to be ignored is 1 (count="1"). The last one also shows that order does not matter within a near group.

Since the default punctuation handling is set to "ignore_in_sentence", the rule would not match:

    This sentence contains word1. Word2 starts the next sentence.

Example 2

The following data attribute:

    <near data="in the same words" />

is equivalent to the following child TEXT rules, each containing a single word:

    <near>
        <text data="in"/>
        <text data="the"/>
        <text data="same"/>
        <text data="words"/>
    </near>

and would fire in a document containing:

    This has a sentence with the same words in it.

Example 3

    <near count="10" type="in_order" data="A B" />

fires in a document containing:

    This has A, B and then B again in it

Two near groups will be found: the short one, A, B, and the long one, A, B and then B.

This is unlike the behaviour of SEQUENCE combined with SKIP:

    <sequence>
        <text data="A" />
        <skip count="10" />
        <text data="B" />
    </sequence>

The latter finds the longer sequence by default. To get the shorter result, the SKIP needs to be marked as non_greedy="1" or count="10?".

To only find the short solution, wrap the NEAR rule with SHORTEST_WHEN_OVERLAPPING.

Conversely, to only find the long solution, wrap the NEAR rule with LONGEST_WHEN_OVERLAPPING.
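As a rough sketch of the wrapping described above, the NEAR rule from Example 3 could be placed inside a wrapper rule. The element name shortest_when_overlapping is an assumption here: it presumes the wrapper follows the same lowercase element naming as the near, text, sequence and skip rules shown on this page, and that it needs no attributes of its own.

```xml
<!-- Assumed element name: shortest_when_overlapping (lowercased rule name, not confirmed on this page) -->
<shortest_when_overlapping>
    <!-- The NEAR rule from Example 3, unchanged -->
    <near count="10" type="in_order" data="A B" />
</shortest_when_overlapping>
```

Under the same naming assumption, swapping the wrapper for longest_when_overlapping would be expected to keep only the long group, A, B and then B, for the Example 3 text.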
