NEAR
- Last Updated: May 13, 2026
The NEAR rule identifies a group of words near each other.
Score calculation
Scores its given weight if any of its children’s evidence forms a near group.
Evidence calculation
The evidence is the set of near groups found.
Attribute information
- Any attribute
- COUNT - sets the count of words to skip within a near group
- DATA - the text to be parsed and appended as child TEXT rules
- FOREACH - adjusts the score by the count of near groups found
- NEARTYPE
- PUNCTUATION
- TYPE
- WEIGHT - the score the rule will have if a near group is found
Children restrictions
Any rule other than those restricted to a specific parent
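As an illustration, a NEAR rule can take a child that is not a TEXT rule. The sketch below assumes SEQUENCE is not restricted to a specific parent (it is used as a top-level rule in Example 3); word1, word2 and word3 are placeholders.

```xml
<!-- Near group formed from a two-word sequence and a single word.
     Assumption: SEQUENCE is allowed as a child of NEAR. -->
<near count="2">
  <sequence>
    <text data="word1"/>
    <text data="word2"/>
  </sequence>
  <text data="word3"/>
</near>
```

Under that assumption, the rule would be expected to look for the sequence word1 word2 near word3, in either order, with up to two skipped words.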
A near group is similar to a sequence or phrase except that order does not matter within a near group.
A B
is both a sequence of A B and a near group for A B, whilst
B A
is only the near group.
Due to the lack of order, SKIP rules do not apply to near groups. Instead use the COUNT attribute to specify the count of skip equivalents.
By default, a near group ignores punctuation within a sentence, but cannot cross a sentence boundary. Use the PUNCTUATION attribute to alter this behaviour if required.
Unlike the handling of skips within a sequence, a NEAR rule finds both short and long overlapping groups.
Example 1
<near count="1">
<text data="word1"/>
<text data="word2"/>
</near>
would match each of the following texts, because count="1" allows up to one token to be ignored between the words:
This contains word1, word2.
This contains word1 near word2.
and
This contains word2 before word1.
Because the default punctuation handling is "ignore_in_sentence", a near group cannot cross a sentence boundary, so the rule would not match:
This sentence contains word1. Word2 starts the next sentence.
Example 2
The following data attribute:
<near data="in the same words" />
is equivalent to the following child TEXT rules, each containing a single word:
<near>
<text data="in"/>
<text data="the"/>
<text data="same"/>
<text data="words"/>
</near>
and would fire in a document containing:
This has a sentence with the same words in it.
Example 3
<near count="10" type="in_order" data="A B" />
Fires in a document containing:
This has A, B and then B again in it
Two near groups are found: the short one, A, B, and the long one, A, B and then B.
This is unlike the behaviour of SEQUENCE combined with SKIP:
<sequence>
<text data="A" />
<skip count="10" />
<text data="B" />
</sequence>
The SEQUENCE rule finds only the longer sequence by default. To get the shorter result, mark the SKIP as non_greedy="1" or set count="10?".
To only find the short solution, wrap the NEAR rule with SHORTEST_WHEN_OVERLAPPING.
Conversely, to only find the long solution, wrap the NEAR rule with LONGEST_WHEN_OVERLAPPING.
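A minimal sketch of the wrapping, applied to the rule from Example 3. The element names are an assumption: they follow the lowercase convention of the other rules on this page and are not confirmed here.

```xml
<!-- Keep only the short near group (A, B) when groups overlap.
     Assumption: the rule is written as a lowercase element, like <near>. -->
<shortest_when_overlapping>
  <near count="10" type="in_order" data="A B" />
</shortest_when_overlapping>

<!-- Keep only the long near group (A, B and then B) when groups overlap. -->
<longest_when_overlapping>
  <near count="10" type="in_order" data="A B" />
</longest_when_overlapping>
```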