PUNCTUATION

Last Updated: July 8, 2026

Specifies the type of punctuation handling used when matching the phrase or near group.

It controls whether punctuation in the document is ignored during matching and, when it is ignored, whether the matched words are restricted to the same sentence or the same paragraph.

Punctuation is an inherited attribute: a value set on a containing rule becomes the default for all of its descendant rules.

If no punctuation attribute has been specified on the rule or any of its ancestors, the default is “ignore_in_sentence”.

A near, phrase or sequence rule uses the default (or inherited) value unless it also has a data attribute containing punctuation, in which case it uses punctuation=“none” instead. Setting the attribute directly on a rule is therefore different from letting it default: an explicitly set value is always used by that rule, and it also becomes the default for that rule’s descendants.
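The resolution order described above can be summarised as a short Python sketch. It is a hypothetical model for illustration only, not Semaphore's implementation: the `Rule` class, the `effective_punctuation` and `inherited_punctuation` helpers, and the test for "contains punctuation" (any character other than a letter, digit, underscore or whitespace) are all assumptions. It also assumes that only values set explicitly on an ancestor are inherited; this page does not say whether a data-forced “none” propagates to descendants.

```python
import re
from dataclasses import dataclass
from typing import Optional

DEFAULT_PUNCTUATION = "ignore_in_sentence"
# Rule types whose data attribute can force punctuation="none" (per the text above)
DATA_OVERRIDE_KINDS = {"near", "phrase", "sequence"}


@dataclass
class Rule:
    kind: str                          # hypothetical: "any", "phrase", "near", ...
    punctuation: Optional[str] = None  # value set directly on this rule, if any
    data: Optional[str] = None         # the rule's data attribute, if any
    parent: Optional["Rule"] = None    # containing rule, if any


def inherited_punctuation(rule):
    """Nearest explicitly set value on an ancestor, else the global default."""
    node = rule.parent
    while node is not None:
        if node.punctuation is not None:
            return node.punctuation
        node = node.parent
    return DEFAULT_PUNCTUATION


def effective_punctuation(rule):
    # 1. A value set directly on the rule always wins.
    if rule.punctuation is not None:
        return rule.punctuation
    # 2. A defaulted near/phrase/sequence rule whose data contains punctuation uses "none".
    if (rule.kind in DATA_OVERRIDE_KINDS and rule.data is not None
            and re.search(r"[^\w\s]", rule.data)):
        return "none"
    # 3. Otherwise inherit from the nearest ancestor that sets the attribute.
    return inherited_punctuation(rule)


# Example 2 from this page, modelled with the hypothetical Rule class
group = Rule("any", punctuation="ignore_in_sentence")
k1 = Rule("phrase", data="word1 word2", parent=group)
k2 = Rule("phrase", data="word1, word2", parent=group)
k3 = Rule("phrase", punctuation="ignore_in_sentence", data="word1, word2", parent=group)
for key, rule in (("k1", k1), ("k2", k2), ("k3", k3)):
    print(key, effective_punctuation(rule))
# k1 ignore_in_sentence
# k2 none
# k3 ignore_in_sentence
```

Checking the explicit attribute first is what lets k3 keep “ignore_in_sentence” even though its data contains a comma, while k2, which only inherits that value, falls back to “none”.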

Applies to

Values

“none” - All punctuation marks in the document are treated as distinct words
“ignore_in_sentence” - All punctuation marks in the document are ignored for phrase or near matching, but the rule only finds children within the same sentence
“ignore_in_paragraph” - As “ignore_in_sentence”, but children are restricted to the same paragraph
“ignore_all” - All punctuation marks in the document are ignored for phrase or near matching purposes, with no sentence or paragraph restriction
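To make the four values concrete, here is a minimal Python model of matching a phrase's words against document text. It is a sketch under stated assumptions, not Semaphore's matcher: the tokenizer (a word is a run of letters, digits or underscores; every other non-space character is a punctuation token), treating a blank line as a paragraph break, treating “.”, “!” and “?” as sentence terminators, and case-insensitive comparison are all assumptions, the last inferred from the examples on this page. The names `tokenize` and `phrase_matches` are hypothetical.

```python
import re

SENTENCE_END = {".", "!", "?"}         # assumed sentence terminators
TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # a word, or a single punctuation character


def is_punct(token):
    return re.fullmatch(r"\w+", token) is None


def tokenize(text):
    """Return (token, sentence_no, paragraph_no) triples for text."""
    tokens = []
    sentence = 0
    # Assumption: paragraphs are separated by a blank line.
    for para_no, para in enumerate(re.split(r"\n\s*\n", text)):
        for tok in TOKEN_RE.findall(para):
            tokens.append((tok.lower(), sentence, para_no))
            if tok in SENTENCE_END:
                sentence += 1
        sentence += 1  # a paragraph break always ends the current sentence
    return tokens


def phrase_matches(phrase, text, mode):
    """True if the tokens of phrase occur consecutively in text under mode."""
    want = [t.lower() for t in TOKEN_RE.findall(phrase)]
    doc = tokenize(text)
    if mode != "none":
        # Every ignore_* mode drops punctuation from both the phrase and the text.
        want = [t for t in want if not is_punct(t)]
        doc = [d for d in doc if not is_punct(d[0])]
    if not want:
        return False
    n = len(want)
    for i in range(len(doc) - n + 1):
        window = doc[i:i + n]
        if [tok for tok, _, _ in window] != want:
            continue
        if mode == "ignore_in_sentence" and len({s for _, s, _ in window}) > 1:
            continue
        if mode == "ignore_in_paragraph" and len({p for _, _, p in window}) > 1:
            continue
        return True
    return False


texts = {
    "comma": "This contains word1, word2 within a sentence but separated by a comma.",
    "sentence": "This sentence contains word1. Word2 is the start of the next sentence.",
    "paragraph": "This paragraph ends with word1.\n\nWord2 is the start of the next paragraph.",
}
modes = ["none", "ignore_in_sentence", "ignore_in_paragraph", "ignore_all"]
for name, text in texts.items():
    print(name, [m for m in modes if phrase_matches("word1 word2", text, m)])
# comma ['ignore_in_sentence', 'ignore_in_paragraph', 'ignore_all']
# sentence ['ignore_in_paragraph', 'ignore_all']
# paragraph ['ignore_all']
```

Each printed list names the modes under which “word1 word2” matches in this model; the results agree with the outcomes given for phrases k1 to k4 in the Example section.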

Example

Consider the following four phrase rules (keys k1, k2, k3 and k4), which differ only in their punctuation handling:

<phrase _key="k1" punctuation="none">
    <text data="word1"/>
    <text data="word2"/>
</phrase>
<phrase _key="k2" punctuation="ignore_in_sentence">
    <text data="word1"/>
    <text data="word2"/>
</phrase>
<phrase _key="k3" punctuation="ignore_in_paragraph">
    <text data="word1"/>
    <text data="word2"/>
</phrase>
<phrase _key="k4" punctuation="ignore_all">
    <text data="word1"/>
    <text data="word2"/>
</phrase>

With the following text:

This contains word1, word2 within a sentence but separated by a comma.

Phrase k1 does not match, because the comma is treated as a word between word1 and word2. Phrases k2, k3 and k4 match.

With:

This sentence contains word1. Word2 is the start of the next sentence.

Phrases k1 and k2 do not match: k1 treats the full stop as a word, and k2 requires both words to be in the same sentence. Phrases k3 and k4 match.

With:

This paragraph ends with word1.

Word2 is the start of the next paragraph.

Phrases k1, k2 and k3 do not match. Phrase k4, which ignores all punctuation and has no sentence or paragraph restriction, still matches.

Example 2

This example shows the difference between inheriting a default value and setting the attribute directly on a rule.

<any punctuation="ignore_in_sentence" >
    <phrase _key="k1" data="word1 word2" />
    <phrase _key="k2" data="word1, word2" />
    <phrase _key="k3" punctuation="ignore_in_sentence" data="word1, word2" />
</any>

With the following text:

This contains word1, , word2 within a sentence but separated by 2 commas.

Phrase k1 matches: it inherits “ignore_in_sentence” from the containing any rule, so both commas are ignored.

Phrase k2 does not match: because its data attribute contains a comma, it uses punctuation=“none” instead of the inherited value, so the comma in its data must match a single comma in the text and the extra comma is not skipped.

Phrase k3 matches because its punctuation attribute is set directly rather than inherited, so the comma in its data does not force “none”.

Semaphore Classification Server Rulebase Reference
