STEM

Last Updated: July 8, 2026

Semaphore
Documentation

Specifies whether the stemmed version of the text is matched.

The stemming functionality depends on whether a language pack is installed. If a language pack is not installed, the stemmer type may be specified directly. In Classification Server (CS), non-stemmed matching is used by default; Publisher creates rules using the label default, which is stemmed matching.

The stem attribute is valid wherever a language attribute is valid (on all rulebase nodes, not just on rules).

In CS, stem is an inherited attribute: it may be set on a containing rule and then becomes the default for all child rules. In template rules that Publisher processes (such as phraselist), the stem attribute is not inherited from parent rules and must be present on the rule itself; if it is absent, the rule takes the stemming setting of the label in the model. For this reason stem is a valid attribute on all rules, although it only has an effect on text rules.
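A minimal sketch of inheritance in CS. The container element name `rule` is hypothetical and used only for illustration; the `text` element and the `stem` and `data` attributes are the ones shown in the Example section of this page. The sketch also assumes that an explicit stem value on a child replaces the inherited default.

```xml
<!-- Hypothetical container element; substitute the container rule used in your rulebase. -->
<rule stem="1">
  <!-- No stem attribute: inherits stem="1" from the containing rule. -->
  <text data="knit"/>
  <!-- Explicit stem attribute: assumed to replace the inherited default for this rule only. -->
  <text stem="0" data="purl"/>
</rule>
```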

Applies to

TEXT
Can be inherited from any container rule

Values

“1” - Stemmed value is matched
“0” - Unstemmed value is matched
“N” - Uses stemmer variant N where N is a valid variant for the particular language

When a language pack is installed, the only stemmer available is the one provided by the language pack. Any stem attribute value greater than 0 uses that single stemmer.

When the stem attribute is set to “1” (i.e., stem=“1”), wildcard characters do not work in labels. If you want to use wildcards to find various patterns of labels, then you must:

Create rules in the publishing templates that set the stem value to “0” for handling labels with wildcards.

In the model, change the individual label's settings to turn off stemming for that label only (see working-with-label-settings).
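As an illustration, a text rule for a wildcard label might turn stemming off as follows. The `*` wildcard character and the `data` value are assumptions made for this sketch; this page does not define the wildcard syntax.

```xml
<!-- stem="0": unstemmed matching, so the wildcard in the label can take effect. -->
<!-- The "*" wildcard character is assumed here for illustration only. -->
<text stem="0" data="knit*"/>
```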

Non-Language Pack Mode

When no language pack is installed for the specified language, the following stemmer variants are available.

English:

0 - No stem
1 - Original Porter algorithm
2 - Modified Porter algorithm (Marathon stemmer)
3 - Morphological stemmer
4 - Morphological and Derivational

French, Italian, German, Spanish, Dutch, Portuguese, Danish, Norwegian, Swedish:

0 - No stem
1 - Porter algorithm

All other languages:

0 - No stem only

Specifying a stemmer variant that does not exist for the language is an error and stops the rulebase from loading.
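The variant lists above can be read as follows. These fragments assume that no language pack is installed and that the language is set with a `language` attribute taking codes such as `en` and `fr`; the exact codes are an assumption, see LANGUAGE for the valid values.

```xml
<!-- English, variant 3: morphological stemmer. -->
<text language="en" stem="3" data="knit"/>

<!-- French offers only variants 0 and 1; variant 1 is the Porter algorithm. -->
<text language="fr" stem="1" data="tricoter"/>

<!-- By contrast, language="fr" with stem="3" names a variant that does not exist
     for French, so the rulebase would fail to load. -->
```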

Language Pack Mode

Languages are restricted to those for which a valid language pack is installed on the machine. Specifying a language that is not installed and licensed is an error and stops the rulebase from loading.

There is no concept of stemmer variants with language packs, since each language pack provides only one stemmer. Any stem attribute value greater than 0 is treated as stem=“1”.
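For instance, with a language pack installed, the two rules in this sketch would be expected to behave identically, because the variant number carries no meaning in this mode:

```xml
<!-- Language pack mode: any stem value above 0 is treated as stem="1". -->
<text stem="1" data="knit"/>
<text stem="4" data="knit"/>
```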

Example

    <text stem="1" data="knit"/>

This will match “knit”, “knitted”, etc. in the document.

Backwards Compatibility

When a language pack is not installed, setting the language and stemmer variant in a single language attribute, as was done previously (e.g. “en2”), is still supported. In this case stem=“1” means: use the stemmer variant that the legacy language value (here en2) would previously have selected.

All rulebase files are processed in backwards compatibility mode until either of two conditions is met:

1. a stem=“n” attribute is found on the rulebase node
2. a stem=“n” attribute is found on a rule node, where n is greater than 1

The first mechanism (setting stem on the rulebase node) is generally preferred, since it does not run the risk of misinterpretation.
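The two styles can be contrasted in a short sketch. The root element name `rulebase`, the language code `en`, and the simplified nesting are assumptions made for illustration; only the legacy value “en2” and the stem attribute come from this page.

```xml
<!-- Legacy style: language and stemmer variant combined in one attribute value. -->
<!-- stem="1" here means "use the variant that en2 selected previously". -->
<text language="en2" stem="1" data="knit"/>

<!-- Preferred style: a stem attribute on the rulebase node ends backwards
     compatibility mode, so the variant number is read directly. -->
<rulebase language="en" stem="2">
  <!-- Inherits English with variant 2, the Modified Porter algorithm. -->
  <text data="knit"/>
</rulebase>
```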

See LANGUAGE for more details on backwards compatibility.

Semaphore Classification Server Rulebase Reference
