STEM
- Last Updated: May 13, 2026
Specifies whether the stemmed version of the text is matched.
The stemming functionality depends on whether a language pack is installed. If a language pack is not installed, the stemmer type may be specified directly. In CS, unstemmed matching is used by default. Publisher, by contrast, creates rules using the label default, which is stemmed matching.
The stem attribute is valid wherever a language attribute is valid (all rulebase nodes rather than just rules).
In CS, stem is an inherited attribute: it may be set on a containing rule and becomes the default for all child rules. In template rules (such as phraselist) that Publisher processes, the stem attribute is not inherited from parent rules and must be set on the rule itself; if it is not, the rule takes the stem setting of the label in the model. Stem is therefore a valid attribute for all rules, but it only has an effect on text rules.
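The CS inheritance behaviour can be pictured with a small Python sketch. It is an illustrative model only, not Semaphore code: the `Node` class, its fields, and the `effective_stem` function are hypothetical names, and the fallback value of "0" is an assumption based on the statement that CS uses unstemmed matching by default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a rulebase node (not a Semaphore type)."""
    name: str
    stem: Optional[str] = None       # explicit stem attribute, if one is set
    parent: Optional["Node"] = None  # containing node, if any


def effective_stem(node: Node, cs_default: str = "0") -> str:
    """Return the nearest explicit stem value, walking up through containers."""
    current: Optional[Node] = node
    while current is not None:
        if current.stem is not None:
            return current.stem
        current = current.parent
    # No stem set on the node or any container: assume the CS default.
    return cs_default


container = Node("container rule", stem="1")
child = Node("child text rule", parent=container)
override = Node("overriding text rule", stem="0", parent=container)
orphan = Node("text rule with no stem set anywhere")
```

In this model, `effective_stem(child)` picks up "1" from its container, while `effective_stem(override)` keeps its own "0". Template rules processed by Publisher would not follow this lookup, since they do not inherit from parent rules.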
Applies to
- TEXT
- Can be inherited from any container rule
Values
- “1” - Stemmed value is matched
- “0” - Unstemmed value is matched
- “N” - Uses stemmer variant N where N is a valid variant for the particular language
When a language pack is installed, the only stemmer available is the one provided by the language pack. Any stem attribute value >0 will use the single stemmer available from the installed language pack.
To turn off stemming for labels that contain wildcards, do one of the following:
- Create rules in the publishing templates that set the stem value to “0” for labels with wildcards.
OR
- In the model, change the individual label's settings to turn off stemming for that label only (see working-with-label-settings).
Non-Language Pack Mode
When no language pack is installed for the specified language, the following stemmer variants are available:
English :-
- 0 - No stem
- 1 - Original Porter algorithm
- 2 - Modified Porter algorithm (Marathon stemmer)
- 3 - Morphological stemmer
- 4 - Morphological and Derivational
French, Italian, German, Spanish, Dutch, Portuguese, Danish, Norwegian, Swedish :-
- 0 - No stem
- 1 - Porter algorithm
All other languages :-
- 0 - No stem only
Specifying a stemmer variant that does not exist for the language is an error that stops the rulebase from loading.
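The variant lists for non-language pack mode can be summarised as a lookup plus a check. The following Python sketch is a model of the documented rule, not the product's implementation: the function names are hypothetical, and keying the table by language name is an assumption made for readability (a real rulebase identifies languages through the language attribute).

```python
# Variant tables transcribed from the lists for non-language pack mode.
ENGLISH_VARIANTS = {0, 1, 2, 3, 4}
PORTER_ONLY_VARIANTS = {0, 1}
PORTER_ONLY_LANGUAGES = {
    "French", "Italian", "German", "Spanish", "Dutch",
    "Portuguese", "Danish", "Norwegian", "Swedish",
}


def allowed_variants(language: str) -> set:
    """Return the stemmer variants listed for a language (hypothetical helper)."""
    if language == "English":
        return ENGLISH_VARIANTS
    if language in PORTER_ONLY_LANGUAGES:
        return PORTER_ONLY_VARIANTS
    # Every other language only offers "no stem".
    return {0}


def check_variant(language: str, variant: int) -> None:
    """Raise ValueError for a variant the language does not offer.

    This stands in for the load-time error described in the text.
    """
    if variant not in allowed_variants(language):
        raise ValueError(
            f"stemmer variant {variant} is not available for {language}"
        )
```

For example, `check_variant("English", 4)` passes silently, whereas `check_variant("French", 2)` raises, mirroring a rulebase that would fail to load.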
Language Pack Mode
Languages are restricted to those for which a valid language pack is installed on the machine. Specifying a language that is not installed and licensed is an error that stops the rulebase from loading.
There is no concept of stemmer variants with language packs, since each language pack provides only one stemmer. Any stem attribute value greater than 0 is treated as stem=“1”.
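The collapsing of variant numbers in language pack mode amounts to a one-line rule. A minimal Python sketch, assuming the stem value has already been parsed into a non-negative integer (the function name is hypothetical):

```python
def normalize_stem_with_language_pack(stem: int) -> int:
    """Map any stem value greater than 0 to 1; leave 0 (unstemmed) as it is.

    Assumes a non-negative integer; the source text does not describe
    any other kind of value in language pack mode.
    """
    return 1 if stem > 0 else 0
```

Under this model a rule written with stem=“3” for non-language pack mode behaves exactly like stem=“1” once a language pack is installed.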
Example
<text stem="1" data="knit"/>
This matches “knit”, “knitted”, etc. in the document.
Backwards Compatibility
When a language pack is not installed, setting the language and stemmer variant in a single language attribute, as was done previously (e.g. “en2”), is still supported. In this case, stem=“1” means use the stemmer variant that would previously have been used for the en2 language.
All rulebase files are processed in backwards compatibility mode until either of two conditions is met:-
- 1] a stem=“n” attribute on the rulebase node is found
- 2] a stem=“n” attribute on a rule node where n is > 1
The first mechanism is generally preferred, since it does not run the risk of misinterpretation.
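The two conditions can be expressed as a simple predicate. This Python sketch is only a reading of the two conditions listed here, with hypothetical names: `rulebase_stem` is the stem value on the rulebase node (or `None` if the attribute is absent), and `rule_stems` holds the stem values found on rule nodes. Treating any stem value on the rulebase node as sufficient is an assumption drawn from the wording of the first condition.

```python
from typing import Iterable, Optional


def leaves_backwards_compatibility(
    rulebase_stem: Optional[int],
    rule_stems: Iterable[int],
) -> bool:
    """Return True if either documented condition ends compatibility mode."""
    # Condition 1: a stem attribute is present on the rulebase node.
    if rulebase_stem is not None:
        return True
    # Condition 2: some rule node carries a stem value greater than 1.
    return any(n > 1 for n in rule_stems)
```

A rulebase whose rules only use stem=“0” or stem=“1”, with no stem on the rulebase node, stays in backwards compatibility mode in this model, which is why setting stem on the rulebase node is the less ambiguous option.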
See LANGUAGE for more details on backwards compatibility.