
Semaphore Publisher Template Reference

wordtypes attribute

  • Last Updated: May 29, 2026

The wordtypes attribute specifies which labels in the model a template rule generates rules for, so that different weightings can be applied to different kinds of label. A rule is generated only for a label whose word type matches one defined in the wordtypes attribute on the template rule.

Note: If no wordtypes attribute is specified on a rule, all word types are valid.

Note: When wordtypes is combined with labeltypes and/or behaviourtypes, rules are generated only for labels that match all of the specified wordtypes, labeltypes and behaviourtypes.

Values

These are the base wordtypes attribute values (with example labels):

Word type  Example             Description
SENTENCE   “alpha beta gamma”  Multiple word labels where all words are WORD word types.
PHRASE     “Alpha Beta Gamma”  Multiple word labels where all words are NOUN word types, i.e. all words are capitalized.
MIXED      “Alpha beta Gamma”  Multiple word labels where words are a mix of WORD and NOUN word types.
ACRONYM    “ABG”               Single word labels that are all uppercase.
ACROLIST   “ABG ABG”           Multiple word labels that are all uppercase.
WORD       “alpha”             Single word labels that have a lowercase first letter.
NOUN       “Alpha”             Single word labels that have an uppercase first letter.
CODE [1]   “Alpha-123”         Labels such as “Cat-123” (labels with no spaces, at least one number, and a hyphen) or “MultiCaps” (labels with no spaces and multiple capital letters).

Variant generators can produce variants whose word type differs from that of the original label. For example, with the acronymHandler on, the ACRONYM “ABG” will create an ACROLIST variant, “A. B. G.”. Make sure the wordtypes value on your rule includes every word type that the variant generators may produce.
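The table descriptions can be approximated in a few lines of Python. This is only a sketch of the rules as worded above, not Publisher's actual classifier: the function name base_word_type is hypothetical, and the order in which the checks run (ACRONYM before CODE, for instance) is an assumption.

```python
def base_word_type(label: str) -> str:
    """Guess a base word type from a label's spacing and capitalisation."""
    words = label.split()
    if not words:
        raise ValueError("label must contain at least one word")

    if len(words) == 1:
        word = words[0]
        # Assumption: an all-uppercase alphabetic word is an ACRONYM, not a CODE.
        if word.isupper() and word.isalpha():
            return "ACRONYM"
        hyphen_and_digit = "-" in word and any(c.isdigit() for c in word)
        multiple_capitals = sum(c.isupper() for c in word) > 1
        if hyphen_and_digit or multiple_capitals:
            return "CODE"
        return "NOUN" if word[0].isupper() else "WORD"

    # Multiple word labels.
    if all(w.isupper() for w in words):
        return "ACROLIST"
    capitalised = [w[0].isupper() for w in words]
    if all(capitalised):
        return "PHRASE"
    if not any(capitalised):
        return "SENTENCE"
    return "MIXED"


for label in ("ABG", "A. B. G.", "Alpha-123", "Alpha beta Gamma"):
    print(label, "->", base_word_type(label))
```

Under these assumptions the label “ABG” and its variant “A. B. G.” fall into different word types, which is why a rule restricted to ACRONYM alone would skip the variant.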

There are also these super-sets of wordtypes that can be used as wordtypes values:

ALLTYPES = SENTENCE | PHRASE | MIXED | ACRONYM | ACROLIST | WORD | NOUN | CODE

ALLTYPES_BAR_CODE = (ALLTYPES except CODE)

MULTIWORD = PHRASE | SENTENCE | MIXED
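As an illustration of how the super-sets relate to the base values, the sketch below expands a wordtypes value into the set of base word types it covers. The helper name expand_wordtypes is hypothetical, and the sketch assumes that wordtypes accepts several values separated by “|”, in the same way the labeltypes attribute does in the example on this page.

```python
BASE_TYPES = ("SENTENCE", "PHRASE", "MIXED", "ACRONYM",
              "ACROLIST", "WORD", "NOUN", "CODE")

# Super-sets exactly as defined above.
SUPER_SETS = {
    "ALLTYPES": frozenset(BASE_TYPES),
    "ALLTYPES_BAR_CODE": frozenset(BASE_TYPES) - {"CODE"},
    "MULTIWORD": frozenset({"PHRASE", "SENTENCE", "MIXED"}),
}


def expand_wordtypes(value: str) -> frozenset:
    """Expand a wordtypes value into the base word types it covers."""
    expanded = set()
    # Assumption: values are combined with "|", as labeltypes is in the example.
    for name in value.split("|"):
        name = name.strip()
        if name in SUPER_SETS:
            expanded |= SUPER_SETS[name]
        elif name in BASE_TYPES:
            expanded.add(name)
        else:
            raise ValueError(f"unknown word type: {name!r}")
    return frozenset(expanded)


print(sorted(expand_wordtypes("MULTIWORD")))
print(sorted(expand_wordtypes("ALLTYPES_BAR_CODE")))
```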

[1] The CODE word type is used in Semaphore to work around certain issues with hyphen handling, specifically the issue whereby a label that contains no spaces, a hyphen, and a number may not have its hyphen dropped by CS. A side effect of introducing CODE is that it can be defined with a regular expression, either positively (“what is a CODE word type”) or negatively (“what is not a CODE word type”). The regular expressions for identifying a CODE are defined in Publisher's advancedConfig.xml file using the following properties:

<object id="CodeTypeHandlerNegative" class="publisher.bean.preprocessor.WordTypeHandler.WordTypeHandler">
  <property name="codeRegexNegative" value="(^([-'\w](?!\d))*$)|[\s]" />
</object>

<object id="CodeTypeHandlerPositive" class="publisher.bean.preprocessor.WordTypeHandler.WordTypeHandler">
  <property name="codeRegexPositive" value="(^[-'\w]*$)(?![\s])" />
</object>
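Before changing either property it can help to try the patterns against a few sample labels. The sketch below does this with Python's re module, purely for experimentation. How Publisher applies the two patterns (search versus full match, and how the positive and negative results are combined) is not stated here, so the use of re.search is an assumption, and Python's \w is Unicode-aware by default, which may differ from Publisher's regex engine.

```python
import re

# Patterns copied verbatim from the advancedConfig.xml properties above.
CODE_REGEX_NEGATIVE = r"(^([-'\w](?!\d))*$)|[\s]"
CODE_REGEX_POSITIVE = r"(^[-'\w]*$)(?![\s])"


def pattern_hits(pattern: str, label: str) -> bool:
    """Return True if the pattern is found anywhere in the label."""
    # Assumption: search semantics; Publisher may apply the pattern differently.
    return re.search(pattern, label) is not None


for label in ("Alpha-123", "alpha", "Drug Barons"):
    print(
        f"{label!r}: negative={pattern_hits(CODE_REGEX_NEGATIVE, label)}, "
        f"positive={pattern_hits(CODE_REGEX_POSITIVE, label)}"
    )
```

With these inputs the negative pattern hits “alpha” and “Drug Barons” but not “Alpha-123”, while the positive pattern hits “Alpha-123” and “alpha” but not “Drug Barons”.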

Applies to

Example

For a model that contains the preferred label 'Drug Trafficking' with the alternate labels 'Drug Barons', 'Dealer', 'LSD' and 'THC', the labels would be assigned the following word types:

Label             Label type  Word type
Drug Trafficking  prefLabel   MULTIWORD
Drug Barons       altLabel    MULTIWORD
Dealer            altLabel    NOUN
LSD               altLabel    ACRONYM
THC               altLabel    ACRONYM

Under the base definitions, 'Drug Trafficking' and 'Drug Barons' are PHRASE labels; the table lists them under the MULTIWORD super-set, which contains PHRASE.

Using the following template rule:

<phraselist field="body" case="0" stem="1" weight="40" foreach="1" labeltypes="prefLabel|altLabel" wordtypes="MULTIWORD"/>

would generate the output:

<phrase field="body" case="0" stem="1" weight="40" foreach="1" data="Drug Barons"/>
<phrase field="body" case="0" stem="1" weight="40" foreach="1" data="Drug Trafficking"/>
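The filtering in this example can be sketched in Python as follows. This is an illustrative model only, not Publisher's implementation: the function generate_phrases and the LABELS structure are hypothetical, the two multi-word labels are given the base type PHRASE (a member of MULTIWORD), and the alphabetical ordering of the output is an assumption made to match the listing above.

```python
from xml.sax.saxutils import quoteattr

MULTIWORD = {"PHRASE", "SENTENCE", "MIXED"}

# (label, label type, base word type) for the example model.
LABELS = [
    ("Drug Trafficking", "prefLabel", "PHRASE"),
    ("Drug Barons", "altLabel", "PHRASE"),
    ("Dealer", "altLabel", "NOUN"),
    ("LSD", "altLabel", "ACRONYM"),
    ("THC", "altLabel", "ACRONYM"),
]


def generate_phrases(labels, labeltypes: str, wordtypes: str) -> list:
    """Emit one <phrase> per label that matches both attribute filters."""
    allowed_labeltypes = set(labeltypes.split("|"))
    allowed_wordtypes = set()
    for name in wordtypes.split("|"):
        # Only the MULTIWORD super-set is expanded in this small sketch.
        allowed_wordtypes |= MULTIWORD if name == "MULTIWORD" else {name}

    rules = []
    for text, labeltype, wordtype in sorted(labels):
        if labeltype in allowed_labeltypes and wordtype in allowed_wordtypes:
            rules.append(
                '<phrase field="body" case="0" stem="1" weight="40" '
                f'foreach="1" data={quoteattr(text)}/>'
            )
    return rules


for rule in generate_phrases(LABELS, "prefLabel|altLabel", "MULTIWORD"):
    print(rule)
```

'Dealer', 'LSD' and 'THC' pass the labeltypes filter but are dropped because their word types are outside MULTIWORD.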
