
Semaphore Classification Server Rulebase Reference

Regex Attribute

  • Last Updated: May 13, 2026

In template mode, a category rule does not simply return the “name” attribute if it is scored above the threshold. Instead, the text is extracted from descendant rule(s) with the CAPTURE attribute set.

For each data phrase in the document, this optional regex replacement is applied (see here for a synopsis of the regex syntax used).

The syntax is taken from sed, so s/A/B/ substitutes B for all occurrences of A in the captured value. Note that sed options (/iG etc.) are not currently supported, but may be added in the future if required.
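As a rough illustration of the s/A/B/ behaviour, the following Python sketch parses a sed-style expression and replaces every occurrence. The helper name `apply_sed` is hypothetical, the sketch assumes `/` is the delimiter and that neither side contains it, and Python's `re` module only stands in for the regex engine used by Classification Server, whose syntax may differ.

```python
import re

def apply_sed(expression: str, value: str) -> str:
    """Apply a sed-style 's/A/B/' expression to value (illustrative only)."""
    # Assumption: '/' is the delimiter and neither A nor B contains it.
    parts = expression.split("/")
    if len(parts) != 4 or parts[0] != "s" or parts[3] != "":
        raise ValueError(f"unsupported expression: {expression!r}")
    pattern, replacement = parts[1], parts[2]
    # No count limit is passed, so every occurrence is replaced.
    return re.sub(pattern, replacement, value)

print(apply_sed("s/A/B/", "A-A-C"))
print(apply_sed("s/1[6789][0-9][0-9].*/Too Early/", "1993-12-01"))
```

A value that the pattern does not match, such as "2011-03-01" with the second expression, is returned unchanged.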

The grouping of the extracted phrase ranges into distinct firings (with the appropriate foreach count) happens after this regex replacement step, so the replacement may be used to merge found data into a single firing if appropriate.
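The ordering matters because values that differ before replacement can become identical afterwards. A minimal Python sketch of that effect follows, assuming the captured values arrive as a plain list of strings and that firings are grouped by exact string equality. Both are simplifications, and `merge_firings` is a hypothetical name rather than part of the product.

```python
import re
from collections import Counter

def merge_firings(values, pattern, replacement):
    """Apply the replacement first, then group identical results."""
    replaced = [re.sub(pattern, replacement, v) for v in values]
    # Counter keeps first-seen order and counts duplicate values.
    return Counter(replaced)

captured = ["2012-06-21", "1993-12-01", "1932-03-01"]
firings = merge_firings(captured, r"1[6789][0-9][0-9].*", "Too Early")
for value, count in firings.items():
    print(value, count)
```

Here the two early dates collapse into one "Too Early" entry with a count of 2, while the remaining date stays a separate entry.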

NB If the extracted phrase range crosses a paragraph boundary, remember that a paragraph separator “.\n\n” is present in the data passed to the regex search/replace. When this data is written to the XML output it becomes part of an XML attribute (the value attribute on the META node), where the \n is not valid, so it is removed. This has been a source of confusion when writing the regex: the output does not show the \n characters, but your search pattern still needs to take account of them (and either leave or remove them as seems appropriate).
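The following Python sketch shows why a pattern written against the XML output can fail to match. The sample text, the exact shape of the captured data, and the helper `as_xml_attribute` are assumptions made for illustration; only the “.\n\n” separator and its removal from the attribute come from the note above.

```python
import re

# Assumed shape of a captured range that spans two paragraphs:
# the separator ".\n\n" sits between them when the regex runs.
captured = "First paragraph.\n\nSecond paragraph"

# A pattern written from what the XML output shows does not match...
naive = re.sub(r"paragraph\.Second", "paragraph; second", captured)

# ...whereas a pattern that allows for the newlines does.
aware = re.sub(r"paragraph\.\n\nSecond", "paragraph; second", captured)

def as_xml_attribute(value: str) -> str:
    """Mimic the newline removal applied to the value attribute."""
    return value.replace("\n", "")

print(repr(as_xml_attribute(naive)))
print(repr(as_xml_attribute(aware)))
```

The first result still reads as the unmodified text with the newlines stripped, which is what makes the failed match easy to overlook.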

Applies to

Values

  • Regex replacement to apply to captured data
  • e.g. “s/A/B/” replaces A with B

Examples

The following:

<category class="DATES" foreach="1" weight="50" template="1" regex="s/1[6789][0-9][0-9].*/Too Early/">
   <expression type="date" capture="1" />
</category>

Evaluated against the following document fragment:

On the morning of Wednesday 21st June 2012 I updated this documentation.  
I'm not quite sure on what date it was originally written but am guessing that is was some time in Q2 2009.  
On the 3/1/2011 it had been updated which is another way of saying 1st March 2011 or possibly 1/3/2011 if using Advanced Language Packs.  
We should ignore 1/12/1993 and 1/3/1932 since we don't care about early dates

Would return:

...
<META name="DATES" value="2011-01-03" score="0.50" />
<META name="DATES" value="2011-03-01" score="0.75" />
<META name="DATES" value="2012-06-21" score="0.50" />
<META name="DATES" value="Too Early" score="0.75" />
...

Note that the above example only shows how distinct values may be merged into a single firing. If you wanted to remove dates before 2000, using a slice restriction on the <expression> would be better than merging the earlier dates using a regular expression:

<category class="DATES" foreach="1" weight="50" template="1" >
   <expression type="date" capture="1" data="[2000-01-01:]"/>
</category>

See also
