EXTRACT_REGEX
- Last Updated: May 13, 2026
Defines an optional regular expression search and replacement (regex) to be applied to an extraction.
The syntax for the regex is modeled on the sed substitution command syntax:
s/regexp/replacement/
where regexp and replacement follow the Perl syntax for regular expressions. See the Perl regular expression documentation for details.
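As a rough illustration of how an s/regexp/replacement/ expression breaks down into its two parts, the following Python sketch splits the expression and applies it to a string. It is not the product's implementation: the function names parse_substitution and apply_substitution are hypothetical, Python's re module stands in for the Perl regex engine (the two dialects differ in places), the replacement is treated as literal text with no back-references, and only the first match is replaced, as in a sed substitution without the g flag.

```python
import re
from typing import Tuple


def parse_substitution(expr: str) -> Tuple[str, str]:
    """Split 's/regexp/replacement/' into (regexp, replacement).

    Hypothetical helper for illustration only. Slashes escaped as '\\/'
    are kept inside their part; an escaped backslash directly before a
    slash is not handled (a deliberate simplification).
    """
    # Split on every '/' that is not preceded by a backslash.
    parts = re.split(r"(?<!\\)/", expr)
    # A well-formed expression yields ['s', regexp, replacement, ''].
    if len(parts) != 4 or parts[0] != "s" or parts[3] != "":
        raise ValueError(f"not a valid s/regexp/replacement/ expression: {expr!r}")
    regexp = parts[1].replace("\\/", "/")
    replacement = parts[2].replace("\\/", "/")
    return regexp, replacement


def apply_substitution(expr: str, value: str) -> str:
    """Apply the substitution to value, replacing the first match only."""
    regexp, replacement = parse_substitution(expr)
    # A callable replacement makes Python insert the text literally
    # instead of interpreting backslash sequences in it.
    return re.sub(regexp, lambda match: replacement, value, count=1)


if __name__ == "__main__":
    print(apply_substitution("s/Trichet/TRICHET/", "Jean-Claude Trichet"))
```

Splitting the expression first keeps the search part and the replacement part separately inspectable, which can help when checking a value before adding it to a rule.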
Applies to
Any rule that has an EXTRACT_NAME, EXTRACT_EVIDENCE or EXTRACT_TAGS attribute set
Values
- "s/XXXX/YYYY/", where XXXX is the search part of the regex and YYYY is the replacement
Other attributes having special meaning for any rule with this attribute
Example
The following document:
Jean-Claude Trichet announced today a rise of 1/2 point in interest rates.
In a separate intervention the governor of the European Central Bank announced that the
institution will keep a firm handle on inflation.
Evaluated against the following rulebase fragment:
<expression type="PERSON" extract="1" extract_name="person" extract_regex="s/Trichet/TRICHET/" />
Will return:
....
<META name="person" value="Jean-Claude TRICHET" score="1.00"/>
....
NB: Regex substitutions are very powerful (for example, we could upper-case any extraction rather than simply replacing Trichet with TRICHET). However, these substitutions are hard to read and difficult to maintain.
The best advice is to avoid using this attribute unless it is really required (and even then think twice). It has been made available because there are some occasions when, despite the shortcomings of regex, it is the best solution.
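To make the upper-casing case mentioned above concrete: in Perl syntax such a substitution is typically written with the \U case modifier, for example s/(.+)/\U$1/, which already shows how quickly these expressions become cryptic. Whether EXTRACT_REGEX accepts that exact form is not confirmed here. The Python sketch below only shows the intended effect on an extracted value; the function name upper_case_extraction is hypothetical and Python's re module is an assumption standing in for the Perl engine.

```python
import re


def upper_case_extraction(value: str) -> str:
    """Upper-case the whole extracted value (illustrative sketch only)."""
    # '.+' matches the entire single-line value; the callable returns the
    # matched text in upper case, mimicking the effect of Perl's \U modifier.
    return re.sub(r".+", lambda match: match.group(0).upper(), value, count=1)


if __name__ == "__main__":
    print(upper_case_extraction("Jean-Claude Trichet"))
```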