EXPRESSION

Last Updated: May 13, 2026

The EXPRESSION rule gives rules access to information determined by zoners run by Classification Server (CS).

Zoners have multiple uses within the CS pipelines, so they are difficult to generalise about. Put simply, they determine areas of text in the document of a particular type, which we call a zone. Zones may overlap and are not required to have any structure. Fields, by contrast, could be considered similar “types” of text, but they are determined by the input file format and are required to form a tree, so fields cannot overlap: one field may only fully contain another.

Individual zone types may, of course, be tree structured. For example, sentence zones cannot overlap, by the definition of what a sentence is, but a zone of another type may well start in one sentence and finish in another without causing any problems.

Some zoners are required by CS to run in all cases for normal operation, whilst other zoners are optional and will only be run if there is an appropriate EXPRESSION rule with that zone type in the active rulenet (or if debug zones are requested). Similarly, when a zoner is run it may amortize the calculation across various zone types (for example, Language Pack zoners), so if one type of zone is switched on by an EXPRESSION rule, the zoner is able to add other zone types which essentially “come for free” with the calculation needed to determine the wanted zone type.

In some ways the EXPRESSION rule is similar to the TEXT rule: it is a leaf node of the rule tree, and it is where the appropriate phrase ranges originate which are then considered or altered by other rules further up the tree.

It differs from the TEXT rule in that the phrase range found may span multiple tokens and may also have a normalised form provided by the zoner. The data attribute is also optional. When it is omitted, the set of phrase ranges returned is the set of phrase ranges for all zones of that particular type, so <expression type="sentence" /> is valid and simply gives you the entire document split into phrase ranges, one for each sentence determined by the sentence zoner.

Score calculation

Scores its given weight if any zones of the appropriate type exist in the document (and match the data attribute, if used).

Evidence calculation

The evidence is the set of (matching) zones of that type.

Attribute information

EXPRESSIONTYPE - sets what type of expression
TYPE - alias for EXPRESSIONTYPE
any attribute

Children restrictions

No child rules are allowed.

Normalisation

Some types of information discovered by a zoner can have multiple forms in text. To make processing this information simpler in downstream systems the zoner may provide a normalised form for this data which can be extracted when needed.

For example, dates may have wildly differing text forms, such as “2nd September 2017”, “2/9/2017” or “The second day of September in the year two thousand and seventeen AD”, but the normalised form is the ISO form 2017-09-02 (YYYY-MM-DD).

When a normalised form exists for a zone, it may be marked for extraction by using the extract_name attribute on the EXPRESSION rule. This prefers the normalised form if it exists and otherwise extracts the evidence (use extract_evidence if you do not want the normalised form).

The opportunity to extract the normalised form is only available on the EXPRESSION rule itself. If you pass the phrase range up to another rule (say a UNION over the EXPRESSION), you cannot access the normalised form on the UNION rule, i.e. the normalised form does not travel up the rule tree with the phrase range.

NB In case you were curious, the third form of the date given above requires enabling optional date discovery patterns for the date zoner to find it. It is not discovered with the out-of-the-box (OOTB) configuration, since it requires significant computational resource to determine that this style of date is in fact a date.
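The preference between the normalised form and the evidence can be sketched in Python. This is only an illustration of the selection logic described here, not CS code: the `Zone` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Zone:
    """Hypothetical stand-in for a zone found by a zoner."""
    evidence: str                      # the text as it appears in the document
    normalised: Optional[str] = None   # normalised form, if the zoner provides one


def value_for_extract_name(zone: Zone) -> str:
    # Mirrors extract_name: prefer the normalised form, else fall back to the evidence
    return zone.normalised if zone.normalised is not None else zone.evidence


def value_for_extract_evidence(zone: Zone) -> str:
    # Mirrors extract_evidence: always the text found in the document
    return zone.evidence


date_zone = Zone(evidence="2nd September 2017", normalised="2017-09-02")
plain_zone = Zone(evidence="Applications were received.")

print(value_for_extract_name(date_zone))      # normalised form is preferred
print(value_for_extract_evidence(date_zone))  # raw text from the document
print(value_for_extract_name(plain_zone))     # no normalised form, so the evidence
```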

Wild Card and Slice Restrictions

Because the zoner has often done a lot of the hard work in finding the relevant data, EXPRESSION matches the data attribute in a different way from the TEXT rule.

The match (whether wildcard or slice restriction) is always performed on the normalised version of the zone when available.

This can be a simple wildcard match. It is simpler than the TEXT version and only supports *, which matches any run of characters.

<expression type="date" data="2011*" />

This performs a wildcard match, i.e. it finds all dates beginning with 2011.

Alternatively, a slice restriction may be used, which can provide a better way of controlling the match.
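As a rough illustration of a wildcard match in which only * is special, the following Python sketch tests normalised date values against a pattern. The function name `wildcard_match` is hypothetical, and matching the whole normalised value case-sensitively is an assumption rather than documented CS behaviour.

```python
import re


def wildcard_match(pattern: str, normalised: str) -> bool:
    """Return True if `normalised` matches `pattern`, where only '*' is special."""
    # Escape each literal piece so characters such as '.' or '?' have no special meaning,
    # then join the pieces with '.*' so each '*' matches any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Assumption: the pattern must cover the whole normalised value
    return re.fullmatch(regex, normalised) is not None


dates = ["2011-03-01", "2012-06-30", "2011-12-25"]
matches = [d for d in dates if wildcard_match("2011*", d)]
print(matches)
```

Only the two values beginning with 2011 are kept.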

<expression type="date" data="[2011-01-01:2011-04-30]" />

This finds dates between 1st Jan 2011 and 30th April 2011.

The format of a slice restriction is [X:Y], which finds values >= X and < Y. Note that the upper bound is exclusive, so under this rule the example above does not match 30th April 2011 itself.

If X or Y is not specified, it defaults to the minimum (or maximum) possible value, so to find all dates from 1st Jan 2000 onwards use

<expression type="date" data="[2000-01-01:]" />
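The half-open behaviour of [X:Y] and the open-ended defaults can be sketched in Python as below. The function names are hypothetical, and comparing normalised values as plain strings is an assumption; it happens to work for ISO dates because YYYY-MM-DD sorts in date order, but the comparison CS actually performs is not specified here.

```python
from typing import Optional, Tuple


def parse_slice(data: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a '[X:Y]' restriction into (X, Y); a missing bound becomes None."""
    if not (data.startswith("[") and data.endswith("]")):
        raise ValueError(f"not a slice restriction: {data!r}")
    lower, sep, upper = data[1:-1].partition(":")
    if not sep:
        raise ValueError(f"slice restriction needs a ':': {data!r}")
    return (lower or None, upper or None)


def in_slice(data: str, normalised: str) -> bool:
    """Return True if X <= normalised < Y, treating a missing bound as unbounded."""
    lower, upper = parse_slice(data)
    if lower is not None and normalised < lower:
        return False
    if upper is not None and normalised >= upper:  # upper bound is exclusive
        return False
    return True


candidates = ["2011-09-30", "2012-06-30", "2011-01-01"]
in_range = [d for d in candidates if in_slice("[2011-03-01:2011-10-20]", d)]
print(in_range)
print(in_slice("[2000-01-01:]", "2017-09-02"))
```

With the range used in the slice restriction example later in this page, only 2011-09-30 is kept.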

NB Slice restrictions are not only used for DATE but may also be used on other expression types, e.g.

<expression type="PERSON" data="[A:M]" />

This finds names of people which start with A, up to those starting with M. However, this is often of little practical use, since the name may well not be normalised, so selecting on the first name and/or family name, depending on usage in the document, is less than ideal.

Example for ‘date’

The following rulebase fragment

  <phrase extract="1" >
    <text data="between" />
    <expression type="date" extract_name="period_start"/>
    <text data="and" />
    <expression type="date" extract_name="period_end" />
  </phrase>

evaluated against the following document fragment

Applications were received between 10th May 2002 and 31st July 2002.

Would return

  ...
  <META name="period_end" value="2002-07-31" score="1.00"/>
  <META name="period_start" value="2002-05-10" score="1.00"/>
  ...

Example showing slice restriction for ‘date’

The following rulebase fragment

    <expression type="date" extract="1" extract_name="In Range" data="[2011-03-01:2011-10-20]"/>

evaluated against the following document fragment

 We should find 30th September 2011 whilst 30/6/2012 and 1/1/2011 should be ignored.

Would return

...
<META name="In Range" value="2011-09-30" score="1.00" />
...

Example showing ambiguous date handling

When dates are found and normalised, they are normalised according to the first matching normalisation format. The date is also added to another expression type, either date_ambiguous or date_unambiguous, depending on whether the normalisation is unique or not.

This allows fine control over how ambiguous dates are treated. For example, the following rulebase fragment

   <union extract="1" extract_group="date" extract_group_key="date" >
       <expression type="date" extract_name="date" extract_evidence="date raw" />
       <expression type="date_ambiguous" extract_default="ambiguous:true" />
       <expression type="date_unambiguous" extract_default="ambiguous:false" />
   </union>

evaluated against the following document fragment

  The festival started on 1/2/2017 and continued, with waning amounts of participant enthusiasm, till 1/23/2017

can give

<META name="date" value="2017-01-23" score="1.00">
   <META name="ambiguous" value="false" score="1.00"/>
   <META name="date" value="2017-01-23" score="1.00"/>
   <META name="date raw" value="1/23/2017" score="1.00"/>
</META>
<META name="date" value="2017-02-01" score="1.00">
   <META name="ambiguous" value="true" score="1.00"/>
   <META name="date" value="2017-02-01" score="1.00"/>
   <META name="date raw" value="1/2/2017" score="1.00"/>
</META>

Here we can see that we have probably misnormalised the first date in the text (1/2/2017), which should be 2017-01-02. However, we have marked that date as ambiguous (and provided the raw text for manual assessment).

By changing the order of the normalisation (or discovery) formats in the configuration files, you can make CS pick a different normalisation style when a date is ambiguous.
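The effect of format order on ambiguous dates can be sketched in Python using the standard `datetime` module. This is an illustration only: the `normalise` function and the two format lists are hypothetical and do not reflect the actual CS configuration syntax. A date is treated as ambiguous here when different formats parse it to different calendar dates.

```python
from datetime import datetime
from typing import List, Tuple


def normalise(raw: str, formats: List[str]) -> Tuple[str, bool]:
    """Return (ISO date from the first matching format, whether the date is ambiguous)."""
    results = []
    for fmt in formats:
        try:
            results.append(datetime.strptime(raw, fmt).date().isoformat())
        except ValueError:
            continue  # this format does not fit the text, try the next one
    if not results:
        raise ValueError(f"no format matches {raw!r}")
    # The first matching format wins; more than one distinct reading means ambiguous
    return results[0], len(set(results)) > 1


day_first = ["%d/%m/%Y", "%m/%d/%Y"]    # hypothetical order: day/month tried first
month_first = ["%m/%d/%Y", "%d/%m/%Y"]  # the same formats in the opposite order

print(normalise("1/2/2017", day_first))
print(normalise("1/23/2017", day_first))
print(normalise("1/2/2017", month_first))
```

With the day-first order, 1/2/2017 normalises to 2017-02-01 and is flagged as ambiguous, while 1/23/2017 has only one valid reading. Swapping the order makes 1/2/2017 normalise to 2017-01-02 instead, and it is still flagged as ambiguous.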

Semaphore Classification Server Rulebase Reference
