
The Semaphore Fact Extraction Framework (FACTS)

FACTS Concept Metadata

  • Last Updated: May 29, 2026

Capturing Greedy/Non-Greedy Skip Counts

All facts using a Captured Fact class are required to have a capturing greedy skip count or a capturing non-greedy skip count.

  • But, capturing greedy skip count and capturing non-greedy skip count CANNOT be used for the same Captured Fact.
  • One, and only one, value is allowed for each Captured Fact.

The capturing greedy skip count is used to set the number of grammatical units that should be captured by a Captured Fact. Because it is greedy, this metadata will match the longest sequence it can find.

The capturing non-greedy skip count is used to set the number of grammatical units that should be captured by the Captured Fact. Because it is non-greedy, this metadata will match the shortest sequence it can find.

Example:

For a sequence where AnchorA starts the sequence, and AnchorB ends it:

  "AnchorA some words we are not interested in AnchorB and wish to skip AnchorB"    

A capturing greedy skip count would find the last AnchorB occurrence, and hence the captured fact would be:

  • “some words we are not interested in AnchorB and wish to skip”

But, a capturing non-greedy skip count would find the first AnchorB occurrence, and hence the captured fact would be:

  • “some words we are not interested in”

| Metadata label | Domain | Range |
| --- | --- | --- |
| capturing greedy skip count | Captured Fact | Integer |
| capturing non-greedy skip count | Captured Fact | Integer |
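
The difference between the two capturing skip counts can be sketched in a few lines of Python. This is an illustration only, not Semaphore's implementation: the `capture` function is hypothetical, tokens are split on whitespace, and the skip count is assumed to act as an upper bound on the number of captured tokens.

```python
def capture(tokens, start, end, skip_count, greedy):
    """Return the text between `start` and a closing `end` anchor, or None.

    Assumption: `skip_count` is the maximum number of tokens that may be
    captured between the two anchors.
    """
    if start not in tokens:
        return None
    begin = tokens.index(start) + 1
    # Positions of the closing anchor that keep the capture within skip_count.
    limit = min(len(tokens), begin + skip_count + 1)
    candidates = [i for i in range(begin, limit) if tokens[i] == end]
    if not candidates:
        return None
    # Greedy takes the last valid closing anchor, non-greedy the first.
    stop = candidates[-1] if greedy else candidates[0]
    return " ".join(tokens[begin:stop])


sentence = "AnchorA some words we are not interested in AnchorB and wish to skip AnchorB"
tokens = sentence.split()

greedy_fact = capture(tokens, "AnchorA", "AnchorB", 20, greedy=True)
non_greedy_fact = capture(tokens, "AnchorA", "AnchorB", 20, greedy=False)
```

Under these assumptions, `greedy_fact` holds the longer text ending at the last AnchorB and `non_greedy_fact` the shorter text ending at the first one.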

Context Position from Document Start/End

Valid for all Context Elements (Anchors, Skips, Facts, and Contexts), but context position from document start and context position from document end are not required by any element.

  • context position from document start and context position from document end CANNOT be used for the same Context Element.
  • One, and only one, value is allowed for each element.

These metadata allow you to select, when there is more than one matching sequence, which sequence to look for the fact or facts in.

The context position from document start metadata works from the first occurrence of the extractor in its context.

  • To select the first one, set the metadata to 1
  • To select the second extractor, set the metadata to 2
  • And so on . . .

The context position from document end metadata works from the last occurrence of the extractor in its context.

  • To select the last one, set the metadata to 1
  • To select the penultimate extractor, set the metadata to 2
  • And so on . . .

| Metadata label | Domain | Range |
| --- | --- | --- |
| context position from document start | Context Elements | Integer |
| context position from document end | Context Elements | Integer |
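
The 1-based counting from either end can be sketched as follows. The `select_occurrence` helper and its parameter names are hypothetical, and the sketch only covers the case where exactly one of the two positions is set.

```python
def select_occurrence(matches, from_start=None, from_end=None):
    """Pick one match by 1-based position from the start or from the end.

    Assumption: `matches` is the ordered list of sequences found in the
    document. Setting both positions is rejected, mirroring the rule that
    the two metadata cannot be combined on one element.
    """
    if (from_start is None) == (from_end is None):
        raise ValueError("set exactly one of from_start or from_end")
    if from_start is not None:
        index = from_start - 1            # 1 selects the first match
    else:
        index = len(matches) - from_end   # 1 selects the last match
    if 0 <= index < len(matches):
        return matches[index]
    return None  # fewer matches than the requested position


matches = ["first sequence", "second sequence", "third sequence"]
first = select_occurrence(matches, from_start=1)
last = select_occurrence(matches, from_end=1)
penultimate = select_occurrence(matches, from_end=2)
```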

Document Type as Top Group

| Metadata label | Domain | Range |
| --- | --- | --- |
| Document Type as top group | Document Type | Boolean |

Fact Position from Context Start/End

Valid for all Context Elements (Anchors, Skips, Facts, and Contexts), but fact position from context start and fact position from context end are not required by any element.

  • fact position from context start and fact position from context end CANNOT be used for the same Context Element.
  • One, and only one, value is allowed for each element.

The fact position from context start metadata works from the first occurrence of the extractor in its context.

  • To select the first one, set the metadata to 1
  • To select the second extractor, set the metadata to 2
  • And so on . . .

The fact position from context end metadata works from the last occurrence of the extractor in its context.

  • To select the last one, set the metadata to 1
  • To select the penultimate extractor, set the metadata to 2
  • And so on . . .

| Metadata label | Domain | Range |
| --- | --- | --- |
| fact position from context start | Context Elements | Integer |
| fact position from context end | Context Elements | Integer |

Fact Presence

  • If the fact presence metadata is set, then instead of returning the fact’s normalised or non-normalised value, the extractor returns the value TRUE.
  • The value FALSE could be deduced from the absence of the value TRUE.

| Metadata label | Domain | Range |
| --- | --- | --- |
| fact presence | Fact Context Elements | Boolean |

Field

  • field - Filter the field(s) we want facts or anchors to all appear in.
  • By default, the extractor will look in all fields.
  • This can be set on any fact or extractor, but it is best set as high up as is practical, as it is essentially a filter.

| Metadata label | Domain | Range |
| --- | --- | --- |
| field | Context Elements | String |
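
As a rough picture of the filtering behaviour, the sketch below models a document as a dictionary of named fields. The `fields_to_search` function and the field names are hypothetical and are not part of the FACTS model.

```python
def fields_to_search(document, field=None):
    """Return the fields an extractor would look in.

    Assumption: with no field set, every field is searched (the default
    described above); with a field set, only that field is kept.
    """
    if field is None:
        return dict(document)
    return {name: text for name, text in document.items() if name == field}


document = {
    "title": "Invoice 42",
    "body": "Total due: 100 EUR",
    "footer": "Page 1",
}
all_fields = fields_to_search(document)
body_only = fields_to_search(document, field="body")
```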

Greedy/Non-Greedy Repeats

  • The {1::-10::}greedy repeat and {1::-10::}non-greedy repeat metadata can be used with any element of any Context to allow the element to be repeated up to some number set by the metadata.
  • To specify which element of the extractor you wish to be repeated, select the matching prefix sequence order. Currently this goes up to 10.
  • The repeat can either be greedy or non-greedy.
    • The greedy version will look for the maximum number of repeats that satisfy the sequence.
    • The non-greedy version will look for the minimum number of repeats that satisfy the sequence.
  • A greedy repeat and a non-greedy repeat CANNOT be used on the same element.
  • One, and only one, value is allowed for each element.

| Metadata label | Domain | Range |
| --- | --- | --- |
| {1::-10::}greedy repeat | Contexts | Integer |
| {1::-10::}non-greedy repeat | Contexts | Integer |

  * **Greedy repeats**
    * 1::greedy repeat
    * 2::greedy repeat
    * 3::greedy repeat
    * 4::greedy repeat
    * 5::greedy repeat
    * 6::greedy repeat
    * 7::greedy repeat
    * 8::greedy repeat
    * 9::greedy repeat
    * 10::greedy repeat

  * **Non-greedy repeats**
    * 1::non-greedy repeat
    * 2::non-greedy repeat
    * 3::non-greedy repeat
    * 4::non-greedy repeat
    * 5::non-greedy repeat
    * 6::non-greedy repeat
    * 7::non-greedy repeat
    * 8::non-greedy repeat
    * 9::non-greedy repeat
    * 10::non-greedy repeat
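
Greedy and non-greedy repeats behave much like bounded quantifiers in regular expressions. The sketch below uses Python's `re` module as an analogy; it is not how Semaphore evaluates repeats, the `count_repeats` helper is hypothetical, and a minimum of one repeat is an assumption.

```python
import re


def count_repeats(text, element, max_repeats, greedy):
    """Count how many times `element` is matched at the start of `text`.

    Assumption: the metadata value is the upper bound on repeats and at
    least one occurrence is required.
    """
    quantifier = "{1,%d}" % max_repeats + ("" if greedy else "?")
    pattern = "(?:%s ?)%s" % (re.escape(element), quantifier)
    match = re.match(pattern, text)
    return len(match.group(0).split()) if match else 0


text = "Item Item Item Item End"
greedy_count = count_repeats(text, "Item", 3, greedy=True)       # as many as allowed
non_greedy_count = count_repeats(text, "Item", 3, greedy=False)  # as few as allowed
```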

Greedy/Non-Greedy Skip Counts

  • Either a greedy skip count or a non-greedy skip count is required for elements using the Skip class.
  • But, greedy skip count and non-greedy skip count CANNOT be used for the same Skip.
  • One, and only one, value is allowed.

A greedy skip count eats the maximum number of tokens it can to satisfy the sequence. If there are two anchors that could both end a valid sequence, it will select the last one. A non-greedy skip count eats the minimum number of tokens it can to satisfy the sequence. If there are two anchors that could both end a valid sequence, it will select the first one.

Example: For a sequence where AnchorA starts the sequence, and AnchorB ends it:

  "AnchorA some words we are not interested in AnchorB and wish to skip AnchorB"

A greedy skip would find the last AnchorB occurrence, and hence the selected text would be:

  * "some words we are not interested in AnchorB and wish to skip"

But, a non-greedy skip would find the first AnchorB occurrence, and hence the selected text would be:

  * "some words we are not interested in"

| Metadata label | Domain | Range |
| --- | --- | --- |
| greedy skip count | Skip | Integer |
| non-greedy skip count | Skip | Integer |
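
A close analogy for the two skip counts is a bounded regular-expression quantifier over whitespace-separated tokens. The sketch below uses Python's `re` module for illustration; the `skipped_text` helper is hypothetical, and treating the skip count as a maximum number of tokens is an assumption.

```python
import re


def skipped_text(text, skip_count, greedy):
    """Return the tokens skipped between AnchorA and AnchorB, or None.

    Assumption: `skip_count` is the largest number of tokens the skip may
    consume. A trailing "?" on the quantifier makes it non-greedy.
    """
    lazy = "" if greedy else "?"
    pattern = r"AnchorA ((?:\S+ ){0,%d}%s)AnchorB" % (skip_count, lazy)
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None


text = "AnchorA some words we are not interested in AnchorB and wish to skip AnchorB"
greedy_skip = skipped_text(text, 20, greedy=True)
non_greedy_skip = skipped_text(text, 20, greedy=False)
too_small = skipped_text(text, 5, greedy=True)  # no AnchorB within 5 tokens
```

With a count too small to reach any AnchorB, the sequence is not satisfied at all, which is why `too_small` is `None` in this sketch.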

Width for Context

  • Valid for Sequenced and Unsequenced Contexts using Sentences or Paragraphs, but width for context is not required by any context.

| Metadata label | Domain | Range |
| --- | --- | --- |
| width for context | Contexts | Integer |

Loose

  • loose - loosen the constraint that each element must be in its own grammatical unit.
  • This metadata breaks the restriction for extractors that rely on anchors and facts appearing in different sentences or paragraphs. Anchors and facts can now appear in the same sentence or paragraph.
  • So, if we have a sequenced extractor that is looking for a sequence of paragraphs, the usual default rule is that those elements need to appear in different paragraphs (in sequence).
    • If we set the extractor to be loose, then the elements can appear in the same paragraph (although they do need to appear in order, still).

| Metadata label | Domain | Range |
| --- | --- | --- |
| loose | Sequenced Contexts | Boolean |
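
The effect of loose on a sequenced extractor over paragraphs can be sketched as below. The `find_sequence` function is hypothetical, elements are matched as plain substrings, and the strict mode assumes each element may appear in any later paragraph, not necessarily the adjacent one.

```python
def find_sequence(paragraphs, elements, loose=False):
    """Return True if `elements` appear in order across `paragraphs`.

    Strict mode (the default): each element must be in a later paragraph
    than the previous one. Loose mode: elements may share a paragraph,
    but must still appear in order.
    """
    para_idx = 0  # paragraph currently being searched
    offset = 0    # character offset inside that paragraph
    for element in elements:
        found = False
        while para_idx < len(paragraphs):
            pos = paragraphs[para_idx].find(element, offset)
            if pos != -1:
                found = True
                if loose:
                    offset = pos + len(element)  # stay in this paragraph
                else:
                    para_idx += 1                # force the next paragraph
                    offset = 0
                break
            para_idx += 1
            offset = 0
        if not found:
            return False
    return True


one_paragraph = ["Invoice number 42 Total 100"]
two_paragraphs = ["Invoice number 42", "Total 100"]
elements = ["Invoice", "Total"]

strict_same_paragraph = find_sequence(one_paragraph, elements)
loose_same_paragraph = find_sequence(one_paragraph, elements, loose=True)
strict_two_paragraphs = find_sequence(two_paragraphs, elements)
```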

Near Counts

  • A near count is used to set the number of tokens that may be skipped between grammatical units in a near sequence (ordered or unordered).
  • Mandatory for Contexts using any of the Near Contexts.
  • One, and only one, value is allowed.

If the near sequence is working on paragraphs or sentences, this metadata still sets the number of word tokens that can appear between those elements across the sequence. That is, it does not count sentences or paragraphs as a sequence extractor would; it always counts word tokens.

| Metadata label | Domain | Range |
| --- | --- | --- |
| near count | Near Contexts | Integer |
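
The token counting can be pictured with the sketch below. The `within_near_count` function is hypothetical, tokens are split on whitespace, and elements are matched as single tokens for simplicity.

```python
def within_near_count(tokens, first, second, near_count, ordered=True):
    """Return True if `first` and `second` are at most `near_count` word
    tokens apart.

    Assumption: the gap is the number of tokens strictly between the two
    elements. With ordered=True, `second` must come after `first`.
    """
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    for i in first_positions:
        for j in second_positions:
            if i == j or (ordered and j < i):
                continue
            if abs(j - i) - 1 <= near_count:
                return True
    return False


tokens = "Total amount due is 100 EUR".split()
close_enough = within_near_count(tokens, "Total", "100", near_count=3)
too_far = within_near_count(tokens, "Total", "100", near_count=2)
wrong_order = within_near_count(tokens, "100", "Total", near_count=3)
any_order = within_near_count(tokens, "100", "Total", near_count=3, ordered=False)
```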

Negate

| Metadata label | Domain | Range |
| --- | --- | --- |
| negate | Fact Context Elements | Boolean |

Return Context ID

| Metadata label | Domain | Range |
| --- | --- | --- |
| return context ID | FACTS Framework | Boolean |

Return Evidence

| Metadata label | Domain | Range |
| --- | --- | --- |
| return evidence | Concept Fact | Boolean |

Return Group Fact ID

  • return group fact ID - Return the GUID for the fact that groups the context.

| Metadata label | Domain | Range |
| --- | --- | --- |
| return group fact ID | FACTS Framework | Boolean |

Return Model Name as Prefix

| Metadata label | Domain | Range |
| --- | --- | --- |
| return model name as prefix | Document Type | Boolean |

Return Raw Text Also

  • return raw text also - As well as the fact found, this will return the raw text of the context where the fact was found. This can be useful if you wish to see where your fact came from: is it the right one? Was it normalized correctly?

| Metadata label | Domain | Range |
| --- | --- | --- |
| return raw text also | Fact Context Elements | Boolean |

Return Value Using Regex

  • return value using regex - Return the fact after processing it with a regular expression to clean it up or transform it somehow. It is usually used to clean up punctuation and so on.

| Metadata label | Domain | Range |
| --- | --- | --- |
| return value using regex | Fact Context Elements | String |
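
The kind of punctuation clean-up described above can be illustrated with Python's `re` module. How Semaphore applies the expression (as a substitution or as a selection) and which regex syntax it accepts are not stated here, so the `clean_fact` helper and the substitution semantics below are assumptions.

```python
import re

# Hypothetical pattern: punctuation and other non-word characters at either edge.
STRIP_EDGE_PUNCTUATION = r"^\W+|\W+$"


def clean_fact(raw_value, pattern, replacement=""):
    """Return the fact value with every match of `pattern` replaced."""
    return re.sub(pattern, replacement, raw_value)


cleaned = clean_fact("(1,234.50);", STRIP_EDGE_PUNCTUATION)
```

Only the leading and trailing punctuation is removed in this sketch; separators inside the value are left untouched.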