FACTS Concept Metadata

Last Updated: July 8, 2026

Semaphore
Documentation

Capturing Greedy/Non-Greedy Skip Counts

All facts using a Captured Fact class are required to have a capturing greedy skip count or a capturing non-greedy skip count.

However, capturing greedy skip count and capturing non-greedy skip count CANNOT both be used for the same Captured Fact.
One, and only one, value is allowed for each Captured Fact.
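
The "one, and only one" rule can be expressed as a small check. This is a minimal sketch in Python, not part of the product: it assumes a Captured Fact's metadata is available as a plain dictionary of label to value, and the function name validate_captured_fact is hypothetical.

```python
GREEDY = "capturing greedy skip count"
NON_GREEDY = "capturing non-greedy skip count"


def validate_captured_fact(metadata):
    """Return the single skip-count label set on a Captured Fact.

    `metadata` is a hypothetical dict of metadata label -> value.
    Raises ValueError unless exactly one of the two labels is present.
    """
    present = [label for label in (GREEDY, NON_GREEDY) if label in metadata]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one skip count, found {len(present)}"
        )
    return present[0]


if __name__ == "__main__":
    print(validate_captured_fact({GREEDY: 5}))
```

A fact with both labels, or with neither, is rejected; a fact with exactly one passes.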

The capturing greedy skip count is used to set the number of grammatical units that should be captured by a Captured Fact. Because it is greedy, this metadata will match the longest sequence it can find.

The capturing non-greedy skip count is used to set the number of grammatical units that should be captured by the Captured Fact. Because it is non-greedy, this metadata will match the shortest sequence it can find.

Example:

For a sequence where AnchorA starts the sequence, and AnchorB ends it:

  "AnchorA some words we are not interested in AnchorB and wish to skip AnchorB"

A capturing greedy skip count would find the last AnchorB occurrence, and hence the captured fact would be:

“some words we are not interested in AnchorB and wish to skip”

But, a capturing non-greedy skip count would find the first AnchorB occurrence, and hence the captured fact would be:

“some words we are not interested in”
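
The same behaviour can be reproduced with regular expression quantifiers. The sketch below is only an analogy in Python, under the assumption that greedy and non-greedy capture behave like `.*` and `.*?`; it is not how the framework is implemented, and it ignores the numeric skip count.

```python
import re

TEXT = "AnchorA some words we are not interested in AnchorB and wish to skip AnchorB"

# Greedy: (.*) expands as far as it can, so the match ends at the last AnchorB.
greedy = re.search(r"AnchorA (.*) AnchorB", TEXT).group(1)

# Non-greedy: (.*?) stops as early as it can, so the match ends at the first AnchorB.
non_greedy = re.search(r"AnchorA (.*?) AnchorB", TEXT).group(1)

if __name__ == "__main__":
    print(greedy)
    print(non_greedy)
```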

Metadata label	Domain	Range
capturing greedy skip count	Captured Fact	Integer
capturing non-greedy skip count	Captured Fact	Integer

Context Position from Document Start/End

Valid for all Context Elements (Anchors, Skips, Facts, and Contexts), but context position from document start and context position from document end are not required by any element.

context position from document start and context position from document end CANNOT be used for the same Context Element.
One, and only one, value is allowed for each element.

When more than one matching sequence exists, these metadata select which sequence to look for the fact or facts in.

The context position from document start metadata works from the first occurrence of the extractor in its context.

To select the first one, set the metadata to 1
To select the second extractor, set the metadata to 2
And so on . . .

The context position from document end metadata works from the last occurrence of the extractor in its context.

To select the last one, set the metadata to 1
To select the penultimate extractor, set the metadata to 2
And so on . . .
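
The two numbering schemes can be sketched as follows. This is an illustrative Python sketch that assumes the occurrences are already available as a list in document order; the function name select_occurrence and its parameters are hypothetical, and it covers only the case where one of the two metadata is set.

```python
def select_occurrence(occurrences, from_start=None, from_end=None):
    """Pick one occurrence using a 1-based position from the start or the end."""
    if (from_start is None) == (from_end is None):
        raise ValueError("set exactly one of from_start or from_end")
    position = from_start if from_start is not None else from_end
    if position < 1 or position > len(occurrences):
        return None  # no occurrence at that position
    if from_start is not None:
        return occurrences[position - 1]  # 1 = first, 2 = second, ...
    return occurrences[-position]  # 1 = last, 2 = penultimate, ...


if __name__ == "__main__":
    matches = ["first", "second", "third"]
    print(select_occurrence(matches, from_start=2))
    print(select_occurrence(matches, from_end=1))
```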

Metadata label	Domain	Range
context position from document start	Context Elements	Integer
context position from document end	Context Elements	Integer

Document Type as top group

Metadata label	Domain	Range
Document Type as top group	Document Type	Boolean

Fact Position from Context Start/End

Valid for all Context Elements (Anchors, Skips, Facts, and Contexts), but fact position from context start and fact position from context end are not required by any element.

fact position from context start and fact position from context end CANNOT be used for the same Context Element.
One, and only one, value is allowed for each element.

The fact position from context start metadata works from the first occurrence of the extractor in its context.

To select the first one, set the metadata to 1
To select the second extractor, set the metadata to 2
And so on . . .

The fact position from context end metadata works from the last occurrence of the extractor in its context.

To select the last one, set the metadata to 1
To select the penultimate extractor, set the metadata to 2
And so on . . .

Metadata label	Domain	Range
fact position from context start	Context Elements	Integer
fact position from context end	Context Elements	Integer

Fact Presence

If the fact presence metadata is set, then instead of returning the fact’s value (normalised or non-normalised), the extractor returns the value TRUE.
The value FALSE could be deduced from the absence of the value TRUE.

Metadata label	Domain	Range
fact presence	Fact Context Elements	Boolean

Field

field - Filters the field(s) in which all facts or anchors must appear.
By default, the extractor will look in all fields.
This can be set on any fact or extractor, but it is best set as high up as you can, as it is essentially a filter.

Metadata label	Domain	Range
field	Context Elements	String

Greedy/Non-Greedy Repeats

The greedy repeat and non-greedy repeat metadata, each prefixed with a sequence position from 1:: to 10::, can be used with any element of any Context to allow that element to be repeated up to the number set by the metadata.
To specify which element of the extractor should be repeated, select the label whose prefix matches that element's position in the sequence order. Currently the prefixes go up to 10.
The repeat can either be greedy or non-greedy.
- The greedy version will look for the maximum number of repeats that satisfy the sequence.
- The non-greedy version will look for the minimum number of repeats that satisfy the sequence.
A greedy repeat and a non-greedy repeat CANNOT be used on the same element.
One, and only one, value is allowed for each element.
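
Bounded regular expression quantifiers give a rough picture of the difference. The Python sketch below is an analogy only: it assumes a repeat value of 5 and treats "row" as the repeated element, and the helper count_repeats is hypothetical.

```python
import re

TEXT = "header row row row footer"


def count_repeats(pattern, text=TEXT):
    """Count how many times the repeated element was consumed by group 1."""
    match = re.search(pattern, text)
    return len(match.group(1).split())


# Greedy: take as many repeats as are available, up to the limit of 5.
greedy_count = count_repeats(r"header((?: row){1,5})")

# Non-greedy with nothing after it: one repeat already satisfies the pattern.
non_greedy_count = count_repeats(r"header((?: row){1,5}?)")

# Non-greedy followed by another element: the minimum that still lets
# the rest of the sequence ("footer") match.
non_greedy_to_footer = count_repeats(r"header((?: row){1,5}?) footer")

if __name__ == "__main__":
    print(greedy_count, non_greedy_count, non_greedy_to_footer)
```

The third case shows why "minimum" means the minimum that still satisfies the whole sequence, not simply one repeat.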

Metadata label	Domain	Range
{1::-10::}greedy repeat	Contexts	Integer
{1::-10::}non-greedy repeat	Contexts	Integer

  * **Greedy repeats**
    * 1::greedy repeat
    * 2::greedy repeat
    * 3::greedy repeat
    * 4::greedy repeat
    * 5::greedy repeat
    * 6::greedy repeat
    * 7::greedy repeat
    * 8::greedy repeat
    * 9::greedy repeat
    * 10::greedy repeat

  * **Non-greedy repeats**
    * 1::non-greedy repeat
    * 2::non-greedy repeat
    * 3::non-greedy repeat
    * 4::non-greedy repeat
    * 5::non-greedy repeat
    * 6::non-greedy repeat
    * 7::non-greedy repeat
    * 8::non-greedy repeat
    * 9::non-greedy repeat
    * 10::non-greedy repeat

Greedy/Non-Greedy Skip Counts

Either a greedy skip count or a non-greedy skip count is required for elements using the Skip class.
However, greedy skip count and non-greedy skip count CANNOT both be used for the same Skip.
One, and only one, value is allowed.

A greedy skip count eats the maximum number of tokens it can to satisfy the sequence. If there are two anchors that could both end a valid sequence, it will select the last one. A non-greedy skip count eats the minimum number of tokens it can to satisfy the sequence. If there are two anchors that could both end a valid sequence, it will select the first one.

Example: For a sequence where AnchorA starts the sequence, and AnchorB ends it:

  "AnchorA some words we are not interested in AnchorB and wish to skip AnchorB"

A greedy skip would find the last AnchorB occurrence, and hence the selected text would be:

  * "some words we are not interested in AnchorB and wish to skip"

But, a non-greedy skip would find the first AnchorB occurrence, and hence the selected text would be:

  * "some words we are not interested in"

Metadata label	Domain	Range
greedy skip count	Skip	Integer
non-greedy skip count	Skip	Integer
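
The AnchorA/AnchorB example above can be traced token by token. This Python sketch makes two assumptions that the text does not confirm: that tokens are whitespace-separated words, and that the skip count is an upper bound on how many tokens may be skipped. The function name skipped_text is hypothetical.

```python
def skipped_text(tokens, start, end, max_skip, greedy):
    """Return the tokens skipped between `start` and a closing `end` anchor.

    Assumption: `max_skip` is an upper bound on the number of skipped tokens.
    Returns None when no valid sequence exists.
    """
    if start not in tokens:
        return None
    begin = tokens.index(start) + 1
    # Every end-anchor position that keeps the skip within the limit.
    candidates = [
        i for i in range(begin, len(tokens))
        if tokens[i] == end and i - begin <= max_skip
    ]
    if not candidates:
        return None
    # Greedy takes the last valid end anchor, non-greedy the first.
    stop = candidates[-1] if greedy else candidates[0]
    return " ".join(tokens[begin:stop])


if __name__ == "__main__":
    text = "AnchorA some words we are not interested in AnchorB and wish to skip AnchorB"
    words = text.split()
    print(skipped_text(words, "AnchorA", "AnchorB", max_skip=20, greedy=True))
    print(skipped_text(words, "AnchorA", "AnchorB", max_skip=20, greedy=False))
```

Under these assumptions, lowering max_skip to 10 makes the greedy result equal to the non-greedy one, because the second AnchorB is then out of reach (12 tokens away).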

Width for Context

Valid for Sequenced and Unsequenced Contexts using Sentences or Paragraphs, but width for context is not required by any context.

Metadata label	Domain	Range
width for context	Contexts	Integer

Loose

loose - Loosens the constraint that each element must be in its own grammatical unit.
By default, extractors require anchors and facts to appear in different sentences or paragraphs. This metadata removes that restriction, so anchors and facts can appear in the same sentence or paragraph.
For example, for a sequenced extractor that looks for a sequence of paragraphs, the default rule is that the elements must appear in different paragraphs (in sequence).
- If we set the extractor to be loose, then the elements can appear in the same paragraph (although they must still appear in order).
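
The difference between the default rule and loose can be sketched as an ordering check. This Python sketch assumes each matched element is described by a (paragraph index, token offset) pair listed in extractor order; that representation and the function name sequence_ok are hypothetical.

```python
def sequence_ok(positions, loose=False):
    """Check a sequenced match.

    `positions` holds one (paragraph_index, token_offset) tuple per element,
    listed in the order the extractor defines them.
    """
    pairs = list(zip(positions, positions[1:]))
    if loose:
        # Elements may share a paragraph but must still be in document order.
        return all(a < b for a, b in pairs)
    # Default: each element must sit in a later paragraph than the previous one.
    return all(a[0] < b[0] for a, b in pairs)


if __name__ == "__main__":
    same_paragraph = [(0, 3), (0, 9)]
    print(sequence_ok(same_paragraph))              # False under the default rule
    print(sequence_ok(same_paragraph, loose=True))  # True when loose
```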

Metadata label	Domain	Range
loose	Sequenced Contexts	Boolean

Near Counts

A near count is used to set the number of tokens that may be skipped between grammatical units in a near sequence (ordered or unordered).
Mandatory for Contexts using any of the Near Contexts.
One, and only one, value is allowed.

If the near sequence is working on paragraphs or sentences, this metadata still sets the number of word tokens that can appear between those elements across the sequence. That is, it does not count sentences or paragraphs, as a sequence extractor would; it always counts word tokens.
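
Counting word tokens between two elements can be sketched as below. This is an illustrative Python sketch that assumes whitespace tokenisation and single-token elements; the function name within_near_count and the ordered flag are hypothetical stand-ins for the ordered and unordered near sequences.

```python
def within_near_count(tokens, first, second, near_count, ordered=True):
    """True if `first` and `second` occur with at most `near_count` tokens between them."""
    first_positions = [i for i, tok in enumerate(tokens) if tok == first]
    second_positions = [i for i, tok in enumerate(tokens) if tok == second]
    for a in first_positions:
        for b in second_positions:
            if a == b:
                continue
            if ordered and b < a:
                continue  # ordered sequences require `first` before `second`
            gap = abs(b - a) - 1  # word tokens strictly between the two
            if gap <= near_count:
                return True
    return False


if __name__ == "__main__":
    words = "the invoice total is due on Friday".split()
    print(within_near_count(words, "invoice", "due", near_count=2))
    print(within_near_count(words, "invoice", "due", near_count=1))
```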

Metadata label	Domain	Range
near count	Near Contexts	Integer

negate

Metadata label	Domain	Range
negate	Fact Context Elements	Boolean

return context ID

Metadata label	Domain	Range
return context ID	FACTS Framework	Boolean

return evidence

Metadata label	Domain	Range
return evidence	Concept Fact	Boolean

Return Group Fact ID

return group fact ID - Return the GUID for the fact that groups the context.

Metadata label	Domain	Range
return group fact ID	FACTS Framework	Boolean

return model name as prefix

Metadata label	Domain	Range
return model name as prefix	Document Type	Boolean

Return Raw Text Also

return raw text also - As well as the fact found, this will return the raw text of the context where the fact was found. This can be useful if you wish to see WHERE your fact came from: is it the right one? Was it normalized correctly?

Metadata label	Domain	Range
return raw text also	Fact Context Elements	Boolean

Return Value Using Regex

return value using regex - Return the fact after processing it with a regular expression to clean it up or transform it somehow. It is usually used to clean up punctuation and so on.
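
As an illustration of the kind of clean-up described, the Python sketch below strips leading and trailing punctuation from a value. How the framework applies the expression (substitution, capture group, or otherwise) is not stated here, so the substitution approach, the pattern, and the function name clean_fact are all assumptions.

```python
import re


def clean_fact(raw_value, pattern=r"^[\W_]+|[\W_]+$"):
    """Remove every match of `pattern` from the value.

    The default pattern matches runs of non-word characters (and underscores)
    at the very start or very end of the value.
    """
    return re.sub(pattern, "", raw_value)


if __name__ == "__main__":
    print(clean_fact('"Acme Ltd.",'))
    print(clean_fact("(42)"))
```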

Metadata label	Domain	Range
return value using regex	Fact Context Elements	String
