Facts extracted by context extractors

Last Updated: July 8, 2026

A Context Extractor is the element in the framework that matches the context in which information appears. Since information can appear in many different contexts, we must have a variety of context extractors. We have defined a collection of context extractors to help extract facts for these different cases.

A context extractor defines a sequence of elements that must match the information’s context. The composition and order of these elements are critical.

To assist with this, we have defined a set of context extractors and elements that cover many typical sequences needed for fact extraction in Semaphore. There are currently 30 context extractors, grouped into families that share common characteristics but differ in how they treat punctuation or which grammatical unit they sequence.

The context extractors are categorized as follows:

- Ordered List Item Context Extractors: Impose a strict sequence on the elements (i.e., "Sequenced Contexts").
- Context Extractors with Unordered Elements: Allow elements to occur in any order within a number of words (i.e., "Near Unordered Contexts").
- Near Ordered Context Extractors: Impose a strict sequence on elements, all within a number of words (i.e., "Near Ordered Contexts").
- Unsequenced Context Extractors: Allow elements in any order within a single grammatical unit.
- Meta-extractors: Use another context extractor to define the context for other elements (e.g., "Fact In Window").
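
The difference between the ordered and unordered "near" families can be illustrated with a small sketch. It is not Semaphore's implementation: the function names, the whitespace tokenization, and the reading of "within a number of words" as a maximum distance between word positions are all assumptions made for illustration.

```python
from itertools import product

def _positions(tokens, terms):
    """For each term, list every index at which it occurs in tokens."""
    return [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]

def near_unordered(tokens, terms, window):
    """True if every term occurs, in any order, within `window` words.

    Assumption: "within" means the first and last matched words are
    fewer than `window` positions apart.
    """
    return any(
        max(combo) - min(combo) < window
        for combo in product(*_positions(tokens, terms))
    )

def near_ordered(tokens, terms, window):
    """Like near_unordered, but the terms must appear in the given order."""
    return any(
        all(a < b for a, b in zip(combo, combo[1:]))
        and combo[-1] - combo[0] < window
        for combo in product(*_positions(tokens, terms))
    )

tokens = "the drug was approved by the regulator".split()
print(near_unordered(tokens, ["approved", "drug"], window=5))  # True
print(near_ordered(tokens, ["approved", "drug"], window=5))    # False
print(near_ordered(tokens, ["drug", "approved"], window=5))    # True
```

The same two words satisfy the unordered check in either order, while the ordered check only succeeds when the elements are listed in the order they appear in the text.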

We discuss the details of each extractor in other sections. The key point is that all sequence rules from Semaphore are available, along with essential meta-extractors such as "Fact In Window," which requires a more basic extractor to define its context.

Some context extractors differ in the grammatical units they sequence (paragraphs, sentences, phrases, words, etc.) and how they treat punctuation (ignore all, pay attention to all, ignore in sentences, ignore in paragraphs).

Context Extractors work closely with the type of fact being extracted. Some are more suitable for certain types of facts.

The concept class hierarchy of context extractors is shown in the List of Contexts. Its root, "Contexts", is an abstract concept class: no concept can be made an instance of it in the model. It represents the group of contexts used to define sequences for fact extraction.

Many properties use this concept class as their domain because they are common to most of the child contexts used to build sequences.

Below is the concept class hierarchy of predefined contexts. Some are marked as "Abstract" (underlined) in the model structure, meaning they should not be used to create actual contexts in a project model. They serve only as organizational classes or as a place to attach shared properties.

Contexts
- Logical Contexts
  - All Anchors
  - Any Anchor
  - All Facts
  - Any Facts
- Fact In Window
- Near Contexts
  - Near Ordered Contexts
    - Near Ordered Across Paragraphs In Document
    - Near Ordered Across Phrases Contexts
      - Near Ordered Across Phrases In Document
      - Near Ordered Across Phrases In Paragraph(s)
      - Near Ordered Across Phrases In Sentence(s)
    - Near Ordered Across Sentences Contexts
      - Near Ordered Across Sentences In Document
      - Near Ordered Across Sentences In Paragraph(s)
  - Near Unordered Contexts
    - Near Unordered Across Paragraphs In Document
    - Near Unordered Across Phrases Contexts
      - Near Unordered Across Phrases In Document
      - Near Unordered Across Phrases In Paragraph(s)
      - Near Unordered Across Phrases In Sentence(s)
    - Near Unordered Across Sentences Contexts
      - Near Unordered Across Sentences In Document
      - Near Unordered Across Sentences In Paragraph(s)
- Sequenced Contexts
  - Repeated Fact Per Each Taxonomy Fact Contexts
    - Repeated Fact Per Each Taxonomy Fact Across Paragraphs In Document
    - Repeated Fact Per Each Taxonomy Fact Across Phrases Contexts
      - Repeated Fact Per Each Taxonomy Fact Across Phrases In Document
      - Repeated Fact Per Each Taxonomy Fact Across Phrases In Paragraph(s)
      - Repeated Fact Per Each Taxonomy Fact Across Phrases In Sentence(s)
    - Repeated Fact Per Each Taxonomy Fact Across Sentences Contexts
      - Repeated Fact Per Each Taxonomy Fact Across Sentences In Document
      - Repeated Fact Per Each Taxonomy Fact Across Sentences In Paragraph(s)
  - Sequenced Across Contexts
    - Sequenced Across Paragraphs In Document
    - Sequenced Across Phrases Contexts
      - Sequenced Across Phrases In Document
      - Sequenced Across Phrases In Paragraph(s)
      - Sequenced Across Phrases In Sentence(s)
    - Sequenced Across Sentences Contexts
      - Sequenced Across Sentences In Document
      - Sequenced Across Sentences In Paragraph(s)
- Unsequenced Contexts
  - Unsequenced In Document
  - Unsequenced In Paragraph(s)
  - Unsequenced In Sentence(s)

Fact Elements

Use the "fact" hierarchical relationship to model any fact elements (the inverse is "fact in").
Use the Ordered List of the context to model where in the sequence this fact occurs. The fact element can be a fact or another context.

Not Fact Elements

Use the "not fact" hierarchical relationship to model any negated fact elements (the inverse is "not fact in").
Use the Ordered List of the context to model where in the sequence the negative fact occurs. The negated fact element can be a fact or another context. As long as the fact or context's sequence is not found, the parent sequence defining the negation will fire.
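
As a rough illustration of negation, the sketch below fires on a fact only when a negated word is absent from the words just before it. The function name, the `lookback` parameter, and the assumption that the negated element sits immediately before the fact are hypothetical; in a real model the position comes from the context's Ordered List.

```python
def fires_without(tokens, fact, negated, lookback=1):
    """Return the indexes where `fact` occurs with no `negated` word
    in the `lookback` words immediately before it."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok != fact:
            continue
        preceding = tokens[max(0, i - lookback):i]
        if negated not in preceding:
            hits.append(i)
    return hits

print(fires_without("patient reported nausea".split(), "nausea", "no"))     # [2]
print(fires_without("patient reported no nausea".split(), "nausea", "no"))  # []
```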

Captured Fact Elements

You can extract simple text as-is by capturing it.

Use the "captured fact" hierarchical relationship to model any captured fact elements (the inverse is "captured fact in"). The captured fact element can only be raw text.
Use the Ordered List of the context to model where in the sequence this captured fact occurs.
A capturing greedy skip count or capturing non-greedy skip count metadata is required with all captured facts.
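
A regular-expression analogy may help show how a greedy and a non-greedy skip count change what a captured fact contains. This is only an analogy, not how Semaphore evaluates captures; `capture_between` and its parameters are hypothetical names, and `max_words` stands in for the skip count.

```python
import re

def capture_between(text, before, after, max_words, greedy=True):
    """Capture up to `max_words` raw words between `before` and `after`.

    Returns the captured text, or None when the sequence is not found.
    """
    quantifier = "{0,%d}" % max_words + ("" if greedy else "?")
    pattern = (
        r"\b" + re.escape(before) + r"\s+"
        + r"((?:\S+\s+)" + quantifier + r")"
        + r"\b" + re.escape(after) + r"\b"
    )
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None

text = "price of the new model is high and is rising"
print(capture_between(text, "price", "is", 7, greedy=True))   # of the new model is high and
print(capture_between(text, "price", "is", 7, greedy=False))  # of the new model
```

The greedy variant captures as many words as the count allows before the closing element, while the non-greedy variant stops at the first place the closing element can match.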

Skip Elements

Use the "skip" hierarchical relationship to model any skip elements (the inverse is "skip in"). The skip element cannot be the first or last element in the context’s sequence.
Use the Ordered List of the context to model where in the sequence the skip occurs.
A capturing greedy skip count or capturing non-greedy skip count metadata is required with all skips.
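
The structural rules stated so far (a skip cannot be first or last, and captured facts and skips need a skip count) can be checked mechanically. The sketch below assumes a simple in-memory representation; the `Element` class, its field names, and `validate_sequence` are hypothetical and do not correspond to a Semaphore API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Element roles named in this section, written as they appear in the text.
ROLES = {"fact", "not fact", "captured fact", "skip", "anchor", "not anchor"}

@dataclass
class Element:
    role: str
    label: str
    skip_count: Optional[int] = None  # hypothetical stand-in for the skip count metadata

def validate_sequence(elements: List[Element]) -> List[str]:
    """Return a list of rule violations for an ordered list of elements."""
    problems = []
    last = len(elements) - 1
    for index, el in enumerate(elements):
        if el.role not in ROLES:
            problems.append(f"{el.label}: unknown role {el.role!r}")
        if el.role == "skip" and index in (0, last):
            problems.append(f"{el.label}: a skip cannot be first or last")
        if el.role in ("skip", "captured fact") and el.skip_count is None:
            problems.append(f"{el.label}: missing skip count metadata")
    return problems

ok = [Element("anchor", "a1"), Element("skip", "s1", 3), Element("fact", "f1")]
bad = [Element("skip", "s1"), Element("fact", "f1")]
print(validate_sequence(ok))   # []
print(validate_sequence(bad))  # two problems reported for s1
```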

Anchor Elements

Use the "anchor" hierarchical relationship to model any anchor elements (the inverse is "anchor in").
Use the Ordered List of the context to model where in the sequence this anchor should occur.

Not Anchor Elements

Use the "not anchor" hierarchical relationship to model any negative anchor elements (the inverse is "not anchor in").
Use the Ordered List of the context to model where in the sequence this negative anchor occurs.

Repeating Elements

Any element in the context (if in a sequence) can be repeated, either greedily or non-greedily.

Use "greedy repeat" metadata to model how many times the element with the ordering prefix should repeat. A greedy repeat finds the longest match, extracting as many occurrences of the repeated element as possible, up to the specified number of repeats.
Use "non-greedy repeat" metadata to model how many times the element with the ordering prefix should repeat. A non-greedy repeat finds the shortest match, extracting as few occurrences of the repeated element as possible, up to the specified number of repeats.
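
Greedy and non-greedy repetition behave much like the bounded quantifiers of regular expressions, so a short regex analogy can make the distinction concrete. This is an analogy under that assumption, not a description of Semaphore's matching engine.

```python
import re

text = "very very very good"

# Greedy: repeat the element as many times as possible, up to 3.
greedy = re.match(r"(?:very\s+){1,3}", text)
# Non-greedy: repeat the element as few times as possible, up to 3.
non_greedy = re.match(r"(?:very\s+){1,3}?", text)

greedy_count = len(greedy.group().split())
non_greedy_count = len(non_greedy.group().split())
print(greedy_count)      # 3
print(non_greedy_count)  # 1

# When another element follows, a non-greedy repeat still extends
# until that element can match.
in_sequence = re.match(r"(?:very\s+){1,3}?good", text)
print(in_sequence.group())  # very very very good
```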

Precluding an Extractor

Each context can preclude or be precluded by another context. If two contexts are found, one will take precedence.

Use the "precludes context" associative relationship to specify when this context should preclude another (if both are found, this one is returned).
Use the "precluded by context" associative relationship to specify when this context should be precluded by another (if both are found, the other is returned).
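
The precedence rule can be sketched as a small filtering step over the contexts that were found. The function name, the dictionary representation of the relationship, and the context names in the example are hypothetical.

```python
def resolve_preclusions(found, precludes):
    """Drop every found context that is precluded by another found context.

    found:     iterable of context names that matched
    precludes: dict mapping a context name to the names it precludes
    """
    found = set(found)
    suppressed = {
        other
        for name in found
        for other in precludes.get(name, ())
        if other in found
    }
    return sorted(found - suppressed)

rules = {"Specific Context": {"General Context"}}
print(resolve_preclusions({"Specific Context", "General Context"}, rules))  # ['Specific Context']
print(resolve_preclusions({"General Context"}, rules))                      # ['General Context']
```

A precluded context is only suppressed when the precluding context is actually found; on its own it is still returned.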

Changing the Default Punctuation Rules

Contexts have default punctuation settings, but you can override these as needed.

Use the "punctuation rule" associative relationship to specify the punctuation setting for the context (the inverse is "punctuation rule of").

Details about alternative settings are discussed in specific context sections.
