Introduction
- Last Updated: May 13, 2026
FACTS has been developed by Progress to make advanced information extraction easily available to all Semaphore users.
It orchestrates several Semaphore technologies to give information scientists and taxonomists a powerful fact extraction and classification framework, operated through a user-friendly interface.
It consists of:
- A methodology for fact extraction and classification that leverages the rule-based linguistic power inherent in Semaphore.
- A user-friendly, browser-based graphical user interface that supports this methodology.
- A special publisher configuration that auto-generates the rules to implement the methodology.
This document focuses on the methodology and framework, not the implementation details. (For those interested, FACTS is implemented using Velocity templates. Please contact your account management team for more information.)
Functional Goal for the Fact Extraction Framework
The functional goal of the FACTS framework is to enable users who understand their content to model aspects of that content for information extraction. For example, if a user knows that their content (such as recipes) always includes a cooking time, the creator, and the publication source, and understands how each of these is represented in the text, they can model those facts and expect to extract them.
The FACTS Methodology in a Nutshell
The FACTS methodology works by first identifying (classifying) the type of each document it will extract facts from. Classifying first means extractors can be written for a single document type without the risk of them firing on content they were never meant to extract from. FACTS classifies a document by looking for textual evidence that uniquely identifies its type among all the other documents it might encounter.
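The classify-then-extract idea can be illustrated with a small Python sketch. FACTS itself generates Semaphore rules (via Velocity templates) rather than running Python, so this is only a conceptual analogy; the document types, evidence phrases, the two-hit threshold, and the names `EVIDENCE`, `classify`, `serves`, `EXTRACTORS`, and `run` are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical evidence rules: phrases expected in documents of one type
# and (ideally) rare in every other type.
EVIDENCE: Dict[str, List[str]] = {
    "recipe": [r"\bingredients\b", r"\bpreheat the oven\b", r"\bserves \d+\b"],
    "press_release": [r"\bfor immediate release\b", r"\bmedia contact\b"],
}


def classify(text: str, min_hits: int = 2) -> Optional[str]:
    """Return the one document type with enough evidence, else None."""
    qualifying = [
        doc_type
        for doc_type, patterns in EVIDENCE.items()
        if sum(1 for p in patterns if re.search(p, text, re.IGNORECASE)) >= min_hits
    ]
    # Zero or several candidate types means the document stays unclassified.
    return qualifying[0] if len(qualifying) == 1 else None


def serves(text: str) -> Optional[str]:
    """A deliberately loose extractor that is only safe inside recipes."""
    m = re.search(r"\bserves (\d+)", text, re.IGNORECASE)
    return m.group(1) if m else None


# Extractors registered per document type (hypothetical).
EXTRACTORS: Dict[str, Dict[str, Callable[[str], Optional[str]]]] = {
    "recipe": {"serves": serves},
}


def run(text: str) -> Tuple[Optional[str], Dict[str, Optional[str]]]:
    doc_type = classify(text)
    if doc_type is None:
        return None, {}  # no extractor may fire on an unclassified document
    extractors = EXTRACTORS.get(doc_type, {})
    return doc_type, {name: fn(text) for name, fn in extractors.items()}


print(run("Ingredients: 2 eggs. Preheat the oven to 180C. Serves 4."))
print(run("This plan serves 4 users."))
```

The second call shows why classification comes first: the loose `serves` pattern would match "serves 4 users", but because that text is not classified as a recipe, the extractor never runs.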
Once a document type is identified, users can model the types of information, or facts, to extract from each type. A given fact will not appear in every document of that type, but it should appear often enough to justify extracting it. The working assumption is that the information exists as text in identifiable locations.
How are these locations identified? There are several methods, but generally the process involves searching for textual or locational anchors that commonly appear near the information of interest. Alternatively, the information may have enough inherent structure to be identified and extracted without additional context. Either way, this requires a deep understanding of the content being processed.
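The two approaches can be contrasted in a minimal Python sketch based on the recipe example: the cooking time and creator are found through anchors ("Cooking time:", "Recipe by"), while the publication source is assumed to be a URL, which is recognizable from its own shape. The patterns and the names `ANCHORED`, `SOURCE_URL`, and `extract_recipe_facts` are hypothetical, and FACTS expresses this logic as Semaphore rules rather than Python regular expressions. A fact the document lacks comes back as `None`, since not every document contains every fact.

```python
import re
from typing import Dict, Optional

# Anchor-based patterns (hypothetical): the value follows a label that
# usually precedes it in the text.
ANCHORED = {
    # "Cooking time: 45 minutes" -> "45 minutes" (rest of the line)
    "cooking_time": re.compile(r"cooking time:[ \t]*([^\n]+)", re.IGNORECASE),
    # "Recipe by Jane Doe" -> "Jane Doe" (capitalized words on the same line)
    "creator": re.compile(r"\b[Rr]ecipe by +([A-Z][\w'-]*(?: [A-Z][\w'-]*)*)"),
}

# Structure-based pattern (hypothetical): a URL is recognizable by its own
# shape, so no anchor is needed to find the publication source.
SOURCE_URL = re.compile(r"https?://\S+")


def extract_recipe_facts(text: str) -> Dict[str, Optional[str]]:
    """Return each modeled fact, or None when the document lacks it."""
    facts: Dict[str, Optional[str]] = {}
    for name, pattern in ANCHORED.items():
        m = pattern.search(text)
        facts[name] = m.group(1).strip() if m else None
    url = SOURCE_URL.search(text)
    # Drop sentence punctuation that the URL pattern may have swallowed.
    facts["source"] = url.group(0).rstrip(".,;") if url else None
    return facts


doc = "Lemon Tart\nRecipe by Jane Doe\nCooking time: 45 minutes\nFrom https://example.com/recipes/lemon-tart.\n"
print(extract_recipe_facts(doc))
```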
Information in unstructured documents is often presented in various ways. The success of a fact extraction project depends on the information scientist’s skill in modeling these variations within the framework as efficiently, robustly, and accurately as possible.
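To illustrate modeling variations, here is a hedged Python sketch in which one fact (cooking time) is introduced by several different anchors and written in several duration formats, all normalized to minutes. The anchor list, the accepted units, and the names `ANCHORS`, `DURATION`, `COOKING_TIME`, and `cooking_time_minutes` are assumptions for illustration only, not part of FACTS.

```python
import re
from typing import Optional

# Hypothetical anchor variants that introduce the same fact.
ANCHORS = r"(?:cooking time|cook time|cook for|ready in)"
# Optional hours part followed by an optional minutes part,
# e.g. "1 hour 15 minutes", "45 mins", "2 hrs".
DURATION = r"(?:(\d+)\s*(?:hours?|hrs?))?\s*(?:(\d+)\s*(?:minutes?|mins?))?"
COOKING_TIME = re.compile(ANCHORS + r":?\s*" + DURATION, re.IGNORECASE)


def cooking_time_minutes(text: str) -> Optional[int]:
    """Normalize the first cooking time found, in any modeled phrasing, to minutes."""
    for m in COOKING_TIME.finditer(text):
        hours, minutes = m.group(1), m.group(2)
        # An anchor with no recognizable duration after it is not evidence.
        if hours or minutes:
            return int(hours or 0) * 60 + int(minutes or 0)
    return None


for sample in ["Cooking time: 45 minutes", "Cook for 1 hour 15 minutes.", "Ready in 2 hrs"]:
    print(sample, "->", cooking_time_minutes(sample))
```

Requiring a duration after the anchor is one way to stay accurate as the pattern grows more permissive: "Cook for a crowd" matches an anchor but is ignored because no duration follows it.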