Facts in content - document metadata

Last Updated: July 8, 2026

Document Metadata

Document Metadata is an abstract way of declaring that a document contains information of interest. It takes one of two forms: a Document Anchor, which is used to identify a Document Type, or a Document Fact, which is used to extract a fact, to identify a Document Type, or both.

In essence, Document Metadata specifies the information of interest in any Document Type.

Each Document Metadata has one requirement, depending on its kind:

A Document Fact requires at least one extractor.
A Document Anchor requires at least one identifier.
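
The two requirements above can be sketched as a small validation check. This is an illustrative Python model only, not Semaphore's API: the `DocumentMetadata` class, its `kind`, `identifiers`, and `extractors` fields, and the `validate` function are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocumentMetadata:
    """Hypothetical stand-in for a Document Metadata definition."""
    name: str
    kind: str  # assumed labels: "fact" or "anchor"
    identifiers: List[str] = field(default_factory=list)
    extractors: List[str] = field(default_factory=list)


def validate(meta: DocumentMetadata) -> List[str]:
    """Return a list of problems; an empty list means the definition is valid."""
    problems = []
    if meta.kind == "fact":
        # A Document Fact requires at least one extractor.
        if not meta.extractors:
            problems.append(f"{meta.name}: a Document Fact requires at least one extractor")
    elif meta.kind == "anchor":
        # A Document Anchor requires at least one identifier.
        if not meta.identifiers:
            problems.append(f"{meta.name}: a Document Anchor requires at least one identifier")
    else:
        problems.append(f"{meta.name}: unknown kind {meta.kind!r}")
    return problems
```

In this sketch, a Document Fact may also carry identifiers, since a Document Fact can identify a Document Type as well as extract a fact. Only the extractor is mandatory for it.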

A Document Metadata may be related to a Document Type in one of three ways:

Its only purpose is to identify a Document Type, allowing other Document Facts to then extract facts. In this case, it is a Document Anchor.
Its only purpose is to extract a fact, after the Document Type has been identified by another Document Metadata. In this case, it is a Document Fact.
It serves both purposes: identifying a Document Type and extracting a fact. In this case, it is a Document Fact.

A Document Fact is returned only if the Document Type has been identified, either by the Document Fact itself or by another Document Metadata.
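
The rule that facts are returned only once the Document Type is identified can be illustrated with a minimal Python sketch. The `Metadata` class, its `identify` and `extract` callables, the `process` function, and the invoice example are assumptions made for illustration; they do not reflect how Semaphore is actually configured or invoked.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Metadata:
    """Hypothetical Document Metadata with optional identify/extract behaviour."""
    name: str
    identify: Optional[Callable[[str], bool]] = None
    extract: Optional[Callable[[str], Optional[str]]] = None

    @property
    def kind(self) -> str:
        # Anything that extracts is a Document Fact; otherwise it is an anchor.
        return "Document Fact" if self.extract is not None else "Document Anchor"


def process(text: str, metadata: List[Metadata]) -> Dict[str, str]:
    """Return extracted facts, but only if some metadata identified the type."""
    identified = any(m.identify is not None and m.identify(text) for m in metadata)
    if not identified:
        return {}  # no identification, so no facts are returned
    facts = {}
    for m in metadata:
        if m.extract is not None:
            value = m.extract(text)
            if value is not None:
                facts[m.name] = value
    return facts


def find_total(text: str) -> Optional[str]:
    """Illustrative extractor: the number following 'Total:'."""
    match = re.search(r"Total:\s*(\d+(?:\.\d+)?)", text)
    return match.group(1) if match else None


# An anchor that only identifies, and a fact that only extracts.
invoice_anchor = Metadata("invoice_header", identify=lambda t: "INVOICE" in t)
total_fact = Metadata("total", extract=find_total)
```

With these definitions, `process` returns the total for a text containing both "INVOICE" and a "Total:" line, and returns an empty dictionary for a text that has a total but nothing that identifies the Document Type.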

Identifies a Document Type

Document Facts always extract facts. Sometimes, those facts can simply be information used to identify a Document Type. In some sense, these can be considered secondary facts.

An abstract Document Type is one where identifying a specific Document Type is not important or required, and where there may be no unique way of identifying it beyond the presence of a certain fact. Even an abstract Document Type requires a Document Fact to identify it. In this case, the Document Fact both extracts a fact and identifies the Document Type: finding the fact in the document is enough to identify the Document Type and have the fact returned.
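
As a rough illustration of the abstract case, the sketch below treats the presence of a fact as the identification itself. The order-number pattern, the type name `AnyDocumentWithOrderNumber`, and the function name are all hypothetical and are not part of Semaphore.

```python
import re
from typing import Optional, Tuple

# Hypothetical fact pattern: an order number such as ORD-123456.
ORDER_NUMBER = re.compile(r"\bORD-\d{6}\b")


def identify_abstract_type(text: str) -> Optional[Tuple[str, str]]:
    """Return (document type, fact) if the fact is present, otherwise None."""
    match = ORDER_NUMBER.search(text)
    if match is None:
        return None  # fact absent: type not identified, nothing returned
    # Finding the fact is sufficient to identify the abstract type.
    return ("AnyDocumentWithOrderNumber", match.group(0))
```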

Extracts a Fact

Document Facts always extract facts. If the Document Type is identified and the Document Fact finds a fact, the fact is extracted and returned.

A Document Fact has children called Context Extractors. Context Extractors perform the actual extraction of facts from text by matching rule sequences with information contexts. There are many Context Extractors that extract facts from various contexts.
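
The parent and child relationship can be pictured with the following Python sketch. It assumes, purely for illustration, that each Context Extractor is a single regular expression with a named `value` group; Semaphore's real rule sequences are not shown in this page and are likely richer than this. The class names, field names, and the effective-date example are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContextExtractor:
    """Hypothetical child: one rule that recognises a fact in one context."""
    context: str
    rule: re.Pattern  # must define a named group called "value"

    def extract(self, text: str) -> Optional[str]:
        match = self.rule.search(text)
        return match.group("value") if match else None


@dataclass
class DocumentFact:
    """Hypothetical parent: delegates extraction to its Context Extractors."""
    name: str
    children: List[ContextExtractor] = field(default_factory=list)

    def extract(self, text: str) -> List[str]:
        values: List[str] = []
        for child in self.children:
            value = child.extract(text)
            # Keep each distinct value once, in the order children found it.
            if value is not None and value not in values:
                values.append(value)
        return values


DATE = r"(?P<value>\d{4}-\d{2}-\d{2})"
effective_date = DocumentFact(
    "effective_date",
    children=[
        ContextExtractor("labelled field", re.compile(r"Effective Date:\s*" + DATE)),
        ContextExtractor("running text", re.compile(r"effective as of\s+" + DATE)),
    ],
)
```

Here the same fact is found whether it appears as a labelled field or in a sentence, which is the reason for giving one Document Fact several Context Extractors.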

The Semaphore Fact Extraction Framework (FACTS)