Semaphore information architecture

Last Updated: July 8, 2026

SKOS-XL and concepts, labels, relationships and metadata - all multilingual

At the heart of Semaphore's architecture is its support for SKOS-XL (SKOS eXtension for Labels). SKOS is the W3C standard for representing controlled vocabularies and taxonomies, and SKOS-XL extends it so that labels are resources in their own right. Semaphore builds on this with enterprise features for multilingual, governed, and richly contextualized knowledge models.

Concepts

A concept is a unit of meaning, such as "Contract," "Customer Complaint," or "Clinical Trial."

Each concept has a language-neutral identifier that serves as the anchor for metadata tagging and classification.

Labels

Each concept can have multiple labels, including:

Preferred terms
Synonyms
Acronyms
Abbreviations

Labels are language-aware, allowing the same concept to be recognized in multiple languages using language packs.
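
To make the concept and label split concrete, the sketch below models a concept as a language-neutral identifier that carries labels in several languages. It is a minimal illustration, not Semaphore's data model or API: the Concept and Label classes, the "pref" and "alt" kinds, and the example URI are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Label:
    text: str
    lang: str           # language tag such as "en" or "de"
    kind: str = "alt"   # hypothetical: "pref" = preferred term, "alt" = synonym, acronym, abbreviation


@dataclass
class Concept:
    uri: str                                   # language-neutral identifier
    labels: list = field(default_factory=list)

    def pref_label(self, lang: str) -> Optional[str]:
        """Return the preferred term in the given language, if one exists."""
        for label in self.labels:
            if label.kind == "pref" and label.lang == lang:
                return label.text
        return None

    def all_labels(self, lang: str) -> list:
        """Return every label text available in the given language."""
        return [label.text for label in self.labels if label.lang == lang]


# Illustrative data only; the URI is a placeholder.
contract = Concept(
    uri="http://example.com/concept/contract",
    labels=[
        Label("Contract", "en", "pref"),
        Label("Agreement", "en"),
        Label("Vertrag", "de", "pref"),
    ],
)
```

The same concept is reachable through "Contract", "Agreement", or "Vertrag", while tagging systems only ever store the identifier.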

Relationships

Concepts are linked through semantic relationships:

Hierarchical: broader/narrower (e.g., "Contract" > "Employment Contract")
Associative: related concepts (e.g., "Contract" ↔ "Obligation")
Equivalence: same-as or alias relationships

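These relationships form the basis of ontologies and taxonomies. They enable navigation up and down the hierarchy and simple inferencing, for example treating content tagged "Employment Contract" as also being about "Contract."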
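
The hierarchical and associative relationships above can be sketched as a small graph. This is an assumption-laden illustration rather than Semaphore's implementation: the ConceptGraph class and its method names are hypothetical, and "Legal Document" is an invented parent concept added only to show a two-step hierarchy.

```python
from collections import defaultdict


class ConceptGraph:
    """Hypothetical in-memory store for broader/narrower and related links."""

    def __init__(self):
        self.broader = defaultdict(set)    # concept -> its direct parents
        self.narrower = defaultdict(set)   # concept -> its direct children
        self.related = defaultdict(set)    # symmetric associative links

    def add_broader(self, narrow, broad):
        # Hierarchical links are stored in both directions.
        self.broader[narrow].add(broad)
        self.narrower[broad].add(narrow)

    def add_related(self, a, b):
        # Associative links are symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def ancestors(self, concept):
        """Return every concept reachable by following broader links."""
        seen = set()
        stack = [concept]
        while stack:
            current = stack.pop()
            for parent in self.broader.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = ConceptGraph()
graph.add_broader("Employment Contract", "Contract")
graph.add_broader("Contract", "Legal Document")   # invented parent, for illustration
graph.add_related("Contract", "Obligation")
```

Walking the broader links transitively is what lets a document tagged with a narrow concept also be found under its wider ones.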

Metadata

Concepts can carry metadata such as:

Definitions
Governance status (e.g., draft, approved)
Source references
Usage notes

This metadata supports governance, auditability, and explainability.
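
As a rough illustration of how such metadata can support governance, the sketch below attaches a metadata record to each concept and filters on status. The ConceptMetadata class, its field names, the status values, and the sample entries are all hypothetical and are not taken from Semaphore.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptMetadata:
    definition: str = ""
    status: str = "draft"                        # hypothetical values: "draft" or "approved"
    sources: list = field(default_factory=list)  # source references
    usage_note: str = ""


def approved_concepts(metadata_by_concept):
    """Return the names of concepts whose status is 'approved', sorted."""
    return sorted(
        name
        for name, meta in metadata_by_concept.items()
        if meta.status == "approved"
    )


# Illustrative entries only.
catalog = {
    "Contract": ConceptMetadata(
        definition="A legally binding agreement between parties.",
        status="approved",
        sources=["Internal legal glossary"],
    ),
    "Clinical Trial": ConceptMetadata(status="draft"),
}
```

A publishing step could, for example, restrict classification to the concepts returned by approved_concepts(catalog).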

Rule-base classes, configuration sets, variants

Once the semantic model is defined, Semaphore uses a rule-based engine to apply it to content. This engine is highly configurable and supports multilingual, domain-specific, and context-sensitive classification.

Rule Classes

Logical groupings of classification rules that apply to specific domains or content types.

For example, a "Legal" rule class might include rules for identifying contract types, clauses, and parties.

Configuration Sets

Bundles of rule classes, language settings, and model versions.

Configuration sets define how classification is executed in a given context (e.g., for a specific business unit or region).
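
One way to picture how rule classes and configuration sets fit together is the sketch below. It is a deliberately simplified stand-in: the Rule, RuleClass, and ConfigurationSet types, their fields, and the substring matching in classify are assumptions made for this example and do not reflect how Semaphore's engine evaluates rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    concept: str    # concept to assign when the rule fires
    evidence: str   # phrase that must appear in the text (simplified matching)


@dataclass(frozen=True)
class RuleClass:
    name: str
    rules: tuple


@dataclass(frozen=True)
class ConfigurationSet:
    name: str
    rule_classes: tuple
    language: str
    model_version: str


def classify(text, config):
    """Return the concepts whose evidence phrase occurs in the text, in rule order."""
    lowered = text.lower()
    hits = []
    for rule_class in config.rule_classes:
        for rule in rule_class.rules:
            if rule.evidence.lower() in lowered and rule.concept not in hits:
                hits.append(rule.concept)
    return hits


# Hypothetical "Legal" rule class bundled into a configuration set.
legal = RuleClass(
    name="Legal",
    rules=(
        Rule("Employment Contract", "employment agreement"),
        Rule("Termination Clause", "notice period"),
    ),
)
config = ConfigurationSet(
    name="emea-legal", rule_classes=(legal,), language="en", model_version="1.0"
)
```

Calling classify with a different configuration set changes which rule classes run, which is the sense in which the configuration set defines the execution context.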

Variants

Variants allow rules to behave differently based on:

Language
Document type
Metadata values

This enables contextual classification, such as tagging "agreement" differently in legal vs. procurement documents.
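
The "agreement" example can be sketched as a lookup that picks the most specific variant matching a document's context. The select_variant function, the condition keys, and the target concept names below are hypothetical; they illustrate the idea of context-dependent rules, not Semaphore's variant mechanism.

```python
def select_variant(variants, context):
    """Return the concept of the most specific variant whose conditions all match.

    Each variant is a (conditions, concept) pair. A variant with no
    conditions acts as the default. On a tie, the earlier variant wins.
    """
    best = None
    for conditions, concept in variants:
        if all(context.get(key) == value for key, value in conditions.items()):
            if best is None or len(conditions) > len(best[0]):
                best = (conditions, concept)
    return best[1] if best else None


# Hypothetical variants for the term "agreement".
AGREEMENT_VARIANTS = [
    ({}, "Agreement"),                                  # default
    ({"doc_type": "legal"}, "Contract"),
    ({"doc_type": "procurement"}, "Purchase Agreement"),
]

legal_tag = select_variant(AGREEMENT_VARIANTS, {"doc_type": "legal", "language": "en"})
procurement_tag = select_variant(AGREEMENT_VARIANTS, {"doc_type": "procurement"})
fallback_tag = select_variant(AGREEMENT_VARIANTS, {"doc_type": "hr"})
```

Counting matched conditions is only one possible specificity rule; an explicit priority field would work equally well.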

Multilingual Support

Rules can be authored and executed in multiple languages using language packs.

This ensures consistent classification across global content repositories.

Documents, articles, paragraphs, sentences, words, tokenization and NLP - all multilingual using language packs

Semaphore processes content at multiple levels of granularity, enabling both broad classification and fine-grained fact extraction.

Document-Level

Entire files such as PDFs, Word documents, HTML pages, or XML feeds.

Metadata is applied at the document level for indexing, routing, and governance.

Article-Level

Logical sections within documents (e.g., news articles, policy sections).

Useful for content that aggregates multiple topics or entities.

Paragraph and Sentence-Level

Enables precise tagging and extraction of localized information.

Supports use cases like clause detection in contracts or sentiment analysis in feedback.
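
To show what paragraph-level and sentence-level granularity means in practice, the sketch below splits a document and reports where a phrase occurs. The splitting rules are naive regular expressions chosen for brevity (blank lines separate paragraphs, sentence-ending punctuation separates sentences), and the function names are hypothetical; language-aware segmentation would be considerably more careful.

```python
import re


def split_paragraphs(text):
    """Split on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def locate(text, phrase):
    """Return (paragraph_index, sentence_index) pairs where the phrase occurs."""
    hits = []
    for p_idx, paragraph in enumerate(split_paragraphs(text)):
        for s_idx, sentence in enumerate(split_sentences(paragraph)):
            if phrase.lower() in sentence.lower():
                hits.append((p_idx, s_idx))
    return hits


document = (
    "This agreement starts today. Either party may terminate with notice.\n\n"
    "Payment is due in 30 days."
)
```

With positions like these, a tag such as a termination clause can point at one sentence instead of the whole file.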

Word and Token-Level

Semaphore uses tokenization to break text into words, phrases, and symbols.

Tokens are analyzed using:

Named Entity Recognition (NER)
Part-of-Speech (POS) tagging
Pattern-based extraction

These techniques enable fact extraction, such as identifying "Company A acquired Company B for $X."
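
The acquisition example can be approximated with tokenization plus a single pattern, as sketched below. This covers only the pattern-based technique; it does not perform NER or POS tagging, and it treats any run of capitalized words as a company name, which is a strong simplifying assumption. The function names, the regular expressions, and the sample sentence are hypothetical.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation or symbol tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


# Capitalized word runs stand in for company names (a naive assumption).
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z]\w*(?: [A-Z]\w*)*) acquired "
    r"(?P<target>[A-Z]\w*(?: [A-Z]\w*)*) for "
    r"(?P<price>\$[\d.,]+(?: (?:million|billion))?)"
)


def extract_acquisition(sentence):
    """Return a dict with buyer, target, and price, or None if no match."""
    match = ACQUISITION.search(sentence)
    return match.groupdict() if match else None


# Invented sentence, for illustration only.
fact = extract_acquisition("Acme Corp acquired Widget Works for $2.5 billion.")
tokens = tokenize("Acme acquired Widgets for $2.")
```

A pattern like this breaks easily, for instance when a capitalized word precedes the buyer's name, which is why entity recognition is normally combined with patterns.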

Multilingual NLP

All NLP operations are language-aware and powered by language packs.

This includes tokenization, grammar rules, and entity recognition tailored to each supported language.

Putting It All Together

Semaphore's information architecture is more than a technical framework: it is a semantic foundation for enterprise intelligence. It enables:

Consistent metadata enrichment across languages and systems
Explainable AI by grounding outputs in structured knowledge
Scalable automation through rule-based classification and NLP
Governed knowledge management with full auditability and version control
