Semaphore information architecture
- Last Updated: May 13, 2026
SKOS-XL and concepts, labels, relationships and metadata - all multi-lingual
At the heart of Semaphore's architecture is its support for SKOS-XL, the label extension to SKOS, the W3C standard for representing controlled vocabularies and taxonomies. Semaphore extends this with enterprise-grade features to support multilingual, governed, and richly contextualized knowledge models.
Concepts
A concept is a unit of meaning, such as "Contract," "Customer Complaint," or "Clinical Trial."
Concepts are language-neutral identifiers that serve as anchors for metadata tagging and classification.
Labels
Each concept can have multiple labels, including:
- Preferred terms
- Synonyms
- Acronyms
- Abbreviations
Labels are language-aware, allowing the same concept to be recognized in multiple languages using language packs.
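The split between a language-neutral concept and its language-aware labels can be sketched as a small data structure. This is only an illustration in plain Python, not Semaphore's API: the `Concept` and `Label` classes, the `kind` values, and the identifier `c-001` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Label:
    """Hypothetical label record: text plus a language tag."""
    text: str
    lang: str            # language tag such as "en" or "fr"
    kind: str = "pref"   # "pref" (preferred term) or "alt" (synonym, acronym, abbreviation)


@dataclass
class Concept:
    """Hypothetical concept: a language-neutral identifier with attached labels."""
    concept_id: str
    labels: List[Label] = field(default_factory=list)

    def pref_label(self, lang: str) -> Optional[str]:
        # Return the preferred term for the requested language, if any.
        for label in self.labels:
            if label.lang == lang and label.kind == "pref":
                return label.text
        return None

    def matches(self, text: str, lang: str) -> bool:
        # Case-insensitive comparison against every label in that language.
        needle = text.casefold()
        return any(
            label.lang == lang and label.text.casefold() == needle
            for label in self.labels
        )


contract = Concept("c-001", [
    Label("Contract", "en"),
    Label("Agreement", "en", "alt"),
    Label("Contrat", "fr"),
])
```

Because the identifier never changes, a tag applied through the French label and one applied through the English synonym both resolve to the same concept.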
Relationships
Concepts are linked through semantic relationships:
- Hierarchical: broader/narrower (e.g., "Contract" > "Employment Contract")
- Associative: related concepts (e.g., "Contract" ↔ "Obligation")
- Equivalence: same-as or alias relationships
These relationships form the basis of ontologies and taxonomies, enabling rich navigation and inferencing.
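As a rough sketch of how hierarchical and associative links support navigation, the relationships can be held as adjacency maps and walked transitively. The function names here are hypothetical, and "Fixed-Term Employment Contract" is an invented concept added only to show a second level of hierarchy.

```python
from collections import defaultdict

# broader concept -> set of directly narrower concepts
narrower = defaultdict(set)
# symmetric associative links
related = defaultdict(set)


def add_broader(child: str, parent: str) -> None:
    """Record that `parent` is broader than `child`."""
    narrower[parent].add(child)


def add_related(a: str, b: str) -> None:
    """Record a two-way associative link."""
    related[a].add(b)
    related[b].add(a)


def narrower_transitive(concept: str) -> set:
    """Collect every concept below `concept` in the hierarchy."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for child in narrower.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


add_broader("Employment Contract", "Contract")
add_broader("Fixed-Term Employment Contract", "Employment Contract")  # invented example
add_related("Contract", "Obligation")
```

A search for "Contract" could then be expanded to everything returned by `narrower_transitive("Contract")`, which is one simple form of the inferencing mentioned above.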
Metadata
Concepts can carry metadata such as:
- Definitions
- Governance status (e.g., draft, approved)
- Source references
- Usage notes
This metadata supports governance, auditability, and explainability.
Rule-base classes, configuration sets, variants
Once the semantic model is defined, Semaphore uses a rule-based engine to apply it to content. This engine is highly configurable and supports multilingual, domain-specific, and context-sensitive classification.
Rule Classes
Logical groupings of classification rules that apply to specific domains or content types.
For example, a "Legal" rule class might include rules for identifying contract types, clauses, and parties.
Configuration Sets
Bundles of rule classes, language settings, and model versions.
Configuration sets define how classification is executed in a given context (e.g., for a specific business unit or region).
Variants
Variants allow rules to behave differently based on:
- Language
- Document type
- Metadata values
This enables contextual classification, such as tagging "agreement" differently in legal vs. procurement documents.
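One way to picture how rule classes, configuration sets, and variants fit together is the sketch below. It is a simplified model written for this page, not Semaphore's rule syntax or engine: the `Rule` and `ConfigurationSet` classes, the `doc_type` context key, the configuration name, and the "Purchase Agreement" concept are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Rule:
    """Hypothetical rule: tag `concept` when `term` appears and all conditions hold."""
    term: str
    concept: str
    conditions: Dict[str, str] = field(default_factory=dict)  # variant conditions

    def applies(self, context: Dict[str, str]) -> bool:
        return all(context.get(key) == value for key, value in self.conditions.items())


@dataclass
class ConfigurationSet:
    """Hypothetical bundle of rule classes, a language setting, and a model version."""
    name: str
    language: str
    model_version: str
    rule_classes: Dict[str, List[Rule]]

    def classify(self, text: str, context: Dict[str, str]) -> Set[str]:
        lowered = text.casefold()
        # The configured language is part of the context that variants can test.
        full_context = {"language": self.language, **context}
        tags = set()
        for rules in self.rule_classes.values():
            for rule in rules:
                if rule.applies(full_context) and rule.term in lowered:
                    tags.add(rule.concept)
        return tags


config = ConfigurationSet(
    name="example-config",
    language="en",
    model_version="1.0",
    rule_classes={
        "Legal": [Rule("agreement", "Contract", {"doc_type": "legal"})],
        "Procurement": [Rule("agreement", "Purchase Agreement", {"doc_type": "procurement"})],
    },
)
```

Calling `config.classify(text, {"doc_type": "legal"})` and `config.classify(text, {"doc_type": "procurement"})` on the same text yields different tags, which is the contextual behavior that variants are meant to provide.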
Multilingual Support
Rules can be authored and executed in multiple languages using language packs.
This ensures consistent classification across global content repositories.
Documents, articles, paragraphs, sentences, words, tokenisation and NLP - all multi-lingual using language packs
Semaphore processes content at multiple levels of granularity, enabling both broad classification and fine-grained fact extraction.
Document-Level
Entire files such as PDFs, Word documents, HTML pages, or XML feeds.
Metadata is applied at the document level for indexing, routing, and governance.
Article-Level
Logical sections within documents (e.g., news articles, policy sections).
Useful for content that aggregates multiple topics or entities.
Paragraph and Sentence-Level
Enables precise tagging and extraction of localized information.
Supports use cases like clause detection in contracts or sentiment analysis in feedback.
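The idea of tagging at paragraph and sentence level can be sketched with a naive segmenter. The regular expressions below are a deliberate simplification (they split on blank lines and on sentence-ending punctuation) and stand in for the language-pack-driven segmentation the product would use; all function names are hypothetical.

```python
import re
from typing import List, Tuple


def split_paragraphs(document: str) -> List[str]:
    """Split on blank lines; a naive stand-in for real paragraph detection."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def split_sentences(paragraph: str) -> List[str]:
    """Split after '.', '!' or '?' followed by whitespace (ignores abbreviations)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def tag_sentences(document: str, term: str) -> List[Tuple[int, int, str]]:
    """Return (paragraph index, sentence index, sentence) for each sentence containing `term`."""
    hits = []
    for p_index, paragraph in enumerate(split_paragraphs(document)):
        for s_index, sentence in enumerate(split_sentences(paragraph)):
            if term.casefold() in sentence.casefold():
                hits.append((p_index, s_index, sentence))
    return hits


sample = (
    "This agreement starts today. Either party may terminate with notice.\n\n"
    "Payment is due in 30 days."
)
```

Here `tag_sentences(sample, "terminate")` locates the termination clause by paragraph and sentence position rather than tagging the whole document.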
Word and Token-Level
Semaphore uses tokenization to break text into words, phrases, and symbols.
Tokens are analyzed using:
- Named Entity Recognition (NER)
- Part-of-Speech (POS) tagging
- Pattern-based extraction
These techniques enable fact extraction, such as identifying "Company A acquired Company B for $X."
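A minimal sketch of tokenization followed by pattern-based extraction is shown below, using only regular expressions. It does not perform real NER or POS tagging: capitalized word runs are treated as company names purely for illustration, and the company names, the `tokenize` and `extract_acquisition` functions, and the pattern itself are assumptions.

```python
import re
from typing import Dict, List, Optional

# Amounts such as "$5" or "$1,200.50", then words, then single punctuation marks.
TOKEN_PATTERN = re.compile(r"\$?\d[\d,]*(?:\.\d+)?|\w+|[^\w\s]")

# Capitalized word runs stand in for company names (a crude substitute for NER).
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"
ACQUISITION_PATTERN = re.compile(
    rf"(?P<acquirer>{NAME}) acquired (?P<target>{NAME}) "
    r"for (?P<price>\$\d[\d,]*(?:\.\d+)?(?: (?:million|billion))?)"
)


def tokenize(text: str) -> List[str]:
    """Break text into amount, word, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def extract_acquisition(text: str) -> Optional[Dict[str, str]]:
    """Return the acquirer, target, and price if the sentence fits the pattern."""
    match = ACQUISITION_PATTERN.search(text)
    return match.groupdict() if match else None


sentence = "Acme Corp acquired Widget Inc for $5 million."
tokens = tokenize(sentence)
fact = extract_acquisition(sentence)
```

The extracted `fact` is a structured record (acquirer, target, price) rather than a tag, which is the difference between classification and fact extraction described above.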
Multilingual NLP
All NLP operations are language-aware and powered by language packs.
This includes tokenization, grammar rules, and entity recognition tailored to each supported language.
Putting It All Together
Semaphore's information architecture is more than a technical framework: it is a semantic foundation for enterprise intelligence. It enables:
- Consistent metadata enrichment across languages and systems
- Explainable AI by grounding outputs in structured knowledge
- Scalable automation through rule-based classification and NLP
- Governed knowledge management with full auditability and version control