Architectural overview
- Last Updated: May 13, 2026
Semaphore architecture
After completing the steps outlined in the “About this guide” document, a functioning installation of the core Semaphore software should be available. The following software is installed:
- Semaphore Studio - This application includes user access to Semaphore tools such as Knowledge Model Management and Document Analyzer.
- Classification Server - Application that automatically classifies documents sent to it (against a rulebase set).
- Semantic Enhancement Server - This software provides model information to external applications such as a CMS and/or search interfaces.
- Reconciliation Server - This software provides the back end for the mapping functionality within KMM or the third-party OpenRefine application.
- Precision and Recall Server - This software provides a regression tool for measuring classification quality.
- (Optional) Publisher - Although Publishing functionality is available within KMM out of the box (from Semaphore 5.6), there are some circumstances in which it is desirable to run the publisher from the command line or from a server other than that running Studio. For these cases, the Publisher standalone is available.
Semaphore client applications
This section details the various “client” applications of the Semaphore application suite. These applications can also be installed on the server for testing purposes.
Semaphore server applications
This section details the various “server” applications of the Semaphore application suite. These applications provide services that are utilized by other components of the suite as well as any specific integrations. Generally these applications do not have user interfaces.
Semaphore Studio
This is an application consisting of three sub-components:
- Knowledge Model Management - This component provides the user with the ability to update models that are stored in either a built-in (Jena TDB) or external (MarkLogic) triple store. This component is controlled with the Linux service “semaphore-kmm”.
- Document Analyzer - This component provides the user with the ability to perform advanced classification analysis on documents. This component is controlled with the Linux service “semaphore-da”.
- SES SDK - This component demonstrates some common use cases for “Semantic Enhancement Server” including providing sample code that can be copied and pasted into any integration.

The application as a whole can be controlled with the Linux service “semaphore”, which controls all components.
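As a minimal sketch, the state of the Studio services named above could be checked from a script. This assumes the services are managed by systemd and that `systemctl` is on the path; the helper names `status_command` and `is_active` are hypothetical and not part of Semaphore.

```python
import subprocess

# Service names as given in this section.
STUDIO_SERVICES = ["semaphore", "semaphore-kmm", "semaphore-da"]


def status_command(service):
    """Return the systemctl command that reports whether a unit is active."""
    return ["systemctl", "is-active", service]


def is_active(service):
    """Return True if systemctl reports the unit as active.

    `systemctl is-active` exits with status 0 only when the unit is active.
    Assumption: a host without systemctl is treated as "not active".
    """
    try:
        result = subprocess.run(
            status_command(service),
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return False
    return result.returncode == 0
```

Calling `is_active(name)` for each entry in `STUDIO_SERVICES` gives a quick health summary of the Studio components.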
Classification Server
Classification Server takes documents submitted to it, analyses their content and returns a set of terms, each with an indication of its relevance to the content. The terms returned are based upon the rules stored in the “rulebases” created by “Publisher”, which are, in turn, based upon the information stored in the model. The application can be controlled with the Linux service “semaphore-cs”.
Rulebases come in two formats:
- XML (raw) rulebase files, one for each (classification) term, and
- PAK files, which are compressed, single-file versions of the XML rulebase files (typically generated as one file for each class).
The interface to Classification Server is a standard HTTP GET/POST request, sent in XML format to the port the server is listening on, with results returned in XML format. A test interface for Classification Server is accessible via a web browser by accessing the URL “http://<server>:<port>”.
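The following is a minimal sketch of how an integration might prepare such a request with the Python standard library. The XML request schema is not described in this document, so the body is supplied by the caller; the `Content-Type` header value and the helper names `cs_url`, `build_request` and `send` are assumptions for illustration only.

```python
import urllib.request


def cs_url(host, port):
    """Build the base URL in the form http://<server>:<port>."""
    return f"http://{host}:{port}"


def build_request(host, port, xml_body):
    """Build (but do not send) a POST request carrying an XML body.

    xml_body must be bytes. Its structure is defined by Classification
    Server and is not shown here.
    """
    return urllib.request.Request(
        cs_url(host, port),
        data=xml_body,
        headers={"Content-Type": "text/xml"},  # assumed header value
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the raw (XML) response bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from sending makes it possible to inspect or log the request before it reaches the server.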

Classification Server generally runs as a Linux “systemd” service and listens on standard TCP/IP ports. By default, these are port 5059 for the admin interface described above and port 5058 for the back-end classification interface; earlier versions of Classification Server listen on port 5058 only. The “admin” interface is accessible from outside the installation server only if the IP address restrictions in the default configuration have been altered (by default, only “localhost” access is allowed).
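A quick way to confirm that the server is listening is to attempt a TCP connection to each port. The sketch below assumes the default ports given above and is run on the installation server itself; the helper name `port_is_open` is hypothetical.

```python
import socket

# Default ports as stated in this section.
DEFAULT_PORTS = {"classification": 5058, "admin": 5059}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `port_is_open("localhost", DEFAULT_PORTS["admin"])` reports whether the admin interface is reachable locally. A successful connection shows only that something is listening on the port, not that it is Classification Server.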
Classification Server can be executed manually from the command line by running:
ClassificationServer <configuration file>
The default configuration file is “/etc/opt/semaphore/CS/conf/config.xml” (which, amongst other things, configures the TCP/IP port number).
Note: For optimal use of system resources, configure Classification Server to use all CPUs installed on the server by setting the “workers” value in the configuration file to the number of cores. If other programs are active on the same computer, leave them adequate resources (for example, use a setting one less than the number of CPUs). Also take into account any restrictions imposed by the Progress licence, which may allow only a certain number of workers/instances of Classification Server to be configured.
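The sizing guidance in this note can be sketched as a small calculation. The function name and its parameters are hypothetical, and `os.cpu_count()` reports logical CPUs, which may differ from the number of physical cores.

```python
import os


def recommended_workers(cpu_count=None, reserve=1, licence_limit=None):
    """Suggest a value for the "workers" setting.

    cpu_count:     CPUs on the server (detected if not given).
    reserve:       CPUs left free for other programs on the same machine.
    licence_limit: maximum workers allowed by the licence, if any
                   (assumed to be at least 1 when provided).
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Always keep at least one worker, even on a single-CPU machine.
    workers = max(1, cpu_count - reserve)
    if licence_limit is not None:
        workers = min(workers, licence_limit)
    return workers
```

On a dedicated server, `reserve=0` uses every CPU; on a shared one, the default leaves one CPU free.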
See the “welcome” for further information.
Semantic Enhancement Server
Semantic Enhancement Server contains “indexes” that are copies of models that can be queried by integrations. The information is provided in a highly optimised and performant format to be used in end-user applications. The application can be controlled with the Linux service “semaphore-ses”.
Semaphore architectural summary
The following shows the flow of information through the various components of the system:

The end user accesses “Studio” (via the web browser), then updates the model information in the Studio “Knowledge Model Management” component via the “Models” link (or any other link to models found in Studio). The model information is stored in a “model database”, which is a “triple store” that can be either the built-in Jena TDB or an external MarkLogic server. The “Publisher” pushes the information from “Knowledge Model Management” into an index, which is used by “Semantic Enhancement Server”. A CMS then accesses “Semantic Enhancement Server” to display model information to the (CMS) user. Publisher also pushes model information into rulebases stored in “Classification Server”. The CMS accesses “Classification Server” to classify documents, and the classification information returned is then used during the search process.
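The flow described above can be written down as a small directed graph, which is sometimes convenient when reasoning about which components are affected by a change. This is only an illustrative reading of the paragraph: the edge list, labels and the `downstream` helper are assumptions, not a Semaphore API.

```python
# Each edge is (source, destination, what flows between them).
FLOW = [
    ("End user", "Knowledge Model Management", "model updates"),
    ("Knowledge Model Management", "Model database", "model information"),
    ("Knowledge Model Management", "Publisher", "model information"),
    ("Publisher", "Semantic Enhancement Server", "index"),
    ("Publisher", "Classification Server", "rulebases"),
    ("Semantic Enhancement Server", "CMS", "model information"),
    ("CMS", "Classification Server", "documents"),
    ("Classification Server", "CMS", "classification information"),
]


def downstream(start):
    """Return every component that receives information, directly or
    indirectly, from the given starting component."""
    seen = []
    stack = [start]
    while stack:
        node = stack.pop()
        for src, dst, _ in FLOW:
            if src == node and dst not in seen and dst != start:
                seen.append(dst)
                stack.append(dst)
    return seen
```

For instance, `downstream("Publisher")` lists the components that depend on published output.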