Synchronized Publishing of Models Hosted in the Cloud

Last Updated: May 13, 2026

A model hosted in the cloud can be published both to a Semantic Enhancement Server or Classification Server in the cloud and to a Semantic Enhancement Server or Classification Server hosted on your own premises behind a firewall.

Because the cloud-hosted instance cannot initiate a connection to the on-premise servers, the process starts in the cloud, and the on-premise instance is set to monitor the cloud-hosted server for publish events. If the on-premise instance detects that a publish has occurred in the cloud, it downloads the model and configuration from that publish event and runs a publish locally.

To ensure that rules are generated identically by the on-premise and cloud-hosted instances, we can share a considerable amount of the Publisher configuration. However, some parts of the configuration cannot be shared. The locations of the Classification and Semantic Enhancement Servers differ between the cloud-hosted and on-premise instances. (From 5.2, the locations of these services are read from the environments defined within the Semaphore application, but these environments are not available to the on-premise instance.) In addition, the cloud-hosted publish event needs to generate a packet of data that the on-premise Publisher can use.

We also need to configure a small utility application, the Remote Publisher Wrapper, which runs in the on-premise environment. This wrapper application should be run frequently (via a cron job or a Windows scheduled task). It asks the cloud-hosted instance when it was last published. If the cloud-hosted last publish date is later than the on-premise last publish date, the wrapper downloads the artifacts from that publish event and then starts the on-premise Publisher.

This on-premise Publisher is configured in the same way as the cloud-hosted one, except that the model is loaded from the downloaded data, and for the changes noted above.

This configuration is admittedly a little complex. When setting it up, decide for each task the Publisher needs to perform where it belongs: if it is part of the cloud-hosted process only, put it in the cloud-only configuration file; if it is part of the on-premise publish only, put it in the on-premise-only configuration file; if it is part of both processes, put it in the shared configuration file; and if it is part of the wrapper function, put it in the remote wrapper configuration file.

The common configuration file

Here we show a very simple configuration file. It defines an abstract list of all the configuration sets that are to be used (generally there will be different configuration sets describing different parts of the model to be treated differently at publish time).

We then define each of the configuration sets. In this case it is kept simple, but in general it is this configuration file that must be kept synchronized between the cloud-hosted and on-premise installations, and so it is downloaded as part of the data transfer between the two.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">

<beans default-lazy-init="true">
    <bean id="configurationSetList" abstract="true" >
        <property name="configurationSets">
            <list>
                <ref bean="Topics" />
                <ref bean="Locations" />
            </list>
        </property>
        <property name="modelUpdater" ref="OEUpdater" />
    </bean>

    <bean id="Topics" parent="ConceptTree" >
        <property name="startIdentifiers">
            <list>
                <value>c41152a6-311f-3759-89b4-e9901f35703f</value> <!-- Topics concept scheme -->
            </list>
        </property>
        <property name="includeStartPoints" value="false" />
        <property name="outputProcessors">
            <list>
                <bean id="TopicsRules" parent="RulebaseWriterTemplate">
                    <property name="rulebaseClassName" value="Topics" />
                    <property name="templateFileName" value="TopicTemplate.kid" />
                </bean>
                <ref bean="rulebasePublisher" />
            </list>
        </property>
    </bean>

    <bean id="Locations" parent="ConceptTree" >
        <property name="startIdentifiers">
            <list>
                <value>cc88d81d-a838-3260-8a92-4adf19167e15</value> <!-- Locations concept scheme -->
            </list>
        </property>
        <property name="includeStartPoints" value="false" />
        <property name="outputProcessors">
            <list>
                <bean id="LocationsRules" parent="RulebaseWriterTemplate">
                    <property name="rulebaseClassName" value="Locations" />
                    <property name="templateFileName" value="LocationTemplate.kid" />
                </bean>
                <ref bean="rulebasePublisher" />
            </list>
        </property>
    </bean>

    <import resource="file:${resources.directory}/import/ConfigurationSets.xml" />

</beans>

The cloud configuration file

This is the configuration file that should be selected in the Knowledge Model Management Publish interface in the cloud-hosted instance.

Note that here we define the “Publisher” bean, which inherits the configuration set list from the imported common file above. It adds one configuration set to this list: the backup configuration defined below.
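As an illustration of what the merge produces: Spring's collection merging places the parent's list entries ahead of the child's, so with the common file shown earlier the “Publisher” bean behaves as if its configuration sets had been listed in full. The fragment below is only a sketch of that merged result for reading purposes; it is not a file to create.

```xml
<!-- Sketch of the effective result of the list merge; for illustration only -->
<property name="configurationSets">
    <list>
        <ref bean="Topics" />               <!-- inherited from configurationSetList in the common file -->
        <ref bean="Locations" />            <!-- inherited from configurationSetList in the common file -->
        <ref bean="backupConfiguration" />  <!-- added by the cloud configuration file -->
    </list>
</property>
```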

The backup configuration creates a zip file containing the listed models (and the TCH graphs if “includeTch” is set to true). We also package here all the XML configuration files and all the kid and vm files in the templates directory. This processor also requires access to the predefined model updater object so that the details of the created artifacts can be written back to the model for the remote instance to read. If you have linked other models into the one you are publishing, add them to the “modelsToInclude” list.

In this example, we have chosen to define the Classification Server instance in this file. This means that the cloud and on-premise installations can use different Classification Server locations. (If both used localhost, you could put this definition in the common configuration instead.)

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">

<beans default-lazy-init="true">
    <bean class="com.smartlogic.workbench.publisher.Configuration">
        <property name="description" value="Hybrid Publisher (Cloud configuration)" />
        <property name="environments">
            <list />
        </property>
    </bean>

    <bean class="com.smartlogic.publisher.Publisher" parent="configurationSetList" >
        <property name="model" ref="SparqlEndpoint" />
        <property name="configurationSets">
            <list merge="true" >
                <ref bean="backupConfiguration" />
            </list>
        </property>
        <property name="modelUpdater" ref="OEUpdater" />
    </bean>

<!-- create a backup copy of the model that can be retrieved by the remote server -->
    <bean id="backupConfiguration" parent="AllResources">
        <property name="outputProcessors">
            <list>
                <ref bean="environmentSolrWriterJson" />
                <bean class="com.smartlogic.publisher.backup.PackageBuilder">
                    <property name="outputFileName" value="${model.name}.zip" />
                    <property name="filesToInclude">
                        <list>
                            <value>${config.directory}/${model.name}/templates/*.kid</value>
                            <value>${config.directory}/${model.name}/templates/*.vm</value>
                            <value>${config.directory}/${model.name}/*.xml</value>
                        </list>
                    </property>
                    <property name="includeTch" value="true" />
                    <property name="format" value="TTL" />
                    <property name="modelsToInclude">
                        <list>
                            <value>${model.name}</value>
                            <value>LinkedModel</value>
                        </list>
                    </property>
                </bean>
            </list>
        </property>
    </bean>
    
    <bean id="rulebasePublisher" parent="environmentCSWriter" />

    <!-- Import configuration sets -->
    <import resource="file:${config.directory}/${model.name}/Semaphore-Publisher_Common.xml" />
    
    <!-- The following import lines import many default configuration settings
         that will not usually be altered.
         Therefore be careful editing anything below here -->
    <import resource="file:${resources.directory}/import/ModelInterface.xml" />
    <import resource="file:${resources.directory}/import/ModelDefinition.xml" />
    <import resource="file:${resources.directory}/import/RulebaseStructure.xml" />
    <import resource="file:${resources.directory}/import/SESConfiguration.xml" />
    <import resource="file:${resources.directory}/import/ConfigurationSets.xml" />

</beans>

Note 1: this configuration file imports the common configuration file. In effect, the two files are treated as one when the overall configuration is assembled.

Note 2: in this configuration we have chosen to package all the xml files in the model configuration directory and all the kid and vm files in the templates directory for download. If the publish requires other files, ensure that you add them to the list.
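For instance, if the publish relied on extra resource files, the “filesToInclude” list could be extended along the following lines. The resources/*.txt entry is a hypothetical example, and the sketch assumes that additional patterns are handled in the same way as the ones already listed.

```xml
<property name="filesToInclude">
    <list>
        <value>${config.directory}/${model.name}/templates/*.kid</value>
        <value>${config.directory}/${model.name}/templates/*.vm</value>
        <value>${config.directory}/${model.name}/*.xml</value>
        <!-- Hypothetical extra entry: additional files needed by the publish -->
        <value>${config.directory}/${model.name}/resources/*.txt</value>
    </list>
</property>
```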

Note 3: we are publishing to SES in the cloud. We only need to reference environmentSolrWriterJson, as the details of the SES instance are read from the environment defined within the Semaphore application.

Once these files are configured, we can publish in the cloud. It is best to do this first and repeat it until the cloud publish gives the results you expect.

Publishing on-premise - the Remote Wrapper Configuration

As the cloud environment is Linux, it is easier to publish on-premise on a Linux machine than on a Windows one. Publishing on Windows is entirely possible, however; just be careful with file paths, especially in the common configuration file.

When we publish on-premise, we use the Remote Publisher Wrapper. This is a Java application that wraps the actual Publisher application. It checks the status of the local and remote publishes and, if necessary, downloads the package built in the cloud and uses it to publish.

We need a configuration file defined for this wrapper.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">

<beans default-lazy-init="true">

    <description>An example configuration file for the remote publisher wrapper</description>

    <bean class="com.smartlogic.publisher.remote.RemotePublisherWrapper" >
        <property name="localDateStorageFile" value="/opt/semaphore/Publisher/LocalDateStorage.bin" />
        <property name="tchGraph" value="urn:x-evn-master:MyModel.tch" />
        <property name="modelUri" value="urn:x-evn-master:MyModel" />
        <property name="tempDirectory" value="/opt/semaphore/Publisher/remotetmp/MyModel" />
        <property name="fileDownloader" >
            <bean class="com.smartlogic.publisher.remote.OEArtefactFileDownloader" >
                <property name="sparqlEndpoint" value="https://cloud.smartlogic.com/semaphore/073641fc-bdb7-39f1-b774-82e8c5dcc5f3/kmm/api/model:MyModel/sparql" />
                <property name="tokenFetcher" ref="TokenFetcher" />
            </bean>
        </property>
        <property name="configurationFilePath" value="/opt/semaphore/Publisher/config/MyModel/Semaphore-Publisher-OnPremise.xml" />
        <property name="sparqlEndpoint" value="https://cloud.smartlogic.com/semaphore/073641fc-bdb7-39f1-b774-82e8c5dcc5f3/kmm/api/model:MyModel/sparql" />
        <property name="environmentVariables" >
            <map>
                <entry key="model.name" value="MyModel" />
                <entry key="task.name" value="Task Name" />
                <entry key="user.name" value="RemoteUser" />
                <entry key="model.profile" value="Master" />
                <entry key="resources.directory" value="/opt/semaphore/Publisher/resources" />
                <entry key="config.directory" value="/opt/semaphore/Publisher/config" />
                <entry key="PubSES.mainDataPath" value="/var/opt/semaphore/Publisher/" />
                <entry key="root.directory" value="/opt/semaphore/Publisher" />
                <entry key="sparql.endpoint" value="https://cloud.smartlogic.com/semaphore/073641fc-bdb7-39f1-b774-82e8c5dcc5f3/kmm/api/model:MyModel/sparql" />
            </map>
        </property>
        <property name="tokenFetcher" ref="TokenFetcher" />
    </bean>     

    <bean id="TokenFetcher" class="com.smartlogic.cloud.TokenFetcher" >
        <constructor-arg index="0" value="https://cloud.smartlogic.com/token"/>
        <constructor-arg index="1" value="asdgahryaryya44rayary+Q=="/>

    </bean>
</beans>

For this configuration file, you will need to update the following:

The SPARQL endpoint, as read from the cloud Knowledge Model Management, in three places.
The model name (MyModel in this example), in several places.
The Cloud API Key, which is the second constructor-arg value of the TokenFetcher.

The other properties may be edited, but the values supplied are probably fine.

Note that the date of the last cloud publish is stored in the file pointed to by “localDateStorageFile”. If you need to publish on-premise without publishing in the cloud (for instance, because you have changed the on-premise configuration), just delete this file. It will be recreated the next time the on-premise wrapper is run.

The tempDirectory is the workspace into which the publish artifacts are downloaded.

The configurationFilePath is the path to the on-premise configuration file detailed below.

Note the “environmentVariables” section. When Publisher is run from Knowledge Model Management (KMM), KMM creates these environment variables. Because we are not running from within KMM, the wrapper needs to define them itself. The set to be created is listed here; many of them are referenced within the other configuration files.
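As a sketch of how these values are used, and assuming that placeholders are substituted as plain strings, an import such as the one in the common configuration file would resolve as follows with the map above:

```xml
<!-- As written in the configuration file -->
<import resource="file:${resources.directory}/import/ConfigurationSets.xml" />

<!-- As read when resources.directory is /opt/semaphore/Publisher/resources -->
<import resource="file:/opt/semaphore/Publisher/resources/import/ConfigurationSets.xml" />
```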

Publishing on-premise - the On-Premise Publisher Configuration

We have one remaining configuration file: the on-premise configuration file.

This is referenced by the Remote Publisher Wrapper configuration file. To keep things simple, we have chosen to keep it within the configuration directory for the model under the Publisher installation.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd" >

<beans default-lazy-init="true" >
    <description>The on-premise specific part of the configuration - use this for publishing on-premise</description>
    <bean id="propConfig" class="org.springframework.context.support.PropertySourcesPlaceholderConfigurer" />
    
    <bean class="com.smartlogic.publisher.Publisher" parent="configurationSetList" >
        <property name="model" ref="localTTL" />
        <property name="configurationSets">
            <list merge="true" >
                <ref bean="sesConfiguration" />
            </list>
        </property>
    </bean>

    <bean id="sesConfiguration" parent="AllResources">
        <property name="outputProcessors">
            <list>
                <!-- The simplest one-field index writer -->
                <bean parent="SolrWriterTemplate">
                    <!-- The name of the index to be generated -->
                    <property name="indexName" value="MyModel"/>
                    <!-- The URL of the local solr instance -->
                    <property name="solrURL" value="http://localhost:8983/solr"/>
                    <!-- The URLs that should be called if a versioned model is published -->
                    <property name="sesModelsVersionsURLs">
                        <list>
                            <value>http://localhost:8983/ses/modelversions</value>
                        </list>
                    </property>
                    <!-- If using a SES index, it may be necessary to specify the Zookeeper host - note the format-->
                    <property name="zkHost" value="localhost:9983"/>
                </bean>
            </list>
        </property>
    </bean>

    <bean id="localTTL" class="com.smartlogic.publisher.model.LocalJenaModel" parent="modelInterface">
        <property name="modelFilePaths">
            <list>
                <value>/opt/semaphore/Publisher/remotetmp/MyModel/MyModel.ttl</value>
                <value>/opt/semaphore/Publisher/remotetmp/LinkedModel/LinkedModel.ttl</value>
                <value>${resources.directory}/CoreTTLFiles/semaphore-core.ttl</value>
                <value>${resources.directory}/CoreTTLFiles/system-triples.ttl</value>
                <value>${resources.directory}/CoreTTLFiles/skos-core.ttl</value>
                <value>${resources.directory}/CoreTTLFiles/skos-xl.ttl</value>
            </list>
        </property>
    </bean>

    <bean id="rulebasePublisher" class="com.smartlogic.publisher.pak.PakFilePublisher">
        <property name="classificationServerHost" value="localhost" />
        <property name="csToken" value="eyJhbGciOiasdfdafrge3J21me61dSar2NXgIkG-tj_2oxr77K1A5Bdu6JaCBszu_VPTmA9vIBNAdUdQvsVVexxN0J9o733aZCFh1yQzXnXF-j5fk4Wby3FjwPrCpqMkf2KpivJltaed7IC8LQin-\zasdfdsafdsf-8U5TngJrfBiy-sk8vSk9B1XvWp7PVldhwoypUbh1uvH9XcnqFYLQIG3rlp5KfvdDryXe-LVmUk6kJrX93TzGB-g" />
        <property name="classificationServerPort" value="8000" />
        <property name="publishSetName" value="${model.name}" />
    </bean>

    <import resource="file:/opt/semaphore/Publisher/remotetmp/MyModel/config/MyModel/Semaphore-Publisher_Common.xml" />
    <import resource="file:${resources.directory}/import/ModelInterface.xml" />         
    <import resource="file:${resources.directory}/import/RulebaseStructure.xml" />         
    <import resource="file:${resources.directory}/import/UpdaterDefinition.xml" />         
    <import resource="file:${resources.directory}/import/SESConfiguration.xml"/>
    <import resource="file:${resources.directory}/import/ConfigurationSets.xml"/>
</beans>

This is a fairly simple configuration file, as all the configuration sets are in the imported common file. We do define the local Classification Server instance and SES instance here; note that these need to be defined in full because the environment lookup is not available. The only novel thing is the definition of the model, which is a set of ttl files. Most of these are core files supplied with the Publisher installation, but there is also the model's own file, <<Model Name>>.ttl. In this case we also load the ttl for a linked model that was downloaded at the same time. If you have no linked models, delete this entry; if you have multiple linked models, add a similar entry for each.
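With no linked models, for example, the model definition would reduce to the following sketch, which simply drops the LinkedModel entry from the listing on this page and leaves everything else unchanged:

```xml
<bean id="localTTL" class="com.smartlogic.publisher.model.LocalJenaModel" parent="modelInterface">
    <property name="modelFilePaths">
        <list>
            <!-- The downloaded model itself -->
            <value>/opt/semaphore/Publisher/remotetmp/MyModel/MyModel.ttl</value>
            <!-- Entries for any linked models would go here, one value per model -->
            <!-- Core files supplied with the Publisher installation -->
            <value>${resources.directory}/CoreTTLFiles/semaphore-core.ttl</value>
            <value>${resources.directory}/CoreTTLFiles/system-triples.ttl</value>
            <value>${resources.directory}/CoreTTLFiles/skos-core.ttl</value>
            <value>${resources.directory}/CoreTTLFiles/skos-xl.ttl</value>
        </list>
    </property>
</bean>
```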

Note the import of the common configuration file. This file is part of the downloaded zip file, so ensure that its location is correct relative to the directory into which the zip file is exploded. Probably the easiest way to determine this is to run the Remote Publisher Wrapper and see! (The configuration files on this page are a coherent set, with the root of the exploded zip file being the publisher root directory. The configuration files are therefore exploded to the configuration directory for that model and the ttl files are exploded to that root.)

Note that the CS token is required only if the CS has been “adopted” by a local studio instance. If it has not been adopted, this token should not be provided. If you do need a value for this token, details of how to get it can be found here.
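For a Classification Server that has not been adopted, the “rulebasePublisher” bean would therefore leave out the csToken property. A sketch reusing the example values from this page:

```xml
<bean id="rulebasePublisher" class="com.smartlogic.publisher.pak.PakFilePublisher">
    <property name="classificationServerHost" value="localhost" />
    <!-- No csToken property: this Classification Server has not been adopted -->
    <property name="classificationServerPort" value="8000" />
    <property name="publishSetName" value="${model.name}" />
</bean>
```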

Running on-premise

Once these on-premise configuration files have been set up, you can run the Remote Publisher Wrapper using the command:

#!/bin/bash

java -cp "/opt/semaphore/PublisherSES/libs/*:/opt/semaphore/PublisherSES/resources/logging/fromOE/" com.smartlogic.publisher.remote.RemotePublisherWrapper Semaphore-Publisher-RemoteWrapper.xml

If you are going to run this as a cron job, put the command in a .sh file. Running this file causes the Remote Publisher Wrapper to interrogate the cloud Knowledge Model Management for the last publish event. If one exists, the corresponding artifacts are downloaded and exploded, and the Publisher then runs on-premise from this set. (Please ensure that messaging is disabled in your cron job; otherwise you are liable to fill up your disk space.)

When the on-premise publisher has completed, the last publish date is stored. This means that the next time the Remote Publisher Wrapper is run, no work is done if the on-premise publish is at the same point as the cloud publish. If you want to force a publish, just delete the file defined in the remote publisher wrapper configuration as the “localDateStorageFile”.
