Processing carried out after variant generation

Last Updated: July 8, 2026
Any processors that should run after variants are generated should be referenced in the postVariantGenerationProcessors list. There are three processor types that could sensibly go here.

Word Type Processing

 <bean class="com.smartlogic.publisher.preprocessing.WordTypeProcessor" />

If rulebases are being generated, then there must be a word type processor. It assigns the appropriate word type to every variant generated for each label. This word type is referenced within many of the kid files and so must be generated before the rulebases are output. If rulebases are to be output and there is no word type processor in the set of postVariantGenerationProcessors, then Publisher will fail with an error.

Preclusion

<bean id="preclusionProcessor" class="com.smartlogic.publisher.preclusion.PreclusionCalculator">
  <property name="embeddedSolrSourceDir" value="resources/PreclusionSolrIndex" />
  <property name="threadCount" value="1" />
  <property name="global" value="true" />
</bean>

When generating rulebases, preclusion is usually required. Preclusion prevents a concept such as “President” from firing when the evidence actually comes from a phrase like “Vice President” and “Vice President” is another concept in the model. The preclusion is done using an internal Lucene engine. The threadCount is the number of threads used when querying the index to determine the terms actually precluded by the current variant. If preclusion is found to be slow, then setting this thread count higher will speed up the process, at the cost of taking more resources from the machine.
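To make the idea concrete, the following is a minimal sketch of how the labels that preclude a given variant might be worked out. It is an illustration only: Publisher does this through its internal index, and the class and method names here (PreclusionSketch, precludersOf) are hypothetical. The sketch assumes that a longer label precludes a variant when the variant's words appear inside it as a contiguous run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PreclusionSketch {

    /** Splits a label into lower-cased word tokens. */
    static List<String> tokens(String label) {
        return Arrays.asList(label.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    /** Returns the labels that contain the variant as a contiguous run of words. */
    static List<String> precludersOf(String variant, List<String> allLabels) {
        List<String> needle = tokens(variant);
        List<String> result = new ArrayList<>();
        for (String label : allLabels) {
            List<String> haystack = tokens(label);
            // Only a strictly longer label can preclude the variant.
            if (haystack.size() > needle.size()
                    && Collections.indexOfSubList(haystack, needle) >= 0) {
                result.add(label);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("President", "Vice President", "Presidential Palace");
        // Prints [Vice President]: "Presidential Palace" does not contain the word "president".
        System.out.println(precludersOf("President", labels));
    }
}
```

In this sketch a match on “President” would be suppressed wherever it falls inside a match on “Vice President”.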

The global property determines whether preclusion should take place within configuration sets or across all of them. If global is set to true, then any label of any concept or concept scheme in any configuration set may preclude any label in the current configuration set. If global is set to false, then only labels of concepts or concept schemes within the current configuration set will be able to preclude within the current configuration set. (Note: if a concept or concept scheme of the model is not in any configuration set, then it will not preclude anything.)
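The scoping rule can be summarized as a small decision function. This is a sketch of the rule as described in the text, not Publisher code; the class name, method name, and configuration set names are hypothetical.

```java
import java.util.Set;

public class PreclusionScopeSketch {

    /**
     * Decides whether a label may preclude within the current configuration set.
     *
     * @param global            value of the "global" property
     * @param labelConfigSets   configuration sets that contain the label's concept or concept scheme
     * @param currentConfigSet  configuration set currently being published
     */
    static boolean mayPreclude(boolean global, Set<String> labelConfigSets, String currentConfigSet) {
        // A concept that is in no configuration set never precludes anything.
        if (labelConfigSets.isEmpty()) {
            return false;
        }
        // With global=true any configuration set qualifies; otherwise only the current one.
        return global || labelConfigSets.contains(currentConfigSet);
    }

    public static void main(String[] args) {
        System.out.println(mayPreclude(true, Set.of("SetA"), "SetB"));   // true
        System.out.println(mayPreclude(false, Set.of("SetA"), "SetB"));  // false
        System.out.println(mayPreclude(true, Set.of(), "SetB"));         // false
    }
}
```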

As a performance measure, the maximum number of preclusions that can be attached to any particular variant is configurable. The property maxPreclusions is 1000 by default. It is unlikely that this will need to be altered. If a concept label is precluded by more than 1000 others, then it is likely that the concept should be better thought out.

As with the “variantGenerators” property, out of the box, the “postVariantGenerationProcessors” are specified in the ConfigurationSets.xml file, and can be overridden on a Config Set by Config Set basis by adding the “postVariantGenerationProcessors” property and list of processors directly in the Config Set.

Count Updater

If at publish time you wish to look up the number of documents that have been tagged with a particular concept in your content, then you can add a countUpdater processor to the list of post-variant generation processors. (By default the count updater is not run.)

Currently there is only one type of count updater bean available - the SOLR count updater. This will interrogate an external SOLR index to determine the number of records in the index tagged with each concept identifier.

To query by identifier, use a bean like:

        <bean id="solrCountUpdater" class="com.smartlogic.publisher.countupdater.CountUpdater">
                <property name="countFinder">
                        <bean class="com.smartlogic.publisher.countupdater.SolrCountFinder">
                                <property name="solrURL" value="http://<machine name>:8983/solr/<document index name>" /> <!-- The URL of the SOLR index being interrogated -->
                                <property name="solrFields">
                                        <list>
                                                <value>The solr field in which the identifier may be present</value>
                                        </list>
                                </property>
                        </bean>
                </property>
        </bean>

To query by concept name, use a bean like:

        <bean id="solrCountUpdater" class="com.smartlogic.publisher.countupdater.CountUpdater">
                <property name="countFinder">
                        <bean class="com.smartlogic.publisher.countupdater.SolrCountFinder">
                                <property name="solrURL" value="http://<machine name>:8983/solr/<document index name>" /> <!-- The URL of the SOLR index being interrogated -->
                                <property name="solrFields">
                                        <list>
                                                <value>The solr field in which the concept name may be present</value>
                                        </list>
                                </property>
                                <property name="useNames" value="true" /> <!-- This is false by default -->
                                <property name="languageCode" value="en" /> <!-- The language of the concept names that are stored in the remote index -->
                        </bean>
                </property>
        </bean>

There is an additional property that may be overridden: “query”. By default this is “*:*”, the query that returns all documents in the SOLR index. It is from this result set that SOLR calculates the facet counts. If you want only a subset of the SOLR index to be analysed, then change this query to one that returns that subset.
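For orientation, the sketch below builds the kind of faceted request that standard SOLR supports for counting documents per field value. The exact request that Publisher sends is not documented here, so treat this as an assumption-laden illustration: the class name, the field name conceptIds, and the host and index in the URL are hypothetical, while q, rows, facet and facet.field are standard SOLR request parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SolrFacetQuerySketch {

    /** Builds a select URL that returns no rows, only facet counts for one field. */
    static String facetUrl(String solrUrl, String query, String field) {
        return solrUrl + "/select"
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&rows=0"          // documents themselves are not needed
                + "&facet=true"
                + "&facet.field=" + URLEncoder.encode(field, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/documents"; // hypothetical index URL
        // Default query: counts are calculated over every document in the index.
        System.out.println(facetUrl(base, "*:*", "conceptIds"));
        // Narrower query: counts are calculated over a subset only.
        System.out.println(facetUrl(base, "type:article", "conceptIds"));
    }
}
```

Changing the query from “*:*” to something narrower, as in the second call, restricts the documents over which the facet counts are computed.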

If you have an alternative content management system, then Progress could add a custom count updater, or you could add your own. You merely need to implement the interface:

  package com.smartlogic.publisher.countupdater;
  public interface CountFinder {
      void setCount(CountUpdatableObject countUpdatableObject);
      void init() throws CountFinderException;
      void commit();
  }

where CountUpdatableObject is defined as:

  package com.smartlogic.publisher.countupdater;
  public interface CountUpdatableObject {
      public String getIdentifier();
      public String getName(String languageCode);
      public void setDocumentCount(long count);
  }

The init() method is called before the Semaphore model is loaded and before any processing of the model takes place. It can be used to do any initialization that you require. If the CountFinderException is thrown, publishing will be halted and the error reported.

The setCount method will be called once per concept or concept scheme. It is the responsibility of the count updater to set the document count correctly, by calling setDocumentCount on the object it is passed.

Once all counts have been set, the commit() method is called. This is a chance for the count updater to close any resources that it might have opened.
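The lifecycle described above (init, then setCount once per concept, then commit) can be sketched with a trivial implementation. To keep the sketch compilable on its own, it declares local stand-ins that mirror the documented interfaces; a real implementation would use the types from com.smartlogic.publisher.countupdater instead. MapCountFinder, DemoConcept and countFor are hypothetical names, and CountFinderException is assumed to be a checked exception.

```java
import java.util.HashMap;
import java.util.Map;

public class MapCountFinderSketch {

    // Local stand-ins mirroring the documented interfaces (assumption: checked exception).
    static class CountFinderException extends Exception {
        CountFinderException(String message) { super(message); }
    }

    interface CountUpdatableObject {
        String getIdentifier();
        String getName(String languageCode);
        void setDocumentCount(long count);
    }

    interface CountFinder {
        void setCount(CountUpdatableObject countUpdatableObject);
        void init() throws CountFinderException;
        void commit();
    }

    /** Hypothetical finder that reads counts from a map rather than a real content system. */
    static class MapCountFinder implements CountFinder {
        private final Map<String, Long> source;
        private Map<String, Long> counts;

        MapCountFinder(Map<String, Long> source) { this.source = source; }

        @Override
        public void init() throws CountFinderException {
            // Fail early so that publishing halts with a clear error.
            if (source == null) {
                throw new CountFinderException("No count source configured");
            }
            counts = new HashMap<>(source);
        }

        @Override
        public void setCount(CountUpdatableObject object) {
            // Identifiers with no recorded count are reported as zero.
            object.setDocumentCount(counts.getOrDefault(object.getIdentifier(), 0L));
        }

        @Override
        public void commit() { counts = null; } // release whatever init() acquired
    }

    /** Minimal concept stand-in used only to exercise the finder. */
    static class DemoConcept implements CountUpdatableObject {
        final String id;
        long documentCount = -1;
        DemoConcept(String id) { this.id = id; }
        public String getIdentifier() { return id; }
        public String getName(String languageCode) { return id; }
        public void setDocumentCount(long count) { documentCount = count; }
    }

    /** Runs the init / setCount / commit sequence for one concept and returns its count. */
    static long countFor(Map<String, Long> source, String identifier) {
        MapCountFinder finder = new MapCountFinder(source);
        DemoConcept concept = new DemoConcept(identifier);
        try {
            finder.init();
        } catch (CountFinderException e) {
            throw new IllegalStateException(e);
        }
        finder.setCount(concept);
        finder.commit();
        return concept.documentCount;
    }

    public static void main(String[] args) {
        Map<String, Long> source = new HashMap<>();
        source.put("concept-1", 42L);
        System.out.println(countFor(source, "concept-1")); // 42
        System.out.println(countFor(source, "concept-2")); // 0
    }
}
```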

After generating a jar file with your new class, add it to the class path of the publisher (the easiest way is to add it to the libs directory) and then add a reference to it in your Publisher configuration file.
