Additional Functionality

Last Updated: May 13, 2026

“Test” Request

Switching on the diagnostics flag in the configuration (or using the TEST request) displays the rule evidence and the appropriate mark-up. For example, this request:

<?xml version="1.0" ?>
<request op="TEST">
  <document>
    <title>Dumped Cars</title>
    <body type="UNKNOWN">Dumped cars are great, abandoned vehicles are cool too.</body>
    <multiarticle />
    <feedback />
    <stylesheet />
    <clustering type="RMS" threshold="20" />
    <threshold>48</threshold>
    <min_average_article_pagesize>1.0</min_average_article_pagesize>
    <char_count_cutoff>500000</char_count_cutoff>
    <document_score_limit>0</document_score_limit>
  </document>
</request>

Returns something like the following:

[Screenshot: Classification Server Test Interface - TEST Output]

As can be seen, the specific rulebases that fire (and how they fire) are displayed as part of the XML output.

The amount of information returned in test mode can be altered using specific syntax for the “use_generated_keys” and “evaluate_node” parameters. A full explanation of this is beyond the scope of this document; if you need more information, contact Progress.

It is highly recommended that you specify the “stylesheet” parameter; otherwise, the returned XML may be difficult to process manually.

Note: Adding the “<feedback/>” tag to a “TEST” request adds the raw text content of the original file to the output, highlighted to indicate which individual words contributed to a rule firing. This is often useful when diagnosing classification issues.
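The TEST request shown above can also be generated programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; `build_test_request` is a hypothetical helper, it reproduces only the title, body, feedback and stylesheet elements of the example (the other options are omitted), and it does not cover how the request is transmitted to Classification Server.

```python
import xml.etree.ElementTree as ET


def build_test_request(title: str, body: str, feedback: bool = True,
                       stylesheet: bool = True) -> str:
    """Hypothetical helper: build a TEST request like the example above."""
    request = ET.Element("request", op="TEST")
    document = ET.SubElement(request, "document")
    ET.SubElement(document, "title").text = title
    ET.SubElement(document, "body", type="UNKNOWN").text = body
    if feedback:
        # Adds the highlighted raw text of the document to the TEST output
        ET.SubElement(document, "feedback")
    if stylesheet:
        # Recommended, otherwise the returned XML is hard to read manually
        ET.SubElement(document, "stylesheet")
    # ElementTree escapes any XML special characters in title/body text
    return '<?xml version="1.0" ?>\n' + ET.tostring(request, encoding="unicode")


if __name__ == "__main__":
    print(build_test_request(
        "Dumped Cars",
        "Dumped cars are great, abandoned vehicles are cool too."))
```

Building the request with an XML library rather than string concatenation means that document text containing characters such as `&` or `<` is escaped correctly.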

“Stats” (Statistics) Request

Classification Server keeps and updates various low-level “counters” that can be used to return statistical information about its ongoing operation. The counters are reset to “0” only when the entire Classification Server service is restarted. The meaning of an individual counter is determined by the area that generates it (for example, a particular “bean”), but counters are generally self-descriptive: the name indicates what is being counted or recorded. Counters are also grouped by area, so counters for related information appear next to each other within the top-level “response” XML element.

For example, the “finalisation” bean has several counters, named accordingly within the “Finalisation” section of the output:

articles_processed - The number of articles successfully processed by the finalisation bean
documents_processed - The number of documents successfully processed by the finalisation bean (note: a document always contains one or more articles)
exceptions - The number of exceptions which have occurred whilst finalising

As well as “simple” (single-valued) counters, there are “statistic” counters, which are “count”/“value” pairs. Each time such a counter is updated, “count” increases by one and “value” increases by a specific amount. For example, a counter recording the number of characters processed would have a “count” that increases by one for every document processed and a “value” holding the cumulative number of characters processed, so “value” / “count” gives the average number of characters per document.

Note: Currently, the only “count”/“value” pairs in use are timers, where the “value” field is the cumulative time in tenths of a millisecond (i.e. 1/10000 of a second).
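The arithmetic for a timer pair can be sketched as follows. This is a minimal illustration, assuming only what is stated above (“count” is the number of updates, “value” is cumulative time in tenths of a millisecond); `TimerStat` is a hypothetical name, not part of Classification Server.

```python
from dataclasses import dataclass

TICKS_PER_MS = 10  # timer values are in tenths of a millisecond (1/10000 s)


@dataclass
class TimerStat:
    """Hypothetical model of a "count"/"value" timer pair."""
    count: int  # number of times the timer was updated
    value: int  # cumulative time in tenths of a millisecond

    def total_ms(self) -> float:
        # Convert the cumulative value to milliseconds
        return self.value / TICKS_PER_MS

    def average_ms(self) -> float:
        # "value" / "count", guarded against a counter that was never updated
        return self.total_ms() / self.count if self.count else 0.0
```

For instance, `TimerStat(count=10, value=160)` gives a total of 16 milliseconds and an average of 1.6 milliseconds per update.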

Statistics are displayed by submitting the following XML request:

<?xml version="1.0" ?>
<request op="stats">
</request>

Using our finalisation bean example, this request could return the following counters/timers:

<response>
  ...
  <Finalisation>
    <count>10</count>
    <value>160</value>
    <articles_processed>27</articles_processed>
    <documents_processed>9</documents_processed>
    <exceptions>1</exceptions>
  </Finalisation>
  ...
</response>

This means that the finalisation bean (the pipeline component responsible for formatting the response to a request in the format used by the requestor) has been called 10 times since the service was started. The total processing time within this bean is 16 milliseconds (the “160” value above divided by 10, since timer values are in tenths of a millisecond), or about 1.6 milliseconds per call. Nine documents have been successfully finalised (i.e. nine classification requests were processed successfully), 27 articles have been finalised (i.e. an average of 3 articles per document), and 1 exception has occurred during finalisation. Finalisation exceptions are rare, because this stage only re-formats data, so unless there is a bug they should not occur. In this case, the exception was actually caused by the client closing the socket connection during finalisation processing. This kind of exception can occur in any pipeline component, since it is raised whenever CS notices that the client has closed the socket without waiting for CS to reply.
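The interpretation above can be reproduced from the XML itself. The sketch below parses the finalisation example (with the `...` elisions removed) using Python's `xml.etree.ElementTree` and derives the same figures; `summarise_finalisation` is a hypothetical helper, and it assumes the counters appear as direct integer-valued children of a `<Finalisation>` element, as in the example.

```python
import xml.etree.ElementTree as ET

# The example response above, without the "..." elisions
SAMPLE = """<response>
  <Finalisation>
    <count>10</count>
    <value>160</value>
    <articles_processed>27</articles_processed>
    <documents_processed>9</documents_processed>
    <exceptions>1</exceptions>
  </Finalisation>
</response>"""


def _ratio(numerator: float, denominator: float) -> float:
    # Avoid dividing by zero for counters that have not been updated yet
    return numerator / denominator if denominator else 0.0


def summarise_finalisation(response_xml: str) -> dict:
    """Hypothetical helper: derive the figures discussed in the text."""
    section = ET.fromstring(response_xml).find("Finalisation")
    if section is None:
        raise ValueError("no Finalisation section in response")
    counters = {}
    for child in section:
        text = (child.text or "").strip()
        if text.isdigit():
            counters[child.tag] = int(text)
    total_ms = counters["value"] / 10  # tenths of a millisecond -> ms
    return {
        "calls": counters["count"],
        "total_ms": total_ms,
        "ms_per_call": _ratio(total_ms, counters["count"]),
        "articles_per_document": _ratio(counters["articles_processed"],
                                        counters["documents_processed"]),
        "exceptions": counters["exceptions"],
    }


if __name__ == "__main__":
    print(summarise_finalisation(SAMPLE))
```

For the sample response this yields 10 calls, 16 milliseconds in total, 1.6 milliseconds per call, 3 articles per document and 1 exception.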

Note: The most useful statistics counters currently kept are probably those in the Overall section, which measure the time and number of events across the entire operation of the pipelines. For example, “Overall::Classify::count” is the number of classification requests, and “Overall::Classify::value” is the cumulative time taken (in 1/10000 of a second) to process those requests in full, from reception of the request to successful transmission of the response.
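A counter named with the “::” notation could be looked up as sketched below. This assumes, hypothetically, that each “::”-separated part names a nested element below `<response>` (for example `<Overall><Classify><count>`); the actual element layout of the Overall section is not shown in this document, so check it against a real stats response. `get_counter` and `average_classify_ms` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def get_counter(response_xml: str, path: str) -> int:
    """Hypothetical lookup of a counter such as "Overall::Classify::count".

    Assumes each "::"-separated part is a nested element below <response>.
    """
    root = ET.fromstring(response_xml)
    element = root.find("/".join(path.split("::")))
    if element is None or element.text is None:
        raise KeyError(path)
    return int(element.text.strip())


def average_classify_ms(response_xml: str) -> float:
    """Average end-to-end classification time per request, in milliseconds."""
    count = get_counter(response_xml, "Overall::Classify::count")
    value = get_counter(response_xml, "Overall::Classify::value")
    # value is cumulative time in 1/10000 s, so divide by 10 for milliseconds
    return value / 10 / count if count else 0.0
```

Because the counters are only reset when the service restarts, this average covers the whole period since the last restart; comparing two snapshots taken some time apart gives the average for just that interval.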
