The Classification & Language Service Client


Last Updated: July 8, 2026


The “Classification & Language Service Client” (Classification Server command line client, or CLS-Client) is a Java application that can be run from the command line on any machine that has Java 11 or later installed. It is available from the Semaphore 5 software downloads to anyone licensed to use the “Classification & Language Service (CLS)” module.

It submits all the specified files or URLs, listed either in the command itself or in a file containing a list of resources to classify, to one or more Classification Server instances.

Using the Classification Server client

The client is run with the command:

java -jar Semaphore-CLSClient-<<Version>>.jar ...arguments...

To get the full list of arguments, run the command

java -jar Semaphore-CLSClient-<<Version>>.jar --help

The client can be run against multiple CLS instances in parallel by specifying multiple --url=<<CS URL>> parameters. In that case, the data returned by each CLS instance is written to a separate location. For single-file output, a cleaned-up version of the CLS URL is prepended to the file name; for multiple-file output, that cleaned-up CLS URL is prepended to the name of the directory to which the files are written.
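If you drive the client from a script, the multi-instance invocation can be assembled as an argument list. The sketch below is a hypothetical Python wrapper, not part of the product: it assumes that “--output-file” takes its value after “=” (as --url does) and that the files or URLs to classify follow the options as positional arguments; the host names are placeholders.

```python
import subprocess
from typing import List, Optional


def build_cls_command(jar: str, cls_urls: List[str], resources: List[str],
                      output_file: Optional[str] = None) -> List[str]:
    """Build a CLS client command that sends `resources` to every CLS instance in `cls_urls`."""
    command = ["java", "-jar", jar]
    # One --url=<<CS URL>> parameter per CLS instance; the client queries them in parallel.
    command += [f"--url={url}" for url in cls_urls]
    if output_file is not None:
        # Assumption: --output-file takes its value after "=", like --url.
        # With several URLs, the client writes one URL-prefixed file per instance.
        command.append(f"--output-file={output_file}")
    # Assumption: the files or URLs to classify follow the options as positional arguments.
    command += resources
    return command


def run_cls(command: List[str]) -> int:
    """Run the client and return its exit code (requires Java and the client JAR)."""
    return subprocess.run(command).returncode


if __name__ == "__main__":
    cmd = build_cls_command(
        "Semaphore-CLSClient-<<Version>>.jar",
        ["http://cls-a.example.com:5058", "http://cls-b.example.com:5058"],  # hypothetical hosts
        ["docs/example.pdf"],  # hypothetical input file
        output_file="results.xml",
    )
    print(" ".join(cmd))
```

Building a list rather than a single string keeps each argument intact when passed to `subprocess.run`, so URLs and paths containing spaces need no extra quoting.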

Note: If the CLS instance is running in Semaphore Cloud, use the “--cloud-api-key” parameter to specify the “API Key”, which you can obtain from the “Account Settings” page in Semaphore Cloud. If you are not accessing services in the “East US” instance of Semaphore Cloud (https://cloud.smartlogic.com/), you also need to specify the “token” endpoint URL in the “-c” parameter, with a value of https://<instance URL>/token. For example, for Canada Central this would be -c http://ca.cloud.smartlogic.com/token, and for US Government it would be -c http://usgov-cloud.smartlogic.com/token. This is required because the CLS client defaults to East US.
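The region rule above amounts to a small lookup: East US needs only the API key, while other regions also need “-c” with their token endpoint. The following Python sketch is illustrative only. The endpoint URLs are copied verbatim from the examples above (confirm them for your instance), the “=” value form for “--cloud-api-key” is an assumption, and the environment variable name is hypothetical.

```python
import os
from typing import Dict, List, Optional

# Token endpoints quoted on this page. East US is the client's default, so it needs no -c value.
TOKEN_ENDPOINTS: Dict[str, Optional[str]] = {
    "East US": None,
    "Canada Central": "http://ca.cloud.smartlogic.com/token",
    "US Government": "http://usgov-cloud.smartlogic.com/token",
}


def cloud_auth_args(region: str, api_key: str) -> List[str]:
    """Return the authentication arguments for a Semaphore Cloud CLS instance in `region`."""
    if region not in TOKEN_ENDPOINTS:
        raise ValueError(f"No token endpoint recorded for region {region!r}")
    # Assumption: --cloud-api-key takes its value after "=", like --url=<<CS URL>>.
    args = [f"--cloud-api-key={api_key}"]
    endpoint = TOKEN_ENDPOINTS[region]
    if endpoint is not None:
        # Regions other than East US must override the default token endpoint with -c.
        args += ["-c", endpoint]
    return args


if __name__ == "__main__":
    # Hypothetical environment variable holding the key from "Account Settings".
    key = os.environ.get("SEMAPHORE_API_KEY", "<<API Key>>")
    print(cloud_auth_args("Canada Central", key))
```

Reading the key from the environment keeps it out of scripts and shell history files.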

The output methods available are:

Standard out (stdout) - If no other output method is specified, the data returned from CLS is written to standard out.
File - If a file is specified using the “--output-file” parameter, the CS responses for all the specified resources are written to that file as a single XML document.
Files to match input directory structure - If an output directory is specified using the “--output-directory” argument, one output file is written for each classified file, following the input directory structure. (Note that URLs cannot be classified using this output method, because the corresponding file structure cannot be determined.)
CSV output - The “--csv-output-file” parameter specifies a CSV file to which classification results are saved.
Multiple Column Output - The “--multiple-column-configuration-file” parameter specifies a configuration file and the “--multiple-column-output-file” parameter specifies an output file. The configuration file lists the paths of the metadata to select from the CS response, and the data found at those paths is written to the output file. (If multiple values are found with the same path, they are concatenated using a semicolon as the separator.)
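The semicolon rule for repeated paths can be illustrated in a few lines. This Python sketch mimics the described behaviour so you can see what to expect when post-processing the multiple-column output; it is not the client's implementation, and the paths and values are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def concatenate_by_path(matches: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Join all values found for the same path with ';', keeping first-seen path order."""
    grouped: Dict[str, List[str]] = {}
    for path, value in matches:
        grouped.setdefault(path, []).append(value)
    return {path: ";".join(values) for path, values in grouped.items()}


def split_cell(cell: str) -> List[str]:
    """Recover the individual values from a concatenated cell when post-processing the output."""
    return [value for value in cell.split(";") if value]


if __name__ == "__main__":
    # Hypothetical (path, value) pairs, as if selected from one CS response.
    row = concatenate_by_path([("Subject", "TermA"), ("Type", "Article"), ("Subject", "TermB")])
    print(row)                         # {'Subject': 'TermA;TermB', 'Type': 'Article'}
    print(split_cell(row["Subject"]))  # ['TermA', 'TermB']
```

Splitting on semicolons is only unambiguous if no individual value itself contains a semicolon, so check your model's labels before relying on it.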

These are only some of the available options. For the full range, run the client with --help.

Using the Classification Server client for regression

From version 5.0.3 onwards, the CS-Client can be used for regression analysis.

To do this, create a single-sheet Excel workbook containing your expected results.

This spreadsheet has a header row. Column 1 of the header row is ignored, but every other cell should contain a rulebase class or the path to a nested meta item.

Each subsequent row should contain a file name followed by the expected results for the rulebase class of each column, with the results for each rulebase class given as a semicolon-separated list. In the example below, we have a sample of five wiki articles saved as PDFs, and a set of tags that a cataloguer has manually assigned to each article from our ‘Subjects model’ plugin. After publishing the model, we want to compare our classification strategy against the human-assigned tags. Our cataloguer’s tags are as follows:

The regression syntax is:

java -jar Semaphore-CSClient-<<Version>>.jar --exemplar-input-file <<Excel File Name>> <<directory containing files>>

The client then runs through the files listed in the spreadsheet and classifies them. Because no CS URL is given here, they are classified against localhost:5058; if you want a different CS URL, you can supply it.

Two new spreadsheets are created:

“results”: a spreadsheet in the same layout as the input, but containing the actual results. If you initially provided an empty spreadsheet with just the file names and rulebase classes defined, this can be used as a reference for later comparisons.
“comparison”: a multi-sheet workbook. After a brief details page, it presents the actual and exemplar data with differences highlighted, followed by a document overview of precision and recall across all rulebase classes. Finally, for each rulebase class, it gives a by-document and a by-term analysis of the expected versus actual results.
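To make the precision and recall figures concrete, here is a minimal Python sketch of the standard per-document calculation for one rulebase class, comparing a semicolon-separated exemplar cell with the corresponding actual cell. It uses the textbook definitions (precision = correct / returned, recall = correct / expected); how the comparison workbook aggregates across documents or treats empty cells is not described here, so returning None for a zero denominator is this sketch's own choice. The cell values are hypothetical.

```python
from typing import Optional, Set, Tuple


def parse_tags(cell: Optional[str]) -> Set[str]:
    """Turn a semicolon-separated spreadsheet cell into a set of tags (blank cell -> empty set)."""
    if not cell:
        return set()
    return {tag.strip() for tag in cell.split(";") if tag.strip()}


def precision_recall(expected_cell: Optional[str],
                     actual_cell: Optional[str]) -> Tuple[Optional[float], Optional[float]]:
    """Score one document for one rulebase class.

    precision = correct / actual and recall = correct / expected;
    None is returned where the denominator is zero.
    """
    expected, actual = parse_tags(expected_cell), parse_tags(actual_cell)
    correct = expected & actual
    precision = len(correct) / len(actual) if actual else None
    recall = len(correct) / len(expected) if expected else None
    return precision, recall


if __name__ == "__main__":
    # Hypothetical exemplar and actual cells for one file.
    expected_cell, actual_cell = "TermA;TermB", "TermA;TermB;TermC;TermD"
    p, r = precision_recall(expected_cell, actual_cell)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=1.00
    # Tags returned but not in the exemplar count as false positives.
    print(sorted(parse_tags(actual_cell) - parse_tags(expected_cell)))  # ['TermC', 'TermD']
```

A tag in the false-positive list only counts against the classifier relative to the exemplar; it may still be correct if the cataloguer simply missed it.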

If you don’t specify the names of these two spreadsheets, they will be generated automatically from the name of the input file and date/time of the run.

Looking at the results, we can see differences in red.

These are mainly false positives; that is, the human did not assign ‘Civil Engineering’ to the Wiki Engineering page, ‘Air Pollution’ to Asbestos, ‘Recycling (waste)’ to ‘Recycling.pdf’, and so on. This brings up an important point about manual classification in general: just because a human has assigned tags does not mean that those tags are correct and complete, and that anything they have not assigned is therefore a false positive.

Looking at the per-record results, we can see that ‘Bagpipes’ has also been returned. Unlike the rest of the tags, this does look like a genuine false positive. To investigate why and how this particular tag has fired, we’d use Document Analyzer, which shows that ‘Pipes’, an Alternative Label of ‘Bagpipes’, has fired for the phrase ‘concrete, bricks, pipes’.
