Semaphore Classification and Language Service (CLS)

Using “Bulk Classification” (classify.py/classify.exe)

  • Last Updated: May 13, 2026

Note: From 5.0 onward, the Classification & Language Service Client (CLS-Client) is the recommended tool for bulk classification tests. It is available in the downloads area of this portal. Documentation for its use can be found here.

Python scripts for classifying documents are included in the standard Classification Server distribution (in the “utils” folder under the installation directory). A compiled version for Windows can be obtained from (Windows - download and unzip). These scripts allow multiple files to be classified using Classification Server. The Python source can be executed from the “example_scripts” sub-folder (Python must be installed). The command has the following syntax when using Python (on Linux):

python classify.py [options] [file(s) to classify]

Or (if using the Windows executable):

classify.exe [options] [file(s) to classify]

Notes:

  • Files may include wildcards (e.g. *.*); all matching files are classified
  • ~file excludes a particular file or pattern (e.g. *.* ~*.pdf classifies all files except PDF files)
  • @file classifies the files listed in the file (same as --filelist)
  • ~@file excludes the files listed in the file (same as --exclude)
  • See “python classify.py --help” for a full list of command-line options (or see the "classify.py/classify.exe" full command reference).
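The argument conventions above can be illustrated with a short Python sketch. It is not the code used by classify.py; it only mimics the described behaviour under stated assumptions: list files hold one path per line, and exclusion patterns are matched against both the file name and the full path. The names expand_arguments and read_list are hypothetical.

```python
import fnmatch
import glob
import os


def read_list(path):
    """Return the non-empty lines of a list file (assumed: one path per line)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def expand_arguments(arguments):
    """Resolve wildcard, ~file, @file and ~@file arguments into a file list."""
    include_files = []
    exclude_patterns = []
    for argument in arguments:
        if argument.startswith("~@"):
            # ~@file: exclude every entry listed in the file
            exclude_patterns.extend(read_list(argument[2:]))
        elif argument.startswith("~"):
            # ~file: exclude a single file or pattern
            exclude_patterns.append(argument[1:])
        elif argument.startswith("@"):
            # @file: include every entry listed in the file
            include_files.extend(read_list(argument[1:]))
        else:
            # plain argument: expand wildcards, keeping regular files only
            include_files.extend(p for p in glob.glob(argument) if os.path.isfile(p))

    def is_excluded(path):
        name = os.path.basename(path)
        return any(
            fnmatch.fnmatch(name, pattern) or fnmatch.fnmatch(path, pattern)
            for pattern in exclude_patterns
        )

    selected = []
    for path in include_files:
        if path not in selected and not is_excluded(path):
            selected.append(path)
    return sorted(selected)
```

For instance, expand_arguments(["*.*", "~*.pdf"]) returns every file in the current directory whose name contains a dot, except PDF files.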

Example “classify.py/classify.exe” requests

The following are some examples of how “classify.py” or “classify.exe” (on Windows) can be used.

Example 1 (using Python):

python classify.py d:\test\*.pdf

Or (using Windows classify.exe):

classify.exe d:\test\*.pdf

This classifies all PDF files in d:\test (no recursion) and returns the results as XML to the command prompt.

Example 2:

classify -sTEST -p1051 -r -fCSV -od:\test.txt d:\test e:\test

This classifies all files in d:\test and e:\test (including any subdirectories) using a Classification Server running on port 1051 on machine TEST, and writes the output in comma-separated format to d:\test.txt.
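When the utility is driven from another script, it can help to assemble the argument list programmatically. The following is a minimal sketch that only uses the options shown in Example 2 (-s, -p, -r, -f, -o). The function names and the default executable ("python classify.py") are assumptions for illustration, not part of the distribution.

```python
import subprocess


def build_classify_command(paths, server=None, port=None, recursive=False,
                           output_format=None, output_file=None,
                           executable=("python", "classify.py")):
    """Assemble a classify command line as a list of arguments."""
    command = list(executable)
    if server:
        command.append("-s" + server)          # machine running Classification Server
    if port is not None:
        command.append("-p" + str(port))       # Classification Server port
    if recursive:
        command.append("-r")                   # include subdirectories
    if output_format:
        command.append("-f" + output_format)   # e.g. CSV
    if output_file:
        command.append("-o" + output_file)     # where the results are written
    command.extend(paths)
    return command


def run_classify(command):
    """Run the assembled command; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(command, check=True)
```

Calling build_classify_command([r"d:\test", r"e:\test"], server="TEST", port=1051, recursive=True, output_format="CSV", output_file=r"d:\test.txt") produces the same arguments as Example 2, prefixed with "python classify.py".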

Example 3:

classify --singlearticle -t d:\test\example1.doc --outputfile d:\test.xml

This classifies the specified document as a single article, transfers the data for the file within the request (mainly used when working with a remote server that cannot access the specified file(s)), and writes the output to d:\test.xml.

Example 4:

classify --debug Timings "d:\test word documents\*.doc" -r --limit 10 -od:\test.xml

This classifies up to 10 .doc files from “d:\test word documents” (recursing if required), requests bean timings information, and writes the results to d:\test.xml.

Example 5:

classify -r *.* ~*.doc --exclude d:\classifiedfiles.txt --record d:\classifiedfiles.txt --limit 100 --formatfile d:\ipsv.xslt -o d:\results.csv

This classifies up to 100 files from the current directory (recursively), excluding any .doc files and any files listed in d:\classifiedfiles.txt. The results are written to d:\results.csv after being transformed by d:\ipsv.xslt (which must exist). The names of the files that are classified are appended to d:\classifiedfiles.txt.
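Because Example 5 passes the same file to both --exclude and --record, repeated runs work through a large directory in batches of at most 100 files. The bookkeeping can be pictured with the sketch below. It is an illustration of the described behaviour, not the utility's own code, and it assumes the record file holds one path per line; next_batch is a hypothetical name.

```python
import os


def next_batch(candidates, record_path, limit):
    """Pick up to `limit` not-yet-recorded files and append them to the record."""
    already_done = set()
    if os.path.exists(record_path):
        with open(record_path, encoding="utf-8") as handle:
            already_done = {line.strip() for line in handle if line.strip()}

    # Skip anything recorded by an earlier run, then apply the limit.
    batch = [path for path in candidates if path not in already_done][:limit]

    # Append (never overwrite) so earlier runs stay recorded.
    with open(record_path, "a", encoding="utf-8") as handle:
        for path in batch:
            handle.write(path + "\n")
    return batch
```

Each call returns the next slice of unprocessed files, and an empty list once every candidate has been recorded.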

Example 6:

classify d:\test\*.doc --use_csreq_files

Assume that d:\test contains the following files:

A.doc
A.doc.csreq
B.doc

The command classifies A.doc and B.doc; however, the request for A.doc is read from A.doc.csreq, which must be a valid CS XML request (no checking is performed; use Classification Server diagnostics to debug). Other command-line flags that are useful for csreq files are (using Python):

classify d:\test\*.doc --only_csreq_files

Only files with a parallel csreq file are classified (a warning response is generated for files that do not have a “csreq” file). For this request, A.doc is classified (since A.doc.csreq exists), but a warning is generated for B.doc (since B.doc.csreq does not exist). Using Python:

classify d:\test\*.doc --make_csreq_files

This does not attempt to classify; it only writes each request to the appropriate “csreq” file, which can be useful for obtaining an example of the format. This request overwrites A.doc.csreq and creates B.doc.csreq.
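Before running with --only_csreq_files, it can be useful to check which documents have a parallel “csreq” file and which would only produce a warning. The sketch below is a hypothetical helper (split_by_csreq is not part of the distribution) and assumes, as in the example above, that the request file is named by appending “.csreq” to the document name.

```python
import glob
import os


def split_by_csreq(pattern):
    """Split matching files into (with csreq file, without csreq file)."""
    with_request, without_request = [], []
    for path in sorted(glob.glob(pattern)):
        # A.doc is paired with A.doc.csreq in the same directory.
        if os.path.isfile(path + ".csreq"):
            with_request.append(path)
        else:
            without_request.append(path)
    return with_request, without_request
```

For the d:\test directory above, split_by_csreq(r"d:\test\*.doc") would place A.doc in the first list and B.doc in the second.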

Classification comparison using “classify.py/classify.exe”

The “classify” parameters “--cf” and “--cfi” allow classification results to be compared between Classification Server instances. For example:

classify @fileset --cf=bluto --record fileset2 -feedback -o data

This compares the classification output from localhost:5058 against that from bluto:5058 for all files listed in the file “fileset”. Any files whose output differs are recorded in the file “fileset2” (the differentiated output is written to “data”). This comparison is at a very low level: for example, any difference in the feedback counts as a change. The comparison can instead be made purely on the document-level scores (or on any feature in the XML, by supplying an appropriate XSLT) as follows:

classify @fileset --cf=bluto --record fileset3 -o data.csv -fCSV

This performs the comparison on the CSV format of the returned data (which only includes classification scores), so files are marked as different only if they show a different score. This is useful if you have made changes to a rulenet and want to determine the set of files whose scores have changed. So:

classify @fileset ~@fileset3 --record fileset4 -feedback -o data

This gives, in the file “fileset4”, only those files that differ in feedback but have the same score (since the “fileset3” files, which from above are those that differ in score, have been excluded).
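The relationship between the recorded file lists is plain set arithmetic: the files that differ only in feedback are those recorded by the low-level comparison minus those recorded by the score-only comparison. A minimal sketch, assuming each recorded list holds one path per line (the helper names are hypothetical):

```python
def read_fileset(path):
    """Read a recorded file list (assumed: one path per line)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def feedback_only_differences(all_differences, score_differences):
    """Return files that differ at the low level but not in score."""
    scored = set(score_differences)
    # Preserve the original order of the low-level list.
    return [path for path in all_differences if path not in scored]
```

With “fileset2” and “fileset3” from the commands above, feedback_only_differences(read_fileset("fileset2"), read_fileset("fileset3")) would be expected to list the same files as “fileset4”, provided the server results have not changed between runs.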

Using “classify.py/classify.exe” with Smartlogic Cloud

The following example uses the classify utility with Semaphore Cloud to classify all PDF files in the current directory. To use it, substitute the correct Classification Server URL (from the Basic API configuration) and Semaphore Cloud API Key; the values shown here are random and will not work:

classify -s https://cloud.smartlogic.com/svc/5af949a1-5661-44f1-b9de-a38981b6a46f/ *.pdf -a u+n+qJfcXIye2/1c6TLcuw== -c https://cloud.smartlogic.com/token

Note that the “-c” parameter is optional and in most cases the default value is correct.

Note: if you are a tenant admin on multiple tenancies, your default tenancy must be set to the tenancy you are connecting to in the -s parameter. See here for instructions for changing your default tenancy.
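To keep the API key out of scripts and shell history, the cloud command can be assembled from an environment variable. The sketch below uses only the options shown above (-s, -a, -c). The variable name SEMAPHORE_API_KEY, the function names, and the default executable are assumptions for illustration. It also assumes the utility expands the wildcard itself, since no shell is involved when the list is passed to a process launcher.

```python
import os


def api_key_from_environment(variable="SEMAPHORE_API_KEY"):
    """Read the API key from an environment variable (hypothetical name)."""
    key = os.environ.get(variable)
    if not key:
        raise RuntimeError("Set %s before running classify" % variable)
    return key


def build_cloud_command(server_url, api_key, files, token_url=None,
                        executable=("python", "classify.py")):
    """Assemble a classify command line for Semaphore Cloud."""
    command = list(executable) + ["-s", server_url]
    command.extend(files)
    command += ["-a", api_key]
    if token_url:
        # -c is optional; in most cases the default value is correct.
        command += ["-c", token_url]
    return command
```

The resulting list can be passed to subprocess.run(command, check=True).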
