Appendix - XML DTD
- Last Updated: May 13, 2026
Request XML DTD
The XML DTD of requests to Classification Server is:
<!-- Classification Server (Version 7.13) Request DTD -->
<!-- co 2011 Smartlogic Semaphore Ltd -->
<!-- High level element is normally "request" -->
<!ELEMENT request (document?)>
<!ATTLIST request
op (CLASSIFY | PUBLISH | TEST | PUBLISH_ADDITION | STATS | LISTRULENETCLASSES) #REQUIRED>
<!-- Legacy requests have a high level element of "document" -->
<!ELEMENT document (
title?,
path?,
body?,
feedback?,
singlearticle?,
multiarticle?,
min_average_article_pagesize?,
num_articles_processed_in_singlepass?,
char_count_cutoff?,
stylesheet?,
use_generated_keys?,
language?,
debug?,
splitting_template?,
operation_mode?,
clustering?,
document_score_limit?,
empty_article_ignores_metadata?,
threshold?,
META* )>
<!ELEMENT title (#PCDATA)>
<!ELEMENT path (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST body
type (TEXT | HTML) "TEXT" >
<!ELEMENT feedback (#PCDATA)>
<!ELEMENT singlearticle EMPTY>
<!ELEMENT multiarticle EMPTY>
<!ELEMENT min_average_article_pagesize (#PCDATA)>
<!ELEMENT num_articles_processed_in_singlepass (#PCDATA)>
<!ELEMENT char_count_cutoff (#PCDATA)>
<!ELEMENT stylesheet EMPTY>
<!ELEMENT use_generated_keys EMPTY>
<!ELEMENT language (#PCDATA)>
<!ELEMENT debug (#PCDATA)>
<!ELEMENT splitting_template (#PCDATA)>
<!ELEMENT operation_mode (#PCDATA)>
<!ELEMENT clustering EMPTY>
<!ATTLIST clustering
type (ALL | AVERAGE | COMMON | NONE | RMS | AVERAGE_INCLUDING_EMPTY | COMMON_INCLUDING_EMPTY | RMS_INCLUDING_EMPTY) "RMS"
threshold CDATA "48" >
<!ELEMENT document_score_limit (#PCDATA)>
<!ELEMENT empty_article_ignores_metadata (#PCDATA)>
<!ELEMENT threshold (#PCDATA)>
<!ELEMENT META EMPTY>
<!ATTLIST META
name CDATA #REQUIRED
value CDATA #REQUIRED >
Note: This DTD is accessible via URL cs_7_13_request.dtd.
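As an illustration of the request DTD above, the following is a minimal sketch that builds a CLASSIFY request with Python's standard library. The helper name build_classify_request and its parameters are hypothetical and are not part of Classification Server. The child elements are appended in the order the DTD declares them, because a DTD sequence is order-sensitive; the clustering element is written with the attribute defaults shown in the DTD.

```python
import xml.etree.ElementTree as ET


def build_classify_request(title, body, body_type="TEXT", threshold=None, meta=None):
    """Hypothetical helper: return a CLASSIFY request as an XML string."""
    request = ET.Element("request", {"op": "CLASSIFY"})
    document = ET.SubElement(request, "document")

    # Children are added in DTD order: title, body, clustering, threshold, META*.
    ET.SubElement(document, "title").text = title
    ET.SubElement(document, "body", {"type": body_type}).text = body

    # Attribute values below repeat the defaults declared in the DTD.
    ET.SubElement(document, "clustering", {"type": "RMS", "threshold": "48"})

    if threshold is not None:
        ET.SubElement(document, "threshold").text = str(threshold)

    # META elements carry name/value pairs; both attributes are required.
    for name, value in (meta or {}).items():
        ET.SubElement(document, "META", {"name": name, "value": value})

    return ET.tostring(request, encoding="unicode")


if __name__ == "__main__":
    print(build_classify_request("Example", "Some text to classify.", threshold=50))
```

The standard library does not validate against a DTD, so a request built this way is only as correct as the element order used in the helper.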
The elements and attributes have the following meaning:
- OPERATION (the op attribute of the request element) has the following values (case insensitive):
  - “CLASSIFY” - Classify a document.
  - “PUBLISH” - Publish/republish a rulebase.
  - “COLLECT” - Collect the classification statistics.
  - “TEST” - Classify a document with diagnostics mode on.
  - “STATS” - Return statistics regarding classification.
- TITLE defines the title of the document. If a title is found in the document defined in PATH, the value of TITLE will override it.
- BODY will be treated as the body of the document if PATH is not specified or if the document defined by PATH cannot be fetched. The nature of the BODY is defined by @TYPE.
- BODY@TYPE indicates how the provided BODY should be treated:
  - “UNKNOWN” - Have Classification Server guess the format of the BODY (this is the default if no TYPE is specified).
  - “TEXT” - Treat the BODY as text data.
  - “HTML” - Treat the BODY as HTML data.
- PATH is treated as the URI of the document to be classified. Supported protocols are FTP, FTPS, TFTP, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and LDAP. Note that FTPS & HTTPS are only available if an SSL implementation is available on the server.
- If SINGLEARTICLE is present in the request, the classification will not attempt to split the document into articles, so the document is classified as a whole. Similarly, if MULTIARTICLE is present, the system will attempt to split the document into articles; this is the default behaviour. These two options are mutually exclusive.
- If FEEDBACK is present in the request, the output will not only be populated with the classification results, but will also include the text from the document with auditing information.
- If STYLESHEET is present in the request, the classification server output will include a stylesheet definition so the requesting browser can perform a client-side XSL transform if supported.
- THRESHOLD defines the minimum score a category must reach before being included in the results. If it is not specified, the THRESHOLD defined in the configuration is used.
- CLUSTERING defines how the classification of articles is propagated at the document level when multiple articles are found in a document and processed separately. If CLUSTERING is not present in the request the values defined in the Classification Server configuration file are used.
- CLUSTERING@TYPE has the following values (case insensitive):
  - “ALL” - Indicates that all the categories defined for all articles will be propagated at document level.
  - “AVERAGE” - Indicates that the average score (by article contribution) across all non-empty articles will be recalculated for each category and the category propagated at document level if its standard average is above the clustering threshold.
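To make the difference between the “ALL” and “AVERAGE” clustering types concrete, here is a rough sketch of one possible reading of the descriptions above. It is not Classification Server's actual implementation. It assumes that each article is represented as a dictionary of category scores, that an empty article is one with no categories, and that an article lacking a category contributes a score of 0 to that category's average; none of these details are confirmed by this appendix.

```python
def cluster_all(articles):
    """ALL: propagate every category found in any article."""
    categories = set()
    for scores in articles:
        categories.update(scores)
    return categories


def cluster_average(articles, threshold=48.0):
    """AVERAGE: keep categories whose mean score over non-empty articles exceeds the threshold.

    Assumption: an article without the category counts as 0 for that category.
    """
    non_empty = [scores for scores in articles if scores]
    if not non_empty:
        return {}
    categories = set().union(*non_empty)
    result = {}
    for category in categories:
        average = sum(scores.get(category, 0.0) for scores in non_empty) / len(non_empty)
        if average > threshold:
            result[category] = average
    return result


if __name__ == "__main__":
    # Invented scores for three articles; the third article is empty.
    articles = [{"A": 90, "B": 40}, {"A": 60}, {}]
    print(cluster_all(articles))
    print(cluster_average(articles))
```

With the invented scores in the example, “ALL” propagates both categories, while “AVERAGE” keeps only the category whose mean over the two non-empty articles is above the default threshold of 48.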
Error XML DTD
The XML DTD of the error response generated by Classification Server is as follows:
<!-- Classification Server (Version 7.8) Error DTD -->
<!-- co 2010 Smartlogic Semaphore Ltd -->
<!-- High level element is "results" -->
<!ELEMENT results (error)>
<!ATTLIST results
name CDATA #REQUIRED>
<!ELEMENT error (#PCDATA)>
<!ATTLIST error
id CDATA #REQUIRED>
Note: This DTD is accessible via URL cs_7_8_error.dtd.
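A client would typically check for this error shape before interpreting a response. The sketch below shows one way to do that with Python's standard library; the function name parse_error is hypothetical, and the sample id and message are invented for illustration only.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text):
    """Return the error details if xml_text is an error response, else None."""
    root = ET.fromstring(xml_text)
    if root.tag != "results":
        return None
    error = root.find("error")
    if error is None:
        return None
    return {
        "name": root.get("name"),   # required attribute of <results>
        "id": error.get("id"),      # required attribute of <error>
        "message": (error.text or "").strip(),
    }


if __name__ == "__main__":
    # Invented sample; real ids and messages come from the server.
    sample = '<results name="error"><error id="1">Example failure</error></results>'
    print(parse_error(sample))
```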
Response XML DTD
The XML DTD of the response generated by Classification Server is as follows:
<!-- Classification Server (Version 4.1.20) Response DTD -->
<!-- co 2017 Smartlogic Semaphore Ltd -->
<!-- High level element is "response" -->
<!ELEMENT response (#PCDATA | STRUCTUREDDOCUMENT | Overall | Acquisition | DateZoner | Evaluation | Finalisation | Lexer | Parser | Splitter | languages)*>
<!-- Standard "classify" request output -->
<!ELEMENT STRUCTUREDDOCUMENT (URL, HASH?, (META | SYSTEM | rule_evidence)*,
ARTICLE*,( PARAGRAPH | OBJECT | FIELD | EMAIL_FIELDS )*)>
<!ELEMENT URL (#PCDATA)>
<!ELEMENT META (META*)>
<!ATTLIST META
name CDATA #REQUIRED
value CDATA #REQUIRED
id CDATA #IMPLIED
score CDATA #IMPLIED
CandidateKey CDATA #IMPLIED
original_key CDATA #IMPLIED
key CDATA #IMPLIED >
<!ELEMENT SYSTEM EMPTY>
<!ATTLIST SYSTEM
name CDATA #REQUIRED
value CDATA #REQUIRED >
<!ELEMENT HASH EMPTY>
<!ATTLIST HASH
value CDATA #REQUIRED >
<!ELEMENT ARTICLE (TITLE?, (META | SYSTEM | rule_evidence)*, (PARAGRAPH|OBJECT|FIELD|EMAIL_FIELDS)* ) >
<!ELEMENT EMAIL_FIELDS ((PARAGRAPH|FIELD)* )>
<!ELEMENT rule_evidence ( Clustered | (rule | EvidenceTruncated)*)>
<!ATTLIST rule_evidence
category CDATA #REQUIRED
class CDATA #REQUIRED >
<!ELEMENT Clustered ( ArticleDetails* )>
<!ATTLIST Clustered
type CDATA #REQUIRED >
<!ELEMENT ArticleDetails EMPTY>
<!ATTLIST ArticleDetails
Index CDATA #REQUIRED
score CDATA #REQUIRED
NonEmptyScores CDATA #REQUIRED
>
<!ELEMENT rule EMPTY>
<!ATTLIST rule
key CDATA #REQUIRED
type CDATA #REQUIRED
score CDATA #REQUIRED
index CDATA #IMPLIED
RuleBase CDATA #IMPLIED
id CDATA #IMPLIED
depth CDATA #IMPLIED
original_key CDATA #IMPLIED
evaluated CDATA #IMPLIED
CandidateKey CDATA #IMPLIED
NodeIndex CDATA #IMPLIED
Offset CDATA #IMPLIED
triggers CDATA #IMPLIED
subtype CDATA #IMPLIED
data CDATA #IMPLIED >
<!ELEMENT EvidenceTruncated EMPTY>
<!ATTLIST EvidenceTruncated
TotalEvidenceRules CDATA #REQUIRED
EvidenceRulesAdded CDATA #IMPLIED>
<!ELEMENT FIELD (#PCDATA|KEY|FIELD|PARAGRAPH)*>
<!ATTLIST FIELD NAME CDATA #REQUIRED >
<!ELEMENT TITLE (PARAGRAPH*)>
<!ELEMENT PARAGRAPH (#PCDATA|KEY|FIELD)*>
<!ELEMENT OBJECT ((PARAGRAPH|OBJECT|FIELD|EMAIL_FIELDS)*)>
<!ELEMENT KEY (#PCDATA|KEY)*>
<!ATTLIST KEY
ID CDATA #REQUIRED >
<!-- Statistics response (request "stats") -->
<!ELEMENT Overall (Classify | Exception | Publish | count | http | value)*>
<!ELEMENT Classify (count | value)*>
<!ELEMENT count (#PCDATA)>
<!ELEMENT value (#PCDATA)>
<!ELEMENT Exception (#PCDATA | Data_Kept)*>
<!ELEMENT Data_Kept (#PCDATA)>
<!ELEMENT Publish (count | value)*>
<!ELEMENT http (#PCDATA)>
<!ELEMENT Acquisition (Exception | count | value)*>
<!ELEMENT DateZoner (count | value)*>
<!ELEMENT Evaluation (Processed | count | value)*>
<!ELEMENT Processed (#PCDATA)>
<!ELEMENT Finalisation (articles_processed | count | documents_processed | value)*>
<!ELEMENT articles_processed (#PCDATA)>
<!ELEMENT documents_processed (#PCDATA)>
<!ELEMENT Lexer (Units_Processed | count | value)*>
<!ELEMENT Units_Processed (#PCDATA)>
<!ELEMENT Parser (count | pdf | processed | text | value)*>
<!ELEMENT pdf (#PCDATA)>
<!ELEMENT processed (#PCDATA)>
<!ELEMENT text (#PCDATA)>
<!ELEMENT Splitter (ArticlesMade | DocumentsSplit | count | value)*>
<!ELEMENT ArticlesMade (#PCDATA)>
<!ELEMENT DocumentsSplit (#PCDATA)>
<!-- Legacy response -->
<!ELEMENT results (class?, error?, error_detail?, version?, warnings?)>
<!ATTLIST results name (error | processdocument | version) "processdocument">
<!ELEMENT warnings (warning*)>
<!ELEMENT warning (warning_detail)>
<!ATTLIST warning id CDATA #REQUIRED>
<!ELEMENT warning_detail (#PCDATA)>
<!-- NOTE: Non-legacy requests will return errors using this format also. -->
<!ELEMENT error (#PCDATA)>
<!ATTLIST error id CDATA #REQUIRED>
<!ELEMENT error_detail (#PCDATA)>
<!ELEMENT version EMPTY>
<!ATTLIST version number CDATA #REQUIRED>
<!ELEMENT class (term*)>
<!ATTLIST class name CDATA #REQUIRED>
<!ELEMENT term EMPTY>
<!ATTLIST term
name CDATA #REQUIRED
score CDATA #REQUIRED>
<!-- Elements used by language request response -->
<!ELEMENT languages (language)*>
<!ATTLIST languages
type (Language_Pack|Standard) #REQUIRED>
<!ELEMENT language EMPTY>
<!ATTLIST language
id CDATA #REQUIRED
name CDATA #REQUIRED
display CDATA #IMPLIED
default (true) #IMPLIED
has_rules_defined (true) #IMPLIED>
Note: This DTD is accessible via URL cs_4_1_20_response.dtd.
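The sketch below reads the META elements of a classify response at document level and at article level, following the STRUCTUREDDOCUMENT and ARTICLE declarations in the DTD above. The function name classification_results and the sample document (URL, category names and scores) are invented for illustration. Scores are returned as strings because the DTD declares them only as CDATA.

```python
import xml.etree.ElementTree as ET

# Invented sample shaped after the response DTD.
SAMPLE = """<response>
  <STRUCTUREDDOCUMENT>
    <URL>file:///tmp/example.txt</URL>
    <META name="Topic" value="Finance" score="87"/>
    <SYSTEM name="Type" value="TEXT"/>
    <ARTICLE>
      <META name="Topic" value="Finance" score="91"/>
    </ARTICLE>
  </STRUCTUREDDOCUMENT>
</response>"""


def classification_results(xml_text):
    """Collect (name, value, score) tuples per level, or None if no document is present."""
    root = ET.fromstring(xml_text)
    doc = root.find("STRUCTUREDDOCUMENT")
    if doc is None:
        return None

    def metas(parent):
        # findall matches direct children only, so levels are not mixed.
        return [(m.get("name"), m.get("value"), m.get("score")) for m in parent.findall("META")]

    return {
        "document": metas(doc),
        "articles": [metas(article) for article in doc.findall("ARTICLE")],
    }


if __name__ == "__main__":
    print(classification_results(SAMPLE))
```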
Notes:
- The information returned from a “debug” request that is included in this schema may change at any time, due to the nature of the information being provided. For this type of request, treat this schema as a guideline that may not be strictly adhered to.
- META elements are used to store the classification results. META elements can be found at STRUCTUREDDOCUMENT or ARTICLE level. If the STRUCTUREDDOCUMENT contains articles then the document level META elements are derived from the ARTICLE level ones, as per the aggregation parameters set out in the XML request. CandidateKey is used as an alternative key for markup when the scored rule is a template rule. This is because a particular rule may be scored for several candidates and using this key allows the evidence for each scored candidate to be displayed.
- SYSTEM elements are used to store properties of the document, such as its nature (PDF, WORD etc.) and its author when available.
- Rule evidence is only output in diagnostics mode and provides the full list of subsidiary rules which evaluate up to the particular category level - these are individually marked up in the text where appropriate.
- The “KEY” elements are used to surround elements of text identified as evidence by the rulebases for auditing purposes. Their identifier matches the key attribute of the META element.
- “OBJECT” elements are returned when the incoming document contains embedded objects (typically within Microsoft Office documents). If the nested object is of an unrecognised format (e.g. an embedded sound or image), the returned object will contain a single warning message within a paragraph object.
- “OBJECT”, “TITLE” and “PARAGRAPH” elements are only present if feedback was requested in the incoming request.
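As noted above, the identifier of a KEY element matches the key attribute of a META element. The following sketch uses that link to group evidence text by category in a response where feedback was requested. The function name evidence_by_category and the sample content are invented for illustration, and the sketch assumes the match is an exact string comparison between KEY@ID and META@key.

```python
import xml.etree.ElementTree as ET

# Invented sample shaped after the response DTD, with feedback text present.
SAMPLE = """<response>
  <STRUCTUREDDOCUMENT>
    <URL>file:///tmp/example.txt</URL>
    <META name="Topic" value="Finance" score="87" key="k1"/>
    <PARAGRAPH>Shares in <KEY ID="k1">the bank</KEY> rose.</PARAGRAPH>
  </STRUCTUREDDOCUMENT>
</response>"""


def evidence_by_category(xml_text):
    """Map (META name, META value) to the list of text spans marked as evidence."""
    root = ET.fromstring(xml_text)

    # Index META elements by their optional key attribute.
    key_to_category = {
        meta.get("key"): (meta.get("name"), meta.get("value"))
        for meta in root.iter("META")
        if meta.get("key") is not None
    }

    evidence = {}
    for key in root.iter("KEY"):
        category = key_to_category.get(key.get("ID"))
        if category is not None:
            # itertext also gathers text from nested KEY elements.
            evidence.setdefault(category, []).append("".join(key.itertext()))
    return evidence


if __name__ == "__main__":
    print(evidence_by_category(SAMPLE))
```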