Using the Thesaurus Functions
- Last Updated: April 15, 2026
- MarkLogic Server
- Version 10.0
MarkLogic Server includes functions that enable applications to provide thesaurus capabilities. Thesaurus applications use thesaurus (synonym) documents to find words with similar meaning to the words entered by a user. A common example application expands a user search to include words with similar meaning to those entered in a search. For example, if the application uses a thesaurus document that lists car brands as synonyms for the word car, then a search for car might return results for Alfa Romeo, Ford, and Hyundai, as well as for the word car.
This chapter describes how to use the thesaurus functions and contains the following sections:
- The Thesaurus Module
- Function Reference
- Thesaurus Schema
- Capitalization
- Managing Thesaurus Documents
- Expanding Searches Using a Thesaurus in XQuery
- Expanding Searches Using a Thesaurus in JavaScript
The Thesaurus Module
An XQuery module provides the thesaurus functions. You can use this module from either XQuery or Server-Side JavaScript. The thesaurus functions are installed into the following XQuery module file:
- install_dir/Modules/MarkLogic/thesaurus.xqy
where install_dir is the directory in which MarkLogic Server is installed. The functions in the thesaurus module use the thsr: namespace prefix, which you must declare in your XQuery program (or declare a prefix of your own for the same namespace). To use any of the functions in XQuery, include the module and namespace declaration in the prolog of your XQuery program as follows:
import module namespace thsr="http://marklogic.com/xdmp/thesaurus"
at "/MarkLogic/thesaurus.xqy";
To use any of the functions in a JavaScript program, include a line similar to the following in your Server-Side JavaScript program:
const thsr = require("/MarkLogic/thesaurus");
Function Reference
The reference information for the thesaurus module functions is included in the MarkLogic XQuery and XSLT Function Reference and the MarkLogic Server-Side JavaScript Function Reference available through docs.marklogic.com.
Thesaurus Schema
Any thesaurus documents loaded into MarkLogic Server must conform to the thesaurus schema, installed into the following file:
- install_dir/Config/thesaurus.xsd
where install_dir is the directory in which MarkLogic Server is installed.
Capitalization
Thesaurus documents and the thesaurus functions are case-sensitive. Therefore, a thesaurus term for Car is different from a thesaurus term for car and any lookups for these terms are case-sensitive.
If you want your applications to be case-insensitive (that is, if you want the term Car to return thesaurus entries for both Car and car), your application must handle the case of the terms you want to look up. There are several ways to handle case. For example, you can lowercase all the entries in your thesaurus documents and then lowercase each term before looking it up in the thesaurus. For an example of lowercasing terms in a thesaurus document, see Lowercasing Terms When Inserting a Thesaurus Document.
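The lowercase-before-lookup approach can be sketched in plain JavaScript. This is a minimal illustration, not part of the thesaurus module: lookupIgnoringCase, its lookup parameter, and the in-memory table are hypothetical names. In Server-Side JavaScript, lookup could be a function that calls thsr.lookup. The sketch assumes every term in the thesaurus was lowercased when the thesaurus was loaded.

```javascript
// Hypothetical helper: lowercases a user-supplied term before lookup.
// Assumes every term in the thesaurus was lowercased when it was loaded.
// `lookup` is any function (uri, term) -> entries; it is passed in so the
// sketch does not depend on a particular runtime.
function lookupIgnoringCase(lookup, uri, term) {
  return lookup(uri, term.toLowerCase());
}

// Stand-in lookup over an in-memory table, for illustration only.
const table = { car: [{ term: "car", synonyms: [{ term: "automobile" }] }] };
const fakeLookup = (uri, term) => table[term] || [];

const found = lookupIgnoringCase(fakeLookup, "/myThsrDocs/wordnet.xml", "Car");
console.log(found.length); // 1
```

Without the lowercasing step, the same stand-in lookup for Car finds nothing, which matches the case-sensitive behavior described above.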
Managing Thesaurus Documents
You can have any number of thesaurus documents in a database. You can also add to or modify any thesaurus documents that already exist. This section describes how to load and update thesaurus documents, and contains the following sections:
- Loading Thesaurus Documents in XQuery
- Loading Thesaurus Documents in JavaScript
- Lowercasing Terms When Inserting a Thesaurus Document
- Loading the XML Version of the WordNet Thesaurus
- Updating a Thesaurus Document
- Security Considerations With Thesaurus Documents
- Example Queries Using Thesaurus Management Functions
Loading Thesaurus Documents in XQuery
To use a thesaurus in a query, use the thsr:load function or the thsr:insert function to load a document as a thesaurus. For example, to load a thesaurus document with a URI /myThsrDocs/wordnet.xml, execute a query similar to the following:
xquery version "1.0-ml";
import module namespace thsr="http://marklogic.com/xdmp/thesaurus"
at "/MarkLogic/thesaurus.xqy";
thsr:load("c:\thesaurus\wordnet.xml", "/myThsrDocs/wordnet.xml")
This XQuery adds all of the <entry> elements from the c:\thesaurus\wordnet.xml file to a thesaurus with the URI /myThsrDocs/wordnet.xml. If the document already exists, then it is overwritten with the new content from the specified file.
If you have a thesaurus document that is too large to fit into an in-memory list, you can split the thesaurus into multiple documents. If you do this, you must specify all of the thesaurus documents in the thesaurus APIs that take URIs as a parameter. Also, ensure that there are no duplicate entries between the different thesaurus documents.
Loading Thesaurus Documents in JavaScript
To use a thesaurus in a Server-Side JavaScript program, use the thsr.load function or the thsr.insert function to load a document as a thesaurus. For example, to load a thesaurus document with a URI /myThsrDocs/wordnet.xml, execute a query similar to the following:
const thsr = require("/MarkLogic/thesaurus");
declareUpdate();
thsr.load("c:\\thesaurus\\wordnet.xml", "/myThsrDocs/wordnet.xml");
This JavaScript program adds all of the <entry> elements from the c:\thesaurus\wordnet.xml file to a thesaurus with the URI /myThsrDocs/wordnet.xml. If the document already exists, then it is overwritten with the new content from the specified file.
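Windows paths need care in JavaScript string literals, because a backslash starts an escape sequence. The following sketch shows what happens to the path from the load example when the backslashes are and are not escaped. It uses only standard JavaScript; the String.raw form assumes a runtime that supports template literals.

```javascript
// In a JavaScript string literal, "\t" is a tab and "\w" is just "w",
// so an unescaped Windows path is silently altered.
const unescaped = "c:\thesaurus\wordnet.xml";
const escaped = "c:\\thesaurus\\wordnet.xml";
const raw = String.raw`c:\thesaurus\wordnet.xml`;

console.log(unescaped.includes("\t")); // true: the path now contains a tab
console.log(escaped === raw);          // true: both spell the intended path
```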
If you have a thesaurus document that is too large to fit into an in-memory list, you can split the thesaurus into multiple documents. If you do this, you must specify all of the thesaurus documents in the thesaurus APIs that take URIs as a parameter. Also, ensure that there are no duplicate entries between the different thesaurus documents.
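One way to check for duplicates across split thesaurus documents is sketched below in plain JavaScript. Everything here is an assumption made for illustration: findDuplicateEntries is a hypothetical helper, the part1.xml and part2.xml URIs are invented, the entries use the object shape from the thsr.setEntry example later in this chapter, and two entries are treated as duplicates when their term and part of speech match.

```javascript
// Hypothetical check: report entries that appear in more than one
// thesaurus document. `thesauri` maps each thesaurus URI to its entries.
// Assumption: two entries are duplicates when term and partOfSpeech match.
function findDuplicateEntries(thesauri) {
  const seen = new Map(); // key -> URI where the key first appeared
  const duplicates = [];
  for (const [uri, entries] of Object.entries(thesauri)) {
    for (const entry of entries) {
      const key = JSON.stringify([entry.term, entry.partOfSpeech || ""]);
      if (!seen.has(key)) {
        seen.set(key, uri);
      } else if (seen.get(key) !== uri) {
        duplicates.push({ term: entry.term, first: seen.get(key), second: uri });
      }
    }
  }
  return duplicates;
}

const report = findDuplicateEntries({
  "/myThsrDocs/part1.xml": [{ term: "car", partOfSpeech: "noun" }],
  "/myThsrDocs/part2.xml": [
    { term: "car", partOfSpeech: "noun" },
    { term: "boat", partOfSpeech: "noun" }
  ]
});
console.log(report.length); // 1
```

Repeated entries inside a single document are deliberately not reported, since the concern above is overlap between documents.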
Lowercasing Terms When Inserting a Thesaurus Document
You can use the thsr:insert function to perform a transformation on a document before inserting it as a thesaurus document. The following example shows how you can use the xdmp:get function to load a document into memory, then walk through the in-memory document and construct a new document that has lowercase terms.
xquery version "1.0-ml";
import module namespace thsr="http://marklogic.com/xdmp/thesaurus"
at "/MarkLogic/thesaurus.xqy";
thsr:insert("newThsr.xml",
  let $thsrMem := xdmp:get("C:\myFiles\thesaurus.xml")
  return
    <thesaurus xmlns="http://marklogic.com/xdmp/thesaurus">
    {
      (: //thsr:entry finds the entries beneath the root element of
         the loaded document :)
      for $entry in $thsrMem//thsr:entry
      return
        (: Write out and lowercase the term, then write out all of
           the children of this entry except for the term, which was
           already written out and lowercased :)
        <thsr:entry>
          <thsr:term>{lower-case($entry/thsr:term)}</thsr:term>
          {$entry/*[not(self::thsr:term)]}
        </thsr:entry>
    }
    </thesaurus>
)
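The same transformation can be sketched over the JavaScript object form of an entry. This is an illustration only: lowercaseEntryTerm is a hypothetical helper, and the entry shape is the one used in the thsr.setEntry example later in this chapter. As in the XQuery above, only the entry's own term is lowercased and the other properties are copied unchanged.

```javascript
// Hypothetical helper: returns a copy of an entry with its main term
// lowercased, leaving every other property (including synonyms) as is.
function lowercaseEntryTerm(entry) {
  return Object.assign({}, entry, { term: entry.term.toLowerCase() });
}

const original = {
  term: "Car",
  partOfSpeech: "noun",
  synonyms: [{ term: "Ford", partOfSpeech: "noun" }]
};
const lowered = lowercaseEntryTerm(original);

console.log(lowered.term);             // "car"
console.log(lowered.synonyms[0].term); // "Ford"
console.log(original.term);            // "Car" (the input is not modified)
```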
Loading the XML Version of the WordNet Thesaurus
You can download an XML version of the WordNet thesaurus from the MarkLogic Developer site (developer.marklogic.com/code/dictionaries). Once you download the thesaurus file, you can load it as a thesaurus document using the thsr:load XQuery function or the thsr.load JavaScript function.
Perform the following steps to download and load the WordNet Thesaurus:
- Go to the code section of developer.marklogic.com and find the following page: http://developer.marklogic.com/code/dictionaries
- Click the GitHub link.
- Navigate to the thesaurus document section and find the thesaurus.xml document.
- Save thesaurus.xml to a file (for example, c:\thesaurus\thesaurus.xml). Alternately, clone the GitHub repository.
- Load the thesaurus with an XQuery statement similar to the following:
xquery version "1.0-ml";
import module namespace thsr="http://marklogic.com/xdmp/thesaurus"
at "/MarkLogic/thesaurus.xqy";
thsr:load("c:\thesaurus\thesaurus.xml", "/myThsrDocs/wordnet.xml")
Or you can load the thesaurus in JavaScript with a program similar to the following:
const thsr = require("/MarkLogic/thesaurus");
declareUpdate();
thsr.load("c:\\thesaurus\\thesaurus.xml", "/myThsrDocs/wordnet.xml");
This loads the thesaurus with a URI of /myThsrDocs/wordnet.xml. You can now use this URI with the thesaurus module functions.
Updating a Thesaurus Document
Use the following thesaurus functions to modify existing thesaurus documents:
- thsr:set-entry / thsr.setEntry
- thsr:add-synonym / thsr.addSynonym
- thsr:remove-entry / thsr.removeEntry
- thsr:remove-synonym / thsr.removeSynonym
- thsr:remove-term / thsr.removeTerm
Additionally, the thsr:insert / thsr.insert function adds entries to an existing thesaurus document (as well as creates a new one if one does not exist at the specified URI).
The transactional unit in MarkLogic Server is a query; therefore, if you are performing multiple updates to the same thesaurus document, be sure to perform those updates as part of separate queries. In XQuery, you can place a semicolon between the update statements to start a new query (and therefore a new transaction). If you use a semicolon to start a new query that uses thesaurus functions in XQuery, each query must include the import statement in its prolog to resolve the thesaurus namespace.
Security Considerations With Thesaurus Documents
Thesaurus documents are stored in XML format in the database. Therefore, they can be queried just like any other document. Note the following about security and thesaurus documents:
- By default, thesaurus documents are loaded into the following collections:
  - http://marklogic.com/xdmp/documents
  - http://marklogic.com/xdmp/thesaurus
- Thesaurus documents are loaded with the default permissions of the user who loads them. Make sure users who load thesaurus documents have appropriate privileges; otherwise, the documents might not have the needed permissions for reading and updating. For more information, see Setting Document Permissions in the Loading Content Into MarkLogic Server Guide.
- If you want to control access (read and/or write) to thesaurus documents beyond the default permissions with which the documents are loaded, perform an xdmp:document-set-permissions after a thsr:load operation.
Example Queries Using Thesaurus Management Functions
This section includes the following examples, in both XQuery and JavaScript:
- Example: Adding a New Thesaurus Entry in XQuery
- Example: Adding a New Thesaurus Entry in JavaScript
- Example: Removing a Thesaurus Entry
- Example: Removing Term(s) from a Thesaurus in XQuery
- Example: Removing Term(s) from a Thesaurus in JavaScript
- Example: Adding a Synonym to a Thesaurus Entry in XQuery
- Example: Adding a Synonym to a Thesaurus Entry in JavaScript
- Example: Removing a Synonym From a Thesaurus in XQuery
- Example: Removing a Synonym From a Thesaurus in JavaScript
Example: Adding a New Thesaurus Entry in XQuery
The following XQuery uses the thsr:set-entry function to add an entry for Car to the thesaurus with URI /myThsrDocs/wordnet.xml:
xquery version "1.0-ml";
import module namespace thsr="http://marklogic.com/xdmp/thesaurus"
at "/MarkLogic/thesaurus.xqy";
thsr:set-entry("/myThsrDocs/wordnet.xml",
<entry xmlns="http://marklogic.com/xdmp/thesaurus">
<term>Car</term>
<part-of-speech>noun</part-of-speech>
<synonym>
<term>Ford</term>
<part-of-speech>noun</part-of-speech>
</synonym>
<synonym>
<term>automobile</term>
<part-of-speech>noun</part-of-speech>
</synonym>
<synonym>
<term>Fiat</term>
<part-of-speech>noun</part-of-speech>
</synonym>
</entry>)
If the /myThsrDocs/wordnet.xml thesaurus has an identical entry, there will be no change to the thesaurus. If the thesaurus has no entry for Car or has an entry for Car that is not identical (that is, where the nodes are not equivalent), it will add the new entry. The new entry is added to the end of the thesaurus document.
Example: Adding a New Thesaurus Entry in JavaScript
The JavaScript thsr.setEntry function allows you to use a JavaScript object to update your thesaurus documents. The following JavaScript uses the thsr.setEntry function to add an entry for Car to the thesaurus with URI /myThsrDocs/wordnet.xml:
const thsr = require("/MarkLogic/thesaurus");
declareUpdate();
thsr.setEntry("/myThsrDocs/wordnet.xml",
{
"term":"Car",
"partOfSpeech":"noun",
"synonyms":[
{"term":"Ford",
"partOfSpeech":"noun"
},
{"term":"automobile",
"partOfSpeech":"noun"
},
{"term":"Fiat",
"partOfSpeech":"noun"
}
]
});
If the /myThsrDocs/wordnet.xml thesaurus has an identical entry, there will be no change to the thesaurus. If the thesaurus has no entry for Car or has an entry for Car that is not identical (that is, where the nodes are not equivalent), it will add the new entry. The new entry is added to the end of the thesaurus document.
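The add-or-leave-unchanged behavior described above can be modeled with a small in-memory sketch. This is not the module's implementation: setEntryModel is a hypothetical function, an array stands in for the thesaurus document, and "identical" is approximated by comparing JSON serializations (which is sensitive to property order).

```javascript
// In-memory model of the described behavior; not the module's code.
// `entries` stands in for the list of entries in a thesaurus document.
function setEntryModel(entries, newEntry) {
  // Assumption: "identical" is approximated by equal JSON serializations.
  const serialized = JSON.stringify(newEntry);
  const alreadyPresent = entries.some(e => JSON.stringify(e) === serialized);
  // An identical entry changes nothing; otherwise append at the end.
  return alreadyPresent ? entries : entries.concat([newEntry]);
}

const car = {
  term: "Car",
  partOfSpeech: "noun",
  synonyms: [{ term: "Ford", partOfSpeech: "noun" }]
};
let thesaurus = [];
thesaurus = setEntryModel(thesaurus, car); // added
thesaurus = setEntryModel(thesaurus, car); // identical, no change
thesaurus = setEntryModel(thesaurus,
  Object.assign({}, car, { synonyms: [] })); // differs, appended
console.log(thesaurus.length); // 2
```

Note that a non-identical entry for the same term is appended rather than merged, so a thesaurus can end up with several entries for one term.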
Example: Removing a Thesaurus Entry
The following XQuery uses the thsr:remove-entry function to remove the second entry for Car from the thesaurus with URI /myThsrDocs/wordnet.xml:
xquery version "1.0-ml";
import module namespace thsr="http://marklogic.com/xdmp/thesaurus"
at "/MarkLogic/thesaurus.xqy";
thsr:remove-entry("/myThsrDocs/wordnet.xml",
thsr:lookup("/myThsrDocs/wordnet.xml","Car")[2])
Similarly, the following is a JavaScript example to do the same thing:
const thsr = require("/MarkLogic/thesaurus");
declareUpdate();
thsr.removeEntry("/myThsrDocs/wordnet.xml",
  thsr.lookup("/myThsrDocs/wordnet.xml","Car").toObject()[1])
This removes the second Car entry from the /myThsrDocs/wordnet.xml thesaurus document.
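The two examples select the same entry with different numbers because XQuery sequence positions start at 1, while JavaScript array indexes start at 0. The following plain JavaScript sketch illustrates the difference; itemAtPosition is a hypothetical helper that mimics an XQuery positional predicate, and the strings stand in for thesaurus entries.

```javascript
// Positions in an XQuery sequence start at 1; JavaScript array indexes
// start at 0. The strings below stand in for thesaurus entries.
const carEntries = ["first Car entry", "second Car entry"];

// Hypothetical helper that mimics XQuery's 1-based positional predicate.
function itemAtPosition(items, position) {
  return items[position - 1];
}

console.log(itemAtPosition(carEntries, 2)); // "second Car entry", like [2] in XQuery
console.log(carEntries[1]);                 // "second Car entry", JavaScript index 1
```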
Example: Removing Term(s) from a Thesaurus in XQuery
The following XQuery uses the thsr:remove-term function to remove all entries for the term Car from the thesaurus with URI /myThsrDocs/wordnet.xml:
xquery version "1.0-ml";
import module namespace thsr="http://marklogic.com/xdmp/thesaurus"
at "/MarkLogic/thesaurus.xqy";
thsr:remove-term("/myThsrDocs/wordnet.xml", "Car")
This removes all of the Car terms from the /myThsrDocs/wordnet.xml thesaurus document. If you only have a single term for Car in the thesaurus, the thsr:remove-term function does the same as the thsr:remove-entry function.
Example: Removing Term(s) from a Thesaurus in JavaScript
The following JavaScript program uses the thsr.removeTerm function to remove all entries for the term Car from the thesaurus with URI /myThsrDocs/wordnet.xml:
const thsr = require("/MarkLogic/thesaurus");
declareUpdate();
thsr.removeTerm("/myThsrDocs/wordnet.xml", "Car")
This removes all of the Car terms from the /myThsrDocs/wordnet.xml thesaurus document. If you only have a single term for Car in the thesaurus, the thsr.removeTerm function does the same as the thsr.removeEntry function.
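The difference between removing a term and removing an entry can be modeled in memory as follows. This is a sketch under stated assumptions, not the module's implementation: removeTermModel and removeEntryModel are hypothetical names, and an array of objects stands in for the thesaurus document.

```javascript
// In-memory model, not the module's implementation.
// removeTermModel drops every entry whose term matches (case-sensitively);
// removeEntryModel drops only the one entry that is passed in.
function removeTermModel(entries, term) {
  return entries.filter(e => e.term !== term);
}
function removeEntryModel(entries, entry) {
  return entries.filter(e => e !== entry);
}

const sample = [
  { term: "Car", synonyms: [{ term: "Ford" }] },
  { term: "Car", synonyms: [{ term: "automobile" }] },
  { term: "car", synonyms: [{ term: "Fiat" }] }
];

console.log(removeTermModel(sample, "Car").length);       // 1
console.log(removeEntryModel(sample, sample[1]).length);  // 2
```

The lowercase car entry survives the term removal because matching is case-sensitive, as described in the Capitalization section.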
Example: Adding a Synonym to a Thesaurus Entry in XQuery
The following XQuery adds the synonym Alfa Romeo to the thesaurus entry for car in the thesaurus with URI /myThsrDocs/wordnet.xml:
xquery version "1.0-ml";
import module namespace thsr="http://marklogic.com/xdmp/thesaurus"
at "/MarkLogic/thesaurus.xqy";
thsr:add-synonym(thsr:lookup("/myThsrDocs/wordnet.xml", "car"),
<thsr:synonym>
<thsr:term>Alfa Romeo</thsr:term>
</thsr:synonym>)
This query assumes that the lookup for the car thesaurus entry returns a single entry. If the car lookup returns multiple entries, you must specify a single entry. For example, if you wanted to add the synonym to the first car entry in the thesaurus, specify the first argument as follows:
thsr:lookup("/myThsrDocs/wordnet.xml", "car")[1]
Example: Adding a Synonym to a Thesaurus Entry in JavaScript
The following JavaScript program adds the synonym Alfa Romeo to the thesaurus entry for car in the thesaurus with URI /myThsrDocs/wordnet.xml:
const thsr = require("/MarkLogic/thesaurus");
declareUpdate();
thsr.addSynonym(
  thsr.lookup("/myThsrDocs/wordnet.xml", "car",
    // requires the "elements" option because addSynonym takes an
    // element, not a JSON object
    "elements"),
  {"synonym":{
    "term": "Alfa Romeo"}
  })
This assumes that the lookup for the car thesaurus entry returns a single entry. Notice also that the lookup must specify "elements" because thsr.addSynonym requires an element entry. If the car lookup returns multiple entries, you must specify a single entry. For example, if you wanted to add the synonym to the first car entry in the thesaurus, specify the first argument as follows:
fn.subsequence(
  thsr.lookup("/myThsrDocs/wordnet.xml", "car", "elements"), 1, 1)
Example: Removing a Synonym From a Thesaurus in XQuery
The following XQuery removes the synonym Fiat from the thesaurus entry for car in the thesaurus with URI /myThsrDocs/wordnet.xml:
xquery version "1.0-ml";
import module namespace thsr="http://marklogic.com/xdmp/thesaurus"
at "/MarkLogic/thesaurus.xqy";
thsr:remove-synonym(thsr:lookup("/myThsrDocs/wordnet.xml", "car"),
<thsr:synonym>
<thsr:term>Fiat</thsr:term>
</thsr:synonym>)
This query assumes that the lookup for the car thesaurus entry returns a single entry. If the car lookup returns multiple entries, you must specify a single entry. For example, if you wanted to remove the synonym from the first car entry in the thesaurus, specify the first argument as follows:
thsr:lookup("/myThsrDocs/wordnet.xml", "car")[1]
Example: Removing a Synonym From a Thesaurus in JavaScript
The following JavaScript program removes the synonym Fiat from the thesaurus entry for car in the thesaurus with URI /myThsrDocs/wordnet.xml:
const thsr = require("/MarkLogic/thesaurus");
declareUpdate();
thsr.removeSynonym(thsr.lookup("/myThsrDocs/wordnet.xml", "car",
"elements"),
{"term": "Fiat"});
This program assumes that the lookup for the car thesaurus entry returns a single entry. If the car lookup returns multiple entries, you must specify a single entry. For example, if you wanted to remove the synonym from the first car entry in the thesaurus, specify the first argument as follows:
fn.subsequence(
  thsr.lookup("/myThsrDocs/wordnet.xml", "car", "elements"), 1, 1)
Expanding Searches Using a Thesaurus in XQuery
You can expand a search to include terms from a thesaurus as well as the terms entered in the search. Consider the following XQuery statement:
xquery version "1.0-ml";
import module namespace thsr="http://marklogic.com/xdmp/thesaurus"
at "/MarkLogic/thesaurus.xqy";
cts:search(
doc("/Docs/hamlet.xml")//LINE,
thsr:expand(
cts:word-query("weary"),
thsr:lookup("/myThsrDocs/thesaurus.xml", "weary"),
(),
(),
() )
)
This query finds all of the lines in Shakespeare's Hamlet that have the word weary or any of the synonyms of the word weary.
Thesaurus entries can have many synonyms, though. Therefore, when you expand a search, you might want to create a user interface in the application that provides a form allowing a user to specify the desired synonyms from the entries returned by thsr:lookup. Once the user chooses which synonyms to include in the search, the application can add those terms to the search and submit it to the database.
Expanding Searches Using a Thesaurus in JavaScript
You can expand a search to include terms from a thesaurus as well as the terms entered in the search. Consider the following JavaScript program:
const thsr = require("/MarkLogic/thesaurus");
let res = [];
for (const x of cts.doc("/shakespeare/plays/hamlet.xml").xpath("//LINE")) {
  if (cts.contains(x,
        thsr.expand(
          cts.wordQuery("weary"),
          thsr.lookup("/myThsrDocs/thesaurus.xml", "weary"),
          null, null, null))) {
    res.push(x);
  }
}
res;
This returns an array containing all of the lines in Shakespeare's Hamlet that have the word weary or any of the synonyms of the word weary.
Thesaurus entries can have many synonyms, though. Therefore, when you expand a search, you might want to create a user interface in the application that provides a form allowing a user to specify the desired synonyms from the entries returned by thsr.lookup. Once the user chooses which synonyms to include in the search, the application can add those terms to the search and submit it to the database.
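The synonym-selection step can be sketched in plain JavaScript. This is an illustration under stated assumptions: termsToSearch is a hypothetical helper, the entries use the object shape from the thsr.setEntry example, and the synonyms listed for weary are invented for the example rather than taken from a real thesaurus.

```javascript
// Hypothetical helper: combine the original search term with the synonyms
// a user selected in a form. `entries` is a list of thesaurus entries in
// object form; `chosen` is the list of synonym terms the user selected.
function termsToSearch(term, entries, chosen) {
  const offered = new Set();
  for (const entry of entries) {
    for (const synonym of entry.synonyms || []) {
      offered.add(synonym.term);
    }
  }
  // Keep only selections that the thesaurus actually offered.
  const accepted = chosen.filter(t => offered.has(t));
  return [term].concat(accepted);
}

// Invented sample data, for illustration only.
const wearyEntries = [{
  term: "weary",
  synonyms: [{ term: "tired" }, { term: "jaded" }, { term: "fatigued" }]
}];

const terms = termsToSearch("weary", wearyEntries, ["tired", "sleepy"]);
console.log(terms); // the array ["weary", "tired"]
```

Filtering the selection against the offered synonyms keeps a form submission from introducing terms that never came from the thesaurus. The application would then build its search from the resulting list of terms.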