Integration Implementation Details

Last Updated: April 5, 2026

Given the functionality provided by Semaphore there are several functional elements that can be easily identified for implementation within a CMS. This section describes these elements along with proposed implementation guidelines.

General Technical Considerations

When integrating any solution with Semaphore there are a few general technical considerations:

Implementation Technology - Implementing Semaphore functionality generally requires integration with one or more of the APIs provided by the various components of the suite. These APIs are XML- or JSON-based, so any technology can be used that can communicate via standard HTTP (over TCP/IP) and process the XML/JSON results returned.
Concept IDs and URIs - When implementing any concept-based feature (such as retrieval of concept information), Smartlogic (generally) do not use the concept name to perform the action but rather the concept ID or URI assigned by Knowledge Model Management (5.x) or Ontology Editor (4.x) when the concept is created. This is because a model may often contain multiple concepts with the same name, so using the unique ID identifies exactly which concept is required. The interface may display the concept name, but what is actually used to retrieve concept information, tagged documents, etc. is the concept ID. Note further that although using the concept ID rather than the concept name is not, technically speaking, required, it does make things considerably easier to implement.
Semaphore product licencing - Care should be taken to ensure that any implementation keeps to the stipulations of the Semaphore licencing arrangements. These could include the number of installations of the various products, the size of the model, which APIs may be used, etc.

Tagging of Content on Update/Creation

An obvious implementation of Semaphore is to “tag” content as it is added or updated in the CMS with concepts from the model. These “tags” are stored alongside the document for use elsewhere (such as searching for those documents containing a particular tag value). This can be done in a number of different ways, for example:

Automatically - Semaphore determines the relevant tags for the document and they are automatically assigned to the document with no user interaction.
Manually - The user selects concepts manually from a visual representation of the model.
Assisted - Semaphore automatically determines the relevant tags for the document but then the interface allows the user to adjust these tags appropriately before they are attached to the document.

Which option is used depends upon the specific requirements of the implementation. For example, in an environment where users have little understanding of, or need for, the model, the “Automatic” approach may be preferred; in an environment where users are required to correctly classify the content they maintain, “Assisted” may be the better approach.

Important: Consideration should be given to the classification of existing content when a Semaphore-augmented solution is first implemented. Will someone be responsible for reviewing and classifying all existing content or should a tool be created to do this automatically?

An additional consideration would be to include some sort of mechanism whereby a CMS user can comment on the model itself such as to:

Recommend additional concepts for the model should they find the concept they require not present.
Comment on the structuring of concepts.

Example Tagging Implementation

The following is an example of how tagging could be implemented in a typical CMS integration project.

When a page is first created in the CMS then the user could be prompted with a screen such as the following:

This same screen could be optionally shown for pages that already exist or automatically displayed whenever the page is updated (to allow the user to review the concepts previously present). In these cases the concepts already assigned to the document would, of course, be displayed on the left side of the screen.

When this screen is displayed the user has several options (in this example): They can have Semaphore suggest concepts (via a submission, behind the scenes, of the document to the Classification Server application), manually select concepts using a visual representation of the model (using information provided by the Semantic Enhancement Server application) or request concepts be added. An alternative to this would be to display a list of suggested concepts immediately to the user (based upon the information returned from Classification Server, as if they had selected the “Auto Suggest” button straight away).
If the user chooses to automatically suggest concepts then a screen such as the following could be displayed:

The right side of the page shows the concepts recommended by Semaphore as being relevant for the document and lets the user de-select any that they do not consider relevant. In this case, as automatic classification has been performed, this selection of concepts should overwrite any pre-existing concepts present for the document.
If the user wishes to manually select concepts (by searching) they might see the following after clicking the “Search” button:

In both of the above cases we have displayed a list of concepts but you could also display the model in a hierarchical manner and allow users to select from it appropriately such as the following (from Smartlogic’s Semaphore for SharePoint solution):
If the user wishes to make a suggestion they might see the following after clicking the “User Suggest” button:

This process could generate an email containing the page ID, the user ID and the suggestion. It is probably sensible to also capture a note (“why do you need this concept added?”). The email would be routed to the team in charge of model maintenance. Alternatively, the Knowledge Model Management API could be used to send concept suggestions directly to a “dummy” node in the model itself.
When the user saves their changes and/or tags have been automatically assigned, the CMS should be updated to save the list of concepts selected (effectively “attached” to the document). This list should be ID-based, that is, it should contain the concept IDs rather than the concept names: the concept name is often not unique within a model, whereas the ID (which is assigned automatically by Knowledge Model Management and is available to all other Semaphore applications) always uniquely identifies the concept.

Automatic Classification of Content

The process for classification of content by Semaphore is to use the Classification Server component as follows:

Document content is submitted to Classification Server

When submitting content to Classification Server for classification only the content itself should be submitted, omitting any common headers, footers or menu elements that may interfere with classification. For example, the content may be something like the following:

<!-- start body text -->

<P>The <A href="http://www.un.org/crc/">Convention on the Rights of the Child</A> (CRC) is the most widely ratified human rights treaty in history. It sets forth a wide range of provisions that encompass civil rights and freedoms, family environment, 
… href="http://www.unhchr.ch/html/menu3/b/a_cescr.htm" target=5>International Covenant on Economic, Social and Cultural Rights</A>. </P>
<P> </P>

<!-- end body text -->

<P><EM>External links open in a new window and take you to non-health web sites.</EM></P>

The content itself, in this example, lies between “<!-- start body text -->” and “<!-- end body text -->”; it is extracted and submitted to Classification Server. The request to Classification Server is in XML format as follows (see Semaphore Classification and Language Service (CLS) for further information regarding requests/responses from Classification Server):

<?xml version="1.0" ?>
<request op="CLASSIFY">
  <document>
    <body>
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"...
end rss blurb -->
<!-- start body text -->
<P>The <A href="http://www.unicef.org/crc/">Convention on the Rights of the Child</A> (CRC) is the most widely ratified human rights treaty in history. It&amp;nbsp;sets forth&amp;nbsp;a wide range of provisions&amp;nbsp;that encompass civil rights …
</html>
    </body>
    <singlearticle />
    <language>en1</language>
    <clustering type="RMS" threshold="48" />
    <threshold>48</threshold>
    <min_average_article_pagesize>1.0</min_average_article_pagesize>
  </document>
</request>

You can see in this request that the text extracted from the document (XML encoded) is placed into the “<body>” section of the XML.
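
The extraction and request-building steps described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the marker comments and the element names (request, document, body, singlearticle, language, threshold) are copied from the sample request above, the function names are hypothetical, and the other elements shown in the sample are omitted for brevity. Check the CLS documentation for the options your installation actually expects.

```python
import xml.etree.ElementTree as ET

# Marker comments taken from the sample page above.
START_MARKER = "<!-- start body text -->"
END_MARKER = "<!-- end body text -->"


def extract_body_text(page_html: str) -> str:
    """Return only the content between the start/end markers."""
    start = page_html.find(START_MARKER)
    end = page_html.find(END_MARKER)
    if start == -1 or end == -1 or end < start:
        # Assumption: if the markers are missing, submit the whole page.
        return page_html
    return page_html[start + len(START_MARKER):end].strip()


def build_classify_request(body_text: str, language: str = "en1",
                           threshold: int = 48) -> bytes:
    """Build a CLASSIFY request shaped like the sample request above."""
    request = ET.Element("request", {"op": "CLASSIFY"})
    document = ET.SubElement(request, "document")
    body = ET.SubElement(document, "body")
    # ElementTree escapes <, > and & when serialising, which gives the
    # "XML encoded" body text the request needs.
    body.text = body_text
    ET.SubElement(document, "singlearticle")
    ET.SubElement(document, "language").text = language
    ET.SubElement(document, "threshold").text = str(threshold)
    return ET.tostring(request, encoding="utf-8", xml_declaration=True)
```

Building the request with an XML library rather than string concatenation means the escaping of the document text is handled in one place.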

Classification Server returns results

The results returned from Classification Server are in XML format, as follows:

<?xml version="1.0" encoding="UTF-8" ?> 
<response>
<STRUCTUREDDOCUMENT>
  <URL>../tmp/1257331870_a48.txt</URL> 
  <META name="Type" value="TEXT" /> 
  <META name="SUBJECT" value="Adoption" score="0.51" key="1087322" /> 
  <META name="SUBJECT" value="Child care" score="0.51" key="748174" /> 
  <META name="SUBJECT" value="Child protection" score="0.34" key="1157154" /> 
  <META name="SUBJECT" value="Civil and human rights" score="0.86" key="1071783" /> 
  <META name="SUBJECT" value="Health education" score="0.20" key="862379" /> 
  <META name="SUBJECT" value="International organisations" score="0.48" key="411225" /> 
</STRUCTUREDDOCUMENT>
</response>

Integration code extracts concepts from XML and presents information appropriately in the CMS interface

In the above examples, the concepts to display to the user are:
- Adoption
- Child care
- Child protection
- Civil and human rights
- Health education
- International organisations

Important: The "score" returned in the XML response for each concept is a measure of how relevant Classification Server believes the concept to be for the document, expressed as a value between 0 and 1, where 1 is absolutely relevant and 0 is completely irrelevant.
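
As a rough sketch of the extraction step, the following Python parses a response shaped like the sample above and returns the suggested concepts ordered by score. It assumes that the "key" attribute carries the concept ID and that concepts arrive as META elements named "SUBJECT", as in the sample; the function name, the tuple type and the min_score cut-off are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

# Trimmed copy of the sample response above.
SAMPLE_RESPONSE = """<response>
<STRUCTUREDDOCUMENT>
  <META name="Type" value="TEXT" />
  <META name="SUBJECT" value="Adoption" score="0.51" key="1087322" />
  <META name="SUBJECT" value="Civil and human rights" score="0.86" key="1071783" />
  <META name="SUBJECT" value="Health education" score="0.20" key="862379" />
</STRUCTUREDDOCUMENT>
</response>"""


class SuggestedConcept(NamedTuple):
    concept_id: str  # assumed to be the "key" attribute
    name: str
    score: float


def parse_classification(xml_text: str, meta_name: str = "SUBJECT",
                         min_score: float = 0.0) -> List[SuggestedConcept]:
    """Extract scored concepts from a classification response."""
    root = ET.fromstring(xml_text)
    concepts = []
    for meta in root.iter("META"):
        # Skip non-concept entries such as <META name="Type" ...>.
        if meta.get("name") != meta_name or meta.get("score") is None:
            continue
        score = float(meta.get("score"))
        if score >= min_score:
            concepts.append(
                SuggestedConcept(meta.get("key"), meta.get("value"), score))
    # Most relevant concept first.
    concepts.sort(key=lambda c: c.score, reverse=True)
    return concepts
```

Storing the ID and score alongside the name keeps the tags ID-based and makes the score available later for ordering search results.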

Search Enhancement

This section discusses some of the functionality provided by the Semaphore software and the integration into the CMS that can be implemented.

Important: This section contains examples using the "REST" version of the Semantic Enhancement Server API, but there is also a "GET" version with different parameters. See Semaphore Semantic Enhancement Server (SES) API Reference for details.

Search Features

One of the key reasons for tagging content is to allow users to be able to display content specific to a given concept in the model. Once documents are appropriately tagged facilities can be provided to allow users to search for this specific content. Various features can be provided that will extend the functionality provided by normal search implementations.

There are two separate aspects to this based upon what type of search is being performed.

Normal free-text search - If a user has simply typed in a phrase or words and performed a search then the normal search results can be displayed without any alterations from the Semaphore software. In this case the Semaphore software can be used to display a list of recommended concepts (see the “Concept Mapping” section) from which the user can select a concept to perform the second type of search, that is, to display a list of documents containing the concept selected.
Searching using a particular concept - Once a user selects a concept either from concept mapping (as above) or via some other mechanism (lists of concepts, etc) then the search returns content that has been tagged (via the CMS tagging process, see Tagging of Content on Update/Creation) as being relevant for that concept. If supported in the search engine, the search results can also be ordered by relevancy to the concept being searched on (the first result in the results is the most relevant for the concept, the second is the second most relevant, and so on).

It is important to note the distinction between these two types of searches: the first simply looks for the word(s)/phrase entered in the documents, whereas the second looks for documents that have been tagged as being relevant for the selected concept. A simple example of this is a search for “vacation”. The first type of search would return all documents containing the word “vacation”, but the second, searching for the concept “vacation”, returns not only documents containing that word but also documents talking about “holidays” or various tourist hot-spots.

Important: To provide the functionality required to support these searches, the Semaphore "Semantic Enhancement Server" component is used to return concept information (based upon information found in the model stored in the Semaphore "Ontology Server" component) while the Semaphore "Classification Server" component is used for automatic classification of content (again, generated from the model information).

The following sections describe some of the key search behaviour then go on to describe how the behaviour can, technically, be implemented using the Semaphore software.

“Search As You Type” Functionality

Description

This functionality works by suggesting concepts from the model that may be relevant to what a user is attempting to search on as they type into the search box. A list of suggested concepts is shown in a drop down window and dynamically updated as the user enters characters. Selecting a concept from the drop down performs a search to retrieve those pages from the CMS that are relevant for the concept selected.

Search as You Type

Implementation Details

Technically this can be implemented by calling the “hints” service of the Semantic Enhancement Server with a suffix containing the search text. The XML results will contain a list of concepts that can then be displayed in the interface. For example, retrieving http://localhost:8983/ses/SpaceMissions/hints/apollo would return, for the above example, the following:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<SEMAPHORE>
  <PARAMETERS>
    <PARAMETER NAME="q">autocomplete_en_plf:apollo* autocomplete_en_f:apollo* autocomplete_en_pl:apollo* autocomplete_en:apollo*</PARAMETER>
    <PARAMETER NAME="defType">edismax</PARAMETER>
    <PARAMETER NAME="qf">autocomplete_en_plf^100.0 autocomplete_en_f^20.0 autocomplete_en_pl^50.0 autocomplete_en^1.0</PARAMETER>
    <PARAMETER NAME="fl">id, name_en_pl, name_en, class_name_en, [child parentFilter="(content_type:concept OR content_type:concept_scheme)" childFilter=content_type:facet]</PARAMETER>
    <PARAMETER NAME="language">en</PARAMETER>
    <PARAMETER NAME="language">en</PARAMETER>
    <PARAMETER NAME="language">en</PARAMETER>
    <PARAMETER NAME="fq">content_type:concept_scheme content_type:concept</PARAMETER>
    <PARAMETER NAME="sort">score desc, name_en_pl asc</PARAMETER>
    <PARAMETER NAME="rows">10</PARAMETER>
    <PARAMETER NAME="version">1</PARAMETER>
    <PARAMETER NAME="raw_query">apollo</PARAMETER>
    <PARAMETER NAME="wt">sesHintsXML</PARAMETER>
    <PARAMETER NAME="structure">XML</PARAMETER>
  </PARAMETERS>
  <TERM_HINTS total="10">
    <TERM_HINT ID="bd6579bf-401d-4883-8918-ac2e5832e124" NAME="Apollo 1">
      <CLASSES>
        <CLASS>Mission</CLASS>
      </CLASSES>
      <HINT ID="bd6579bf-401d-4883-8918-ac2e5832e124" NATURE="PT"><EM>Apollo</EM> 1</HINT>
      <FACET ID="7e3009d6-2636-4e96-af72-81a70b19e18d" NAME="Programmes &amp; Missions"/>
    </TERM_HINT>
    <TERM_HINT ID="ec138adf-2527-47f9-8607-f798f87bf872" NAME="Apollo 10">
      <CLASSES>
        <CLASS>Mission</CLASS>
      </CLASSES>
      <HINT ID="ec138adf-2527-47f9-8607-f798f87bf872" NATURE="PT"><EM>Apollo</EM> 10</HINT>
      <FACET ID="7e3009d6-2636-4e96-af72-81a70b19e18d" NAME="Programmes &amp; Missions"/>
    </TERM_HINT>
    <TERM_HINT ID="493cb4f5-102f-4492-a766-449ff1eba891" NAME="Apollo 11">
      <CLASSES>
        <CLASS>Mission</CLASS>
      </CLASSES>
      <HINT ID="493cb4f5-102f-4492-a766-449ff1eba891" NATURE="PT"><EM>Apollo</EM> 11</HINT>
      <FACET ID="7e3009d6-2636-4e96-af72-81a70b19e18d" NAME="Programmes &amp; Missions"/>
    </TERM_HINT>
    ...
  </TERM_HINTS>
</SEMAPHORE>

In this case the concepts returned (given as the “NAME” attribute on the <TERM_HINT> tag) are:

Apollo 1
Apollo 10
Apollo 11
…

The “<HINT>” element shows us what to actually display in the drop down (with <EM> flagging the part of the text that matches what the user has typed).
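
A minimal Python sketch of turning such a response into drop-down entries might look like the following. It assumes the response has the shape of the sample above; the function name is hypothetical, and rendering the matched text with a strong tag is an illustrative choice rather than anything required by SES.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import List, Tuple

# Trimmed copy of the sample hints response above.
SAMPLE_HINTS = """<SEMAPHORE>
  <TERM_HINTS>
    <TERM_HINT ID="bd6579bf-401d-4883-8918-ac2e5832e124" NAME="Apollo 1">
      <CLASSES><CLASS>Mission</CLASS></CLASSES>
      <HINT ID="bd6579bf-401d-4883-8918-ac2e5832e124" NATURE="PT"><EM>Apollo</EM> 1</HINT>
      <FACET ID="7e3009d6-2636-4e96-af72-81a70b19e18d" NAME="Programmes &amp; Missions"/>
    </TERM_HINT>
  </TERM_HINTS>
</SEMAPHORE>"""


def parse_hints(xml_text: str) -> List[Tuple[str, str, str]]:
    """Return (concept_id, name, display_html) for each TERM_HINT."""
    root = ET.fromstring(xml_text)
    results = []
    for term_hint in root.iter("TERM_HINT"):
        hint = term_hint.find("HINT")
        if hint is None:
            continue
        # Rebuild the hint text, wrapping each matched part (<EM>) in
        # <strong> and HTML-escaping everything else.
        parts = [escape(hint.text or "")]
        for em in hint.findall("EM"):
            parts.append("<strong>" + escape(em.text or "") + "</strong>")
            parts.append(escape(em.tail or ""))
        results.append(
            (term_hint.get("ID"), term_hint.get("NAME"), "".join(parts)))
    return results
```

The concept ID is kept with each entry so that selecting a suggestion can trigger a search by ID rather than by name.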

Important: In order to facilitate implementation of this using client-side web browser (HTML with JavaScript) interfaces the Semantic Enhancement Server can return information in JSON format by adding a ".json" to the end of the REST API call, for example, http://localhost:8983/ses/SpaceMissions/hints/apollo.json. This is the same for any call to SES.
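
As an illustration, a small Python helper for building such REST URLs could look like this. The helper name and its parameters are hypothetical, and percent-encoding the search text is an assumption about what SES accepts; verify it against the SES API Reference.

```python
from urllib.parse import quote


def ses_url(base_url: str, model: str, service: str, suffix: str,
            as_json: bool = False) -> str:
    """Build a REST-style SES URL such as .../SpaceMissions/hints/apollo."""
    # Percent-encode each path segment so that spaces or slashes in the
    # search text cannot alter the URL structure (assumption, see above).
    path = "/".join(quote(part, safe="") for part in (model, service, suffix))
    url = base_url.rstrip("/") + "/" + path
    # Appending ".json" requests JSON instead of XML.
    return url + ".json" if as_json else url
```

For example, ses_url("http://localhost:8983/ses", "SpaceMissions", "hints", "apollo", as_json=True) yields the JSON URL quoted above.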

“Concept Mapping” Functionality

Description

This functionality of Semantic Enhancement Server will, for given search text, return a list of concepts that it believes are relevant for the search. For example, if you were to search for “england” it may come back with “Bank of England” and “UK Government”. When these concepts are returned they can be used for subsequent searches, that is, when the user selects one (or more) of these concepts the search engine is requested to return only those documents tagged with the concept(s) selected.

Concept Mapping

Whether or not a concept is relevant for a particular free-text search is determined not simply by checking whether the word (or words) are present in the concept name; synonyms and variants of the word entered are also taken into consideration (for example, a search for “cars” returns concepts with that word in their name but also those with “car” in them).

Implementation Details

Technically, this is implemented by calling the “concepts” service of the Semantic Enhancement Server with a “query” parameter containing the search text. The XML results will contain a list of concepts that can then be displayed in the interface. For example, retrieving http://localhost:8983/ses/SpaceMissions/concepts/apollo would return XML containing the list of concepts/terms (within the individual “<TERM>” sections in the XML results), as follows:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<SEMAPHORE>
  <PARAMETERS>
    <PARAMETER NAME="q">conceptmap_en_plf:apollo^1000.0  conceptmap_en_f:apollo^100.0  conceptmap_en_pl:apollo^10.0  conceptmap_en:apollo^1.0 </PARAMETER>
    <PARAMETER NAME="stop_cm_after_stage">2</PARAMETER>
    <PARAMETER NAME="fl">* score [child limit=10000 parentFilter="(content_type:concept OR content_type:concept_scheme)" childFilter=content_type:related_concept] [child limit=10000 parentFilter="(content_type:concept OR content_type:concept_scheme)" childFilter=content_type:ordered_related_concept] [child limit=10000 parentFilter="(content_type:concept OR content_type:concept_scheme)" childFilter=content_type:related_concept_scheme] [child limit=10000 parentFilter="(content_type:concept OR content_type:concept_scheme)" childFilter=content_type:facet] [child limit=10000 parentFilter="(content_type:concept OR content_type:concept_scheme)" childFilter=content_type:path_element] prefLabelCount:termfreq(conceptmap_en_plf, "apollo") altLabelCount:termfreq(conceptmap_en_f, "apollo")</PARAMETER>
    <PARAMETER NAME="language">en</PARAMETER>
    <PARAMETER NAME="language">en</PARAMETER>
    <PARAMETER NAME="language">en</PARAMETER>
    <PARAMETER NAME="sort">score desc, name_en_pl asc</PARAMETER>
    <PARAMETER NAME="fq">content_type:concept_scheme content_type:concept</PARAMETER>
    <PARAMETER NAME="rows">1000</PARAMETER>
    <PARAMETER NAME="version">1</PARAMETER>
    <PARAMETER NAME="wt">sesConceptsXML</PARAMETER>
    <PARAMETER NAME="structure">XML</PARAMETER>
    <PARAMETER NAME="command">conceptmap</PARAMETER>
  </PARAMETERS>
  <TERMS count="24">
    <TERM SCORE="35.87069" SRC="3" URI="http://ontologies.smartlogic.com/Space-Missions#Apollo_1">
      <NAME>Apollo 1</NAME>
      <ID>bd6579bf-401d-4883-8918-ac2e5832e124</ID>
      <DISPLAY_NAME>Apollo 1</DISPLAY_NAME>
      <FREQUENCY>28</FREQUENCY>
      <CLASSES>
        <CLASS>Mission</CLASS>
      </CLASSES>
      <FACETS>
        <FACET ID="7e3009d6-2636-4e96-af72-81a70b19e18d" NAME="Programmes &amp; Missions"/>
      </FACETS>
      <HIERARCHY ABBR="BT" QTY="1" TYPE="Broader Term">
        <FIELD FREQ="1609" ID="66e9490b-fc45-488b-bba6-720c88b4657b" NAME="term">Apollo space program</FIELD>
      </HIERARCHY>
      <ASSOCIATED ABBR="LAUNCHED FROM" QTY="1" TYPE="launched from">
        <FIELD FREQ="0" ID="1eeab722-25d0-40b7-a895-7df6b5bc52e4" NAME="term">Cape Canaveral Air Force Station Launch Complex 34</FIELD>
      </ASSOCIATED>
      <ASSOCIATED ABBR="HAS SPACECRAFT" QTY="1" TYPE="has spacecraft">
        <FIELD FREQ="146" ID="45a88948-50ac-41ec-a181-a7b621b48db4" NAME="term">Apollo Command/Service Module</FIELD>
      </ASSOCIATED>
      <PATH ABBR="NT" TYPE="Narrower Term">
        <FIELD FREQ="2111" ID="7e3009d6-2636-4e96-af72-81a70b19e18d" NAME="term">Programmes &amp; Missions</FIELD>
        <FIELD FREQ="1609" ID="66e9490b-fc45-488b-bba6-720c88b4657b" NAME="term">Apollo space program</FIELD>
        <FIELD FREQ="28" ID="bd6579bf-401d-4883-8918-ac2e5832e124" NAME="term">Apollo 1</FIELD>
      </PATH>
      <METADATA>
        <FIELD NAME="URI">http://ontologies.smartlogic.com/Space-Missions#Apollo_1</FIELD>
        <FIELD NAME="editorial note">Apollo 1 (initially designated AS-204) was the first manned mission of the U.S. Apollo manned lunar landing program. The planned low Earth orbital test of the Apollo Command/Service Module never made its target launch date of February 21, 1967, because a cabin fire during a launch rehearsal test on January 27 at Cape Canaveral Air Force Station Launch Complex 34 killed all three crew members—Command Pilot Virgil I. "Gus" Grissom, Senior Pilot Edward H. White II, and Pilot Roger B. Chaffee—and destroyed the Command Module (CM). The name Apollo 1, chosen by the crew, was officially retired by NASA in commemoration of them on April 24, 1967.Immediately after the fire, NASA convened the Apollo 204 Accident Review Board to determine the cause of the fire, and both houses of the United States Congress launched their own committee inquiries to oversee NASA's investigation. During the investigation, a NASA internal document citing problems with prime Apollo contractor North American Aviation was publicly revealed by a Senator and became known as the "Phillips Report", embarrassing NASA Administrator James E. Webb, who was unaware of the document's existence, and attracting controversy to the Apollo program. Despite congressional displeasure at NASA's openness, both congressional committees ruled that the issues raised in the report had no bearing on the accident, and allowed NASA to continue with the program.Although the ignition source could not be conclusively identified, the astronauts' deaths were attributed to a wide range of lethal design and construction flaws in the early Apollo Command Module. Manned Apollo flights were suspended for 20 months while these problems were corrected. The Saturn IB launch vehicle, SA-204, scheduled for use on this mission, was later used for the first unmanned Lunar Module (LM) test flight, Apollo 5. 
The first successful manned Apollo mission was flown by Apollo 1's backup crew on Apollo 7 in October 1968.</FIELD>
        <FIELD NAME="Wikidata URI">http://www.wikidata.org/entity/Q194082</FIELD>
        <FIELD NAME="DBPedia URI">http://dbpedia.org/resource/Apollo_1</FIELD>
      </METADATA>
      <CREATED_DATE>2017-08-09T17:20:57+0000</CREATED_DATE>
      <MODIFIED_DATE>2018-04-05T15:26:14+0000</MODIFIED_DATE>
    </TERM>
    <TERM SCORE="35.87069" SRC="3" URI="http://ontologies.smartlogic.com/Space-Missions#Apollo_18">
      <NAME>Apollo 18</NAME>
      <ID>7985492e-7a23-4070-ac9b-11ad2cf868d4</ID>
      <DISPLAY_NAME>Apollo 18</DISPLAY_NAME>
      <FREQUENCY>1</FREQUENCY>
      <CLASSES>
        <CLASS>Mission</CLASS>
      </CLASSES>
      <FACETS>
        <FACET ID="7e3009d6-2636-4e96-af72-81a70b19e18d" NAME="Programmes &amp; Missions"/>
      </FACETS>
      <HIERARCHY ABBR="BT" QTY="2" TYPE="Broader Term">
        <FIELD FREQ="1609" ID="66e9490b-fc45-488b-bba6-720c88b4657b" NAME="term">Apollo space program</FIELD>
        <FIELD FREQ="2" ID="62f939be-d1c4-4441-be09-e26290f0018e" NAME="term">Canceled Apollo missions</FIELD>
      </HIERARCHY>
      <PATH ABBR="NT" TYPE="Narrower Term">
        <FIELD FREQ="2111" ID="7e3009d6-2636-4e96-af72-81a70b19e18d" NAME="term">Programmes &amp; Missions</FIELD>
        <FIELD FREQ="1609" ID="66e9490b-fc45-488b-bba6-720c88b4657b" NAME="term">Apollo space program</FIELD>
        <FIELD FREQ="1" ID="7985492e-7a23-4070-ac9b-11ad2cf868d4" NAME="term">Apollo 18</FIELD>
      </PATH>
      <PATH ABBR="NT" TYPE="Narrower Term">
        <FIELD FREQ="2111" ID="7e3009d6-2636-4e96-af72-81a70b19e18d" NAME="term">Programmes &amp; Missions</FIELD>
        <FIELD FREQ="1609" ID="66e9490b-fc45-488b-bba6-720c88b4657b" NAME="term">Apollo space program</FIELD>
        <FIELD FREQ="2" ID="62f939be-d1c4-4441-be09-e26290f0018e" NAME="term">Canceled Apollo missions</FIELD>
        <FIELD FREQ="1" ID="7985492e-7a23-4070-ac9b-11ad2cf868d4" NAME="term">Apollo 18</FIELD>
      </PATH>
      <METADATA>
        <FIELD NAME="URI">http://ontologies.smartlogic.com/Space-Missions#Apollo_18</FIELD>
        <FIELD NAME="Wikidata URI">http://www.wikidata.org/entity/Q923656</FIELD>
      </METADATA>
      <CREATED_DATE>2017-08-09T17:20:57+0000</CREATED_DATE>
      <MODIFIED_DATE>2017-08-09T17:20:57+0000</MODIFIED_DATE>
    </TERM>
    ...
  </TERMS>
</SEMAPHORE>

From this XML we can extract that the following concepts are relevant for “apollo”:

Apollo 1
Apollo 18
…
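
A possible way to pull this list out of the XML, sketched in Python under the assumption that the response follows the sample above (the function name and the dictionary keys are hypothetical):

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Heavily trimmed copy of the sample concept-mapping response above.
SAMPLE_CONCEPTS = """<SEMAPHORE>
  <TERMS count="24">
    <TERM SCORE="35.87069" SRC="3" URI="http://ontologies.smartlogic.com/Space-Missions#Apollo_1">
      <NAME>Apollo 1</NAME>
      <ID>bd6579bf-401d-4883-8918-ac2e5832e124</ID>
      <HIERARCHY ABBR="BT" QTY="1" TYPE="Broader Term">
        <FIELD FREQ="1609" ID="66e9490b-fc45-488b-bba6-720c88b4657b" NAME="term">Apollo space program</FIELD>
      </HIERARCHY>
    </TERM>
    <TERM SCORE="35.87069" SRC="3" URI="http://ontologies.smartlogic.com/Space-Missions#Apollo_18">
      <NAME>Apollo 18</NAME>
      <ID>7985492e-7a23-4070-ac9b-11ad2cf868d4</ID>
      <HIERARCHY ABBR="BT" QTY="2" TYPE="Broader Term">
        <FIELD FREQ="1609" ID="66e9490b-fc45-488b-bba6-720c88b4657b" NAME="term">Apollo space program</FIELD>
        <FIELD FREQ="2" ID="62f939be-d1c4-4441-be09-e26290f0018e" NAME="term">Canceled Apollo missions</FIELD>
      </HIERARCHY>
    </TERM>
  </TERMS>
</SEMAPHORE>"""


def parse_concepts(xml_text: str) -> List[Dict]:
    """Return id, name, score and broader terms for each <TERM>."""
    root = ET.fromstring(xml_text)
    concepts = []
    for term in root.iter("TERM"):
        concepts.append({
            "id": term.findtext("ID"),
            "name": term.findtext("NAME"),
            "score": float(term.get("SCORE", "0")),
            # Broader terms can help users tell same-named concepts apart.
            "broader": [field.text for field in
                        term.findall("HIERARCHY[@ABBR='BT']/FIELD")],
        })
    return concepts
```

Only the ID needs to be passed on to the subsequent concept search; the name and broader terms are there for display.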

Important: See Semaphore Semantic Enhancement Server (SES) API Reference document for additional information on how concept mapping works and how it can be more fully utilised.

“Search Result Ordering” Functionality

With the automatic classification functionality of Semaphore not only does Classification Server return a list of concepts but it also returns a list of “confidence values” or “scores” for each of those concepts which indicates how relevant it believes the specific concept is to the document. Using this score information when searching for documents containing a particular concept you can order the search results so that the first result is the document that Classification Server believes to be the most relevant.
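
If the search engine itself cannot order by score, the ordering could be applied in the integration layer. The Python sketch below assumes a hypothetical CMS record structure in which each document stores its tags as a mapping from concept ID to the score returned by Classification Server; neither the structure nor the function name comes from Semaphore.

```python
from typing import Dict, List


def order_by_relevance(documents: List[Dict], concept_id: str) -> List[Dict]:
    """Return documents tagged with concept_id, most relevant first.

    Each document is assumed to look like:
        {"page_id": "p1", "tags": {"1071783": 0.86, "1087322": 0.51}}
    """
    tagged = [doc for doc in documents if concept_id in doc["tags"]]
    return sorted(tagged, key=lambda doc: doc["tags"][concept_id],
                  reverse=True)
```

This only works if the score is saved with the tag at classification time, which is worth deciding before the tagging integration is built.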

“Best Bets” or “Recommended Pages” Functionality

In conjunction with the ordering of search results the idea of a “Best Bet” or “Recommended Pages” can also be implemented. This is where the most relevant documents for a particular concept are manually flagged by users so that they can be displayed at the top of the search results perhaps in a differently coloured box or identified with appropriate text to somehow highlight it for the search user. Often this can be as simple as maintaining relevant information in the model itself using Ontology Manager. In the past this has been implemented using Semaphore by adding the following meta data (“term information”) elements to the model and setting appropriately for the relevant concepts:

site_url - The URL for the page. e.g. www.bbc.co.uk
site_title - The text that appears in the actual link to the page. e.g. “BBC Home Page”
site_description - A description for the page that can be displayed below the link. e.g. “Home of the British Broadcasting Corporation.”

Technically, this is implemented by using the “term” (if a term search is being performed) or “concepts” (if a free-text search is being performed) service of the Semantic Enhancement Server to return a list of all meta information present when a search is performed. At this point you can then

“Topic Maps” Functionality

Description

Topic maps are very much the “must have” feature in many new search implementations. A topic map is relevant for any search performed and displays a list of common “meta” elements present in the search results. These are displayed on the results page so that when a user clicks on one or more of these elements the results are filtered to display only those documents relevant. An example might be the ability to filter your search results to only include “Adobe Acrobat (PDF)” documents or you may only want to see documents modified in the last year. The topic maps would allow you to filter the results to meet these criteria.

With Semaphore you can provide a section to your topic map that pertains to concepts that have been assigned to the documents that appear in the search results. So, you can filter the results to only display those documents relevant for particular concepts (regardless of the search being performed).

Topic Maps

This can be extended to provide even more advanced styles of interface including the display of the actual hierarchy from the model as follows:

Hierarchical Topic Maps

Implementation Details

The technical implementation here is a bit more involved. The process consists of a few stages:

For each of the documents in the search results extract the model concepts (from the CMS) that have been assigned to them.
For each of the concepts, call the Semantic Enhancement Server “term” service to return the concept names (and, as above, the hierarchy information for display).
Construct the display by processing the information returned by Semantic Enhancement Server (perhaps split by “facet”, as above, which is based on the information returned in the <FACETS> section of the XML).
When the concept is selected, the system should perform the same search as was previously performed with the appropriate filter parameter added to include the concept selected.

A few suggestions:

Include the number of documents relevant for the concept displayed in the topic map by either directly indicating the count or a relevance bar (as used above).
If you want to display the hierarchy do not allow the user to select concepts that have no documents (that is, where a child concept is present in the search results but not the parent, even though the parent might be displayed do not allow users to select it).
Somehow visually indicate to the users that filtering has been applied and also allow them to remove filters that they no longer require.
You may want to allow users to apply the “inverse” option on a filter, that is, all documents that do NOT have the concept indicated assigned to them.
You may want to include non-Semaphore generated information such as file types, and modification dates based upon the information that is available in the CMS.
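
The counting and filtering parts of the steps and suggestions above can be sketched in Python as follows. The search result structure (a list of dictionaries with a "concept_ids" list) and both function names are hypothetical; the calls to the Semantic Enhancement Server “term” service for names and hierarchy are left out.

```python
from collections import Counter
from typing import Dict, List


def concept_counts(results: List[Dict]) -> Counter:
    """Count how many result documents carry each concept ID."""
    counts: Counter = Counter()
    for doc in results:
        # A set ensures a document is counted once per concept.
        counts.update(set(doc["concept_ids"]))
    return counts


def filter_results(results: List[Dict], concept_id: str,
                   inverse: bool = False) -> List[Dict]:
    """Keep documents tagged with concept_id, or NOT tagged if inverse."""
    return [doc for doc in results
            if (concept_id in doc["concept_ids"]) != inverse]
```

The counts can drive either a numeric label or a relevance bar, and concepts with a count of zero can be shown but made non-selectable.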

Alternative Search Methods

There are ways of guiding users to the content they require that do not involve them having to type in some information and performing a search. Depending on the specific implementation requirements there are several alternative methods that can be used to guide users to the content required. This section describes several that may be of interest that can be implemented using Semaphore technology.

Term Browser

Description

Another use of Semaphore is to display a list of all model concepts initially to the user (perhaps displayed in any hierarchy used in the model structure itself) and, when the user selects a concept, display a list of documents relevant for that concept. If the model is logically structured this may be a viable enhancement to the system to complement free-text searching but it does require that users have an understanding of the model or the structure the model represents. The model list could be displayed on the main page on a web site or as part of any standard page template as a navigation aid.

Term Browser

Implementation Details

Technically, this can be implemented using the “hierarchy” service of the Semantic Enhancement Server. When initially called with only the basic set of parameters (using “roots” without an “ID” parameter) this service returns a list of top level concepts, e.g. http://localhost:8983/ses/SpaceMissions/hierarchy/roots. As a user clicks through the hierarchy of concepts in the model a list of sub-concepts can be retrieved by calling the “hierarchy” service with an appropriate ID suffix (the “ID” being the ID of the concept being expanded), for example http://localhost:8983/ses/SpaceMissions/hierarchy/bfc33274-197f-4873-a1f4-f421c8ab64aa.

In this implementation, when the user clicks on a particular concept to select it, the logical next step is to call the search with the appropriate parameters to search for that concept.
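
As a minimal sketch of the calls described above, the following Python builds the two “hierarchy” URLs and fetches the raw response. The host, port and model name are taken from the example URLs in this section. The function names are hypothetical, and the response is returned unparsed because its exact structure depends on your SES version and configuration.

```python
from urllib.parse import quote
from urllib.request import urlopen

SES_BASE = "http://localhost:8983/ses"  # host and port from the example URLs
MODEL = "SpaceMissions"                 # example model name


def roots_url(base=SES_BASE, model=MODEL):
    """URL that returns the top-level concepts of the model."""
    return f"{base}/{quote(model)}/hierarchy/roots"


def children_url(concept_id, base=SES_BASE, model=MODEL):
    """URL that returns the sub-concepts of the concept being expanded."""
    return f"{base}/{quote(model)}/hierarchy/{quote(concept_id, safe='')}"


def fetch(url, timeout=10):
    """Return the raw response body; parsing is left to the caller."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

A term browser would call fetch(roots_url()) when the page first loads, then fetch(children_url(concept_id)) each time the user expands a concept.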

Description

A common navigation facility for a web site is the “A-Z”: the letters of the alphabet are displayed and, when a letter is selected, the key categories beginning with that letter are shown. Selecting a category then displays the documents relevant to it. When used with Semaphore, these categories can be concepts from the model that, when clicked, perform a search to display the documents that have been tagged with the selected concept.

A-Z

Implementation Details

Technically, this can be implemented using the “az” service of the Semantic Enhancement Server. This works in conjunction with the “A-Z Entry” attribute in the model, which identifies the concepts to be included in this service. The “AZ” parameter for SES indicates which letter you are interested in, with a special value of “all” that returns a complete list of all A-Z concepts in the model. So, for example, http://localhost:8983/ses/SpaceMissions/az/a will return a list of concepts (in XML format) beginning with the letter “a” that have been flagged in the model with the “A-Z Entry” attribute.

Important: It is recommended that you only display those concepts for which content exists, as there is nothing more frustrating to a user than clicking on a concept in the A-Z only to find that no documents are present. If you are using the Count Updater to retrieve document counts from a particular search engine, you can check the XML returned for each concept to see whether the "<FREQUENCY>" value is greater than "0".

When implementing the A-Z you may want to have a “catch-all” value, such as the “#” in the image displayed above, that contains a list of all concepts not beginning with a letter (such as numerics, but this could also be concepts beginning with brackets or any other non-letter values). To implement this you can either call the “az” Semantic Enhancement Server service repeatedly for each of the characters you are interested in (e.g. az=0, az=1, az=2, etc.) or call the az service once with the parameter “all” and then place each concept in the appropriate area of the display (with any that do not begin with a letter being added to the “#” area).

Each of the concepts displayed in the A-Z can be a link to call the normal search interface with parameters provided to return only those documents relevant for the concept selected.
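
The “all” approach with a “#” catch-all can be sketched as follows. This Python example assumes the concept names and their document counts have already been extracted from the “az” service response into (name, count) pairs. The function names are hypothetical, and treating only the ASCII letters A to Z as letters is an assumption you may need to adjust for your model's language.

```python
from collections import defaultdict


def bucket_for(name):
    """Return the A-Z bucket for a concept name, or "#" for non-letters."""
    first = name.strip()[:1].upper()
    return first if first.isascii() and first.isalpha() else "#"


def build_az(concepts):
    """Group (name, document_count) pairs into A-Z buckets.

    Concepts with no documents are skipped so that users never click
    through to an empty result list.
    """
    buckets = defaultdict(list)
    for name, count in concepts:
        if count > 0:
            buckets[bucket_for(name)].append(name)
    # Sort bucket keys and the names within each bucket for display.
    return {key: sorted(names, key=str.lower)
            for key, names in sorted(buckets.items())}
```

Because the grouping is done in one pass over the full list, the service only needs to be called once, which avoids one request per character.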

Using Count Updater in any Search Engine

The “Semaphore Count Updater” sub-component of Semaphore Publisher updates a “count” value for each concept in the model, which is then exposed when using the Semantic Enhancement Server search API. This is useful for determining, for example, whether to display a concept or not: “If there are no documents, there is no point in showing this concept to the user in the interface”. Out of the box, the Count Updater only supports Semaphore for SoLR implementations. However, it can be used with any search engine as long as a service is provided that returns information in the format expected by Count Updater. The approach is to duplicate the results that are returned by the Semaphore for SoLR interface (as it is the simplest), then update the Count Updater configuration to talk to the application that returns these results.

Expected Count Updater Output

The application you write will need to provide the same output that the Count Updater expects from Solr. For Solr the expected output format is XML, structured as follows:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="**field**">
        <int name="**term id**">**document count**</int>
        <int name="**term id**">**document count**</int>
        ...
      </lst>
    </lst>
  </lst>
</response>

Where **field** is the “field” configuration value for the count updater (see next section), **term id** is the relevant concept ID from the model, which should match the “lookup_attribute” configuration value (e.g. if “lookup_attribute” is set to “zid” then the **term id** should be a zThes ID), and **document count** is the number of documents for the given concept (term).

If an application is written that returns the above information and responds to standard HTTP requests, the only remaining step is to update the Count Updater configuration to talk to it.

Important: If you want to write a more generic and flexible implementation then the application written should correctly respond to the following GET request format: http://**solr_host**:**solr_port****solr_path**?q=*%3A*&facet=true&facet.field=**field**&facet.limit=-1&rows=0 where the values enclosed in “** **” are values from the configuration file (as per the format detailed in the next section).
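
A minimal sketch of such an application, using only the Python standard library, might look like the following. It reads the facet.field parameter from the GET request described above and returns XML in the structure shown earlier. The count_documents function is a hypothetical stand-in for a query against your own search engine, and the host and port in serve are assumptions.

```python
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def count_documents(field):
    """Hypothetical stand-in: return {term id: document count} for `field`
    by querying your own search engine."""
    return {"bfc33274-197f-4873-a1f4-f421c8ab64aa": 12}


def build_facet_response(field, counts):
    """Build the Solr-style facet XML structure shown in this section."""
    response = ET.Element("response")
    facet_counts = ET.SubElement(response, "lst", name="facet_counts")
    facet_fields = ET.SubElement(facet_counts, "lst", name="facet_fields")
    field_list = ET.SubElement(facet_fields, "lst", name=field)
    for term_id, count in counts.items():
        ET.SubElement(field_list, "int", name=term_id).text = str(count)
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


class FacetCountHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        fields = query.get("facet.field")
        if not fields:
            self.send_error(400, "facet.field is required")
            return
        xml_text = build_facet_response(fields[0], count_documents(fields[0]))
        payload = xml_text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="localhost", port=8080):
    """Run the service until interrupted (host and port are assumptions)."""
    HTTPServer((host, port), FacetCountHandler).serve_forever()
```

Calling serve() starts the service. This sketch ignores the q, facet.limit and rows parameters and always returns every count, which matches the request format above (facet.limit=-1, rows=0) but would need extending if other values are sent.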

Count Updater Configuration

The default Solr Count Updater configuration is set in the publisher configuration as follows (be sure to add a reference to this bean to the “com.smartlogic.publisher.Publisher” <bean> in the publisher configuration file):

<bean id="solrCountUpdater" class="com.smartlogic.publisher.countupdater.SolrCountFinder" >
    <property name="solrURL" value="http://<solr host>:<solr port>/<solr path>/<document index name>" /> <!-- The URL of the SOLR index being interrogated -->
    <property name="solrFieldsIds">
       <list>
           <value>The solr field in which the identifier may be present</value>
       </list>
    </property>
</bean>

So, in this case the “solr host”, “solr port” and “solr path” parts of the “solrURL” value, and the “solrFieldsIds” value, should be updated to point to the application written to return the information in the correct format. As discussed previously, the “lookup_attribute” should also be set correctly.

As per standard Count Updater configuration, be sure to also update the “DefaultIndex” properties to point to the “SolrCountLookup” configuration you have updated (and make sure the “DummyCountLookup” reference is removed or commented out).

See Count Updater in the Semaphore Publisher guide for further configuration details.

Using the Custom Count Updater

When the application is written and the Count Updater configuration file is updated to point to it, Count Updater can be called as normal, that is, as part of the standard publish process, manually executed as required or scheduled using standard operating system features.

Semaphore Generic Integration Guide