
Semaphore Classification Server Rulebase Reference

Linguistic capabilities

  • Last Updated: May 13, 2026

Key to table:

  • Y = supported
  • - = not supported
| Languages | Tokenization | Sentence Boundary | Token Normalization | Lemma Lookup | Part-of-Speech Tagging | Disambiguation | Lemma User Dictionary | Segmentation User Dictionary | Disambiguation | Disambiguation | Script Conversion | Disambiguation | Semitic Root |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arabic | Y | Y | Y | Y | Y | Y | - | Y | - | - | - | Y | Y |
| Catalan | Y | Y | - | Y | - | - | - | Y | - | - | - | - | - |
| Chinese (simplified and traditional) | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | - | - |
| Czech | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - | - | - |
| Danish | Y | Y | Y | Y | - | - | Y | Y | Y | - | - | - | - |
| Dutch | Y | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - | - |
| English | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - | - | - |
| Estonian | Y | Y | - | Y | - | - | - | Y | - | - | - | - | - |
| Finnish | Y | Y | - | - | - | - | - | Y | - | - | - | Y | - |
| French | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - | - | - |
| German | Y | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - | - |
| Greek | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - | - | - |
| Hebrew | Y | Y | - | Y | Y | Y | - | Y | - | - | - | - | - |
| Hungarian | Y | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - | - |
| Italian | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - | - | - |
| Japanese | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - |
| Korean | Y | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - | - |
| Norwegian | Y | Y | Y | Y | - | - | Y | Y | Y | - | - | - | - |
| Persian | Y | Y | Y | Y | Y | - | - | Y | - | - | - | Y | - |
| Polish | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - | - | - |
| Portuguese | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - | - | - |
| Pushto | Y | Y | - | - | - | - | - | Y | - | - | - | - | - |
| Romanian | Y | Y | Y | Y | - | - | - | Y | - | - | - | - | - |
| Russian | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - | - | - |
| Serbian | Y | Y | - | Y | - | - | - | Y | - | - | - | - | - |
| Slovak | Y | Y | - | Y | - | - | - | Y | - | - | - | - | - |
| Spanish | Y | Y | Y | Y | Y | Y | Y | Y | - | - | - | - | - |
| Swedish | Y | Y | Y | Y | - | - | Y | Y | Y | - | - | - | - |
| Thai | Y | Y | Y | Y | - | - | Y | Y | - | - | - | - | - |
| Turkish | Y | Y | Y | Y | - | - | - | Y | - | - | - | - | - |
| Urdu | Y | Y | Y | - | Y | - | - | Y | - | - | - | Y | - |

Extracted Entities

Entity Extraction is the process of discovering and presenting specific entities and facts that occur in unstructured text. Entities denote the names of people, places, things, dates, values, and so forth, that can be extracted from text. An entity is defined as a pairing of a standard form and its type. For example, Winston Churchill/PERSON is an entity in which Winston Churchill is the standard form and PERSON is the type.
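
The pairing of a standard form and a type can be modelled as a small immutable record. The following is a minimal sketch, not Semaphore's own data model: the `Entity` class and the `parse_entity` helper are hypothetical names introduced here, and the `form/TYPE` notation is simply the one used in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """An entity as described above: a standard form paired with a type."""
    standard_form: str
    entity_type: str


def parse_entity(text: str) -> Entity:
    """Split 'standard form/TYPE' at the last slash (hypothetical helper)."""
    form, sep, etype = text.rpartition("/")
    if not sep or not form.strip() or not etype.strip():
        raise ValueError(f"expected 'form/TYPE', got {text!r}")
    return Entity(form.strip(), etype.strip())


entity = parse_entity("Winston Churchill/PERSON")
print(entity.standard_form, "->", entity.entity_type)
```

Because the record is frozen and compared by value, two mentions that resolve to the same standard form and type compare equal, which makes de-duplication straightforward.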

The language modules included with the software contain system dictionaries and provide an extensive set of predefined entity types. The extraction process can extract entities using statistical techniques, pattern matching (regex), and dictionaries (gazetteers). Extraction classifies each extracted entity by entity type and presents this metadata in a standardized format.
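
To make the difference between dictionary and pattern-based extraction concrete, here is a rough sketch in Python. It is not how Semaphore implements extraction: the `GAZETTEER` contents, the deliberately simple `EMAIL_RE` pattern, and the `extract` function are all assumptions made for illustration, and the statistical technique is omitted because it requires a trained model.

```python
import re

# Hypothetical gazetteer: lower-cased surface form -> (standard form, type).
GAZETTEER = {
    "winston churchill": ("Winston Churchill", "PERSON"),
    "london": ("London", "LOCATION"),
}

# Deliberately simple e-mail pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract(text):
    """Return (offset, standard form, type) tuples sorted by offset."""
    found = []
    # Pattern matching (regex): the matched text serves as the standard form.
    for m in EMAIL_RE.finditer(text):
        found.append((m.start(), m.group(), "IDENTIFIER:EMAIL"))
    # Exact matching (gazetteer): case-insensitive, whole-word lookup.
    for surface, (standard, etype) in GAZETTEER.items():
        pattern = rf"\b{re.escape(surface)}\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            found.append((m.start(), standard, etype))
    return sorted(found)


sample = "Winston Churchill lived in London; write to office@example.org."
for offset, standard, etype in extract(sample):
    print(offset, standard, etype)
```

Each result carries the standard form and the type, so output from both kinds of processor can be presented in one common format.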

Key to tables:

  • S = statistical processor
  • G = exact matching processor (gazetteer)
  • R = pattern matching processor (regex)
  • D = (BETA) deep neural network processor

Standard Entities

These entities are enabled by default in Semaphore.

  • LOC = LOCATION: A city, state, country, region, or other location that contains both a population and a government. A geographic place such as a body of water, mountain, park, or address. A structure such as a building or monument.
  • ORG = ORGANIZATION: A corporation, institution, government agency, or other group of people defined by an established organizational structure.
  • PER = PERSON: A human identified by name, nickname or alias.
  • PROD = PRODUCT
  • TTL = TITLE: Appellation associated with a person by virtue of occupation, office, birth, or as an honorific.
  • NAT = NATIONALITY: Reference to a country or region of origin, such as American or Swiss.
  • REL = RELIGION: Reference to an organized religion or theology, as well as its followers.
  • CC# = IDENTIFIER:CREDIT_CARD_NUM
  • EM = IDENTIFIER:EMAIL
  • MONEY = IDENTIFIER:MONEY
  • PERS_ID = IDENTIFIER:PERSONAL_ID_NUM
  • TEL# = IDENTIFIER:PHONE_NUMBER
  • URL = IDENTIFIER:URL
Language ISO code LOC ORG PER PROD TTL NAT REL CC# EM MONEY PERS_ID TEL# URL
Arabic ara S/G/D S/G/D S/D S G G R R R R R R
Chinese (script-intensive, simplified and traditional) zho, zhs, zht S/G S/G S S G G R R R R R R
Dutch nld S S/G S G R R R R R R
English eng S/G S/R/G/D S/D S S G G R R R R R R
French fra S S/G S S R R R R R R
German deu S S/G S S R R R R R R
Hebrew heb S S/G S R R R R R R
Hungarian hun S/G S/G S/G S S R R R R R R
Indonesian ind S/G S/G S R R R R R R
Italian ita S S/G S S R R R R R R
Japanese jpn S S/G S S G G R R R R R R
Korean kor S/D S/G/D S/D S G G S S S G G
Malay, Malay Standard msa, zsm S/G S/G S R R R R R R
Persian (Western Farsi, Dari) fas S S/G S G G G R R R R R R
Portuguese por S S/G S S R R R R R R
Pashto pus S S/G S S R R R R R R
Russian rus S S/G S S G G R R R R R R
Spanish spa S S/G S S R R R R R R
Urdu urd S S/G S G R R R R R R
Vietnamese vie S S S G G G R R R R R R
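
A table cell such as S/R/G/D lists every processor that can extract the entity type for that language. As a small sketch, the key above can be turned into a lookup that expands a cell into processor names. The `PROCESSORS` mapping copies the key verbatim; the `decode_cell` function is a hypothetical helper, not part of Semaphore.

```python
# Processor codes, copied from the key to the tables.
PROCESSORS = {
    "S": "statistical processor",
    "G": "exact matching processor (gazetteer)",
    "R": "pattern matching processor (regex)",
    "D": "(BETA) deep neural network processor",
}


def decode_cell(cell):
    """Expand a cell such as 'S/G/D' into processor names.

    An empty cell yields an empty list; an unknown code raises KeyError.
    """
    return [PROCESSORS[code] for code in cell.strip().split("/") if code]


print(decode_cell("S/R/G/D"))
```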

Additional Entities

An additional set of entities can be enabled for most languages. Most of them are extracted by regex processors.

  • DIST = IDENTIFIER:DISTANCE
  • LATLNG = IDENTIFIER:LATITUDE_LONGITUDE
  • UTM = IDENTIFIER:UTM: Geographical coordinates expressed in the Universal Transverse Mercator system.
  • DATE = TEMPORAL:DATE
  • TIME = TEMPORAL:TIME
Language ISO code DIST LATLNG UTM DATE TIME
Arabic ara R R R R R
Chinese (script-intensive, simplified and traditional) zho, zhs, zht R R R R R
Dutch nld R R R R R
English eng R R R R R
French fra R R R R R
German deu R R R R R
Hebrew heb R R R R R
Hungarian hun R R R R R
Indonesian ind R R R R R
Italian ita R R R R R
Japanese jpn R R R R R
Korean kor S
Malay, Malay Standard msa, zsm R R R R
Persian (Western Farsi, Dari) fas R R R R R
Portuguese por R R R R R
Pashto pus R R R R R
Russian rus R R R R R
Spanish spa R R R R R
Urdu urd R
Vietnamese vie R R R

Comments and glossary

See the terminology page (./introduction/terminology.md) for help.

Chinese

There are two standard forms of written Chinese: Simplified Chinese (SC) and Traditional Chinese (TC). SC is used in the People’s Republic of China (PRC), normally employing the GB2312-80 or GBK character set. TC is used in Taiwan, Hong Kong, and Macau, normally employing the Big Five character set. Semaphore supports both forms of Chinese.
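
Text in these legacy character sets has to be decoded with the matching codec before it can be processed as Unicode. The sketch below uses Python's built-in `gbk` and `big5` codecs; GBK is a superset of GB2312, so it also decodes GB2312 data. The `decode_chinese` helper and its `"SC"`/`"TC"` labels are assumptions made for this example and say nothing about how Semaphore handles encodings internally.

```python
# Map each written form to the codec normally used for it (see text above).
CODECS = {
    "SC": "gbk",   # Simplified Chinese: GB2312-80 or GBK
    "TC": "big5",  # Traditional Chinese: Big Five
}


def decode_chinese(data: bytes, form: str) -> str:
    """Decode legacy-encoded bytes to str; form is 'SC' or 'TC'."""
    return data.decode(CODECS[form])


simplified = "简体中文".encode("gb2312")
traditional = "繁體中文".encode("big5")

print(decode_chinese(simplified, "SC"))
print(decode_chinese(traditional, "TC"))
```

The two encodings assign different byte sequences to the same characters, so the form has to be known, or detected by other means, before decoding.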

Reading

Japanese support includes the rendering of Furigana transcriptions into Hiragana.
