What is a context?
- Last Updated: May 13, 2026
We will now outline the FACTS methodology and the view it takes of fact extraction and classification.
A context is a range of text, identified in the content, in or around which the fact we wish to extract occurs. However, as noted earlier, if the fact has enough structure in and of itself, we may be able to rely on that structure alone, without further contextualization.
A context can be defined in several ways. The most basic one is that a context has a starting point (e.g. word, document marker) and an ending point. Such a context can then be said to have a range of words that lie between those two points – we refer to that as the context’s phrase range or range. The fact that we are interested in extracting or classifying then lies within or next to that context’s range. For example, the following text contains two dates:
Date of approval: 10th January 2018
Publish date: 20th March 2018
On a simple reading, we could say that this content has two facts in it. However, our project’s requirements might call for only a single date, the approval date. In that case, we need to distinguish between the two dates so that only one of them is treated as a “fact” in our framework. To do that, we define a context for the approval date that sets it apart from the other date. Perhaps it is always the first date in the document type we are interested in. Perhaps it always has the anchor “Date of approval” before it, and/or the anchor “Publish date” after it. Any of these can serve as a context in which to look for an entity of type date. Should we find one, we can be confident that it is the approval date, the fact we were looking for.
So, in this case, the inherent structure of a date is not in and of itself enough to choose the correct fact! We need some contextualization.
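The anchor-based approach described above can be illustrated with a minimal Python sketch. This is not Semaphore's rule syntax or API; the `DATE` pattern and the `find_in_context` helper are hypothetical names introduced here only to show the idea of searching for a date between a starting and an ending anchor.

```python
import re

# Hypothetical pattern for dates written like "10th January 2018".
DATE = r"\d{1,2}(?:st|nd|rd|th)?\s+[A-Z][a-z]+\s+\d{4}"


def find_in_context(text, start_anchor, end_anchor=None):
    """Return the first date found between start_anchor and end_anchor.

    The context's range starts right after start_anchor and ends at
    end_anchor, or at the end of the text if no end anchor is found.
    """
    start = text.find(start_anchor)
    if start == -1:
        return None  # no context, so no fact
    start += len(start_anchor)

    end = len(text)
    if end_anchor is not None:
        pos = text.find(end_anchor, start)
        if pos != -1:
            end = pos

    match = re.search(DATE, text[start:end])
    return match.group(0) if match else None


text = "Date of approval: 10th January 2018\nPublish date: 20th March 2018"

# Without a context, the date structure alone matches both dates.
all_dates = re.findall(DATE, text)

# With a context, only the approval date is selected.
approval = find_in_context(text, "Date of approval", "Publish date")
print(all_dates)
print(approval)
```

Here the date pattern on its own returns both dates, while the anchored search returns only the one that follows "Date of approval".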
We can define contexts in many ways. Semaphore can bring its understanding of grammatical units and structures to the methodology. That is, we can identify a context as a phrase, a sentence, a paragraph, or even the whole document, by using some feature that the unit possesses. That feature could be a word or words (that is, an anchor or anchors), or even the fact itself, such as an entity (e.g. sentences with dates in them). For grammatical structures such as words, sentences, and paragraphs, the feature can also be their location in the document, e.g. whether the context starts or ends the document, a paragraph, a sentence, etc. Out of the box, Semaphore will identify several grammatical units and structures, such as:
- Words
- Sentences
- Paragraphs
- Documents
- Fields (e.g. title, header, footer, body, etc.)
- Punctuation (e.g. whether or not to ignore it within different units such as paragraphs and sentences)
- Start / end of units (e.g. of paragraphs, sentences, documents, fields, etc.)
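As a rough illustration of contexts built from grammatical units, the following Python sketch splits a document into paragraphs and sentences, then selects sentence contexts either by a feature (containing a date) or by location (starting a paragraph). The splitting rules, sample text, and function names are simplifying assumptions for this example and do not reflect how Semaphore itself segments text.

```python
import re

# Hypothetical pattern for dates written like "10th January 2018".
DATE = re.compile(r"\d{1,2}(?:st|nd|rd|th)?\s+[A-Z][a-z]+\s+\d{4}")


def paragraphs(document):
    # Assumption: paragraphs are separated by blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def sentences(paragraph):
    # Naive split: whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def sentence_contexts(document):
    """Yield (paragraph index, sentence index, sentence) for every sentence."""
    for p_idx, para in enumerate(paragraphs(document)):
        for s_idx, sent in enumerate(sentences(para)):
            yield p_idx, s_idx, sent


document = (
    "The committee met in London. The proposal was approved on 10th January 2018.\n"
    "\n"
    "It was published on 20th March 2018. Copies are available on request."
)

# Context defined by a feature: sentences that contain a date.
dated = [s for _, _, s in sentence_contexts(document) if DATE.search(s)]

# Context defined by location: the sentence that starts each paragraph.
openers = [s for _, s_idx, s in sentence_contexts(document) if s_idx == 0]

print(dated)
print(openers)
```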
The resulting contexts should also contain the fact or facts that we are interested in extracting. After all, identifying a context is just a means to an end: it is the fact or facts that we really want to get at! Sometimes we want to extract all of the words in the context, because the raw free text is itself the “fact” we are trying to extract. Other times, we need a more granular and/or normalising approach, where we extract only very specific text from within the context, such as an entity (e.g. a person or date) or a concept (from a taxonomy), and interpret that as the “fact”.
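The two extraction styles can be contrasted in a short Python sketch: one function returns the context's raw text as the fact, the other pulls out a date entity and normalises it to ISO 8601. The function names, the month table, and the date pattern are assumptions made for this illustration, not part of Semaphore.

```python
import re
from datetime import date

# Hypothetical lookup from English month names to month numbers.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Hypothetical pattern capturing day, month name and year.
DATE = re.compile(r"(\d{1,2})(?:st|nd|rd|th)?\s+([A-Z][a-z]+)\s+(\d{4})")


def extract_raw(context):
    """Treat the whole context as the fact (raw free text)."""
    return context.strip()


def extract_date(context):
    """Return the first date-like match as an ISO 8601 string, or None."""
    match = DATE.search(context)
    if match is None or match.group(2) not in MONTHS:
        return None
    try:
        return date(
            int(match.group(3)), MONTHS[match.group(2)], int(match.group(1))
        ).isoformat()
    except ValueError:
        return None  # e.g. "31st February 2018" is not a real date


context = "Date of approval: 10th January 2018"

print(extract_raw(context))
print(extract_date(context))
```

The raw form keeps the text exactly as written, while the normalised form gives a value that can be compared or sorted regardless of how the date was phrased in the source.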