A methodology (Extractors)
- Last Updated: May 29, 2026
Extractor Methodology
This is where the real art of fact extraction takes place!
At this point, you should know:
- How the content breaks down into distinctive document types.
- How to identify those document types.
- Which document types have which facts.
- How your facts should be structured.
All that remains is to write the extractors.
How does CS “see” your content?
The first crucial step is this: after processing your content through CS and examining it in CAT / CSTI, carefully note how your fact now appears.
Tip: Look at how CS tokenizes your content to determine whether the fact can be found through its atomic structure alone. That is, can the whole fact (a single fact if simple, or multiple facts if complex) be matched against some pattern of concept, taxonomy, wildcard, or entity facts, with no other anchors?
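As a rough illustration of the tip above, the sketch below treats tokenized content as a list of (text, kind) pairs and checks whether a fact can be located purely as a run of token kinds. The `Token` type, the kind labels, and `find_pattern` are hypothetical names invented for this example. They are not part of the CS, CAT, or CSTI interfaces, and real tokenized output will look different.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    text: str
    kind: str  # hypothetical labels: "concept", "taxonomy", "wildcard", "entity", "other"


# Kinds the tip names as usable without any other anchor.
MATCHABLE_KINDS = {"concept", "taxonomy", "wildcard", "entity"}


def find_pattern(tokens: list[Token], pattern: list[str]) -> Optional[int]:
    """Return the start index of the first token run whose kinds equal `pattern`, else None."""
    if not pattern or any(kind not in MATCHABLE_KINDS for kind in pattern):
        raise ValueError("pattern must be a non-empty list of matchable kinds")
    width = len(pattern)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if [tok.kind for tok in window] == pattern:
            return start
    return None


# Invented sample content, purely for illustration.
tokens = [
    Token("Invoice", "other"),
    Token("issued", "other"),
    Token("by", "other"),
    Token("Acme Ltd", "entity"),
    Token("for", "other"),
    Token("consulting", "concept"),
]
```

Here `find_pattern(tokens, ["entity"])` finds the entity on its own, while `find_pattern(tokens, ["entity", "concept"])` finds nothing because an unrelated token sits between the two. In the terms of the tip, the second fact would need some other anchor.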
Extraction Decision Flow
- Is the fact matchable as described above?
  - Yes:
    - Is it a simple fact?
      - If so, use any context type and a single fact type that matches.
    - Is it a complex fact?
      - If so, see Complex Fact Extraction Strategies.
  - No:
    - Is it a simple fact?
    - Is it a complex fact?
      - Yes:
        - If it is a logical fact, refer to Logical Fact Extraction Strategies.
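The decision flow above can be sketched as a small function. This is only an illustration: `next_step` and its parameters are hypothetical names, and it encodes just the branches the list actually spells out. Placing the logical-fact check under the non-matchable, complex branch is one reading of the list. Any branch the list leaves open returns `None` rather than a guessed answer.

```python
from typing import Optional


def next_step(matchable: bool, complexity: str, logical: bool = False) -> Optional[str]:
    """Map the stated branches of the decision flow to their recommendation.

    Returns None for branches the flow does not spell out.
    """
    if complexity not in {"simple", "complex"}:
        raise ValueError("complexity must be 'simple' or 'complex'")

    if matchable:
        if complexity == "simple":
            return "Use any context type and a single fact type that matches."
        return "See Complex Fact Extraction Strategies."

    # Not matchable: the only stated outcome is for complex, logical facts
    # (an assumption about where that item sits in the list).
    if complexity == "complex" and logical:
        return "Refer to Logical Fact Extraction Strategies."
    return None
```

For example, `next_step(True, "complex")` points to Complex Fact Extraction Strategies, while `next_step(False, "simple")` returns `None` because the flow gives no recommendation for that case.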