Guidelines for using Unicode
- Last Updated: March 30, 2020
- OpenEdge
- Version 12.2
When you use Unicode in OpenEdge applications, the following restrictions, cautions, and suggestions apply:
- With the OpenEdge UTF-8 BASIC collation, composed and decomposed characters are treated as different characters. With the International Components for Unicode (ICU) collations, composed and decomposed characters are treated as the same character for comparisons and indexes.
- The OpenEdge UTF-8 BASIC collation sorts Unicode data in binary order. The ICU collations, by contrast, sort Unicode data according to the language-specific requirements of a locale.
  Note: You can specify an OpenEdge collation or an ICU collation for sorting data using either the Collation Table (-cpcoll) startup parameter or the COLLATE option on the FOR statement, the OPEN QUERY statement, and the PRESELECT phrase. For more information on the -cpcoll startup parameter, see Startup Command and Parameter Reference. For more information on the ABL elements, see ABL Reference. For information about using ICU collations as database collations, see Rules and Techniques for Databases.
- Before sorting Unicode data with the UTF-8 BASIC collation, normalize the data using the ABL NORMALIZE function. Normalizing converts the data into a standardized form, which makes sorting and indexing more accurate and consistent. This matters for characters or character sequences that have more than one representation (for example, a precomposed character versus a base character followed by a combining character), because normalization ensures that equivalent strings have a single binary representation. For more information on the ABL NORMALIZE function, see ABL Reference.
  Note: When sorting Unicode data with an ICU collation, you do not need to normalize the data.
- When UTF-8 data contains decomposed characters, you cannot convert it to a single-byte code page. You must first compose the data using the ABL NORMALIZE function. Conversely, when you convert data from a single-byte code page to Unicode, the result is always composed data.
- OpenEdge supports code-page conversion to and from UTF-8 in the same way it supports code-page conversion to and from other code pages. For more information on code-page conversion, see About Code Pages and Character Processing Tables.
- When an existing database is converted to UTF-8, the storage required by each non-ASCII character increases, while ASCII characters still require one byte each. Roughly, each non-ASCII Latin-alphabet character requires two bytes in UTF-8, and each double-byte Chinese, Japanese, or Korean character requires three bytes.
- To display and print Unicode data, consider using a Unicode font; such fonts are available commercially.
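The guidelines on composed versus decomposed characters, binary order, and normalizing before a binary sort can be illustrated outside OpenEdge. The following Python sketch is not ABL and does not call any OpenEdge API. It assumes that comparing raw UTF-8 bytes is a reasonable stand-in for a binary collation such as UTF-8 BASIC, and it uses Python's `unicodedata.normalize` with the NFC (composed) form in place of the ABL NORMALIZE function. The helpers `binary_key` and `binary_sorted` are hypothetical names chosen for this example.

```python
import unicodedata

# Illustrative strings: "café" spelled two ways that render identically.
composed = "caf\u00e9"     # U+00E9 LATIN SMALL LETTER E WITH ACUTE (precomposed)
decomposed = "cafe\u0301"  # "e" + U+0301 COMBINING ACUTE ACCENT (decomposed)


def binary_key(s: str) -> bytes:
    """Sort key that compares raw UTF-8 bytes.

    Assumption: this stands in for a binary collation such as UTF-8 BASIC.
    For UTF-8, byte order matches code point order.
    """
    return s.encode("utf-8")


def binary_sorted(words: list[str], normalize: bool) -> list[str]:
    """Sort in binary order, optionally normalizing each string to NFC first."""
    if normalize:
        words = [unicodedata.normalize("NFC", w) for w in words]
    return sorted(words, key=binary_key)


if __name__ == "__main__":
    # The two spellings are different byte sequences...
    print(binary_key(composed) == binary_key(decomposed))  # False
    # ...until the decomposed one is normalized to the composed form.
    print(unicodedata.normalize("NFC", decomposed) == composed)  # True

    words = ["caffeine", composed, "Cafe", decomposed]
    # Without normalization, "caffeine" lands between the two spellings of "café".
    # "Cafe" sorts first because "C" (0x43) precedes "c" (0x63) byte-wise.
    print(" < ".join(ascii(w) for w in binary_sorted(words, normalize=False)))
    # 'Cafe' < 'cafe\u0301' < 'caffeine' < 'caf\xe9'
    print(" < ".join(ascii(w) for w in binary_sorted(words, normalize=True)))
    # 'Cafe' < 'caffeine' < 'caf\xe9' < 'caf\xe9'
```

Even after normalization, binary order places "café" after "caffeine" because U+00E9 follows every ASCII letter. A locale-aware ICU collation would typically sort "café" next to "cafe" instead, which is the difference between binary and language-specific ordering.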
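The guideline on converting decomposed UTF-8 data to a single-byte code page can be sketched in the same way. This Python example assumes ISO 8859-1 (Python's `latin-1` codec) as the single-byte code page and uses `unicodedata.normalize("NFC", ...)` as a stand-in for composing data with the ABL NORMALIZE function. The helpers `fits_code_page` and `to_latin1` are hypothetical, not OpenEdge functions.

```python
import unicodedata

# Illustrative UTF-8 data containing a decomposed character:
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "cafe\u0301"


def fits_code_page(s: str, codec: str) -> bool:
    """Return True if every character of s can be encoded in the given code page."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def to_latin1(s: str) -> bytes:
    """Compose to NFC first, then encode to ISO 8859-1 (Python's "latin-1" codec)."""
    return unicodedata.normalize("NFC", s).encode("latin-1")


if __name__ == "__main__":
    # U+0301 on its own has no ISO 8859-1 byte, so direct conversion fails.
    print(fits_code_page(decomposed, "latin-1"))  # False
    # Composition turns "e" + U+0301 into the single character U+00E9, which fits.
    print(to_latin1(decomposed))  # b'caf\xe9'
    # Converting single-byte data to Unicode yields composed characters.
    restored = b"caf\xe9".decode("latin-1")
    print(restored == unicodedata.normalize("NFC", restored))  # True
```

Composition only helps when the composed character exists in the target code page; text that contains, for example, CJK characters still cannot be encoded in ISO 8859-1, and `to_latin1` would raise `UnicodeEncodeError` for it.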
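The storage estimate for converting a database to UTF-8 can be checked with simple byte counts. This Python sketch compares one character's size in an illustrative legacy code page with its size in UTF-8. The choice of ISO 8859-1, Shift-JIS, and EUC-KR as the "before" code pages is an assumption made for the example, and `byte_counts` is a hypothetical helper.

```python
# Byte counts are properties of the encodings themselves; the legacy code
# pages chosen here (ISO 8859-1, Shift-JIS, EUC-KR) are illustrative examples.
SAMPLES = [
    ("ASCII letter", "a", "latin-1"),
    ("Latin-alphabet letter", "\u00e9", "latin-1"),  # é
    ("Japanese kanji", "\u65e5", "shift_jis"),       # 日
    ("Korean Hangul syllable", "\ud55c", "euc_kr"),  # 한
]


def byte_counts(ch: str, legacy_codec: str) -> tuple[int, int]:
    """Return (bytes in the legacy code page, bytes in UTF-8) for one character."""
    return len(ch.encode(legacy_codec)), len(ch.encode("utf-8"))


if __name__ == "__main__":
    for label, ch, codec in SAMPLES:
        before, after = byte_counts(ch, codec)
        print(f"{label}: {before} byte(s) in {codec}, {after} byte(s) in UTF-8")

    # Worked example: a 100-character value with 95 ASCII and 5 accented letters.
    text = "a" * 95 + "\u00e9" * 5
    print(len(text.encode("latin-1")), len(text.encode("utf-8")))  # 100 105
```

Characters outside the Basic Multilingual Plane, such as some rarely used CJK ideographs, take four bytes in UTF-8, which is one reason the per-character figures in the guideline are approximate.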