Character precedence in Unicode and Java Collator

Last Updated: October 22, 2025

The Unicode standard assigns a numeric code point to every character, including many that can't be typed on a standard keyboard. The codes in the table below are written as four hexadecimal digits (for example, 0041 for A). Java (and hence Progress Corticon software) uses the java.text.Collator class to sort these characters in a sequence that depends on the user's internationalization (i18n) locale.
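The following minimal sketch contrasts the two orderings described above. Plain `String.compareTo` compares UTF-16 code unit values, which for these characters means Unicode code order. A `Collator` for the US-English locale compares linguistically instead. The class and helper names are illustrative only.

```java
import java.text.Collator;
import java.util.Locale;

public class CodePointVsCollator {
    // Formats a character's Unicode code point as four hex digits, e.g. 'a' -> "0061".
    static String hex(char c) {
        return String.format("%04X", (int) c);
    }

    // Sign of String.compareTo, which compares UTF-16 code unit values
    // (the same as Unicode code order for the characters in the table).
    static int compareByCode(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    // Sign of a comparison made by the Collator for the US-English locale.
    static int compareByCollator(String a, String b) {
        return Integer.signum(Collator.getInstance(Locale.US).compare(a, b));
    }

    public static void main(String[] args) {
        System.out.println("'B' is U+" + hex('B') + ", 'a' is U+" + hex('a'));
        // By code: 'B' (0042) is lower than 'a' (0061), so "B" sorts first.
        System.out.println("compareTo: " + compareByCode("B", "aardvark"));
        // By Collator: b follows a alphabetically regardless of case, so "B" sorts last.
        System.out.println("Collator:  " + compareByCollator("B", "aardvark"));
    }
}
```

Running it prints -1 for `compareTo` and 1 for the Collator. The same pair of strings therefore sorts in opposite orders depending on which comparison is used.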

Sorting by locale accommodates regional variations such as accented characters. However, combining the two systems can make character precedence hard to predict, because the Collator sequence does not follow Unicode code order. The table below shows the Unicode code and the Java Collator sequence for the characters on a standard keyboard in the US-English locale.

Sequences for other languages or locales may differ, and many Unicode characters are not shown in the table. For more information, see http://www.unicode.org/charts for the Unicode system and http://java.sun.com/docs/books/tutorial/i18n/text/locale.html for the Java Collator.
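Because the sequence depends on the locale, a Collator is obtained for a specific `Locale` and can be passed directly to a sort. The sketch below assumes a hypothetical helper, `sortForLocale`, which simply wraps `Collator.getInstance(locale)`. Only the US-English result is shown, because other locales may order strings differently.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class LocaleSort {
    // Hypothetical helper: returns a copy of the list sorted with the Collator for `locale`.
    static List<String> sortForLocale(List<String> words, Locale locale) {
        List<String> sorted = new ArrayList<>(words);
        // Collator implements Comparator<Object>, so it can be used as the sort comparator.
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    // For contrast: natural String ordering by UTF-16 code unit values.
    static List<String> sortByCode(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = List.of("zebra", "Apple", "_draft", "10 items");
        System.out.println("By code:       " + sortByCode(words));
        System.out.println("Collator (US): " + sortForLocale(words, Locale.US));
    }
}
```

By code, the order is `[10 items, Apple, _draft, zebra]`, because 0031 < 0041 < 005F < 007A. With the US-English Collator, the order is `[_draft, 10 items, Apple, zebra]`: punctuation comes before digits, and digits come before letters, as in the table.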

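For example:

'Z'='z' evaluates to false. Although z and Z share precedence 69 in the table below, they are still different characters.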
'C & S' < 'C and S' evaluates to true because character a has a higher precedence than & (44 > 26). These two characters are decisive because they are the first characters that differ when the strings are compared from position 1 onward.
'B' > 'aardvark' evaluates to true because character B has a higher precedence than a (45 > 44).
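'Marilynn' < 'Marilyn' evaluates to false. The first seven characters of each string are identical, so the eighth position is decisive. 'Marilynn' has an n there, while 'Marilyn' has ended, and a string that ends sorts before one that continues. The effect is the same as if 'Marilyn' ended with a <space>, because n outranks <space> (57 > 1).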
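To reproduce these examples in plain Java, the sketch below assumes that Corticon's `<`, `>` and `=` string comparisons behave like a US-English Collator at its default (TERTIARY) strength. That is an illustration of the Collator's behavior, not Corticon's actual implementation. The helpers `lessThan`, `greaterThan` and `isEqual` are hypothetical names.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorExamples {
    // Assumption: a US-English Collator at its default TERTIARY strength, which
    // distinguishes case, stands in for Corticon's string comparison here.
    private static final Collator US = Collator.getInstance(Locale.US);

    // Hypothetical helpers mirroring the <, > and = comparisons in the examples.
    static boolean lessThan(String a, String b)    { return US.compare(a, b) < 0; }
    static boolean greaterThan(String a, String b) { return US.compare(a, b) > 0; }
    static boolean isEqual(String a, String b)     { return US.compare(a, b) == 0; }

    public static void main(String[] args) {
        System.out.println("'Z' = 'z':              " + isEqual("Z", "z"));
        System.out.println("'C & S' < 'C and S':    " + lessThan("C & S", "C and S"));
        System.out.println("'B' > 'aardvark':       " + greaterThan("B", "aardvark"));
        System.out.println("'Marilynn' < 'Marilyn': " + lessThan("Marilynn", "Marilyn"));
    }
}
```

Under that assumption, the program prints false, true, true and false, which matches the results listed above.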
NOTE: This page does not cover escaping special characters in string literals with a backslash. A list of the characters that require escaping is easy to find online.


character	name	precedence	Unicode 5.0 code (hex)
<space>	typed space	1	0020
-	dash or minus sign	2	002D
_	underline or underscore	3	005F
,	comma	4	002C
;	semicolon	5	003B
:	colon	6	003A
!	exclamation point	7	0021
?	question mark	8	003F
/	slash	9	002F
.	period	10	002E
`	grave accent	11	0060
^	circumflex	12	005E
~	tilde	13	007E
'	apostrophe	14	0027
"	quotation mark	15	0022
(	left parenthesis	16	0028
)	right parenthesis	17	0029
[	left bracket	18	005B
]	right bracket	19	005D
{	left brace	20	007B
}	right brace	21	007D
@	at symbol	22	0040
$	dollar sign	23	0024
*	asterisk	24	002A
\	backslash	25	005C
&	ampersand	26	0026
#	number sign or hash sign	27	0023
%	percent sign	28	0025
+	plus sign	29	002B
<	less than sign	30	003C
=	equals sign	31	003D
>	greater than sign	32	003E
|	vertical line	33	007C
0..9	numbers 0 through 9	34-43	0030-0039
a, A	letter a, small and capital	44	0061, 0041
b, B	letter b, small and capital	45	0062, 0042
c, C	letter c, small and capital	46	0063, 0043
d, D	letter d, small and capital	47	0064, 0044
e, E	letter e, small and capital	48	0065, 0045
f, F	letter f, small and capital	49	0066, 0046
g, G	letter g, small and capital	50	0067, 0047
h, H	letter h, small and capital	51	0068, 0048
i, I	letter i, small and capital	52	0069, 0049
j, J	letter j, small and capital	53	006A, 004A
k, K	letter k, small and capital	54	006B, 004B
l, L	letter l, small and capital	55	006C, 004C
m, M	letter m, small and capital	56	006D, 004D
n, N	letter n, small and capital	57	006E, 004E
o, O	letter o, small and capital	58	006F, 004F
p, P	letter p, small and capital	59	0070, 0050
q, Q	letter q, small and capital	60	0071, 0051
r, R	letter r, small and capital	61	0072, 0052
s, S	letter s, small and capital	62	0073, 0053
t, T	letter t, small and capital	63	0074, 0054
u, U	letter u, small and capital	64	0075, 0055
v, V	letter v, small and capital	65	0076, 0056
w, W	letter w, small and capital	66	0077, 0057
x, X	letter x, small and capital	67	0078, 0058
y, Y	letter y, small and capital	68	0079, 0059
z, Z	letter z, small and capital	69	007A, 005A
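The table lists each small and capital letter under a single precedence. In Collator terms, that corresponds to the PRIMARY strength, which ignores case, whereas the default TERTIARY strength still tells a from A. The sketch below sorts a few characters with the US-English Collator and prints each one with its Unicode code, as a way to regenerate rows like those above for your own JVM. The class name `PrecedenceTable` and its helpers are illustrative, not part of Corticon.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PrecedenceTable {
    // Sorts the given characters with `collator` and returns "char<TAB>hex code" rows.
    static List<String> rows(String chars, Collator collator) {
        List<String> items = new ArrayList<>();
        for (char c : chars.toCharArray()) {
            items.add(String.valueOf(c));
        }
        items.sort(collator);
        List<String> result = new ArrayList<>();
        for (String s : items) {
            result.add(s + "\t" + String.format("%04X", (int) s.charAt(0)));
        }
        return result;
    }

    // True if a and b compare as equal with a US-English Collator at the given strength.
    static boolean sameAtStrength(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.US); // returns a fresh instance
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        for (String row : rows("zA&9a", Collator.getInstance(Locale.US))) {
            System.out.println(row);
        }
        // PRIMARY ignores case, matching the single precedence given to a and A in the table.
        System.out.println("a = A at PRIMARY:  " + sameAtStrength("a", "A", Collator.PRIMARY));
        System.out.println("a = A at TERTIARY: " + sameAtStrength("a", "A", Collator.TERTIARY));
    }
}
```

The sorted rows begin with & (0026) and then 9 (0039), and end with z (007A), which follows the table's ordering of symbols, then digits, then letters. The relative order of a and A depends on the Collator's case rules, so the sketch does not rely on it.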
