+ upd: Oniguruma current DEV (2020-03-16)

RaiKoHoff 2020-03-16 10:06:25 +01:00
parent 2e4bed6668
commit 42263fc246
11 changed files with 9437 additions and 8131 deletions

Build/Docs/Oniguruma_RE.txt Normal file

@ -0,0 +1,578 @@
Oniguruma Regular Expressions Version 6.9.5 2020/01/28
syntax: ONIG_SYNTAX_ONIGURUMA (default)
1. Syntax elements
\ escape (enable or disable meta character)
| alternation
(...) group
[...] character class
2. Characters
\t horizontal tab (0x09)
\v vertical tab (0x0B)
\n newline (line feed) (0x0A)
\r carriage return (0x0D)
\b backspace (0x08)
\f form feed (0x0C)
\a bell (0x07)
\e escape (0x1B)
\nnn octal char (encoded byte value)
\o{17777777777} wide octal char (character code point value)
\uHHHH wide hexadecimal char (character code point value)
\xHH hexadecimal char (encoded byte value)
\x{7HHHHHHH} wide hexadecimal char (character code point value)
\cx control char (character code point value)
\C-x control char (character code point value)
\M-x meta (x|0x80) (character code point value)
\M-\C-x meta control char (character code point value)
(* \b as backspace is effective in character class only)
3. Character types
. any character (except newline)
\w word character
Not Unicode:
alphanumeric, "_" and multibyte char.
Unicode:
General_Category -- (Letter|Mark|Number|Connector_Punctuation)
\W non-word char
\s whitespace char
Not Unicode:
\t, \n, \v, \f, \r, \x20
Unicode case:
U+0009, U+000A, U+000B, U+000C, U+000D, U+0085(NEL),
General_Category -- Line_Separator
-- Paragraph_Separator
-- Space_Separator
\S non-whitespace char
\d decimal digit char
Unicode: General_Category -- Decimal_Number
\D non-decimal-digit char
\h hexadecimal digit char [0-9a-fA-F]
\H non-hexdigit char
\R general newline (* can't be used in character-class)
"\r\n" or \n,\v,\f,\r (* but doesn't backtrack from \r\n to \r)
Unicode case:
"\r\n" or \n,\v,\f,\r or U+0085, U+2028, U+2029
\N negative newline (?-m:.)
\O true anychar (?m:.) (* original function)
\X Text Segment \X === (?>\O(?:\Y\O)*)
The meaning of this operator changes depending on the setting of
the option (?y{..}).
\X doesn't check whether matching start position is boundary or not.
Please write as \y\X if you want to ensure it.
[Extended Grapheme Cluster mode] (default)
Unicode case:
See [Unicode Standard Annex #29: http://unicode.org/reports/tr29/]
Not Unicode case: \X === (?>\r\n|\O)
[Word mode]
Currently, this mode is supported in Unicode only.
See [Unicode Standard Annex #29: http://unicode.org/reports/tr29/]
Character Property
* \p{property-name}
* \p{^property-name} (negative)
* \P{property-name} (negative)
property-name:
+ works on all encodings
Alnum, Alpha, Blank, Cntrl, Digit, Graph, Lower,
Print, Punct, Space, Upper, XDigit, Word, ASCII
+ works on EUC_JP, Shift_JIS
Hiragana, Katakana
+ works on UTF8, UTF16, UTF32
See doc/UNICODE_PROPERTIES.
4. Quantifier
greedy
? 1 or 0 times
* 0 or more times
+ 1 or more times
{n,m} (n <= m) at least n but no more than m times
{n,} at least n times
{,n} at least 0 but no more than n times ({0,n})
{n} n times
reluctant
?? 0 or 1 times
*? 0 or more times
+? 1 or more times
{n,m}? (n <= m) at least n but not more than m times
{n,}? at least n times
{,n}? at least 0 but not more than n times (== {0,n}?)
{n}? is reluctant operator in ONIG_SYNTAX_JAVA and ONIG_SYNTAX_PERL only.
(In that case, it doesn't make sense to write so.)
In default syntax, /a{n}?/ === /(?:a{n})?/
possessive (greedy and does not backtrack once matched)
?+ 1 or 0 times
*+ 0 or more times
++ 1 or more times
{n,m} (n > m) at least m but not more than n times
{n,m}+, {n,}+, {n}+ are possessive operators in ONIG_SYNTAX_JAVA and
ONIG_SYNTAX_PERL only.
ex. /a*+/ === /(?>a*)/
5. Anchors
^ beginning of the line
$ end of the line
\b word boundary
\B non-word boundary
\A beginning of string
\Z end of string, or before newline at the end
\z end of string
\G where the current search attempt begins
\K keep (keep start position of the result string)
\y Text Segment boundary
\Y Text Segment non-boundary
The meaning of these operators(\y, \Y) changes depending on the setting
of the option (?y{..}).
[Extended Grapheme Cluster mode] (default)
Unicode case:
See [Unicode Standard Annex #29: http://unicode.org/reports/tr29/]
Not Unicode:
All positions except between \r and \n.
[Word mode]
Currently, this mode is supported in Unicode only.
See [Unicode Standard Annex #29: http://unicode.org/reports/tr29/]
6. Character class
^... negative class (lowest precedence)
x-y range from x to y
[...] set (character class in character class)
..&&.. intersection (low precedence, only higher than ^)
ex. [a-w&&[^c-g]z] ==> ([a-w] AND ([^c-g] OR z)) ==> [abh-w]
* If you want to use '[', '-', or ']' as a normal character
in character class, you should escape them with '\'.
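The intersection example above can be checked by hand with a small membership function. The following is a minimal sketch in plain C, not Oniguruma code: the names demo_in_class and demo_class_members are hypothetical, and only ASCII code points are considered. It evaluates ([a-w] AND ([^c-g] OR z)) directly and collects the members.

```c
#include <assert.h>
#include <string.h>

/* Membership test for the example class [a-w&&[^c-g]z] (ASCII only).
   Hypothetical helper for illustration; not part of Oniguruma. */
int demo_in_class(int c)
{
  int in_a_w = (c >= 'a' && c <= 'w');  /* left operand: [a-w] */
  int in_c_g = (c >= 'c' && c <= 'g');  /* body of the nested [^c-g] */
  int right  = (!in_c_g || c == 'z');   /* right operand: ([^c-g] OR z) */

  return in_a_w && right;               /* && binds lower than the rest */
}

/* Collects all ASCII members into out; out must hold at least 128 bytes.
   Returns the number of members written (excluding the terminator). */
size_t demo_class_members(char* out)
{
  size_t n = 0;
  int c;

  assert(out != NULL);
  for (c = 1; c < 128; c++) {
    if (demo_in_class(c)) out[n++] = (char )c;
  }
  out[n] = '\0';
  return n;
}
```

Calling demo_class_members on a 128-byte buffer yields the letters a, b and h through w, which agrees with the [abh-w] result stated in the example. Note that 'z' is not a member, because it fails the [a-w] operand.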
POSIX bracket ([:xxxxx:], negate [:^xxxxx:])
Not Unicode Case:
alnum alphabet or digit char
alpha alphabet
ascii code value: [0 - 127]
blank \t, \x20
cntrl
digit 0-9
graph include all of multibyte encoded characters
lower
print include all of multibyte encoded characters
punct
space \t, \n, \v, \f, \r, \x20
upper
xdigit 0-9, a-f, A-F
word alphanumeric, "_" and multibyte characters
Unicode Case:
alnum Letter | Mark | Decimal_Number
alpha Letter | Mark
ascii 0000 - 007F
blank Space_Separator | 0009
cntrl Control | Format | Unassigned | Private_Use | Surrogate
digit Decimal_Number
graph [[:^space:]] && ^Control && ^Unassigned && ^Surrogate
lower Lowercase_Letter
print [[:graph:]] | [[:space:]]
punct Connector_Punctuation | Dash_Punctuation | Close_Punctuation |
Final_Punctuation | Initial_Punctuation | Other_Punctuation |
Open_Punctuation
space Space_Separator | Line_Separator | Paragraph_Separator |
U+0009 | U+000A | U+000B | U+000C | U+000D | U+0085
upper Uppercase_Letter
xdigit U+0030 - U+0039 | U+0041 - U+0046 | U+0061 - U+0066
(0-9, a-f, A-F)
word Letter | Mark | Decimal_Number | Connector_Punctuation
7. Extended groups
(?#...) comment
(?imxWDSPy-imxWDSP:subexp) option on/off for subexp
i: ignore case
m: multi-line (dot (.) also matches newline)
x: extended form
W: ASCII only word (\w, \p{Word}, [[:word:]])
ASCII only word bound (\b)
D: ASCII only digit (\d, \p{Digit}, [[:digit:]])
S: ASCII only space (\s, \p{Space}, [[:space:]])
P: ASCII only POSIX properties (includes W,D,S)
(alnum, alpha, blank, cntrl, digit, graph,
lower, print, punct, space, upper, xdigit, word)
y{?}: Text Segment mode
This option changes the meaning of \X, \y, \Y.
Currently, this option is supported in Unicode only.
y{g}: Extended Grapheme Cluster mode (default)
y{w}: Word mode
See [Unicode Standard Annex #29]
(?imxWDSPy-imxWDSP) isolated option
* It forms a group extending to the next ')' or the end of the pattern.
/ab(?i)c|def|gh/ == /ab(?i:c|def|gh)/
(?:subexp) non-capturing group
(subexp) capturing group
(?=subexp) look-ahead
(?!subexp) negative look-ahead
(?<=subexp) look-behind
(?<!subexp) negative look-behind
* Cannot use Absent stopper (?~|expr) and Range clear
(?~|) operators in look-behind and negative look-behind.
* In look-behind and negative look-behind, support for
ignore-case option is limited. Only supports conversion
between single characters. (Does not support conversion
of multiple characters in Unicode)
(?>subexp) atomic group
no backtracks in subexp.
(?<name>subexp), (?'name'subexp)
define named group
(Each character of the name must be a word character.)
Besides the name, a group number is also assigned, as for an ordinary
capturing group.
Assigning the same name to two or more subexps is allowed.
<Callouts>
* Callouts of contents
(?{...contents...}) callout in progress
(?{...contents...}D) D is a direction flag char
D = 'X': in progress and retraction
'<': in retraction only
'>': in progress only
(?{...contents...}[tag]) tag assigned
(?{...contents...}[tag]D)
* Escape characters have no effects in contents.
* contents is not allowed to start with '{'.
(?{{{...contents...}}}) a run of n consecutive '}' characters is allowed in contents
when the contents are enclosed in (n+1) consecutive braces, as in {{{...}}}.
Allowed tag string characters: _ A-Z a-z 0-9 (* first character: _ A-Z a-z)
* Callouts of name
(*name)
(*name{args...}) with args
(*name[tag]) tag assigned
(*name[tag]{args...})
Allowed name string characters: _ A-Z a-z 0-9 (* first character: _ A-Z a-z)
Allowed tag string characters: _ A-Z a-z 0-9 (* first character: _ A-Z a-z)
<Absent functions>
(?~absent) Absent repeater (* proposed by Tanaka Akira)
This works like .* (more precisely \O*), but it is
limited by the range that does not include the string
match with <absent>.
This is a written abbreviation of (?~|(?:absent)|\O*).
\O* is used as a repeater.
(?~|absent|exp) Absent expression (* original)
This works like "exp", but it is limited by the range
that does not include the string match with <absent>.
ex. (?~|345|\d*) "12345678" ==> "12", "1", ""
(?~|absent) Absent stopper (* original)
After this operator is passed, the right range of the string is limited
to the point that does not include a string matching
<absent>.
(?~|) Range clear
Clear the effects caused by Absent stoppers.
* Nested Absent functions are not supported and the behavior
is undefined.
<if-then-else>
(?(condition_exp)then_exp|else_exp) if-then-else
(?(condition_exp)then_exp) if-then
condition_exp can be a backreference number/name or a normal
regular expression.
When condition_exp is a backreference number/name, both then_exp and
else_exp can be omitted.
Then it works as a backreference validity checker.
[ Backreference validity checker ] (* original)
(?(n)), (?(-n)), (?(+n)), (?(n+level)) ...
(?(<n>)), (?('-n')), (?(<+n>)) ...
(?(<name>)), (?('name')), (?(<name+level>)) ...
8. Backreferences
When we say "backreference a group," it actually means, "re-match the same
text matched by the subexp in that group."
\n \k<n> \k'n' (n >= 1) backreference the nth group in the regexp
\k<-n> \k'-n' (n >= 1) backreference the nth group counting
backwards from the referring position
\k<+n> \k'+n' (n >= 1) backreference the nth group counting
forwards from the referring position
\k<name> \k'name' backreference a group with the specified name
When backreferencing with a name that is assigned to more than one group,
the last group with the name is checked first; if it has not matched, then the
previous one with the name, and so on, until there is a match.
* Backreference by number is forbidden if any named group is defined and
ONIG_OPTION_CAPTURE_GROUP is not set.
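The relative forms \k<-n> and \k<+n> listed above can be read as simple arithmetic on group numbers. The sketch below is an illustration in plain C, not the library's parser: demo_relative_to_absolute is a hypothetical name, and it assumes that "counting from the referring position" means counting the capture groups whose opening parenthesis appears to the left of the backreference.

```c
#include <assert.h>

/* Converts a relative group number into an absolute one.
   groups_before: number of capture groups opened to the left of the
                  referring position (an assumption of this sketch).
   rel:           negative for \k<-n>, positive for \k<+n>.
   Returns the absolute group number, or -1 if the result is invalid. */
int demo_relative_to_absolute(int groups_before, int rel)
{
  int abs_no;

  assert(groups_before >= 0);
  if (rel == 0) return -1;  /* n >= 1 is required in both forms */

  if (rel > 0)
    abs_no = groups_before + rel;      /* counting forwards  */
  else
    abs_no = groups_before + 1 + rel;  /* counting backwards */

  return (abs_no >= 1) ? abs_no : -1;
}
```

Under that assumption, in a pattern such as (a)(b)\k<-1> there are two groups before the backreference, so \k<-1> resolves to group 2, \k<-2> to group 1, and \k<+1> to group 3 (which must then be defined later in the pattern).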
backreference with recursion level
(n >= 1, level >= 0)
\k<n+level> \k'n+level'
\k<n-level> \k'n-level'
\k<name+level> \k'name+level'
\k<name-level> \k'name-level'
Designates a group at a recursion level relative to the referring position.
ex 1.
/\A(?<a>|.|(?:(?<b>.)\g<a>\k<b>))\z/.match("reee")
/\A(?<a>|.|(?:(?<b>.)\g<a>\k<b+0>))\z/.match("reer")
\k<b+0> refers to the (?<b>.) on the same recursion level with it.
ex 2.
r = Regexp.compile(<<'__REGEXP__'.strip, Regexp::EXTENDED)
(?<element> \g<stag> \g<content>* \g<etag> ){0}
(?<stag> < \g<name> \s* > ){0}
(?<name> [a-zA-Z_:]+ ){0}
(?<content> [^<&]+ (\g<element> | [^<&]+)* ){0}
(?<etag> </ \k<name+1> >){0}
\g<element>
__REGEXP__
p r.match("<foo>f<bar>bbb</bar>f</foo>").captures
9. Subexp calls ("Tanaka Akira special") (* original function)
When we say "call a group," it actually means, "re-execute the subexp in
that group."
\g<n> \g'n' (n >= 1) call the nth group
\g<0> \g'0' call zero (call the total regexp)
\g<-n> \g'-n' (n >= 1) call the nth group counting backwards from
the calling position
\g<+n> \g'+n' (n >= 1) call the nth group counting forwards from
the calling position
\g<name> \g'name' call the group with the specified name
* Left-most recursive calls are not allowed.
ex. (?<name>a|\g<name>b) => error
(?<name>a|b\g<name>c) => OK
* Calls with a name that is assigned to more than one group are not
allowed.
* Call by number is forbidden if any named group is defined and
ONIG_OPTION_CAPTURE_GROUP is not set.
* The option status of the called group is always effective.
ex. /(?-i:\g<name>)(?i:(?<name>a)){0}/.match("A")
10. Captured group
Behavior of an unnamed group (...) changes with the following conditions.
(But named group is not changed.)
case 1. /.../ (named group is not used, no option)
(...) is treated as a capturing group.
case 2. /.../g (named group is not used, 'g' option)
(...) is treated as a non-capturing group (?:...).
case 3. /..(?<name>..)../ (named group is used, no option)
(...) is treated as a non-capturing group.
numbered-backref/call is not allowed.
case 4. /..(?<name>..)../G (named group is used, 'G' option)
(...) is treated as a capturing group.
numbered-backref/call is allowed.
where
g: ONIG_OPTION_DONT_CAPTURE_GROUP
G: ONIG_OPTION_CAPTURE_GROUP
('g' and 'G' options are discussed on the ruby-dev ML)
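The four cases above form a small decision table. As a rough sketch in plain C (the enum and function names are hypothetical and not part of the Oniguruma API), the table can be written out so that any combination the text does not list is reported as unspecified instead of being guessed:

```c
#include <assert.h>

typedef enum {
  DEMO_UNSPECIFIED   = -1,  /* combination not covered by cases 1 to 4 */
  DEMO_NON_CAPTURING =  0,
  DEMO_CAPTURING     =  1
} DemoGroupKind;

/* has_named: the pattern uses at least one named group
   opt_g:     ONIG_OPTION_DONT_CAPTURE_GROUP is set ('g')
   opt_G:     ONIG_OPTION_CAPTURE_GROUP is set ('G') */
DemoGroupKind demo_unnamed_group_kind(int has_named, int opt_g, int opt_G)
{
  if (!has_named && !opt_g && !opt_G) return DEMO_CAPTURING;      /* case 1 */
  if (!has_named &&  opt_g && !opt_G) return DEMO_NON_CAPTURING;  /* case 2 */
  if ( has_named && !opt_g && !opt_G) return DEMO_NON_CAPTURING;  /* case 3 */
  if ( has_named && !opt_g &&  opt_G) return DEMO_CAPTURING;      /* case 4 */
  return DEMO_UNSPECIFIED;
}

/* Walks through the four documented cases. */
void demo_check_documented_cases(void)
{
  assert(demo_unnamed_group_kind(0, 0, 0) == DEMO_CAPTURING);
  assert(demo_unnamed_group_kind(0, 1, 0) == DEMO_NON_CAPTURING);
  assert(demo_unnamed_group_kind(1, 0, 0) == DEMO_NON_CAPTURING);
  assert(demo_unnamed_group_kind(1, 0, 1) == DEMO_CAPTURING);
}
```

Returning DEMO_UNSPECIFIED for the remaining combinations is a choice of this sketch; how the library treats them is not stated in this section.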
-----------------------------
A-1. Syntax-dependent options
+ ONIG_SYNTAX_ONIGURUMA
(?m): dot (.) also matches newline
+ ONIG_SYNTAX_PERL and ONIG_SYNTAX_JAVA
(?s): dot (.) also matches newline
(?m): ^ matches after newline, $ matches before newline
A-2. Original extensions
+ hexadecimal digit char type \h, \H
+ true anychar \O
+ text segment boundary \y, \Y
+ backreference validity checker (?(...))
+ named group (?<name>...), (?'name'...)
+ named backref \k<name>
+ subexp call \g<name>, \g<group-num>
+ absent expression (?~|...|...)
+ absent stopper (?~|...)
A-3. Missing features compared with perl 5.8.0
+ \N{name}
+ \l,\u,\L,\U,\C
+ (??{code})
* \Q...\E
This is effective on ONIG_SYNTAX_PERL and ONIG_SYNTAX_JAVA.
A-4. Differences with Japanized GNU regex(version 0.12) of Ruby 1.8
+ add character property (\p{property}, \P{property})
+ add hexadecimal digit char type (\h, \H)
+ add look-behind
(?<=fixed-width-pattern), (?<!fixed-width-pattern)
+ add possessive quantifier. ?+, *+, ++
+ add operations in character class. [], &&
('[' must be escaped to be used as a normal character in a character class.)
+ add named group and subexp call.
+ octal or hexadecimal number sequence can be treated as
a multibyte code char in character class if multibyte encoding
is specified.
(ex. [\xa1\xa2], [\xa1\xa7-\xa4\xa1])
+ allow the range of single byte char and multibyte char in character
class.
ex. /[a-<<any EUC-JP character>>]/ in EUC-JP encoding.
+ effect range of isolated option is to next ')'.
ex. (?:(?i)a|b) is interpreted as (?:(?i:a|b)), not (?:(?i:a)|b).
+ isolated option is not transparent to previous pattern.
ex. a(?i)* is a syntax error pattern.
+ allowed unpaired left brace as a normal character.
ex. /{/, /({)/, /a{2,3/ etc...
+ negative POSIX bracket [:^xxxx:] is supported.
+ POSIX bracket [:ascii:] is added.
+ repeat of look-ahead is not allowed.
ex. /(?=a)*/, /(?!b){5}/
+ Ignore case option is effective to escape sequence.
ex. /\x61/i =~ "A"
+ In the range quantifier, the number of the minimum is optional.
/a{,n}/ == /a{0,n}/
The omission of both minimum and maximum values is not allowed.
/a{,}/
+ /{n}?/ is not a reluctant quantifier.
/a{n}?/ == /(?:a{n})?/
+ invalid back reference is checked and raises error.
/\1/, /(a)\2/
+ Zero-width match in an infinite loop stops the repeat,
then changes of the capture group status are checked as stop condition.
/(?:()|())*\1\2/ =~ ""
/(?:\1a|())*/ =~ "a"
// END


@ -30,6 +30,7 @@ Supported character encodings:
Master branch
-------------
* Update Unicode version 13.0.0
* NEW API: retry limit in search functions
* Limit on maximum nesting level of subexp call (16)
* Fixed behavior of isolated options in Perl and Java syntaxes. /...(?i).../

File diff suppressed because it is too large


@ -308,10 +308,15 @@ op2name(int opcode)
return "";
}
static void
p_after_op(FILE* f)
{
fputs(" ", f);
}
static void
p_string(FILE* f, int len, UChar* s)
{
fputs(":", f);
while (len-- > 0) { fputc(*s++, f); }
}
@ -320,16 +325,27 @@ p_len_string(FILE* f, LengthType len, int mb_len, UChar* s)
{
int x = len * mb_len;
fprintf(f, ":%d:", len);
fprintf(f, "len:%d ", len);
while (x-- > 0) { fputc(*s++, f); }
}
static void
p_rel_addr(FILE* f, RelAddrType rel_addr, Operation* p, Operation* start)
{
RelAddrType curr = (RelAddrType )(p - start);
char* flag;
char* space1;
char* space2;
RelAddrType curr;
AbsAddrType abs_addr;
fprintf(f, "{%d/%d}", rel_addr, curr + rel_addr);
curr = (RelAddrType )(p - start);
abs_addr = curr + rel_addr;
flag = rel_addr < 0 ? "" : "+";
space1 = rel_addr < 10 ? " " : "";
space2 = abs_addr < 10 ? " " : "";
fprintf(f, "%s%s%d => %s%d", space1, flag, rel_addr, space2, abs_addr);
}
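To see what the reworked p_rel_addr output looks like, the formatting logic can be lifted into a standalone function that writes to a buffer. This is a sketch only: demo_format_rel_addr is a hypothetical name, plain int stands in for RelAddrType and AbsAddrType, and the current operation index is passed in directly instead of being computed from Operation pointers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats "<rel> => <abs>" the way the new p_rel_addr does:
   a '+' sign for non-negative offsets and a padding space for
   values below 10. curr is the index of the current operation. */
void demo_format_rel_addr(char* buf, size_t size, int rel_addr, int curr)
{
  int abs_addr       = curr + rel_addr;
  const char* flag   = rel_addr < 0  ? ""  : "+";
  const char* space1 = rel_addr < 10 ? " " : "";
  const char* space2 = abs_addr < 10 ? " " : "";

  assert(buf != NULL && size > 0);
  snprintf(buf, size, "%s%s%d => %s%d",
           space1, flag, rel_addr, space2, abs_addr);
}

/* A few sample inputs with the strings they produce. */
void demo_rel_addr_examples(void)
{
  char buf[32];

  demo_format_rel_addr(buf, sizeof(buf), 5, 3);
  assert(strcmp(buf, " +5 =>  8") == 0);

  demo_format_rel_addr(buf, sizeof(buf), 12, 30);
  assert(strcmp(buf, "+12 => 42") == 0);

  demo_format_rel_addr(buf, sizeof(buf), -3, 12);
  assert(strcmp(buf, " -3 =>  9") == 0);
}
```

One detail visible in the sketch: the test rel_addr < 10 is also true for every negative offset, so an offset such as -12 receives the padding space as well and is one column wider than +12.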
static int
@ -356,6 +372,21 @@ static void
print_compiled_byte_code(FILE* f, regex_t* reg, int index,
Operation* start, OnigEncoding enc)
{
static char* SaveTypeNames[] = {
"KEEP",
"S",
"RIGHT_RANGE"
};
static char* UpdateVarTypeNames[] = {
"KEEP_FROM_STACK_LAST",
"S_FROM_STACK",
"RIGHT_RANGE_FROM_STACK",
"RIGHT_RANGE_FROM_S_STACK",
"RIGHT_RANGE_TO_S",
"RIGHT_RANGE_INIT"
};
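The name tables above are indexed directly by the type value stored in the operation. As a hedged sketch of a more defensive variant (not what this commit does), the lookup could go through a helper that range-checks the index first. DemoSaveTypeNames and demo_save_type_name are hypothetical names, and the sketch assumes the type values are 0, 1 and 2 in table order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy of the SaveTypeNames table from the diff, for illustration. */
static const char* const DemoSaveTypeNames[] = {
  "KEEP",
  "S",
  "RIGHT_RANGE"
};

/* Returns the printable name for a save type, or "?" when the value
   falls outside the table (assumes type values start at 0). */
const char* demo_save_type_name(int type)
{
  size_t n = sizeof(DemoSaveTypeNames) / sizeof(DemoSaveTypeNames[0]);

  if (type < 0 || (size_t )type >= n) return "?";
  return DemoSaveTypeNames[type];
}

/* Sample lookups, including one out-of-range value. */
void demo_save_type_name_examples(void)
{
  assert(strcmp(demo_save_type_name(0), "KEEP") == 0);
  assert(strcmp(demo_save_type_name(2), "RIGHT_RANGE") == 0);
  assert(strcmp(demo_save_type_name(3), "?") == 0);
}
```

For debug-only printing the direct indexing in the diff is probably adequate; the check mainly protects against a table that falls out of step with the enum it mirrors.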
int i, n;
RelAddrType addr;
LengthType len;
@ -371,6 +402,8 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
opcode = GET_OPCODE(reg, index);
fprintf(f, "%s", op2name(opcode));
p_after_op(f);
switch (opcode) {
case OP_STR_1:
p_string(f, 1, p->exact.s); break;
@ -404,7 +437,7 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
mb_len = p->exact_len_n.len;
len = p->exact_len_n.n;
q = p->exact_len_n.s;
fprintf(f, ":%d:%d:", mb_len, len);
fprintf(f, "mblen:%d len:%d ", mb_len, len);
n = len * mb_len;
while (n-- > 0) { fputc(*q++, f); }
}
@ -413,7 +446,7 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
case OP_CCLASS:
case OP_CCLASS_NOT:
n = bitset_on_num(p->cclass.bsp);
fprintf(f, ":%d", n);
fprintf(f, "n:%d", n);
break;
case OP_CCLASS_MB:
case OP_CCLASS_MB_NOT:
@ -425,7 +458,7 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
GET_CODE_POINT(ncode, codes);
codes++;
GET_CODE_POINT(code, codes);
fprintf(f, ":%d:0x%x", ncode, code);
fprintf(f, "n:%d code:0x%x", ncode, code);
}
break;
case OP_CCLASS_MIX:
@ -440,7 +473,7 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
GET_CODE_POINT(ncode, codes);
codes++;
GET_CODE_POINT(code, codes);
fprintf(f, ":%d:%u:%u", n, code, ncode);
fprintf(f, "nsg:%d code:%u nmb:%u", n, code, ncode);
}
break;
@ -454,19 +487,19 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
case OP_WORD_BEGIN:
case OP_WORD_END:
mode = p->word_boundary.mode;
fprintf(f, ":%d", mode);
fprintf(f, "mode:%d", mode);
break;
case OP_BACKREF_N:
case OP_BACKREF_N_IC:
mem = p->backref_n.n1;
fprintf(f, ":%d", mem);
fprintf(f, "n:%d", mem);
break;
case OP_BACKREF_MULTI_IC:
case OP_BACKREF_MULTI:
case OP_BACKREF_CHECK:
fputs(" ", f);
n = p->backref_general.num;
fprintf(f, "n:%d ", n);
for (i = 0; i < n; i++) {
mem = (n == 1) ? p->backref_general.n1 : p->backref_general.ns[i];
if (i > 0) fputs(", ", f);
@ -480,8 +513,7 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
LengthType level;
level = p->backref_general.nest_level;
fprintf(f, ":%d", level);
fputs(" ", f);
fprintf(f, "level:%d ", level);
n = p->backref_general.num;
for (i = 0; i < n; i++) {
mem = (n == 1) ? p->backref_general.n1 : p->backref_general.ns[i];
@ -494,7 +526,7 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
case OP_MEM_START:
case OP_MEM_START_PUSH:
mem = p->memory_start.num;
fprintf(f, ":%d", mem);
fprintf(f, "mem:%d", mem);
break;
case OP_MEM_END:
@ -504,35 +536,33 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
case OP_MEM_END_PUSH_REC:
#endif
mem = p->memory_end.num;
fprintf(f, ":%d", mem);
fprintf(f, "mem:%d", mem);
break;
case OP_JUMP:
addr = p->jump.addr;
fputc(':', f);
p_rel_addr(f, addr, p, start);
break;
case OP_PUSH:
case OP_PUSH_SUPER:
addr = p->push.addr;
fputc(':', f);
p_rel_addr(f, addr, p, start);
break;
#ifdef USE_OP_PUSH_OR_JUMP_EXACT
case OP_PUSH_OR_JUMP_EXACT1:
addr = p->push_or_jump_exact1.addr;
fputc(':', f);
p_rel_addr(f, addr, p, start);
fprintf(f, " c:");
p_string(f, 1, &(p->push_or_jump_exact1.c));
break;
#endif
case OP_PUSH_IF_PEEK_NEXT:
addr = p->push_if_peek_next.addr;
fputc(':', f);
p_rel_addr(f, addr, p, start);
fprintf(f, " c:");
p_string(f, 1, &(p->push_if_peek_next.c));
break;
@ -540,19 +570,19 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
case OP_REPEAT_NG:
mem = p->repeat.id;
addr = p->repeat.addr;
fprintf(f, ":%d:", mem);
fprintf(f, "id:%d ", mem);
p_rel_addr(f, addr, p, start);
break;
case OP_REPEAT_INC:
case OP_REPEAT_INC_NG:
mem = p->repeat.id;
fprintf(f, ":%d", mem);
fprintf(f, "id:%d", mem);
break;
case OP_EMPTY_CHECK_START:
mem = p->empty_check_start.mem;
fprintf(f, ":%d", mem);
fprintf(f, "id:%d", mem);
break;
case OP_EMPTY_CHECK_END:
case OP_EMPTY_CHECK_END_MEMST:
@ -560,23 +590,23 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
case OP_EMPTY_CHECK_END_MEMST_PUSH:
#endif
mem = p->empty_check_end.mem;
fprintf(f, ":%d", mem);
fprintf(f, "id:%d", mem);
break;
#ifdef USE_CALL
case OP_CALL:
addr = p->call.addr;
fprintf(f, ":{/%d}", addr);
fprintf(f, "=> %d", addr);
break;
#endif
case OP_MOVE:
fprintf(f, ":%d", p->move.n);
fprintf(f, "n:%d", p->move.n);
break;
case OP_STEP_BACK_START:
addr = p->step_back_start.addr;
fprintf(f, ":%d:%d:",
fprintf(f, "init:%d rem:%d ",
p->step_back_start.initial,
p->step_back_start.remaining);
p_rel_addr(f, addr, p, start);
@ -584,7 +614,7 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
case OP_POP_TO_MARK:
mem = p->pop_to_mark.id;
fprintf(f, ":%d", mem);
fprintf(f, "id:%d", mem);
break;
case OP_CUT_TO_MARK:
@ -593,7 +623,7 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
mem = p->cut_to_mark.id;
restore = p->cut_to_mark.restore_pos;
fprintf(f, ":%d:%d", mem, restore);
fprintf(f, "id:%d restore:%d", mem, restore);
}
break;
@ -603,7 +633,7 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
mem = p->mark.id;
save = p->mark.save_pos;
fprintf(f, ":%d:%d", mem, save);
fprintf(f, "id:%d save:%d", mem, save);
}
break;
@ -613,7 +643,7 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
type = p->save_val.type;
mem = p->save_val.id;
fprintf(f, ":%d:%d", type, mem);
fprintf(f, "%s id:%d", SaveTypeNames[type], mem);
}
break;
@ -625,17 +655,17 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
type = p->update_var.type;
mem = p->update_var.id;
clear = p->update_var.clear;
fprintf(f, ":%d:%d", type, mem);
fprintf(f, "%s id:%d", UpdateVarTypeNames[type], mem);
if (type == UPDATE_VAR_RIGHT_RANGE_FROM_S_STACK ||
type == UPDATE_VAR_RIGHT_RANGE_FROM_STACK)
fprintf(f, ":%d", clear);
fprintf(f, " clear:%d", clear);
}
break;
#ifdef USE_CALLOUT
case OP_CALLOUT_CONTENTS:
mem = p->callout_contents.num;
fprintf(f, ":%d", mem);
fprintf(f, "num:%d", mem);
break;
case OP_CALLOUT_NAME:
@ -644,22 +674,22 @@ print_compiled_byte_code(FILE* f, regex_t* reg, int index,
id = p->callout_name.id;
mem = p->callout_name.num;
fprintf(f, ":%d:%d", id, mem);
fprintf(f, "id:%d num:%d", id, mem);
}
break;
#endif
case OP_TEXT_SEGMENT_BOUNDARY:
if (p->text_segment_boundary.not != 0)
fprintf(f, ":not");
fprintf(f, " not");
break;
case OP_CHECK_POSITION:
switch (p->check_position.type) {
case CHECK_POSITION_SEARCH_START:
fprintf(f, ":search-start"); break;
fprintf(f, "search-start"); break;
case CHECK_POSITION_CURRENT_RIGHT_RANGE:
fprintf(f, ":current-right-range"); break;
fprintf(f, "current-right-range"); break;
default:
break;
};


@ -25,7 +25,7 @@
* SUCH DAMAGE.
*/
#define GRAPHEME_BREAK_PROPERTY_VERSION 120100
#define GRAPHEME_BREAK_PROPERTY_VERSION 130000
/*
CR
@ -43,7 +43,7 @@ V
ZWJ
*/
static int EGCB_RANGE_NUM = 1326;
static int EGCB_RANGE_NUM = 1344;
static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x000000, 0x000009, EGCB_Control },
{0x00000a, 0x00000a, EGCB_LF },
@ -136,7 +136,7 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x000b47, 0x000b48, EGCB_SpacingMark },
{0x000b4b, 0x000b4c, EGCB_SpacingMark },
{0x000b4d, 0x000b4d, EGCB_Extend },
{0x000b56, 0x000b57, EGCB_Extend },
{0x000b55, 0x000b57, EGCB_Extend },
{0x000b62, 0x000b63, EGCB_Extend },
{0x000b82, 0x000b82, EGCB_Extend },
{0x000bbe, 0x000bbe, EGCB_Extend },
@ -182,6 +182,7 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x000d4e, 0x000d4e, EGCB_Prepend },
{0x000d57, 0x000d57, EGCB_Extend },
{0x000d62, 0x000d63, EGCB_Extend },
{0x000d81, 0x000d81, EGCB_Extend },
{0x000d82, 0x000d83, EGCB_SpacingMark },
{0x000dca, 0x000dca, EGCB_Extend },
{0x000dcf, 0x000dcf, EGCB_Extend },
@ -267,7 +268,7 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x001a6d, 0x001a72, EGCB_SpacingMark },
{0x001a73, 0x001a7c, EGCB_Extend },
{0x001a7f, 0x001a7f, EGCB_Extend },
{0x001ab0, 0x001abe, EGCB_Extend },
{0x001ab0, 0x001ac0, EGCB_Extend },
{0x001b00, 0x001b03, EGCB_Extend },
{0x001b04, 0x001b04, EGCB_SpacingMark },
{0x001b34, 0x001b3a, EGCB_Extend },
@ -329,6 +330,7 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x00a823, 0x00a824, EGCB_SpacingMark },
{0x00a825, 0x00a826, EGCB_Extend },
{0x00a827, 0x00a827, EGCB_SpacingMark },
{0x00a82c, 0x00a82c, EGCB_Extend },
{0x00a880, 0x00a881, EGCB_SpacingMark },
{0x00a8b4, 0x00a8c3, EGCB_SpacingMark },
{0x00a8c4, 0x00a8c5, EGCB_Extend },
@ -1189,6 +1191,7 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x010a3f, 0x010a3f, EGCB_Extend },
{0x010ae5, 0x010ae6, EGCB_Extend },
{0x010d24, 0x010d27, EGCB_Extend },
{0x010eab, 0x010eac, EGCB_Extend },
{0x010f46, 0x010f50, EGCB_Extend },
{0x011000, 0x011000, EGCB_SpacingMark },
{0x011001, 0x011001, EGCB_Extend },
@ -1215,6 +1218,8 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x0111bf, 0x0111c0, EGCB_SpacingMark },
{0x0111c2, 0x0111c3, EGCB_Prepend },
{0x0111c9, 0x0111cc, EGCB_Extend },
{0x0111ce, 0x0111ce, EGCB_SpacingMark },
{0x0111cf, 0x0111cf, EGCB_Extend },
{0x01122c, 0x01122e, EGCB_SpacingMark },
{0x01122f, 0x011231, EGCB_Extend },
{0x011232, 0x011233, EGCB_SpacingMark },
@ -1286,6 +1291,17 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x01182f, 0x011837, EGCB_Extend },
{0x011838, 0x011838, EGCB_SpacingMark },
{0x011839, 0x01183a, EGCB_Extend },
{0x011930, 0x011930, EGCB_Extend },
{0x011931, 0x011935, EGCB_SpacingMark },
{0x011937, 0x011938, EGCB_SpacingMark },
{0x01193b, 0x01193c, EGCB_Extend },
{0x01193d, 0x01193d, EGCB_SpacingMark },
{0x01193e, 0x01193e, EGCB_Extend },
{0x01193f, 0x01193f, EGCB_Prepend },
{0x011940, 0x011940, EGCB_SpacingMark },
{0x011941, 0x011941, EGCB_Prepend },
{0x011942, 0x011942, EGCB_SpacingMark },
{0x011943, 0x011943, EGCB_Extend },
{0x0119d1, 0x0119d3, EGCB_SpacingMark },
{0x0119d4, 0x0119d7, EGCB_Extend },
{0x0119da, 0x0119db, EGCB_Extend },
@ -1337,6 +1353,8 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x016f4f, 0x016f4f, EGCB_Extend },
{0x016f51, 0x016f87, EGCB_SpacingMark },
{0x016f8f, 0x016f92, EGCB_Extend },
{0x016fe4, 0x016fe4, EGCB_Extend },
{0x016ff0, 0x016ff1, EGCB_SpacingMark },
{0x01bc9d, 0x01bc9e, EGCB_Extend },
{0x01bca0, 0x01bca3, EGCB_Control },
{0x01d165, 0x01d165, EGCB_Extend },
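Because the range table is sorted by code point and its ranges do not overlap, a break property can be found with a binary search. The sketch below shows the idea on a handful of rows copied from the hunks above. It is plain C with hypothetical names (DemoBreakType, DemoRange, demo_break_type), and it does not claim to reproduce the lookup routine the library itself uses.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  DEMO_OTHER = 0,        /* code point not covered by any range */
  DEMO_EXTEND,
  DEMO_SPACING_MARK
} DemoBreakType;

typedef struct {
  unsigned long start;   /* first code point of the range (inclusive) */
  unsigned long end;     /* last code point of the range (inclusive)  */
  DemoBreakType type;
} DemoRange;

/* A few rows taken from the diff; must stay sorted by start. */
static const DemoRange DEMO_RANGES[] = {
  {0x000b55, 0x000b57, DEMO_EXTEND       },
  {0x000b62, 0x000b63, DEMO_EXTEND       },
  {0x000d81, 0x000d81, DEMO_EXTEND       },
  {0x000d82, 0x000d83, DEMO_SPACING_MARK },
  {0x001ab0, 0x001ac0, DEMO_EXTEND       }
};
static const size_t DEMO_RANGE_NUM =
  sizeof(DEMO_RANGES) / sizeof(DEMO_RANGES[0]);

/* Binary search over the half-open index interval [low, high). */
DemoBreakType demo_break_type(unsigned long code)
{
  size_t low = 0;
  size_t high = DEMO_RANGE_NUM;

  while (low < high) {
    size_t mid = low + (high - low) / 2;

    assert(DEMO_RANGES[mid].start <= DEMO_RANGES[mid].end);
    if (code < DEMO_RANGES[mid].start)
      high = mid;
    else if (code > DEMO_RANGES[mid].end)
      low = mid + 1;
    else
      return DEMO_RANGES[mid].type;
  }
  return DEMO_OTHER;
}
```

With these rows, U+0B55 now falls inside the widened 0x000b55 to 0x000b57 range and reports Extend, while U+0D80, which no row covers, reports DEMO_OTHER.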

File diff suppressed because it is too large


@ -28,7 +28,7 @@
#include "regenc.h"
#define UNICODE_CASEFOLD_VERSION 120100
#define UNICODE_CASEFOLD_VERSION 130000
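The OnigUnicodeFolds1 rows in this file appear to be stored as flat records: the folded code point, a count n, then n code points that fold to it. The index comments advance by 3 for rows with a single source code point, which fits that reading. The sketch below walks such a table linearly to fold one code point. It is an illustration only: DemoFolds1 and demo_fold are hypothetical names, the rows are a small sample copied from the diff, and how the library actually searches this data is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Sample rows in the layout: folded, count, unfolded... */
static const unsigned long DemoFolds1[] = {
  0xff41,   1, 0xff21,    /* FULLWIDTH LATIN CAPITAL LETTER A */
  0xff42,   1, 0xff22,    /* FULLWIDTH LATIN CAPITAL LETTER B */
  0x010428, 1, 0x010400,  /* DESERET CAPITAL LETTER LONG I    */
  0x0104d8, 1, 0x0104b0   /* OSAGE CAPITAL LETTER A           */
};
static const size_t DEMO_FOLDS1_LEN =
  sizeof(DemoFolds1) / sizeof(DemoFolds1[0]);

/* Returns the folded form of code, or code itself if no record lists it.
   Linear scan, chosen for clarity rather than speed. */
unsigned long demo_fold(unsigned long code)
{
  size_t i = 0;

  while (i + 1 < DEMO_FOLDS1_LEN) {
    unsigned long folded = DemoFolds1[i];
    size_t n = (size_t )DemoFolds1[i + 1];
    size_t k;

    assert(i + 2 + n <= DEMO_FOLDS1_LEN);  /* record must fit the table */
    for (k = 0; k < n; k++) {
      if (DemoFolds1[i + 2 + k] == code) return folded;
    }
    i += 2 + n;  /* skip to the next record */
  }
  return code;
}
```

For example, U+FF21 folds to U+FF41 with this sample, and a code point that no record lists, such as U+0041 here, is returned unchanged.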
OnigCodePoint OnigUnicodeFolds1[] = {
@ -1132,262 +1132,265 @@ OnigCodePoint OnigUnicodeFolds1[] = {
/*3321*/ 0xa7bd, 1, 0xa7bc, /* LATIN CAPITAL LETTER GLOTTAL I */
/*3324*/ 0xa7bf, 1, 0xa7be, /* LATIN CAPITAL LETTER GLOTTAL U */
/*3327*/ 0xa7c3, 1, 0xa7c2, /* LATIN CAPITAL LETTER ANGLICANA W */
/*3330*/ 0xab53, 1, 0xa7b3, /* LATIN CAPITAL LETTER CHI */
/*3333*/ 0xff41, 1, 0xff21, /* FULLWIDTH LATIN CAPITAL LETTER A */
/*3336*/ 0xff42, 1, 0xff22, /* FULLWIDTH LATIN CAPITAL LETTER B */
/*3339*/ 0xff43, 1, 0xff23, /* FULLWIDTH LATIN CAPITAL LETTER C */
/*3342*/ 0xff44, 1, 0xff24, /* FULLWIDTH LATIN CAPITAL LETTER D */
/*3345*/ 0xff45, 1, 0xff25, /* FULLWIDTH LATIN CAPITAL LETTER E */
/*3348*/ 0xff46, 1, 0xff26, /* FULLWIDTH LATIN CAPITAL LETTER F */
/*3351*/ 0xff47, 1, 0xff27, /* FULLWIDTH LATIN CAPITAL LETTER G */
/*3354*/ 0xff48, 1, 0xff28, /* FULLWIDTH LATIN CAPITAL LETTER H */
/*3357*/ 0xff49, 1, 0xff29, /* FULLWIDTH LATIN CAPITAL LETTER I */
/*3360*/ 0xff4a, 1, 0xff2a, /* FULLWIDTH LATIN CAPITAL LETTER J */
/*3363*/ 0xff4b, 1, 0xff2b, /* FULLWIDTH LATIN CAPITAL LETTER K */
/*3366*/ 0xff4c, 1, 0xff2c, /* FULLWIDTH LATIN CAPITAL LETTER L */
/*3369*/ 0xff4d, 1, 0xff2d, /* FULLWIDTH LATIN CAPITAL LETTER M */
/*3372*/ 0xff4e, 1, 0xff2e, /* FULLWIDTH LATIN CAPITAL LETTER N */
/*3375*/ 0xff4f, 1, 0xff2f, /* FULLWIDTH LATIN CAPITAL LETTER O */
/*3378*/ 0xff50, 1, 0xff30, /* FULLWIDTH LATIN CAPITAL LETTER P */
/*3381*/ 0xff51, 1, 0xff31, /* FULLWIDTH LATIN CAPITAL LETTER Q */
/*3384*/ 0xff52, 1, 0xff32, /* FULLWIDTH LATIN CAPITAL LETTER R */
/*3387*/ 0xff53, 1, 0xff33, /* FULLWIDTH LATIN CAPITAL LETTER S */
/*3390*/ 0xff54, 1, 0xff34, /* FULLWIDTH LATIN CAPITAL LETTER T */
/*3393*/ 0xff55, 1, 0xff35, /* FULLWIDTH LATIN CAPITAL LETTER U */
/*3396*/ 0xff56, 1, 0xff36, /* FULLWIDTH LATIN CAPITAL LETTER V */
/*3399*/ 0xff57, 1, 0xff37, /* FULLWIDTH LATIN CAPITAL LETTER W */
/*3402*/ 0xff58, 1, 0xff38, /* FULLWIDTH LATIN CAPITAL LETTER X */
/*3405*/ 0xff59, 1, 0xff39, /* FULLWIDTH LATIN CAPITAL LETTER Y */
/*3408*/ 0xff5a, 1, 0xff3a, /* FULLWIDTH LATIN CAPITAL LETTER Z */
/*3411*/ 0x010428, 1, 0x010400, /* DESERET CAPITAL LETTER LONG I */
/*3414*/ 0x010429, 1, 0x010401, /* DESERET CAPITAL LETTER LONG E */
/*3417*/ 0x01042a, 1, 0x010402, /* DESERET CAPITAL LETTER LONG A */
/*3420*/ 0x01042b, 1, 0x010403, /* DESERET CAPITAL LETTER LONG AH */
/*3423*/ 0x01042c, 1, 0x010404, /* DESERET CAPITAL LETTER LONG O */
/*3426*/ 0x01042d, 1, 0x010405, /* DESERET CAPITAL LETTER LONG OO */
/*3429*/ 0x01042e, 1, 0x010406, /* DESERET CAPITAL LETTER SHORT I */
/*3432*/ 0x01042f, 1, 0x010407, /* DESERET CAPITAL LETTER SHORT E */
/*3435*/ 0x010430, 1, 0x010408, /* DESERET CAPITAL LETTER SHORT A */
/*3438*/ 0x010431, 1, 0x010409, /* DESERET CAPITAL LETTER SHORT AH */
/*3441*/ 0x010432, 1, 0x01040a, /* DESERET CAPITAL LETTER SHORT O */
/*3444*/ 0x010433, 1, 0x01040b, /* DESERET CAPITAL LETTER SHORT OO */
/*3447*/ 0x010434, 1, 0x01040c, /* DESERET CAPITAL LETTER AY */
/*3450*/ 0x010435, 1, 0x01040d, /* DESERET CAPITAL LETTER OW */
/*3453*/ 0x010436, 1, 0x01040e, /* DESERET CAPITAL LETTER WU */
/*3456*/ 0x010437, 1, 0x01040f, /* DESERET CAPITAL LETTER YEE */
/*3459*/ 0x010438, 1, 0x010410, /* DESERET CAPITAL LETTER H */
/*3462*/ 0x010439, 1, 0x010411, /* DESERET CAPITAL LETTER PEE */
/*3465*/ 0x01043a, 1, 0x010412, /* DESERET CAPITAL LETTER BEE */
/*3468*/ 0x01043b, 1, 0x010413, /* DESERET CAPITAL LETTER TEE */
/*3471*/ 0x01043c, 1, 0x010414, /* DESERET CAPITAL LETTER DEE */
/*3474*/ 0x01043d, 1, 0x010415, /* DESERET CAPITAL LETTER CHEE */
/*3477*/ 0x01043e, 1, 0x010416, /* DESERET CAPITAL LETTER JEE */
/*3480*/ 0x01043f, 1, 0x010417, /* DESERET CAPITAL LETTER KAY */
/*3483*/ 0x010440, 1, 0x010418, /* DESERET CAPITAL LETTER GAY */
/*3486*/ 0x010441, 1, 0x010419, /* DESERET CAPITAL LETTER EF */
/*3489*/ 0x010442, 1, 0x01041a, /* DESERET CAPITAL LETTER VEE */
/*3492*/ 0x010443, 1, 0x01041b, /* DESERET CAPITAL LETTER ETH */
/*3495*/ 0x010444, 1, 0x01041c, /* DESERET CAPITAL LETTER THEE */
/*3498*/ 0x010445, 1, 0x01041d, /* DESERET CAPITAL LETTER ES */
/*3501*/ 0x010446, 1, 0x01041e, /* DESERET CAPITAL LETTER ZEE */
/*3504*/ 0x010447, 1, 0x01041f, /* DESERET CAPITAL LETTER ESH */
/*3507*/ 0x010448, 1, 0x010420, /* DESERET CAPITAL LETTER ZHEE */
/*3510*/ 0x010449, 1, 0x010421, /* DESERET CAPITAL LETTER ER */
/*3513*/ 0x01044a, 1, 0x010422, /* DESERET CAPITAL LETTER EL */
/*3516*/ 0x01044b, 1, 0x010423, /* DESERET CAPITAL LETTER EM */
/*3519*/ 0x01044c, 1, 0x010424, /* DESERET CAPITAL LETTER EN */
/*3522*/ 0x01044d, 1, 0x010425, /* DESERET CAPITAL LETTER ENG */
/*3525*/ 0x01044e, 1, 0x010426, /* DESERET CAPITAL LETTER OI */
/*3528*/ 0x01044f, 1, 0x010427, /* DESERET CAPITAL LETTER EW */
/*3531*/ 0x0104d8, 1, 0x0104b0, /* OSAGE CAPITAL LETTER A */
/*3534*/ 0x0104d9, 1, 0x0104b1, /* OSAGE CAPITAL LETTER AI */
/*3537*/ 0x0104da, 1, 0x0104b2, /* OSAGE CAPITAL LETTER AIN */
/*3540*/ 0x0104db, 1, 0x0104b3, /* OSAGE CAPITAL LETTER AH */
/*3543*/ 0x0104dc, 1, 0x0104b4, /* OSAGE CAPITAL LETTER BRA */
/*3546*/ 0x0104dd, 1, 0x0104b5, /* OSAGE CAPITAL LETTER CHA */
/*3549*/ 0x0104de, 1, 0x0104b6, /* OSAGE CAPITAL LETTER EHCHA */
/*3552*/ 0x0104df, 1, 0x0104b7, /* OSAGE CAPITAL LETTER E */
/*3555*/ 0x0104e0, 1, 0x0104b8, /* OSAGE CAPITAL LETTER EIN */
/*3558*/ 0x0104e1, 1, 0x0104b9, /* OSAGE CAPITAL LETTER HA */
/*3561*/ 0x0104e2, 1, 0x0104ba, /* OSAGE CAPITAL LETTER HYA */
/*3564*/ 0x0104e3, 1, 0x0104bb, /* OSAGE CAPITAL LETTER I */
/*3567*/ 0x0104e4, 1, 0x0104bc, /* OSAGE CAPITAL LETTER KA */
/*3570*/ 0x0104e5, 1, 0x0104bd, /* OSAGE CAPITAL LETTER EHKA */
/*3573*/ 0x0104e6, 1, 0x0104be, /* OSAGE CAPITAL LETTER KYA */
/*3576*/ 0x0104e7, 1, 0x0104bf, /* OSAGE CAPITAL LETTER LA */
/*3579*/ 0x0104e8, 1, 0x0104c0, /* OSAGE CAPITAL LETTER MA */
/*3582*/ 0x0104e9, 1, 0x0104c1, /* OSAGE CAPITAL LETTER NA */
/*3585*/ 0x0104ea, 1, 0x0104c2, /* OSAGE CAPITAL LETTER O */
/*3588*/ 0x0104eb, 1, 0x0104c3, /* OSAGE CAPITAL LETTER OIN */
/*3591*/ 0x0104ec, 1, 0x0104c4, /* OSAGE CAPITAL LETTER PA */
/*3594*/ 0x0104ed, 1, 0x0104c5, /* OSAGE CAPITAL LETTER EHPA */
/*3597*/ 0x0104ee, 1, 0x0104c6, /* OSAGE CAPITAL LETTER SA */
/*3600*/ 0x0104ef, 1, 0x0104c7, /* OSAGE CAPITAL LETTER SHA */
/*3603*/ 0x0104f0, 1, 0x0104c8, /* OSAGE CAPITAL LETTER TA */
/*3606*/ 0x0104f1, 1, 0x0104c9, /* OSAGE CAPITAL LETTER EHTA */
/*3609*/ 0x0104f2, 1, 0x0104ca, /* OSAGE CAPITAL LETTER TSA */
/*3612*/ 0x0104f3, 1, 0x0104cb, /* OSAGE CAPITAL LETTER EHTSA */
/*3615*/ 0x0104f4, 1, 0x0104cc, /* OSAGE CAPITAL LETTER TSHA */
/*3618*/ 0x0104f5, 1, 0x0104cd, /* OSAGE CAPITAL LETTER DHA */
/*3621*/ 0x0104f6, 1, 0x0104ce, /* OSAGE CAPITAL LETTER U */
/*3624*/ 0x0104f7, 1, 0x0104cf, /* OSAGE CAPITAL LETTER WA */
/*3627*/ 0x0104f8, 1, 0x0104d0, /* OSAGE CAPITAL LETTER KHA */
/*3630*/ 0x0104f9, 1, 0x0104d1, /* OSAGE CAPITAL LETTER GHA */
/*3633*/ 0x0104fa, 1, 0x0104d2, /* OSAGE CAPITAL LETTER ZA */
/*3636*/ 0x0104fb, 1, 0x0104d3, /* OSAGE CAPITAL LETTER ZHA */
/*3639*/ 0x010cc0, 1, 0x010c80, /* OLD HUNGARIAN CAPITAL LETTER A */
/*3642*/ 0x010cc1, 1, 0x010c81, /* OLD HUNGARIAN CAPITAL LETTER AA */
/*3645*/ 0x010cc2, 1, 0x010c82, /* OLD HUNGARIAN CAPITAL LETTER EB */
/*3648*/ 0x010cc3, 1, 0x010c83, /* OLD HUNGARIAN CAPITAL LETTER AMB */
/*3651*/ 0x010cc4, 1, 0x010c84, /* OLD HUNGARIAN CAPITAL LETTER EC */
/*3654*/ 0x010cc5, 1, 0x010c85, /* OLD HUNGARIAN CAPITAL LETTER ENC */
/*3657*/ 0x010cc6, 1, 0x010c86, /* OLD HUNGARIAN CAPITAL LETTER ECS */
/*3660*/ 0x010cc7, 1, 0x010c87, /* OLD HUNGARIAN CAPITAL LETTER ED */
/*3663*/ 0x010cc8, 1, 0x010c88, /* OLD HUNGARIAN CAPITAL LETTER AND */
/*3666*/ 0x010cc9, 1, 0x010c89, /* OLD HUNGARIAN CAPITAL LETTER E */
/*3669*/ 0x010cca, 1, 0x010c8a, /* OLD HUNGARIAN CAPITAL LETTER CLOS.. */
/*3672*/ 0x010ccb, 1, 0x010c8b, /* OLD HUNGARIAN CAPITAL LETTER EE */
/*3675*/ 0x010ccc, 1, 0x010c8c, /* OLD HUNGARIAN CAPITAL LETTER EF */
/*3678*/ 0x010ccd, 1, 0x010c8d, /* OLD HUNGARIAN CAPITAL LETTER EG */
/*3681*/ 0x010cce, 1, 0x010c8e, /* OLD HUNGARIAN CAPITAL LETTER EGY */
/*3684*/ 0x010ccf, 1, 0x010c8f, /* OLD HUNGARIAN CAPITAL LETTER EH */
/*3687*/ 0x010cd0, 1, 0x010c90, /* OLD HUNGARIAN CAPITAL LETTER I */
/*3690*/ 0x010cd1, 1, 0x010c91, /* OLD HUNGARIAN CAPITAL LETTER II */
/*3693*/ 0x010cd2, 1, 0x010c92, /* OLD HUNGARIAN CAPITAL LETTER EJ */
/*3696*/ 0x010cd3, 1, 0x010c93, /* OLD HUNGARIAN CAPITAL LETTER EK */
/*3699*/ 0x010cd4, 1, 0x010c94, /* OLD HUNGARIAN CAPITAL LETTER AK */
/*3702*/ 0x010cd5, 1, 0x010c95, /* OLD HUNGARIAN CAPITAL LETTER UNK */
/*3705*/ 0x010cd6, 1, 0x010c96, /* OLD HUNGARIAN CAPITAL LETTER EL */
/*3708*/ 0x010cd7, 1, 0x010c97, /* OLD HUNGARIAN CAPITAL LETTER ELY */
/*3711*/ 0x010cd8, 1, 0x010c98, /* OLD HUNGARIAN CAPITAL LETTER EM */
/*3714*/ 0x010cd9, 1, 0x010c99, /* OLD HUNGARIAN CAPITAL LETTER EN */
/*3717*/ 0x010cda, 1, 0x010c9a, /* OLD HUNGARIAN CAPITAL LETTER ENY */
/*3720*/ 0x010cdb, 1, 0x010c9b, /* OLD HUNGARIAN CAPITAL LETTER O */
/*3723*/ 0x010cdc, 1, 0x010c9c, /* OLD HUNGARIAN CAPITAL LETTER OO */
/*3726*/ 0x010cdd, 1, 0x010c9d, /* OLD HUNGARIAN CAPITAL LETTER NIKO.. */
/*3729*/ 0x010cde, 1, 0x010c9e, /* OLD HUNGARIAN CAPITAL LETTER RUDI.. */
/*3732*/ 0x010cdf, 1, 0x010c9f, /* OLD HUNGARIAN CAPITAL LETTER OEE */
/*3735*/ 0x010ce0, 1, 0x010ca0, /* OLD HUNGARIAN CAPITAL LETTER EP */
/*3738*/ 0x010ce1, 1, 0x010ca1, /* OLD HUNGARIAN CAPITAL LETTER EMP */
/*3741*/ 0x010ce2, 1, 0x010ca2, /* OLD HUNGARIAN CAPITAL LETTER ER */
/*3744*/ 0x010ce3, 1, 0x010ca3, /* OLD HUNGARIAN CAPITAL LETTER SHOR.. */
/*3747*/ 0x010ce4, 1, 0x010ca4, /* OLD HUNGARIAN CAPITAL LETTER ES */
/*3750*/ 0x010ce5, 1, 0x010ca5, /* OLD HUNGARIAN CAPITAL LETTER ESZ */
/*3753*/ 0x010ce6, 1, 0x010ca6, /* OLD HUNGARIAN CAPITAL LETTER ET */
/*3756*/ 0x010ce7, 1, 0x010ca7, /* OLD HUNGARIAN CAPITAL LETTER ENT */
/*3759*/ 0x010ce8, 1, 0x010ca8, /* OLD HUNGARIAN CAPITAL LETTER ETY */
/*3762*/ 0x010ce9, 1, 0x010ca9, /* OLD HUNGARIAN CAPITAL LETTER ECH */
/*3765*/ 0x010cea, 1, 0x010caa, /* OLD HUNGARIAN CAPITAL LETTER U */
/*3768*/ 0x010ceb, 1, 0x010cab, /* OLD HUNGARIAN CAPITAL LETTER UU */
/*3771*/ 0x010cec, 1, 0x010cac, /* OLD HUNGARIAN CAPITAL LETTER NIKO.. */
/*3774*/ 0x010ced, 1, 0x010cad, /* OLD HUNGARIAN CAPITAL LETTER RUDI.. */
/*3777*/ 0x010cee, 1, 0x010cae, /* OLD HUNGARIAN CAPITAL LETTER EV */
/*3780*/ 0x010cef, 1, 0x010caf, /* OLD HUNGARIAN CAPITAL LETTER EZ */
/*3783*/ 0x010cf0, 1, 0x010cb0, /* OLD HUNGARIAN CAPITAL LETTER EZS */
/*3786*/ 0x010cf1, 1, 0x010cb1, /* OLD HUNGARIAN CAPITAL LETTER ENT-.. */
/*3789*/ 0x010cf2, 1, 0x010cb2, /* OLD HUNGARIAN CAPITAL LETTER US */
/*3792*/ 0x0118c0, 1, 0x0118a0, /* WARANG CITI CAPITAL LETTER NGAA */
/*3795*/ 0x0118c1, 1, 0x0118a1, /* WARANG CITI CAPITAL LETTER A */
/*3798*/ 0x0118c2, 1, 0x0118a2, /* WARANG CITI CAPITAL LETTER WI */
/*3801*/ 0x0118c3, 1, 0x0118a3, /* WARANG CITI CAPITAL LETTER YU */
/*3804*/ 0x0118c4, 1, 0x0118a4, /* WARANG CITI CAPITAL LETTER YA */
/*3807*/ 0x0118c5, 1, 0x0118a5, /* WARANG CITI CAPITAL LETTER YO */
/*3810*/ 0x0118c6, 1, 0x0118a6, /* WARANG CITI CAPITAL LETTER II */
/*3813*/ 0x0118c7, 1, 0x0118a7, /* WARANG CITI CAPITAL LETTER UU */
/*3816*/ 0x0118c8, 1, 0x0118a8, /* WARANG CITI CAPITAL LETTER E */
/*3819*/ 0x0118c9, 1, 0x0118a9, /* WARANG CITI CAPITAL LETTER O */
/*3822*/ 0x0118ca, 1, 0x0118aa, /* WARANG CITI CAPITAL LETTER ANG */
/*3825*/ 0x0118cb, 1, 0x0118ab, /* WARANG CITI CAPITAL LETTER GA */
/*3828*/ 0x0118cc, 1, 0x0118ac, /* WARANG CITI CAPITAL LETTER KO */
/*3831*/ 0x0118cd, 1, 0x0118ad, /* WARANG CITI CAPITAL LETTER ENY */
/*3834*/ 0x0118ce, 1, 0x0118ae, /* WARANG CITI CAPITAL LETTER YUJ */
/*3837*/ 0x0118cf, 1, 0x0118af, /* WARANG CITI CAPITAL LETTER UC */
/*3840*/ 0x0118d0, 1, 0x0118b0, /* WARANG CITI CAPITAL LETTER ENN */
/*3843*/ 0x0118d1, 1, 0x0118b1, /* WARANG CITI CAPITAL LETTER ODD */
/*3846*/ 0x0118d2, 1, 0x0118b2, /* WARANG CITI CAPITAL LETTER TTE */
/*3849*/ 0x0118d3, 1, 0x0118b3, /* WARANG CITI CAPITAL LETTER NUNG */
/*3852*/ 0x0118d4, 1, 0x0118b4, /* WARANG CITI CAPITAL LETTER DA */
/*3855*/ 0x0118d5, 1, 0x0118b5, /* WARANG CITI CAPITAL LETTER AT */
/*3858*/ 0x0118d6, 1, 0x0118b6, /* WARANG CITI CAPITAL LETTER AM */
/*3861*/ 0x0118d7, 1, 0x0118b7, /* WARANG CITI CAPITAL LETTER BU */
/*3864*/ 0x0118d8, 1, 0x0118b8, /* WARANG CITI CAPITAL LETTER PU */
/*3867*/ 0x0118d9, 1, 0x0118b9, /* WARANG CITI CAPITAL LETTER HIYO */
/*3870*/ 0x0118da, 1, 0x0118ba, /* WARANG CITI CAPITAL LETTER HOLO */
/*3873*/ 0x0118db, 1, 0x0118bb, /* WARANG CITI CAPITAL LETTER HORR */
/*3876*/ 0x0118dc, 1, 0x0118bc, /* WARANG CITI CAPITAL LETTER HAR */
/*3879*/ 0x0118dd, 1, 0x0118bd, /* WARANG CITI CAPITAL LETTER SSUU */
/*3882*/ 0x0118de, 1, 0x0118be, /* WARANG CITI CAPITAL LETTER SII */
/*3885*/ 0x0118df, 1, 0x0118bf, /* WARANG CITI CAPITAL LETTER VIYO */
/*3888*/ 0x016e60, 1, 0x016e40, /* MEDEFAIDRIN CAPITAL LETTER M */
/*3891*/ 0x016e61, 1, 0x016e41, /* MEDEFAIDRIN CAPITAL LETTER S */
/*3894*/ 0x016e62, 1, 0x016e42, /* MEDEFAIDRIN CAPITAL LETTER V */
/*3897*/ 0x016e63, 1, 0x016e43, /* MEDEFAIDRIN CAPITAL LETTER W */
/*3900*/ 0x016e64, 1, 0x016e44, /* MEDEFAIDRIN CAPITAL LETTER ATIU */
/*3903*/ 0x016e65, 1, 0x016e45, /* MEDEFAIDRIN CAPITAL LETTER Z */
/*3906*/ 0x016e66, 1, 0x016e46, /* MEDEFAIDRIN CAPITAL LETTER KP */
/*3909*/ 0x016e67, 1, 0x016e47, /* MEDEFAIDRIN CAPITAL LETTER P */
/*3912*/ 0x016e68, 1, 0x016e48, /* MEDEFAIDRIN CAPITAL LETTER T */
/*3915*/ 0x016e69, 1, 0x016e49, /* MEDEFAIDRIN CAPITAL LETTER G */
/*3918*/ 0x016e6a, 1, 0x016e4a, /* MEDEFAIDRIN CAPITAL LETTER F */
/*3921*/ 0x016e6b, 1, 0x016e4b, /* MEDEFAIDRIN CAPITAL LETTER I */
/*3924*/ 0x016e6c, 1, 0x016e4c, /* MEDEFAIDRIN CAPITAL LETTER K */
/*3927*/ 0x016e6d, 1, 0x016e4d, /* MEDEFAIDRIN CAPITAL LETTER A */
/*3930*/ 0x016e6e, 1, 0x016e4e, /* MEDEFAIDRIN CAPITAL LETTER J */
/*3933*/ 0x016e6f, 1, 0x016e4f, /* MEDEFAIDRIN CAPITAL LETTER E */
/*3936*/ 0x016e70, 1, 0x016e50, /* MEDEFAIDRIN CAPITAL LETTER B */
/*3939*/ 0x016e71, 1, 0x016e51, /* MEDEFAIDRIN CAPITAL LETTER C */
/*3942*/ 0x016e72, 1, 0x016e52, /* MEDEFAIDRIN CAPITAL LETTER U */
/*3945*/ 0x016e73, 1, 0x016e53, /* MEDEFAIDRIN CAPITAL LETTER YU */
/*3948*/ 0x016e74, 1, 0x016e54, /* MEDEFAIDRIN CAPITAL LETTER L */
/*3951*/ 0x016e75, 1, 0x016e55, /* MEDEFAIDRIN CAPITAL LETTER Q */
/*3954*/ 0x016e76, 1, 0x016e56, /* MEDEFAIDRIN CAPITAL LETTER HP */
/*3957*/ 0x016e77, 1, 0x016e57, /* MEDEFAIDRIN CAPITAL LETTER NY */
/*3960*/ 0x016e78, 1, 0x016e58, /* MEDEFAIDRIN CAPITAL LETTER X */
/*3963*/ 0x016e79, 1, 0x016e59, /* MEDEFAIDRIN CAPITAL LETTER D */
/*3966*/ 0x016e7a, 1, 0x016e5a, /* MEDEFAIDRIN CAPITAL LETTER OE */
/*3969*/ 0x016e7b, 1, 0x016e5b, /* MEDEFAIDRIN CAPITAL LETTER N */
/*3972*/ 0x016e7c, 1, 0x016e5c, /* MEDEFAIDRIN CAPITAL LETTER R */
/*3975*/ 0x016e7d, 1, 0x016e5d, /* MEDEFAIDRIN CAPITAL LETTER O */
/*3978*/ 0x016e7e, 1, 0x016e5e, /* MEDEFAIDRIN CAPITAL LETTER AI */
/*3981*/ 0x016e7f, 1, 0x016e5f, /* MEDEFAIDRIN CAPITAL LETTER Y */
/*3984*/ 0x01e922, 1, 0x01e900, /* ADLAM CAPITAL LETTER ALIF */
/*3987*/ 0x01e923, 1, 0x01e901, /* ADLAM CAPITAL LETTER DAALI */
/*3990*/ 0x01e924, 1, 0x01e902, /* ADLAM CAPITAL LETTER LAAM */
/*3993*/ 0x01e925, 1, 0x01e903, /* ADLAM CAPITAL LETTER MIIM */
/*3996*/ 0x01e926, 1, 0x01e904, /* ADLAM CAPITAL LETTER BA */
/*3999*/ 0x01e927, 1, 0x01e905, /* ADLAM CAPITAL LETTER SINNYIIYHE */
/*4002*/ 0x01e928, 1, 0x01e906, /* ADLAM CAPITAL LETTER PE */
/*4005*/ 0x01e929, 1, 0x01e907, /* ADLAM CAPITAL LETTER BHE */
/*4008*/ 0x01e92a, 1, 0x01e908, /* ADLAM CAPITAL LETTER RA */
/*4011*/ 0x01e92b, 1, 0x01e909, /* ADLAM CAPITAL LETTER E */
/*4014*/ 0x01e92c, 1, 0x01e90a, /* ADLAM CAPITAL LETTER FA */
/*4017*/ 0x01e92d, 1, 0x01e90b, /* ADLAM CAPITAL LETTER I */
/*4020*/ 0x01e92e, 1, 0x01e90c, /* ADLAM CAPITAL LETTER O */
/*4023*/ 0x01e92f, 1, 0x01e90d, /* ADLAM CAPITAL LETTER DHA */
/*4026*/ 0x01e930, 1, 0x01e90e, /* ADLAM CAPITAL LETTER YHE */
/*4029*/ 0x01e931, 1, 0x01e90f, /* ADLAM CAPITAL LETTER WAW */
/*4032*/ 0x01e932, 1, 0x01e910, /* ADLAM CAPITAL LETTER NUN */
/*4035*/ 0x01e933, 1, 0x01e911, /* ADLAM CAPITAL LETTER KAF */
/*4038*/ 0x01e934, 1, 0x01e912, /* ADLAM CAPITAL LETTER YA */
/*4041*/ 0x01e935, 1, 0x01e913, /* ADLAM CAPITAL LETTER U */
/*4044*/ 0x01e936, 1, 0x01e914, /* ADLAM CAPITAL LETTER JIIM */
/*4047*/ 0x01e937, 1, 0x01e915, /* ADLAM CAPITAL LETTER CHI */
/*4050*/ 0x01e938, 1, 0x01e916, /* ADLAM CAPITAL LETTER HA */
/*4053*/ 0x01e939, 1, 0x01e917, /* ADLAM CAPITAL LETTER QAAF */
/*4056*/ 0x01e93a, 1, 0x01e918, /* ADLAM CAPITAL LETTER GA */
/*4059*/ 0x01e93b, 1, 0x01e919, /* ADLAM CAPITAL LETTER NYA */
/*4062*/ 0x01e93c, 1, 0x01e91a, /* ADLAM CAPITAL LETTER TU */
/*4065*/ 0x01e93d, 1, 0x01e91b, /* ADLAM CAPITAL LETTER NHA */
/*4068*/ 0x01e93e, 1, 0x01e91c, /* ADLAM CAPITAL LETTER VA */
/*4071*/ 0x01e93f, 1, 0x01e91d, /* ADLAM CAPITAL LETTER KHA */
/*4074*/ 0x01e940, 1, 0x01e91e, /* ADLAM CAPITAL LETTER GBE */
/*4077*/ 0x01e941, 1, 0x01e91f, /* ADLAM CAPITAL LETTER ZAL */
/*4080*/ 0x01e942, 1, 0x01e920, /* ADLAM CAPITAL LETTER KPO */
/*4083*/ 0x01e943, 1, 0x01e921, /* ADLAM CAPITAL LETTER SHA */
#define FOLDS1_NORMAL_END_INDEX 4086
/*3330*/ 0xa7c8, 1, 0xa7c7, /* LATIN CAPITAL LETTER D WITH SHORT.. */
/*3333*/ 0xa7ca, 1, 0xa7c9, /* LATIN CAPITAL LETTER S WITH SHORT.. */
/*3336*/ 0xa7f6, 1, 0xa7f5, /* LATIN CAPITAL LETTER REVERSED HAL.. */
/*3339*/ 0xab53, 1, 0xa7b3, /* LATIN CAPITAL LETTER CHI */
/*3342*/ 0xff41, 1, 0xff21, /* FULLWIDTH LATIN CAPITAL LETTER A */
/*3345*/ 0xff42, 1, 0xff22, /* FULLWIDTH LATIN CAPITAL LETTER B */
/*3348*/ 0xff43, 1, 0xff23, /* FULLWIDTH LATIN CAPITAL LETTER C */
/*3351*/ 0xff44, 1, 0xff24, /* FULLWIDTH LATIN CAPITAL LETTER D */
/*3354*/ 0xff45, 1, 0xff25, /* FULLWIDTH LATIN CAPITAL LETTER E */
/*3357*/ 0xff46, 1, 0xff26, /* FULLWIDTH LATIN CAPITAL LETTER F */
/*3360*/ 0xff47, 1, 0xff27, /* FULLWIDTH LATIN CAPITAL LETTER G */
/*3363*/ 0xff48, 1, 0xff28, /* FULLWIDTH LATIN CAPITAL LETTER H */
/*3366*/ 0xff49, 1, 0xff29, /* FULLWIDTH LATIN CAPITAL LETTER I */
/*3369*/ 0xff4a, 1, 0xff2a, /* FULLWIDTH LATIN CAPITAL LETTER J */
/*3372*/ 0xff4b, 1, 0xff2b, /* FULLWIDTH LATIN CAPITAL LETTER K */
/*3375*/ 0xff4c, 1, 0xff2c, /* FULLWIDTH LATIN CAPITAL LETTER L */
/*3378*/ 0xff4d, 1, 0xff2d, /* FULLWIDTH LATIN CAPITAL LETTER M */
/*3381*/ 0xff4e, 1, 0xff2e, /* FULLWIDTH LATIN CAPITAL LETTER N */
/*3384*/ 0xff4f, 1, 0xff2f, /* FULLWIDTH LATIN CAPITAL LETTER O */
/*3387*/ 0xff50, 1, 0xff30, /* FULLWIDTH LATIN CAPITAL LETTER P */
/*3390*/ 0xff51, 1, 0xff31, /* FULLWIDTH LATIN CAPITAL LETTER Q */
/*3393*/ 0xff52, 1, 0xff32, /* FULLWIDTH LATIN CAPITAL LETTER R */
/*3396*/ 0xff53, 1, 0xff33, /* FULLWIDTH LATIN CAPITAL LETTER S */
/*3399*/ 0xff54, 1, 0xff34, /* FULLWIDTH LATIN CAPITAL LETTER T */
/*3402*/ 0xff55, 1, 0xff35, /* FULLWIDTH LATIN CAPITAL LETTER U */
/*3405*/ 0xff56, 1, 0xff36, /* FULLWIDTH LATIN CAPITAL LETTER V */
/*3408*/ 0xff57, 1, 0xff37, /* FULLWIDTH LATIN CAPITAL LETTER W */
/*3411*/ 0xff58, 1, 0xff38, /* FULLWIDTH LATIN CAPITAL LETTER X */
/*3414*/ 0xff59, 1, 0xff39, /* FULLWIDTH LATIN CAPITAL LETTER Y */
/*3417*/ 0xff5a, 1, 0xff3a, /* FULLWIDTH LATIN CAPITAL LETTER Z */
/*3420*/ 0x010428, 1, 0x010400, /* DESERET CAPITAL LETTER LONG I */
/*3423*/ 0x010429, 1, 0x010401, /* DESERET CAPITAL LETTER LONG E */
/*3426*/ 0x01042a, 1, 0x010402, /* DESERET CAPITAL LETTER LONG A */
/*3429*/ 0x01042b, 1, 0x010403, /* DESERET CAPITAL LETTER LONG AH */
/*3432*/ 0x01042c, 1, 0x010404, /* DESERET CAPITAL LETTER LONG O */
/*3435*/ 0x01042d, 1, 0x010405, /* DESERET CAPITAL LETTER LONG OO */
/*3438*/ 0x01042e, 1, 0x010406, /* DESERET CAPITAL LETTER SHORT I */
/*3441*/ 0x01042f, 1, 0x010407, /* DESERET CAPITAL LETTER SHORT E */
/*3444*/ 0x010430, 1, 0x010408, /* DESERET CAPITAL LETTER SHORT A */
/*3447*/ 0x010431, 1, 0x010409, /* DESERET CAPITAL LETTER SHORT AH */
/*3450*/ 0x010432, 1, 0x01040a, /* DESERET CAPITAL LETTER SHORT O */
/*3453*/ 0x010433, 1, 0x01040b, /* DESERET CAPITAL LETTER SHORT OO */
/*3456*/ 0x010434, 1, 0x01040c, /* DESERET CAPITAL LETTER AY */
/*3459*/ 0x010435, 1, 0x01040d, /* DESERET CAPITAL LETTER OW */
/*3462*/ 0x010436, 1, 0x01040e, /* DESERET CAPITAL LETTER WU */
/*3465*/ 0x010437, 1, 0x01040f, /* DESERET CAPITAL LETTER YEE */
/*3468*/ 0x010438, 1, 0x010410, /* DESERET CAPITAL LETTER H */
/*3471*/ 0x010439, 1, 0x010411, /* DESERET CAPITAL LETTER PEE */
/*3474*/ 0x01043a, 1, 0x010412, /* DESERET CAPITAL LETTER BEE */
/*3477*/ 0x01043b, 1, 0x010413, /* DESERET CAPITAL LETTER TEE */
/*3480*/ 0x01043c, 1, 0x010414, /* DESERET CAPITAL LETTER DEE */
/*3483*/ 0x01043d, 1, 0x010415, /* DESERET CAPITAL LETTER CHEE */
/*3486*/ 0x01043e, 1, 0x010416, /* DESERET CAPITAL LETTER JEE */
/*3489*/ 0x01043f, 1, 0x010417, /* DESERET CAPITAL LETTER KAY */
/*3492*/ 0x010440, 1, 0x010418, /* DESERET CAPITAL LETTER GAY */
/*3495*/ 0x010441, 1, 0x010419, /* DESERET CAPITAL LETTER EF */
/*3498*/ 0x010442, 1, 0x01041a, /* DESERET CAPITAL LETTER VEE */
/*3501*/ 0x010443, 1, 0x01041b, /* DESERET CAPITAL LETTER ETH */
/*3504*/ 0x010444, 1, 0x01041c, /* DESERET CAPITAL LETTER THEE */
/*3507*/ 0x010445, 1, 0x01041d, /* DESERET CAPITAL LETTER ES */
/*3510*/ 0x010446, 1, 0x01041e, /* DESERET CAPITAL LETTER ZEE */
/*3513*/ 0x010447, 1, 0x01041f, /* DESERET CAPITAL LETTER ESH */
/*3516*/ 0x010448, 1, 0x010420, /* DESERET CAPITAL LETTER ZHEE */
/*3519*/ 0x010449, 1, 0x010421, /* DESERET CAPITAL LETTER ER */
/*3522*/ 0x01044a, 1, 0x010422, /* DESERET CAPITAL LETTER EL */
/*3525*/ 0x01044b, 1, 0x010423, /* DESERET CAPITAL LETTER EM */
/*3528*/ 0x01044c, 1, 0x010424, /* DESERET CAPITAL LETTER EN */
/*3531*/ 0x01044d, 1, 0x010425, /* DESERET CAPITAL LETTER ENG */
/*3534*/ 0x01044e, 1, 0x010426, /* DESERET CAPITAL LETTER OI */
/*3537*/ 0x01044f, 1, 0x010427, /* DESERET CAPITAL LETTER EW */
/*3540*/ 0x0104d8, 1, 0x0104b0, /* OSAGE CAPITAL LETTER A */
/*3543*/ 0x0104d9, 1, 0x0104b1, /* OSAGE CAPITAL LETTER AI */
/*3546*/ 0x0104da, 1, 0x0104b2, /* OSAGE CAPITAL LETTER AIN */
/*3549*/ 0x0104db, 1, 0x0104b3, /* OSAGE CAPITAL LETTER AH */
/*3552*/ 0x0104dc, 1, 0x0104b4, /* OSAGE CAPITAL LETTER BRA */
/*3555*/ 0x0104dd, 1, 0x0104b5, /* OSAGE CAPITAL LETTER CHA */
/*3558*/ 0x0104de, 1, 0x0104b6, /* OSAGE CAPITAL LETTER EHCHA */
/*3561*/ 0x0104df, 1, 0x0104b7, /* OSAGE CAPITAL LETTER E */
/*3564*/ 0x0104e0, 1, 0x0104b8, /* OSAGE CAPITAL LETTER EIN */
/*3567*/ 0x0104e1, 1, 0x0104b9, /* OSAGE CAPITAL LETTER HA */
/*3570*/ 0x0104e2, 1, 0x0104ba, /* OSAGE CAPITAL LETTER HYA */
/*3573*/ 0x0104e3, 1, 0x0104bb, /* OSAGE CAPITAL LETTER I */
/*3576*/ 0x0104e4, 1, 0x0104bc, /* OSAGE CAPITAL LETTER KA */
/*3579*/ 0x0104e5, 1, 0x0104bd, /* OSAGE CAPITAL LETTER EHKA */
/*3582*/ 0x0104e6, 1, 0x0104be, /* OSAGE CAPITAL LETTER KYA */
/*3585*/ 0x0104e7, 1, 0x0104bf, /* OSAGE CAPITAL LETTER LA */
/*3588*/ 0x0104e8, 1, 0x0104c0, /* OSAGE CAPITAL LETTER MA */
/*3591*/ 0x0104e9, 1, 0x0104c1, /* OSAGE CAPITAL LETTER NA */
/*3594*/ 0x0104ea, 1, 0x0104c2, /* OSAGE CAPITAL LETTER O */
/*3597*/ 0x0104eb, 1, 0x0104c3, /* OSAGE CAPITAL LETTER OIN */
/*3600*/ 0x0104ec, 1, 0x0104c4, /* OSAGE CAPITAL LETTER PA */
/*3603*/ 0x0104ed, 1, 0x0104c5, /* OSAGE CAPITAL LETTER EHPA */
/*3606*/ 0x0104ee, 1, 0x0104c6, /* OSAGE CAPITAL LETTER SA */
/*3609*/ 0x0104ef, 1, 0x0104c7, /* OSAGE CAPITAL LETTER SHA */
/*3612*/ 0x0104f0, 1, 0x0104c8, /* OSAGE CAPITAL LETTER TA */
/*3615*/ 0x0104f1, 1, 0x0104c9, /* OSAGE CAPITAL LETTER EHTA */
/*3618*/ 0x0104f2, 1, 0x0104ca, /* OSAGE CAPITAL LETTER TSA */
/*3621*/ 0x0104f3, 1, 0x0104cb, /* OSAGE CAPITAL LETTER EHTSA */
/*3624*/ 0x0104f4, 1, 0x0104cc, /* OSAGE CAPITAL LETTER TSHA */
/*3627*/ 0x0104f5, 1, 0x0104cd, /* OSAGE CAPITAL LETTER DHA */
/*3630*/ 0x0104f6, 1, 0x0104ce, /* OSAGE CAPITAL LETTER U */
/*3633*/ 0x0104f7, 1, 0x0104cf, /* OSAGE CAPITAL LETTER WA */
/*3636*/ 0x0104f8, 1, 0x0104d0, /* OSAGE CAPITAL LETTER KHA */
/*3639*/ 0x0104f9, 1, 0x0104d1, /* OSAGE CAPITAL LETTER GHA */
/*3642*/ 0x0104fa, 1, 0x0104d2, /* OSAGE CAPITAL LETTER ZA */
/*3645*/ 0x0104fb, 1, 0x0104d3, /* OSAGE CAPITAL LETTER ZHA */
/*3648*/ 0x010cc0, 1, 0x010c80, /* OLD HUNGARIAN CAPITAL LETTER A */
/*3651*/ 0x010cc1, 1, 0x010c81, /* OLD HUNGARIAN CAPITAL LETTER AA */
/*3654*/ 0x010cc2, 1, 0x010c82, /* OLD HUNGARIAN CAPITAL LETTER EB */
/*3657*/ 0x010cc3, 1, 0x010c83, /* OLD HUNGARIAN CAPITAL LETTER AMB */
/*3660*/ 0x010cc4, 1, 0x010c84, /* OLD HUNGARIAN CAPITAL LETTER EC */
/*3663*/ 0x010cc5, 1, 0x010c85, /* OLD HUNGARIAN CAPITAL LETTER ENC */
/*3666*/ 0x010cc6, 1, 0x010c86, /* OLD HUNGARIAN CAPITAL LETTER ECS */
/*3669*/ 0x010cc7, 1, 0x010c87, /* OLD HUNGARIAN CAPITAL LETTER ED */
/*3672*/ 0x010cc8, 1, 0x010c88, /* OLD HUNGARIAN CAPITAL LETTER AND */
/*3675*/ 0x010cc9, 1, 0x010c89, /* OLD HUNGARIAN CAPITAL LETTER E */
/*3678*/ 0x010cca, 1, 0x010c8a, /* OLD HUNGARIAN CAPITAL LETTER CLOS.. */
/*3681*/ 0x010ccb, 1, 0x010c8b, /* OLD HUNGARIAN CAPITAL LETTER EE */
/*3684*/ 0x010ccc, 1, 0x010c8c, /* OLD HUNGARIAN CAPITAL LETTER EF */
/*3687*/ 0x010ccd, 1, 0x010c8d, /* OLD HUNGARIAN CAPITAL LETTER EG */
/*3690*/ 0x010cce, 1, 0x010c8e, /* OLD HUNGARIAN CAPITAL LETTER EGY */
/*3693*/ 0x010ccf, 1, 0x010c8f, /* OLD HUNGARIAN CAPITAL LETTER EH */
/*3696*/ 0x010cd0, 1, 0x010c90, /* OLD HUNGARIAN CAPITAL LETTER I */
/*3699*/ 0x010cd1, 1, 0x010c91, /* OLD HUNGARIAN CAPITAL LETTER II */
/*3702*/ 0x010cd2, 1, 0x010c92, /* OLD HUNGARIAN CAPITAL LETTER EJ */
/*3705*/ 0x010cd3, 1, 0x010c93, /* OLD HUNGARIAN CAPITAL LETTER EK */
/*3708*/ 0x010cd4, 1, 0x010c94, /* OLD HUNGARIAN CAPITAL LETTER AK */
/*3711*/ 0x010cd5, 1, 0x010c95, /* OLD HUNGARIAN CAPITAL LETTER UNK */
/*3714*/ 0x010cd6, 1, 0x010c96, /* OLD HUNGARIAN CAPITAL LETTER EL */
/*3717*/ 0x010cd7, 1, 0x010c97, /* OLD HUNGARIAN CAPITAL LETTER ELY */
/*3720*/ 0x010cd8, 1, 0x010c98, /* OLD HUNGARIAN CAPITAL LETTER EM */
/*3723*/ 0x010cd9, 1, 0x010c99, /* OLD HUNGARIAN CAPITAL LETTER EN */
/*3726*/ 0x010cda, 1, 0x010c9a, /* OLD HUNGARIAN CAPITAL LETTER ENY */
/*3729*/ 0x010cdb, 1, 0x010c9b, /* OLD HUNGARIAN CAPITAL LETTER O */
/*3732*/ 0x010cdc, 1, 0x010c9c, /* OLD HUNGARIAN CAPITAL LETTER OO */
/*3735*/ 0x010cdd, 1, 0x010c9d, /* OLD HUNGARIAN CAPITAL LETTER NIKO.. */
/*3738*/ 0x010cde, 1, 0x010c9e, /* OLD HUNGARIAN CAPITAL LETTER RUDI.. */
/*3741*/ 0x010cdf, 1, 0x010c9f, /* OLD HUNGARIAN CAPITAL LETTER OEE */
/*3744*/ 0x010ce0, 1, 0x010ca0, /* OLD HUNGARIAN CAPITAL LETTER EP */
/*3747*/ 0x010ce1, 1, 0x010ca1, /* OLD HUNGARIAN CAPITAL LETTER EMP */
/*3750*/ 0x010ce2, 1, 0x010ca2, /* OLD HUNGARIAN CAPITAL LETTER ER */
/*3753*/ 0x010ce3, 1, 0x010ca3, /* OLD HUNGARIAN CAPITAL LETTER SHOR.. */
/*3756*/ 0x010ce4, 1, 0x010ca4, /* OLD HUNGARIAN CAPITAL LETTER ES */
/*3759*/ 0x010ce5, 1, 0x010ca5, /* OLD HUNGARIAN CAPITAL LETTER ESZ */
/*3762*/ 0x010ce6, 1, 0x010ca6, /* OLD HUNGARIAN CAPITAL LETTER ET */
/*3765*/ 0x010ce7, 1, 0x010ca7, /* OLD HUNGARIAN CAPITAL LETTER ENT */
/*3768*/ 0x010ce8, 1, 0x010ca8, /* OLD HUNGARIAN CAPITAL LETTER ETY */
/*3771*/ 0x010ce9, 1, 0x010ca9, /* OLD HUNGARIAN CAPITAL LETTER ECH */
/*3774*/ 0x010cea, 1, 0x010caa, /* OLD HUNGARIAN CAPITAL LETTER U */
/*3777*/ 0x010ceb, 1, 0x010cab, /* OLD HUNGARIAN CAPITAL LETTER UU */
/*3780*/ 0x010cec, 1, 0x010cac, /* OLD HUNGARIAN CAPITAL LETTER NIKO.. */
/*3783*/ 0x010ced, 1, 0x010cad, /* OLD HUNGARIAN CAPITAL LETTER RUDI.. */
/*3786*/ 0x010cee, 1, 0x010cae, /* OLD HUNGARIAN CAPITAL LETTER EV */
/*3789*/ 0x010cef, 1, 0x010caf, /* OLD HUNGARIAN CAPITAL LETTER EZ */
/*3792*/ 0x010cf0, 1, 0x010cb0, /* OLD HUNGARIAN CAPITAL LETTER EZS */
/*3795*/ 0x010cf1, 1, 0x010cb1, /* OLD HUNGARIAN CAPITAL LETTER ENT-.. */
/*3798*/ 0x010cf2, 1, 0x010cb2, /* OLD HUNGARIAN CAPITAL LETTER US */
/*3801*/ 0x0118c0, 1, 0x0118a0, /* WARANG CITI CAPITAL LETTER NGAA */
/*3804*/ 0x0118c1, 1, 0x0118a1, /* WARANG CITI CAPITAL LETTER A */
/*3807*/ 0x0118c2, 1, 0x0118a2, /* WARANG CITI CAPITAL LETTER WI */
/*3810*/ 0x0118c3, 1, 0x0118a3, /* WARANG CITI CAPITAL LETTER YU */
/*3813*/ 0x0118c4, 1, 0x0118a4, /* WARANG CITI CAPITAL LETTER YA */
/*3816*/ 0x0118c5, 1, 0x0118a5, /* WARANG CITI CAPITAL LETTER YO */
/*3819*/ 0x0118c6, 1, 0x0118a6, /* WARANG CITI CAPITAL LETTER II */
/*3822*/ 0x0118c7, 1, 0x0118a7, /* WARANG CITI CAPITAL LETTER UU */
/*3825*/ 0x0118c8, 1, 0x0118a8, /* WARANG CITI CAPITAL LETTER E */
/*3828*/ 0x0118c9, 1, 0x0118a9, /* WARANG CITI CAPITAL LETTER O */
/*3831*/ 0x0118ca, 1, 0x0118aa, /* WARANG CITI CAPITAL LETTER ANG */
/*3834*/ 0x0118cb, 1, 0x0118ab, /* WARANG CITI CAPITAL LETTER GA */
/*3837*/ 0x0118cc, 1, 0x0118ac, /* WARANG CITI CAPITAL LETTER KO */
/*3840*/ 0x0118cd, 1, 0x0118ad, /* WARANG CITI CAPITAL LETTER ENY */
/*3843*/ 0x0118ce, 1, 0x0118ae, /* WARANG CITI CAPITAL LETTER YUJ */
/*3846*/ 0x0118cf, 1, 0x0118af, /* WARANG CITI CAPITAL LETTER UC */
/*3849*/ 0x0118d0, 1, 0x0118b0, /* WARANG CITI CAPITAL LETTER ENN */
/*3852*/ 0x0118d1, 1, 0x0118b1, /* WARANG CITI CAPITAL LETTER ODD */
/*3855*/ 0x0118d2, 1, 0x0118b2, /* WARANG CITI CAPITAL LETTER TTE */
/*3858*/ 0x0118d3, 1, 0x0118b3, /* WARANG CITI CAPITAL LETTER NUNG */
/*3861*/ 0x0118d4, 1, 0x0118b4, /* WARANG CITI CAPITAL LETTER DA */
/*3864*/ 0x0118d5, 1, 0x0118b5, /* WARANG CITI CAPITAL LETTER AT */
/*3867*/ 0x0118d6, 1, 0x0118b6, /* WARANG CITI CAPITAL LETTER AM */
/*3870*/ 0x0118d7, 1, 0x0118b7, /* WARANG CITI CAPITAL LETTER BU */
/*3873*/ 0x0118d8, 1, 0x0118b8, /* WARANG CITI CAPITAL LETTER PU */
/*3876*/ 0x0118d9, 1, 0x0118b9, /* WARANG CITI CAPITAL LETTER HIYO */
/*3879*/ 0x0118da, 1, 0x0118ba, /* WARANG CITI CAPITAL LETTER HOLO */
/*3882*/ 0x0118db, 1, 0x0118bb, /* WARANG CITI CAPITAL LETTER HORR */
/*3885*/ 0x0118dc, 1, 0x0118bc, /* WARANG CITI CAPITAL LETTER HAR */
/*3888*/ 0x0118dd, 1, 0x0118bd, /* WARANG CITI CAPITAL LETTER SSUU */
/*3891*/ 0x0118de, 1, 0x0118be, /* WARANG CITI CAPITAL LETTER SII */
/*3894*/ 0x0118df, 1, 0x0118bf, /* WARANG CITI CAPITAL LETTER VIYO */
/*3897*/ 0x016e60, 1, 0x016e40, /* MEDEFAIDRIN CAPITAL LETTER M */
/*3900*/ 0x016e61, 1, 0x016e41, /* MEDEFAIDRIN CAPITAL LETTER S */
/*3903*/ 0x016e62, 1, 0x016e42, /* MEDEFAIDRIN CAPITAL LETTER V */
/*3906*/ 0x016e63, 1, 0x016e43, /* MEDEFAIDRIN CAPITAL LETTER W */
/*3909*/ 0x016e64, 1, 0x016e44, /* MEDEFAIDRIN CAPITAL LETTER ATIU */
/*3912*/ 0x016e65, 1, 0x016e45, /* MEDEFAIDRIN CAPITAL LETTER Z */
/*3915*/ 0x016e66, 1, 0x016e46, /* MEDEFAIDRIN CAPITAL LETTER KP */
/*3918*/ 0x016e67, 1, 0x016e47, /* MEDEFAIDRIN CAPITAL LETTER P */
/*3921*/ 0x016e68, 1, 0x016e48, /* MEDEFAIDRIN CAPITAL LETTER T */
/*3924*/ 0x016e69, 1, 0x016e49, /* MEDEFAIDRIN CAPITAL LETTER G */
/*3927*/ 0x016e6a, 1, 0x016e4a, /* MEDEFAIDRIN CAPITAL LETTER F */
/*3930*/ 0x016e6b, 1, 0x016e4b, /* MEDEFAIDRIN CAPITAL LETTER I */
/*3933*/ 0x016e6c, 1, 0x016e4c, /* MEDEFAIDRIN CAPITAL LETTER K */
/*3936*/ 0x016e6d, 1, 0x016e4d, /* MEDEFAIDRIN CAPITAL LETTER A */
/*3939*/ 0x016e6e, 1, 0x016e4e, /* MEDEFAIDRIN CAPITAL LETTER J */
/*3942*/ 0x016e6f, 1, 0x016e4f, /* MEDEFAIDRIN CAPITAL LETTER E */
/*3945*/ 0x016e70, 1, 0x016e50, /* MEDEFAIDRIN CAPITAL LETTER B */
/*3948*/ 0x016e71, 1, 0x016e51, /* MEDEFAIDRIN CAPITAL LETTER C */
/*3951*/ 0x016e72, 1, 0x016e52, /* MEDEFAIDRIN CAPITAL LETTER U */
/*3954*/ 0x016e73, 1, 0x016e53, /* MEDEFAIDRIN CAPITAL LETTER YU */
/*3957*/ 0x016e74, 1, 0x016e54, /* MEDEFAIDRIN CAPITAL LETTER L */
/*3960*/ 0x016e75, 1, 0x016e55, /* MEDEFAIDRIN CAPITAL LETTER Q */
/*3963*/ 0x016e76, 1, 0x016e56, /* MEDEFAIDRIN CAPITAL LETTER HP */
/*3966*/ 0x016e77, 1, 0x016e57, /* MEDEFAIDRIN CAPITAL LETTER NY */
/*3969*/ 0x016e78, 1, 0x016e58, /* MEDEFAIDRIN CAPITAL LETTER X */
/*3972*/ 0x016e79, 1, 0x016e59, /* MEDEFAIDRIN CAPITAL LETTER D */
/*3975*/ 0x016e7a, 1, 0x016e5a, /* MEDEFAIDRIN CAPITAL LETTER OE */
/*3978*/ 0x016e7b, 1, 0x016e5b, /* MEDEFAIDRIN CAPITAL LETTER N */
/*3981*/ 0x016e7c, 1, 0x016e5c, /* MEDEFAIDRIN CAPITAL LETTER R */
/*3984*/ 0x016e7d, 1, 0x016e5d, /* MEDEFAIDRIN CAPITAL LETTER O */
/*3987*/ 0x016e7e, 1, 0x016e5e, /* MEDEFAIDRIN CAPITAL LETTER AI */
/*3990*/ 0x016e7f, 1, 0x016e5f, /* MEDEFAIDRIN CAPITAL LETTER Y */
/*3993*/ 0x01e922, 1, 0x01e900, /* ADLAM CAPITAL LETTER ALIF */
/*3996*/ 0x01e923, 1, 0x01e901, /* ADLAM CAPITAL LETTER DAALI */
/*3999*/ 0x01e924, 1, 0x01e902, /* ADLAM CAPITAL LETTER LAAM */
/*4002*/ 0x01e925, 1, 0x01e903, /* ADLAM CAPITAL LETTER MIIM */
/*4005*/ 0x01e926, 1, 0x01e904, /* ADLAM CAPITAL LETTER BA */
/*4008*/ 0x01e927, 1, 0x01e905, /* ADLAM CAPITAL LETTER SINNYIIYHE */
/*4011*/ 0x01e928, 1, 0x01e906, /* ADLAM CAPITAL LETTER PE */
/*4014*/ 0x01e929, 1, 0x01e907, /* ADLAM CAPITAL LETTER BHE */
/*4017*/ 0x01e92a, 1, 0x01e908, /* ADLAM CAPITAL LETTER RA */
/*4020*/ 0x01e92b, 1, 0x01e909, /* ADLAM CAPITAL LETTER E */
/*4023*/ 0x01e92c, 1, 0x01e90a, /* ADLAM CAPITAL LETTER FA */
/*4026*/ 0x01e92d, 1, 0x01e90b, /* ADLAM CAPITAL LETTER I */
/*4029*/ 0x01e92e, 1, 0x01e90c, /* ADLAM CAPITAL LETTER O */
/*4032*/ 0x01e92f, 1, 0x01e90d, /* ADLAM CAPITAL LETTER DHA */
/*4035*/ 0x01e930, 1, 0x01e90e, /* ADLAM CAPITAL LETTER YHE */
/*4038*/ 0x01e931, 1, 0x01e90f, /* ADLAM CAPITAL LETTER WAW */
/*4041*/ 0x01e932, 1, 0x01e910, /* ADLAM CAPITAL LETTER NUN */
/*4044*/ 0x01e933, 1, 0x01e911, /* ADLAM CAPITAL LETTER KAF */
/*4047*/ 0x01e934, 1, 0x01e912, /* ADLAM CAPITAL LETTER YA */
/*4050*/ 0x01e935, 1, 0x01e913, /* ADLAM CAPITAL LETTER U */
/*4053*/ 0x01e936, 1, 0x01e914, /* ADLAM CAPITAL LETTER JIIM */
/*4056*/ 0x01e937, 1, 0x01e915, /* ADLAM CAPITAL LETTER CHI */
/*4059*/ 0x01e938, 1, 0x01e916, /* ADLAM CAPITAL LETTER HA */
/*4062*/ 0x01e939, 1, 0x01e917, /* ADLAM CAPITAL LETTER QAAF */
/*4065*/ 0x01e93a, 1, 0x01e918, /* ADLAM CAPITAL LETTER GA */
/*4068*/ 0x01e93b, 1, 0x01e919, /* ADLAM CAPITAL LETTER NYA */
/*4071*/ 0x01e93c, 1, 0x01e91a, /* ADLAM CAPITAL LETTER TU */
/*4074*/ 0x01e93d, 1, 0x01e91b, /* ADLAM CAPITAL LETTER NHA */
/*4077*/ 0x01e93e, 1, 0x01e91c, /* ADLAM CAPITAL LETTER VA */
/*4080*/ 0x01e93f, 1, 0x01e91d, /* ADLAM CAPITAL LETTER KHA */
/*4083*/ 0x01e940, 1, 0x01e91e, /* ADLAM CAPITAL LETTER GBE */
/*4086*/ 0x01e941, 1, 0x01e91f, /* ADLAM CAPITAL LETTER ZAL */
/*4089*/ 0x01e942, 1, 0x01e920, /* ADLAM CAPITAL LETTER KPO */
/*4092*/ 0x01e943, 1, 0x01e921, /* ADLAM CAPITAL LETTER SHA */
#define FOLDS1_NORMAL_END_INDEX 4095
/* ----- LOCALE ----- */
/*4086*/ 0x0069, 1, 0x0049, /* LATIN CAPITAL LETTER I */
#define FOLDS1_END_INDEX 4089
/*4095*/ 0x0069, 1, 0x0049, /* LATIN CAPITAL LETTER I */
#define FOLDS1_END_INDEX 4098
};
OnigCodePoint OnigUnicodeFolds2[] = {

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -25,7 +25,7 @@
* SUCH DAMAGE.
*/
#define WORD_BREAK_PROPERTY_VERSION 120100
#define WORD_BREAK_PROPERTY_VERSION 130000
/*
ALetter
@ -48,7 +48,7 @@ WSegSpace
ZWJ
*/
static int WB_RANGE_NUM = 970;
static int WB_RANGE_NUM = 993;
static WB_RANGE_TYPE WB_RANGES[] = {
{0x00000a, 0x00000a, WB_LF },
{0x00000b, 0x00000c, WB_Newline },
@ -73,8 +73,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x0000c0, 0x0000d6, WB_ALetter },
{0x0000d8, 0x0000f6, WB_ALetter },
{0x0000f8, 0x0002d7, WB_ALetter },
{0x0002de, 0x0002e4, WB_ALetter },
{0x0002ec, 0x0002ff, WB_ALetter },
{0x0002de, 0x0002ff, WB_ALetter },
{0x000300, 0x00036f, WB_Extend },
{0x000370, 0x000374, WB_ALetter },
{0x000376, 0x000377, WB_ALetter },
@ -91,11 +90,12 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x000483, 0x000489, WB_Extend },
{0x00048a, 0x00052f, WB_ALetter },
{0x000531, 0x000556, WB_ALetter },
{0x000559, 0x000559, WB_ALetter },
{0x00055b, 0x00055c, WB_ALetter },
{0x000559, 0x00055c, WB_ALetter },
{0x00055e, 0x00055e, WB_ALetter },
{0x00055f, 0x00055f, WB_MidLetter },
{0x000560, 0x000588, WB_ALetter },
{0x000589, 0x000589, WB_MidNum },
{0x00058a, 0x00058a, WB_ALetter },
{0x000591, 0x0005bd, WB_Extend },
{0x0005bf, 0x0005bf, WB_Extend },
{0x0005c1, 0x0005c2, WB_Extend },
@ -155,7 +155,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x000859, 0x00085b, WB_Extend },
{0x000860, 0x00086a, WB_ALetter },
{0x0008a0, 0x0008b4, WB_ALetter },
{0x0008b6, 0x0008bd, WB_ALetter },
{0x0008b6, 0x0008c7, WB_ALetter },
{0x0008d3, 0x0008e1, WB_Extend },
{0x0008e2, 0x0008e2, WB_Format },
{0x0008e3, 0x000903, WB_Extend },
@ -239,7 +239,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x000b3e, 0x000b44, WB_Extend },
{0x000b47, 0x000b48, WB_Extend },
{0x000b4b, 0x000b4d, WB_Extend },
{0x000b56, 0x000b57, WB_Extend },
{0x000b55, 0x000b57, WB_Extend },
{0x000b5c, 0x000b5d, WB_ALetter },
{0x000b5f, 0x000b61, WB_ALetter },
{0x000b62, 0x000b63, WB_Extend },
@ -295,7 +295,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x000ce6, 0x000cef, WB_Numeric },
{0x000cf1, 0x000cf2, WB_ALetter },
{0x000d00, 0x000d03, WB_Extend },
{0x000d05, 0x000d0c, WB_ALetter },
{0x000d04, 0x000d0c, WB_ALetter },
{0x000d0e, 0x000d10, WB_ALetter },
{0x000d12, 0x000d3a, WB_ALetter },
{0x000d3b, 0x000d3c, WB_Extend },
@ -310,7 +310,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x000d62, 0x000d63, WB_Extend },
{0x000d66, 0x000d6f, WB_Numeric },
{0x000d7a, 0x000d7f, WB_ALetter },
{0x000d82, 0x000d83, WB_Extend },
{0x000d81, 0x000d83, WB_Extend },
{0x000d85, 0x000d96, WB_ALetter },
{0x000d9a, 0x000db1, WB_ALetter },
{0x000db3, 0x000dbb, WB_ALetter },
@ -421,7 +421,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x001a7f, 0x001a7f, WB_Extend },
{0x001a80, 0x001a89, WB_Numeric },
{0x001a90, 0x001a99, WB_Numeric },
{0x001ab0, 0x001abe, WB_Extend },
{0x001ab0, 0x001ac0, WB_Extend },
{0x001b00, 0x001b04, WB_Extend },
{0x001b05, 0x001b33, WB_ALetter },
{0x001b34, 0x001b44, WB_Extend },
@ -545,7 +545,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x0030fc, 0x0030ff, WB_Katakana },
{0x003105, 0x00312f, WB_ALetter },
{0x003131, 0x00318e, WB_ALetter },
{0x0031a0, 0x0031ba, WB_ALetter },
{0x0031a0, 0x0031bf, WB_ALetter },
{0x0031f0, 0x0031ff, WB_Katakana },
{0x0032d0, 0x0032fe, WB_Katakana },
{0x003300, 0x003357, WB_Katakana },
@ -562,9 +562,9 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x00a69e, 0x00a69f, WB_Extend },
{0x00a6a0, 0x00a6ef, WB_ALetter },
{0x00a6f0, 0x00a6f1, WB_Extend },
{0x00a717, 0x00a7bf, WB_ALetter },
{0x00a7c2, 0x00a7c6, WB_ALetter },
{0x00a7f7, 0x00a801, WB_ALetter },
{0x00a708, 0x00a7bf, WB_ALetter },
{0x00a7c2, 0x00a7ca, WB_ALetter },
{0x00a7f5, 0x00a801, WB_ALetter },
{0x00a802, 0x00a802, WB_Extend },
{0x00a803, 0x00a805, WB_ALetter },
{0x00a806, 0x00a806, WB_Extend },
@ -572,6 +572,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x00a80b, 0x00a80b, WB_Extend },
{0x00a80c, 0x00a822, WB_ALetter },
{0x00a823, 0x00a827, WB_Extend },
{0x00a82c, 0x00a82c, WB_Extend },
{0x00a840, 0x00a873, WB_ALetter },
{0x00a880, 0x00a881, WB_Extend },
{0x00a882, 0x00a8b3, WB_ALetter },
@ -617,7 +618,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x00ab11, 0x00ab16, WB_ALetter },
{0x00ab20, 0x00ab26, WB_ALetter },
{0x00ab28, 0x00ab2e, WB_ALetter },
{0x00ab30, 0x00ab67, WB_ALetter },
{0x00ab30, 0x00ab69, WB_ALetter },
{0x00ab70, 0x00abe2, WB_ALetter },
{0x00abe3, 0x00abea, WB_Extend },
{0x00abec, 0x00abed, WB_Extend },
@ -739,10 +740,14 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x010d00, 0x010d23, WB_ALetter },
{0x010d24, 0x010d27, WB_Extend },
{0x010d30, 0x010d39, WB_Numeric },
{0x010e80, 0x010ea9, WB_ALetter },
{0x010eab, 0x010eac, WB_Extend },
{0x010eb0, 0x010eb1, WB_ALetter },
{0x010f00, 0x010f1c, WB_ALetter },
{0x010f27, 0x010f27, WB_ALetter },
{0x010f30, 0x010f45, WB_ALetter },
{0x010f46, 0x010f50, WB_Extend },
{0x010fb0, 0x010fc4, WB_ALetter },
{0x010fe0, 0x010ff6, WB_ALetter },
{0x011000, 0x011002, WB_Extend },
{0x011003, 0x011037, WB_ALetter },
@ -761,6 +766,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x011136, 0x01113f, WB_Numeric },
{0x011144, 0x011144, WB_ALetter },
{0x011145, 0x011146, WB_Extend },
{0x011147, 0x011147, WB_ALetter },
{0x011150, 0x011172, WB_ALetter },
{0x011173, 0x011173, WB_Extend },
{0x011176, 0x011176, WB_ALetter },
@ -769,6 +775,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x0111b3, 0x0111c0, WB_Extend },
{0x0111c1, 0x0111c4, WB_ALetter },
{0x0111c9, 0x0111cc, WB_Extend },
{0x0111ce, 0x0111cf, WB_Extend },
{0x0111d0, 0x0111d9, WB_Numeric },
{0x0111da, 0x0111da, WB_ALetter },
{0x0111dc, 0x0111dc, WB_ALetter },
@ -807,7 +814,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x011447, 0x01144a, WB_ALetter },
{0x011450, 0x011459, WB_Numeric },
{0x01145e, 0x01145e, WB_Extend },
{0x01145f, 0x01145f, WB_ALetter },
{0x01145f, 0x011461, WB_ALetter },
{0x011480, 0x0114af, WB_ALetter },
{0x0114b0, 0x0114c3, WB_Extend },
{0x0114c4, 0x0114c5, WB_ALetter },
@ -832,7 +839,19 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x01182c, 0x01183a, WB_Extend },
{0x0118a0, 0x0118df, WB_ALetter },
{0x0118e0, 0x0118e9, WB_Numeric },
{0x0118ff, 0x0118ff, WB_ALetter },
{0x0118ff, 0x011906, WB_ALetter },
{0x011909, 0x011909, WB_ALetter },
{0x01190c, 0x011913, WB_ALetter },
{0x011915, 0x011916, WB_ALetter },
{0x011918, 0x01192f, WB_ALetter },
{0x011930, 0x011935, WB_Extend },
{0x011937, 0x011938, WB_Extend },
{0x01193b, 0x01193e, WB_Extend },
{0x01193f, 0x01193f, WB_ALetter },
{0x011940, 0x011940, WB_Extend },
{0x011941, 0x011941, WB_ALetter },
{0x011942, 0x011943, WB_Extend },
{0x011950, 0x011959, WB_Numeric },
{0x0119a0, 0x0119a7, WB_ALetter },
{0x0119aa, 0x0119d0, WB_ALetter },
{0x0119d1, 0x0119d7, WB_Extend },
@ -882,6 +901,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x011da0, 0x011da9, WB_Numeric },
{0x011ee0, 0x011ef2, WB_ALetter },
{0x011ef3, 0x011ef6, WB_Extend },
{0x011fb0, 0x011fb0, WB_ALetter },
{0x012000, 0x012399, WB_ALetter },
{0x012400, 0x01246e, WB_ALetter },
{0x012480, 0x012543, WB_ALetter },
@ -908,6 +928,8 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x016f93, 0x016f9f, WB_ALetter },
{0x016fe0, 0x016fe1, WB_ALetter },
{0x016fe3, 0x016fe3, WB_ALetter },
{0x016fe4, 0x016fe4, WB_Extend },
{0x016ff0, 0x016ff1, WB_Extend },
{0x01b000, 0x01b000, WB_Katakana },
{0x01b164, 0x01b167, WB_Katakana },
{0x01bc00, 0x01bc6a, WB_ALetter },
@ -1017,6 +1039,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x01f170, 0x01f189, WB_ALetter },
{0x01f1e6, 0x01f1ff, WB_Regional_Indicator },
{0x01f3fb, 0x01f3ff, WB_Extend },
{0x01fbf0, 0x01fbf9, WB_Numeric },
{0x0e0001, 0x0e0001, WB_Format },
{0x0e0020, 0x0e007f, WB_Extend },
{0x0e0100, 0x0e01ef, WB_Extend }