+upd: Oniguruma v6.9.10

METANEOCORTEX\Kotti 2025-06-12 01:13:30 +02:00
parent c94222ef00
commit e35fe62b0f
35 changed files with 14359 additions and 12235 deletions

@ -528,7 +528,7 @@ const LexicalClass lexicalClasses[] = {
27, "SCE_C_ESCAPESEQUENCE", "literal string escapesequence", "Escape sequence",
};
constexpr int sizeLexicalClasses{ std::size(lexicalClasses) };
constexpr int sizeLexicalClasses{ static_cast<int>(std::size(lexicalClasses)) };
}

@ -1,6 +1,19 @@
History
2023/10/1X: Version 6.9.9
2024/XX/XX: Version 6.9.10
2024/11/18: fix #312: Build failure with GCC 15 (C23)
2024/09/11: Update to Unicode 16.0
2024/06/20: fix #290: retry limit in match == 0 means unlimited
2024/06/15: add new callout (*SKIP) #299
2024/06/05: add new behavior ONIG_SYN_ALLOW_CHAR_TYPE_FOLLOWED_BY_MINUS_IN_CC (#298)
2024/05/28: fix #296: ONIG_SYNTAX_EMACS doesn't support 'shy groups'
2024/05/24: fix #295: Invalid result for empty match with anchors
2024/04/03: fix #293: Literal escaped braces
2024/04/02: fix total call with whole options
2024/04/01: fix #292: ONIG_SYN_CONTEXT_INDEP_REPEAT_OPS not working for ^* pattern
2023/10/14: Version 6.9.9
2023/09/17: Update to Unicode 15.1.0
2023/07/11: Make sure oniguruma.pc is removed on distclean
@ -735,7 +748,7 @@ History
Any, Assigned, C, Cc, L, Lm, Arabic, Greek etc...
2006/09/21: [impl] add USE_UNICODE_PROPERTIES into regenc.h.
2006/09/21: [impl] remove USE_UNICODE_FULL_RANGE_CTYPE.
2006/09/20: [impl] change ONIGENC_CTYPE_XXXX to sequencial values.
2006/09/20: [impl] change ONIGENC_CTYPE_XXXX to sequential values.
add BIT_CTYPE_XXXX bit flags to regenc.h.
update XXXX_CtypeTable[] for BIT_CTYPE_ALNUM.
2006/09/19: [memo] move from CVS to Subversion (1.3.2).
@ -1197,7 +1210,7 @@ History
2004/12/16: [test] success in ruby 1.9.0 (2004-12-16) [i686-linux].
2004/12/16: [dist] update hash.c.patch.
2004/12/15: [bug] (thanks matz)
char > 127 should be casted to unsigned char. (utf8.c)
char > 127 should be cast to unsigned char. (utf8.c)
2004/12/13: [impl] add HAVE_PROTOTYPES and HAVE_STDARG_PROTOTYPES definition
to oniguruma.h in the case __cplusplus.
2004/12/06: [dist] update doc/RE and doc/RE.ja.
@ -2104,7 +2117,7 @@ History
2003/03/08: [impl] remove check_backref_number().
2003/03/08: [bug] called group in 0-repeat should not be eliminated from
compile code. ex. /(?*n)(?<n>){0}/ (thanks akr)
add is_refered member to QualifierNode.
add is_referred member to QualifierNode.
2003/03/07: [impl] use hash table(st.[ch]) for implementation of name table.
(enable on Ruby in default)
2003/03/07: [new] add regex_foreach_names().
@ -2157,7 +2170,7 @@ History
if it is set, then error /(\1)/, /\1(..)/ etc...
2003/02/26: [spec] if backref number is greater than max group number,
then return compile error. (REGERR_INVALID_BACKREF_NUMBER)
2003/02/26: [tune] bad implemented N_ALT case in get_min_match_length().
2003/02/26: [tune] badly implemented N_ALT case in get_min_match_length().
2003/02/26: [dist] auto update testc.c and win32/testc.c in dist target.
2003/02/26: [impl] add -win option to testconv.rb.
2003/02/25: [spec] allow to assign same name to different group.
@ -2277,7 +2290,7 @@ History
2003/01/18: [impl] change REGION_NOTPOS to REG_REGION_NOTPOS in regex.h.
2003/01/17: [dist] add sample/simple.c.
2003/01/17: [inst] add configure option --with-rubydir.
2003/01/17: [bug] bad implemeted POSIX API options.
2003/01/17: [bug] badly implemented POSIX API options.
default: /./ not match "\n", anchor not match "\n"
REG_NEWLINE: /./ not match "\n", anchor match "\n"
2003/01/16: [impl] rewrite POSIX API regexec() for speed up.

@ -1,9 +1,8 @@
[![Fuzzing Status](https://oss-fuzz-build-logs.storage.googleapis.com/badges/oniguruma.svg)](https://oss-fuzz-build-logs.storage.googleapis.com/index.html#oniguruma)
Oniguruma
=========
## **This project ended on April 24, 2025.**
## **Since 2020, Oniguruma has been under attack on Google search in Japan.** [(Issue #234)](https://github.com/kkos/oniguruma/issues/234)
## **The only open source software attacked on Google search in Japan.** [(Issue #234)](https://github.com/kkos/oniguruma/issues/234)
https://github.com/kkos/oniguruma
@ -26,6 +25,7 @@ Supported character encodings:
* GB18030: contributed by KUBO Takehiro
* CP1251: contributed by Byte
* doc/SYNTAX.md: contributed by seanofw
* doc/onig_syn_md.c: tonco-miyazawa
Notice (from 6.9.6)
@ -33,6 +33,18 @@ Notice (from 6.9.6)
When using the configure script, if you had the POSIX API enabled in an earlier version (disabled by default in 6.9.5) and you need application binary compatibility with the POSIX API, specify "--enable-binary-compatible-posix-api=yes" instead of "--enable-posix-api=yes". Starting in 6.9.6, "--enable-posix-api=yes" provides only source-level compatibility with the POSIX API of 6.9.5 and earlier. (Issue #210)
Master branch
-------------
* Unicode property \pC, \pL, \pM, \pN, \pP, \pS, \pZ
Version 6.9.10
--------------
* Update Unicode version 16.0
* Add new operator (*SKIP) (PR#299)
* Fixed: ONIG_SYN_CONTEXT_INDEP_REPEAT_OPS not working for ^* pattern (Issue #292)
Version 6.9.9
-------------
* Update Unicode version 15.1.0

@ -1,4 +1,4 @@
Oniguruma API Version 6.9.9 2022/10/28
Oniguruma API Version 6.9.10 2024/06/26
#include <oniguruma.h>
@ -277,6 +277,7 @@ Oniguruma API Version 6.9.9 2022/10/28
# int onig_set_retry_limit_in_match_of_match_param(OnigMatchParam* mp, unsigned long limit)
Set a retry limit count of a match process.
0 means unlimited.
arguments
1 mp: match-param pointer
@ -985,6 +986,7 @@ Oniguruma API Version 6.9.9 2022/10/28
# int onig_set_retry_limit_in_match(unsigned long limit)
Set the limit of retry counts in matching process.
0 means unlimited.
normal return: ONIG_NORMAL

@ -1,4 +1,4 @@
CALLOUTS.BUILTIN 2018/03/26
CALLOUTS.BUILTIN 2024/07/04
* FAIL (progress)
@ -92,4 +92,13 @@ CALLOUTS.BUILTIN 2018/03/26
[callout data]
slot 0: op value (enum OP_CMP in src/regexec.c)
* SKIP (progress)
(*SKIP)
Advance the position where the current matching fails and the next search
begins to the current position.
It has no effect on the current matching.
//END

@ -1,4 +1,4 @@
Oniguruma Regular Expressions Version 6.9.9 2023/03/27
Oniguruma Regular Expressions Version 6.9.11 2025/03/21
syntax: ONIG_SYNTAX_ONIGURUMA (default syntax)
@ -114,6 +114,8 @@ syntax: ONIG_SYNTAX_ONIGURUMA (default syntax)
* \p{property-name}
* \p{^property-name} (negative)
* \P{property-name} (negative)
* \pX (X = C, L, M, N, P, S, Z)
* \PX (X = C, L, M, N, P, S, Z) (negative)
property-name:
@ -237,22 +239,21 @@ syntax: ONIG_SYNTAX_ONIGURUMA (default syntax)
Unicode Case:
alnum Letter | Mark | Decimal_Number
alpha Letter | Mark
ascii 0000 - 007F
blank Space_Separator | 0009
cntrl Control | Format | Unassigned | Private_Use | Surrogate
alnum Alphabetic | Decimal_Number
alpha Alphabetic
ascii U+0000 - U+007F
blank Space_Separator | U+0009
cntrl U+0000 - U+001F, U+007F - U+009F
digit Decimal_Number
graph [[:^space:]] && ^Control && ^Unassigned && ^Surrogate
lower Lowercase_Letter
print [[:graph:]] | [[:space:]]
graph ^White_Space && ^[[:cntrl:]] && ^Unassigned && ^Surrogate
lower Lowercase
print [[:graph:]] | Space_Separator
punct Punctuation | Symbol
space Space_Separator | Line_Separator | Paragraph_Separator |
U+0009 | U+000A | U+000B | U+000C | U+000D | U+0085
upper Uppercase_Letter
space White_Space
upper Uppercase
xdigit U+0030 - U+0039 | U+0041 - U+0046 | U+0061 - U+0066
(0-9, a-f, A-F)
word Letter | Mark | Decimal_Number | Connector_Punctuation
word Alphabetic | Mark | Decimal_Number | Connector_Punctuation

@ -1,7 +1,7 @@
# Oniguruma syntax (operator) configuration
_Documented for Oniguruma 6.9.5 (2020/01/23)_
_Documented for Oniguruma 6.9.10 (2024/12/21)_
----------
@ -38,7 +38,7 @@ follow.
The `options` field describes the default compile options to use if the caller does
not specify any options when invoking `onig_new()`.
The `meta_char_table` field is used exclusively by the ONIG_SYN_OP_VARIABLE_META_CHARACTERS
The `meta_char_table` field is used exclusively by the `ONIG_SYN_OP_VARIABLE_META_CHARACTERS`
option, which allows the various regex metacharacters, like `*` and `?`, to be replaced
with alternates (for example, SQL typically uses `%` instead of `.*` and `_` instead of `?`).
@ -75,7 +75,7 @@ data set by `onig_set_meta_char()` will be ignored.
### 1. ONIG_SYN_OP_DOT_ANYCHAR (enable `.`)
_Set in: Oniguruma, PosixBasic, PosixExtended, Emacs, Grep, GnuRegex, Java, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, Grep, Emacs, PosixExtended, PosixBasic_
Enables support for the standard `.` metacharacter, meaning "any one character." You
usually want this flag on unless you have turned on `ONIG_SYN_OP_VARIABLE_META_CHARACTERS`
@ -84,7 +84,7 @@ so that you can use a metacharacter other than `.` instead.
### 2. ONIG_SYN_OP_ASTERISK_ZERO_INF (enable `r*`)
_Set in: Oniguruma, PosixBasic, PosixExtended, Emacs, Grep, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, Grep, Emacs, PosixExtended, PosixBasic_
Enables support for the standard `r*` metacharacter, meaning "zero or more r's."
You usually want this flag set unless you have turned on `ONIG_SYN_OP_VARIABLE_META_CHARACTERS`
@ -103,7 +103,7 @@ behavior.
### 4. ONIG_SYN_OP_PLUS_ONE_INF (enable `r+`)
_Set in: Oniguruma, PosixExtended, Emacs, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, Emacs, PosixExtended_
Enables support for the standard `r+` metacharacter, meaning "one or more r's."
You usually want this flag set unless you have turned on `ONIG_SYN_OP_VARIABLE_META_CHARACTERS`
@ -122,7 +122,7 @@ behavior.
### 6. ONIG_SYN_OP_QMARK_ZERO_ONE (enable `r?`)
_Set in: Oniguruma, PosixExtended, Emacs, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, Emacs, PosixExtended_
Enables support for the standard `r?` metacharacter, meaning "zero or one r" or "an optional r."
You usually want this flag set unless you have turned on `ONIG_SYN_OP_VARIABLE_META_CHARACTERS`
@ -141,7 +141,7 @@ you want `?` to simply match a literal `?` character, but you still want some wa
### 8. ONIG_SYN_OP_BRACE_INTERVAL (enable `r{l,u}`)
_Set in: Oniguruma, PosixExtended, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, PosixExtended_
Enables support for the `r{lower,upper}` range form, common to more advanced
regex engines, which lets you specify precisely a minimum and maximum range on how many r's
@ -158,7 +158,7 @@ this form also allows `r{,upper}` to be equivalent to `r{0,upper}`; otherwise,
### 9. ONIG_SYN_OP_ESC_BRACE_INTERVAL (enable `\{` and `\}`)
_Set in: PosixBasic, Emacs, Grep_
_Set in: Grep, Emacs, PosixBasic_
Enables support for an escaped `r\{lower,upper\}` range form. This is useful if you
have disabled support for the normal `r{...}` range form and want curly braces to simply
@ -168,7 +168,7 @@ match literal curly brace characters, but you still want some way of activating
### 10. ONIG_SYN_OP_VBAR_ALT (enable `r|s`)
_Set in: Oniguruma, PosixExtended, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, PosixExtended_
Enables support for the common `r|s` alternation operator. You usually want this
flag set.
@ -176,7 +176,7 @@ flag set.
### 11. ONIG_SYN_OP_ESC_VBAR_ALT (enable `\|`)
_Set in: Emacs, Grep_
_Set in: Grep, Emacs_
Enables support for an escaped `r\|s` alternation form. This is useful if you
have disabled support for the normal `r|s` alternation form and want `|` to simply
@ -185,7 +185,7 @@ match a literal `|` character, but you still want some way of activating "altern
### 12. ONIG_SYN_OP_LPAREN_SUBEXP (enable `(r)`)
_Set in: Oniguruma, PosixExtended, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, PosixExtended_
Enables support for the common `(...)` grouping-and-capturing operators. You usually
want this flag set.
@ -193,7 +193,7 @@ want this flag set.
### 13. ONIG_SYN_OP_ESC_LPAREN_SUBEXP (enable `\(` and `\)`)
_Set in: PosixBasic, Emacs, Grep_
_Set in: Grep, Emacs, PosixBasic_
Enables support for escaped `\(...\)` grouping-and-capturing operators. This is useful if you
have disabled support for the normal `(...)` grouping-and-capturing operators and want
@ -203,7 +203,7 @@ activating "grouping" or "capturing" behavior.
### 14. ONIG_SYN_OP_ESC_AZ_BUF_ANCHOR (enable `\A` and `\Z` and `\z`)
_Set in: Oniguruma, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex_
Enables support for the anchors `\A` (start-of-string), `\Z` (end-of-string or
newline-at-end-of-string), and `\z` (end-of-string) escapes.
@ -214,7 +214,7 @@ option will recognize that metacharacter instead.)
### 15. ONIG_SYN_OP_ESC_CAPITAL_G_BEGIN_ANCHOR (enable `\G`)
_Set in: Oniguruma, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex_
Enables support for the special anchor `\G` (start-of-previous-match).
@ -231,7 +231,7 @@ exactly the same as `\A`.
### 16. ONIG_SYN_OP_DECIMAL_BACKREF (enable `\num`)
_Set in: Oniguruma, PosixBasic, PosixExtended, Emacs, Grep, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, Grep, Emacs, PosixExtended, PosixBasic_
Enables support for subsequent matches to back references to prior capture groups `(...)` using
the common `\num` syntax (like `\3`).
@ -244,7 +244,7 @@ You usually want this enabled, and it is enabled by default in every built-in sy
### 17. ONIG_SYN_OP_BRACKET_CC (enable `[...]`)
_Set in: Oniguruma, PosixBasic, PosixExtended, Emacs, Grep, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, Grep, Emacs, PosixExtended, PosixBasic_
Enables support for recognizing character classes, like `[a-z]`. If this flag is not set, `[`
and `]` will be treated as ordinary literal characters instead of as metacharacters.
@ -254,7 +254,7 @@ You usually want this enabled, and it is enabled by default in every built-in sy
### 18. ONIG_SYN_OP_ESC_W_WORD (enable `\w` and `\W`)
_Set in: Oniguruma, Grep, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, Grep_
Enables support for the common `\w` and `\W` shorthand forms. These match "word characters,"
whose meaning varies depending on the encoding being used.
@ -272,7 +272,7 @@ considered "word characters.")
### 19. ONIG_SYN_OP_ESC_LTGT_WORD_BEGIN_END (enable `\<` and `\>`)
_Set in: Grep, GnuRegex_
_Set in: GnuRegex, Grep_
Enables support for the GNU-specific `\<` and `\>` word-boundary metacharacters. These work like
the `\b` word-boundary metacharacter, but only match at one end of the word or the other: `\<`
@ -285,7 +285,7 @@ Most regex syntaxes do _not_ support these metacharacters.
### 20. ONIG_SYN_OP_ESC_B_WORD_BOUND (enable `\b` and `\B`)
_Set in: Oniguruma, Grep, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, Grep_
Enables support for the common `\b` and `\B` word-boundary metacharacters. The `\b` metacharacter
matches a zero-width position at a transition from word-characters to non-word-characters, or vice
@ -297,7 +297,7 @@ are considered "word characters."
### 21. ONIG_SYN_OP_ESC_S_WHITE_SPACE (enable `\s` and `\S`)
_Set in: Oniguruma, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex_
Enables support for the common `\s` and `\S` whitespace-matching metacharacters.
@ -319,7 +319,7 @@ Unicode-equivalent code points, and then matching according to Unicode rules.
### 22. ONIG_SYN_OP_ESC_D_DIGIT (enable `\d` and `\D`)
_Set in: Oniguruma, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex_
Enables support for the common `\d` and `\D` digit-matching metacharacters.
@ -337,7 +337,7 @@ Unicode-equivalent code points, and then matching according to Unicode rules.
### 23. ONIG_SYN_OP_LINE_ANCHOR (enable `^r` and `r$`)
_Set in: Oniguruma, Emacs, Grep, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, Grep, Emacs, PosixExtended, PosixBasic_
Enables support for the common `^` and `$` line-anchor metacharacters.
@ -352,7 +352,7 @@ and not any other form.)
### 24. ONIG_SYN_OP_POSIX_BRACKET (enable POSIX `[:xxxx:]`)
_Set in: Oniguruma, PosixBasic, PosixExtended, Grep, GnuRegex, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Ruby, Perl_NG, Perl, GnuRegex, Grep, PosixExtended, PosixBasic_
Enables support for the POSIX `[:xxxx:]` character classes, like `[:alpha:]` and `[:digit:]`.
The supported POSIX character classes are `alnum`, `alpha`, `blank`, `cntrl`, `digit`,
@ -361,7 +361,7 @@ The supported POSIX character classes are `alnum`, `alpha`, `blank`, `cntrl`, `d
### 25. ONIG_SYN_OP_QMARK_NON_GREEDY (enable `r??`, `r*?`, `r+?`, and `r{n,m}?`)
_Set in: Oniguruma, Perl, Java, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java_
Enables support for lazy (non-greedy) quantifiers: That is, if you append a `?` after
another quantifier such as `?`, `*`, `+`, or `{n,m}`, Oniguruma will try to match
@ -370,17 +370,17 @@ as _little_ as possible instead of as _much_ as possible.
### 26. ONIG_SYN_OP_ESC_CONTROL_CHARS (enable `\n`, `\r`, `\t`, etc.)
_Set in: Oniguruma, PosixBasic, PosixExtended, Java, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, Emacs, PosixExtended, PosixBasic_
Enables support for C-style control-code escapes, like `\n` and `\r`. Specifically,
this recognizes `\a` (7), `\b` (8), `\t` (9), `\n` (10), `\f` (12), `\r` (13), and
`\e` (27). If ONIG_SYN_OP2_ESC_V_VTAB is enabled (see below), this also enables
`\e` (27). If `ONIG_SYN_OP2_ESC_V_VTAB` is enabled (see below), this also enables
support for recognizing `\v` as code point 11.
### 27. ONIG_SYN_OP_ESC_C_CONTROL (enable `\cx` control codes)
_Set in: Oniguruma, Java, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java_
Enables support for named control-code escapes, like `\cm` or `\cM` for code-point
13. In this shorthand form, control codes may be specified by `\c` (for "Control")
@ -390,7 +390,7 @@ followed by an alphabetic letter, a-z or A-Z, indicating which code point to rep
### 28. ONIG_SYN_OP_ESC_OCTAL3 (enable `\OOO` octal codes)
_Set in: Oniguruma, Java, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java_
Enables support for octal-style escapes of up to three digits, like `\1` for code
point 1, and `\177` for code point 127. Octal values greater than 255 will result
@ -399,7 +399,7 @@ in an error message.
### 29. ONIG_SYN_OP_ESC_X_HEX2 (enable `\xHH` hex codes)
_Set in: Oniguruma, Java, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java_
Enables support for hexadecimal-style escapes of up to two digits, like `\x1` for code
point 1, and `\x7F` for code point 127.
@ -407,7 +407,7 @@ point 1, and `\x7F` for code point 127.
### 30. ONIG_SYN_OP_ESC_X_BRACE_HEX8 (enable `\x{7HHHHHHH}` hex codes)
_Set in: Oniguruma, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Ruby, Perl_NG, Perl_
Enables support for brace-wrapped hexadecimal-style escapes of up to eight digits,
like `\x{1}` for code point 1, and `\x{FFFE}` for code point 65534.
@ -415,7 +415,7 @@ like `\x{1}` for code point 1, and `\x{FFFE}` for code point 65534.
### 31. ONIG_SYN_OP_ESC_O_BRACE_OCTAL (enable `\o{1OOOOOOOOOO}` octal codes)
_Set in: Oniguruma, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Ruby, Perl_NG, Perl_
Enables support for brace-wrapped octal-style escapes of up to eleven digits,
like `\o{1}` for code point 1, and `\o{177776}` for code point 65534.
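The arithmetic behind the two brace-wrapped forms can be sketched in plain C. This is only an illustration: `digit_value` and `parse_brace_escape` are hypothetical helpers, not Oniguruma functions, and the sketch does not try to reproduce the library's own parser or error codes. It assumes the digit limits quoted above (eight hex digits, eleven octal digits).

```c
#include <stddef.h>
#include <string.h>

/* Value of one digit character, or -1 if it is not a hex digit. */
static int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper, not an Oniguruma function: parse "\x{H...}" (hex,
 * up to 8 digits) or "\o{O...}" (octal, up to 11 digits) and store the
 * code point in *out. Returns 0 on success, -1 on malformed input. */
static int parse_brace_escape(const char *s, unsigned long long *out)
{
    size_t len = strlen(s);
    size_t i, max_digits;
    int base;
    unsigned long long value = 0;

    /* Shortest valid form is five characters, e.g. \x{1} */
    if (len < 5 || s[0] != '\\' || s[2] != '{' || s[len - 1] != '}')
        return -1;

    if (s[1] == 'x')      { base = 16; max_digits = 8;  }
    else if (s[1] == 'o') { base = 8;  max_digits = 11; }
    else return -1;

    /* len - 4 is the number of characters between the braces. */
    if (len - 4 > max_digits)
        return -1;

    for (i = 3; i < len - 1; i++) {
        int d = digit_value(s[i]);
        if (d < 0 || d >= base)
            return -1;          /* not a digit in this base */
        value = value * (unsigned long long)base + (unsigned long long)d;
    }

    *out = value;
    return 0;
}
```

With this helper, both `\x{FFFE}` and `\o{177776}` yield 65534, matching the examples in the text, while `\o{8}` is rejected because 8 is not an octal digit.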
@ -434,7 +434,7 @@ This group contains support for lesser-known regex syntax constructs.
### 0. ONIG_SYN_OP2_ESC_CAPITAL_Q_QUOTE (enable `\Q...\E`)
_Set in: Java, Perl, Perl_NG_
_Set in: Perl_NG, Perl, Java_
Enables support for "quoted" parts of a pattern: Between `\Q` and `\E`, all
syntax parsing is turned off, so that metacharacters like `*` and `+` will no
@ -444,7 +444,7 @@ longer be treated as metacharacters, and instead will be matched as literal
### 1. ONIG_SYN_OP2_QMARK_GROUP_EFFECT (enable `(?...)`)
_Set in: Oniguruma, Java, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, Emacs_
Enables support for the fairly-common `(?...)` grouping operator, which
controls precedence but which does _not_ capture its contents.
@ -452,7 +452,7 @@ controls precedence but which does _not_ capture its contents.
### 2. ONIG_SYN_OP2_OPTION_PERL (enable options `(?imsx)` and `(?-imsx)`)
_Set in: Java, Perl, Perl_NG_
_Set in: Python, Perl_NG, Perl, Java_
Enables support of regex options. (i,m,s,x)
The supported toggle-able options for this flag are:
@ -465,7 +465,7 @@ The supported toggle-able options for this flag are:
### 3. ONIG_SYN_OP2_OPTION_RUBY (enable options `(?imx)` and `(?-imx)`)
_Set in: Oniguruma, Ruby_
_Set in: Ruby_
Enables support of regex options. (i,m,x)
The supported toggle-able options for this flag are:
@ -477,7 +477,7 @@ The supported toggle-able options for this flag are:
### 4. ONIG_SYN_OP2_PLUS_POSSESSIVE_REPEAT (enable `r?+`, `r*+`, and `r++`)
_Set in: Oniguruma, Ruby_
_Set in: Oniguruma, Ruby, Perl_NG, Perl, Java_
Enables support for the _possessive_ quantifiers `?+`, `*+`, and `++`, which
work similarly to `?` and `*` and `+`, respectively, but which do not backtrack
@ -488,7 +488,7 @@ extent if subsequent parts of the pattern fail to match.
### 5. ONIG_SYN_OP2_PLUS_POSSESSIVE_INTERVAL (enable `r{n,m}+`)
_Set in: Java_
_Set in: Perl_NG, Perl, Java_
Enables support for the _possessive_ quantifier `{n,m}+`, which
works similarly to `{n,m}`, but which does not backtrack
@ -499,7 +499,7 @@ extent if subsequent parts of the pattern fail to match.
### 6. ONIG_SYN_OP2_CCLASS_SET_OP (enable `&&` within `[...]`)
_Set in: Oniguruma, Java, Ruby_
_Set in: Oniguruma, Ruby, Java_
Enables support for character-class _intersection_. For example, with this
feature enabled, you can write `[a-z&&[^aeiou]]` to produce a character class
@ -509,7 +509,7 @@ all control codes _except_ newlines.
### 7. ONIG_SYN_OP2_QMARK_LT_NAMED_GROUP (enable named captures `(?<name>...)`)
_Set in: Oniguruma, Perl_NG, Ruby_
_Set in: Oniguruma, Ruby, Perl_NG_
Enables support for _naming_ capture groups, so that instead of having to
refer to captures by position (like `\3` or `$3`), you can refer to them by names
@ -519,7 +519,7 @@ and `(?'name'...)`, but not the Python `(?P<name>...)` syntax.
### 8. ONIG_SYN_OP2_ESC_K_NAMED_BACKREF (enable named backreferences `\k<name>`)
_Set in: Oniguruma, Perl_NG, Ruby_
_Set in: Oniguruma, Ruby, Perl_NG_
Enables support for substituted backreferences by name, not just by position.
This supports using `\k'name'` in addition to supporting `\k<name>`. This also
@ -530,7 +530,7 @@ the match, if the capture matched multiple times, by writing `\k<name+n>` or
### 9. ONIG_SYN_OP2_ESC_G_SUBEXP_CALL (enable backreferences `\g<name>` and `\g<n>`)
_Set in: Oniguruma, Perl_NG, Ruby_
_Set in: Oniguruma, Ruby, Perl_NG_
Enables support for substituted backreferences by both name and position using
the same syntax. This supports using `\g'name'` and `\g'1'` in addition to
@ -562,7 +562,7 @@ followed by a single character (or equivalent), indicating which code point to r
based on that character's lowest five bits. So, like `\c`, you can represent code-point
10 with `\C-j`, but you can also represent it with `\C-*` as well.
See also ONIG_SYN_OP_ESC_C_CONTROL, which enables the more-common `\cx` syntax.
See also `ONIG_SYN_OP_ESC_C_CONTROL`, which enables the more-common `\cx` syntax.
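The "lowest five bits" rule can be checked with a one-line mask. The function below is a hypothetical illustration of that rule only, assuming ASCII input; it is not taken from Oniguruma's source.

```c
/* Hypothetical helper, not an Oniguruma function: the code point selected
 * by \C-x, modeled as the lowest five bits of the character after "\C-". */
static unsigned int control_code_from_char(unsigned char ch)
{
    return ch & 0x1Fu;
}
```

In ASCII, `j` is 0x6A and `*` is 0x2A; both share the low five bits 0x0A, which is why `\C-j` and `\C-*` both denote code point 10.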
### 12. ONIG_SYN_OP2_ESC_CAPITAL_M_BAR_META (enable `\M-x`)
@ -577,7 +577,7 @@ with `0x80`). So, for example, you can match `\x81` using `\x81`, or you can wr
### 13. ONIG_SYN_OP2_ESC_V_VTAB (enable `\v` as vertical tab)
_Set in: Oniguruma, Java, Ruby_
_Set in: Oniguruma, Python, Ruby, Java_
Enables support for a C-style `\v` escape code, meaning "vertical tab." If enabled,
`\v` will be equivalent to ASCII code point 11.
@ -585,7 +585,7 @@ Enables support for a C-style `\v` escape code, meaning "vertical tab." If enab
### 14. ONIG_SYN_OP2_ESC_U_HEX4 (enable `\uHHHH` for Unicode)
_Set in: Oniguruma, Java, Ruby_
_Set in: Oniguruma, Python, Ruby, Java_
Enables support for a Java-style `\uHHHH` escape code for representing Unicode
code-points by number, using up to four hexadecimal digits (up to `\uFFFF`). So,
@ -593,8 +593,8 @@ for example, `\u221E` will match an infinity symbol, `∞`.
For code points larger than four digits, like the emoji `🚡` (aerial tramway, or code
point U+1F6A1), you must either represent the character directly using an encoding like
UTF-8, or you must enable support for ONIG_SYN_OP_ESC_X_BRACE_HEX8 or
ONIG_SYN_OP_ESC_O_BRACE_OCTAL, which support more than four digits.
UTF-8, or you must enable support for `ONIG_SYN_OP_ESC_X_BRACE_HEX8` or
`ONIG_SYN_OP_ESC_O_BRACE_OCTAL`, which support more than four digits.
(New feature as of Oniguruma 6.7.)
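For the "represent the character directly" option, the sketch below shows how a code point maps to UTF-8 bytes, following the standard UTF-8 bit layout. `encode_utf8` is a hypothetical helper written for this illustration, not an Oniguruma function; an application would normally get these bytes from its own string handling.

```c
#include <stddef.h>

/* Hypothetical helper, not an Oniguruma function: encode one Unicode code
 * point as UTF-8. Writes up to 4 bytes into out and returns the byte count,
 * or 0 for surrogates and values above U+10FFFF. */
static size_t encode_utf8(unsigned long cp, unsigned char out[4])
{
    if (cp <= 0x7F) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogates are not valid scalar values */
    if (cp <= 0xFFFF) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

U+221E fits in four hex digits and encodes to the three bytes E2 88 9E. U+1F6A1 needs five hex digits, so `\uHHHH` cannot express it; it encodes to the four bytes F0 9F 9A A1.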
@ -604,29 +604,29 @@ ONIG_SYN_OP_ESC_O_BRACE_OCTAL, which support more than four digits.
_Set in: Emacs_
This flag makes the ``\` `` and `\'` escapes function identically to
`\A` and `\z`, respectively (when ONIG_SYN_OP_ESC_AZ_BUF_ANCHOR is enabled).
`\A` and `\z`, respectively (when `ONIG_SYN_OP_ESC_AZ_BUF_ANCHOR` is enabled).
These anchor forms are very obscure, and rarely supported by other regex libraries.
### 16. ONIG_SYN_OP2_ESC_P_BRACE_CHAR_PROPERTY (enable `\p{...}` and `\P{...}`)
_Set in: Oniguruma, Java, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java_
Enables support for an alternate syntax for POSIX character classes; instead of
writing `[:alpha:]` when this is enabled, you can instead write `\p{alpha}`.
See also ONIG_SYN_OP_POSIX_BRACKET for the classic POSIX form.
See also `ONIG_SYN_OP_POSIX_BRACKET` for the classic POSIX form.
### 17. ONIG_SYN_OP2_ESC_P_BRACE_CIRCUMFLEX_NOT (enable `\p{^...}` and `\P{^...}`)
_Set in: Oniguruma, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl_
Enables support for an alternate syntax for POSIX character classes; instead of
writing `[:^alpha:]` when this is enabled, you can instead write `\p{^alpha}`.
See also ONIG_SYN_OP_POSIX_BRACKET for the classic POSIX form.
See also `ONIG_SYN_OP_POSIX_BRACKET` for the classic POSIX form.
### 18. ONIG_SYN_OP2_CHAR_PROPERTY_PREFIX_IS
@ -647,7 +647,7 @@ characters in `[0-9a-fA-F]`.
### 20. ONIG_SYN_OP2_INEFFECTIVE_ESCAPE (disable `\`)
_Set in: As-is_
_Set in: ASIS_
If set, this disables all escape codes, shorthands, and metacharacters that start
with `\` (or whatever the configured escape character is), allowing `\` to be treated
@ -658,7 +658,7 @@ You usually do not want this flag to be enabled.
### 21. ONIG_SYN_OP2_QMARK_LPAREN_IF_ELSE (enable `(?(...)then|else)`)
_Set in: Oniguruma, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl_
Enables support for conditional inclusion of subsequent regex patterns based on whether
a prior named or numbered capture matched, or based on whether a pattern will
@ -676,7 +676,7 @@ match. This supports many different forms, including:
### 22. ONIG_SYN_OP2_ESC_CAPITAL_K_KEEP (enable `\K`)
_Set in: Oniguruma, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl_
Enables support for `\K`, which excludes all content before it from the overall
regex match (i.e., capture #0). So, for example, pattern `foo\Kbar` would match
@ -687,7 +687,7 @@ regex match (i.e., capture #0). So, for example, pattern `foo\Kbar` would match
### 23. ONIG_SYN_OP2_ESC_CAPITAL_R_GENERAL_NEWLINE (enable `\R`)
_Set in: Oniguruma, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Ruby, Perl_NG, Perl_
Enables support for `\R`, the "general newline" shorthand, which matches
`(\r\n|[\n\v\f\r\u0085\u2028\u2029])` (obviously, the Unicode values cannot be
@ -698,7 +698,7 @@ matched in ASCII encodings).
### 24. ONIG_SYN_OP2_ESC_CAPITAL_N_O_SUPER_DOT (enable `\N` and `\O`)
_Set in: Oniguruma, Perl, Perl_NG_
_Set in: Oniguruma, Perl_NG, Perl_
Enables support for `\N` and `\O`. `\N` is "not a line break," which is much
like the standard `.` metacharacter, except that while `.` can be affected by
@ -713,7 +713,7 @@ multi-line mode are enabled or disabled.
### 25. ONIG_SYN_OP2_QMARK_TILDE_ABSENT_GROUP (enable `(?~...)`)
_Set in: Oniguruma, Ruby_
_Set in: Oniguruma, Ruby, Perl_NG, Perl_
Enables support for the `(?~r)` "absent operator" syntax, which matches
as much as possible as long as the result _doesn't_ match pattern `r`. This is
@ -731,7 +731,7 @@ excellent article about it is [available on Medium](https://medium.com/rubyinsid
### 26. ONIG_SYN_OP2_ESC_X_Y_TEXT_SEGMENT (enable `\X` and `\Y` and `\y`)
_Set in: Oniguruma, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Ruby, Perl_NG, Perl_
`\X` is another variation on `.`, designed to support Unicode, in that it matches
a full _grapheme cluster_. In Unicode, `à` can be encoded as one code point,
@ -764,7 +764,7 @@ backreferences.
### 28. ONIG_SYN_OP2_QMARK_BRACE_CALLOUT_CONTENTS (enable `(?{...})`)
_Set in: Oniguruma, Perl, Perl_NG_
_Set in: Oniguruma, Perl_NG, Perl_
Enables support for Perl-style "callouts" — pattern substitutions that result from
invoking a callback method. When `(?{foo})` is reached in a pattern, the callback
@ -779,7 +779,7 @@ Full documentation for this advanced feature can be found in the Oniguruma
### 29. ONIG_SYN_OP2_ASTERISK_CALLOUT_NAME (enable `(*name)`)
_Set in: Oniguruma, Perl, Perl_NG_
_Set in: Oniguruma, Python, Perl_NG, Perl_
Enables support for Perl-style "callouts" — pattern substitutions that result from
invoking a callback method. When `(*foo)` is reached in a pattern, the callback
@ -809,6 +809,13 @@ Enables support of regex options. (i,m,x,W,S,D,P,y)
- `S` - ASCII only space.
- `P` - ASCII only POSIX properties. (includes W,D,S)
### 31. ONIG_SYN_OP2_QMARK_CAPITAL_P_NAME (enable `(?P<name>...)` and `(?P=name)`)
_Set in: Python_
(New feature as of Oniguruma 6.9.7)
----------
@ -820,19 +827,19 @@ some syntaxes but not in others.
### 0. ONIG_SYN_CONTEXT_INDEP_REPEAT_OPS (independent `?`, `*`, `+`, `{n,m}`)
_Set in: Oniguruma, PosixExtended, GnuRegex, Java, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, PosixExtended_
This flag specifies how to handle operators like `?` and `*` when they aren't
directly attached to an operand, as in `^*` or `(*)`: Are they an error, are
they discarded, or are they taken as literals? If this flag is clear, they
are taken as literals; otherwise, the ONIG_SYN_CONTEXT_INVALID_REPEAT_OPS flag
are taken as literals; otherwise, the `ONIG_SYN_CONTEXT_INVALID_REPEAT_OPS` flag
determines if they are errors or if they are discarded.
### 1. ONIG_SYN_CONTEXT_INVALID_REPEAT_OPS (error or ignore independent operators)
_Set in: Oniguruma, PosixExtended, GnuRegex, Java, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, PosixExtended_
If ONIG_SYN_CONTEXT_INDEP_REPEAT_OPS is set, this flag controls what happens when
If `ONIG_SYN_CONTEXT_INDEP_REPEAT_OPS` is set, this flag controls what happens when
independent operators appear in a pattern: If this flag is set, then independent
operators produce an error message; if this flag is clear, then independent
operators are silently discarded.
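The way these two flags combine can be written as a small decision function. This is a sketch of the rule as described in the two sections above, not Oniguruma's implementation: the `SKETCH_*` bit values, the `IndepOpAction` enum, and `independent_operator_action` are all hypothetical names invented for the example, and the real `ONIG_SYN_*` constants have their own values in `oniguruma.h`.

```c
/* Hypothetical stand-ins for the two behavior flags; the values are
 * illustrative and are NOT the real ONIG_SYN_* constants. */
enum {
    SKETCH_CONTEXT_INDEP_REPEAT_OPS   = 1,
    SKETCH_CONTEXT_INVALID_REPEAT_OPS = 2
};

/* What happens to an operator such as `*` with no operand, as in `^*`. */
typedef enum {
    TREAT_AS_LITERAL,
    REPORT_ERROR,
    DISCARD_OPERATOR
} IndepOpAction;

/* Hypothetical helper modeling the documented rule. */
static IndepOpAction independent_operator_action(unsigned int behavior)
{
    /* INDEP clear: the operator is taken as a literal character,
     * regardless of the INVALID flag. */
    if (!(behavior & SKETCH_CONTEXT_INDEP_REPEAT_OPS))
        return TREAT_AS_LITERAL;

    /* INDEP set: INVALID decides between an error and silent discard. */
    if (behavior & SKETCH_CONTEXT_INVALID_REPEAT_OPS)
        return REPORT_ERROR;
    return DISCARD_OPERATOR;
}
```

Under this model, the second flag only matters when the first one is set, which is the dependency the two descriptions express.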
@ -847,7 +854,7 @@ character will produce an error message.
### 3. ONIG_SYN_ALLOW_INVALID_INTERVAL (allow `{???`)
_Set in: Oniguruma, GnuRegex, Java, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex_
This flag, if set, causes an invalid range, like `foo{bar}` or `foo{}`, to be
silently discarded, as if `foo` had been written instead. If clear, an invalid
@ -855,13 +862,13 @@ range will produce an error message.
### 4. ONIG_SYN_ALLOW_INTERVAL_LOW_ABBREV (allow `{,n}` to mean `{0,n}`)
_Set in: Oniguruma, Ruby_
_Set in: Oniguruma, Python, Ruby_
If this flag is set, then `r{,n}` will be treated as equivalent to writing
`r{0,n}`. If this flag is clear, then `r{,n}` will produce an error message.
Note that regardless of whether this flag is set or clear, if
ONIG_SYN_OP_BRACE_INTERVAL is enabled, then `r{n,}` will always be legal: This
`ONIG_SYN_OP_BRACE_INTERVAL` is enabled, then `r{n,}` will always be legal: This
flag *only* controls the behavior of the opposite form, `r{,n}`.
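
As a rough illustration of the rule above, the sketch below models how the bounds of a comma-form interval could be derived. It is an assumption-laden model, not Oniguruma's implementation: `interval_bounds` is a hypothetical helper, it assumes `ONIG_SYN_OP_BRACE_INTERVAL` is enabled, it ignores `ONIG_SYN_ALLOW_INVALID_INTERVAL`, and rejecting `{,}` is a choice made here for simplicity.

```python
import re

ONIG_SYN_ALLOW_INTERVAL_LOW_ABBREV = 1 << 4  # bit position follows flag ID 4

# Matches only the comma forms: {n,m}, {n,} and {,n}.
_INTERVAL = re.compile(r"\{(\d*),(\d*)\}")


def interval_bounds(text: str, behavior: int):
    """Hypothetical helper: return (low, high); high is None when unbounded."""
    m = _INTERVAL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a comma-form interval: {text!r}")
    low, high = m.group(1), m.group(2)
    if not low and not high:
        # Assumption of this sketch: `{,}` is simply rejected.
        raise ValueError("interval has neither bound")
    if not low:
        # `{,n}` is accepted only when the low-abbreviation flag is set.
        if not behavior & ONIG_SYN_ALLOW_INTERVAL_LOW_ABBREV:
            raise ValueError("`{,n}` is not allowed by this syntax")
        return 0, int(high)
    # `{n,}` and `{n,m}` do not depend on the flag.
    return int(low), (int(high) if high else None)
```

With the flag set, `interval_bounds("{,3}", ONIG_SYN_ALLOW_INTERVAL_LOW_ABBREV)` yields the same bounds as `{0,3}`, whereas `{2,}` is accepted whether or not the flag is set.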
### 5. ONIG_SYN_STRICT_CHECK_BACKREF (error on invalid backrefs)
@ -876,7 +883,7 @@ No built-in syntax has this flag enabled.
### 6. ONIG_SYN_DIFFERENT_LEN_ALT_LOOK_BEHIND (allow `(?<=a|bc)`)
_Set in: Oniguruma, Java, Ruby_
_Set in: Oniguruma, Ruby, Java_
If this flag is set, lookbehind patterns with alternate options may have differing
lengths among those options. If this flag is clear, lookbehind patterns with options
@ -888,15 +895,15 @@ depend on this rule.
### 7. ONIG_SYN_CAPTURE_ONLY_NAMED_GROUP (prefer `\k<name>` over `\3`)
_Set in: Oniguruma, Perl_NG, Ruby_
_Set in: Oniguruma, Ruby, Perl_NG_
If this flag is set on the syntax *and* ONIG_OPTION_CAPTURE_GROUP is set when calling
If this flag is set on the syntax *and* `ONIG_OPTION_CAPTURE_GROUP` is set when calling
Oniguruma, then if a name is used on any capture, all captures must also use names: A
single use of a named capture prohibits the use of numbered captures.
### 8. ONIG_SYN_ALLOW_MULTIPLEX_DEFINITION_NAME (allow `(?<x>)...(?<x>)`)
_Set in: Oniguruma, Perl_NG, Ruby_
_Set in: Oniguruma, Ruby, Perl_NG_
If this flag is set, multiple capture groups may use the same name. If this flag is
clear, then reuse of a name will produce an error message.
@ -912,10 +919,10 @@ then `r{n}?` will mean the same as `r{n}`, and the useless `?` will be discarded
### 10. ONIG_SYN_ISOLATED_OPTION_CONTINUE_BRANCH (`..(?i)..`)
_Set in: Perl, Perl_NG, Java_
_Set in: Python, Perl_NG, Perl, Java_
If this flag is set, then an isolated option doesn't break the branch; it stays in effect until the end of the group (or the end of the pattern).
If this flag is not set, then an isolated option is interpreted as the starting point of a new branch. /a(?i)b|c/ ==> /a(?i:b|c)/
If this flag is not set, then an isolated option is interpreted as the starting point of a new branch. `/a(?i)b|c/` ==> `/a(?i:b|c)/`
### 11. ONIG_SYN_VARIABLE_LEN_LOOK_BEHIND (`(?<=...a+...)`)
@ -923,6 +930,24 @@ _Set in: Oniguruma, Java_
If this flag is set, then variable-length expressions are allowed in look-behind.
### 12. ONIG_SYN_PYTHON (enable `\UHHHHHHHH` for Unicode)
_Set in: Python_
(New feature as of Oniguruma 6.9.7)
### 13. ONIG_SYN_WHOLE_OPTIONS (enable options `(?CLI)`)
_Set in: Oniguruma_
(New feature as of Oniguruma 6.9.8)
### 14. ONIG_SYN_BRE_ANCHOR_AT_EDGE_OF_SUBEXP (enable `\(^abc$\)`)
_Set in: Grep, PosixBasic_
(New feature as of Oniguruma 6.9.9)
### 20. ONIG_SYN_NOT_NEWLINE_IN_NEGATIVE_CC (add `\n` to `[^...]`)
_Set in: Grep_
@ -934,7 +959,7 @@ only exclude those characters and ranges written in them.
### 21. ONIG_SYN_BACKSLASH_ESCAPE_IN_CC (allow `[...\w...]`)
_Set in: Oniguruma, GnuRegex, Java, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex_
If this flag is set, shorthands like `\w` are allowed to describe characters in character
classes. If this flag is clear, shorthands like `\w` are treated as a redundantly-escaped
@ -942,7 +967,7 @@ literal `w`.
### 22. ONIG_SYN_ALLOW_EMPTY_RANGE_IN_CC (silently discard `[z-a]`)
_Set in: Emacs, Grep_
_Set in: Grep, Emacs_
If this flag is set, then character ranges like `[z-a]` that are broken or contain no
characters will be silently ignored. If this flag is clear, then broken or empty
@ -950,7 +975,7 @@ character ranges will produce an error message.
### 23. ONIG_SYN_ALLOW_DOUBLE_RANGE_OP_IN_CC (treat `[0-9-a]` as `[0-9\-a]`)
_Set in: Oniguruma, PosixExtended, GnuRegex, Java, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, PosixExtended_
If this flag is set, then a trailing `-` after a character range will be taken as a
literal `-`, as if it had been escaped as `\-`. If this flag is clear, then a trailing
@ -973,15 +998,21 @@ _Set in: Oniguruma, Ruby_
If this flag is set, Oniguruma will warn about nested repeat operators that have no meaning, like `(?:a*)+`.
If this flag is clear, Oniguruma will allow the nested repeat operators without warning about them.
### 26. ONIG_SYN_ALLOW_INVALID_CODE_END_OF_RANGE_IN_CC (allow [a-\x{7fffffff}])
### 26. ONIG_SYN_ALLOW_INVALID_CODE_END_OF_RANGE_IN_CC (allow `[a-\x{7fffffff}]`)
_Set in: Oniguruma_
If this flag is set, then invalid code points at the end of a range in a character class are allowed.
### 27. ONIG_SYN_ALLOW_CHAR_TYPE_FOLLOWED_BY_MINUS_IN_CC (allow `[\w-%]` to mean `[\w\-%]`)
_Set in: Perl_NG, Perl, Java_
(New feature as of Oniguruma 6.9.10)
### 31. ONIG_SYN_CONTEXT_INDEP_ANCHORS
_Set in: Oniguruma, PosixExtended, GnuRegex, Java, Perl, Perl_NG, Ruby_
_Set in: Oniguruma, Python, Ruby, Perl_NG, Perl, Java, GnuRegex, PosixExtended_
Not currently used, and does nothing. (But still set in several syntaxes for some
reason.)
@ -994,98 +1025,102 @@ These tables show which of the built-in syntaxes use which flags and options, fo
### Group One Flags (op)
| ID | Option | PosB | PosEx | Emacs | Grep | Gnu | Java | Perl | PeNG | Ruby | Onig |
| ----- | --------------------------------------------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| 0 | `ONIG_SYN_OP_VARIABLE_META_CHARACTERS` | - | - | - | - | - | - | - | - | - | - |
| 1 | `ONIG_SYN_OP_DOT_ANYCHAR` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| 2 | `ONIG_SYN_OP_ASTERISK_ZERO_INF` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| 3 | `ONIG_SYN_OP_ESC_ASTERISK_ZERO_INF` | - | - | - | - | - | - | - | - | - | - |
| 4 | `ONIG_SYN_OP_PLUS_ONE_INF` | - | Yes | Yes | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 5 | `ONIG_SYN_OP_ESC_PLUS_ONE_INF` | - | - | - | Yes | - | - | - | - | - | - |
| 6 | `ONIG_SYN_OP_QMARK_ZERO_ONE` | - | Yes | Yes | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 7 | `ONIG_SYN_OP_ESC_QMARK_ZERO_ONE` | - | - | - | Yes | - | - | - | - | - | - |
| 8 | `ONIG_SYN_OP_BRACE_INTERVAL` | - | Yes | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 9 | `ONIG_SYN_OP_ESC_BRACE_INTERVAL` | Yes | - | Yes | Yes | - | - | - | - | - | - |
| 10 | `ONIG_SYN_OP_VBAR_ALT` | - | Yes | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 11 | `ONIG_SYN_OP_ESC_VBAR_ALT` | - | - | Yes | Yes | - | - | - | - | - | - |
| 12 | `ONIG_SYN_OP_LPAREN_SUBEXP` | - | Yes | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 13 | `ONIG_SYN_OP_ESC_LPAREN_SUBEXP` | Yes | - | Yes | Yes | - | - | - | - | - | - |
| 14 | `ONIG_SYN_OP_ESC_AZ_BUF_ANCHOR` | - | - | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 15 | `ONIG_SYN_OP_ESC_CAPITAL_G_BEGIN_ANCHOR` | - | - | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 16 | `ONIG_SYN_OP_DECIMAL_BACKREF` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| 17 | `ONIG_SYN_OP_BRACKET_CC` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| 18 | `ONIG_SYN_OP_ESC_W_WORD` | - | - | - | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| 19 | `ONIG_SYN_OP_ESC_LTGT_WORD_BEGIN_END` | - | - | - | Yes | Yes | - | - | - | - | - |
| 20 | `ONIG_SYN_OP_ESC_B_WORD_BOUND` | - | - | - | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| 21 | `ONIG_SYN_OP_ESC_S_WHITE_SPACE` | - | - | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 22 | `ONIG_SYN_OP_ESC_D_DIGIT` | - | - | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 23 | `ONIG_SYN_OP_LINE_ANCHOR` | - | - | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| 24 | `ONIG_SYN_OP_POSIX_BRACKET` | Yes | Yes | Yes | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 25 | `ONIG_SYN_OP_QMARK_NON_GREEDY` | - | - | - | - | - | Yes | Yes | Yes | Yes | Yes |
| 26 | `ONIG_SYN_OP_ESC_CONTROL_CHARS` | Yes | Yes | - | - | - | Yes | Yes | Yes | Yes | Yes |
| 27 | `ONIG_SYN_OP_ESC_C_CONTROL` | - | - | - | - | - | Yes | Yes | Yes | Yes | Yes |
| 28 | `ONIG_SYN_OP_ESC_OCTAL3` | - | - | - | - | - | Yes | Yes | Yes | Yes | Yes |
| 29 | `ONIG_SYN_OP_ESC_X_HEX2` | - | - | - | - | - | Yes | Yes | Yes | Yes | Yes |
| 30 | `ONIG_SYN_OP_ESC_X_BRACE_HEX8` | - | - | - | - | - | - | Yes | Yes | Yes | Yes |
| 31 | `ONIG_SYN_OP_ESC_O_BRACE_OCTAL` | - | - | - | - | - | - | Yes | Yes | Yes | Yes |
| ID | Option | Onig | Pythn | Ruby | PeNG | Perl | Java | Gnu | Grep | Emacs | PosEx | PosB | ASIS |
| ----- | ------------------------------------------ | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| 0 | `ONIG_SYN_OP_VARIABLE_META_CHARACTERS` | - | - | - | - | - | - | - | - | - | - | - | - |
| 1 | `ONIG_SYN_OP_DOT_ANYCHAR` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - |
| 2 | `ONIG_SYN_OP_ASTERISK_ZERO_INF` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - |
| 3 | `ONIG_SYN_OP_ESC_ASTERISK_ZERO_INF` | - | - | - | - | - | - | - | - | - | - | - | - |
| 4 | `ONIG_SYN_OP_PLUS_ONE_INF` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | Yes | Yes | - | - |
| 5 | `ONIG_SYN_OP_ESC_PLUS_ONE_INF` | - | - | - | - | - | - | - | Yes | - | - | - | - |
| 6 | `ONIG_SYN_OP_QMARK_ZERO_ONE` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | Yes | Yes | - | - |
| 7 | `ONIG_SYN_OP_ESC_QMARK_ZERO_ONE` | - | - | - | - | - | - | - | Yes | - | - | - | - |
| 8 | `ONIG_SYN_OP_BRACE_INTERVAL` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | Yes | - | - |
| 9 | `ONIG_SYN_OP_ESC_BRACE_INTERVAL` | - | - | - | - | - | - | - | Yes | Yes | - | Yes | - |
| 10 | `ONIG_SYN_OP_VBAR_ALT` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | Yes | - | - |
| 11 | `ONIG_SYN_OP_ESC_VBAR_ALT` | - | - | - | - | - | - | - | Yes | Yes | - | - | - |
| 12 | `ONIG_SYN_OP_LPAREN_SUBEXP` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | Yes | - | - |
| 13 | `ONIG_SYN_OP_ESC_LPAREN_SUBEXP` | - | - | - | - | - | - | - | Yes | Yes | - | Yes | - |
| 14 | `ONIG_SYN_OP_ESC_AZ_BUF_ANCHOR` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - |
| 15 | `ONIG_SYN_OP_ESC_CAPITAL_G_BEGIN_ANCHOR` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - |
| 16 | `ONIG_SYN_OP_DECIMAL_BACKREF` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - |
| 17 | `ONIG_SYN_OP_BRACKET_CC` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - |
| 18 | `ONIG_SYN_OP_ESC_W_WORD` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - | - |
| 19 | `ONIG_SYN_OP_ESC_LTGT_WORD_BEGIN_END` | - | - | - | - | - | - | Yes | Yes | - | - | - | - |
| 20 | `ONIG_SYN_OP_ESC_B_WORD_BOUND` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - | - |
| 21 | `ONIG_SYN_OP_ESC_S_WHITE_SPACE` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - |
| 22 | `ONIG_SYN_OP_ESC_D_DIGIT` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - |
| 23 | `ONIG_SYN_OP_LINE_ANCHOR` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - |
| 24 | `ONIG_SYN_OP_POSIX_BRACKET` | Yes | - | Yes | Yes | Yes | - | Yes | Yes | - | Yes | Yes | - |
| 25 | `ONIG_SYN_OP_QMARK_NON_GREEDY` | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - | - |
| 26 | `ONIG_SYN_OP_ESC_CONTROL_CHARS` | Yes | Yes | Yes | Yes | Yes | Yes | - | - | Yes | Yes | Yes | - |
| 27 | `ONIG_SYN_OP_ESC_C_CONTROL` | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - | - |
| 28 | `ONIG_SYN_OP_ESC_OCTAL3` | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - | - |
| 29 | `ONIG_SYN_OP_ESC_X_HEX2` | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - | - |
| 30 | `ONIG_SYN_OP_ESC_X_BRACE_HEX8` | Yes | - | Yes | Yes | Yes | - | - | - | - | - | - | - |
| 31 | `ONIG_SYN_OP_ESC_O_BRACE_OCTAL` | Yes | - | Yes | Yes | Yes | - | - | - | - | - | - | - |
### Group Two Flags (op2)
| ID | Option | PosB | PosEx | Emacs | Grep | Gnu | Java | Perl | PeNG | Ruby | Onig |
| ----- | --------------------------------------------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| 0 | `ONIG_SYN_OP2_ESC_CAPITAL_Q_QUOTE` | - | - | - | - | - | Yes | Yes | Yes | - | - |
| 1 | `ONIG_SYN_OP2_QMARK_GROUP_EFFECT` | - | - | - | - | - | Yes | Yes | Yes | Yes | Yes |
| 2 | `ONIG_SYN_OP2_OPTION_PERL` | - | - | - | - | - | Yes | Yes | Yes | - | - |
| 3 | `ONIG_SYN_OP2_OPTION_RUBY` | - | - | - | - | - | - | - | - | Yes | - |
| 4 | `ONIG_SYN_OP2_PLUS_POSSESSIVE_REPEAT` | - | - | - | - | - | - | - | - | Yes | Yes |
| 5 | `ONIG_SYN_OP2_PLUS_POSSESSIVE_INTERVAL` | - | - | - | - | - | Yes | - | - | - | - |
| 6 | `ONIG_SYN_OP2_CCLASS_SET_OP` | - | - | - | - | - | - | - | Yes | Yes | Yes |
| 7 | `ONIG_SYN_OP2_QMARK_LT_NAMED_GROUP` | - | - | - | - | - | - | - | Yes | Yes | Yes |
| 8 | `ONIG_SYN_OP2_ESC_K_NAMED_BACKREF` | - | - | - | - | - | - | - | Yes | Yes | Yes |
| 9 | `ONIG_SYN_OP2_ESC_G_SUBEXP_CALL` | - | - | - | - | - | - | - | Yes | Yes | Yes |
| 10 | `ONIG_SYN_OP2_ATMARK_CAPTURE_HISTORY` | - | - | - | - | - | - | - | - | - | - |
| 11 | `ONIG_SYN_OP2_ESC_CAPITAL_C_BAR_CONTROL` | - | - | - | - | - | - | - | - | Yes | Yes |
| 12 | `ONIG_SYN_OP2_ESC_CAPITAL_M_BAR_META` | - | - | - | - | - | - | - | - | Yes | Yes |
| 13 | `ONIG_SYN_OP2_ESC_V_VTAB` | - | - | - | - | - | Yes | - | - | Yes | Yes |
| 14 | `ONIG_SYN_OP2_ESC_U_HEX4` | - | - | - | - | - | Yes | - | - | Yes | Yes |
| 15 | `ONIG_SYN_OP2_ESC_GNU_BUF_ANCHOR` | - | - | Yes | - | - | - | - | - | - | - |
| 16 | `ONIG_SYN_OP2_ESC_P_BRACE_CHAR_PROPERTY` | - | - | - | - | - | Yes | Yes | Yes | Yes | Yes |
| 17 | `ONIG_SYN_OP2_ESC_P_BRACE_CIRCUMFLEX_NOT` | - | - | - | - | - | - | Yes | Yes | Yes | Yes |
| 18 | `ONIG_SYN_OP2_CHAR_PROPERTY_PREFIX_IS` | - | - | - | - | - | - | - | - | - | - |
| 19 | `ONIG_SYN_OP2_ESC_H_XDIGIT` | - | - | - | - | - | - | - | - | Yes | Yes |
| 20 | `ONIG_SYN_OP2_INEFFECTIVE_ESCAPE` | - | - | - | - | - | - | - | - | - | - |
| 21 | `ONIG_SYN_OP2_QMARK_LPAREN_IF_ELSE` | - | - | - | - | - | - | Yes | Yes | Yes | Yes |
| 22 | `ONIG_SYN_OP2_ESC_CAPITAL_K_KEEP` | - | - | - | - | - | - | Yes | Yes | Yes | Yes |
| 23 | `ONIG_SYN_OP2_ESC_CAPITAL_R_GENERAL_NEWLINE` | - | - | - | - | - | - | Yes | Yes | Yes | Yes |
| 24 | `ONIG_SYN_OP2_ESC_CAPITAL_N_O_SUPER_DOT` | - | - | - | - | - | - | Yes | Yes | - | Yes |
| 25 | `ONIG_SYN_OP2_QMARK_TILDE_ABSENT_GROUP` | - | - | - | - | - | - | - | - | Yes | Yes |
| 26 | `ONIG_SYN_OP2_ESC_X_Y_TEXT_SEGMENT` | - | - | - | - | - | - | Yes | Yes | Yes | Yes |
| 27 | `ONIG_SYN_OP2_QMARK_PERL_SUBEXP_CALL` | - | - | - | - | - | - | - | Yes | - | - |
| 28 | `ONIG_SYN_OP2_QMARK_BRACE_CALLOUT_CONTENTS` | - | - | - | - | - | - | Yes | Yes | Yes | - |
| 29 | `ONIG_SYN_OP2_ASTERISK_CALLOUT_NAME` | - | - | - | - | - | - | Yes | Yes | Yes | - |
| 30 | `ONIG_SYN_OP2_OPTION_ONIGURUMA` | - | - | - | - | - | - | - | - | - | Yes |
| ID | Option | Onig | Pythn | Ruby | PeNG | Perl | Java | Gnu | Grep | Emacs | PosEx | PosB | ASIS |
| ----- | ---------------------------------------------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| 0 | `ONIG_SYN_OP2_ESC_CAPITAL_Q_QUOTE` | - | - | - | Yes | Yes | Yes | - | - | - | - | - | - |
| 1 | `ONIG_SYN_OP2_QMARK_GROUP_EFFECT` | Yes | Yes | Yes | Yes | Yes | Yes | - | - | Yes | - | - | - |
| 2 | `ONIG_SYN_OP2_OPTION_PERL` | - | Yes | - | Yes | Yes | Yes | - | - | - | - | - | - |
| 3 | `ONIG_SYN_OP2_OPTION_RUBY` | - | - | Yes | - | - | - | - | - | - | - | - | - |
| 4 | `ONIG_SYN_OP2_PLUS_POSSESSIVE_REPEAT` | Yes | - | Yes | Yes | Yes | Yes | - | - | - | - | - | - |
| 5 | `ONIG_SYN_OP2_PLUS_POSSESSIVE_INTERVAL` | - | - | - | Yes | Yes | Yes | - | - | - | - | - | - |
| 6 | `ONIG_SYN_OP2_CCLASS_SET_OP` | Yes | - | Yes | - | - | Yes | - | - | - | - | - | - |
| 7 | `ONIG_SYN_OP2_QMARK_LT_NAMED_GROUP` | Yes | - | Yes | Yes | - | - | - | - | - | - | - | - |
| 8 | `ONIG_SYN_OP2_ESC_K_NAMED_BACKREF` | Yes | - | Yes | Yes | - | - | - | - | - | - | - | - |
| 9 | `ONIG_SYN_OP2_ESC_G_SUBEXP_CALL` | Yes | - | Yes | Yes | - | - | - | - | - | - | - | - |
| 10 | `ONIG_SYN_OP2_ATMARK_CAPTURE_HISTORY` | - | - | - | - | - | - | - | - | - | - | - | - |
| 11 | `ONIG_SYN_OP2_ESC_CAPITAL_C_BAR_CONTROL` | Yes | - | Yes | - | - | - | - | - | - | - | - | - |
| 12 | `ONIG_SYN_OP2_ESC_CAPITAL_M_BAR_META` | Yes | - | Yes | - | - | - | - | - | - | - | - | - |
| 13 | `ONIG_SYN_OP2_ESC_V_VTAB` | Yes | Yes | Yes | - | - | Yes | - | - | - | - | - | - |
| 14 | `ONIG_SYN_OP2_ESC_U_HEX4` | Yes | Yes | Yes | - | - | Yes | - | - | - | - | - | - |
| 15 | `ONIG_SYN_OP2_ESC_GNU_BUF_ANCHOR` | - | - | - | - | - | - | - | - | Yes | - | - | - |
| 16 | `ONIG_SYN_OP2_ESC_P_BRACE_CHAR_PROPERTY` | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - | - |
| 17 | `ONIG_SYN_OP2_ESC_P_BRACE_CIRCUMFLEX_NOT` | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - | - | - |
| 19 | `ONIG_SYN_OP2_ESC_H_XDIGIT` | Yes | - | Yes | - | - | - | - | - | - | - | - | - |
| 20 | `ONIG_SYN_OP2_INEFFECTIVE_ESCAPE` | - | - | - | - | - | - | - | - | - | - | - | Yes |
| 21 | `ONIG_SYN_OP2_QMARK_LPAREN_IF_ELSE` | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - | - | - |
| 22 | `ONIG_SYN_OP2_ESC_CAPITAL_K_KEEP` | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - | - | - |
| 23 | `ONIG_SYN_OP2_ESC_CAPITAL_R_GENERAL_NEWLINE` | Yes | - | Yes | Yes | Yes | - | - | - | - | - | - | - |
| 24 | `ONIG_SYN_OP2_ESC_CAPITAL_N_O_SUPER_DOT` | Yes | - | - | Yes | Yes | - | - | - | - | - | - | - |
| 25 | `ONIG_SYN_OP2_QMARK_TILDE_ABSENT_GROUP` | Yes | - | Yes | Yes | Yes | - | - | - | - | - | - | - |
| 26 | `ONIG_SYN_OP2_ESC_X_Y_TEXT_SEGMENT` | Yes | - | Yes | Yes | Yes | - | - | - | - | - | - | - |
| 27 | `ONIG_SYN_OP2_QMARK_PERL_SUBEXP_CALL` | - | - | - | Yes | - | - | - | - | - | - | - | - |
| 28 | `ONIG_SYN_OP2_QMARK_BRACE_CALLOUT_CONTENTS` | Yes | - | - | Yes | Yes | - | - | - | - | - | - | - |
| 29 | `ONIG_SYN_OP2_ASTERISK_CALLOUT_NAME` | Yes | Yes | - | Yes | Yes | - | - | - | - | - | - | - |
| 30 | `ONIG_SYN_OP2_OPTION_ONIGURUMA` | Yes | - | - | - | - | - | - | - | - | - | - | - |
| 31 | `ONIG_SYN_OP2_QMARK_CAPITAL_P_NAME` | - | Yes | - | - | - | - | - | - | - | - | - | - |
### Syntax Flags (syn)
| ID | Option | PosB | PosEx | Emacs | Grep | Gnu | Java | Perl | PeNG | Ruby | Onig |
| ----- | --------------------------------------------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| 0 | `ONIG_SYN_CONTEXT_INDEP_REPEAT_OPS` | - | Yes | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 1 | `ONIG_SYN_CONTEXT_INVALID_REPEAT_OPS` | - | - | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 2 | `ONIG_SYN_ALLOW_UNMATCHED_CLOSE_SUBEXP` | - | Yes | - | - | - | - | - | - | - | - |
| 3 | `ONIG_SYN_ALLOW_INVALID_INTERVAL` | - | - | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 4 | `ONIG_SYN_ALLOW_INTERVAL_LOW_ABBREV` | - | - | - | - | - | - | - | - | Yes | Yes |
| 5 | `ONIG_SYN_STRICT_CHECK_BACKREF` | - | - | - | - | - | - | - | - | - | - |
| 6 | `ONIG_SYN_DIFFERENT_LEN_ALT_LOOK_BEHIND` | - | - | - | - | - | Yes | - | - | Yes | Yes |
| 7 | `ONIG_SYN_CAPTURE_ONLY_NAMED_GROUP` | - | - | - | - | - | - | - | Yes | Yes | Yes |
| 8 | `ONIG_SYN_ALLOW_MULTIPLEX_DEFINITION_NAME` | - | - | - | - | - | - | - | Yes | Yes | Yes |
| 9 | `ONIG_SYN_FIXED_INTERVAL_IS_GREEDY_ONLY` | - | - | - | - | - | - | - | - | Yes | Yes |
| 10 | `ONIG_SYN_ISOLATED_OPTION_CONTINUE_BRANCH` | - | - | - | - | - | Yes | Yes | Yes | - | - |
| 11 | `ONIG_SYN_VARIABLE_LEN_LOOK_BEHIND` | - | - | - | - | - | Yes | - | - | - | Yes |
| 20 | `ONIG_SYN_NOT_NEWLINE_IN_NEGATIVE_CC` | - | - | - | Yes | - | - | - | - | - | - |
| 21 | `ONIG_SYN_BACKSLASH_ESCAPE_IN_CC` | - | - | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 22 | `ONIG_SYN_ALLOW_EMPTY_RANGE_IN_CC` | - | - | Yes | Yes | - | - | - | - | - | - |
| 23 | `ONIG_SYN_ALLOW_DOUBLE_RANGE_OP_IN_CC` | - | Yes | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| 24 | `ONIG_SYN_WARN_CC_OP_NOT_ESCAPED` | - | - | - | - | - | - | - | - | Yes | Yes |
| 25 | `ONIG_SYN_WARN_REDUNDANT_NESTED_REPEAT` | - | - | - | - | - | - | - | - | Yes | Yes |
| 26 | `ONIG_SYN_ALLOW_INVALID_CODE_END_OF_RANGE_IN_CC` | - | - | - | - | - | - | - | - | - | Yes |
| 31 | `ONIG_SYN_CONTEXT_INDEP_ANCHORS` | - | Yes | - | - | Yes | Yes | Yes | Yes | Yes | Yes |
| ID | Option | Onig | Pythn | Ruby | PeNG | Perl | Java | Gnu | Grep | Emacs | PosEx | PosB | ASIS |
| ----- | ---------------------------------------------------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| 0 | `ONIG_SYN_CONTEXT_INDEP_REPEAT_OPS` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | Yes | - | - |
| 1 | `ONIG_SYN_CONTEXT_INVALID_REPEAT_OPS` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | Yes | - | - |
| 2 | `ONIG_SYN_ALLOW_UNMATCHED_CLOSE_SUBEXP` | - | - | - | - | - | - | - | - | - | Yes | - | - |
| 3 | `ONIG_SYN_ALLOW_INVALID_INTERVAL` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - |
| 4 | `ONIG_SYN_ALLOW_INTERVAL_LOW_ABBREV` | Yes | Yes | Yes | - | - | - | - | - | - | - | - | - |
| 5 | `ONIG_SYN_STRICT_CHECK_BACKREF` | - | - | - | - | - | - | - | - | - | - | - | - |
| 6 | `ONIG_SYN_DIFFERENT_LEN_ALT_LOOK_BEHIND` | Yes | - | Yes | - | - | Yes | - | - | - | - | - | - |
| 7 | `ONIG_SYN_CAPTURE_ONLY_NAMED_GROUP` | Yes | - | Yes | Yes | - | - | - | - | - | - | - | - |
| 8 | `ONIG_SYN_ALLOW_MULTIPLEX_DEFINITION_NAME` | Yes | - | Yes | Yes | - | - | - | - | - | - | - | - |
| 9 | `ONIG_SYN_FIXED_INTERVAL_IS_GREEDY_ONLY` | Yes | - | Yes | - | - | - | - | - | - | - | - | - |
| 10 | `ONIG_SYN_ISOLATED_OPTION_CONTINUE_BRANCH` | - | Yes | - | Yes | Yes | Yes | - | - | - | - | - | - |
| 11 | `ONIG_SYN_VARIABLE_LEN_LOOK_BEHIND` | Yes | - | - | - | - | Yes | - | - | - | - | - | - |
| 12 | `ONIG_SYN_PYTHON` | - | Yes | - | - | - | - | - | - | - | - | - | - |
| 13 | `ONIG_SYN_WHOLE_OPTIONS` | Yes | - | - | - | - | - | - | - | - | - | - | - |
| 14 | `ONIG_SYN_BRE_ANCHOR_AT_EDGE_OF_SUBEXP` | - | - | - | - | - | - | - | Yes | - | - | Yes | - |
| 20 | `ONIG_SYN_NOT_NEWLINE_IN_NEGATIVE_CC` | - | - | - | - | - | - | - | Yes | - | - | - | - |
| 21 | `ONIG_SYN_BACKSLASH_ESCAPE_IN_CC` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | - | - | - |
| 22 | `ONIG_SYN_ALLOW_EMPTY_RANGE_IN_CC` | - | - | - | - | - | - | - | Yes | Yes | - | - | - |
| 23 | `ONIG_SYN_ALLOW_DOUBLE_RANGE_OP_IN_CC` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | Yes | - | - |
| 24 | `ONIG_SYN_WARN_CC_OP_NOT_ESCAPED` | Yes | - | Yes | - | - | - | - | - | - | - | - | - |
| 25 | `ONIG_SYN_WARN_REDUNDANT_NESTED_REPEAT` | Yes | - | Yes | - | - | - | - | - | - | - | - | - |
| 26 | `ONIG_SYN_ALLOW_INVALID_CODE_END_OF_RANGE_IN_CC` | Yes | - | - | - | - | - | - | - | - | - | - | - |
| 27 | `ONIG_SYN_ALLOW_CHAR_TYPE_FOLLOWED_BY_MINUS_IN_CC` | - | - | - | Yes | Yes | Yes | - | - | - | - | - | - |
| 31 | `ONIG_SYN_CONTEXT_INDEP_ANCHORS` | Yes | Yes | Yes | Yes | Yes | Yes | Yes | - | - | Yes | - | - |

View File

@ -1,4 +1,4 @@
Unicode Properties (Unicode Version: 15.1.0, Emoji: 15.1)
Unicode Properties (Unicode Version: 16.0.0, Emoji: 16.0)
ASCII_Hex_Digit
Adlam
@ -68,6 +68,7 @@ Emoji_Presentation
Ethiopic
Extended_Pictographic
Extender
Garay
Georgian
Glagolitic
Gothic
@ -79,6 +80,7 @@ Greek
Gujarati
Gunjala_Gondi
Gurmukhi
Gurung_Khema
Han
Hangul
Hanifi_Rohingya
@ -113,6 +115,7 @@ Khitan_Small_Script
Khmer
Khojki
Khudawadi
Kirat_Rai
L
LC
Lao
@ -150,6 +153,7 @@ Meroitic_Hieroglyphs
Miao
Mn
Modi
Modifier_Combining_Mark
Mongolian
Mro
Multani
@ -169,6 +173,7 @@ Nushu
Nyiakeng_Puachue_Hmong
Ogham
Ol_Chiki
Ol_Onal
Old_Hungarian
Old_Italic
Old_North_Arabian
@ -229,6 +234,7 @@ Sogdian
Sora_Sompeng
Soyombo
Sundanese
Sunuwar
Syloti_Nagri
Syriac
Tagalog
@ -247,7 +253,9 @@ Thai
Tibetan
Tifinagh
Tirhuta
Todhri
Toto
Tulu_Tigalari
Ugaritic
Unified_Ideograph
Unknown
@ -330,6 +338,7 @@ Ext
ExtPict
Final_Punctuation
Format
Gara
Geor
Glag
Gong
@ -341,6 +350,7 @@ Grek
Gr_Ext
Gr_Link
Gujr
Gukh
Guru
Hang
Hani
@ -370,6 +380,7 @@ Khmr
Khoj
Kits
Knda
Krai
Kthi
Lana
Laoo
@ -392,6 +403,7 @@ Mani
Marc
Mark
Math_Symbol
MCM
Medf
Mend
Merc
@ -422,6 +434,7 @@ OIDS
Olck
OLower
OMath
Onao
Open_Punctuation
Orkh
Orya
@ -476,6 +489,7 @@ Space_Separator
Spacing_Mark
STerm
Sund
Sunu
Surrogate
Sylo
Symbol
@ -496,6 +510,8 @@ Tibt
Tirh
Titlecase_Letter
Tnsa
Todr
Tutg
Ugar
UIdeo
Unassigned
@ -701,6 +717,7 @@ In_Osage
In_Elbasan
In_Caucasian_Albanian
In_Vithkuqi
In_Todhri
In_Linear_A
In_Latin_Extended_F
In_Cypriot_Syllabary
@ -723,6 +740,7 @@ In_Psalter_Pahlavi
In_Old_Turkic
In_Old_Hungarian
In_Hanifi_Rohingya
In_Garay
In_Rumi_Numeral_Symbols
In_Yezidi
In_Arabic_Extended_C
@ -742,12 +760,14 @@ In_Khojki
In_Multani
In_Khudawadi
In_Grantha
In_Tulu_Tigalari
In_Newa
In_Tirhuta
In_Siddham
In_Modi
In_Mongolian_Supplement
In_Takri
In_Myanmar_Extended_C
In_Ahom
In_Dogra
In_Warang_Citi
@ -758,6 +778,7 @@ In_Soyombo
In_Unified_Canadian_Aboriginal_Syllabics_Extended_A
In_Pau_Cin_Hau
In_Devanagari_Extended_A
In_Sunuwar
In_Bhaiksuki
In_Marchen
In_Masaram_Gondi
@ -772,12 +793,15 @@ In_Early_Dynastic_Cuneiform
In_Cypro_Minoan
In_Egyptian_Hieroglyphs
In_Egyptian_Hieroglyph_Format_Controls
In_Egyptian_Hieroglyphs_Extended_A
In_Anatolian_Hieroglyphs
In_Gurung_Khema
In_Bamum_Supplement
In_Mro
In_Tangsa
In_Bassa_Vah
In_Pahawh_Hmong
In_Kirat_Rai
In_Medefaidrin
In_Miao
In_Ideographic_Symbols_and_Punctuation
@ -792,6 +816,7 @@ In_Small_Kana_Extension
In_Nushu
In_Duployan
In_Shorthand_Format_Controls
In_Symbols_for_Legacy_Computing_Supplement
In_Znamenny_Musical_Notation
In_Byzantine_Musical_Symbols
In_Musical_Symbols
@ -809,6 +834,7 @@ In_Nyiakeng_Puachue_Hmong
In_Toto
In_Wancho
In_Nag_Mundari
In_Ol_Onal
In_Ethiopic_Extended_B
In_Mende_Kikakui
In_Adlam

View File

@ -2,7 +2,7 @@
ascii.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako
* Copyright (c) 2002-2025 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -44,6 +44,9 @@ init(void)
name = "FAIL"; BC0_P(name, fail);
name = "MISMATCH"; BC0_P(name, mismatch);
#ifdef USE_SKIP_SEARCH
name = "SKIP"; BC0_P(name, skip);
#endif
name = "MAX";
args[0] = ONIG_TYPE_TAG | ONIG_TYPE_LONG;
@ -87,8 +90,12 @@ is_initialized(void)
static int
ascii_is_code_ctype(OnigCodePoint code, unsigned int ctype)
{
if (code < 128)
return ONIGENC_IS_ASCII_CODE_CTYPE(code, ctype);
if (code < 128) {
if (ctype > ONIGENC_MAX_STD_CTYPE)
return FALSE;
else
return ONIGENC_IS_ASCII_CODE_CTYPE(code, ctype);
}
else
return FALSE;
}

View File

@ -1,7 +1,7 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# gperf_fold_key_conv.py
# Copyright (c) 2016-2023 K.Kosako
# Copyright (c) 2016-2025 K.Kosako
import sys
import re
@ -17,7 +17,8 @@ REG_GET_HASH = re.compile('(?:register\s+)?(?:unsigned\s+)?int\s+key\s*=\s*hash\
REG_GET_CODE = re.compile('(?:register\s+)?const\s+char\s*\*\s*s\s*=\s*wordlist\[key\]\.name;')
REG_CODE_CHECK = re.compile('if\s*\(\*str\s*==\s*\*s\s*&&\s*!strncmp.+\)')
REG_RETURN_WL = re.compile('return\s+&wordlist\[key\];')
REG_RETURN_0 = re.compile('return 0;')
REG_RETURN_0 = re.compile('^\s*return\s*\([^)]+\)\s*0;')
REG_VOID_LEN = re.compile('^\s*\(void\s*\)\s*len\s*;')
def parse_line(s, key_len):
s = s.rstrip()
@ -46,7 +47,9 @@ def parse_line(s, key_len):
r = re.sub(REG_RETURN_WL, 'return index;', s)
if r != s: return r
r = re.sub(REG_RETURN_0, 'return -1;', s)
r = re.sub(REG_RETURN_0, ' return -1;', s)
if r != s: return r
r = re.sub(REG_VOID_LEN, '', s)
if r != s: return r
return s
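
To see what the changed patterns in this hunk do, here is a small standalone sketch. The three regular expressions are copied from the hunk above (written as raw strings); the sample input lines and the `rewrite` helper are assumptions made for illustration, not actual gperf output taken from this commit.

```python
import re

# Patterns copied from the hunk above, written here as raw strings.
OLD_REG_RETURN_0 = re.compile(r'return 0;')
NEW_REG_RETURN_0 = re.compile(r'^\s*return\s*\([^)]+\)\s*0;')
REG_VOID_LEN = re.compile(r'^\s*\(void\s*\)\s*len\s*;')

# Assumed sample lines; `struct Example` is a made-up type name.
CAST_RETURN = '  return (const struct Example *) 0;'
VOID_LEN = '  (void) len;'
PLAIN_RETURN = '  return 0;'


def rewrite(line: str) -> str:
    """Apply the two new substitutions in the same order as parse_line."""
    r = re.sub(NEW_REG_RETURN_0, ' return -1;', line)
    if r != line:
        return r
    r = re.sub(REG_VOID_LEN, '', line)
    if r != line:
        return r
    return line


if __name__ == '__main__':
    for sample in (CAST_RETURN, VOID_LEN, PLAIN_RETURN):
        print(repr(sample), '->', repr(rewrite(sample)))
```

Under these assumed inputs, the old pattern does not match a `return` with a pointer cast, while the new one does; conversely, the new pattern requires a parenthesised cast, so a bare `return 0;` is left unchanged by this step.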

View File

@ -1,7 +1,7 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# gperf_unfold_key_conv.py
# Copyright (c) 2016-2023 K.Kosako
# Copyright (c) 2016-2025 K.Kosako
import sys
import re
@ -16,6 +16,7 @@ REG_IF_LEN = re.compile('\s*if\s*\(\s*len\s*<=\s*MAX_WORD_LENGTH.+')
REG_GET_HASH = re.compile('(?:register\s+)?(?:unsigned\s+)?int\s+key\s*=\s*hash\s*\(str,\s*len\);')
REG_GET_CODE = re.compile('(?:register\s+)?const\s+char\s*\*\s*s\s*=\s*wordlist\[key\]\.name;')
REG_CODE_CHECK = re.compile('if\s*\(\*str\s*==\s*\*s\s*&&\s*!strncmp.+\)')
REG_VOID_LEN = re.compile('^\s*\(void\s*\)\s*len\s*;')
def parse_line(s):
s = s.rstrip()
@ -40,6 +41,8 @@ def parse_line(s):
if r != s: return r
r = re.sub(REG_CODE_CHECK, 'if (code == gcode && wordlist[key].index >= 0)', s)
if r != s: return r
r = re.sub(REG_VOID_LEN, '', s)
if r != s: return r
return s

View File

@ -1,7 +1,7 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# make_unicode_egcb_data.py
# Copyright (c) 2017-2023 K.Kosako
# Copyright (c) 2017-2024 K.Kosako
import sys
import re
@ -196,7 +196,7 @@ print('/* unicode_egcb_data.c: Generated by make_unicode_egcb_data.py. */')
COPYRIGHT = '''
/*-
* Copyright (c) 2017-2023 K.Kosako
* Copyright (c) 2017-2024 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,7 +1,7 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# make_unicode_fold_data.py
# Copyright (c) 2016-2023 K.Kosako
# Copyright (c) 2016-2024 K.Kosako
import sys
import re
@ -30,7 +30,7 @@ LOCALE_UNFOLDS = {}
COPYRIGHT = '''
/*-
* Copyright (c) 2017-2023 K.Kosako
* Copyright (c) 2017-2024 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,7 +1,7 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# make_unicode_property_data.py
# Copyright (c) 2016-2023 K.Kosako
# Copyright (c) 2016-2024 K.Kosako
import sys
import re
@ -427,7 +427,7 @@ argc = len(argv)
COPYRIGHT = '''
/*-
* Copyright (c) 2016-2023 K.Kosako
* Copyright (c) 2016-2024 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

View File

@ -4,7 +4,7 @@
oniguruma.h - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2022 K.Kosako
* Copyright (c) 2002-2025 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -36,9 +36,9 @@ extern "C" {
#define ONIGURUMA
#define ONIGURUMA_VERSION_MAJOR 6
#define ONIGURUMA_VERSION_MINOR 9
#define ONIGURUMA_VERSION_TEENY 9
#define ONIGURUMA_VERSION_TEENY 10
#define ONIGURUMA_VERSION_INT 60909
#define ONIGURUMA_VERSION_INT 60910
#ifndef P_
#if defined(__STDC__) || defined(_WIN32)
@ -545,6 +545,7 @@ ONIG_EXTERN OnigSyntaxType* OnigDefaultSyntax;
#define ONIG_SYN_PYTHON (1U<<12) /* \UHHHHHHHH */
#define ONIG_SYN_WHOLE_OPTIONS (1U<<13) /* (?Ie) */
#define ONIG_SYN_BRE_ANCHOR_AT_EDGE_OF_SUBEXP (1U<<14) /* \(^abc$\) */
#define ONIG_SYN_ESC_P_WITH_ONE_CHAR_PROP (1U<<15) /* \pL */
/* syntax (behavior) in char class [...] */
#define ONIG_SYN_NOT_NEWLINE_IN_NEGATIVE_CC (1U<<20) /* [^...] */
@ -552,6 +553,7 @@ ONIG_EXTERN OnigSyntaxType* OnigDefaultSyntax;
#define ONIG_SYN_ALLOW_EMPTY_RANGE_IN_CC (1U<<22)
#define ONIG_SYN_ALLOW_DOUBLE_RANGE_OP_IN_CC (1U<<23) /* [0-9-a]=[0-9\-a] */
#define ONIG_SYN_ALLOW_INVALID_CODE_END_OF_RANGE_IN_CC (1U<<26)
#define ONIG_SYN_ALLOW_CHAR_TYPE_FOLLOWED_BY_MINUS_IN_CC (1U<<27) /* [\w-%]=[\w\-%] */
/* syntax (behavior) warning */
#define ONIG_SYN_WARN_CC_OP_NOT_ESCAPED (1U<<24) /* [,-,] */
#define ONIG_SYN_WARN_REDUNDANT_NESTED_REPEAT (1U<<25) /* (?:a*)+ */
@ -588,6 +590,7 @@ ONIG_EXTERN OnigSyntaxType* OnigDefaultSyntax;
#define ONIGERR_RETRY_LIMIT_IN_MATCH_OVER -17
#define ONIGERR_RETRY_LIMIT_IN_SEARCH_OVER -18
#define ONIGERR_SUBEXP_CALL_LIMIT_IN_SEARCH_OVER -19
#define ONIGERR_TIME_LIMIT_OVER -20
#define ONIGERR_DEFAULT_ENCODING_IS_NOT_SETTED -21 /*dont use*/
#define ONIGERR_DEFAULT_ENCODING_IS_NOT_SET -21
#define ONIGERR_SPECIFIED_ENCODING_CANT_CONVERT_TO_WIDE_CHAR -22
@ -946,6 +949,10 @@ unsigned long onig_get_retry_limit_in_search P_((void));
ONIG_EXTERN
int onig_set_retry_limit_in_search P_((unsigned long n));
ONIG_EXTERN
unsigned long onig_get_time_limit P_((void));
ONIG_EXTERN
int onig_set_time_limit P_((unsigned long n /* msec. */));
ONIG_EXTERN
unsigned int onig_get_parse_depth_limit P_((void));
ONIG_EXTERN
int onig_set_capture_num_limit P_((int num));
@ -990,6 +997,8 @@ int onig_set_retry_limit_in_match_of_match_param P_((OnigMatchParam* param, unsi
ONIG_EXTERN
int onig_set_retry_limit_in_search_of_match_param P_((OnigMatchParam* param, unsigned long limit));
ONIG_EXTERN
int onig_set_time_limit_of_match_param P_((OnigMatchParam* param, unsigned long limit /* msec. */));
ONIG_EXTERN
int onig_set_progress_callout_of_match_param P_((OnigMatchParam* param, OnigCalloutFunc f));
ONIG_EXTERN
int onig_set_retraction_callout_of_match_param P_((OnigMatchParam* param, OnigCalloutFunc f));
@ -1084,6 +1093,8 @@ int onig_builtin_mismatch P_((OnigCalloutArgs* args, void* user_data));
ONIG_EXTERN
int onig_builtin_error P_((OnigCalloutArgs* args, void* user_data));
ONIG_EXTERN
int onig_builtin_skip P_((OnigCalloutArgs* args, void* user_data));
ONIG_EXTERN
int onig_builtin_count P_((OnigCalloutArgs* args, void* user_data));
ONIG_EXTERN
int onig_builtin_total_count P_((OnigCalloutArgs* args, void* user_data));
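
Note on the version bump in this header: ONIGURUMA_VERSION_TEENY goes from 9 to 10 and ONIGURUMA_VERSION_INT from 60909 to 60910. Both literals are consistent with major * 10000 + minor * 100 + teeny. The header writes the integer as a literal, so the formula is an inference from those two values, and `version_int` below is a hypothetical helper, not a library function.

```c
/* Hypothetical helper, not part of oniguruma.h: packs a version triple
   the way the two literals in the hunk appear to be packed. */
static int version_int(int major, int minor, int teeny)
{
  return major * 10000 + minor * 100 + teeny;
}
```

A caller that needs a feature from this release could compare ONIGURUMA_VERSION_INT against the value computed for 6.9.10.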


@ -2,7 +2,7 @@
regcomp.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2023 K.Kosako
* Copyright (c) 2002-2024 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -1150,7 +1150,7 @@ compile_string_node(Node* node, regex_t* reg)
for (; p < end; ) {
len = enclen(enc, p);
if (p + len > end) len = end - p;
if (p + len > end) len = end - p; // PR #5392 - CVE-2019-9023
if (len == prev_len) {
slen++;
}
@ -5198,12 +5198,18 @@ check_call_reference(CallNode* cn, ParseEnv* env, int state)
#ifdef USE_WHOLE_OPTIONS
static int
check_whole_options_position(Node* node /* root */)
check_whole_options_position(Node* node /* root */, ParseEnv* env)
{
int is_list;
is_list = FALSE;
#ifdef USE_CALL
if ((env->flags & PE_FLAG_HAS_CALL_ZERO) != 0) {
node = ND_BODY(node);
}
#endif
start:
switch (ND_TYPE(node)) {
case ND_LIST:
@ -7395,7 +7401,7 @@ static int parse_and_tune(regex_t* reg, const UChar* pattern,
#ifdef USE_WHOLE_OPTIONS
if ((scan_env->flags & PE_FLAG_HAS_WHOLE_OPTIONS) != 0) {
r = check_whole_options_position(root);
r = check_whole_options_position(root, scan_env);
if (r != 0) goto err;
}
#endif


@ -2,7 +2,7 @@
regenc.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2021 K.Kosako
* Copyright (c) 2002-2025 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -931,8 +931,10 @@ extern int
onigenc_mb2_is_code_ctype(OnigEncoding enc, OnigCodePoint code,
unsigned int ctype)
{
if (code < 128)
return ONIGENC_IS_ASCII_CODE_CTYPE(code, ctype);
if (code < 128) {
if (ctype <= ONIGENC_MAX_STD_CTYPE)
return ONIGENC_IS_ASCII_CODE_CTYPE(code, ctype);
}
else {
if (CTYPE_IS_WORD_GRAPH_PRINT(ctype)) {
return (ONIGENC_CODE_TO_MBCLEN(enc, code) > 1 ? TRUE : FALSE);
@ -946,8 +948,10 @@ extern int
onigenc_mb4_is_code_ctype(OnigEncoding enc, OnigCodePoint code,
unsigned int ctype)
{
if (code < 128)
return ONIGENC_IS_ASCII_CODE_CTYPE(code, ctype);
if (code < 128) {
if (ctype <= ONIGENC_MAX_STD_CTYPE)
return ONIGENC_IS_ASCII_CODE_CTYPE(code, ctype);
}
else {
if (CTYPE_IS_WORD_GRAPH_PRINT(ctype)) {
return (ONIGENC_CODE_TO_MBCLEN(enc, code) > 1 ? TRUE : FALSE);


@ -2,7 +2,7 @@
regerror.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2022 K.Kosako
* Copyright (c) 2002-2025 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -58,6 +58,8 @@ onig_error_code_to_format(OnigPos code)
p = "retry-limit-in-search over"; break;
case ONIGERR_SUBEXP_CALL_LIMIT_IN_SEARCH_OVER:
p = "subexp-call-limit-in-search over"; break;
case ONIGERR_TIME_LIMIT_OVER:
p = "time limit over"; break;
case ONIGERR_TYPE_BUG:
p = "undefined type (bug)"; break;
case ONIGERR_PARSER_BUG:
@ -352,7 +354,7 @@ onig_error_code_to_str(UChar* s, OnigPos code, ...)
void ONIG_VARIADIC_FUNC_ATTR
onig_snprintf_with_pattern(UChar buf[], int bufsize, OnigEncoding enc,
UChar* pat, UChar* pat_end, const UChar *fmt, ...)
UChar* pat, UChar* pat_end, const char *fmt, ...)
{
int n, need, len;
UChar *p, *s, *bp;
@ -360,7 +362,7 @@ onig_snprintf_with_pattern(UChar buf[], int bufsize, OnigEncoding enc,
va_list args;
va_start(args, fmt);
n = xvsnprintf((char* )buf, bufsize, (const char* )fmt, args);
n = xvsnprintf((char* )buf, bufsize, fmt, args);
va_end(args);
need = (int )(pat_end - pat) * 4 + 4;


@ -2,7 +2,7 @@
regexec.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2022 K.Kosako
* Copyright (c) 2002-2025 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -66,6 +66,112 @@ static int forward_search(regex_t* reg, const UChar* str, const UChar* end, UCha
static int
search_in_range(regex_t* reg, const UChar* str, const UChar* end, const UChar* start, const UChar* range, /* match range */ const UChar* data_range, /* subject string range */ OnigRegion* region, OnigOptionType option, OnigMatchParam* mp);
#ifdef USE_TIME_LIMIT
#if defined(_WIN32) && !defined(__GNUC__)
#include <windows.h>
typedef __int64 TIME_TYPE;
static void
set_limit_end_time(TIME_TYPE* t, unsigned long limit /* msec. */)
{
TIME_TYPE limit_10nsec;
if ((__int64 )limit < INT64_MAX / 10000) {
limit_10nsec = limit * 10000; /* 10 nsec. */
GetSystemTimeAsFileTime((FILETIME* )t);
if (*t < INT64_MAX - limit_10nsec) {
*t += limit_10nsec;
return ;
}
}
*t = INT64_MAX;
}
static int
time_is_running_out(TIME_TYPE* t)
{
TIME_TYPE now;
GetSystemTimeAsFileTime((FILETIME* )&now);
if (now > *t)
return 1;
else
return 0;
}
#else /* defined(_WIN32) && !defined(__GNUC__) */
#include <time.h>
#ifndef TIME_T_MAX
#ifdef SIZEOF_TIME_T
#if SIZEOF_TIME_T == SIZEOF_LONG_LONG
#define TIME_T_MAX LLONG_MAX
#elif SIZEOF_TIME_T == SIZEOF_LONG
#define TIME_T_MAX LONG_MAX
#elif SIZEOF_TIME_T == SIZEOF_INT
#define TIME_T_MAX INT_MAX
#endif
#endif
#endif
typedef struct timespec TIME_TYPE;
static void
set_limit_end_time(TIME_TYPE* t, unsigned long limit /* msec. */)
{
time_t limit_sec;
long limit_nsec;
clock_gettime(CLOCK_REALTIME, t);
limit_sec = limit / 1000;
limit_nsec = (limit % 1000) * 1000000L;
if (t->tv_nsec > LONG_MAX - limit_nsec)
t->tv_nsec = LONG_MAX;
else
t->tv_nsec += limit_nsec;
if (t->tv_nsec > 999999999L) {
limit_sec += (time_t )1;
t->tv_nsec -= 1000000000L;
}
#ifdef TIME_T_MAX
if (t->tv_sec > TIME_T_MAX - limit_sec)
t->tv_sec = TIME_T_MAX;
else
t->tv_sec += limit_sec;
#else
t->tv_sec += limit_sec;
#endif
}
static int
time_is_running_out(TIME_TYPE* t)
{
TIME_TYPE now;
time_t diff;
clock_gettime(CLOCK_REALTIME, &now);
diff = now.tv_sec - t->tv_sec;
if (diff > 0)
return 1;
else if (diff == 0)
return now.tv_nsec > t->tv_nsec;
else
return 0;
}
#endif /* defined(_WIN32) && !defined(__GNUC__) */
#endif /* USE_TIME_LIMIT */
#ifdef USE_CALLOUT
typedef struct {
@ -83,6 +189,9 @@ struct OnigMatchParamStruct {
unsigned long retry_limit_in_match;
unsigned long retry_limit_in_search;
#endif
#ifdef USE_TIME_LIMIT
unsigned long time_limit;
#endif
void* callout_user_data; /* used in callback each match */
#ifdef USE_CALLOUT
@ -126,6 +235,18 @@ onig_set_retry_limit_in_search_of_match_param(OnigMatchParam* param,
#endif
}
extern int
onig_set_time_limit_of_match_param(OnigMatchParam* param,
unsigned long limit /* msec. */)
{
#ifdef USE_TIME_LIMIT
param->time_limit = limit;
return ONIG_NORMAL;
#else
return ONIG_NO_SUPPORT_CONFIG;
#endif
}
extern int
onig_set_progress_callout_of_match_param(OnigMatchParam* param, OnigCalloutFunc f)
{
@ -169,6 +290,13 @@ typedef struct {
unsigned long retry_limit_in_search;
unsigned long retry_limit_in_search_counter;
#endif
#ifdef USE_TIME_LIMIT
int time_counter;
unsigned long time_limit;
TIME_TYPE time_end;
#endif
OnigMatchParam* mp;
#ifdef USE_FIND_LONGEST_SEARCH_ALL_OF_RANGE
int best_len; /* for ONIG_OPTION_FIND_LONGEST */
@ -177,6 +305,9 @@ typedef struct {
#ifdef USE_CALL
unsigned long subexp_call_in_search_counter;
#endif
#ifdef USE_SKIP_SEARCH
UChar* skip_search;
#endif
} MatchArg;
@ -1045,6 +1176,7 @@ onig_region_copy(OnigRegion* to, OnigRegion* from)
}
#ifdef USE_CALLOUT
#ifdef USE_RETRY_LIMIT
#define CALLOUT_BODY(func, ain, aname_id, anum, user, args, result) do { \
args.in = (ain);\
args.name_id = (aname_id);\
@ -1063,6 +1195,25 @@ onig_region_copy(OnigRegion* to, OnigRegion* from)
args.mem_end_stk = mem_end_stk;\
result = (func)(&args, user);\
} while (0)
#else
#define CALLOUT_BODY(func, ain, aname_id, anum, user, args, result) do { \
args.in = (ain);\
args.name_id = (aname_id);\
args.num = anum;\
args.regex = reg;\
args.string = str;\
args.string_end = end;\
args.start = sstart;\
args.right_range = right_range;\
args.current = s;\
args.msa = msa;\
args.stk_base = stk_base;\
args.stk = stk;\
args.mem_start_stk = mem_start_stk;\
args.mem_end_stk = mem_end_stk;\
result = (func)(&args, user);\
} while (0)
#endif
#define RETRACTION_CALLOUT(func, aname_id, anum, user) do {\
int result;\
@ -1250,6 +1401,17 @@ struct OnigCalloutArgsStruct {
#define RETRY_IN_MATCH_ARG_INIT(msa,mpv)
#endif
#ifdef USE_TIME_LIMIT
#define TIME_LIMIT_INIT(msa,mpv) \
(msa).time_counter = 0;\
(msa).time_limit = (mpv)->time_limit;\
if ((msa).time_limit != 0) {\
set_limit_end_time(&((msa).time_end), (msa).time_limit);\
}
#else
#define TIME_LIMIT_INIT(msa,mpv)
#endif
#if defined(USE_CALL)
#define SUBEXP_CALL_IN_MATCH_ARG_INIT(msa,mpv) \
(msa).subexp_call_in_search_counter = 0;
@ -1261,6 +1423,7 @@ struct OnigCalloutArgsStruct {
#endif
#ifdef USE_FIND_LONGEST_SEARCH_ALL_OF_RANGE
#ifdef USE_SKIP_SEARCH
#define MATCH_ARG_INIT(msa, reg, arg_option, arg_region, arg_start, mpv) do { \
(msa).stack_p = (void* )0;\
(msa).options = (arg_option)|(reg)->options;\
@ -1268,10 +1431,12 @@ struct OnigCalloutArgsStruct {
(msa).start = (arg_start);\
(msa).match_stack_limit = (mpv)->match_stack_limit;\
RETRY_IN_MATCH_ARG_INIT(msa,mpv)\
TIME_LIMIT_INIT(msa,mpv)\
SUBEXP_CALL_IN_MATCH_ARG_INIT(msa,mpv)\
(msa).mp = mpv;\
(msa).best_len = ONIG_MISMATCH;\
(msa).ptr_num = PTR_NUM_SIZE(reg);\
(msa).skip_search = (UChar* )(arg_start);\
} while(0)
#else
#define MATCH_ARG_INIT(msa, reg, arg_option, arg_region, arg_start, mpv) do { \
@ -1281,11 +1446,43 @@ struct OnigCalloutArgsStruct {
(msa).start = (arg_start);\
(msa).match_stack_limit = (mpv)->match_stack_limit;\
RETRY_IN_MATCH_ARG_INIT(msa,mpv)\
TIME_LIMIT_INIT(msa,mpv)\
SUBEXP_CALL_IN_MATCH_ARG_INIT(msa,mpv)\
(msa).mp = mpv;\
(msa).best_len = ONIG_MISMATCH;\
(msa).ptr_num = PTR_NUM_SIZE(reg);\
} while(0)
#endif
#else
#ifdef USE_SKIP_SEARCH
#define MATCH_ARG_INIT(msa, reg, arg_option, arg_region, arg_start, mpv) do { \
(msa).stack_p = (void* )0;\
(msa).options = (arg_option)|(reg)->options;\
(msa).region = (arg_region);\
(msa).start = (arg_start);\
(msa).match_stack_limit = (mpv)->match_stack_limit;\
RETRY_IN_MATCH_ARG_INIT(msa,mpv)\
TIME_LIMIT_INIT(msa,mpv)\
SUBEXP_CALL_IN_MATCH_ARG_INIT(msa,mpv)\
(msa).mp = mpv;\
(msa).ptr_num = PTR_NUM_SIZE(reg);\
(msa).skip_search = (UChar* )(arg_start);\
} while(0)
#else
#define MATCH_ARG_INIT(msa, reg, arg_option, arg_region, arg_start, mpv) do { \
(msa).stack_p = (void* )0;\
(msa).options = (arg_option)|(reg)->options;\
(msa).region = (arg_region);\
(msa).start = (arg_start);\
(msa).match_stack_limit = (mpv)->match_stack_limit;\
RETRY_IN_MATCH_ARG_INIT(msa,mpv)\
TIME_LIMIT_INIT(msa,mpv)\
SUBEXP_CALL_IN_MATCH_ARG_INIT(msa,mpv)\
(msa).mp = mpv;\
(msa).ptr_num = PTR_NUM_SIZE(reg);\
} while(0)
#endif
#endif
#define MATCH_ARG_FREE(msa) if ((msa).stack_p) xfree((msa).stack_p)
@ -1359,8 +1556,9 @@ static unsigned long RetryLimitInMatch = DEFAULT_RETRY_LIMIT_IN_MATCH;
static unsigned long RetryLimitInSearch = DEFAULT_RETRY_LIMIT_IN_SEARCH;
#define CHECK_RETRY_LIMIT_IN_MATCH do {\
if (++retry_in_match_counter > retry_limit_in_match) {\
MATCH_AT_ERROR_RETURN(retry_in_match_counter > msa->retry_limit_in_match ? ONIGERR_RETRY_LIMIT_IN_MATCH_OVER : ONIGERR_RETRY_LIMIT_IN_SEARCH_OVER); \
if (++retry_in_match_counter >= retry_limit_in_match && \
retry_limit_in_match != 0) {\
MATCH_AT_ERROR_RETURN((retry_in_match_counter >= msa->retry_limit_in_match && msa->retry_limit_in_match != 0) ? ONIGERR_RETRY_LIMIT_IN_MATCH_OVER : ONIGERR_RETRY_LIMIT_IN_SEARCH_OVER); \
}\
} while (0)
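
Note on the hunk above: the new CHECK_RETRY_LIMIT_IN_MATCH condition only fires when the limit is non-zero, so a limit of 0 now behaves as "unlimited", and the comparison changes from > to >=. A minimal sketch of that condition in isolation, assuming a plain counter/limit pair; `RetryBudget` and `retry_limit_reached` are hypothetical names used only for illustration:

```c
#include <stdbool.h>

/* Illustrative only: mirrors the shape of the new condition in
   CHECK_RETRY_LIMIT_IN_MATCH; these names do not exist in the library. */
typedef struct {
  unsigned long counter;  /* retries counted so far */
  unsigned long limit;    /* 0 means "no limit" */
} RetryBudget;

/* Counts one retry and reports whether the budget is exhausted. */
static bool retry_limit_reached(RetryBudget* b)
{
  b->counter++;
  return b->limit != 0 && b->counter >= b->limit;
}
```

With a limit of 3 the third retry reports exhaustion; with a limit of 0 it never does.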
@ -1370,6 +1568,24 @@ static unsigned long RetryLimitInSearch = DEFAULT_RETRY_LIMIT_IN_SEARCH;
#endif /* USE_RETRY_LIMIT */
#ifdef USE_TIME_LIMIT
static unsigned long TimeLimit = DEFAULT_TIME_LIMIT_MSEC;
#define TIME_LIMIT_CHECK_COUNT 512
#define CHECK_TIME_LIMIT_IN_MATCH do {\
if ((msa->time_limit != 0) && ++msa->time_counter == TIME_LIMIT_CHECK_COUNT) {\
msa->time_counter = 0;\
if (time_is_running_out(&(msa->time_end))) {\
MATCH_AT_ERROR_RETURN(ONIGERR_TIME_LIMIT_OVER);\
}\
}\
} while (0)
#else
#define CHECK_TIME_LIMIT_IN_MATCH
#endif
extern unsigned long
onig_get_retry_limit_in_match(void)
{
@ -1412,6 +1628,27 @@ onig_set_retry_limit_in_search(unsigned long n)
#endif
}
extern unsigned long
onig_get_time_limit(void)
{
#ifdef USE_TIME_LIMIT
return TimeLimit;
#else
return 0;
#endif
}
extern int
onig_set_time_limit(unsigned long n)
{
#ifdef USE_TIME_LIMIT
TimeLimit = n;
return 0;
#else
return ONIG_NO_SUPPORT_CONFIG;
#endif
}
#ifdef USE_CALL
static unsigned long SubexpCallLimitInSearch = DEFAULT_SUBEXP_CALL_LIMIT_IN_SEARCH;
@ -1478,6 +1715,10 @@ onig_initialize_match_param(OnigMatchParam* mp)
mp->retry_limit_in_search = RetryLimitInSearch;
#endif
#ifdef USE_TIME_LIMIT
mp->time_limit = TimeLimit;
#endif
mp->callout_user_data = 0;
#ifdef USE_CALLOUT
@ -3012,7 +3253,7 @@ match_at(regex_t* reg, const UChar* str, const UChar* end,
if (msa->retry_limit_in_search != 0) {
unsigned long rem = msa->retry_limit_in_search
- msa->retry_limit_in_search_counter;
if (rem < retry_limit_in_match)
if (rem < retry_limit_in_match || retry_limit_in_match == 0)
retry_limit_in_match = rem;
}
#endif
@ -4434,6 +4675,7 @@ match_at(regex_t* reg, const UChar* str, const UChar* end,
p = stk->u.state.pcode;
s = stk->u.state.pstr;
CHECK_RETRY_LIMIT_IN_MATCH;
CHECK_TIME_LIMIT_IN_MATCH;
JUMP_OUT;
DEFAULT_OP
@ -4442,9 +4684,18 @@ match_at(regex_t* reg, const UChar* str, const UChar* end,
} BYTECODE_INTERPRETER_END;
match_at_end:
#ifdef USE_RETRY_LIMIT
if (msa->retry_limit_in_search != 0) {
#ifdef ONIG_DEBUG
if (retry_in_match_counter >
ULONG_MAX - msa->retry_limit_in_search_counter) {
fprintf(DBGFP, "retry limit counter overflow: %8lu/%8lu\n",
retry_in_match_counter, msa->retry_limit_in_search_counter);
}
#endif
msa->retry_limit_in_search_counter += retry_in_match_counter;
}
#endif
#ifdef ONIG_DEBUG_MATCH_COUNTER
MATCH_COUNTER_OUT("END");
@ -5427,6 +5678,7 @@ search_in_range(regex_t* reg, const UChar* str, const UChar* end,
OnigOptionType option, OnigMatchParam* mp)
{
int r;
int forward;
UChar *s;
MatchArg msa;
const UChar *orig_start = start;
@ -5474,6 +5726,8 @@ search_in_range(regex_t* reg, const UChar* str, const UChar* end,
else goto finish; /* error */ \
}
forward = (range > start);
/* anchor optimize: resume search range */
if (reg->anchor != 0 && str < end) {
UChar *min_semi_end, *max_semi_end;
@ -5595,7 +5849,7 @@ search_in_range(regex_t* reg, const UChar* str, const UChar* end,
MATCH_ARG_INIT(msa, reg, option, region, orig_start, mp);
s = (UChar* )start;
if (range > start) { /* forward search */
if (forward != 0) { /* forward search */
if (reg->optimize != OPTIMIZE_NONE) {
UChar *sch_range, *low, *high;
@ -5626,6 +5880,9 @@ search_in_range(regex_t* reg, const UChar* str, const UChar* end,
while (s <= high) {
MATCH_AND_RETURN_CHECK(data_range);
s += enclen(reg->enc, s);
#ifdef USE_SKIP_SEARCH
if (s < msa.skip_search) s = msa.skip_search;
#endif
}
} while (s < range);
goto mismatch;
@ -5636,30 +5893,42 @@ search_in_range(regex_t* reg, const UChar* str, const UChar* end,
if ((reg->anchor & ANCR_ANYCHAR_INF) != 0 &&
(reg->anchor & (ANCR_LOOK_BEHIND | ANCR_PREC_READ_NOT)) == 0) {
do {
while (s < range) {
UChar* prev;
MATCH_AND_RETURN_CHECK(data_range);
prev = s;
s += enclen(reg->enc, s);
while (!ONIGENC_IS_MBC_NEWLINE(reg->enc, prev, end) && s < range) {
prev = s;
s += enclen(reg->enc, s);
#ifdef USE_SKIP_SEARCH
if (s < msa.skip_search) s = msa.skip_search;
else {
#endif
while (!ONIGENC_IS_MBC_NEWLINE(reg->enc, prev, end) &&
s < range) {
prev = s;
s += enclen(reg->enc, s);
}
#ifdef USE_SKIP_SEARCH
}
} while (s < range);
#endif
}
goto mismatch;
}
}
}
do {
while (1) {
MATCH_AND_RETURN_CHECK(data_range);
if (s >= range) break;
s += enclen(reg->enc, s);
} while (s < range);
if (s == range) { /* because empty match with /$/. */
MATCH_AND_RETURN_CHECK(data_range);
#ifdef USE_SKIP_SEARCH
if (s < msa.skip_search) {
s = msa.skip_search;
if (s > range) break;
}
#endif
}
}
else { /* backward search */
@ -6368,6 +6637,17 @@ onig_builtin_error(OnigCalloutArgs* args, void* user_data ARG_UNUSED)
return n;
}
#ifdef USE_SKIP_SEARCH
extern int
onig_builtin_skip(OnigCalloutArgs* args, void* user_data ARG_UNUSED)
{
if (args->current > args->msa->skip_search)
args->msa->skip_search = (UChar* )args->current;
return ONIG_NORMAL;
}
#endif
extern int
onig_builtin_count(OnigCalloutArgs* args, void* user_data)
{
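
Note on the time limit added in this file: the non-Windows branch stores an absolute deadline in a struct timespec, and CHECK_TIME_LIMIT_IN_MATCH reads the clock only once every TIME_LIMIT_CHECK_COUNT (512) passes, and only when the limit is non-zero. The sketch below reproduces that pattern outside the matcher. It assumes a POSIX system with clock_gettime, leaves out the overflow saturation that set_limit_end_time performs, and uses hypothetical names (`Deadline`, `deadline_start`, `deadline_passed`, `deadline_tick`).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

#define CHECK_EVERY 512  /* same interval as TIME_LIMIT_CHECK_COUNT in the diff */

typedef struct {
  unsigned long   limit_ms;  /* 0 means "no time limit" */
  struct timespec end;       /* absolute deadline */
  int             counter;   /* ticks since the last clock read */
} Deadline;

/* Sets the deadline to now + limit_ms (simplified: no overflow saturation). */
static void deadline_start(Deadline* d, unsigned long limit_ms)
{
  d->limit_ms    = limit_ms;
  d->counter     = 0;
  d->end.tv_sec  = 0;
  d->end.tv_nsec = 0;
  if (limit_ms == 0) return;

  clock_gettime(CLOCK_REALTIME, &d->end);
  d->end.tv_sec  += (time_t)(limit_ms / 1000);
  d->end.tv_nsec += (long)(limit_ms % 1000) * 1000000L;
  if (d->end.tv_nsec > 999999999L) {  /* carry nanoseconds into seconds */
    d->end.tv_sec  += 1;
    d->end.tv_nsec -= 1000000000L;
  }
}

/* Returns 1 if the current time is later than the deadline. */
static int deadline_passed(const Deadline* d)
{
  struct timespec now;

  clock_gettime(CLOCK_REALTIME, &now);
  if (now.tv_sec != d->end.tv_sec) return now.tv_sec > d->end.tv_sec;
  return now.tv_nsec > d->end.tv_nsec;
}

/* Cheap per-iteration check: reads the clock only every CHECK_EVERY ticks. */
static int deadline_tick(Deadline* d)
{
  if (d->limit_ms == 0) return 0;
  if (++d->counter != CHECK_EVERY) return 0;
  d->counter = 0;
  return deadline_passed(d);
}
```

Amortizing the clock read keeps the cost per backtrack low; the trade-off is that an expired deadline is noticed up to 511 ticks late.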


@ -4,7 +4,7 @@
regint.h - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2023 K.Kosako
* Copyright (c) 2002-2025 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -37,6 +37,7 @@
/* #define ONIG_DEBUG_MATCH_COUNTER */
/* #define ONIG_DEBUG_CALL */
/* #define ONIG_DONT_OPTIMIZE */
/* #define ONIG_DEBUG */
/* for byte-code statistical data. */
/* #define ONIG_DEBUG_STATISTICS */
@ -47,27 +48,40 @@
defined(ONIG_DEBUG_STATISTICS)
#ifndef ONIG_DEBUG
#define ONIG_DEBUG
#define DBGFP stderr
#endif
#endif
#ifdef ONIG_DEBUG
#define DBGFP stderr
#endif
#ifndef ONIG_DISABLE_DIRECT_THREADING
#ifdef __GNUC__
#define USE_GOTO_LABELS_AS_VALUES
#endif
#endif
#ifndef ONIG_PRINTFLIKE
#if defined(__clang__) || defined(__GNUC__)
#define ONIG_PRINTFLIKE(x, y) __attribute__((format(printf, x, y)))
#else
#define ONIG_PRINTFLIKE(x, y)
#endif
#endif
/* config */
/* spec. config */
#define USE_REGSET
#define USE_CALL
#define USE_CALLOUT
#define USE_SKIP_SEARCH
#define USE_BACKREF_WITH_LEVEL /* \k<name+n>, \k<name-n> */
#define USE_WHOLE_OPTIONS
#define USE_RIGID_CHECK_CAPTURES_IN_EMPTY_REPEAT /* /(?:()|())*\2/ */
#define USE_NEWLINE_AT_END_OF_STRING_HAS_EMPTY_LINE /* /\n$/ =~ "\n" */
#define USE_WARNING_REDUNDANT_NESTED_REPEAT_OPERATOR
#define USE_RETRY_LIMIT
/* #define USE_TIME_LIMIT */
#ifdef USE_GOTO_LABELS_AS_VALUES
#define USE_THREADED_CODE
#define USE_DIRECT_THREADED_CODE
@ -96,12 +110,19 @@
#define DEFAULT_MATCH_STACK_LIMIT_SIZE 0 /* unlimited */
#define DEFAULT_RETRY_LIMIT_IN_MATCH 10000000
#define DEFAULT_RETRY_LIMIT_IN_SEARCH 0 /* unlimited */
#define DEFAULT_TIME_LIMIT_MSEC 0 /* unlimited (msec.) */
#define DEFAULT_SUBEXP_CALL_LIMIT_IN_SEARCH 0 /* unlimited */
#define DEFAULT_SUBEXP_CALL_MAX_NEST_LEVEL 20
#include "regenc.h"
#if !defined(_WIN32) || defined(__GNUC__)
#if !defined(HAVE_TIME_H) || !defined(HAVE_CLOCK_GETTIME)
#undef USE_TIME_LIMIT
#endif
#endif
#ifndef ONIG_NO_STANDARD_C_HEADERS
#include <stddef.h>
@ -115,7 +136,7 @@
#include <stdint.h>
#endif
#if defined(HAVE_ALLOCA_H) && !defined(__GNUC__)
#if defined(HAVE_ALLOCA_H)
#include <alloca.h>
#endif
@ -288,14 +309,17 @@ typedef unsigned __int64 uint64_t;
#endif
#endif /* _WIN32 */
typedef size_t OnigSize;
typedef size_t OnigSize;
#define INFINITE_SIZE ~((OnigSize)0)
#define INFINITE_SIZE ~((OnigSize )0)
#if SIZEOF_VOIDP == SIZEOF_LONG
#if SIZEOF_VOIDP == SIZEOF_INTPTR_T
typedef intptr_t hash_data_type;
#elif SIZEOF_VOIDP == SIZEOF_LONG
typedef unsigned long hash_data_type;
#elif SIZEOF_VOIDP == SIZEOF_LONG_LONG
typedef unsigned long long hash_data_type;
#else
#error SIZEOF_VOIDP has unexpected value
#endif
/* strend hash */
@ -943,8 +967,8 @@ struct re_pattern_buffer {
extern void onig_add_end_call(void (*func)(void));
extern void onig_warning(const char* s);
extern UChar* onig_error_code_to_format P_((OnigPos code));
extern void ONIG_VARIADIC_FUNC_ATTR onig_snprintf_with_pattern PV_((UChar buf[], int bufsize, OnigEncoding enc, UChar* pat, UChar* pat_end, const UChar *fmt, ...));
extern UChar* onig_error_code_to_format P_((int code));
extern void ONIG_VARIADIC_FUNC_ATTR ONIG_PRINTFLIKE(6, 7) onig_snprintf_with_pattern PV_((UChar buf[], int bufsize, OnigEncoding enc, UChar* pat, UChar* pat_end, const char *fmt, ...));
extern int onig_compile P_((regex_t* reg, const UChar* pattern, const UChar* pattern_end, OnigErrorInfo* einfo));
extern int onig_is_code_in_cc_len P_((int enclen, OnigCodePoint code, void* /* CClassNode* */ cc));
extern RegexExt* onig_get_regex_ext(regex_t* reg);
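
Note on ONIG_PRINTFLIKE: onig_snprintf_with_pattern now takes `const char *fmt` and is declared with ONIG_PRINTFLIKE(6, 7), meaning the format string is parameter 6 and the variadic arguments start at parameter 7. On GCC and Clang this lets the compiler check format strings against their arguments. A reduced sketch of the same pattern follows; `MY_PRINTFLIKE` and `format_warning` are hypothetical names, and the function has only three fixed parameters, so the indices are (3, 4).

```c
#include <stdarg.h>
#include <stdio.h>

/* Same idea as ONIG_PRINTFLIKE in the diff; the names here are made up. */
#if defined(__clang__) || defined(__GNUC__)
#define MY_PRINTFLIKE(x, y) __attribute__((format(printf, x, y)))
#else
#define MY_PRINTFLIKE(x, y)
#endif

/* Parameter 3 is the format string; variadic arguments start at 4. */
static int format_warning(char buf[], int bufsize, const char* fmt, ...)
  MY_PRINTFLIKE(3, 4);

static int format_warning(char buf[], int bufsize, const char* fmt, ...)
{
  va_list args;
  int n;

  va_start(args, fmt);
  n = vsnprintf(buf, (size_t)bufsize, fmt, args);
  va_end(args);
  return n;  /* length the full message needs, as reported by vsnprintf */
}
```

With the attribute in place, a call such as format_warning(buf, 64, "%s", 42) should draw a format warning from GCC or Clang when -Wformat is enabled.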


@ -2,7 +2,7 @@
regparse.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2023 K.Kosako
* Copyright (c) 2002-2025 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -105,6 +105,7 @@ OnigSyntaxType OnigSyntaxOniguruma = {
#ifdef USE_WHOLE_OPTIONS
ONIG_SYN_WHOLE_OPTIONS |
#endif
ONIG_SYN_ESC_P_WITH_ONE_CHAR_PROP |
ONIG_SYN_WARN_REDUNDANT_NESTED_REPEAT
)
, ONIG_OPTION_NONE
@ -326,6 +327,18 @@ onig_set_parse_depth_limit(unsigned int depth)
#define DEC_PARSE_DEPTH(d) (d)--
static OnigCodePoint enc_sb_out(OnigEncoding enc)
{
if (ONIGENC_IS_UNICODE_ENCODING(enc)) {
if (ONIGENC_MBC_MINLEN(enc) == 1)
return ASCII_LIMIT + 1;
else
return 0;
}
else {
return 0x100;
}
}
static int
bbuf_init(BBuf* buf, int size)
@ -758,10 +771,14 @@ typedef st_data_t HashDataType; /* 1.6 st.h doesn't define st_data_t type */
#ifdef ONIG_DEBUG
static int
i_print_name_entry(UChar* key, NameEntry* e, void* arg)
i_print_name_entry(st_data_t akey, st_data_t ae, st_data_t arg)
{
int i;
FILE* fp = (FILE* )arg;
FILE* fp;
NameEntry* e;
e = (NameEntry* )ae;
fp = (FILE* )arg;
fprintf(fp, "%s: ", e->name);
if (e->back_num == 0)
@ -793,8 +810,13 @@ onig_print_names(FILE* fp, regex_t* reg)
#endif /* ONIG_DEBUG */
static int
i_free_name_entry(UChar* key, NameEntry* e, void* arg ARG_UNUSED)
i_free_name_entry(st_data_t akey, st_data_t ae, st_data_t arg ARG_UNUSED)
{
UChar* key;
NameEntry* e;
key = (UChar* )akey;
e = (NameEntry* )ae;
xfree(e->name);
if (IS_NOT_NULL(e->back_refs)) xfree(e->back_refs);
xfree(key);
@ -850,8 +872,14 @@ typedef struct {
} INamesArg;
static int
i_names(UChar* key ARG_UNUSED, NameEntry* e, INamesArg* arg)
i_names(st_data_t key ARG_UNUSED, st_data_t ae, st_data_t aarg)
{
NameEntry* e;
INamesArg* arg;
e = (NameEntry* )ae;
arg = (INamesArg* )aarg;
int r = (*(arg->func))(e->name,
e->name + e->name_len,
e->back_num,
@ -883,9 +911,14 @@ onig_foreach_name(regex_t* reg,
}
static int
i_renumber_name(UChar* key ARG_UNUSED, NameEntry* e, GroupNumMap* map)
i_renumber_name(st_data_t key ARG_UNUSED, st_data_t ae, st_data_t amap)
{
int i;
NameEntry* e;
GroupNumMap* map;
e = (NameEntry* )ae;
map = (GroupNumMap* )amap;
if (e->back_num > 1) {
for (i = 0; i < e->back_num; i++) {
@ -1374,9 +1407,14 @@ static int CalloutNameIDCounter;
#ifdef USE_ST_LIBRARY
static int
i_free_callout_name_entry(st_callout_name_key* key, CalloutNameEntry* e,
void* arg ARG_UNUSED)
i_free_callout_name_entry(st_data_t akey, st_data_t ae, st_data_t arg ARG_UNUSED)
{
st_callout_name_key* key;
CalloutNameEntry* e;
key = (st_callout_name_key* )akey;
e = (CalloutNameEntry* )ae;
if (IS_NOT_NULL(e)) {
xfree(e->name);
}
@ -1870,10 +1908,14 @@ typedef intptr_t CalloutTagVal;
#define CALLOUT_TAG_LIST_FLAG_TAG_EXIST (1<<0)
static int
i_callout_callout_list_set(UChar* key, CalloutTagVal e, void* arg)
i_callout_callout_list_set(st_data_t key ARG_UNUSED, st_data_t ae, st_data_t arg)
{
int num;
RegexExt* ext = (RegexExt* )arg;
CalloutTagVal e;
RegexExt* ext;
e = (CalloutTagVal )ae;
ext = (RegexExt* )arg;
num = (int )e - 1;
ext->callout_list[num].flag |= CALLOUT_TAG_LIST_FLAG_TAG_EXIST;
@ -1926,8 +1968,11 @@ onig_callout_tag_is_exist_at_callout_num(regex_t* reg, int callout_num)
}
static int
i_free_callout_tag_entry(UChar* key, CalloutTagVal e, void* arg ARG_UNUSED)
i_free_callout_tag_entry(st_data_t akey, st_data_t e ARG_UNUSED, st_data_t arg ARG_UNUSED)
{
UChar* key;
key = (UChar* )akey;
xfree(key);
return ST_DELETE;
}
@ -3388,6 +3433,34 @@ onig_node_str_set(Node* node, const UChar* s, const UChar* end, int need_free)
return onig_node_str_cat(node, s, end);
}
static int
node_str_remove_char(Node* node, UChar c)
{
UChar* p;
int n;
n = 0;
p = STR_(node)->s;
while (p < STR_(node)->end) {
if (*p == c) {
UChar *q, *q1;
q = q1 = p;
q1++;
while (q1 < STR_(node)->end) {
*q = *q1;
q++; q1++;
}
n++;
STR_(node)->end--;
}
else {
p++;
}
}
return n;
}
static int
node_str_cat_char(Node* node, UChar c)
{
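
Note on node_str_remove_char above: it deletes every occurrence of a byte from the node's string by shifting the tail left once per hit, and returns the number of bytes removed. The sketch below produces the same result with a different technique, a single read/write pass. It works on a plain byte range instead of a Node, and `remove_byte` is a hypothetical name, not a function in regparse.c.

```c
typedef unsigned char UChar;  /* stand-in for the library's UChar */

/* Removes every byte equal to c from [s, *end) in place, moves *end back
   to the new end, and returns the number of bytes removed. */
static int remove_byte(UChar* s, UChar** end, UChar c)
{
  UChar* r;      /* read position */
  UChar* w = s;  /* write position, never ahead of r */
  int n = 0;

  for (r = s; r < *end; r++) {
    if (*r == c)
      n++;        /* skip the byte */
    else
      *w++ = *r;  /* keep the byte, compacting as we go */
  }
  *end = w;
  return n;
}
```

For short strings the two approaches are interchangeable; the single pass only matters when many occurrences are removed from a long string.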
@ -4548,6 +4621,7 @@ typedef struct {
struct {
int ctype;
int not;
int braces;
} prop;
} u;
} PToken;
@ -4807,6 +4881,7 @@ fetch_name_with_level(OnigCodePoint start_code, UChar** src, UChar* end,
end_code = get_name_end_code_point(start_code);
*rlevel = 0;
digit_count = 0;
name_end = end;
r = 0;
@ -5061,7 +5136,7 @@ CC_ESC_WARN(ParseEnv* env, UChar *c)
UChar buf[WARN_BUFSIZE];
onig_snprintf_with_pattern(buf, WARN_BUFSIZE, env->enc,
env->pattern, env->pattern_end,
(UChar* )"character class has '%s' without escape",
"character class has '%s' without escape",
c);
(*onig_warn)((char* )buf);
}
@ -5076,7 +5151,7 @@ CLOSE_BRACKET_WITHOUT_ESC_WARN(ParseEnv* env, UChar* c)
UChar buf[WARN_BUFSIZE];
onig_snprintf_with_pattern(buf, WARN_BUFSIZE, (env)->enc,
(env)->pattern, (env)->pattern_end,
(UChar* )"regular expression has '%s' without escape", c);
"regular expression has '%s' without escape", c);
(*onig_warn)((char* )buf);
}
}
@ -5309,23 +5384,27 @@ fetch_token_cc(PToken* tok, UChar** src, UChar* end, ParseEnv* env, int state)
case 'p':
case 'P':
if (PEND) break;
if (! PEND && PPEEK_IS('{')) {
if (IS_SYNTAX_OP2(syn, ONIG_SYN_OP2_ESC_P_BRACE_CHAR_PROPERTY)) {
PINC;
tok->type = TK_CHAR_PROPERTY;
tok->u.prop.not = c == 'P';
tok->u.prop.braces = 1;
c2 = PPEEK;
if (c2 == '{' &&
IS_SYNTAX_OP2(syn, ONIG_SYN_OP2_ESC_P_BRACE_CHAR_PROPERTY)) {
PINC;
if (!PEND && IS_SYNTAX_OP2(syn, ONIG_SYN_OP2_ESC_P_BRACE_CIRCUMFLEX_NOT)) {
PFETCH(c2);
if (c2 == '^') {
tok->u.prop.not = tok->u.prop.not == 0;
}
else
PUNFETCH;
}
}
}
else if (IS_SYNTAX_BV(syn, ONIG_SYN_ESC_P_WITH_ONE_CHAR_PROP)) {
tok->type = TK_CHAR_PROPERTY;
tok->u.prop.not = c == 'P';
if (!PEND && IS_SYNTAX_OP2(syn, ONIG_SYN_OP2_ESC_P_BRACE_CIRCUMFLEX_NOT)) {
PFETCH(c2);
if (c2 == '^') {
tok->u.prop.not = tok->u.prop.not == 0;
}
else
PUNFETCH;
}
tok->u.prop.braces = 0;
}
break;
@ -5349,10 +5428,8 @@ fetch_token_cc(PToken* tok, UChar** src, UChar* end, ParseEnv* env, int state)
break;
case 'x':
if (PEND) break;
prev = p;
if (PPEEK_IS('{') && IS_SYNTAX_OP(syn, ONIG_SYN_OP_ESC_X_BRACE_HEX8)) {
if (! PEND && PPEEK_IS('{') && IS_SYNTAX_OP(syn, ONIG_SYN_OP_ESC_X_BRACE_HEX8)) {
PINC;
r = scan_hexadecimal_number(&p, end, 0, 8, enc, &code);
if (r < 0) return r;
@ -5400,16 +5477,13 @@ fetch_token_cc(PToken* tok, UChar** src, UChar* end, ParseEnv* env, int state)
break;
case 'u':
if (PEND) break;
prev = p;
if (IS_SYNTAX_OP2(syn, ONIG_SYN_OP2_ESC_U_HEX4)) {
mindigits = maxdigits = 4;
u_hex_digits:
r = scan_hexadecimal_number(&p, end, mindigits, maxdigits, enc, &code);
if (r < 0) return r;
if (p == prev) { /* can't read nothing. */
code = 0; /* but, it's not error */
}
tok->type = TK_CODE_POINT;
tok->base_num = 16;
tok->u.code = code;
@ -5797,10 +5871,8 @@ fetch_token(PToken* tok, UChar** src, UChar* end, ParseEnv* env)
break;
case 'x':
if (PEND) break;
prev = p;
if (PPEEK_IS('{') && IS_SYNTAX_OP(syn, ONIG_SYN_OP_ESC_X_BRACE_HEX8)) {
if (! PEND && PPEEK_IS('{') && IS_SYNTAX_OP(syn, ONIG_SYN_OP_ESC_X_BRACE_HEX8)) {
PINC;
r = scan_hexadecimal_number(&p, end, 0, 8, enc, &code);
if (r < 0) return r;
@ -5843,16 +5915,13 @@ fetch_token(PToken* tok, UChar** src, UChar* end, ParseEnv* env)
break;
case 'u':
if (PEND) break;
prev = p;
mindigits = maxdigits = 4;
if (IS_SYNTAX_OP2(syn, ONIG_SYN_OP2_ESC_U_HEX4)) {
u_hex_digits:
r = scan_hexadecimal_number(&p, end, mindigits, maxdigits, enc, &code);
if (r < 0) return r;
if (p == prev) { /* can't read nothing. */
code = 0; /* but, it's not error */
}
tok->type = TK_CODE_POINT;
tok->base_num = 16;
tok->u.code = code;
@ -5890,6 +5959,7 @@ fetch_token(PToken* tok, UChar** src, UChar* end, ParseEnv* env)
tok->u.backref.by_name = 0;
#ifdef USE_BACKREF_WITH_LEVEL
tok->u.backref.exist_level = 0;
tok->u.backref.level = 0;
#endif
break;
}
@ -6049,21 +6119,28 @@ fetch_token(PToken* tok, UChar** src, UChar* end, ParseEnv* env)
case 'p':
case 'P':
if (!PEND && PPEEK_IS('{') &&
IS_SYNTAX_OP2(syn, ONIG_SYN_OP2_ESC_P_BRACE_CHAR_PROPERTY)) {
PINC;
if (! PEND && PPEEK_IS('{')) {
if (IS_SYNTAX_OP2(syn, ONIG_SYN_OP2_ESC_P_BRACE_CHAR_PROPERTY)) {
PINC;
tok->type = TK_CHAR_PROPERTY;
tok->u.prop.not = c == 'P';
tok->u.prop.braces = 1;
if (! PEND &&
IS_SYNTAX_OP2(syn, ONIG_SYN_OP2_ESC_P_BRACE_CIRCUMFLEX_NOT)) {
PFETCH(c);
if (c == '^') {
tok->u.prop.not = tok->u.prop.not == 0;
}
else
PUNFETCH;
}
}
}
else if (IS_SYNTAX_BV(syn, ONIG_SYN_ESC_P_WITH_ONE_CHAR_PROP)) {
tok->type = TK_CHAR_PROPERTY;
tok->u.prop.not = c == 'P';
if (!PEND &&
IS_SYNTAX_OP2(syn, ONIG_SYN_OP2_ESC_P_BRACE_CIRCUMFLEX_NOT)) {
PFETCH(c);
if (c == '^') {
tok->u.prop.not = tok->u.prop.not == 0;
}
else
PUNFETCH;
}
tok->u.prop.braces = 0;
}
break;
@ -6689,7 +6766,7 @@ prs_posix_bracket(CClassNode* cc, UChar** src, UChar* end, ParseEnv* env)
}
static int
fetch_char_property_to_ctype(UChar** src, UChar* end, ParseEnv* env)
fetch_char_property_to_ctype(UChar** src, UChar* end, int braces, ParseEnv* env)
{
int r;
OnigCodePoint c;
@ -6698,10 +6775,25 @@ fetch_char_property_to_ctype(UChar** src, UChar* end, ParseEnv* env)
p = *src;
enc = env->enc;
r = ONIGERR_END_PATTERN_WITH_UNMATCHED_PARENTHESIS;
start = prev = p;
start = p;
while (!PEND) {
if (braces == 0) {
if (PEND) return ONIGERR_INVALID_CHAR_PROPERTY_NAME;
PFETCH_S(c);
r = ONIGENC_PROPERTY_NAME_TO_CTYPE(enc, start, p);
if (r >= 0) {
*src = p;
}
else {
onig_scan_env_set_error_string(env, r, *src, p);
}
return r;
}
r = ONIGERR_END_PATTERN_WITH_UNMATCHED_PARENTHESIS;
while (! PEND) {
prev = p;
PFETCH_S(c);
if (c == '}') {
@ -6730,7 +6822,7 @@ prs_char_property(Node** np, PToken* tok, UChar** src, UChar* end,
int r, ctype;
CClassNode* cc;
ctype = fetch_char_property_to_ctype(src, end, env);
ctype = fetch_char_property_to_ctype(src, end, tok->u.prop.braces, env);
if (ctype < 0) return ctype;
if (ctype == ONIGENC_CTYPE_WORD) {
@ -6820,9 +6912,17 @@ cc_char_next(CClassNode* cc, OnigCodePoint *from, OnigCodePoint to,
else
return ONIGERR_EMPTY_RANGE_IN_CHAR_CLASS;
}
bitset_set_range(cc->bs, (int )*from, (int )(to < 0xff ? to : 0xff));
r = add_code_range(&(cc->mbuf), env, (OnigCodePoint )*from, to);
if (r < 0) return r;
OnigCodePoint sbout = enc_sb_out(env->enc);
if (*from < sbout)
bitset_set_range(cc->bs, (int )*from, (int )(to < sbout ? to : sbout - 1));
if (to >= sbout) {
r = add_code_range(&(cc->mbuf), env,
(OnigCodePoint )(*from > sbout ? *from : sbout), to);
if (r < 0) return r;
}
}
ccs_range_end:
*state = CS_COMPLETE;
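
Note on the cc_char_next change above: a range [from, to] is now split at the boundary returned by enc_sb_out. The part below the boundary goes into the bitset, and the part at or above it goes into the multibyte code range buffer. The sketch below shows only that split, assuming from <= to; `CodePoint`, `SplitRange` and `split_range` are hypothetical names, and the boundary values 0x100 and 0 in the usage note are two of the values enc_sb_out can return according to the diff.

```c
typedef unsigned int CodePoint;  /* stand-in for OnigCodePoint */

typedef struct {
  int has_low;   CodePoint low_from,  low_to;   /* part below the boundary */
  int has_high;  CodePoint high_from, high_to;  /* part at or above it */
} SplitRange;

/* Splits the inclusive range [from, to] (from <= to assumed) at boundary. */
static SplitRange split_range(CodePoint from, CodePoint to, CodePoint boundary)
{
  SplitRange r = { 0, 0, 0, 0, 0, 0 };

  if (from < boundary) {  /* implies boundary > 0, so boundary - 1 is safe */
    r.has_low  = 1;
    r.low_from = from;
    r.low_to   = to < boundary ? to : boundary - 1;
  }
  if (to >= boundary) {
    r.has_high  = 1;
    r.high_from = from > boundary ? from : boundary;
    r.high_to   = to;
  }
  return r;
}
```

With boundary 0x100, the range 0xf0..0x1ff splits into 0xf0..0xff and 0x100..0x1ff. With boundary 0, every range lands entirely in the high part.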
@ -6970,16 +7070,16 @@ prs_cc(Node** np, PToken* tok, UChar** src, UChar* end, ParseEnv* env)
fetched = 0;
}
if (i == 1) {
if (! ONIGENC_IS_VALID_MBC_STRING(env->enc, buf, buf + len)) {
r = ONIGERR_INVALID_WIDE_CHAR_VALUE;
goto err;
}
if (len == 1) {
in_code = (OnigCodePoint )buf[0];
goto crude_single;
}
else {
if (! ONIGENC_IS_VALID_MBC_STRING(env->enc, buf, buf + len)) {
r = ONIGERR_INVALID_WIDE_CHAR_VALUE;
goto err;
}
in_code = ONIGENC_MBC_TO_CODE(env->enc, buf, bufe);
in_type = CV_MB;
}
@ -7038,7 +7138,7 @@ prs_cc(Node** np, PToken* tok, UChar** src, UChar* end, ParseEnv* env)
case TK_CHAR_PROPERTY:
{
int ctype = fetch_char_property_to_ctype(&p, end, env);
int ctype = fetch_char_property_to_ctype(&p, end, tok->u.prop.braces, env);
if (ctype < 0) {
r = ctype;
goto err;
@ -7062,11 +7162,16 @@ prs_cc(Node** np, PToken* tok, UChar** src, UChar* end, ParseEnv* env)
goto val_entry;
}
else if (r == TK_CC_AND) {
range_end_val_with_warning:
CC_ESC_WARN(env, (UChar* )"-");
goto range_end_val;
}
if (curr_type == CV_CPROP) {
if (IS_SYNTAX_BV(env->syntax,
ONIG_SYN_ALLOW_CHAR_TYPE_FOLLOWED_BY_MINUS_IN_CC)) {
goto range_end_val_with_warning;
}
r = ONIGERR_UNMATCHED_RANGE_SPECIFIER_IN_CHAR_CLASS;
goto err;
}
@ -7097,16 +7202,16 @@ prs_cc(Node** np, PToken* tok, UChar** src, UChar* end, ParseEnv* env)
if (r < 0) goto err;
fetched = 1;
if (r == TK_CC_CLOSE)
if (r == TK_CC_CLOSE) {
goto range_end_val; /* allow [a-b-] */
}
else if (r == TK_CC_AND) {
CC_ESC_WARN(env, (UChar* )"-");
goto range_end_val;
goto range_end_val_with_warning;
}
if (IS_SYNTAX_BV(env->syntax, ONIG_SYN_ALLOW_DOUBLE_RANGE_OP_IN_CC)) {
CC_ESC_WARN(env, (UChar* )"-");
goto range_end_val; /* [0-9-a] is allowed as [0-9\-a] */
/* [0-9-a] is allowed as [0-9\-a] */
goto range_end_val_with_warning;
}
r = ONIGERR_UNMATCHED_RANGE_SPECIFIER_IN_CHAR_CLASS;
goto err;
@ -8518,7 +8623,7 @@ assign_quantifier_body(Node* qnode, Node* target, int group, ParseEnv* env)
if (onig_verb_warn != onig_null_warn) {
onig_snprintf_with_pattern(buf, WARN_BUFSIZE, env->enc,
env->pattern, env->pattern_end,
(UChar* )"redundant nested repeat operator");
"redundant nested repeat operator");
(*onig_verb_warn)((char* )buf);
}
goto warn_exit;
@ -8528,7 +8633,7 @@ assign_quantifier_body(Node* qnode, Node* target, int group, ParseEnv* env)
if (onig_verb_warn != onig_null_warn) {
onig_snprintf_with_pattern(buf, WARN_BUFSIZE, env->enc,
env->pattern, env->pattern_end,
(UChar* )"nested repeat operator %s and %s was replaced with '%s'",
"nested repeat operator %s and %s was replaced with '%s'",
PopularQStr[targetq_num], PopularQStr[nestq_num],
ReduceQStr[ReduceTypeTable[targetq_num][nestq_num]]);
(*onig_verb_warn)((char* )buf);
@ -8824,6 +8929,7 @@ prs_exp(Node** np, PToken* tok, int term, UChar** src, UChar* end,
tk_byte:
{
*np = node_new_str_with_options(tok->backp, *src, env->options);
tk_byte2:
CHECK_NULL_RETURN_MEMERR(*np);
while (1) {
@ -9040,7 +9146,15 @@ prs_exp(Node** np, PToken* tok, int term, UChar** src, UChar* end,
}
}
else {
goto tk_byte;
if (tok->type == TK_INTERVAL &&
IS_SYNTAX_OP(env->syntax, ONIG_SYN_OP_ESC_BRACE_INTERVAL)) {
*np = node_new_str_with_options(tok->backp, *src, env->options);
node_str_remove_char(*np, (UChar )'\\');
goto tk_byte2;
}
else {
goto tk_byte;
}
}
break;
@ -9085,8 +9199,14 @@ prs_exp(Node** np, PToken* tok, int term, UChar** src, UChar* end,
if (r == TK_REPEAT || r == TK_INTERVAL) {
Node* target;
if (is_invalid_quantifier_target(*tp))
return ONIGERR_TARGET_OF_REPEAT_OPERATOR_INVALID;
if (is_invalid_quantifier_target(*tp)) {
if (IS_SYNTAX_BV(env->syntax, ONIG_SYN_CONTEXT_INDEP_REPEAT_OPS)) {
if (IS_SYNTAX_BV(env->syntax, ONIG_SYN_CONTEXT_INVALID_REPEAT_OPS))
return ONIGERR_TARGET_OF_REPEAT_OPERATOR_INVALID;
}
return r;
}
INC_PARSE_DEPTH(parse_depth);


@ -2,7 +2,7 @@
regposix.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2022 K.Kosako
* Copyright (c) 2002-2025 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -77,6 +77,7 @@ onig2posix_error_code(OnigPos code)
{ ONIGERR_RETRY_LIMIT_IN_MATCH_OVER, REG_EONIG_INTERNAL },
{ ONIGERR_RETRY_LIMIT_IN_SEARCH_OVER, REG_EONIG_INTERNAL },
{ ONIGERR_SUBEXP_CALL_LIMIT_IN_SEARCH_OVER, REG_EONIG_INTERNAL },
{ ONIGERR_TIME_LIMIT_OVER, REG_EONIG_INTERNAL },
{ ONIGERR_TYPE_BUG, REG_EONIG_INTERNAL },
{ ONIGERR_PARSER_BUG, REG_EONIG_INTERNAL },
{ ONIGERR_STACK_BUG, REG_EONIG_INTERNAL },


@ -2,7 +2,7 @@
regsyntax.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2021 K.Kosako
* Copyright (c) 2002-2025 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -90,7 +90,7 @@ OnigSyntaxType OnigSyntaxEmacs = {
ONIG_SYN_OP_ASTERISK_ZERO_INF | ONIG_SYN_OP_PLUS_ONE_INF |
ONIG_SYN_OP_QMARK_ZERO_ONE | ONIG_SYN_OP_DECIMAL_BACKREF |
ONIG_SYN_OP_LINE_ANCHOR | ONIG_SYN_OP_ESC_CONTROL_CHARS )
, ONIG_SYN_OP2_ESC_GNU_BUF_ANCHOR
, ( ONIG_SYN_OP2_ESC_GNU_BUF_ANCHOR | ONIG_SYN_OP2_QMARK_GROUP_EFFECT )
, ONIG_SYN_ALLOW_EMPTY_RANGE_IN_CC
, ONIG_OPTION_NONE
,
@ -147,7 +147,7 @@ OnigSyntaxType OnigSyntaxJava = {
(( SYN_GNU_REGEX_OP | ONIG_SYN_OP_QMARK_NON_GREEDY |
ONIG_SYN_OP_ESC_CONTROL_CHARS | ONIG_SYN_OP_ESC_C_CONTROL |
ONIG_SYN_OP_ESC_OCTAL3 | ONIG_SYN_OP_ESC_X_HEX2 )
& ~ONIG_SYN_OP_ESC_LTGT_WORD_BEGIN_END )
& ~(ONIG_SYN_OP_ESC_LTGT_WORD_BEGIN_END | ONIG_SYN_OP_POSIX_BRACKET) )
, ( ONIG_SYN_OP2_ESC_CAPITAL_Q_QUOTE | ONIG_SYN_OP2_QMARK_GROUP_EFFECT |
ONIG_SYN_OP2_OPTION_PERL | ONIG_SYN_OP2_PLUS_POSSESSIVE_REPEAT |
ONIG_SYN_OP2_PLUS_POSSESSIVE_INTERVAL | ONIG_SYN_OP2_CCLASS_SET_OP |
@ -155,7 +155,8 @@ OnigSyntaxType OnigSyntaxJava = {
ONIG_SYN_OP2_ESC_P_BRACE_CHAR_PROPERTY )
, ( SYN_GNU_REGEX_BV | ONIG_SYN_ISOLATED_OPTION_CONTINUE_BRANCH |
ONIG_SYN_DIFFERENT_LEN_ALT_LOOK_BEHIND |
ONIG_SYN_VARIABLE_LEN_LOOK_BEHIND )
ONIG_SYN_VARIABLE_LEN_LOOK_BEHIND |
ONIG_SYN_ALLOW_CHAR_TYPE_FOLLOWED_BY_MINUS_IN_CC )
, ONIG_OPTION_SINGLELINE
,
{
@ -188,7 +189,9 @@ OnigSyntaxType OnigSyntaxPerl = {
ONIG_SYN_OP2_ESC_CAPITAL_K_KEEP |
ONIG_SYN_OP2_ESC_CAPITAL_R_GENERAL_NEWLINE |
ONIG_SYN_OP2_ESC_CAPITAL_N_O_SUPER_DOT )
, SYN_GNU_REGEX_BV | ONIG_SYN_ISOLATED_OPTION_CONTINUE_BRANCH
, (SYN_GNU_REGEX_BV | ONIG_SYN_ISOLATED_OPTION_CONTINUE_BRANCH |
ONIG_SYN_ALLOW_CHAR_TYPE_FOLLOWED_BY_MINUS_IN_CC |
ONIG_SYN_ESC_P_WITH_ONE_CHAR_PROP )
, ONIG_OPTION_SINGLELINE
,
{
@ -228,7 +231,9 @@ OnigSyntaxType OnigSyntaxPerl_NG = {
ONIG_SYN_OP2_QMARK_PERL_SUBEXP_CALL )
, ( SYN_GNU_REGEX_BV | ONIG_SYN_ISOLATED_OPTION_CONTINUE_BRANCH |
ONIG_SYN_CAPTURE_ONLY_NAMED_GROUP |
ONIG_SYN_ALLOW_MULTIPLEX_DEFINITION_NAME )
ONIG_SYN_ALLOW_MULTIPLEX_DEFINITION_NAME |
ONIG_SYN_ALLOW_CHAR_TYPE_FOLLOWED_BY_MINUS_IN_CC |
ONIG_SYN_ESC_P_WITH_ONE_CHAR_PROP )
, ONIG_OPTION_SINGLELINE
,
{
@ -247,7 +252,7 @@ OnigSyntaxType OnigSyntaxPython = {
ONIG_SYN_OP_ESC_OCTAL3 | ONIG_SYN_OP_ESC_X_HEX2 |
ONIG_SYN_OP_ESC_CONTROL_CHARS |
ONIG_SYN_OP_ESC_C_CONTROL )
& ~ONIG_SYN_OP_ESC_LTGT_WORD_BEGIN_END )
& ~(ONIG_SYN_OP_ESC_LTGT_WORD_BEGIN_END | ONIG_SYN_OP_POSIX_BRACKET) )
, ( ONIG_SYN_OP2_QMARK_GROUP_EFFECT | ONIG_SYN_OP2_OPTION_PERL |
ONIG_SYN_OP2_QMARK_LPAREN_IF_ELSE |
ONIG_SYN_OP2_ASTERISK_CALLOUT_NAME |


@ -5,10 +5,14 @@
#ifndef ST_INCLUDED
#define ST_INCLUDED
#if SIZEOF_VOIDP == SIZEOF_LONG
#if SIZEOF_VOIDP == SIZEOF_INTPTR_T
typedef intptr_t st_data_t;
#elif SIZEOF_VOIDP == SIZEOF_LONG
typedef unsigned long st_data_t;
#elif SIZEOF_VOIDP == SIZEOF_LONG_LONG
typedef unsigned long long st_data_t;
#else
#error SIZEOF_VOIDP has unexpected value
#endif
#define ST_DATA_T_DEFINED
@ -34,13 +38,6 @@ enum st_retval {ST_CONTINUE, ST_STOP, ST_DELETE, ST_CHECK};
#ifndef _
# define _(args) args
#endif
#ifndef ANYARGS
# ifdef __cplusplus
# define ANYARGS ...
# else
# define ANYARGS
# endif
#endif
st_table *st_init_table _((struct st_hash_type *));
st_table *st_init_table_with_size _((struct st_hash_type *, int));
@ -52,7 +49,7 @@ int st_delete _((st_table *, st_data_t *, st_data_t *));
int st_delete_safe _((st_table *, st_data_t *, st_data_t *, st_data_t));
int st_insert _((st_table *, st_data_t, st_data_t));
int st_lookup _((st_table *, st_data_t, st_data_t *));
int st_foreach _((st_table *, int (*)(ANYARGS), st_data_t));
int st_foreach _((st_table *, int (*)(st_data_t, st_data_t, st_data_t), st_data_t));
void st_add_direct _((st_table *, st_data_t, st_data_t));
void st_free_table _((st_table *));
void st_cleanup_safe _((st_table *, st_data_t));
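
Note on the st.h hunk above: with `ANYARGS` expanding to nothing in C, the callback type was `int (*)()`, which C23 reads as a function taking no arguments. This is presumably tied to the GCC 15 (C23) build fix listed in the History. The sketch below shows a callback written against the fully prototyped signature. `mini_foreach` and `sum_values` are hypothetical stand-ins that walk two arrays; they are not the real `st_foreach`.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef intptr_t st_data_t;               /* matches the first #if branch */

enum st_retval { ST_CONTINUE, ST_STOP };  /* subset of the real enum */

/* Fully prototyped callback type, as in the new st_foreach declaration. */
typedef int (*st_each_fn)(st_data_t key, st_data_t value, st_data_t arg);

/* Hypothetical stand-in for st_foreach: walks parallel key/value arrays
 * instead of a hash table. Returns the number of entries visited. */
static int mini_foreach(const st_data_t *keys, const st_data_t *vals, size_t n,
                        st_each_fn fn, st_data_t arg)
{
    size_t i;
    assert(fn != NULL);
    for (i = 0; i < n; i++) {
        if (fn(keys[i], vals[i], arg) == ST_STOP) return (int)(i + 1);
    }
    return (int)n;
}

/* Example callback: adds each value into the long pointed to by `arg`. */
static int sum_values(st_data_t key, st_data_t value, st_data_t arg)
{
    (void)key;
    *(long *)arg += (long)value;
    return ST_CONTINUE;
}
```

Passing a pointer through `st_data_t` works here because the typedef is pointer-sized, which is what the `SIZEOF_VOIDP` checks in the hunk guarantee.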


@ -2,7 +2,7 @@
unicode.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2020 K.Kosako
* Copyright (c) 2002-2025 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -1141,13 +1141,14 @@ onig_unicode_define_user_property(const char* name, OnigCodePoint* ranges)
extern int
onigenc_unicode_is_code_ctype(OnigCodePoint code, unsigned int ctype)
{
if (
#ifdef USE_UNICODE_PROPERTIES
ctype <= ONIGENC_MAX_STD_CTYPE &&
#endif
code < 256) {
if (ctype <= ONIGENC_MAX_STD_CTYPE && code < 256) {
return ONIGENC_IS_UNICODE_ISO_8859_1_CTYPE(code, ctype);
}
#ifndef USE_UNICODE_PROPERTIES
else {
return FALSE;
}
#endif
if (ctype >= CODE_RANGES_NUM) {
int index = ctype - CODE_RANGES_NUM;


@ -1,6 +1,6 @@
/* unicode_egcb_data.c: Generated by make_unicode_egcb_data.py. */
/*-
* Copyright (c) 2017-2023 K.Kosako
* Copyright (c) 2017-2024 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -25,7 +25,7 @@
* SUCH DAMAGE.
*/
#define GRAPHEME_BREAK_PROPERTY_VERSION 150100
#define GRAPHEME_BREAK_PROPERTY_VERSION 160000
/*
CR
@ -43,7 +43,7 @@ V
ZWJ
*/
static int EGCB_RANGE_NUM = 1371;
static int EGCB_RANGE_NUM = 1376;
static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x000000, 0x000009, EGCB_Control },
{0x00000a, 0x00000a, EGCB_LF },
@ -81,7 +81,7 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x000829, 0x00082d, EGCB_Extend },
{0x000859, 0x00085b, EGCB_Extend },
{0x000890, 0x000891, EGCB_Prepend },
{0x000898, 0x00089f, EGCB_Extend },
{0x000897, 0x00089f, EGCB_Extend },
{0x0008ca, 0x0008e1, EGCB_Extend },
{0x0008e2, 0x0008e2, EGCB_Prepend },
{0x0008e3, 0x000902, EGCB_Extend },
@ -163,14 +163,12 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x000c82, 0x000c83, EGCB_SpacingMark },
{0x000cbc, 0x000cbc, EGCB_Extend },
{0x000cbe, 0x000cbe, EGCB_SpacingMark },
{0x000cbf, 0x000cbf, EGCB_Extend },
{0x000cc0, 0x000cc1, EGCB_SpacingMark },
{0x000cbf, 0x000cc0, EGCB_Extend },
{0x000cc1, 0x000cc1, EGCB_SpacingMark },
{0x000cc2, 0x000cc2, EGCB_Extend },
{0x000cc3, 0x000cc4, EGCB_SpacingMark },
{0x000cc6, 0x000cc6, EGCB_Extend },
{0x000cc7, 0x000cc8, EGCB_SpacingMark },
{0x000cca, 0x000ccb, EGCB_SpacingMark },
{0x000ccc, 0x000ccd, EGCB_Extend },
{0x000cc6, 0x000cc8, EGCB_Extend },
{0x000cca, 0x000ccd, EGCB_Extend },
{0x000cd5, 0x000cd6, EGCB_Extend },
{0x000ce2, 0x000ce3, EGCB_Extend },
{0x000cf3, 0x000cf3, EGCB_SpacingMark },
@ -235,10 +233,8 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x001160, 0x0011a7, EGCB_V },
{0x0011a8, 0x0011ff, EGCB_T },
{0x00135d, 0x00135f, EGCB_Extend },
{0x001712, 0x001714, EGCB_Extend },
{0x001715, 0x001715, EGCB_SpacingMark },
{0x001732, 0x001733, EGCB_Extend },
{0x001734, 0x001734, EGCB_SpacingMark },
{0x001712, 0x001715, EGCB_Extend },
{0x001732, 0x001734, EGCB_Extend },
{0x001752, 0x001753, EGCB_Extend },
{0x001772, 0x001773, EGCB_Extend },
{0x0017b4, 0x0017b5, EGCB_Extend },
@ -278,29 +274,23 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x001ab0, 0x001ace, EGCB_Extend },
{0x001b00, 0x001b03, EGCB_Extend },
{0x001b04, 0x001b04, EGCB_SpacingMark },
{0x001b34, 0x001b3a, EGCB_Extend },
{0x001b3b, 0x001b3b, EGCB_SpacingMark },
{0x001b3c, 0x001b3c, EGCB_Extend },
{0x001b3d, 0x001b41, EGCB_SpacingMark },
{0x001b42, 0x001b42, EGCB_Extend },
{0x001b43, 0x001b44, EGCB_SpacingMark },
{0x001b34, 0x001b3d, EGCB_Extend },
{0x001b3e, 0x001b41, EGCB_SpacingMark },
{0x001b42, 0x001b44, EGCB_Extend },
{0x001b6b, 0x001b73, EGCB_Extend },
{0x001b80, 0x001b81, EGCB_Extend },
{0x001b82, 0x001b82, EGCB_SpacingMark },
{0x001ba1, 0x001ba1, EGCB_SpacingMark },
{0x001ba2, 0x001ba5, EGCB_Extend },
{0x001ba6, 0x001ba7, EGCB_SpacingMark },
{0x001ba8, 0x001ba9, EGCB_Extend },
{0x001baa, 0x001baa, EGCB_SpacingMark },
{0x001bab, 0x001bad, EGCB_Extend },
{0x001ba8, 0x001bad, EGCB_Extend },
{0x001be6, 0x001be6, EGCB_Extend },
{0x001be7, 0x001be7, EGCB_SpacingMark },
{0x001be8, 0x001be9, EGCB_Extend },
{0x001bea, 0x001bec, EGCB_SpacingMark },
{0x001bed, 0x001bed, EGCB_Extend },
{0x001bee, 0x001bee, EGCB_SpacingMark },
{0x001bef, 0x001bf1, EGCB_Extend },
{0x001bf2, 0x001bf3, EGCB_SpacingMark },
{0x001bef, 0x001bf3, EGCB_Extend },
{0x001c24, 0x001c2b, EGCB_SpacingMark },
{0x001c2c, 0x001c33, EGCB_Extend },
{0x001c34, 0x001c35, EGCB_SpacingMark },
@ -344,7 +334,8 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x00a8ff, 0x00a8ff, EGCB_Extend },
{0x00a926, 0x00a92d, EGCB_Extend },
{0x00a947, 0x00a951, EGCB_Extend },
{0x00a952, 0x00a953, EGCB_SpacingMark },
{0x00a952, 0x00a952, EGCB_SpacingMark },
{0x00a953, 0x00a953, EGCB_Extend },
{0x00a960, 0x00a97c, EGCB_L },
{0x00a980, 0x00a982, EGCB_Extend },
{0x00a983, 0x00a983, EGCB_SpacingMark },
@ -353,7 +344,8 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x00a9b6, 0x00a9b9, EGCB_Extend },
{0x00a9ba, 0x00a9bb, EGCB_SpacingMark },
{0x00a9bc, 0x00a9bd, EGCB_Extend },
{0x00a9be, 0x00a9c0, EGCB_SpacingMark },
{0x00a9be, 0x00a9bf, EGCB_SpacingMark },
{0x00a9c0, 0x00a9c0, EGCB_Extend },
{0x00a9e5, 0x00a9e5, EGCB_Extend },
{0x00aa29, 0x00aa2e, EGCB_Extend },
{0x00aa2f, 0x00aa30, EGCB_SpacingMark },
@ -1197,8 +1189,9 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x010a3f, 0x010a3f, EGCB_Extend },
{0x010ae5, 0x010ae6, EGCB_Extend },
{0x010d24, 0x010d27, EGCB_Extend },
{0x010d69, 0x010d6d, EGCB_Extend },
{0x010eab, 0x010eac, EGCB_Extend },
{0x010efd, 0x010eff, EGCB_Extend },
{0x010efc, 0x010eff, EGCB_Extend },
{0x010f46, 0x010f50, EGCB_Extend },
{0x010f82, 0x010f85, EGCB_Extend },
{0x011000, 0x011000, EGCB_SpacingMark },
@ -1226,7 +1219,8 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x011182, 0x011182, EGCB_SpacingMark },
{0x0111b3, 0x0111b5, EGCB_SpacingMark },
{0x0111b6, 0x0111be, EGCB_Extend },
{0x0111bf, 0x0111c0, EGCB_SpacingMark },
{0x0111bf, 0x0111bf, EGCB_SpacingMark },
{0x0111c0, 0x0111c0, EGCB_Extend },
{0x0111c2, 0x0111c3, EGCB_Prepend },
{0x0111c9, 0x0111cc, EGCB_Extend },
{0x0111ce, 0x0111ce, EGCB_SpacingMark },
@ -1234,9 +1228,7 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x01122c, 0x01122e, EGCB_SpacingMark },
{0x01122f, 0x011231, EGCB_Extend },
{0x011232, 0x011233, EGCB_SpacingMark },
{0x011234, 0x011234, EGCB_Extend },
{0x011235, 0x011235, EGCB_SpacingMark },
{0x011236, 0x011237, EGCB_Extend },
{0x011234, 0x011237, EGCB_Extend },
{0x01123e, 0x01123e, EGCB_Extend },
{0x011241, 0x011241, EGCB_Extend },
{0x0112df, 0x0112df, EGCB_Extend },
@ -1250,11 +1242,24 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x011340, 0x011340, EGCB_Extend },
{0x011341, 0x011344, EGCB_SpacingMark },
{0x011347, 0x011348, EGCB_SpacingMark },
{0x01134b, 0x01134d, EGCB_SpacingMark },
{0x01134b, 0x01134c, EGCB_SpacingMark },
{0x01134d, 0x01134d, EGCB_Extend },
{0x011357, 0x011357, EGCB_Extend },
{0x011362, 0x011363, EGCB_SpacingMark },
{0x011366, 0x01136c, EGCB_Extend },
{0x011370, 0x011374, EGCB_Extend },
{0x0113b8, 0x0113b8, EGCB_Extend },
{0x0113b9, 0x0113ba, EGCB_SpacingMark },
{0x0113bb, 0x0113c0, EGCB_Extend },
{0x0113c2, 0x0113c2, EGCB_Extend },
{0x0113c5, 0x0113c5, EGCB_Extend },
{0x0113c7, 0x0113c9, EGCB_Extend },
{0x0113ca, 0x0113ca, EGCB_SpacingMark },
{0x0113cc, 0x0113cd, EGCB_SpacingMark },
{0x0113ce, 0x0113d0, EGCB_Extend },
{0x0113d1, 0x0113d1, EGCB_Prepend },
{0x0113d2, 0x0113d2, EGCB_Extend },
{0x0113e1, 0x0113e2, EGCB_Extend },
{0x011435, 0x011437, EGCB_SpacingMark },
{0x011438, 0x01143f, EGCB_Extend },
{0x011440, 0x011441, EGCB_SpacingMark },
@ -1291,10 +1296,10 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x0116ac, 0x0116ac, EGCB_SpacingMark },
{0x0116ad, 0x0116ad, EGCB_Extend },
{0x0116ae, 0x0116af, EGCB_SpacingMark },
{0x0116b0, 0x0116b5, EGCB_Extend },
{0x0116b6, 0x0116b6, EGCB_SpacingMark },
{0x0116b7, 0x0116b7, EGCB_Extend },
{0x01171d, 0x01171f, EGCB_Extend },
{0x0116b0, 0x0116b7, EGCB_Extend },
{0x01171d, 0x01171d, EGCB_Extend },
{0x01171e, 0x01171e, EGCB_SpacingMark },
{0x01171f, 0x01171f, EGCB_Extend },
{0x011722, 0x011725, EGCB_Extend },
{0x011726, 0x011726, EGCB_SpacingMark },
{0x011727, 0x01172b, EGCB_Extend },
@ -1305,9 +1310,7 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x011930, 0x011930, EGCB_Extend },
{0x011931, 0x011935, EGCB_SpacingMark },
{0x011937, 0x011938, EGCB_SpacingMark },
{0x01193b, 0x01193c, EGCB_Extend },
{0x01193d, 0x01193d, EGCB_SpacingMark },
{0x01193e, 0x01193e, EGCB_Extend },
{0x01193b, 0x01193e, EGCB_Extend },
{0x01193f, 0x01193f, EGCB_Prepend },
{0x011940, 0x011940, EGCB_SpacingMark },
{0x011941, 0x011941, EGCB_Prepend },
@ -1364,28 +1367,29 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x011f34, 0x011f35, EGCB_SpacingMark },
{0x011f36, 0x011f3a, EGCB_Extend },
{0x011f3e, 0x011f3f, EGCB_SpacingMark },
{0x011f40, 0x011f40, EGCB_Extend },
{0x011f41, 0x011f41, EGCB_SpacingMark },
{0x011f42, 0x011f42, EGCB_Extend },
{0x011f40, 0x011f42, EGCB_Extend },
{0x011f5a, 0x011f5a, EGCB_Extend },
{0x013430, 0x01343f, EGCB_Control },
{0x013440, 0x013440, EGCB_Extend },
{0x013447, 0x013455, EGCB_Extend },
{0x01611e, 0x016129, EGCB_Extend },
{0x01612a, 0x01612c, EGCB_SpacingMark },
{0x01612d, 0x01612f, EGCB_Extend },
{0x016af0, 0x016af4, EGCB_Extend },
{0x016b30, 0x016b36, EGCB_Extend },
{0x016d63, 0x016d63, EGCB_V },
{0x016d67, 0x016d6a, EGCB_V },
{0x016f4f, 0x016f4f, EGCB_Extend },
{0x016f51, 0x016f87, EGCB_SpacingMark },
{0x016f8f, 0x016f92, EGCB_Extend },
{0x016fe4, 0x016fe4, EGCB_Extend },
{0x016ff0, 0x016ff1, EGCB_SpacingMark },
{0x016ff0, 0x016ff1, EGCB_Extend },
{0x01bc9d, 0x01bc9e, EGCB_Extend },
{0x01bca0, 0x01bca3, EGCB_Control },
{0x01cf00, 0x01cf2d, EGCB_Extend },
{0x01cf30, 0x01cf46, EGCB_Extend },
{0x01d165, 0x01d165, EGCB_Extend },
{0x01d166, 0x01d166, EGCB_SpacingMark },
{0x01d167, 0x01d169, EGCB_Extend },
{0x01d16d, 0x01d16d, EGCB_SpacingMark },
{0x01d16e, 0x01d172, EGCB_Extend },
{0x01d165, 0x01d169, EGCB_Extend },
{0x01d16d, 0x01d172, EGCB_Extend },
{0x01d173, 0x01d17a, EGCB_Control },
{0x01d17b, 0x01d182, EGCB_Extend },
{0x01d185, 0x01d18b, EGCB_Extend },
@ -1407,6 +1411,7 @@ static EGCB_RANGE_TYPE EGCB_RANGES[] = {
{0x01e2ae, 0x01e2ae, EGCB_Extend },
{0x01e2ec, 0x01e2ef, EGCB_Extend },
{0x01e4ec, 0x01e4ef, EGCB_Extend },
{0x01e5ee, 0x01e5ef, EGCB_Extend },
{0x01e8d0, 0x01e8d6, EGCB_Extend },
{0x01e944, 0x01e94a, EGCB_Extend },
{0x01f1e6, 0x01f1ff, EGCB_Regional_Indicator },
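
Note on the table above: the entries are sorted, non-overlapping `{start, end, type}` ranges, so a code point can be classified with a binary search. The sketch below assumes that access pattern; the library's actual lookup routine is not part of this diff. `EgcbRange`, `SAMPLE`, `egcb_lookup` and the `EGCB_Other` fallback are illustrative names, and the sample rows are copied from the updated table.

```c
#include <assert.h>
#include <stddef.h>

enum EgcbType { EGCB_Other, EGCB_Control, EGCB_LF, EGCB_Extend,
                EGCB_SpacingMark, EGCB_Prepend };

typedef struct { unsigned int start, end; enum EgcbType type; } EgcbRange;

/* A few rows copied from the updated table; the real one has 1376. */
static const EgcbRange SAMPLE[] = {
    {0x000000, 0x000009, EGCB_Control},
    {0x00000a, 0x00000a, EGCB_LF},
    {0x000890, 0x000891, EGCB_Prepend},
    {0x000897, 0x00089f, EGCB_Extend},
    {0x000cbe, 0x000cbe, EGCB_SpacingMark},
    {0x000cbf, 0x000cc0, EGCB_Extend},
};

/* Binary search over sorted, non-overlapping ranges.
 * Code points not covered by any range fall back to EGCB_Other. */
static enum EgcbType egcb_lookup(const EgcbRange *t, size_t n, unsigned int code)
{
    size_t lo = 0, hi = n;
    assert(t != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (code < t[mid].start) hi = mid;
        else if (code > t[mid].end) lo = mid + 1;
        else return t[mid].type;
    }
    return EGCB_Other;
}
```

With these rows, U+0897 and U+0CC0 both resolve to Extend, matching the ranges that changed in this update.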

File diff suppressed because it is too large

@ -1,6 +1,6 @@
/* This file was converted by gperf_fold_key_conv.py
from gperf output file. */
/* ANSI-C code produced by gperf version 3.1 */
/* ANSI-C code produced by gperf version 3.2.1 */
/* Command-line: gperf -n -C -T -c -t -j1 -L ANSI-C -F,-1 -N onigenc_unicode_fold2_key unicode_fold2_key.gperf */
/* Computed positions: -k'3,6' */
@ -9,7 +9,7 @@
/* This gperf source file was generated by make_unicode_fold_data.py */
/*-
* Copyright (c) 2017-2023 K.Kosako
* Copyright (c) 2017-2024 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -82,6 +82,7 @@ hash(OnigCodePoint codes[])
59, 59, 59, 59, 59, 59, 59, 59, 59, 59,
59, 59, 59, 59, 59, 59
};
return asso_values[(unsigned char)onig_codes_byte_at(codes, 5)] + asso_values[(unsigned char)onig_codes_byte_at(codes, 2)];
}


@ -1,6 +1,6 @@
/* This file was converted by gperf_fold_key_conv.py
from gperf output file. */
/* ANSI-C code produced by gperf version 3.1 */
/* ANSI-C code produced by gperf version 3.2.1 */
/* Command-line: gperf -n -C -T -c -t -j1 -L ANSI-C -F,-1 -N onigenc_unicode_fold3_key unicode_fold3_key.gperf */
/* Computed positions: -k'3,6,9' */
@ -9,7 +9,7 @@
/* This gperf source file was generated by make_unicode_fold_data.py */
/*-
* Copyright (c) 2017-2023 K.Kosako
* Copyright (c) 2017-2024 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -82,6 +82,7 @@ hash(OnigCodePoint codes[])
14, 14, 14, 14, 14, 14, 14, 14, 14, 14,
14, 14, 14, 14, 14, 14
};
return asso_values[(unsigned char)onig_codes_byte_at(codes, 8)] + asso_values[(unsigned char)onig_codes_byte_at(codes, 5)] + asso_values[(unsigned char)onig_codes_byte_at(codes, 2)];
}

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -1,6 +1,6 @@
/* unicode_wb_data.c: Generated by make_unicode_wb_data.py. */
/*-
* Copyright (c) 2019-2023 K.Kosako
* Copyright (c) 2019-2024 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -25,7 +25,7 @@
* SUCH DAMAGE.
*/
#define WORD_BREAK_PROPERTY_VERSION 150100
#define WORD_BREAK_PROPERTY_VERSION 160000
/*
ALetter
@ -48,7 +48,7 @@ WSegSpace
ZWJ
*/
static int WB_RANGE_NUM = 1052;
static int WB_RANGE_NUM = 1085;
static WB_RANGE_TYPE WB_RANGES[] = {
{0x00000a, 0x00000a, WB_LF },
{0x00000b, 0x00000c, WB_Newline },
@ -156,7 +156,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x000870, 0x000887, WB_ALetter },
{0x000889, 0x00088e, WB_ALetter },
{0x000890, 0x000891, WB_Numeric },
{0x000898, 0x00089f, WB_Extend },
{0x000897, 0x00089f, WB_Extend },
{0x0008a0, 0x0008c9, WB_ALetter },
{0x0008ca, 0x0008e1, WB_Extend },
{0x0008e2, 0x0008e2, WB_Numeric },
@ -418,7 +418,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x001920, 0x00192b, WB_Extend },
{0x001930, 0x00193b, WB_Extend },
{0x001946, 0x00194f, WB_Numeric },
{0x0019d0, 0x0019d9, WB_Numeric },
{0x0019d0, 0x0019da, WB_Numeric },
{0x001a00, 0x001a16, WB_ALetter },
{0x001a17, 0x001a1b, WB_Extend },
{0x001a55, 0x001a5e, WB_Extend },
@ -446,7 +446,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x001c4d, 0x001c4f, WB_ALetter },
{0x001c50, 0x001c59, WB_Numeric },
{0x001c5a, 0x001c7d, WB_ALetter },
{0x001c80, 0x001c88, WB_ALetter },
{0x001c80, 0x001c8a, WB_ALetter },
{0x001c90, 0x001cba, WB_ALetter },
{0x001cbd, 0x001cbf, WB_ALetter },
{0x001cd0, 0x001cd2, WB_Extend },
@ -564,10 +564,10 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x00a69e, 0x00a69f, WB_Extend },
{0x00a6a0, 0x00a6ef, WB_ALetter },
{0x00a6f0, 0x00a6f1, WB_Extend },
{0x00a708, 0x00a7ca, WB_ALetter },
{0x00a708, 0x00a7cd, WB_ALetter },
{0x00a7d0, 0x00a7d1, WB_ALetter },
{0x00a7d3, 0x00a7d3, WB_ALetter },
{0x00a7d5, 0x00a7d9, WB_ALetter },
{0x00a7d5, 0x00a7dc, WB_ALetter },
{0x00a7f2, 0x00a801, WB_ALetter },
{0x00a802, 0x00a802, WB_Extend },
{0x00a803, 0x00a805, WB_ALetter },
@ -647,9 +647,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x00fd92, 0x00fdc7, WB_ALetter },
{0x00fdf0, 0x00fdfb, WB_ALetter },
{0x00fe00, 0x00fe0f, WB_Extend },
{0x00fe10, 0x00fe10, WB_MidNum },
{0x00fe13, 0x00fe13, WB_MidLetter },
{0x00fe14, 0x00fe14, WB_MidNum },
{0x00fe20, 0x00fe2f, WB_Extend },
{0x00fe33, 0x00fe34, WB_ExtendNumLet },
{0x00fe4d, 0x00fe4f, WB_ExtendNumLet },
@ -711,6 +709,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x0105a3, 0x0105b1, WB_ALetter },
{0x0105b3, 0x0105b9, WB_ALetter },
{0x0105bb, 0x0105bc, WB_ALetter },
{0x0105c0, 0x0105f3, WB_ALetter },
{0x010600, 0x010736, WB_ALetter },
{0x010740, 0x010755, WB_ALetter },
{0x010760, 0x010767, WB_ALetter },
@ -755,10 +754,15 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x010d00, 0x010d23, WB_ALetter },
{0x010d24, 0x010d27, WB_Extend },
{0x010d30, 0x010d39, WB_Numeric },
{0x010d40, 0x010d49, WB_Numeric },
{0x010d4a, 0x010d65, WB_ALetter },
{0x010d69, 0x010d6d, WB_Extend },
{0x010d6f, 0x010d85, WB_ALetter },
{0x010e80, 0x010ea9, WB_ALetter },
{0x010eab, 0x010eac, WB_Extend },
{0x010eb0, 0x010eb1, WB_ALetter },
{0x010efd, 0x010eff, WB_Extend },
{0x010ec2, 0x010ec4, WB_ALetter },
{0x010efc, 0x010eff, WB_Extend },
{0x010f00, 0x010f1c, WB_ALetter },
{0x010f27, 0x010f27, WB_ALetter },
{0x010f30, 0x010f45, WB_ALetter },
@ -834,6 +838,20 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x011362, 0x011363, WB_Extend },
{0x011366, 0x01136c, WB_Extend },
{0x011370, 0x011374, WB_Extend },
{0x011380, 0x011389, WB_ALetter },
{0x01138b, 0x01138b, WB_ALetter },
{0x01138e, 0x01138e, WB_ALetter },
{0x011390, 0x0113b5, WB_ALetter },
{0x0113b7, 0x0113b7, WB_ALetter },
{0x0113b8, 0x0113c0, WB_Extend },
{0x0113c2, 0x0113c2, WB_Extend },
{0x0113c5, 0x0113c5, WB_Extend },
{0x0113c7, 0x0113ca, WB_Extend },
{0x0113cc, 0x0113d0, WB_Extend },
{0x0113d1, 0x0113d1, WB_ALetter },
{0x0113d2, 0x0113d2, WB_Extend },
{0x0113d3, 0x0113d3, WB_ALetter },
{0x0113e1, 0x0113e2, WB_Extend },
{0x011400, 0x011434, WB_ALetter },
{0x011435, 0x011446, WB_Extend },
{0x011447, 0x01144a, WB_ALetter },
@ -858,6 +876,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x0116ab, 0x0116b7, WB_Extend },
{0x0116b8, 0x0116b8, WB_ALetter },
{0x0116c0, 0x0116c9, WB_Numeric },
{0x0116d0, 0x0116e3, WB_Numeric },
{0x01171d, 0x01172b, WB_Extend },
{0x011730, 0x011739, WB_Numeric },
{0x011800, 0x01182b, WB_ALetter },
@ -897,6 +916,8 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x011a8a, 0x011a99, WB_Extend },
{0x011a9d, 0x011a9d, WB_ALetter },
{0x011ab0, 0x011af8, WB_ALetter },
{0x011bc0, 0x011be0, WB_ALetter },
{0x011bf0, 0x011bf9, WB_Numeric },
{0x011c00, 0x011c08, WB_ALetter },
{0x011c0a, 0x011c2e, WB_ALetter },
{0x011c2f, 0x011c36, WB_Extend },
@ -934,6 +955,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x011f34, 0x011f3a, WB_Extend },
{0x011f3e, 0x011f42, WB_Extend },
{0x011f50, 0x011f59, WB_Numeric },
{0x011f5a, 0x011f5a, WB_Extend },
{0x011fb0, 0x011fb0, WB_ALetter },
{0x012000, 0x012399, WB_ALetter },
{0x012400, 0x01246e, WB_ALetter },
@ -944,7 +966,11 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x013440, 0x013440, WB_Extend },
{0x013441, 0x013446, WB_ALetter },
{0x013447, 0x013455, WB_Extend },
{0x013460, 0x0143fa, WB_ALetter },
{0x014400, 0x014646, WB_ALetter },
{0x016100, 0x01611d, WB_ALetter },
{0x01611e, 0x01612f, WB_Extend },
{0x016130, 0x016139, WB_Numeric },
{0x016800, 0x016a38, WB_ALetter },
{0x016a40, 0x016a5e, WB_ALetter },
{0x016a60, 0x016a69, WB_Numeric },
@ -958,6 +984,8 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x016b50, 0x016b59, WB_Numeric },
{0x016b63, 0x016b77, WB_ALetter },
{0x016b7d, 0x016b8f, WB_ALetter },
{0x016d40, 0x016d6c, WB_ALetter },
{0x016d70, 0x016d79, WB_Numeric },
{0x016e40, 0x016e7f, WB_ALetter },
{0x016f00, 0x016f4a, WB_ALetter },
{0x016f4f, 0x016f4f, WB_Extend },
@ -982,6 +1010,7 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x01bc90, 0x01bc99, WB_ALetter },
{0x01bc9d, 0x01bc9e, WB_Extend },
{0x01bca0, 0x01bca3, WB_Format },
{0x01ccf0, 0x01ccf9, WB_Numeric },
{0x01cf00, 0x01cf2d, WB_Extend },
{0x01cf30, 0x01cf46, WB_Extend },
{0x01d165, 0x01d169, WB_Extend },
@ -1050,6 +1079,10 @@ static WB_RANGE_TYPE WB_RANGES[] = {
{0x01e4d0, 0x01e4eb, WB_ALetter },
{0x01e4ec, 0x01e4ef, WB_Extend },
{0x01e4f0, 0x01e4f9, WB_Numeric },
{0x01e5d0, 0x01e5ed, WB_ALetter },
{0x01e5ee, 0x01e5ef, WB_Extend },
{0x01e5f0, 0x01e5f0, WB_ALetter },
{0x01e5f1, 0x01e5fa, WB_Numeric },
{0x01e7e0, 0x01e7e6, WB_ALetter },
{0x01e7e8, 0x01e7eb, WB_ALetter },
{0x01e7ed, 0x01e7ee, WB_ALetter },


@ -2,7 +2,7 @@
utf8.c - Oniguruma (regular expression library)
**********************************************************************/
/*-
* Copyright (c) 2002-2019 K.Kosako
* Copyright (c) 2002-2025 K.Kosako
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -57,7 +57,7 @@ static const int EncLen_UTF8[] = {
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
#ifdef USE_RFC3629_RANGE
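
Note on the table change above: the row that changed covers bytes 0xC0..0xCF, so 0xC0 and 0xC1 now get length 1. Those two bytes could only start overlong encodings and never begin a valid UTF-8 sequence. The sketch below restates the lead-byte rules as plain functions. It assumes the RFC 3629 range (nothing above 0xF4) and that bytes unable to start a sequence get length 1; the function names are illustrative, not from utf8.c.

```c
#include <assert.h>   /* only needed for checks like those in the note below */

/* Expected sequence length for a UTF-8 lead byte, RFC 3629 range only.
 * Bytes that cannot start a sequence get length 1 (assumption). */
static int utf8_lead_len(unsigned char b)
{
    if (b < 0xc2) return 1;   /* ASCII, continuation bytes, and 0xC0/0xC1 */
    if (b < 0xe0) return 2;
    if (b < 0xf0) return 3;
    if (b <= 0xf4) return 4;
    return 1;                 /* 0xF5..0xFF: outside RFC 3629 */
}

/* Same condition as the new check in is_valid_mbc_string(). */
static int utf8_is_invalid_lead(unsigned char b)
{
    return b > 0xf4 || (b > 0x7f && b < 0xc2);
}
```

For instance, `utf8_is_invalid_lead(0xc0)` is nonzero while `utf8_is_invalid_lead(0xc2)` is zero, in line with the table now giving 0xC0 a length of 1 and 0xC2 a length of 2.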
@ -79,7 +79,7 @@ is_valid_mbc_string(const UChar* p, const UChar* end)
int i, len;
while (p < end) {
if (! utf8_islead(*p))
if (*p > 0xf4 || (*p > 0x7f && *p < 0xc2))
return FALSE;
len = mbc_enc_len(p++);