Tcl_RegExpMatch(3) Tcl Library Procedures Tcl_RegExpMatch(3)
_________________________________________________________________
NAME
Tcl_RegExpMatch, Tcl_RegExpCompile, Tcl_RegExpExec,
Tcl_RegExpRange, Tcl_GetRegExpFromObj, Tcl_RegExpMatchObj,
Tcl_RegExpExecObj, Tcl_RegExpGetInfo - Pattern matching
with regular expressions
SYNOPSIS
#include <tcl.h>
int
Tcl_RegExpMatchObj(interp, strObj, patObj)
int
Tcl_RegExpMatch(interp, string, pattern)
Tcl_RegExp
Tcl_RegExpCompile(interp, pattern)
int
Tcl_RegExpExec(interp, regexp, string, start)
void
Tcl_RegExpRange(regexp, index, startPtr, endPtr)
Tcl_RegExp |
Tcl_GetRegExpFromObj(interp, patObj, cflags) |
int |
Tcl_RegExpExecObj(interp, regexp, objPtr, offset, nmatches, eflags)|
void |
Tcl_RegExpGetInfo(regexp, infoPtr) |
ARGUMENTS
Tcl_Interp *interp (in) Tcl interpreter to use
for error reporting. The
interpreter may be NULL
if no error reporting is
desired. |
Tcl_Obj *strObj (in/out) ||
Refers to the object from |
which to get the string |
to search. The internal |
representation of the |
object may be converted |
to a form that can be |
efficiently searched. |
Tcl_Obj *patObj (in/out) ||
Refers to the object from |
which to get a regular |
expression. The compiled |
regular expression is |
cached in the object.
char *string (in) String to check for a
match with a regular
expression.
CONST char *pattern (in) String in the form of a
regular expression pat-
tern.
Tcl_RegExp regexp (in) Compiled regular expres-
sion. Must have been
returned previously by
Tcl_GetRegExpFromObj or
Tcl_RegExpCompile.
char *start (in) If string is just a por-
tion of some other
string, this argument
identifies the beginning
of the larger string. If
it isn't the same as
string, then no ^ matches
will be allowed.
int index (in) Specifies which range is
desired: 0 means the
range of the entire
match, 1 or greater means
the range that matched a
parenthesized sub-expres-
sion. |
CONST char **startPtr (out) ||
The address of the first |
character in the range is |
stored here, or NULL if |
there is no such range. |
CONST char **endPtr (out) ||
The address of the char- |
acter just after the last |
one in the range is |
stored here, or NULL if |
there is no such range. |
int cflags (in) ||
OR-ed combination of com- |
pilation flags. See below |
for more information. |
Tcl_Obj *objPtr (in/out) ||
An object which contains |
the string to check for a |
match with a regular |
expression. |
int offset (in) ||
The character offset into |
the string where matching |
should begin. The value |
of the offset has no |
impact on ^ matches. |
This behavior is con- |
trolled by eflags. |
int nmatches (in) ||
The number of matching |
subexpressions that |
should be remembered for |
later use. If this value |
is 0, then no subexpres- |
sion match information |
will be computed. If the |
value is -1, then all of |
the matching subexpres- |
sions will be remembered. |
Any other value will be |
taken as the maximum num- |
ber of subexpressions to |
remember. |
int eflags (in) ||
OR-ed combination of the |
values TCL_REG_NOTBOL and |
TCL_REG_NOTEOL. See |
below for more informa- |
tion. |
Tcl_RegExpInfo *infoPtr (out) ||
The address of the loca- |
tion where information |
about a previous match |
should be stored by |
Tcl_RegExpGetInfo.
_________________________________________________________________
DESCRIPTION
Tcl_RegExpMatch determines whether its string argument
matches pattern, where pattern is interpreted as a regular
expression using the rules in the re_syntax reference
page. If there is a match then Tcl_RegExpMatch returns 1.
If there is no match then Tcl_RegExpMatch returns 0. If
an error occurs in the matching process (e.g. pattern is
not a valid regular expression) then Tcl_RegExpMatch
returns -1 and leaves an error message in the interpreter
result. Tcl_RegExpMatchObj is similar to Tcl_RegExpMatch |
except it operates on the Tcl objects strObj and patObj |
instead of UTF strings. Tcl_RegExpMatchObj is generally |
more efficient than Tcl_RegExpMatch, so it is the pre- |
ferred interface.
Tcl_RegExpCompile, Tcl_RegExpExec, and Tcl_RegExpRange
provide lower-level access to the regular expression pat-
tern matcher. Tcl_RegExpCompile compiles a regular
expression string into the internal form used for effi-
cient pattern matching. The return value is a token for
this compiled form, which can be used in subsequent calls
to Tcl_RegExpExec or Tcl_RegExpRange. If an error occurs
while compiling the regular expression then Tcl_RegExpCom-
pile returns NULL and leaves an error message in the
interpreter result. Note: the return value from Tcl_Reg-
ExpCompile is only valid up to the next call to Tcl_RegEx-
pCompile; it is not safe to retain these values for long
periods of time.
Tcl_RegExpExec executes the regular expression pattern
matcher. It returns 1 if string contains a range of char-
acters that match regexp, 0 if no match is found, and -1
if an error occurs. In the case of an error, Tcl_RegEx-
pExec leaves an error message in the interpreter result.
When searching a string for multiple matches of a pattern,
it is important to distinguish between the start of the
original string and the start of the current search. For
example, when searching for the second occurrence of a
match, the string argument might point to the character
just after the first match; however, it is important for
the pattern matcher to know that this is not the start of
the entire string, so that it doesn't allow ^ atoms in the
pattern to match. The start argument provides this infor-
mation by pointing to the start of the overall string con-
taining string. Start will be less than or equal to
string; if it is less than string then no ^ matches will
be allowed.
Tcl_RegExpRange may be invoked after Tcl_RegExpExec
returns; it provides detailed information about what
ranges of the string matched what parts of the pattern.
Tcl_RegExpRange returns a pair of pointers in *startPtr
and *endPtr that identify a range of characters in the
source string for the most recent call to Tcl_RegExpExec.
Index indicates which of several ranges is desired: if
index is 0, information is returned about the overall
range of characters that matched the entire pattern; oth-
erwise, information is returned about the range of charac-
ters that matched the index'th parenthesized subexpression
within the pattern. If there is no range corresponding to
index then NULL is stored in *startPtr and *endPtr.
Tcl_GetRegExpFromObj, Tcl_RegExpExecObj, and Tcl_Reg- |
ExpGetInfo are object interfaces that provide the most |
direct control of Henry Spencer's regular expression |
library. Callers that need to modify compilation and |
execution options directly should use |
these interfaces instead of calling the internal regexp |
functions. These interfaces handle the details of UTF to |
Unicode translations as well as providing improved perfor- |
mance through caching in the pattern and string objects. |
Tcl_GetRegExpFromObj attempts to return a compiled regular |
expression from the patObj. If the object does not |
already contain a compiled regular expression it will |
attempt to create one from the string in the object and |
assign it to the internal representation of the patObj. |
The return value, of type Tcl_RegExp, is a token for |
this compiled form, which |
can be used in subsequent calls to Tcl_RegExpExecObj or |
Tcl_RegExpGetInfo. If an error occurs while compiling the |
regular expression then Tcl_GetRegExpFromObj returns NULL |
and leaves an error message in the interpreter result. |
The regular expression token can be used as long as the |
internal representation of patObj refers to the compiled |
form. The cflags argument is a bitwise OR of zero or more |
of the following flags that control the compilation of |
patObj: |
TCL_REG_ADVANCED ||
Compile advanced regular expressions (`AREs'). |
This mode corresponds to the normal regular |
expression syntax accepted by the Tcl regexp and |
regsub commands. |
TCL_REG_EXTENDED ||
Compile extended regular expressions (`EREs'). |
This mode corresponds to the regular expression |
syntax recognized by Tcl 8.0 and earlier ver- |
sions. |
TCL_REG_BASIC ||
Compile basic regular expressions (`BREs'). This |
mode corresponds to the regular expression syntax |
recognized by common Unix utilities like sed and |
grep. This is the default if no flags are speci- |
fied. |
TCL_REG_EXPANDED ||
Compile the regular expression (basic, extended, |
or advanced) using an expanded syntax that allows |
comments and whitespace. This mode causes non- |
backslashed non-bracket-expression white space |
and #-to-end-of-line comments to be ignored. |
TCL_REG_QUOTE ||
Compile a literal string, with all characters |
treated as ordinary characters. |
TCL_REG_NOCASE ||
Compile for matching that ignores upper/lower |
case distinctions. |
TCL_REG_NEWLINE ||
Compile for newline-sensitive matching. By |
default, newline is a completely ordinary charac- |
ter with no special meaning in either regular |
expressions or strings. With this flag, `[^' |
bracket expressions and `.' never match newline, |
`^' matches an empty string after any newline in |
addition to its normal function, and `$' matches |
an empty string before any newline in addition to |
its normal function. TCL_REG_NEWLINE is the bitwise |
OR of TCL_REG_NLSTOP and TCL_REG_NLANCH. |
TCL_REG_NLSTOP ||
Compile for partial newline-sensitive matching, |
with the behavior of `[^' bracket expressions and |
`.' affected, but not the behavior of `^' and |
`$'. In this mode, `[^' bracket expressions and |
`.' never match newline. |
TCL_REG_NLANCH ||
Compile for inverse partial newline-sensitive |
matching, with the behavior of `^' and `$' |
(the ``anchors'') affected, but not the behavior |
of `[^' bracket expressions and `.'. In this |
mode `^' matches an empty string after any new- |
line in addition to its normal function, and `$' |
matches an empty string before any newline in |
addition to its normal function. |
TCL_REG_NOSUB ||
Compile for matching that reports only success or |
failure, not what was matched. This reduces com- |
pile overhead and may improve performance. Sub- |
sequent calls to Tcl_RegExpGetInfo or Tcl_RegEx- |
pRange will not report any match information. |
TCL_REG_CANMATCH ||
Compile for matching that reports the potential |
to complete a partial match given more text (see |
below). |
Only one of TCL_REG_EXTENDED, TCL_REG_ADVANCED, |
TCL_REG_BASIC, and TCL_REG_QUOTE may be specified. |
Tcl_RegExpExecObj executes the regular expression pattern |
matcher. It returns 1 if objPtr contains a range of char- |
acters that match regexp, 0 if no match is found, and -1 |
if an error occurs. In the case of an error, Tcl_RegEx- |
pExecObj leaves an error message in the interpreter |
result. The nmatches value indicates to the matcher how |
many subexpressions are of interest. If nmatches is 0, |
then no subexpression match information is recorded, which |
may allow the matcher to make various optimizations. If |
the value is -1, then all of the subexpressions in the |
pattern are remembered. If the value is a positive inte- |
ger, then only that number of subexpressions will be |
remembered. Matching begins at the specified Unicode |
character index given by offset. Unlike Tcl_RegExpExec, |
the behavior of anchors is not affected by the offset |
value. Instead the behavior of the anchors is explicitly |
controlled by the eflags argument, which is a bitwise OR |
of zero or more of the following flags: |
TCL_REG_NOTBOL ||
The starting character will not be treated as the |
beginning of a line or the beginning of the |
string, so `^' will not match there. Note that |
this flag has no effect on how `\A' matches. |
TCL_REG_NOTEOL ||
The last character in the string will not be |
treated as the end of a line or the end of the |
string, so `$' will not match there. Note that |
this flag has no effect on how `\Z' matches. |
Tcl_RegExpGetInfo retrieves information about the last |
match performed with a given regular expression regexp. |
The infoPtr argument contains a pointer to a structure |
that is defined as follows: |
typedef struct Tcl_RegExpInfo { |
int nsubs; |
Tcl_RegExpIndices *matches; |
long extendStart; |
} Tcl_RegExpInfo; |
The nsubs field contains a count of the number of paren- |
thesized subexpressions within the regular expression. If |
the TCL_REG_NOSUB flag was used, then this value will be zero. |
The matches field points to an array of nsubs+1 values that |
indicate the bounds of each subexpression matched. The |
first element in the array refers to the range matched by |
the entire regular expression, and subsequent elements |
refer to the parenthesized subexpressions in the order |
that they appear in the pattern. Each element is a struc- |
ture that is defined as follows: |
typedef struct Tcl_RegExpIndices { |
long start; |
long end; |
} Tcl_RegExpIndices; |
The start and end values are Unicode character indices |
relative to the offset location within objPtr where match- |
ing began. The start index identifies the first character |
of the matched subexpression. The end index identifies |
the first character after the matched subexpression. If |
the subexpression matched the empty string, then start and |
end will be equal. If the subexpression did not |
participate in the match, then start and end will be set |
to -1. |
The extendStart field in Tcl_RegExpInfo is only set if the |
TCL_REG_CANMATCH flag was used. It indicates the first |
character in the string where a match could occur. If a |
match was found, this will be the same as the beginning of |
the current match. If no match was found, then it indi- |
cates the earliest point at which a match might occur if |
additional text is appended to the string. If no |
match is possible even with further text, this field will |
be set to -1.
SEE ALSO
re_syntax(n)
KEYWORDS
match, pattern, regular expression, string, subexpression,
Tcl_RegExpIndices, Tcl_RegExpInfo
Tcl 8.1 Tcl_RegExpMatch(3)