Index of Section 3 Manual Pages

Interix / SUApcrecompat.3Interix / SUA

PCRE(3)                                                   PCRE(3)



NAME
       PCRE - Perl-compatible regular expressions

DIFFERENCES FROM PERL

       This  document  describes the differences in the ways that
       PCRE and Perl handle regular expressions. The  differences
       described here are with respect to Perl 5.8.

       1.  PCRE does not have full UTF-8 support. Details of what
       it does have are given in the section on UTF-8 support  in
       the main pcre page.

       2.  PCRE  does  not  allow repeat quantifiers on lookahead
       assertions. Perl permits them, but they do not  mean  what
       you  might  think.  For  example, (?!a){3} does not assert
       that the next  three  characters  are  not  "a".  It  just
       asserts that the next character is not "a" three times.

       3. Capturing subpatterns that occur inside negative looka-
       head assertions are counted, but their entries in the off-
       sets  vector  are never set. Perl sets its numerical vari-
       ables from any such patterns that are matched  before  the
       assertion  fails  to match something (thereby succeeding),
       but only if the negative lookahead assertion contains just
       one branch.

       4. Though binary zero characters are supported in the sub-
       ject string, they are not  allowed  in  a  pattern  string
       because  it  is passed as a normal C string, terminated by
       zero. The escape sequence "\0" can be used in the  pattern
       to represent a binary zero.

       5.  The following Perl escape sequences are not supported:
       \l, \u, \L, \U, \P, \p, \N, and  \X.  In  fact  these  are
       implemented  by Perl's general string-handling and are not
       part of its pattern matching engine. If any of  these  are
       encountered by PCRE, an error is generated.

       6.  PCRE  does support the \Q...\E escape for quoting sub-
       strings. Characters in between are  treated  as  literals.
       This  is  slightly different from Perl in that $ and @ are
       also handled as literals inside the quotes. In Perl,  they
       cause  variable interpolation (but of course PCRE does not
       have variables). Note the following examples:

           Pattern            PCRE matches      Perl matches

           \Qabc$xyz\E        abc$xyz           abc  followed  by
       the
                                                  contents     of
       $xyz
           \Qabc\$xyz\E       abc\$xyz          abc\$xyz
           \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz

       The \Q...\E sequence is recognized both inside and outside
       character classes.

       7.  Fairly  obviously, PCRE does not support the (?{code})
       and  (?p{code})  constructions.  However,  there  is  some
       experimental support for recursive patterns using the non-
       Perl items (?R), (?number) and (?P>name). Also,  the  PCRE
       "callout" feature allows an external function to be called
       during pattern matching.

       8. There are some differences that are concerned with  the
       settings  of  captured  strings  when part of a pattern is
       repeated. For example, matching "aba" against the  pattern
       /^(a(b)?)+$/  in  Perl  leaves $2 unset, but in PCRE it is
       set to "b".

       9. PCRE provides  some  extensions  to  the  Perl  regular
       expression facilities:

       (a) Although lookbehind assertions must match fixed length
       strings, each alternative branch of a lookbehind assertion
       can match a different length of string. Perl requires them
       all to have the same length.

       (b) If PCRE_DOLLAR_ENDONLY is set  and  PCRE_MULTILINE  is
       not set, the $ meta-character matches only at the very end
       of the string.

       (c) If PCRE_EXTRA is set, a backslash followed by a letter
       with no special meaning is faulted.

       (d) If PCRE_UNGREEDY is set, the greediness of the repeti-
       tion quantifiers is inverted, that is, by default they are
       not greedy, but if followed by a question mark they are.

       (e)  PCRE_ANCHORED  can  be  used to force a pattern to be
       tried only at the first matching position in  the  subject
       string.

       (f)   The  PCRE_NOTBOL,  PCRE_NOTEOL,  PCRE_NOTEMPTY,  and
       PCRE_NO_AUTO_CAPTURE options for pcre_exec() have no  Perl
       equivalents.

       (g)  The  (?R), (?number), and (?P>name) constructs allows
       for recursive pattern matching (Perl can do this using the
       (?p{code}) construct, which PCRE cannot support.)

       (h)  PCRE  supports  named capturing substrings, using the
       Python syntax.

       (i) PCRE supports the possessive quantifier  "++"  syntax,
       taken from Sun's Java package.

       (j)  The  (R)  condition, for testing recursion, is a PCRE
       extension.

       (k) The callout facility is PCRE-specific.

Last updated: 09 December 2003
Copyright (c) 1997-2003 University of Cambridge.



                                                          PCRE(3)

Interix / SUAHosted at SUA Community for Interix, SUA and SFUInterix / SUA