_uvchr>, but is used for UTF-8
encoded strings. Each call classifies one character, even if the string
contains many. This variant takes two parameters. The first, C<p>, is a
pointer to the first byte of the character to be classified. (Recall that it
may take more than one byte to represent a character in UTF-8 strings.) The
second parameter, C<e>, points to anywhere in the string beyond the first
character, up to one byte past the end of the entire string. The suffix
C<_safe> in the function's name indicates that it will not attempt to read
beyond S<C<e - 1>>, provided that the constraint S<C<p E<lt> e>> is true (this
is asserted in C<-DDEBUGGING> builds). If the UTF-8 for the input
character is malformed in some way, the program may croak, or the function may
return FALSE, at the discretion of the implementation, and subject to change in
future releases.
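
The bounds contract can be illustrated with a minimal standalone sketch. It is not Perl's implementation: both function names are hypothetical, only standard (not Perl-extended) UTF-8 lengths are handled, and continuation bytes are not validated. It shows only how a start pointer C<p> and an end pointer C<e> let a routine refuse to look at any byte at or beyond C<e>.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Perl API: number of bytes that a
 * standard UTF-8 start byte announces, or 0 if the byte cannot start a
 * character. */
size_t demo_utf8_expected_len(unsigned char start)
{
    if (start < 0x80) return 1;                     /* single-byte (ASCII) */
    if (start >= 0xC2 && start <= 0xDF) return 2;
    if (start >= 0xE0 && start <= 0xEF) return 3;
    if (start >= 0xF0 && start <= 0xF4) return 4;
    return 0;               /* continuation byte or invalid start byte */
}

/* Hypothetical stand-in for the "_safe" idea: true only if the whole
 * character that starts at p ends before e, so that nothing at or past
 * e would ever need to be read. */
bool demo_char_fits(const unsigned char *p, const unsigned char *e)
{
    size_t len;

    if (p >= e) return false;       /* callers are expected to pass p < e */
    len = demo_utf8_expected_len(*p);
    return len != 0 && len <= (size_t)(e - p);
}
```

A caller would typically pass the address one past the last byte of the buffer as C<e>. A character whose announced length runs past that point is rejected rather than read.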

Variant C<isI<FOO>_utf8> is like C<isI<FOO>_utf8_safe>, but takes just a single
parameter, C<p>, which has the same meaning as the corresponding parameter does
in C<isI<FOO>_utf8_safe>. The function therefore can't check if it is reading
beyond the end of the string. Starting in Perl v5.30, it will take a second
parameter, becoming a synonym for C<isI<FOO>_utf8_safe>. At that time every
program that uses it will have to be changed in order to compile. In the
meantime, the first runtime call to C<isI<FOO>_utf8> from each call point in the
program will raise a deprecation warning, enabled by default. You can convert
your program now to use C<isI<FOO>_utf8_safe>, which avoids the warnings and
gives an extra measure of protection, or you can wait until v5.30, when you'll
be forced to add the C<e> parameter.

Variant C<isI<FOO>_LC> is like the C<isI<FOO>_A> and C<isI<FOO>_L1> variants, but the
result is based on the current locale, which is what C<LC> in the name stands
for. If Perl can determine that the current locale is a UTF-8 locale, it uses
the published Unicode rules; otherwise, it uses the C library function that
gives the named classification. For example, C<isDIGIT_LC()> when not in a
UTF-8 locale returns the result of calling C<isdigit()>. FALSE is always
returned if the input won't fit into an octet. On some platforms where the C
library function is known to be defective, Perl changes its result to follow
the POSIX standard's rules.

Variant C<isI<FOO>_LC_uvchr> is like C<isI<FOO>_LC>, but is defined on any UV. It
returns the same as C<isI<FOO>_LC> for input code points less than 256, and
returns the hard-coded, not-affected-by-locale, Unicode results for larger ones.
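
The split between locale-dependent and hard-coded results can be sketched in plain C. This is an illustration under stated assumptions, not Perl's code: both function names are hypothetical, the current locale is assumed not to be a UTF-8 locale, and the table for code points of 256 and above is reduced to a single block (ARABIC-INDIC DIGITS, U+0660 to U+0669) for brevity.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical stand-in for a locale-based digit test outside a UTF-8
 * locale: anything that does not fit in an octet is never a digit, and
 * everything else is delegated to the C library's isdigit(). */
bool demo_is_digit_lc(unsigned long cp)
{
    if (cp > 0xFF) return false;
    return isdigit((unsigned char)cp) != 0;
}

/* Hypothetical stand-in for the any-code-point form: below 256 defer to
 * the locale-based test, otherwise use a fixed answer that no locale can
 * change.  Only one Unicode digit block is listed here. */
bool demo_is_digit_lc_uvchr(unsigned long cp)
{
    if (cp < 256) return demo_is_digit_lc(cp);
    return cp >= 0x0660 && cp <= 0x0669;
}
```

With the default C locale, C<demo_is_digit_lc(0x0663)> is false because the input does not fit in an octet, while C<demo_is_digit_lc_uvchr(0x0663)> is true because the answer comes from the fixed table.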

Variant C<isI<FOO>_LC_utf8_safe> is like C<isI<FOO>_LC_uvchr>, but is used for UTF-8
encoded strings. Each call classifies one character, even if the string
contains many. This variant takes two parameters. The first, C<p>, is a
pointer to the first byte of the character to be classified. (Recall that it
may take more than one byte to represent a character in UTF-8 strings.) The
second parameter, C<e>, points to anywhere in the string beyond the first
character, up to one byte past the end of the entire string. The suffix
C<_safe> in the function's name indicates that it will not attempt to read
beyond S<C<e - 1>>, provided that the constraint S<C<p E<lt> e>> is true (this
is asserted in C<-DDEBUGGING> builds). If the UTF-8 for the input
character is malformed in some way, the program may croak, or the function may
return FALSE, at the discretion of the implementation, and subject to change in
future releases.

Variant C<isI<FOO>_LC_utf8> is like C<isI<FOO>_LC_utf8_safe>, but takes just a single
parameter, C<p>, which has the same meaning as the corresponding parameter does
in C<isI<FOO>_LC_utf8_safe>. The function therefore can't check if it is reading
beyond the end of the string. Starting in Perl v5.30, it will take a second
parameter, becoming a synonym for C<isI<FOO>_LC_utf8_safe>. At that time every
program that uses it will have to be changed in order to compile. In the
meantime, the first runtime call to C<isI<FOO>_LC_utf8> from each call point in
the program will raise a deprecation warning, enabled by default. You can
convert your program now to use C<isI<FOO>_LC_utf8_safe>, which avoids the
warnings and gives an extra measure of protection, or you can wait until v5.30,
when you'll be forced to add the C<e> parameter.
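
=for apidoc Am|bool|isALPHA|char ch
Returns a boolean indicating whether the specified character is an
alphabetic character, analogous to C<m/[[:alpha:]]/>.
See the L<top of this section|/Character classification> for an explanation of
variants
C<isALPHA_A>, C<isALPHA_L1>, C<isALPHA_uvchr>, C<isALPHA_utf8_safe>,
C<isALPHA_LC>, C<isALPHA_LC_uvchr>, and C<isALPHA_LC_utf8_safe>.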
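
=for apidoc Am|bool|isALPHANUMERIC|char ch
Returns a boolean indicating whether the specified character is either an
alphabetic character or decimal digit, analogous to C<m/[[:alnum:]]/>.
See the L<top of this section|/Character classification> for an explanation of
variants
C<isALPHANUMERIC_A>, C<isALPHANUMERIC_L1>, C<isALPHANUMERIC_uvchr>,
C<isALPHANUMERIC_utf8_safe>, C<isALPHANUMERIC_LC>, C<isALPHANUMERIC_LC_uvchr>,
and C<isALPHANUMERIC_LC_utf8_safe>.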

=for apidoc Am|bool|isASCII|char ch
Returns a boolean indicating whether the specified character is one of the 128
characters in the ASCII character set, analogous to C<m/[[:ascii:]]/>.
On non-ASCII platforms, it returns TRUE iff this
character corresponds to an ASCII character. Variants C<isASCII_A()> and
C<isASCII_L1()> are identical to C<isASCII()>.
See the L<top of this section|/Character classification> for an explanation of
variants
C<isASCII_uvchr>, C<isASCII_utf8_safe>, C<isASCII_LC>, C<isASCII_LC_uvchr>, and
C<isASCII_LC_utf8_safe>. Note, however, that some platforms do not have the C
library routine C<isascii()>. In these cases, the variants whose names contain
C<LC> are the same as the corresponding ones without.

Also note that, because all ASCII characters are UTF-8 invariant (meaning they
have the exact same representation (always a single byte) whether encoded in
UTF-8 or not), C<isASCII> will give the correct results when called with any
byte in any string, whether or not it is encoded in UTF-8. Similarly,
C<isASCII_utf8_safe> will work properly on any string, whether or not it is
encoded in UTF-8.
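
The invariance property can be checked with a small standalone sketch. The function names are hypothetical and an ASCII (not EBCDIC) platform is assumed: a byte is ASCII exactly when its high bit is clear, and every byte of a multi-byte UTF-8 character has the high bit set, so a per-byte test gives the same answer for either encoding.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, assuming an ASCII platform: a byte is an ASCII
 * character exactly when it is below 0x80. */
bool demo_is_ascii_byte(unsigned char b)
{
    return b < 0x80;
}

/* Counts the ASCII characters in a byte string.  The count is the same
 * whether the bytes are UTF-8 or a single-byte encoding, because no byte
 * of a multi-byte UTF-8 character is below 0x80. */
size_t demo_count_ascii(const unsigned char *s, size_t len)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        if (demo_is_ascii_byte(s[i])) n++;
    }
    return n;
}
```

For instance, the letter "a" followed by LATIN SMALL LETTER E WITH ACUTE is the bytes 0x61 0xC3 0xA9 in UTF-8 and 0x61 0xE9 in Latin-1. The count of ASCII characters is 1 in both cases.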

=for apidoc Am|bool|isBLANK|char ch
Returns a boolean indicating whether the specified character is a
character considered to be a blank, analogous to C<m/[[:blank:]]/>.
See the L<top of this section|/Character classification> for an explanation of
variants
C<isBLANK_A>, C<isBLANK_L1>, C<isBLANK_uvchr>, C<isBLANK_utf8_safe>,
C<isBLANK_LC>, C<isBLANK_LC_uvchr>, and C<isBLANK_LC_utf8_safe>. Note,
however, that some platforms do not have the C library routine
C<isblank()>. In these cases, the variants whose names contain C<LC> are
the same as the corresponding ones without.
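
=for apidoc Am|bool|isCNTRL|char ch
Returns a boolean indicating whether the specified character is a
control character, analogous to C<m/[[:cntrl:]]/>.
See the L<top of this section|/Character classification> for an explanation of
variants
C<isCNTRL_A>, C<isCNTRL_L1>, C<isCNTRL_uvchr>, C<isCNTRL_utf8_safe>,
C<isCNTRL_LC>, C<isCNTRL_LC_uvchr>, and C<isCNTRL_LC_utf8_safe>. On EBCDIC
platforms, you almost always want to use the C<isCNTRL_L1> variant.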
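
=for apidoc Am|bool|isDIGIT|char ch
Returns a boolean indicating whether the specified character is a
digit, analogous to C<m/[[:digit:]]/>.
Variants C<isDIGIT_A> and C<isDIGIT_L1> are identical to C<isDIGIT>.
See the L<top of this section|/Character classification> for an explanation of
variants
C<isDIGIT_uvchr>, C<isDIGIT_utf8_safe>, C<isDIGIT_LC>, C<isDIGIT_LC_uvchr>, and
C<isDIGIT_LC_utf8_safe>.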
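
=for apidoc Am|bool|isGRAPH|char ch
Returns a boolean indicating whether the specified character is a
graphic character, analogous to C<m/[[:graph:]]/>.
See the L<top of this section|/Character classification> for an explanation of
variants C<isGRAPH_A>, C<isGRAPH_L1>, C<isGRAPH_uvchr>, C<isGRAPH_utf8_safe>,
C<isGRAPH_LC>, C<isGRAPH_LC_uvchr>, and C<isGRAPH_LC_utf8_safe>.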
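
=for apidoc Am|bool|isLOWER|char ch
Returns a boolean indicating whether the specified character is a
lowercase character, analogous to C<m/[[:lower:]]/>.
See the L<top of this section|/Character classification> for an explanation of
variants
C<isLOWER_A>, C<isLOWER_L1>, C<isLOWER_uvchr>, C<isLOWER_utf8_safe>,
C<isLOWER_LC>, C<isLOWER_LC_uvchr>, and C<isLOWER_LC_utf8_safe>.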

=for apidoc Am|bool|isOCTAL|char ch
Returns a boolean indicating whether the specified character is an
octal digit, C<[0-7]>.
The only two variants are C<isOCTAL_A> and C<isOCTAL_L1>; each is identical to
C<isOCTAL>.
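
As a rough illustration of what such a test is used for, here is a standalone sketch with hypothetical names that do not come from the Perl API. It accepts only the characters 0 through 7 and uses that test to accumulate the value of a leading run of octal digits.

```c
#include <stdbool.h>

/* Hypothetical equivalent of an octal-digit test.  The C standard
 * guarantees that the digits '0' through '9' are contiguous, so the range
 * comparison is portable. */
bool demo_is_octal(char ch)
{
    return ch >= '0' && ch <= '7';
}

/* Hypothetical helper: value of the leading octal digits of s, stopping
 * at the first character that is not an octal digit.  Overflow is not
 * checked in this sketch. */
unsigned long demo_parse_octal(const char *s)
{
    unsigned long value = 0;

    while (demo_is_octal(*s)) {
        value = value * 8 + (unsigned long)(*s - '0');
        s++;
    }
    return value;
}
```

Note that the digits 8 and 9 end the run, so C<demo_parse_octal("18")> consumes only the leading 1.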
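
=for apidoc Am|bool|isPUNCT|char ch
Returns a boolean indicating whether the specified character is a
punctuation character, analogous to C<m/[[:punct:]]/>.
Note that the definition of what is punctuation isn't as
straightforward as one might desire. See L<perlrecharclass/POSIX Character
Classes> for details.
See the L<top of this section|/Character classification> for an explanation of
variants C<isPUNCT_A>, C<isPUNCT_L1>, C<isPUNCT_uvchr>, C<isPUNCT_utf8_safe>,
C<isPUNCT_LC>, C<isPUNCT_LC_uvchr>, and C<isPUNCT_LC_utf8_safe>.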

=for apidoc Am|bool|isSPACE|char ch
Returns a boolean indicating whether the specified character is a
whitespace character. This is analogous
to what C<m/\s/> matches in a regular expression. Starting in Perl 5.18
this also matches what C<m/[[:space:]]/> does. Prior to 5.18, only the
locale forms of this macro (the ones with C<LC> in their names) matched
precisely what C<m/[[:space:]]/> does. In those releases, the only difference,
in the non-locale variants, was that C<isSPACE()> did not match a vertical tab.
(See L</isPSXSPC> for a macro that matches a vertical tab in all releases.)
See the L<top of this section|/Character classification> for an explanation of
variants
C<isSPACE_A>, C<isSPACE_L1>, C<isSPACE_uvchr>, C<isSPACE_utf8_safe>,
C<isSPACE_LC>, C<isSPACE_LC_uvchr>, and C<isSPACE_LC_utf8_safe>.

=for apidoc Am|bool|isPSXSPC|char ch
(short for Posix Space)
Starting in 5.18, this is identical in all its forms to the
corresponding C<isSPACE()> macros.
The locale forms of this macro are identical to their corresponding
C<isSPACE()> forms in all Perl releases. In releases prior to 5.18, the
non-locale forms differ from their C<isSPACE()> forms only in that the
C<isSPACE()> forms don't match a Vertical Tab, and the C<isPSXSPC()> forms do.
Otherwise they are identical. Thus this macro is analogous to what
C<m/[[:space:]]/> matches in a regular expression.
See the L<top of this section|/Character classification> for an explanation of
variants C<isPSXSPC_A>, C<isPSXSPC_L1>, C<isPSXSPC_uvchr>, C<isPSXSPC_utf8_safe>,
C<isPSXSPC_LC>, C<isPSXSPC_LC_uvchr>, and C<isPSXSPC_LC_utf8_safe>.
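
The vertical tab difference between the two families before 5.18 can be made concrete with a standalone sketch restricted to the ASCII range. The function names are hypothetical and this is not how Perl implements the macros: one function models the POSIX space set, the other models the pre-5.18 non-locale whitespace test, which is the same set without the vertical tab.

```c
#include <stdbool.h>

/* Hypothetical model of the POSIX space class in the ASCII range:
 * space, horizontal tab, newline, vertical tab, form feed, and carriage
 * return. */
bool demo_is_posix_space(char ch)
{
    return ch == ' '  || ch == '\t' || ch == '\n'
        || ch == '\v' || ch == '\f' || ch == '\r';
}

/* Hypothetical model of the non-locale whitespace test in releases
 * before 5.18: the same set, except that a vertical tab does not match. */
bool demo_is_space_pre_518(char ch)
{
    return ch != '\v' && demo_is_posix_space(ch);
}
```

The two functions disagree only on the vertical tab. From 5.18 onward both families behave like the first function.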
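
=for apidoc Am|bool|isUPPER|char ch
Returns a boolean indicating whether the specified character is an
uppercase character, analogous to C<m/[[:upper:]]/>.
See the L<top of this section|/Character classification> for an explanation of
variants C<isUPPER_A>, C<isUPPER_L1>, C<isUPPER_uvchr>, C<isUPPER_utf8_safe>,
C<isUPPER_LC>, C<isUPPER_LC_uvchr>, and C<isUPPER_LC_utf8_safe>.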
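
=for apidoc Am|bool|isPRINT|char ch
Returns a boolean indicating whether the specified character is a
printable character, analogous to C<m/[[:print:]]/>.
See the L<top of this section|/Character classification> for an explanation of
variants
C<isPRINT_A>, C<isPRINT_L1>, C<isPRINT_uvchr>, C<isPRINT_utf8_safe>,
C<isPRINT_LC>, C<isPRINT_LC_uvchr>, and C<isPRINT_LC_utf8_safe>.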

=for apidoc Am|bool|isWORDCHAR|char ch
Returns a boolean indicating whether the specified character is a
word character, analogous to what C<m/\w/> and C<m/[[:word:]]/> match
in a regular expression. A word character is an alphabetic character, a
decimal digit, a connecting punctuation character (such as an underscore), or
a "mark" character that attaches to one of those (like some sort of accent).
C<isALNUM()> is a synonym provided for backward compatibility, even though a
word character includes more than the standard C language meaning of
alphanumeric.
See the L<top of this section|/Character classification> for an explanation of
variants C<isWORDCHAR_A>, C<isWORDCHAR_L1>, C<isWORDCHAR_uvchr>, and
C<isWORDCHAR_utf8_safe>. C<isWORDCHAR_LC>, C<isWORDCHAR_LC_uvchr>, and
C<isWORDCHAR_LC_utf8_safe> are also as described there, but additionally
include the platform's native underscore.
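
Restricted to the ASCII range, the definition above reduces to letters, decimal digits, and the underscore. The following standalone sketch has hypothetical names, assumes an ASCII platform where the letter ranges are contiguous, and is not Perl's implementation. It models that reduced set and uses it to measure the word at the start of a string.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ASCII-only model of a word character: a letter, a decimal
 * digit, or an underscore.  Assumes an ASCII platform, where 'a'..'z' and
 * 'A'..'Z' are contiguous ranges. */
bool demo_is_wordchar_ascii(char ch)
{
    return (ch >= 'a' && ch <= 'z')
        || (ch >= 'A' && ch <= 'Z')
        || (ch >= '0' && ch <= '9')
        || ch == '_';
}

/* Hypothetical helper: length of the run of word characters at the start
 * of the NUL-terminated string s. */
size_t demo_leading_word_len(const char *s)
{
    size_t n = 0;

    while (demo_is_wordchar_ascii(s[n])) n++;
    return n;
}
```

The underscore is what distinguishes this set from the C language notion of alphanumeric, so C<demo_leading_word_len("foo_bar baz")> counts all seven characters before the space.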
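
=for apidoc Am|bool|isXDIGIT|char ch
Returns a boolean indicating whether the specified character is a hexadecimal
digit. In the ASCII range these are C<[0-9A-Fa-f]>. Variants C<isXDIGIT_A()>
and C<isXDIGIT_L1()> are identical to C<isXDIGIT()>.
See the L<top of this section|/Character classification> for an explanation of
variants
C<isXDIGIT_uvchr>, C<isXDIGIT_utf8_safe>, C<isXDIGIT_LC>, C<isXDIGIT_LC_uvchr>,
and C<isXDIGIT_LC_utf8_safe>.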
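
=for apidoc Am|bool|isIDFIRST|char ch
Returns a boolean indicating whether the specified character can be the first
character of an identifier. This is very close to, but not quite the same as
the official Unicode property C<XID_Start>. The difference is that this
returns true only if the input character also matches L</isWORDCHAR>.
See the L<top of this section|/Character classification> for an explanation of
variants
C<isIDFIRST_A>, C<isIDFIRST_L1>, C<isIDFIRST_uvchr>, C<isIDFIRST_utf8_safe>,
C<isIDFIRST_LC>, C<isIDFIRST_LC_uvchr>, and C<isIDFIRST_LC_utf8_safe>.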

=for apidoc Am|bool|isIDCONT|char ch
Returns a boolean indicating whether the specified character can be the
second or succeeding character of an identifier. This is very close to, but
not quite the same as the official Unicode property C<XID_Continue>. The
difference is that this returns true only if the input character also matches
L</isWORDCHAR>. See the L<top of this section|/Character classification> for
an explanation of variants C<isIDCONT_A>, C<isIDCONT_L1>, C<isIDCONT_uvchr>,
C<isIDCONT_utf8_safe>, C<isIDCONT_LC>